Imputing Missing Values Using Support Variables with Application to Barley Grain Yield

Authors
Department of Mathematics, Eastern Mediterranean University, Mağusa, North Cyprus.
Abstract
Missing values in data sets are a widely investigated problem. In this study, we propose imputing missing values with the help of support variables that are closely associated with the variable of interest. The level of association between the variable of interest and each candidate support variable is determined before that variable is included in the imputation process. Barley (Hordeum vulgare) grain yield under the semi-arid conditions of Cyprus was used as a case study. Monthly rainfall, monthly average temperature, and soil organic matter ratio were selected as support variables. Multivariate regression employing the support variables, bivariate regression, kernel regression, and Markov Chain Monte Carlo techniques were used to impute the missing values. The results indicated that multivariate regression with support variables performed better than the other methods.
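The general idea of regression imputation with support variables can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `impute_with_support`, the correlation threshold `min_abs_corr`, the use of Pearson correlation as the association measure, and the synthetic data are all assumptions made for illustration. The sketch also assumes the support variables themselves contain no missing values.

```python
import numpy as np


def impute_with_support(y, X, min_abs_corr=0.3):
    """Fill NaNs in y by least-squares regression on selected columns of X.

    y : 1-D array with NaN marking missing values (variable of interest).
    X : 2-D array of fully observed support variables, one column each.
    min_abs_corr : hypothetical threshold on |Pearson r| for keeping a column.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(y)

    # Step 1: measure the association between y and each support variable,
    # using only the rows where y is observed.
    corr = np.array([
        np.corrcoef(X[observed, j], y[observed])[0, 1]
        for j in range(X.shape[1])
    ])
    keep = np.abs(corr) >= min_abs_corr
    if not keep.any():
        raise ValueError("no support variable passes the association threshold")

    # Step 2: fit y = b0 + b1*x1 + ... + bk*xk on the complete rows.
    design = np.column_stack([np.ones(len(y)), X[:, keep]])
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)

    # Step 3: replace each missing y with its regression prediction.
    filled = y.copy()
    filled[~observed] = design[~observed] @ coef
    return filled, keep


# Synthetic example (made-up numbers, not the barley data from the study).
rng = np.random.default_rng(0)
rain = rng.uniform(0.0, 100.0, 30)      # monthly rainfall
temp = rng.uniform(5.0, 25.0, 30)       # monthly average temperature
som = rng.uniform(0.5, 3.0, 30)         # soil organic matter ratio
support = np.column_stack([rain, temp, som])

grain_yield = 0.5 + 0.02 * rain - 0.03 * temp + 0.4 * som
grain_yield = grain_yield + rng.normal(0.0, 0.1, 30)
grain_yield[[3, 11, 20]] = np.nan       # pretend three yields are missing

filled, kept = impute_with_support(grain_yield, support)
print("support variables kept:", kept)
print("imputed values:", filled[[3, 11, 20]])
```

Screening the support variables before fitting mirrors the step described in the abstract, where the level of association is determined first. A different association measure or a different threshold could be substituted without changing the rest of the sketch.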
