Document Type : Research Paper

Authors

1 Associate Professor, Department of Soil Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran

2 Professor, Soil Science Department, Faculty of Agricultural, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

3 Researcher, Soil Science Department, Faculty of Agricultural, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract

Introduction: Machine learning algorithms usually do not account for spatial autocorrelation in soil data unless it is explicitly specified. Algorithms that accommodate autocorrelated observations have recently been formulated, such as geographical random forest (Georganos et al., 2019) and spatial ensemble techniques (Jiang et al., 2017). In theory, if all relevant environmental variables are included when modeling a soil property or class, the residuals of the fitted model should show no spatial autocorrelation; if they do, some important predictors have probably been omitted. In practice, however rich the data set and however careful the modeling, residual autocorrelation is still likely to occur. Several researchers have therefore suggested using spatial surrogate covariates as an indicator of spatial location in the SCORPAN model. The most common option is to use geographic coordinates (easting and northing) as covariates, which tends to produce artificial-looking maps, especially in combination with tree-based algorithms. Alternatively, Hengl et al. (2018) proposed maps of distance from the observation locations. Unlike a covariate such as distance from a river, distance to observation locations usually has no clear meaning in terms of the soil processes in an area. In digital soil mapping, the current use of distance maps is unsatisfactory for several reasons. Mixing pseudo-covariates with pedologically meaningful covariates prevents the analysis of residuals and the generation of new hypotheses from them. It also hinders interpretation of the most important predictors. Finally, distance-based pseudo-covariates may be correlated with several pedologically meaningful covariates, so that they either appear to be better predictors or mask the effect of those covariates.
In spatial ecology, spatial eigenvector maps, spatial filters, or trend surface regression replace distance maps for reducing or eliminating spatial autocorrelation (Kuhn et al., 2009). The purpose of this study is, first, to detect and quantify spatial autocorrelation in the soil data; second, to develop a non-spatial model that ignores spatial autocorrelation; third, to extract spatial eigenvectors as an index of spatial autocorrelation; and finally, to use these eigenvectors as independent variables in spatial modeling.
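The workflow outlined above can be written compactly in the usual notation of eigenvector spatial filtering. This is a sketch under stated assumptions and not necessarily the exact formulation used in the study: the symbols $W$ (a symmetric spatial weights matrix among the $n$ sampling points), $E_k$ (the $k$ selected eigenvectors) and $\gamma$ (their coefficients) are introduced here for illustration only.

```latex
\begin{align}
  y &= X\beta + \varepsilon
      && \text{(non-spatial OLS model)} \\
  \Bigl(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\Bigr)\, W\,
  \Bigl(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}\Bigr)
    &= E \Lambda E^{\top}
      && \text{(extraction of spatial eigenvectors)} \\
  y &= X\beta + E_k\gamma + \varepsilon^{*}
      && \text{(spatially filtered regression)}
\end{align}
```

Here $y$ is the soil property, $X$ holds the environmental covariates, and the columns of $E_k$ act as synthetic covariates that absorb the spatially structured part of the OLS error, so that $\varepsilon^{*}$ should be less autocorrelated than $\varepsilon$.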
Materials and Methods: In this study, soil salinity data from 297 soil samples collected in a section of the Qazvin plain were used. Candidate auxiliary variables comprised the first and second derivatives of a digital elevation model (as topographic factors), remote sensing indices, a parent material map, a geoform map, and maps of mean annual temperature and rainfall. The most relevant environmental variables for modeling were then selected according to their correlation with the dependent variable (soil salinity) at the 297 sampling points, using the FSelector package in R. Moran's I and Geary's C indices were used to evaluate the spatial autocorrelation of the soil data. First, a non-spatial ordinary least squares (OLS) model was fitted to predict the spatial distribution of soil salinity; at this stage, spatial autocorrelation was not considered. Then a spatial regression was fitted in which spatial filters, computed as spatial eigenvectors, were added as independent variables. Finally, the outputs of the non-spatial OLS model and the spatial regression model were compared using R2, the Akaike information criterion (AIC), the spatial autocorrelation of the residuals, and the root mean square error (RMSE).
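The two autocorrelation indices named above can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions, not the study's R workflow: the row-standardised inverse-distance weights, the `bandwidth` value, and the synthetic coordinates and salinity-like values are all hypothetical stand-ins for the real 297 samples.

```python
import numpy as np

def inverse_distance_weights(coords, bandwidth):
    """Row-standardised inverse-distance weights (an assumed weighting scheme).

    Pairs farther apart than `bandwidth` receive weight 0.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(d)
    mask = (d > 0) & (d <= bandwidth)
    w[mask] = 1.0 / d[mask]
    row_sums = w.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated points keep an all-zero row
    return w / row_sums

def morans_i(x, w):
    """Moran's I: positive values indicate similar values at nearby locations."""
    z = x - x.mean()
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)

def gearys_c(x, w):
    """Geary's C: values below 1 indicate positive spatial autocorrelation."""
    z = x - x.mean()
    diff_sq = (x[:, None] - x[None, :]) ** 2
    return (len(x) - 1) / (2.0 * w.sum()) * (w * diff_sq).sum() / (z @ z)

# Synthetic stand-in for the samples: values increase towards the south.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(297, 2))
values = 10.0 - coords[:, 1] + rng.normal(scale=1.0, size=297)

w = inverse_distance_weights(coords, bandwidth=1.5)
i_obs, c_obs = morans_i(values, w), gearys_c(values, w)

# Shuffling the values over the locations destroys the spatial structure.
i_shuffled = morans_i(rng.permutation(values), w)

print(f"Moran's I = {i_obs:.2f}, Geary's C = {c_obs:.2f}")
print(f"Moran's I after shuffling = {i_shuffled:.2f}")
```

The two indices move in opposite directions: a spatially structured variable gives a Moran's I well above 0 and a Geary's C below 1, whereas the shuffled values give a Moran's I close to 0.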
Results and Discussion: Statistical analysis indicated high variability of soil salinity in the study area (coefficient of variation, CV, above 35%). Soil salinity also showed high skewness and kurtosis, indicating a non-normal distribution. The high variability of this soil property reflects the interaction of numerous complex factors, including soil-forming processes and different management strategies. The variables selected by the correlation analysis were elevation, Multi-resolution Valley Bottom Flatness (MrVBF), wetness index, drainage basin, greenness index, normalized difference vegetation index (NDVI), and the corrected transformed vegetation index (CTVI): seven variables in total, four topographic and three from remote sensing. Among the topographic variables, MrVBF was the most important (correlation: 0.70). The spatial distribution map shows that soil salinity is low in the northern, northeastern, and northwestern parts towards the center of the study area, and highest in the southern and southeastern regions. Moran's I and Geary's C indices were 0.57 and 0.4, respectively; both indicate that soil salinity in the study area is spatially autocorrelated. Accounting for spatial autocorrelation in the spatial regression model improved the results relative to the non-spatial model: R2 increased, while AIC, the spatial autocorrelation of the residuals, and RMSE decreased. The residual maps of the non-spatial OLS model and the spatial regression model differ in the spatial pattern of residual signs and in the distribution of spatial autocorrelation, which appears as clusters. Clusters (red or blue) indicate the presence of spatial autocorrelation in the residuals.
In the residual map of the non-spatial model, more and larger clusters (marked with green ovals) can be identified, indicating spatial autocorrelation in the residuals of that model. Spatial autocorrelation in the residuals shows that the model has not removed the spatial dependence, which may be due to the omission of an important auxiliary variable from the modeling.
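The comparison described above can be sketched with NumPy on synthetic data. Everything here is an assumption made for illustration: the binary distance-band weights, the hidden spatial pattern that stands in for a missing auxiliary variable, and the choice of simply keeping the first 10 eigenvectors (in practice eigenvectors are usually selected more carefully, for example stepwise). It is not the study's own implementation.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I of x under the weights matrix w."""
    z = x - x.mean()
    return len(x) / w.sum() * (z @ w @ z) / (z @ z)

def moran_eigenvectors(w, n_vectors):
    """Eigenvectors of the doubly centred weights matrix with the largest eigenvalues."""
    n = w.shape[0]
    w_sym = 0.5 * (w + w.T)                 # symmetrise so that eigh applies
    m = np.eye(n) - np.ones((n, n)) / n     # centring projector
    eigval, eigvec = np.linalg.eigh(m @ w_sym @ m)
    order = np.argsort(eigval)[::-1][:n_vectors]
    return eigvec[:, order]

def fit_ols(X, y):
    """OLS with intercept; returns residuals, R2, RMSE and Gaussian AIC."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = resid @ resid
    r2 = 1.0 - rss / ((y - y.mean()) @ (y - y.mean()))
    rmse = np.sqrt(rss / n)
    k = A.shape[1] + 1                      # coefficients plus error variance
    aic = n * (np.log(2.0 * np.pi * rss / n) + 1.0) + 2.0 * k
    return resid, r2, rmse, aic

# Synthetic data: one observed covariate plus an unobserved spatial pattern.
rng = np.random.default_rng(42)
n = 297
coords = rng.uniform(0.0, 10.0, size=(n, 2))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
w = ((d > 0) & (d <= 1.5)).astype(float)    # binary distance-band weights

covariate = rng.normal(size=n)
hidden = 3.0 * np.cos(np.pi * coords[:, 1] / 10.0)   # the "missing" variable
y = 1.0 + 2.0 * covariate + hidden + rng.normal(scale=0.5, size=n)

X = covariate[:, None]
E = moran_eigenvectors(w, n_vectors=10)

res_ols, r2_ols, rmse_ols, aic_ols = fit_ols(X, y)
res_sf, r2_sf, rmse_sf, aic_sf = fit_ols(np.column_stack([X, E]), y)
i_ols, i_sf = morans_i(res_ols, w), morans_i(res_sf, w)

print(f"OLS:      R2={r2_ols:.2f} RMSE={rmse_ols:.2f} AIC={aic_ols:.1f} residual I={i_ols:.2f}")
print(f"Filtered: R2={r2_sf:.2f} RMSE={rmse_sf:.2f} AIC={aic_sf:.1f} residual I={i_sf:.2f}")
```

Because the filtered model nests the OLS model, its in-sample R2 cannot be lower and its RMSE cannot be higher; AIC and the residual Moran's I are the more informative criteria, since they show whether the added eigenvectors are worth their extra parameters and whether they actually absorb the spatial structure.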
Conclusion: This study investigated the effect of spatial autocorrelation on the results of soil salinity modeling. Soil salinity was predicted with a non-spatial OLS model (ignoring spatial autocorrelation) and with a spatial regression model (accounting for spatial autocorrelation). The spatial regression model performed better than the non-spatial OLS model: when spatial autocorrelation was included as a covariate, R2 increased, while AIC, the spatial autocorrelation of the residuals, and RMSE decreased.

References

1. Abdel-Kader, F.H. 2011. Digital soil mapping at pilot sites in the northwest coast of Egypt: a multinomial
logistic regression approach. Egyptian Journal of Remote Sensing and Space Science, 29–40.
2. Allbed, A., and Kumar, L. 2013. Soil Salinity Mapping and Monitoring in Arid and Semi-Arid Regions
Using Remote Sensing Technology: A Review. Advances in Remote Sensing, 2, 373–385.
3. Behrens, T., Schmidt, K., Viscarra-Rossel, R. A., Gries, P., Scholten, T., and MacMillan, R. A. 2018b.
Spatial modelling with Euclidean distance fields and machine learning. European Journal of Soil Science,
69, 757-770.
4. Bjørn Møller A., Beucher A.M., Pouladi N., and Humlekrog Greve M. 2020. Oblique geographic
coordinates as covariates for digital soil mapping. SOIL, 6, 269–289.
5. Borcard, D., and Legendre, P. 2002. All-scale spatial analysis of ecological data by means of principal
coordinates of neighbour matrices. Ecological Modelling, 153, 51–68.
6. Conrad, O., Bechtel, B., Bock, M., Dietrich, H., Fischer, E., Gerlitz, L., Wehberg, J., Wichmann, V., and
Böhner, J. 2015. System for automated geoscientific analyses (SAGA) v. 2.1.4. Geoscientific Model
Development, 8(7), 1991-2007.
7. Davies, B.E., and Gamm, S.A. 1970. Trend surface analysis applied to soil reaction values from Kent,
England. Geoderma, 3(3), 223-231.
8. Diniz-Filho, J.A., and Bini, L.M. 2005. Modelling geographical patterns in species richness using
eigenvector-based spatial filters. Global Ecology and Biogeography, 14, 177-185.
9. Dormann, C.F., McPherson, J.M., Araújo, M.B., Bivand, R., Bolliger, J., Carl, G., Davies, R.G., Hirzel,
A., Jetz, W., Kissling, W.D., Kuhn, I., Ohlemuller, R., Peres-Neto, P.R., Reineking, B., Schroder, B., Schurr,
F.M., and Wilson, R. 2007. Methods to account for spatial autocorrelation in the analysis of species
distributional data: a review. Ecography, 30, 609-628.
10. Gallant, J.C., and Dowling, T.I. 2003. A multiresolution index of valley bottom flatness for mapping
depositional areas. Water Resources Research, 39, 1347–1359.
11. Geary, R.C. 1954. The Contiguity Ratio and Statistical Mapping. The Incorporated Statistician. 5 (3), 115–
145.
12. Georganos, S., Grippa, T., Gadiaga, A.N., Linard, C., Lennert, M., Vanhuysse, S., Mboga, N.O., Wolff, E.,
and Kalogirou, S. 2019. Geographical random forests: A spatial extension of the random forest algorithm to
address spatial heterogeneity in remote sensing and population modelling. Geocarto International, 1, 1-12.
13. Griffith, D.A. 2013. Spatial Autocorrelation and Spatial Filtering: Gaining Understanding through Theory
and Scientific Visualization. Springer Science and Business Media.
14. Haining, R.P. 2001. Spatial Autocorrelation. International Encyclopedia of the Social and Behavioral
Sciences. 14763-14768.
15. Hair, J.F., Hult, G.T.M., Ringle, C.M., and Sarstedt, M. 2022. A Primer on Partial Least Squares Structural
Equation Modeling (PLS-SEM) (3rd ed.). Thousand Oaks, CA: Sage.
16. Harnett, P.R., Mountain, G.D., Barnett, M. E. 1978. Spatial filtering applied to remote sensing imagery.
Optica Acta, 25, 801-809.
17. Hawkins, B.A. 2012. Eight (and a half) deadly sins of spatial analysis. Journal of Biogeography, 39:1-9.
18. Hengl, T., Nussbaum, M., Wright, M.N., Heuvelink, G.B.M., and Graler, B. 2018. Random forest as a
generic framework for predictive modeling of spatial and spatiotemporal variables. PeerJ, 6, e5518.
19. Hu, C., Wright, A.L., and Lian, G. 2019. Estimating the Spatial Distribution of Soil Properties Using
Environmental Variables at a Catchment Scale in the Loess Hilly Area, China. International Journal of
Environmental Research and Public Health, 16(3), 491.
20. Inakwu O.A.O., Crawford M., and McBratney A. 2006. Digital Mapping of Soil Attributes for Regional and
Catchment Modelling, using Ancillary Covariates, Statistical and Geostatistical Techniques. Developments
in Soil Science, Chapter 32. Volume 31, pp: 437-453.
21. Jiang, Z., Li, Y., Shekhar, S., Rampi, L., and Knight, J. 2017. Spatial ensemble learning for heterogeneous
geographic data with class ambiguity: A summary of results. In Proceedings of the 25th ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems 23 (pp. 1-10). ACM.
22. Khamoshi, S.E., Sarmadian, F., and Omid, M. 2023. Predicting and Mapping of Soil Organic Carbon Stock
Using Machine Learning Algorithm. Iranian Journal of Soil and Water Research, 53 (11), 2671-2681. (in
Persian with English abstract)
23. Khamoshi, S.E., Sarmadian, F., Keshavarzi, A. 2019. Digital soil mapping Using Random Forests and Land
Suitability Evaluation for Abyek Region, Qazvin Province. Journal of Range and Watershed Management.
71, 885-899. (in Persian with English abstract)
24. Kim, D., Hirmas, D.R., McEwan, R.W., Mueller, T.G., Park, S.J., Šamonil, P., Thompson, J.A., and
Wendroth, O. 2016. Predicting the influence of multi-scale spatial autocorrelation on soil–landform
modeling. Soil Science Society of America Journal, 80: 409–419.
25. Krivoruchko, K., and Gribov, A. 2019. Evaluation of empirical Bayesian kriging. Spatial Statistics,
32,100368.
26. Kuhn, I., and Dormann, C.F. 2012. Less than eight (and a half) misconceptions of spatial analysis. Journal of
Biogeography, 39: 995-998.
27. Kuhn, I., Nobis, M.P., and Durka, W. 2009. Combining spatial and phylogenetic eigenvector filtering in trait
analysis. Global Ecology and Biogeography, 18, 745-758.
28. Kpade O.L.H., Stendahl J., Lundblad M., and Karltun E. 2021. Predicting the spatial distribution of soil
organic carbon stock in Swedish forests using a group of covariates and site-specific data. SOIL, 7, 377–398.
29. Lagacherie, P., and McBratney, A. 2006. Spatial soil information systems and spatial soil inference systems:
perspectives for digital soil mapping. Developments in Soil Science, 31, 3-22.
30. Momtazi Burojeni, M., and Sarmadian, F. 2023. Spatial prediction of soil classes using C5.0 boosted
decision tree model Abyek Area. Journal of Range and Watershed Management, 75 (4), 553 – 572. (in
Persian with English abstract)
31. Moran, P.A.P. 1950. Notes on Continuous Stochastic Phenomena. Biometrika. 37 (1), 17–23.
32. Mousavi, S.R., Sarmadian, F., Omid, M., and P. Bogaert. 2021. Digital Modeling of Three-Dimensional Soil
Salinity Variation Using Machine Learning Algorithms in Arid and Semi-Arid lands of Qazvin Plain. Iranian
Journal of Soil Research, 52, 1915-1929. (in Persian with English abstract)
33. Mousavi, S.R., Sarmadian, F., Omid, M., and P. Bogaert. 2021. Modeling the Vertical Soil Calcium
Carbonate Equivalent Variation by Machine Learning Algorithms in Qazvin Plain. Journal of Water and
Soil. 35, 719-734. (in Persian with English abstract)
34. Mousavi, S.R., Sarmadian, F., Omid, M., and P. Bogaert. 2022. Application of Machine Learning Models in
Spatial Estimation of Soil Phosphorus and Potassium in Some Parts of Abyek Plain. Iranian Journal of Soil
Research, 35, 397-411. (in Persian with English abstract)
35. Mousavi, S.R., Sarmadian, F., Angelini, M. E., Bogaert, P., and Omid, M. 2023. Cause-effect relationships
using structural equation modeling for soil properties in arid and semi-arid regions. Catena, 232, 107392.
36. Mousavi, S.R., Sarmadian, F., Omid, M., and Bogaert, P. 2022. Three-dimensional mapping of soil organic
carbon using soil and environmental covariates in an arid and semi-arid region of Iran. Measurement, 201,
111706.
37. Mousavi, S.R., Sarmadian, F., Rahmani, A. 2020. Modelling and Prediction of Soil Classes Using Boosting
Regression Tree and Random Forests Machine Learning Algorithms in Some Part of Qazvin Plain. Iranian
Journal of Soil and Water Research, 50, 2525-2538. (in Persian with English abstract)
38. Neyestani, M., Sarmadian, F., Jafari, A., Keshavarzi, A., and Sharififar, A. 2021. Digital mapping of soil
classes using spatial extrapolation with imbalanced data. Geoderma Regional, 26, e00422.
39. Nield, S.J., Boettinger, J.L., and Ramsey, R.D. 2007. Digital mapping of gypsic and natric soil areas using
Landsat ETM data. Soil Science Society of America Journal 71:245–252.
40. Rahmani, A., Sarmadian, F., and Arefi, H. 2023. Digital modeling and prediction of soil subgroup classes
using deep learning approach in a part of arid and semi-arid lands of Qazvin Plain, Iranian Journal of Soil
and Water Research, 53 (11), 2477-2499. (in Persian with English abstract)
41. Rasaei, Z., Rossiter, D.G., and Farshad, A. 2020. Rescue and renewal of legacy soil resource inventories in
Iran as an input to digital soil mapping. Geoderma regional, 21, e00262.
42. Rezaie, G., Sarmadian, F., Mohammadi Torkashvand, A., Seyedmohammadi, J., and Marashi Aliabadi M.
2023. Digital Mapping of Surface and Subsurface Soil Organic Carbon and Soil Salinity Variation in a Part
of Qazvin Plain (Case Study: Abyek and Nazarabad Regions). Journal of Water and Soil, 37, 315-331. (in
Persian with English abstract)
43. Richardson, A. J., and Wiegand, C. L. 1977. Distinguishing vegetation from soil background information.
Photogrammetric engineering and remote sensing, 43(12), 1541-1552.
44. Romanski, P., Kotthoff, L., and Schratz, P. 2023. FSelector R Package. CRAN.
45. Schoeneberger, P.J., Wysocki, D.A., Benham, E.C., and Soil Survey Staff. 2021. Field book for describing
and sampling soils, Version 3.0. Natural Resources Conservation Service, National Soil Survey Center,
Lincoln, NE.
46. Selmy, S., Abd El-Aziz, S., El-Desoky, A., and El-Sayed, M. 2022. Characterizing, predicting, and mapping
of soil spatial variability in Gharb El-Mawhoub area of Dakhla Oasis using geostatistics and GIS
approaches. Journal of the Saudi Society of Agricultural Sciences, 21(6), 383-396.
47. Sinha, P., Gaughan, A.E., Stevens, F.R., Nieves, J.J., Sorichetta, A., and Tatem, A.J. 2019. Assessing the
spatial sensitivity of a random forest model: Application in gridded population modeling. Computers,
Environment and Urban Systems, 75, 132-145.
48. Soil Survey Staff. 2022. Keys to Soil Taxonomy, 13th ed. USDA-Natural Resources Conservation Service.
49. Van Wambeke, A.R. 2000. The Newhall Simulation Model for estimating soil moisture & temperature
regimes. Conservation Service: Department of Crop and Soil Sciences Cornell University, Ithaca, NY USA