Document Type : Applicable

Authors

1 Associate Professor, Department of Soil Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran

2 Professor, Soil Science Department, Faculty of Agricultural, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

3 Researcher, Soil Science Department, Faculty of Agricultural, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract

Introduction: Machine learning algorithms usually do not account for spatial autocorrelation in soil data unless it is explicitly specified. Algorithms that account for autocorrelated observations have recently been formulated, such as geographical random forest (Georganos et al., 2019) and spatial ensemble techniques (Jiang et al., 2017). In theory, if all relevant environmental variables are included when modeling a soil property or class, the residuals of the fitted model should show no spatial autocorrelation. If they do, some important predictors have probably been missed. In practice, however comprehensive the data set and however careful the modeling, residual autocorrelation is still likely to occur. Several researchers have therefore suggested using spatial surrogate covariates as a proxy for spatial location in the SCORPAN model. The most common choice is to use the geographic coordinates (easting and northing) as covariates, which leads to maps with artificial patterns, especially in combination with tree-based algorithms. Alternatively, Hengl et al. (2018) proposed maps of distance from the observation locations. Unlike, for example, distance from a river, distance to observation locations usually has no clear meaning in terms of the soil processes in an area. In digital soil mapping, the current use of distance maps is unsatisfactory for several reasons. First, mixing pseudo-covariates with pedologically meaningful covariates prevents the analysis of residuals and the formulation of new hypotheses from them. Second, it hinders the interpretation of the key predictors. Finally, distance pseudo-covariates may act as proxies for several pedologically meaningful covariates, so that they appear to be better predictors or mask the effect of those covariates.
In spatial ecology, spatial eigenvector maps, spatial filters, or trend-surface regression are used instead of distance maps to reduce or eliminate spatial autocorrelation (Kuhn et al., 2009). The aim of this study was, first, to detect and quantify spatial autocorrelation in the soil data; second, to fit a non-spatial model that ignores spatial autocorrelation; third, to extract spatial eigenvectors as an index of spatial autocorrelation; and finally, to use these eigenvectors as independent variables in spatial modeling.
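The workflow outlined above (fit a non-spatial model, extract spatial eigenvectors, then add them as covariates) can be illustrated with a minimal Python sketch. It is not the study's implementation: the data are synthetic, the function names (`neighbour_weights`, `moran_eigenvectors`, `ols_residuals`) are hypothetical, and both the distance-threshold neighbourhood and the rule of keeping the first ten eigenvectors with positive eigenvalues are assumptions made only for illustration.

```python
import numpy as np

def neighbour_weights(coords, radius):
    """Binary spatial weights: 1 if two distinct points lie within `radius`."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = (dist <= radius).astype(float)
    np.fill_diagonal(w, 0.0)  # a point is not its own neighbour
    return w

def moran_eigenvectors(w, n_vectors):
    """Eigenvectors of the doubly centred weights matrix (spatial filters)."""
    n = w.shape[0]
    w_sym = 0.5 * (w + w.T)  # enforce symmetry before eigh
    centering = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(centering @ w_sym @ centering)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = eigvals > 1e-8  # positive eigenvalues: positive autocorrelation
    return eigvecs[:, positive][:, :n_vectors]

def ols_residuals(predictors, y):
    """Residuals of an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

# Synthetic example (assumption): a 12 x 12 grid with one covariate and a
# smooth spatial pattern that the covariate cannot explain.
rng = np.random.default_rng(0)
gx, gy = np.meshgrid(np.arange(12), np.arange(12))
coords = np.column_stack([gx.ravel(), gy.ravel()])
covariate = rng.normal(size=len(coords))
spatial_pattern = np.sin(coords[:, 0] / 3.0) + np.cos(coords[:, 1] / 4.0)
salinity = (2.0 * covariate + 3.0 * spatial_pattern
            + rng.normal(scale=0.5, size=len(coords)))

w = neighbour_weights(coords, radius=1.1)
filters = moran_eigenvectors(w, n_vectors=10)

# Non-spatial model versus model with spatial filters as extra covariates.
res_ols = ols_residuals(covariate[:, None], salinity)
res_spatial = ols_residuals(np.column_stack([covariate, filters]), salinity)
rss_ols = float(res_ols @ res_ols)
rss_spatial = float(res_spatial @ res_spatial)
print(f"RSS non-spatial: {rss_ols:.1f}, RSS with filters: {rss_spatial:.1f}")
```

Because the filtered model nests the non-spatial one, its residual sum of squares cannot be larger; whether the improvement justifies the extra terms is what criteria such as AIC are meant to judge.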
Methods and Materials: This study used soil salinity data from 297 soil samples collected in a section of the Qazvin plain. Candidate auxiliary variables included the first and second derivatives of a digital elevation model as topographic factors, remote sensing indices, a parent material map, a geoform map, and maps of mean annual temperature and rainfall. To select the most relevant environmental variables for modeling, the correlation between these variables and the dependent variable, i.e., soil salinity at the 297 sampling points, was evaluated with the FSelector package in R. Moran's I and Geary's C indices were used to evaluate the spatial autocorrelation of the soil data. First, a non-spatial ordinary least squares (OLS) model was fitted to predict the spatial distribution of soil salinity; spatial autocorrelation was not considered at this stage. Then, a spatial regression model was fitted in which spatial filters, computed as spatial eigenvectors, were used as independent variables. Finally, the outputs of the non-spatial OLS model and the spatial regression model were compared using R2, the Akaike information criterion (AIC), the spatial autocorrelation of the residuals, and the root mean square error (RMSE).
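For readers unfamiliar with the two autocorrelation indices, the following Python sketch computes Moran's I and Geary's C from their standard definitions. It is an illustration only: the binary distance-threshold weights, the function names, and the synthetic grid values are assumptions, not the settings used in the study.

```python
import numpy as np

def neighbour_weights(coords, radius):
    """Binary spatial weights: 1 if two distinct points lie within `radius`."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = (dist <= radius).astype(float)
    np.fill_diagonal(w, 0.0)  # a point is not its own neighbour
    return w

def morans_i(values, w):
    """Moran's I: (n / sum(w)) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

def gearys_c(values, w):
    """Geary's C: ((n-1) / (2 sum(w))) * sum_ij w_ij (x_i - x_j)^2 / sum_i z_i^2."""
    x = np.asarray(values, dtype=float)
    n = x.size
    z = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2
    return ((n - 1) / (2.0 * w.sum())) * (w * sq_diff).sum() / (z @ z)

# Synthetic example (assumption): a smooth west-to-east gradient on a grid,
# which should look strongly positively autocorrelated.
gx, gy = np.meshgrid(np.arange(10), np.arange(10))
grid = np.column_stack([gx.ravel(), gy.ravel()])
w = neighbour_weights(grid, radius=1.1)
gradient = grid[:, 0].astype(float)

print("Moran's I:", morans_i(gradient, w))
print("Geary's C:", gearys_c(gradient, w))
```

Values of Moran's I above its expectation of -1/(n-1) and values of Geary's C below 1 point to positive spatial autocorrelation; the opposite directions point to negative autocorrelation.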
Results and Discussion: Statistical analysis indicated high variability of soil salinity in the study area (coefficient of variation, CV, greater than 35%). Soil salinity also showed high skewness and kurtosis, indicating a non-normal distribution. The high variability of this soil property reflects the interaction of numerous complex factors, including soil-forming processes and different management strategies. The most important variables selected by the correlation analysis were elevation, Multi-resolution Valley Bottom Flatness (MrVBF), wetness index, drainage basin, greenness index, normalized difference vegetation index (NDVI), and corrected transformed vegetation index (CTVI). In total, seven variables were selected: four topographic variables and three remote sensing variables. Among the topographic variables, MrVBF was the most important (correlation: 0.70). The spatial distribution map shows that soil salinity is low in the northern, northeastern, and northwestern parts toward the center of the study area, and highest in the southern and southeastern regions. Moran's I and Geary's C were 0.57 and 0.4, respectively. Both indices (Moran's I greater than 0 and Geary's C less than 1) indicate that soil salinity in the study area is positively spatially autocorrelated. Accounting for spatial autocorrelation in the spatial regression model improved the results relative to the non-spatial model: R2 increased, while AIC, the spatial autocorrelation of the residuals, and RMSE decreased. The residual maps of the non-spatial OLS model and the spatial regression model differ in the spatial pattern of the residual signs and in the distribution of spatial autocorrelation, which appears as clusters. Clusters (red or blue) indicate the presence of spatial autocorrelation in the residuals.
In the residual map of the non-spatial model, more and larger clusters (marked with green ovals) can be identified, indicating spatial autocorrelation in the residuals of that model. Spatial autocorrelation in the residuals shows that the model has not removed the spatial dependence, possibly because an important auxiliary variable was omitted from the modeling.
Conclusion: This study investigated the effect of spatial autocorrelation on the results of soil salinity modeling. Soil salinity was predicted with a non-spatial OLS model (ignoring spatial autocorrelation) and with a spatial regression model (accounting for spatial autocorrelation). The spatial regression model performed better than the non-spatial OLS model. When spatial autocorrelation was included as a covariate in the spatial model, R2 increased, while AIC, the spatial autocorrelation of the residuals, and RMSE decreased.
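The comparison criteria named above can be computed from observed and fitted values as in the following Python sketch. The AIC here assumes Gaussian errors and counts the error variance as one extra parameter; the function name `regression_metrics` and the example numbers are hypothetical and do not come from the study.

```python
import numpy as np

def regression_metrics(observed, fitted, n_coefficients):
    """Return R2, RMSE and Gaussian AIC for a fitted regression model.

    `n_coefficients` counts the regression coefficients including the
    intercept; the error variance is added as one further parameter.
    """
    y = np.asarray(observed, dtype=float)
    f = np.asarray(fitted, dtype=float)
    n = y.size
    rss = ((y - f) ** 2).sum()          # residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()   # total sum of squares
    # Maximised Gaussian log-likelihood with variance estimate rss / n.
    log_lik = -0.5 * n * (np.log(2.0 * np.pi) + np.log(rss / n) + 1.0)
    n_params = n_coefficients + 1
    return {
        "r2": 1.0 - rss / tss,
        "rmse": float(np.sqrt(rss / n)),
        "aic": 2.0 * n_params - 2.0 * log_lik,
    }

# Hypothetical observed values and two sets of fitted values.
observed = [1.0, 2.0, 3.0, 4.0]
close_fit = regression_metrics(observed, [1.1, 1.9, 3.2, 3.8], n_coefficients=2)
loose_fit = regression_metrics(observed, [1.5, 1.5, 3.5, 3.5], n_coefficients=2)
print(close_fit)
print(loose_fit)
```

With the same number of parameters, the model with the smaller residual sum of squares has the higher R2 and the lower RMSE and AIC; AIC additionally penalizes each extra parameter, which matters when spatial filters are added as covariates.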

Keywords

Main Subjects