Article type: Research article

Authors

1 Associate Professor, Department of Soil Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran

2 Professor, Department of Soil Science and Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

3 Researcher, Department of Soil Science and Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract

Soil data, like any spatial data, may exhibit spatial autocorrelation. If this spatial dependence is observed in the residuals of a statistical model, one of the key assumptions of statistical analysis, namely that the residuals are independent and identically distributed, is violated. Machine learning algorithms usually do not account for spatial autocorrelation in soil data. The present study attempts to include spatial autocorrelation as an independent variable in modeling soil salinity and to examine the resulting predictions. To this end, topsoil salinity was measured at 297 points in the Abyek area of Qazvin, and the important environmental variables were selected. A non-spatial ordinary least squares model and a spatial regression model were then fitted to the soil salinity data. Moran's and Geary's indices were used to detect spatial autocorrelation. The spatial distribution map of soil salinity in the Abyek area of Qazvin shows that salinity is low in the northern, northeastern and northwestern parts towards the center of the study area, and that the highest salinity values and salinity limitations occur in the southern and southeastern areas. Moran's index was 0.57 and Geary's index was 0.4; according to both indices, soil salinity in the study area is spatially autocorrelated. Including spatial autocorrelation in the spatial regression model improved the predictions compared with the non-spatial model. When spatial autocorrelation was included, R2 increased, while AIC, the spatial autocorrelation of the residuals, and RMSE decreased. Integrating spatial autocorrelation into the modeling of soil properties therefore appears to be necessary and should be taken into account.

Article title [English]

Incorporation of spatial autocorrelation in soil salinity distribution modeling in Qazvin area

Authors [English]

  • Azam Jafari 1
  • Fereydoon Sarmadian 2
  • Ahmad Heidari 2
  • Zahra Rasaei 3

1 Associate Professor, Department of Soil Science, Faculty of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran

2 Professor, Soil Science Department, Faculty of Agriculture, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

3 Researcher, Soil Science Department, Faculty of Agriculture, University College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran

Abstract [English]

Introduction: Machine learning algorithms usually do not consider spatial autocorrelation in soil data unless it is explicitly specified. Machine learning algorithms that account for autocorrelated observations have recently been formulated, such as geographic random forest (Georganos et al., 2019) and spatial ensemble techniques (Jiang et al., 2017). In theory, if all relevant environmental variables are included when modeling a soil property or class, there should be no spatial autocorrelation in the residuals of the fitted model. If residual autocorrelation does occur, some important predictors are probably missing. In practice, regardless of the data available and the care taken during modeling, residual autocorrelation is still likely to occur. Several researchers have suggested using spatial surrogate covariates to represent spatial location in the SCORPAN model. The most common option is to use geographic coordinates (easting and northing) as covariates, which tends to produce maps with artificial patterns, especially in combination with tree-based algorithms. Alternatively, Hengl et al. (2018) proposed maps of distance from the observation locations. Unlike, for example, distance from a river, distance to observation locations usually has no clear meaning in terms of the soil processes operating in an area. In digital soil mapping, the current use of distance maps is unsatisfactory for several reasons. Combining pseudo-covariates with pedologically meaningful covariates is of limited use because it prevents analysis of the residuals and the formulation of new hypotheses from them. It also hinders interpretation of the most important predictors. Finally, distance pseudo-covariates may be correlated with several pedology-related covariates, so that they appear to be better predictors or mask the effect of the pedology-related covariates.
In spatial ecology, spatial eigenvector maps, spatial filters or trend-surface regression are used instead of distance maps to reduce or eliminate spatial autocorrelation (Kuhn et al., 2009). The purpose of this study is, first, to detect and quantify spatial autocorrelation in the soil data; second, to develop a non-spatial model that ignores spatial autocorrelation; third, to extract spatial eigenvectors as an index of spatial autocorrelation; and finally, to use them as independent variables in spatial modeling.
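The modeling steps outlined above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the distance threshold, the number of retained eigenvectors and the helper names `moran_eigenvectors` and `fit_ols` are all assumptions made for illustration, and the abstract does not state how the eigenvectors were selected. The sketch builds a binary connectivity matrix, double-centres it, and uses the leading eigenvectors with positive eigenvalues as extra regressors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sampling locations and one hypothetical environmental covariate.
n = 120
coords = rng.uniform(0.0, 10.0, size=(n, 2))
covariate = rng.normal(size=n)
# Synthetic response: covariate effect + smooth spatial trend + noise.
y = 1.5 * covariate + np.sin(coords[:, 0] / 2.0) + 0.3 * rng.normal(size=n)


def moran_eigenvectors(coords, threshold, k):
    """Return up to k eigenvectors of the double-centred connectivity matrix.

    Only eigenvectors with positive eigenvalues are kept; these describe
    patterns of positive spatial autocorrelation.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = ((d > 0) & (d <= threshold)).astype(float)  # symmetric, binary
    m = len(coords)
    centring = np.eye(m) - np.ones((m, m)) / m
    eigvals, eigvecs = np.linalg.eigh(centring @ w @ centring)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-8
    return eigvecs[:, keep][:, :k]


def fit_ols(predictors, response):
    """Ordinary least squares with an intercept; returns (beta, residuals, R2)."""
    design = np.column_stack([np.ones(len(response)), predictors])
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ beta
    centred = response - response.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return beta, resid, r2


# Non-spatial model: covariate only.
_, resid_nonspatial, r2_nonspatial = fit_ols(covariate[:, None], y)

# Spatial model: covariate plus eigenvector spatial filters.
filters = moran_eigenvectors(coords, threshold=2.0, k=10)
_, resid_spatial, r2_spatial = fit_ols(np.column_stack([covariate, filters]), y)

print(f"R2 non-spatial: {r2_nonspatial:.3f}, R2 spatial: {r2_spatial:.3f}")
```

Because the spatial model nests the non-spatial one, its in-sample R2 cannot be lower; whether the gain is worthwhile is better judged with a penalised criterion such as AIC or with the autocorrelation of the residuals.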
Methods and Materials: In this study, soil salinity data from 297 soil samples collected in a section of the Qazvin plain were used. The first and second derivatives of a digital elevation model (as topographic factors), remote sensing indices, a parent material map, a geoform map, and maps of mean annual temperature and rainfall were considered as candidate auxiliary variables. The most relevant environmental variables for modeling were then selected on the basis of their correlation with the dependent variable, soil salinity, at the 297 study points, using the FSelector package in R. Moran's I and Geary's C indices were used to evaluate the spatial autocorrelation of the soil data. First, a non-spatial ordinary least squares (OLS) model was fitted to predict the spatial distribution of soil salinity; at this stage, spatial autocorrelation was not considered. A spatial regression model was then fitted in which spatial filters, calculated from spatial eigenvectors, were used as independent variables. Finally, the outputs of the non-spatial OLS model and the spatial regression model were compared using R2, the Akaike information criterion (AIC), the spatial autocorrelation of the residuals, and the root mean square error (RMSE).
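For readers unfamiliar with the two autocorrelation indices, the following sketch shows their standard definitions for a variable `x` and a spatial weight matrix `w`. The weighting scheme (`inverse_distance_weights` with a cut-off `bandwidth`) is an assumption for illustration only; the abstract does not state which weights the authors used.

```python
import numpy as np


def inverse_distance_weights(coords, bandwidth):
    """Row-standardised inverse-distance weights.

    Pairs farther apart than `bandwidth` receive zero weight. Both the
    scheme and the bandwidth are illustrative assumptions.
    """
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = np.zeros_like(d)
    mask = (d > 0) & (d <= bandwidth)
    w[mask] = 1.0 / d[mask]
    row_sums = w.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated points keep an all-zero row
    return w / row_sums


def morans_i(x, w):
    """Moran's I: positive values indicate positive spatial autocorrelation."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def gearys_c(x, w):
    """Geary's C: values below 1 indicate positive spatial autocorrelation."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    diff_sq = (x[:, None] - x[None, :]) ** 2
    return ((len(x) - 1) / (2.0 * w.sum())) * (w * diff_sq).sum() / (z @ z)


# Usage on hypothetical data: a smooth west-east gradient plus noise.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(100, 2))
salinity = coords[:, 0] + rng.normal(scale=0.5, size=100)
weights = inverse_distance_weights(coords, bandwidth=2.0)
print(morans_i(salinity, weights), gearys_c(salinity, weights))
```

The two indices are complementary: Moran's I is based on cross-products of deviations from the mean and is more sensitive to global patterns, whereas Geary's C is based on squared differences between neighbours and is more sensitive to local differences.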
Results and Discussion: Statistical analysis indicated high variability of soil salinity in the study area (coefficient of variation, CV, greater than 35%). Soil salinity also showed high skewness and kurtosis, indicating a non-normal distribution. The high variability of this soil property reflects the interaction of numerous complex factors, including soil-forming processes and different management strategies. The most important variables selected by the correlation analysis were elevation, Multi-resolution Valley Bottom Flatness (MrVBF), wetness index, drainage basin, greenness index, normalized difference vegetation index (NDVI) and corrected transformed vegetation index (CTVI). In total, seven variables were selected: four topographic variables and three remote sensing variables. Among the topographic variables, MrVBF was the most important (correlation: 0.70). The spatial distribution map of soil salinity shows that salinity is low in the northern, northeastern and northwestern parts towards the center of the study area, and that the highest salinity occurs in the southern and southeastern regions. Moran's I and Geary's C were 0.57 and 0.4, respectively. Based on both indices, soil salinity in the study area exhibits spatial autocorrelation. Accounting for spatial autocorrelation in the spatial regression model improved the results compared with the non-spatial model: R2 increased, while AIC, the spatial autocorrelation of the residuals, and RMSE decreased. The residual maps of the non-spatial OLS model and the spatial regression model differ in the spatial pattern of the residual signs and in the distribution of spatial autocorrelation, which appears as clusters. Clusters (red or blue) indicate the presence of spatial autocorrelation in the residuals.
In the residual map of the non-spatial model, more and larger clusters (marked with green ovals) are identified, indicating spatial autocorrelation in the residuals of that model. Spatial autocorrelation in the residuals shows that the model has not removed the spatial dependence, which may be because an important auxiliary variable was not included in the modeling.
Conclusion: This study investigated the effect of spatial autocorrelation on the results of soil salinity modeling. Soil salinity was predicted with a non-spatial OLS model (spatial autocorrelation ignored) and a spatial regression model (spatial autocorrelation included). The spatial regression model performed better than the non-spatial ordinary least squares model. When spatial autocorrelation was included as a covariate in the spatial model, R2 increased, while AIC, the spatial autocorrelation of the residuals, and RMSE decreased.
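The comparison criteria named in the conclusion can be computed as in the sketch below. It assumes Gaussian errors for the AIC and counts the residual variance as one estimated parameter; the function names are hypothetical and the abstract does not specify which AIC variant was reported.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    resid = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(resid ** 2)))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - RSS / total sum of squares."""
    observed = np.asarray(observed, dtype=float)
    resid = observed - np.asarray(predicted, dtype=float)
    centred = observed - observed.mean()
    return float(1.0 - (resid @ resid) / (centred @ centred))


def gaussian_aic(observed, predicted, n_coefficients):
    """AIC of a linear model under Gaussian errors (lower is better).

    n_coefficients counts the regression coefficients including the
    intercept; one extra parameter is added for the residual variance.
    """
    observed = np.asarray(observed, dtype=float)
    resid = observed - np.asarray(predicted, dtype=float)
    n = len(observed)
    rss = resid @ resid
    log_lik = -0.5 * n * (np.log(2.0 * np.pi) + np.log(rss / n) + 1.0)
    return float(2.0 * (n_coefficients + 1) - 2.0 * log_lik)
```

Unlike R2, which never decreases when predictors such as spatial eigenvectors are added, AIC penalises each additional coefficient, so a lower AIC for the spatial model indicates that the improvement in fit outweighs the added complexity.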

Keywords [English]

  • covariates
  • machine learning
  • spatial and non-spatial modeling
  • spatial eigenvector mapping