Document Type: Applicable

Authors

1 Former PhD Student, Department of Soil Science, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Ahvaz, Iran

2 Professor, Department of Soil Science, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Ahvaz, Iran

3 Professor, Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran

10.22055/agen.2024.47815.1740

Abstract

Introduction: Knowledge of the spatial distribution of soil organic carbon (SOC) is a practical tool for determining sustainable land management strategies. Estimating carbon contents and stocks is important for carbon sequestration, greenhouse gas emissions, and national carbon balance inventories. Accurate mapping of the spatial distribution of SOC is a key prerequisite for soil resource management and land use planning. Over the last two decades, data mining approaches based on machine learning algorithms have been widely used for spatial modeling of SOC. Digital environments require continuous soil maps at local and regional scales; however, such information is not always available at the required scale. The digital soil mapping (DSM) approach is therefore a key solution for quantifying and assessing the variation of soil properties such as SOC, using remotely sensed indices and the digital elevation model (DEM), the most commonly used ancillary data for SOC prediction. In this approach, data mining techniques provide the pathway to creating digital soil maps. This study was therefore carried out to compare two common machine learning algorithms, random forest and multiple linear regression, in the digital mapping of surface SOC in Semirom County, Isfahan Province. Digital maps of SOC were created using both algorithms, and the most important variables affecting the distribution of SOC in the study area were reported.



Materials and Methods: A total of 200 surface soil samples (0-10 cm) were collected from the Semirom area (51° 17' - 52° 3' E; 30° 42' - 31° 51' N), Isfahan, Iran. Based on synoptic meteorological station reports, the annual average temperature ranges from 7.5 to 12.5 °C and the annual precipitation from 350 to 450 mm. The soil moisture and temperature regimes are Xeric and Mesic, respectively. Sampling points in the surface layer were located using the Global Positioning System (GPS). The soil samples were air-dried, crushed, and passed through a 2 mm sieve. The organic carbon content of the samples was then determined using the Walkley-Black method. In addition, to evaluate the effect of other soil properties on the organic carbon content of the soils, saturated soil moisture content, soil texture, pH of the saturated paste, electrical conductivity of the saturation extract, and calcium carbonate equivalent were measured using standard laboratory protocols.

In this research, auxiliary variables including terrain parameters and vegetation indices were derived from the digital elevation model (DEM) and Landsat 8 OLI satellite images using ArcMap version 10.4.10 and SAGA GIS version 6.0.4. All auxiliary layers were then converted to raster format using the “raster” package and merged using the “Covstack” function. The values of all environmental covariates at each sampling point were then extracted into a single file using the “extract” function of the “sp” package in the RStudio environment. Principal component analysis (PCA) in SPSS v.19 was then used to select the most important of the 29 auxiliary variables for the modeling process. The dataset was then split into calibration (80%) and validation (20%) subsets. Finally, the SOC contents of the soils were predicted and mapped using the multiple linear regression (MLR) and random forest (RF) algorithms in the RStudio environment. MLR and RF were run using “lm” and “randomForest”, respectively. Five statistics were used to evaluate the performance of each model: the coefficient of determination (R2), bias, root mean square error (RMSE), normalized RMSE (nRMSE), and mean bias error (MBE).
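
The 80/20 split and the evaluation statistics described above can be illustrated with a minimal sketch. This is not the authors' R workflow: it is a plain Python illustration, and the function names, the random seed, the definition of R2 as 1 - SS_res/SS_tot, and the normalization of nRMSE by the observed mean are all assumptions, since the abstract does not define these statistics (nor how "bias" differs from MBE).

```python
import math
import random


def split_calibration_validation(samples, calibration_fraction=0.8, seed=42):
    """Shuffle the samples and split them into calibration and validation subsets.

    The seed and the rounding rule are illustrative assumptions.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


def evaluate(observed, predicted):
    """Return R2, RMSE, nRMSE and MBE for paired observed and predicted values.

    Assumed definitions (not stated in the abstract):
      R2    = 1 - SS_res / SS_tot
      RMSE  = sqrt(mean((predicted - observed)^2))
      nRMSE = RMSE / mean(observed)
      MBE   = mean(predicted - observed)
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,
        "nrmse": rmse / mean_obs,
        "mbe": sum(residuals) / n,
    }


# With 200 samples, as in the study, the split gives 160 and 40 points.
calibration, validation = split_calibration_validation(range(200))
```

Under these definitions a positive MBE indicates overall overestimation and a negative one underestimation, which is why it is reported alongside RMSE.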



Results and Discussion

Based on the descriptive analysis of the soil samples, the soils of the study area were characterized as non-saline, alkaline, and calcareous. The SOC contents ranged from 0.3% to 2.2%, with a mean value of 0.89%. The coefficient of variation of the SOC contents was 21.7%, which places the soils of the study area in the moderate variability class according to the values proposed by Wilding (1985). The PCA results showed that the most important auxiliary variables that could be used in the modeling process are slope aspect, channel network base level, catchment slope, total curvature, height, longitudinal curvature, mass balance index, modified catchment area, slope degree, slope length, topographic position index, vertical distance to channel network, soil adjusted vegetation index, transformed vegetation index, difference vegetation index, ratio vegetation index, and general curvature. These variables explained 80% of the total variance over the study area. Comparison of the two SOC prediction models demonstrated that the RF model (ntree = 1000 and mtry = 10), with R2, RMSE, nRMSE, and bias values of 0.79, 0.12, 0.13, and 0.002, respectively, outperformed the MLR model in this study. The most important variables detected by the RF algorithm for predicting SOC contents over the study area were transformed vegetation index, ratio vegetation index, soil adjusted vegetation index, and slope degree. The final map of the surface SOC distribution shows that, although the RF algorithm provided better estimates than the MLR model, it overestimated the minimum and underestimated the maximum values of surface SOC content.
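
The variability classification mentioned above can be sketched as follows. The thresholds of 15% and 35% are the class limits commonly attributed to Wilding (1985) and are an assumption here, because the abstract does not state them; the function names are hypothetical and the code is a Python illustration rather than the authors' procedure.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in percent: sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def variability_class(cv_percent, low=15.0, high=35.0):
    """Classify variability from the CV.

    The 15% and 35% limits are assumed class boundaries, not taken from the abstract.
    """
    if cv_percent < low:
        return "low"
    if cv_percent <= high:
        return "moderate"
    return "high"


# The CV reported for SOC in the study area.
soc_class = variability_class(21.7)  # "moderate"

# Consistency check on the reported RF statistics, assuming nRMSE is RMSE
# divided by the mean SOC content (the abstract does not say which
# normalization, or which subset mean, was used).
nrmse_from_reported = round(0.12 / 0.89, 2)  # 0.13
```

With these assumed limits, the reported CV of 21.7% falls in the moderate class, in line with the text, and the reported RMSE and mean SOC are consistent with the reported nRMSE of 0.13 if the normalization is by the mean.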



Conclusion

The results of this study showed that the RF regression algorithm performed better than the MLR method, owing to its ability to account for the nonlinear and complex relationships between SOC contents and the environmental covariates.
