Article type: Applied
Authors
1 PhD Graduate, Department of Soil Science and Engineering, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Ahvaz, Iran
2 Professor, Department of Soil Science and Engineering, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Ahvaz, Iran
3 Professor, Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran
Abstract
Knowledge of the spatial distribution of soil organic carbon is an effective step toward sustainable land use and toward defining the associated management strategies. This study therefore aimed to model and digitally map surface (0-10 cm) soil organic carbon in Semirom County using random forest regression and multiple linear regression. To this end, 200 surface soil samples were collected on a regular grid with a sampling spacing of 5 km × 5 km across the area, and the organic carbon content of the samples was then measured using the Walkley-Black method. Finally, digital maps of surface soil organic carbon were produced with these methods in RStudio, using auxiliary variables derived from a digital elevation model and Landsat 8 satellite images. The findings indicate that the random forest algorithm, with RMSE and R2 values of 0.12 and 0.79, respectively, predicted soil organic carbon better than multiple linear regression, which had RMSE and R2 values of 0.192 and 0.57. The results showed that the most important environmental variables affecting the distribution of soil organic carbon in the study area were not the same in the two models: vegetation-derived indices played the larger role in the random forest model, whereas topographic indices played the larger role in multiple linear regression. Examination of the final map of soil organic carbon distribution in the study area showed that, although the random forest estimates were better than those of multiple linear regression, the method was not successful in estimating the minimum and maximum values of surface soil organic carbon.
Article title [English]
Modelling and Digital Mapping of Surface Soil Organic Carbon in Semirom County Employing Several Machine Learning Algorithms
Authors [English]
- Fatemeh Rahmati 1
- Saeid Hojati 2
- Kazem Rangzan 3
- Ahmad Landi 2
1 Former PhD Student, Department of Soil Science, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Ahvaz, Iran
2 Professor, Department of Soil Science, Faculty of Agriculture, Shahid Chamran University of Ahvaz, Ahvaz, Iran
3 Professor, Department of Remote Sensing and GIS, Faculty of Earth Sciences, Shahid Chamran University of Ahvaz, Ahvaz, Iran
Abstract [English]
Introduction: Knowledge of the spatial distribution of soil organic carbon (SOC) is a practical tool for determining sustainable land management strategies. Estimates of carbon contents and stocks are important for carbon sequestration, greenhouse gas emissions, and national carbon balance inventories. Accurate mapping of the spatial distribution of SOC is a key prerequisite for soil resource management and land use planning. Over the last two decades, data mining approaches that use machine learning algorithms for the spatial modeling of SOC have received wide attention. Digital applications require continuous soil maps at local and regional scales; however, such information is not always available at the required scale. Digital soil mapping (DSM) is therefore a key approach for quantifying and assessing the variation of soil properties such as SOC, using remotely sensed indices and a digital elevation model (DEM), the most commonly used ancillary data for SOC prediction. In this approach, data mining techniques are the means of creating digital soil maps. This study was therefore carried out to compare two common machine learning algorithms, random forest and multiple linear regression, for the digital mapping of surface SOC in Semirom County, Isfahan province. Digital maps of SOC were created with both algorithms, and the most important variables affecting the distribution of SOC in the study area are reported.
Materials and Methods: A total of 200 surface soil samples (0-10 cm) were collected from the Semirom area (51° 17' - 52° 3' E; 30° 42' - 31° 51' N), Isfahan, Iran. Based on synoptic meteorological station reports, the mean annual temperature is in the range of 7.5-12.5 °C and the annual precipitation ranges between 350 and 450 mm. The soil moisture and temperature regimes are Xeric and Mesic, respectively. The surface layer (0-10 cm) was sampled using the Global Positioning System (GPS). The soil samples were air-dried, crushed, and passed through a 2 mm sieve. The organic carbon content of the samples was then determined by the Walkley-Black method. To evaluate the effect of other soil properties on the organic carbon contents of the soils, saturated soil moisture content, soil texture, pH of the saturated paste, electrical conductivity of the saturation extract, and calcium carbonate equivalent were also measured using standard laboratory protocols.
In this research, auxiliary variables including terrain parameters and vegetation indices were derived from a digital elevation model (DEM) and Landsat 8 OLI satellite images using ArcMap version 10.4.10 and SAGA GIS version 6.0.4. All auxiliary layers were then converted to raster format using the “raster” package and merged using the “Covstack” function. Afterwards, the values of all environmental covariates at each sampling point were extracted into a single file using the “extract” function of the “sp” package in the RStudio environment. Principal component analysis (PCA) in SPSS v.19 was then used to select, from the 29 auxiliary variables considered in this research, the most important covariates for the modeling process. The dataset was then split into calibration (80%) and validation (20%) subsets. Finally, the SOC contents of the soils were predicted and mapped using multiple linear regression (MLR) and random forest (RF) algorithms in the RStudio environment. The MLR and RF algorithms were run using the “lm” function and the “randomForest” package, respectively. Five statistics were used to evaluate the performance of each model: the coefficient of determination (R2), bias, root mean square error (RMSE), normalized RMSE (nRMSE), and mean bias error (MBE).
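The calibration/validation split and the evaluation statistics named above can be illustrated with a minimal Python sketch that uses only the standard library. The study itself was carried out in R, so this is not the authors' code. The function names, the random seed, the normalisation of nRMSE by the observed mean, and the form of R2 as 1 - SS_res / SS_tot are all assumptions, since the abstract does not give the formulas.

```python
import math
import random


def split_calibration_validation(samples, calibration_fraction=0.8, seed=0):
    """Shuffle the samples and split them into calibration and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumed) for reproducibility
    cut = round(calibration_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_bias_error(observed, predicted):
    """Mean of (predicted - observed); positive values indicate overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def nrmse(observed, predicted):
    """RMSE divided by the observed mean (assumed normalisation)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination computed as 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

With 200 sample identifiers, `split_calibration_validation(range(200))` returns subsets of 160 and 40 items, matching the 80%/20% proportions described above. The metric functions would then be applied to the observed and predicted SOC values of the validation subset.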
Results and Discussion
Based on the descriptive analysis of the soil samples, the soils of the study area were characterized as non-saline, alkaline, and calcareous. The SOC contents of the soils ranged from 0.3% to 2.2%, with a mean value of 0.89%. The coefficient of variation of the SOC contents was 21.7%, which places the soils of the study area in the moderate variability class according to the values proposed by Wilding (1985). The PCA results showed that the most important auxiliary variables for the modeling process were slope aspect, channel network base level, catchment slope, total curvature, height, longitudinal curvature, mass balance index, modified catchment area, slope degree, slope length, topographic position index, vertical distance to channel network, soil adjusted vegetation index, transformed vegetation index, difference vegetation index, ratio vegetation index, and general curvature. These variables explained 80% of the total variance over the study area. The comparison of the two SOC prediction models demonstrated that the RF model (ntree = 1000 and mtry = 10), with R2, RMSE, nRMSE, and bias values of 0.79, 0.12, 0.13, and 0.002, respectively, performed better than the MLR model in this study. Among the five most important variables detected by the RF algorithm for predicting SOC contents over the study area were the transformed vegetation index, ratio vegetation index, soil adjusted vegetation index, and slope degree. The final map of the surface SOC distribution over the study area shows that, although the RF algorithm provided better estimates than the MLR model, it overestimated the minimum and underestimated the maximum values of the surface SOC contents.
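The four vegetation indices retained by the PCA are named but not defined in this abstract. The Python sketch below uses formulations that are common in the remote sensing literature. Which variants the authors actually used is not stated, so the formulas, the function names, and the SAVI soil factor of 0.5 are assumptions. Inputs are red and near-infrared reflectance, which for Landsat 8 OLI correspond to bands 4 and 5.

```python
import math


def ratio_vegetation_index(nir, red):
    """RVI: near-infrared reflectance divided by red reflectance."""
    return nir / red


def difference_vegetation_index(nir, red):
    """DVI: near-infrared reflectance minus red reflectance."""
    return nir - red


def soil_adjusted_vegetation_index(nir, red, soil_factor=0.5):
    """SAVI with soil brightness factor L (0.5 is a commonly used default, assumed here)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def transformed_vegetation_index(nir, red):
    """TVI as sqrt(NDVI + 0.5), one common form; only defined for NDVI >= -0.5."""
    ndvi = (nir - red) / (nir + red)
    return math.sqrt(ndvi + 0.5)
```

In a mapping workflow these functions would be applied cell by cell to the reflectance rasters, so that each index becomes one covariate layer alongside the terrain parameters.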
Conclusion
The results of this study showed that the RF regression algorithm performed better than the MLR method, owing to its ability to account for the nonlinear and complex relationships between SOC contents and the environmental covariates.
Keywords [English]
- Environmental covariates
- machine learning
- performance
- spatial distribution