Article Type: Applied Research

Authors

1 Ph.D. Student, Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran

2 Professor, Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran

3 Professor, Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran

4 Associate Professor, Department of Soil Science, College of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract

The number of environmental variables used in digital soil mapping has increased rapidly, which makes it challenging to select and focus on the most important covariates. On the other hand, identifying all environmental variables is useful for obtaining spatial information that can improve predictions. Feature selection algorithms help here by identifying the relevant covariates and thereby reducing the dimensionality of the predictive model. In the present study, four feature selection techniques, namely Variance Inflation Factor (VIF), Principal Component Analysis (PCA), Boruta, and Recursive Feature Elimination (RFE), were applied to produce an optimal set of covariates for the spatial prediction of soil classes at the great group level with a random forest model. The techniques were compared in estimating soil classes using overall accuracy and the Kappa coefficient between observed and predicted values. The results showed that using the variables selected by the feature selection methods, instead of all variables, increased prediction accuracy somewhat. The improvement in predictive performance also differed among the four approaches. The VIF and PCA methods gave the highest and lowest accuracy and Kappa coefficient, respectively, while the Boruta method, with the fewest variables, gave the largest improvement after VIF. Overall, the findings showed that feature selection methods can exploit the substantial dependence of the relevant covariates to predict soil classes and improve modeling accuracy.
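VIF screening is named as one of the four techniques, but the abstract does not state the cut-off that was used. The following is a minimal NumPy sketch of stepwise VIF screening, assuming the common rule of thumb of repeatedly dropping the covariate with the largest VIF until every VIF is at or below 10. Here VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing covariate j on all the others. The threshold, the function names and the synthetic covariates (`elevation`, `slope`, `elevation_copy`) are hypothetical and are not taken from the study.

```python
import numpy as np

def vif_scores(X):
    """VIF of every column of X (rows = sites, columns = covariates).

    VIF_j = 1 / (1 - R^2_j), with R^2_j from an ordinary least-squares fit of
    column j on all other columns plus an intercept. Columns must not be constant.
    """
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + others
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores

def vif_select(X, names, threshold=10.0):
    """Drop the covariate with the largest VIF until all VIFs <= threshold."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = vif_scores(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        del keep[worst]  # worst indexes into the current keep list
    return [names[k] for k in keep]

# Hypothetical covariates: one near-duplicate of elevation creates collinearity.
rng = np.random.default_rng(0)
elev = rng.normal(size=200)
slope = rng.normal(size=200)
elev_copy = elev + rng.normal(scale=0.01, size=200)
X = np.column_stack([elev, slope, elev_copy])
print(vif_select(X, ["elevation", "slope", "elevation_copy"]))
# keeps "slope" and one of the two elevation columns
```

Dropping one covariate at a time and then recomputing matters. The VIF of each remaining covariate changes once its collinear partner is gone, so removing every high-VIF column in a single pass could discard both members of a correlated pair.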


Article Title [English]

Evaluation of different feature selection algorithms for improving the spatial prediction of soil classes

Authors [English]

  • Vahideh Sadeghizadeh 1
  • Seyed Ali Abtahi 2
  • Majid Baghernejad 3
  • Azam Jafari 4
  • Seyed Ali Akbar Moosavi 3

1 Ph.D. Student, Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran

2 Professor, Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran

3 Professor, Department of Soil Science, College of Agriculture, Shiraz University, Shiraz, Iran

4 Associate Professor, Department of Soil Science, College of Agriculture, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract [English]

Introduction The number of environmental variables used in digital soil mapping has increased rapidly, which makes it challenging to select and focus on the most important covariates. Not all environmental covariates have the same predictive power, and some may introduce noise that reduces the predictive performance of the models. On the other hand, identifying all environmental variables is beneficial because they carry spatial information that can improve predictions. Feature selection algorithms help reduce the dimensionality of the predictive model by identifying the relevant covariates. This study therefore aims to investigate different feature selection algorithms for selecting auxiliary variables and to evaluate their effect on the predictive model.
Materials and Methods The study area is part of Darab County in the southeast of Fars Province and covers about 31,000 hectares. In the study area, 140 soil profiles were sited and excavated according to the diversity of geomorphological units, and hence of soil types. After the profiles were excavated and the morphological characteristics of each were described, sufficient soil samples were collected from the genetic horizons and transported to the laboratory for further analysis. After air-drying and passing through a 2 mm sieve, selected physical and chemical properties of the soils were measured using standard methods. Finally, all profiles were classified to the great group level according to U.S. Soil Taxonomy, based on field observations and laboratory results. The environmental variables comprised parameters derived from the Digital Elevation Model (DEM), Landsat 8 images, and geology and geomorphology maps of the study area. All parameters were derived using ArcGIS, SAGA GIS and ENVI software. Four feature selection techniques, namely Variance Inflation Factor (VIF), Principal Component Analysis (PCA), Boruta and Recursive Feature Elimination (RFE), were used to identify an optimal set of covariates for predicting the spatial distribution of soil classes at the great group level. A Random Forest (RF) model, evaluated with 10-fold cross-validation repeated five times, was used to compare the feature selection strategies in soil class mapping. The techniques were compared on the basis of accuracy and the Kappa coefficient between observed and predicted values.
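The validation scheme of 10-fold cross-validation repeated five times can be made concrete with a short pure-Python sketch of how the train/test splits are generated. Under this scheme each profile is used for testing exactly once per repeat, which gives 50 model fits per covariate set. The abstract does not say whether the folds were stratified by soil class or which seed was used, so this sketch uses plain shuffled folds and an arbitrary seed. The function name is hypothetical, and the random forest fit that would run on each split is left out.

```python
import random

def repeated_kfold(n_samples, n_splits=10, n_repeats=5, seed=42):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices and cuts them into n_splits
    folds of near-equal size, so every sample is tested once per repeat.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for r in range(n_repeats):
        rng.shuffle(indices)
        for k in range(n_splits):
            test = sorted(indices[k::n_splits])  # every n_splits-th shuffled index
            test_set = set(test)
            train = sorted(i for i in indices if i not in test_set)
            yield r, k, train, test

splits = list(repeated_kfold(140))           # 140 profiles, as in the study
print(len(splits))                           # 50 train/test pairs (10 folds x 5 repeats)
print(len(splits[0][2]), len(splits[0][3]))  # 126 training and 14 test profiles
```

In a comparison like the one described, accuracy and Kappa would be computed on each of the 50 test folds and then averaged, separately for each feature selection method and for the full covariate set.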
Results and Discussion The results showed that prediction accuracy increased when the variables selected by the feature selection methods were used instead of all variables. The improvement in predictive performance also differed among the four feature selection methods. The VIF and PCA methods had the highest and lowest accuracy and Kappa coefficient, respectively. The Boruta method, with the fewest variables, gave the largest improvement in model performance after VIF. However, the Kappa coefficient indicated poor agreement between predicted and observed values for all approaches. The imbalance of soil classes could be one reason for the low accuracy and Kappa values. Nevertheless, the random forest model, with and without feature selection, identified all soil great groups in the study area. It can therefore be concluded that the Random Forest algorithm is a powerful technique for spatial prediction of soil classes in the study area. Although model performance varied with the feature selection algorithm, the predicted soil maps showed similar spatial patterns. On the map predicted with the VIF-selected variables, Ustorthents are located mainly in high-altitude areas with steep slopes. The Haplustepts, Calciustepts and Calciusterts great groups developed on low to moderate slopes. Haplosalids developed downstream of the salt dome. The Ustifluvents great group was found on fluvial sedimentary plains. Endoaquepts occurred in the floodplains and covered the smallest area on the predicted map.
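Accuracy can look acceptable while Kappa signals poor agreement, and class imbalance is one reason. Kappa subtracts the agreement expected by chance from the class proportions, so a model that mostly predicts the dominant class gains accuracy but no Kappa. The following pure-Python sketch computes overall accuracy and Cohen's Kappa. The function name is hypothetical, and the example labels are invented to illustrate imbalance. They are not the study's data.

```python
from collections import Counter

def accuracy_and_kappa(observed, predicted):
    """Return (overall accuracy, Cohen's kappa) for paired class labels."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(observed)
    # p_o: observed agreement, i.e. the fraction of sites predicted correctly
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    # p_e: agreement expected by chance from the two sets of class proportions
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1:
        return p_o, float("nan")  # kappa undefined when both lists hold one class
    return p_o, (p_o - p_e) / (1 - p_e)

# Hypothetical imbalanced sample: 18 profiles of one great group, 2 of another,
# and a model that predicts the majority class everywhere.
observed = ["Calciustepts"] * 18 + ["Endoaquepts"] * 2
predicted = ["Calciustepts"] * 20
print(accuracy_and_kappa(observed, predicted))  # → (0.9, 0.0)
```

A Kappa of 0 means the model agrees with the observations no more often than chance would predict given the class frequencies, even though 90% of sites are classified correctly.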
Conclusion Overall, the findings indicate that feature selection methods can exploit significant dependencies among the relevant covariates to predict soil classes and improve modeling accuracy. In the current study, environmental variables derived from the Digital Elevation Model were selected as key variables, which shows the importance of topography and morphology in the classification of soil types in the area. Although the selected variables improved model performance, the low Kappa values indicate that the agreement between predicted and observed soil classes was not much better than chance. This could be attributed to the imbalance of soil classes.

Keywords [English]

  • Digital Soil Mapping
  • Feature Selection
  • Covariates
  • Random Forest