Soil Genesis and Classification
Vahideh Sadeghizadeh; Seyed Ali Abtahi; Majid Baghernejad; Azam Jafari; Seyed Ali Akbar Moosavi
Abstract
Introduction
The number of environmental variables used in digital soil mapping has increased rapidly, making it difficult to select and focus on the most important covariates. Environmental covariates differ in their predictive value, and some introduce noise that reduces the predictive power of the models used. On the other hand, it is beneficial to consider all available environmental variables to obtain spatial information that can improve predictions. Feature selection algorithms help reduce the dimensionality of the predictive model by identifying the relevant covariates. This study therefore investigates different feature selection algorithms for selecting auxiliary variables and evaluates their effect on the predictive model.
Materials and Methods
The study area is a part of Darab city in the southeast of Fars province, covering about 31,000 hectares. In the study area, 140 profiles were located and excavated according to the diversity of geomorphological units and, consequently, of soil types. After the profiles were excavated and their morphological characteristics described, soil samples were collected from the genetic horizons and transported to the laboratory for further analysis. Selected physical and chemical properties were determined by standard methods after the samples had been air-dried and passed through a 2 mm sieve. Finally, all profiles were classified to the great group level using the U.S. Soil Taxonomy, based on field observations and laboratory results. The environmental variables comprised parameters derived from the Digital Elevation Model, Landsat 8 images, and the geology and geomorphology maps of the study area. All parameters were derived using ArcGIS, SAGA GIS and ENVI software.
In the present study, four feature selection techniques, namely Variance Inflation Factor (VIF), Principal Component Analysis (PCA), Boruta and Recursive Feature Elimination (RFE), were used to identify an optimal set of covariates for predicting the spatial distribution of soil classes at the great group level. A Random Forest (RF) model with 10-fold cross-validation repeated five times was used to compare the feature selection strategies in soil class mapping. The comparison was based on the accuracy and the Kappa coefficient between observed and predicted classes.
Results and Discussion
The results showed that prediction accuracy increased when the model used variables selected by any of the feature selection methods rather than all variables. The improvement in predictive performance differed among the four methods. The VIF method had the highest accuracy and Kappa coefficient, and the PCA method the lowest. The Boruta method, which selected the fewest variables, gave the second-best model performance after VIF. However, the Kappa coefficient showed poor agreement between predicted and observed values for all approaches. The imbalance of soil classes may explain the low accuracy and Kappa values. Nevertheless, the Random Forest model, with and without feature selection, identified all soil great groups in the study area. It can therefore be concluded that the Random Forest algorithm is a powerful technique for spatial prediction of soil classes in the study area. Although model performance varied with the feature selection algorithm, the predicted soil maps had similar spatial patterns.
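The VIF screening named in the methods can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the VIF cutoff or the software used, so the threshold of 10, the function names (`vif_scores`, `vif_filter`) and the example covariates are all assumptions. The sketch uses the identity that the VIF of each covariate equals the corresponding diagonal element of the inverse correlation matrix, and it repeatedly drops the covariate with the highest VIF.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant numeric columns."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [x - mean for x in col]
        norm = math.sqrt(sum(d * d for d in dev))  # zero for a constant column
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariates are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_scores(data):
    """Map covariate name -> VIF (diagonal of the inverse correlation matrix)."""
    names = list(data)
    inv = invert(correlation_matrix([data[k] for k in names]))
    return {name: inv[i][i] for i, name in enumerate(names)}

def vif_filter(data, threshold=10.0):
    """Drop the highest-VIF covariate until every VIF is <= threshold.

    The threshold of 10 is a common rule of thumb, assumed here.
    """
    kept = dict(data)
    while len(kept) > 1:
        scores = vif_scores(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)

# Hypothetical covariates: two nearly identical elevation layers and one index.
covariates = {
    "elevation": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "elevation_smoothed": [1.01, 2.02, 2.99, 3.98, 5.01, 5.99],
    "ndvi": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
selected = vif_filter(covariates)
```

In this toy example the two elevation layers are almost collinear, so one of them is removed and `ndvi` is retained. Unlike Boruta or RFE, this screening ignores the response variable and only removes redundancy among covariates.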
Based on the model prediction with the VIF-selected variables, the resulting map indicates that Ustorthents are mainly located in high-altitude regions with steep slopes. The Haplustepts, Calciustepts and Calciusterts great groups have developed on low to medium slopes. Haplosalids have developed downstream of the salt dome. Ustifluvents were found on fluvial sedimentary plains. Endoaquepts were found in the floodplains and had the smallest area on the predicted map.
Conclusion
Overall, the findings indicate that feature selection methods can exploit dependencies among relevant covariates to predict soil classes and improve modeling accuracy. In the current study, environmental factors derived from the Digital Elevation Model were selected as key variables, showing the importance of topography and morphology for the classification of soil types in the area. Although the selected variables improved model performance, the prediction of soil classes remained close to random, as reflected in the poor Kappa agreement. This could be attributed to the imbalance of soil classes.
Soil Genesis and Classification
Farideh Abbaszadeh Afshar
Abstract
Introduction
Mapping the spatial distribution of soil taxonomic classes is important for informing soil use and management decisions. Digital soil mapping (DSM) can quantitatively predict the spatial distribution of soil taxonomic classes. DSM is the computer-assisted production of digital maps of soil types and soil properties. It typically relies on mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables and remote sensing images. Machine learning is a general term for a broad set of models used to discover patterns in data and to make predictions. Although machine learning is most often applied to large databases, it is an attractive tool for making spatial predictions of soil classes because the relationships between soil classes and environmental covariates are often poorly understood. Our objective was to compare multiple machine learning models (multinomial logistic regression, boosted regression trees and decision tree) for predicting soil great groups in Bam district, Kerman province.
Materials and Methods
The study area, Bam district, is located between 58°4′17″ and 58°28′8″ E longitude and 28°52′51″ and 29°9′29″ N latitude (Fig. 1) in Kerman province, southeastern Iran. The area is surrounded by mountains (dominantly limestone and volcanic) running from northwest to southeast, and its major landforms include young alluvial fans, pediments, clay flats and hills. The mean annual precipitation, temperature and potential evapotranspiration are 64 mm, 23.8 °C and 3000 mm, respectively, with an aridic soil moisture regime and a hyperthermic soil temperature regime. A stratified sampling scheme was defined over 100,000 hectares, and 126 soil profiles were excavated and described according to the Keys to Soil Taxonomy.
The models used in this study were multinomial logistic regression (MLR), boosted regression trees (BRT) and decision tree (DT). We used an 80/20 training/testing split (80% of the pedon observations were used for model training and 20% for model testing). The Kappa index (KI), overall accuracy (OC), Brier score (BS), user's accuracy (UA) and producer's accuracy (PA) were used to compare model accuracy.
Results and Discussion
The profile descriptions revealed two soil orders, Entisols and Aridisols, subdivided into six suborders and eight great groups: Haplosalids, Haplocambids, Haplocalcids, Haplogypsids, Calcigypsids, Calciargids, Petrocalcids and Torriorthents. The presence of eight great groups testifies to the wide pedodiversity of the study area. The geomorphology map contributed importantly to prediction accuracy. This can be explained by the fact that the geomorphological surfaces formed recently, or during a geological period in which soil formation took place under conditions close to those of current processes in arid regions. After the geomorphic surfaces, terrain attributes and then remote sensing indices were the most important predictors. The best prediction result was obtained when characteristics derived from terrain, remote sensing and geomorphological processes were used together and when the study area was stratified by differentiating geomorphological processes and identifying its overall heterogeneity. In areas where the distribution of predictors was more homogeneous, the models could better relate the predictors to the response. The spatial distribution of soils in the study area followed the distribution pattern of most geomorphological and terrain attributes. The model comparison indicated that the decision tree was consistently the most accurate.
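The evaluation criteria listed in the methods (overall accuracy, Kappa index, user's and producer's accuracy, Brier score) can be computed from paired observed and predicted classes, as in the sketch below. It is an illustration under stated assumptions rather than the authors' code: the function names and the six example sites are hypothetical, and the Brier score follows the multiclass form that sums squared differences over classes, which may differ from the variant used in the study.

```python
from collections import Counter

def overall_accuracy(observed, predicted):
    """Share of sites whose predicted class equals the observed class."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)

def kappa_index(observed, predicted):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(observed)
    p_o = overall_accuracy(observed, predicted)
    obs, pred = Counter(observed), Counter(predicted)
    p_e = sum(obs[c] * pred[c] for c in obs) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)  # undefined when p_e == 1

def producer_accuracy(observed, predicted, cls):
    """Correctly predicted sites of cls divided by all observed sites of cls."""
    total = sum(o == cls for o in observed)
    hits = sum(o == cls and p == cls for o, p in zip(observed, predicted))
    return hits / total if total else float("nan")

def user_accuracy(observed, predicted, cls):
    """Correctly predicted sites of cls divided by all sites predicted as cls."""
    total = sum(p == cls for p in predicted)
    hits = sum(o == cls and p == cls for o, p in zip(observed, predicted))
    return hits / total if total else float("nan")

def brier_score(observed, probabilities):
    """Mean over sites of the summed squared error of the class probabilities.

    Each entry of `probabilities` maps every class name to its predicted
    probability for that site (assumed to include the observed class).
    """
    total = 0.0
    for obs, probs in zip(observed, probabilities):
        total += sum((p - (1.0 if c == obs else 0.0)) ** 2 for c, p in probs.items())
    return total / len(observed)

# Hypothetical test-set labels for six sites (not data from the study).
observed = ["Haplosalids", "Haplosalids", "Haplocalcids",
            "Haplocalcids", "Torriorthents", "Torriorthents"]
predicted = ["Haplosalids", "Haplosalids", "Haplocalcids",
             "Torriorthents", "Torriorthents", "Torriorthents"]

print(overall_accuracy(observed, predicted))                    # 5 of 6 correct
print(kappa_index(observed, predicted))                         # 0.75
print(producer_accuracy(observed, predicted, "Haplocalcids"))   # 0.5
print(user_accuracy(observed, predicted, "Torriorthents"))      # 2 of 3 correct
```

Producer's accuracy describes how often sites of an observed class are mapped correctly, while user's accuracy describes how reliable a mapped class is. The two can diverge strongly for rare great groups.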
Prediction accuracy by great group was highest for Haplosalids, Calcigypsids and Petrocalcids. The lowest predictive quality was observed for Haplocalcids in all three approaches. As a reliable and flexible approach, the decision tree could be used successfully to prepare continuous digital soil maps.
Conclusion
The application of decision trees for predicting soil types could be a promising alternative. In digital soil mapping, the best prediction result was obtained when parameters derived from terrain, remote sensing and geomorphological processes were used together and when the study area was stratified by differentiating geomorphological processes and identifying its overall heterogeneity. In areas where the distribution of predictors was more homogeneous, the models could better relate the predictors to the response. Altogether, an extended digital terrain analysis approach combined with a clear description of geomorphological, geological and pedological processes could be a key technology in future soil mapping.