Agricultural Engineering

Application of multinomial logistic regression model in digital survey of soil classes in Kouhbanan region of Kerman

Maryam Izadi Bidani; A Jafari; Mohammad Hadi Farpoor; Mojtaba Zeraatpisheh

Volume 43, Issue 3, December 2020, Pages 293-313

https://doi.org/10.22055/agen.2020.32275.1540

Abstract

Introduction: Digital soil mapping uses a set of mathematical computations to predict the distribution of soil classes across the landscape. As a tool for creating spatial soil data, it helps address the growing need for high-resolution soil maps. The use of digital soil mapping has expanded considerably, and researchers have developed new mapping methods to overcome the limitations of traditional approaches. The approach relies on statistical relationships between measured soil observations and environmental covariates at the sampling locations. The volume of digital soil data is growing as new processing tools and digital data sources become available. The present study aimed to map soils digitally in the Kouhbanan region of Kerman using a multinomial logistic regression model.

Materials and methods: The study area covers 2000 ha in the Kouhbanan district, northwest of Kerman city, in southeastern Iran. A Latin hypercube sampling design was applied, with sampling guided by differences in landforms (geomorphology map), topography (including the digital elevation map) and geology, and the geographic locations of 70 profiles were identified. Soil profiles were described according to U.S. Soil Taxonomy (Soil Survey Staff, 2014), and soil samples were taken from their diagnostic horizons. The samples were transferred to the laboratory, where physical and chemical analyses were performed using routine standard methods.
The environmental data included parameters derived from the digital elevation model, Landsat satellite images (remote sensing indices), the geology map, geomorphic units (geomorphology map) and the legacy soil map of the study area. All environmental variables were derived using ENVI and SAGA software. A multinomial logistic regression model was used to predict soil classes, and the modeling was done in R using the nnet package. The model was validated by leave-one-out cross-validation, and the predictive accuracy for soil classes was estimated using overall accuracy and the Kappa coefficient.

Results and discussion: The soils of the study area were mainly classified in the Aridisols and Entisols orders. The modeling results identified terrain attributes as effective auxiliary variables for predicting soil classes, which confirms the importance of topography for soil genesis in the study area. The geomorphology map came next in importance and helped to increase predictive accuracy. Among the soil classes, Haplocambids were predicted with low accuracy, while Haplosalids were predicted with high accuracy. The low accuracy for the Haplocambids great group is probably due to the small number of samples of this class in the study area. Identifying the relationships between the predictor variables and the target variable depends primarily on the size of the sample and its distribution across the layers. There were only two observations of Haplocambids in the area. Low accuracy is therefore expected: the model could not establish a relationship between this class and the environmental variables, which makes it difficult to identify threshold values for separating soil classes and results in a poorly trained model.
It is also possible that the low prediction accuracy results from an incomplete conceptual model, since no characteristic feature is available to help model training and, ultimately, prediction. Among the soil great groups, the best predictions were obtained for Haplosalids, which showed high values of user accuracy and reliability. Accurate prediction of Haplosalids is highly correlated with the spatial distribution of indices such as the wetness index and NDVI. The Kappa index and map purity of the digital soil map derived from multinomial logistic regression were 0.45 and 0.65, respectively. In the predicted map, six great groups were identified: Haplosalids, Haplocambids, Haplocalcids, Haplogypsids, Calcigypsids and Torrifluvents. Haplocalcids, Haplosalids and Calcigypsids cover most of the area, while Haplocambids and Haplogypsids occupy the smallest part. Haplosalids are located in the north of the region, in the piedmont plain landform. Haplocalcids were most commonly found in the alluvial fan landform, while Calcigypsids are located in pediments, alluvial fans and piedmont plains. Haplocambids and Haplogypsids are located mainly on the geomorphic surfaces of the alluvial fan and the piedmont plain, respectively. The parts of the region with the greatest diversity of soil classes are exactly where the geomorphological map is most finely segmented. Consequently, the presence of different soil classes in the least differentiated and most similar regions leads to an inefficient conceptual model and poor prediction results.

Conclusions: The results showed that topographic parameters were the most important and powerful variables in modeling, which confirms that topography, or relief, is the most important soil-forming factor in the study area.
The prediction results for soil classes in the Kouhbanan area of Kerman province showed that the geomorphological map is very useful and necessary in the study area, and that it helps in understanding and communicating the relationship between soil and landscape. Using this map as a qualitative auxiliary variable can explain much of the variability of soils in the study area. Careful field observation, examination of satellite imagery, and study and interpretation of the soil profile data indicate that the study area has been shaped by geological, geomorphological and hydrological processes that led to the formation of various landforms, including rock outcrops, hills, pediments, alluvial fans and plains. For the multinomial logistic regression model in the study area, terrain attributes had more influence on the prediction of soil classes and soil properties than the remote sensing indices. A strong relationship between soil data and environmental parameters is one of the factors that influence model accuracy. Logistic regression models have great potential for predicting soil classes, provided that the study area is well understood and the auxiliary variables are properly selected.
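The validation procedure named in this abstract (leave-one-out cross-validation scored with overall accuracy and the Kappa coefficient) can be sketched as follows. This is a minimal illustration in Python rather than the R/nnet workflow the authors report. The nearest-neighbour classifier is only a stand-in for the multinomial logistic regression model, and the single covariate value per site is made-up toy data, not data from the study.

```python
from collections import Counter


def overall_accuracy(observed, predicted):
    """Share of sites whose predicted class equals the observed class."""
    hits = sum(o == p for o, p in zip(observed, predicted))
    return hits / len(observed)


def cohen_kappa(observed, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(observed)
    p_o = overall_accuracy(observed, predicted)
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # degenerate case: only one class present in both lists
    return (p_o - p_e) / (1.0 - p_e)


def leave_one_out(features, labels, fit_predict):
    """Predict each site from a model fitted on all the other sites."""
    predictions = []
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(fit_predict(train_x, train_y, features[i]))
    return predictions


def nearest_neighbour(train_x, train_y, x):
    """Stand-in classifier (assumption): NOT the model used in the study."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_x]
    return train_y[dists.index(min(dists))]


# Hypothetical toy data: one covariate per site, two great groups.
features = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2], [0.55]]
labels = ["Haplosalids"] * 3 + ["Haplocalcids"] * 4

predicted = leave_one_out(features, labels, nearest_neighbour)
print(round(overall_accuracy(labels, predicted), 2))  # 0.86
print(round(cohen_kappa(labels, predicted), 2))       # 0.72
```

In this toy case six of seven sites are predicted correctly, but Kappa is lower than overall accuracy because part of that agreement would be expected by chance. The same relation holds for the values reported above (Kappa 0.45 against map purity 0.65).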

Soil Genesis and Classification

Spatial prediction of soil great groups by regression models and decision tree in region, southeastern Iran

Farideh Abbaszadeh Afshar

Volume 41, Issue 2, September 2018, Pages 133-146

https://doi.org/10.22055/agen.2018.21050.1336

Abstract

Introduction: Mapping the spatial distribution of soil taxonomic classes is important for informing soil use and management decisions. Digital soil mapping (DSM) can quantitatively predict the spatial distribution of soil taxonomic classes. DSM is the computer-assisted production of digital maps of soil types and soil properties. It typically involves mathematical and statistical models that combine information from soil observations with information contained in correlated variables and remote sensing images. Machine learning is a general term for a broad set of models used to discover patterns in data and to make predictions. Although machine learning is most often applied to large databases, it is an attractive tool for learning about and making spatial predictions of soil classes because the relationships between soil classes and environmental covariates are often poorly understood. Our objective was to compare multiple machine learning models (multinomial logistic regression, boosted regression trees and decision tree) for predicting soil great groups in the Bam district of Kerman province.

Materials and Methods: The study area, Bam district, is located between 58°4΄17˝ and 58°28΄8˝ E longitude and 28°52΄51˝ and 29°9΄29˝ N latitude (Fig. 1) in Kerman province, southeastern Iran. The area is surrounded by mountains (dominantly limestone and volcanic) from northwest to southeast, and its major landforms include young alluvial fans and pediments, clay flats and hills.
The mean annual precipitation, temperature and potential evapotranspiration are 64 mm, 23.8 °C and 3000 mm, respectively, with aridic soil moisture and hyperthermic soil temperature regimes. A stratified sampling scheme was defined over 100,000 ha, and 126 soil profiles were excavated and described according to the Keys to Soil Taxonomy. The models compared for predicting soil taxonomic classes at the great group level were multinomial logistic regression (MLR), boosted regression trees (BRT) and decision tree (DT). We used an 80/20 training/testing split (80% of the pedon observations for model training and 20% for model testing). Kappa index (KI), overall accuracy (OC), Brier score (BS), user accuracy (UA) and producer accuracy (PA) were used to compare model accuracy.

Results and Discussion: The profile descriptions revealed two soil orders, Entisols and Aridisols, subdivided into six suborders and eight great groups: Haplosalids, Haplocambids, Haplocalcids, Haplogypsids, Calcigypsids, Calciargids, Petrocalcids and Torriorthents. This indicates the wide pedodiversity of the study area. The geomorphology map contributed importantly to prediction accuracy. This can be explained by the fact that the geomorphological surfaces formed recently, or during a geological period in which soil formation took place under conditions close to those of current processes in arid regions. After geomorphic surfaces, terrain attributes and then remote sensing indices were the next most important predictors.
The best prediction result was obtained when characteristics derived from terrain, remote sensing and geomorphological processes were used together, and when the geomorphological processes were differentiated and the overall heterogeneity of the study area was identified and stratified. In areas where the distribution of predictors was more homogeneous, the models could better capture the relationship between predictors and response. The spatial distribution of soils in the study area followed the distribution pattern of most geomorphological and terrain attributes. The model comparison indicated that the decision tree was consistently the most accurate. The highest prediction accuracy was obtained for the Haplosalids, Calcigypsids and Petrocalcids great groups, and the lowest predictive quality was observed for Haplocalcids in all three approaches. As a reliable and flexible approach, the decision tree could be used successfully to prepare continuous digital soil maps.

Conclusion: The application of decision trees for the prediction of soil types could be a promising alternative. In digital soil mapping, the best prediction result was obtained when parameters derived from terrain, remote sensing and geomorphological processes were used together, and when the geomorphological processes were differentiated and the overall heterogeneity of the study area was identified and stratified. In areas where the distribution of predictors was more homogeneous, the models could better capture the relationship between predictors and response. Altogether, an extended digital terrain analysis approach combined with a clear description of geomorphological, geological and pedological processes could be a promising key technology in future soil mapping.
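The evaluation protocol in this abstract (an 80/20 split scored with user accuracy, producer accuracy and the Brier score) can be illustrated with a short Python sketch. All function names are hypothetical. The abstract does not say how the 80% share of 126 pedons was rounded or which multiclass form of the Brier score was used, so the rounding rule and the sum-over-classes form (range 0 to 2) are assumptions.

```python
import random


def split_indices(n_sites, train_share=0.8, seed=42):
    """Shuffle site indices and cut them into training and testing sets."""
    indices = list(range(n_sites))
    random.Random(seed).shuffle(indices)
    n_train = round(train_share * n_sites)  # rounding rule is an assumption
    return indices[:n_train], indices[n_train:]


def producer_accuracy(observed, predicted, cls):
    """Correct predictions of `cls` divided by sites observed as `cls`."""
    total = sum(o == cls for o in observed)
    correct = sum(o == cls and p == cls for o, p in zip(observed, predicted))
    return correct / total if total else float("nan")


def user_accuracy(observed, predicted, cls):
    """Correct predictions of `cls` divided by sites predicted as `cls`."""
    total = sum(p == cls for p in predicted)
    correct = sum(o == cls and p == cls for o, p in zip(observed, predicted))
    return correct / total if total else float("nan")


def brier_score(observed, probabilities):
    """Mean over sites of the squared error between the predicted class
    probabilities and the one-hot observed class, summed over classes."""
    total = 0.0
    for obs, probs in zip(observed, probabilities):
        total += sum((p - (1.0 if cls == obs else 0.0)) ** 2
                     for cls, p in probs.items())
    return total / len(observed)


# 126 pedons as in the abstract; the split itself is illustrative only.
train, test = split_indices(126)
print(len(train), len(test))  # 101 25

# Hypothetical observed and predicted great groups at five test sites.
observed = ["Haplosalids", "Haplosalids",
            "Haplocalcids", "Haplocalcids", "Haplocalcids"]
predicted = ["Haplosalids", "Haplocalcids",
             "Haplocalcids", "Haplocalcids", "Haplosalids"]
print(producer_accuracy(observed, predicted, "Haplosalids"))  # 0.5
print(user_accuracy(observed, predicted, "Haplosalids"))      # 0.5
```

Producer accuracy answers how often an observed class is mapped correctly, while user accuracy answers how often a mapped class is actually found in the field. Reporting both per great group shows which classes drive a weak overall score.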

Soil Genesis and Classification

Digital Soil Mapping using legacy soil data: Case study of Faryab region of Kerman

Mansooreh Khaleghi; Azam Jafari; Mohammad Hadi Farpour

Volume 41, Issue 4, March 2018, Pages 31-48

https://doi.org/10.22055/agen.2018.26477.1439

Abstract

Introduction: Digital soil mapping uses a set of mathematical computations to predict the distribution of soil classes across the landscape. The approach relies on statistical relationships between measured soil observations and environmental covariates at the sampling locations. The need for digital soil mapping as a complement to conventional soil surveys arises from a growing worldwide demand for high-resolution digital soil maps for environmental protection and management, as well as for public authority projects. The volume of digital soil data is growing as new processing tools and digital data sources become available. As a tool for creating spatial soil data, digital soil mapping helps address the growing need for high-resolution soil maps. The main objective of this study was to generate a digital soil map based on legacy soil data.

Materials and methods: The study area is located in southeastern Iran, 330 km from Kerman city, in the Faryab district. A Latin hypercube sampling design was applied, with sampling guided by differences in landforms (geomorphology map), topography (including the digital elevation map) and geology. The geographic locations of 70 profiles were identified. Soil profiles were described according to U.S. Soil Taxonomy (Soil Survey Staff, 2014), and soil samples were taken from their diagnostic horizons. The samples were transferred to the laboratory, where physical and chemical analyses were performed using routine standard methods.
The environmental data included parameters derived from the digital elevation model, Landsat satellite images (remote sensing indices), the geology map, geomorphic units (geomorphology map) and the legacy soil map of the study area. All environmental variables were derived using ENVI and SAGA software. A multinomial logistic regression model was used to predict soil classes, and the modeling was done in two scenarios: (1) without the legacy soil map and (2) with the legacy soil map. The predictive accuracy for soil classes was estimated using overall accuracy and the Kappa coefficient.

Results and discussion: Modeling with multinomial logistic regression on the two sets of input variables showed that the topographic position index is the most effective variable for predicting soil classes, which confirms the importance of topography for soil genesis in the study area. After the topographic variables, the legacy soil data were an effective parameter in modeling. Legacy soil data are a strong and valuable database for predicting soil characteristics. The old soil map consists of salt surfaces and the Inceptisols order. Although the climate of the study area is hot and arid, the old soil map shows the Inceptisols order. The very small scale of the soil survey probably led to generalization of the soils and obscured the main soils of the study area. However, the small scale of the map and its depiction of different soils do not prevent the old soil map from being an important predictor. There appears to be high concordance between the boundaries of the old soil map and the diversity of the soils described in the study area.
The concordance between the boundaries of the old map and the described soil profiles helps the model to differentiate soils, although agreement between the soil types of the old map and the observed soils could play an even more effective role in prediction. Legacy soil information is a powerful and valuable database for predicting any soil feature. In both predicted maps, four great groups were identified: Haplosalids, Haplocambids, Haplocalcids and Torriorthents. Torriorthents are located in the north of the region, in the alluvial fan landform. Haplosalids were most commonly found on clayey surfaces. Haplocambids and Haplocalcids are located mainly on the geomorphic surfaces of the cultivated fan and the piedmont plain, respectively. The logistic regression model estimated more soils correctly when the old soil map was included than when it was not. Validation of the models likewise showed that map purity and the Kappa index increased in the presence of the legacy soil map, from 0.47 and 0.16 to 0.63 and 0.43, respectively. In both models, the highest estimation accuracy was obtained for the Haplocambids great group.

Conclusions: The results showed that the topographic position index was the most important and powerful variable for prediction in both models, which confirms that topography, or relief, is the most important soil-forming factor in the study area. Modeling with the legacy soil map as one of the environmental variables was more efficient and accurate than modeling without it.
If old soil maps are used as legacy information in digital soil mapping, their agreement with the soils of the study area should be checked, even for very small-scale maps, because high concordance leads to rational predictions rather than random, chance predictions.
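The Latin hypercube sampling design mentioned in the Materials and methods can be illustrated in its plain form with a short Python sketch. The abstract does not state which variant or software was used, so this is only an assumption-laden illustration: each of three hypothetical covariate axes is scaled to the unit interval and cut into as many equal strata as there are samples, and every stratum on every axis receives exactly one sample. Matching the sampled points to real map locations is not shown.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Draw n_samples points in the unit hypercube so that, on every axis,
    each of the n_samples equal-width strata contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random position inside each stratum [i/n, (i+1)/n).
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired at random across axes.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# 70 samples as in the abstract; the three axes stand for hypothetical
# scaled covariates (for example elevation, slope, a vegetation index).
samples = latin_hypercube(70, 3, seed=1)
print(len(samples))  # 70
```

Compared with simple random sampling, this design guarantees that the full range of each covariate is covered even with a small number of profiles.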

Keywords = soil great group