Research Paper

Econ. Environ. Geol. 2023; 56(3): 331-341

Published online June 30, 2023

https://doi.org/10.9719/EEG.2023.56.3.331

© THE KOREAN SOCIETY OF ECONOMIC AND ENVIRONMENTAL GEOLOGY

Optimization of Soil Contamination Distribution Prediction Error using Geostatistical Technique and Interpretation of Contributory Factor Based on Machine Learning Algorithm

Hosang Han1, Jangwon Suh2,*, Yosoon Choi3

1Energy and Mineral Resources Engineering, Kangwon National University, Samcheok 25913, Republic of Korea
2Energy Resources and Chemical Engineering, Kangwon National University, Samcheok 25913, Republic of Korea
3Energy Resources Engineering, Pukyong National University, Busan 48513, Republic of Korea

Correspondence to: *jangwonsuh@kangwon.ac.kr; jangwonsuh@hanmail.net

Received: April 8, 2023; Revised: May 3, 2023; Accepted: May 4, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided original work is properly cited.

Abstract

Soil contamination maps produced with geostatistical techniques contain prediction errors that arise from various sources. In this study, a grid-based soil contamination map was created with Ordinary Kriging from heavy metal concentrations sampled in the soil of abandoned mine areas. Five factors judged to affect the prediction error of the map were selected, and the change in the root mean squared error (RMSE) between predicted and measured values was analyzed with the leave-one-out technique as the options and settings of each factor were varied. A machine learning algorithm was then used to derive the top three factors affecting the RMSE. For Standard interpolation, the Variogram Model, Minimum Neighbors, and Anisotropy factors had the largest impact on RMSE. Among the variogram models, the Spherical model showed the lowest RMSE. For Minimum Neighbors, RMSE was lowest at a value of 3 and increased as the value increased. For Anisotropy, not considering anisotropy was found to be more appropriate. Combining geostatistics and machine learning made it possible to create a highly reliable soil contamination map at the local scale and to identify which factors have a significant impact when interpolating a small number of soil heavy metal samples.

Keywords: soil contamination map, prediction error, variogram, Ordinary Kriging, machine learning
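The leave-one-out evaluation described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a spherical variogram with made-up parameters, uses all remaining samples as neighbors (no Minimum/Maximum Neighbors or sector search), and runs on hypothetical coordinates and concentrations. All function names are illustrative.

```python
import math

def spherical(h, nugget, sill, rng):
    """Spherical semivariogram (parameter values here are assumptions)."""
    if h == 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(points, values, target, gamma):
    """Ordinary kriging prediction at `target` using every point as a neighbor."""
    n = len(points)
    # Kriging system: semivariances between samples plus the unbiasedness constraint.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]  # the last unknown is the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values))

def loo_rmse(points, values, gamma):
    """Leave each sample out in turn, predict it from the rest, and return the RMSE."""
    errors = []
    for i in range(len(points)):
        rest_pts = points[:i] + points[i + 1:]
        rest_vals = values[:i] + values[i + 1:]
        errors.append(ordinary_kriging(rest_pts, rest_vals, points[i], gamma) - values[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical sample locations and Cu concentrations (mg/kg), for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (2.0, 1.0)]
values = [18.0, 105.0, 171.0, 90.0, 130.0, 60.0]
gamma = lambda h: spherical(h, nugget=0.0, sill=1.0, rng=2.0)

rmse = loo_rmse(points, values, gamma)
print(f"LOO RMSE: {rmse:.2f} mg/kg")
```

Repeating `loo_rmse` for each candidate variogram model and neighborhood setting gives the kind of RMSE comparison the study reports. A production workflow would typically rely on an established geostatistics package rather than a hand-written solver.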



Figure 1. Flowchart to illustrate the research procedure in this study.

Figure 2. Location of soil contaminant sampling points in the study area.

Figure 3. Result of exploratory data analysis. (a) Distribution of Cu; (b) QQ-plot.

Figure 4. RReliefF coefficient of parameters considered to affect the ordinary kriging prediction error. (a) Standard; (b) Smooth.

Figure 5. The variation in RMSE for each model between the second- and third-ranked options. (a) Standard; (b) Smooth.

Figure 6. Soil contamination mapping based on RMSE of standard interpolation. (a) Lowest; (b) Highest.

Figure 7. Soil contamination mapping based on RMSE of smooth interpolation. (a) Lowest; (b) Highest.

Figure 8. Difference between the lowest and highest soil contamination maps based on the RMSE. (a) Standard; (b) Smooth.

Table 1. The parameter setting of each contributory factor in ordinary kriging.

| Neighborhood Type | Model | Anisotropy | Maximum Neighbors | Minimum Neighbors | Sector Type | Smoothing Function |
|---|---|---|---|---|---|---|
| Standard | Spherical / Exponential / Gaussian | False / True | Min 5, Max 15, Step 2 | Min 2, Max 10, Step 1 | 4 Sector / 4 Sector with 45° offset / 8 Sector | - |
| Smooth | Spherical / Exponential / Gaussian | False / True | - | - | - | Min 0.1, Max 1.0, Step 0.1 |
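To make the size of the search space in Table 1 concrete, the sketch below enumerates the option combinations with `itertools.product`. The dictionary keys are illustrative names. Whether the authors evaluated every combination, or excluded cases where Minimum Neighbors exceeds Maximum Neighbors, is not stated in the source, so the filtered count is shown only as an assumption.

```python
from itertools import product

models = ["Spherical", "Exponential", "Gaussian"]
anisotropy = [False, True]
max_neighbors = list(range(5, 16, 2))   # 5, 7, ..., 15
min_neighbors = list(range(2, 11))      # 2, 3, ..., 10
sector_types = ["4 Sector", "4 Sector with 45° offset", "8 Sector"]
smoothing = [round(0.1 * k, 1) for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0

# Standard neighborhood: every combination of the five factors.
standard = [
    {"model": m, "anisotropy": a, "max_neighbors": mx,
     "min_neighbors": mn, "sector_type": s}
    for m, a, mx, mn, s in product(models, anisotropy, max_neighbors,
                                   min_neighbors, sector_types)
]

# Smooth neighborhood: model, anisotropy, and smoothing function only.
smooth = [
    {"model": m, "anisotropy": a, "smoothing": f}
    for m, a, f in product(models, anisotropy, smoothing)
]

# Assumption: a configuration is only meaningful when min <= max neighbors.
standard_valid = [c for c in standard if c["min_neighbors"] <= c["max_neighbors"]]

print(len(standard), len(standard_valid), len(smooth))
```

Each configuration in these lists would be passed to one leave-one-out run, giving one RMSE per row of the resulting sensitivity table.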

Table 2. Descriptive statistics of the soil contaminant (Cu) (unit: mg/kg).

| Min | Max | Median | Mean | Skewness | Standard deviation |
|---|---|---|---|---|---|
| 18 | 957 | 105 | 171 | 2.37 | 183.63 |
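The statistics in Table 2 can be computed from raw concentrations with the Python standard library, as in the sketch below. The sample is hypothetical (only its minimum and maximum echo Table 2), and the skewness formula is the adjusted Fisher-Pearson coefficient, which is an assumption because the source does not say which definition was used.

```python
import statistics

def describe(sample):
    """Return the descriptive statistics listed in Table 2 for a list of values."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    # Adjusted Fisher-Pearson skewness (assumed definition).
    m3 = sum((x - mean) ** 3 for x in sample)
    skew = n * m3 / ((n - 1) * (n - 2) * sd ** 3)
    return {
        "min": min(sample),
        "max": max(sample),
        "median": statistics.median(sample),
        "mean": mean,
        "skewness": skew,
        "sd": sd,
    }

# Hypothetical Cu concentrations in mg/kg, for illustration only.
cu = [18, 40, 62, 85, 105, 120, 150, 210, 380, 957]
stats = describe(cu)
print({k: round(v, 2) for k, v in stats.items()})
```

A mean well above the median together with positive skewness, as in Table 2, indicates a right-skewed distribution dominated by a few high-concentration samples.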

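Table 3. RMSE of option selections for each parameter based on sensitivity analysis.

| Neighborhood Type | Model | Anisotropy | Maximum Neighbors | Minimum Neighbors | Sector Type | RMSE (mg/kg) | Remarks |
|---|---|---|---|---|---|---|---|
| Standard | Spherical | False | 5-15* | 3 | 4 Sector with 45° offset | 112.53 | Lowest |
| Standard | Exponential | False | 5 | 4 | 4 Sector with 45° offset | 117.17 | Lowest |
| Standard | Gaussian | False | 5-15* | 3 | 4 Sector with 45° offset | 116.44 | Lowest |
| Standard | Spherical | True | 5 | 3 | 8 Sector | 118.36 | Lowest |
| Standard | Exponential | True | 5 | 4 | 4 Sector | 117.35 | Lowest |
| Standard | Gaussian | True | 5 | 4 | 8 Sector | 124.66 | Lowest |
| Standard | Spherical | False | 11-15* | 10 | 4 Sector with 45° offset | 121.24 | Highest |
| Standard | Exponential | False | 9 | 6 | 4 Sector with 45° offset | 119.13 | Highest |
| Standard | Gaussian | False | 11 | 10 | 4 Sector with 45° offset | 120.98 | Highest |
| Standard | Spherical | True | 11-15* | 10 | 4 Sector with 45° offset | 123.04 | Highest |
| Standard | Exponential | True | 11 | 6 | 4 Sector | 118.62 | Highest |
| Standard | Gaussian | True | 11-15* | 10 | 4 Sector with 45° offset | 128.20 | Highest |

| Neighborhood Type | Model | Anisotropy | Smoothing Function | RMSE (mg/kg) | Remarks |
|---|---|---|---|---|---|
| Smooth | Spherical | False | 0.1 | 120.86 | Lowest |
| Smooth | Spherical | True | 0.4 | 121.84 | Lowest |
| Smooth | Exponential | False | 0.3 | 118.49 | Lowest |
| Smooth | Exponential | True | 0.5 | 118.01 | Lowest |
| Smooth | Gaussian | False | 0.2 | 122.52 | Lowest |
| Smooth | Gaussian | True | 0.4 | 126.97 | Lowest |
| Smooth | Spherical | False | 1.0 | 132.66 | Highest |
| Smooth | Spherical | True | 1.0 | 125.73 | Highest |
| Smooth | Exponential | False | 1.0 | 120.55 | Highest |
| Smooth | Exponential | True | 1.0 | 118.99 | Highest |
| Smooth | Gaussian | False | 1.0 | 132.20 | Highest |
| Smooth | Gaussian | True | 1.0 | 129.36 | Highest |

*RMSE is identical across the listed range of Maximum Neighbors.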
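One way to read Table 3 is to compare, for each model and anisotropy setting, the gap between its highest and lowest RMSE. The sketch below does that arithmetic on the tabulated values. It assumes the "Lowest" and "Highest" remarks label the best and worst configuration found for each model and anisotropy pair, which is an interpretation of the table layout rather than something the source states outright.

```python
# (model, anisotropy) -> (lowest RMSE, highest RMSE) in mg/kg, copied from Table 3.
standard_rmse = {
    ("Spherical", False): (112.53, 121.24),
    ("Exponential", False): (117.17, 119.13),
    ("Gaussian", False): (116.44, 120.98),
    ("Spherical", True): (118.36, 123.04),
    ("Exponential", True): (117.35, 118.62),
    ("Gaussian", True): (124.66, 128.20),
}
smooth_rmse = {
    ("Spherical", False): (120.86, 132.66),
    ("Exponential", False): (118.49, 120.55),
    ("Gaussian", False): (122.52, 132.20),
    ("Spherical", True): (121.84, 125.73),
    ("Exponential", True): (118.01, 118.99),
    ("Gaussian", True): (126.97, 129.36),
}

def rmse_spread(table):
    """Difference between the highest and lowest RMSE for each setting."""
    return {key: round(worst - best, 2) for key, (best, worst) in table.items()}

spread_standard = rmse_spread(standard_rmse)
spread_smooth = rmse_spread(smooth_rmse)

for name, spread in (("Standard", spread_standard), ("Smooth", spread_smooth)):
    key = max(spread, key=spread.get)
    print(f"{name}: largest spread {spread[key]} mg/kg for {key}")
```

Under this reading, the Exponential model is the least sensitive to the remaining settings in both neighborhood types, while the isotropic Spherical model shows the widest gap.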


    Economic and Environmental Geology

    pISSN 1225-7281
    eISSN 2288-7962