Research Paper


Econ. Environ. Geol. 2022; 55(2): 127-135

Published online April 30, 2022

https://doi.org/10.9719/EEG.2022.55.2.127

© THE KOREAN SOCIETY OF ECONOMIC AND ENVIRONMENTAL GEOLOGY

Predicting As Contamination Risk in Red River Delta using Machine Learning Algorithms

Zheina J. Ottong1, Reta L. Puspasari1, Daeung Yoon2, Kyoung-Woong Kim1,*

1School of Earth Sciences and Environmental Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, South Korea
2Chonnam National University, Gwangju 61186, South Korea

*Corresponding author: kwkim@gist.ac.kr

Received: March 3, 2022; Revised: March 31, 2022; Accepted: April 3, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided original work is properly cited.

Abstract

Elevated arsenic (As) levels in groundwater are a major health problem worldwide. In the Red River Delta in Vietnam, several million residents face a high risk of chronic As poisoning. As is released into groundwater by a natural process, the microbially driven reductive dissolution of Fe(III) oxides. Red River Delta residents, unaware of the contamination, have extracted this groundwater through private tube wells for drinking and daily use. Long-term consumption of As-contaminated groundwater can lead to various health problems. A predictive model would therefore be useful for identifying wells at risk of contamination in the Red River Delta. This study used four machine learning algorithms to predict the probability of As contamination at study sites in the Red River Delta, Vietnam. The gradient boosting machine (GBM) was the best-performing model, with accuracy, precision, sensitivity, and specificity of 98.7%, 100%, 95.2%, and 100%, respectively. It also yielded the highest area under the curve (AUC), 92% for the precision-recall curve (PRC) and 96% for the receiver operating characteristic (ROC) curve, with Eh and Fe as the most important variables. The partial dependence plots (PDP) of As concentration on the model parameters showed that a high probability of elevated As is associated with shallow well depth, low Eh, and low SO₄, along with high PO₄³⁻ and NH₄⁺. These conditions trigger the reductive dissolution of iron phases, thus releasing As into groundwater.

Keywords: groundwater arsenic, machine learning, predictive model, random forest, gradient boosting


    Figure 1. Map of the Red River Delta (adapted from Winkel et al., 2011).

    Figure 2. Flowchart of machine learning analysis model.

    Figure 3. (a) Precision-recall curve and (b) Receiver Operating Characteristic (ROC) curve for all models.

    Figure 4. Variable importance projection of (a) RF and (b) GBM model.

    Figure 5. PDP of As concentration with (a) depth, (b) Eh, (c) SO₄, (d) PO₄³⁻ and (e) DOC for the RF model.
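The area under an ROC curve, as in Figure 3(b), can be read as the probability that a randomly chosen contaminated well receives a higher predicted score than a randomly chosen uncontaminated one. The sketch below computes it that way in plain Python. The function name `roc_auc` and the six labels and scores are hypothetical illustration values, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney form).

    labels: 1 for a positive (high-As) sample, 0 for a negative one.
    scores: predicted probability or score for the positive class.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy inputs, for illustration only (not from the paper).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise form is quadratic in the number of samples, which is acceptable for a test set of a few hundred wells; a sort-based implementation would be preferable for larger data.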

    Table 1. Selection of the best hyperparameter from the confusion matrix to predict the test set.

    Model   TN    FP   FN   TP
    LR      110   2    11   31
    RF      111   1    8    34
    GBM     112   0    2    40
    XGB     111   1    10   32

    TN: True Negative, FP: False Positive, FN: False Negative, TP: True Positive.


    Table 2. Accuracy, precision, sensitivity, and specificity of GBM and RF in percent (%).

    Model   Accuracy   Precision   Sensitivity   Specificity
    GBM     98.7       100         95.2          100
    RF      94.1       97.1        80.9          99.1
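The four scores in Table 2 follow from the confusion-matrix counts in Table 1 by the standard definitions: accuracy = (TP + TN) / total, precision = TP / (TP + FP), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP). The sketch below recomputes them. The counts are read from Table 1 assuming the column order TN, FP, FN, TP; the names `CONFUSION`, `metrics`, and `RESULTS` are illustrative and not taken from the paper.

```python
# Confusion-matrix counts read from Table 1, in the order (TN, FP, FN, TP).
CONFUSION = {
    "LR": (110, 2, 11, 31),
    "RF": (111, 1, 8, 34),
    "GBM": (112, 0, 2, 40),
    "XGB": (111, 1, 10, 32),
}


def metrics(tn, fp, fn, tp):
    """Return the four classification scores in percent."""
    total = tn + fp + fn + tp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "precision": 100.0 * tp / (tp + fp),
        "sensitivity": 100.0 * tp / (tp + fn),   # also called recall
        "specificity": 100.0 * tn / (tn + fp),
    }


RESULTS = {name: metrics(*counts) for name, counts in CONFUSION.items()}

for name, scores in RESULTS.items():
    line = ", ".join(f"{key} {value:.1f}" for key, value in scores.items())
    print(f"{name}: {line}")
```

For GBM and RF the recomputed values agree with Table 2 to within about 0.1 percentage point; the small differences for RF are consistent with the published figures being truncated rather than rounded.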


    Economic and Environmental Geology

    pISSN 1225-7281
    eISSN 2288-7962