Research Paper

Econ. Environ. Geol. 2024; 57(3): 329-342

Published online June 30, 2024

https://doi.org/10.9719/EEG.2024.57.3.329

© THE KOREAN SOCIETY OF ECONOMIC AND ENVIRONMENTAL GEOLOGY

A Grey Wolf Optimized-Stacked Ensemble Approach for Nitrate Contamination Prediction in Cauvery Delta

Kalaivanan K, Vellingiri J*

School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore-632014, India

Correspondence to: *vellingiri.j@vit.ac.in

Received: January 8, 2024; Revised: March 21, 2024; Accepted: May 24, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided original work is properly cited.

Abstract

The exponential increase in nitrate pollution of river water poses an immediate threat to public health and the environment. This contamination stems primarily from human activities, including the overuse of nitrogenous fertilizers in agriculture and the discharge of nitrate-rich industrial effluents into rivers. As a result, accurately predicting and identifying contaminated areas has become a crucial and challenging task for researchers. To address this problem, this work predicts nitrate contamination using machine learning approaches. It presents a novel approach that combines the Grey Wolf Optimizer (GWO) with a Stacked Ensemble to predict nitrate pollution in the Cauvery Delta region of Tamilnadu, India. The proposed method is evaluated on a Cauvery River dataset from the Tamilnadu Pollution Control Board. It achieves an accuracy of 93.31%, a precision of 93%, a sensitivity of 97.53%, a specificity of 94.28%, an F1-score of 95.23%, and an ROC score of 95%. These results demonstrate the effectiveness of the proposed method in accurately predicting nitrate pollution in river water and can ultimately support informed decisions to tackle these critical environmental problems.

Keywords: nitrate prediction, machine learning, stacked ensemble, decision tree, random forest

    Figure 1. System architecture of the proposed work.

    Figure 2. Hierarchy of wolves.

    Figure 3. Flowchart for the working principle of GWO.

    Figure 4. Position update in GWO during the hunting process.
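
The position update named in the captions of Figures 3 and 4 can be illustrated with a minimal sketch of the standard Grey Wolf Optimizer. In that formulation, each wolf moves toward the mean of three estimates guided by the alpha, beta, and delta wolves. This excerpt does not state how the authors couple GWO to the stacked ensemble, so the objective below is a toy sphere function. The names `gwo_minimize` and `sphere`, the pack size, and the iteration count are illustrative assumptions, not the paper's settings.

```python
import random

def gwo_minimize(objective, dim, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize `objective` inside a box with the standard GWO update (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []  # best three positions found so far: alpha, beta, delta

    for t in range(n_iter):
        # Re-rank the pack together with the previous leaders.
        leaders = sorted(wolves + leaders, key=objective)[:3]
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 towards 0

        new_wolves = []
        for wolf in wolves:
            position = []
            for j in range(dim):
                guided = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a    # |A| > 1 explores, |A| < 1 exploits
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolf[j])  # distance to this leader
                    guided.append(leader[j] - A * D)
                # Average the three leader-guided estimates and clip to the box.
                position.append(min(hi, max(lo, sum(guided) / len(guided))))
            new_wolves.append(position)
        wolves = new_wolves

    return min(wolves + leaders, key=objective)

def sphere(x):
    """Toy objective with its minimum at the origin (assumption for illustration)."""
    return sum(v * v for v in x)

best = gwo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, sphere(best))
```

To tune a model instead, `objective` would be replaced by a function that maps a candidate position (for example, encoded hyperparameters) to a validation error.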

    Figure 5. The stacked model, with a Multilayer Perceptron as the meta-learner and Decision Tree, Random Forest, and K-Nearest Neighbor as the weak (base) learners.
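
The stacking idea in Figure 5 can be sketched in a few lines of dependency-free Python. The mechanism shown is the general one: base learners produce out-of-fold predictions, and a meta-learner is trained on those predictions. To stay self-contained, this sketch swaps in simpler stand-ins. A depth-1 decision stump replaces the decision tree, there is no random forest, and logistic regression replaces the Multilayer Perceptron. All function names and the synthetic two-cluster data are hypothetical. In practice a library implementation such as scikit-learn's `StackingClassifier` would be the usual choice.

```python
import math
import random

def fit_knn(X, y, k=3):
    """K-nearest neighbour: fraction of positive labels among the k closest points."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / k
    return predict

def fit_stump(X, y):
    """Depth-1 decision tree: feature, threshold and polarity with fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for flip in (0, 1):
                errors = sum((int(row[j] > t) ^ flip) != label for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, flip)
    _, j, t, flip = best
    return lambda x: float(int(x[j] > t) ^ flip)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_logistic(Z, y, lr=0.1, epochs=200):
    """Meta-learner stand-in: logistic regression trained by stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, label in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - label
            w = [wi - lr * err * zi for wi, zi in zip(w, z)]
            b -= lr * err
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

def fit_stack(X, y, base_fitters, n_folds=5):
    """Train base learners out-of-fold, then fit the meta-learner on their predictions."""
    n = len(X)
    Z = [[0.0] * len(base_fitters) for _ in range(n)]
    for f in range(n_folds):
        held = set(range(f, n, n_folds))
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        for b, fitter in enumerate(base_fitters):
            model = fitter(X_tr, y_tr)
            for i in held:
                Z[i][b] = model(X[i])  # prediction from a model that never saw sample i
    meta = fit_logistic(Z, y)
    bases = [fitter(X, y) for fitter in base_fitters]  # refit on all data for inference
    return lambda x: int(meta([m(x) for m in bases]) >= 0.5)

# Synthetic two-cluster data (assumption; not the Cauvery River dataset).
rng = random.Random(0)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0.0, 3.0) for _ in range(40)]
y = [0] * 40 + [1] * 40

predict = fit_stack(X, y, [fit_knn, fit_stump])
accuracy = sum(predict(x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Using out-of-fold predictions as meta-features keeps the meta-learner from simply trusting base learners that have memorized their own training samples.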

    Figure 6. Comparison of the proposed approach with the different base classifiers on the Cauvery dataset.

    Figure 7. ROC curve of the Stacked Ensemble compared with the base classifiers on the Cauvery River dataset.

    Table 1. Literature survey.

    Paper | Machine Learning Model | Performance Metrics | Key Findings | Limitation
    Wagh et al. (2017) | ANN | R² = 0.75 | ANN model outperformed other methods in predicting nitrate concentrations in the Kadava River catchment | Small dataset size
    Rodriguez-Galiano et al. (2018) | CART, RF, SVM | AUC = 0.92 | RF-SSFS method outperformed others in nitrate-related groundwater contamination | Limited to a specific area
    Benzer et al. (2018) | ANN | Accuracy = 96 | ANN model effectively predicted nitrate concentrations in surface waters in a river basin in China | The model has not been tested in other regions
    Rahmati et al. (2019) | KNN, RF, SVM | R² = 0.72, RMSE = 10.41 | RF model outperformed traditional regression models in estimating nitrate concentration in streams in Iran | The model depends on assessing seasonal and interannual fluctuations in nitrate concentrations
    Knoll et al. (2019) | GBR, CART, MLR, RF | R² = 0.75 | GBR could more accurately estimate nitrate levels | The system did not enhance accuracy
    Jafari et al. (2019) | ANFIS, SVM, MLP, GEP | RMSE = 58.93, R = 0.998 | GEP model provided accurate TDS prediction in Tabriz plain aquifer | Machine learning techniques did not reduce time complexity
    Band et al. (2020) | BANN, Cubist, RF, SVM | R² = 0.89 | RF model outperformed others in Marvdasht watershed, Iran | Limited to a specific region
    Bedi et al. (2020) | ANN, XGB, SVM | RMSE = 3.91 | XGB model excelled in predicting nitrate and pesticide contamination | Scarcity of labeled data for training advanced models
    Hà et al. (2020) | RF | R² = 0.92 | RF model performed better in estimating nitrate and phosphorus concentrations in Tri An reservoir | Focused solely on the Tri An reservoir
    Latif et al. (2020) | ANN | Accuracy score = 0.94 | ANN model was superior in forecasting nitrate levels in Feitsui reservoir, Taiwan | Limited to five input parameters
    Stamenković et al. (2020) | ANN, MLR | MAE = 0.53 | ANN models showed good predictive ability for nitrates in river water | Limited significant deviations in parameters
    Alizamir et al. (2021) | Hybrid Bat-ELM | R² = 0.89 | Hybrid model effectively predicted daily chlorophyll-a concentration in rivers | Key factors influencing chlorophyll-a concentration may vary by ecosystem
    Pham et al. (2021) | ANN, ANFIS, GMDH | MAE = 0.0120.0219, NSE = 0.96 | ANFIS method excelled in estimating Water Quality Index in surface wetlands | Deep neural networks lacked effective incorporation of prior knowledge
    Lu et al. (2022) | GBRT, LSTM, RF | RMSE = 0.11 | LSTM model performed best in predicting total phosphorus and nitrogen concentrations in Taihu Lake | Focused on monthly data, limited temporal scope
    Ottong et al. (2022) | LR, SVM, RF, GBM | Accuracy = 87%, Precision = 100%, Sensitivity = 95.2%, Specificity = 100% | GBM model effectively forecasted arsenic contamination risk in the Red River Delta | Limited data points for model training
    Hu et al. (2023) | XGB | R² = 0.91 | XGB model effectively predicted nitrogen and phosphorus concentrations in Taihu lakes | -
    Sulaiman et al. (2023) | KNN, SVM, DT, NB, RF, GB, XGB | Accuracy = 92.8% | RF-PCA hybrid method outperformed other models in predicting nitrate concentrations for hydroponic plants | Limited input size
    Liang et al. (2024) | GB | R² = 0.627, MAE = 0.529, RMSE = 0.705 | Developed models to predict nitrogen levels in Chongqing city using various predictors | Tested in a small area
    Mehdaoui et al. (2024) | RBF-NN | Accuracy = 0.957 | Introduced MLR and RBF-NN models to forecast nitrate levels in the Cheliff basin | The model is specifically suited to this location

    Table 2. Attributes of the water quality dataset.

    Variable | Description | Bureau of Indian Standards limit
    NO3 | Nitrate | 10
    pH | Potential of Hydrogen | 6.5-8.5
    Cl | Chloride | 250
    BOD | Biological Oxygen Demand | Not mentioned
    DO | Dissolved Oxygen | Not mentioned
    FC | Fecal coliforms | 0.2
    TC | Total coliforms | Not mentioned
    Tu | Turbidity | Not mentioned
    Pa | Phenolphthalein Alkalinity | Not mentioned
    Tal | Total Alkalinity | 200
    EC | Electrical Conductivity | Not mentioned
    N | Nitrogen | 4
    COD | Chemical Oxygen Demand | Not mentioned
    NH3 | Ammonia | 50
    Ca | Calcium | 75
    Th | Total Hardness | 300
    K | Potassium | 0.4
    Mg | Magnesium | 30
    SO4 | Sulphate | 200
    Na | Sodium | 4
    TDS | Total Dissolved Solids | 500
    PO4 | Phosphate | Not mentioned
    TFS | Total Fixed Solids | 500
    Br | Boron | 0.3
    TSS | Total Suspended Solids | 500
    F | Fluoride | 1

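    Table 3. Confusion matrix results for test data.

    S.No | Classifier | TP | FP | FN | TN | Accuracy
    1 | GWO-Stacked Ensemble (Proposed) | 76 | 4 | 6 | 75 | 0.93
    2 | RF | 73 | 6 | 7 | 76 | 0.90
    3 | DT | 71 | 8 | 9 | 73 | 0.88
    4 | MLP | 72 | 10 | 10 | 69 | 0.87
    5 | KNN | 70 | 10 | 11 | 69 | 0.86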

    Table 4. Comparison of the proposed model with the base classifiers.

    Classifier | Accuracy | Precision | Sensitivity | Specificity | F1-Score | MCC
    GWO-Stacked Ensemble (Proposed) | 0.93 | 0.95 | 0.92 | 0.92 | 0.93 | 0.85
    RF | 0.90 | 0.92 | 0.91 | 0.90 | 0.91 | 0.83
    DT | 0.88 | 0.89 | 0.88 | 0.89 | 0.89 | 0.78
    MLP | 0.87 | 0.87 | 0.86 | 0.85 | 0.87 | 0.74
    KNN | 0.86 | 0.87 | 0.87 | 0.86 | 0.86 | 0.72
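
The metrics named in Table 4 are normally derived from confusion-matrix counts such as those in Table 3. The sketch below applies the standard textbook definitions. Whether the authors computed their figures in exactly this way is not stated here. The counts TP = 76, FP = 4, FN = 6, TN = 75 are an assumed reading of the first row of Table 3, and `classification_metrics` is a hypothetical helper name.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),           # share of predicted positives that are correct
        "sensitivity": tp / (tp + fn),         # recall / true positive rate
        "specificity": tn / (tn + fp),         # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),     # harmonic mean of precision and sensitivity
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }

# Counts assumed from the first row of Table 3 (proposed model).
metrics = classification_metrics(tp=76, fp=4, fn=6, tn=75)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, precision works out to 76/80 = 0.95.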

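    Table 5. Comparison of the performance of the proposed method against the base classifiers using data-splitting validation (values in %).

    Metric | Data Split Ratio | GWO-Stacked (Proposed) | RF | DT | MLP | KNN
    Accuracy | 60-40 | 91.5 | 88 | 86.5 | 85.3 | 83.55
    Accuracy | 70-30 | 93.21 | 89.78 | 88 | 87.65 | 85.52
    Accuracy | 80-20 | 92.98 | 89.16 | 87.05 | 87.23 | 84.89
    Accuracy | 90-10 | 91.94 | 88.94 | 86.83 | 85.32 | 82.94
    Precision | 60-40 | 91.56 | 85.09 | 84.78 | 84.35 | 81.36
    Precision | 70-30 | 93 | 87.26 | 86.72 | 86.72 | 83.88
    Precision | 80-20 | 92.13 | 86.58 | 85.09 | 85.39 | 81.36
    Precision | 90-10 | 91.04 | 85.98 | 84.21 | 83.96 | 80
    Sensitivity | 60-40 | 95.13 | 91.89 | 91.27 | 89.28 | 87.18
    Sensitivity | 70-30 | 97.53 | 93.33 | 93 | 91.86 | 89.10
    Sensitivity | 80-20 | 96.41 | 91.56 | 91.04 | 89.74 | 88.72
    Sensitivity | 90-10 | 95.23 | 90.86 | 90.12 | 88.21 | 87.23
    Specificity | 60-40 | 86.94 | 82.92 | 81.89 | 81.11 | 78.27
    Specificity | 70-30 | 88.56 | 84.23 | 83.98 | 83.15 | 80
    Specificity | 80-20 | 87.88 | 83.55 | 82.45 | 82.18 | 78.89
    Specificity | 90-10 | 86.52 | 82.10 | 81.89 | 81.23 | 77.25
    F1-Score | 60-40 | 92.41 | 88.72 | 86.94 | 85.36 | 85.10
    F1-Score | 70-30 | 94.28 | 90 | 89.56 | 88.78 | 86.23
    F1-Score | 80-20 | 93.88 | 88.72 | 87.90 | 87.45 | 85.63
    F1-Score | 90-10 | 92.58 | 86.28 | 85.92 | 85.11 | 84.23
    ROC | 60-40 | 93.88 | 92.12 | 90.25 | 89.41 | 88.23
    ROC | 70-30 | 95.23 | 94.15 | 92.10 | 91 | 90.05
    ROC | 80-20 | 94.73 | 93.78 | 91.65 | 89.41 | 88.14
    ROC | 90-10 | 93.12 | 92.89 | 92.10 | 88.11 | 87.23
    MCC | 60-40 | 81.56 | 77.89 | 76.23 | 75.96 | 72.18
    MCC | 70-30 | 83 | 79.52 | 78.36 | 75.12 | 74.89
    MCC | 80-20 | 82.16 | 78.23 | 77.65 | 73.96 | 73.98
    MCC | 90-10 | 81.23 | 77.23 | 76.63 | 72.13 | 71.08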
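
The data-splitting validation in Table 5 evaluates each classifier at four train-test ratios. A minimal sketch of such a holdout split follows. How the authors shuffled or stratified the data is not stated here, so the seeded shuffle, the helper name `holdout_split`, and the placeholder sample list are all assumptions.

```python
import random

def holdout_split(rows, train_fraction, seed=0):
    """Shuffle a copy of `rows` and split it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder records standing in for water-quality samples (assumption).
samples = list(range(100))

split_sizes = {}
for train_fraction in (0.6, 0.7, 0.8, 0.9):  # the 60-40, 70-30, 80-20 and 90-10 splits
    train, test = holdout_split(samples, train_fraction)
    split_sizes[train_fraction] = (len(train), len(test))
    print(f"{train_fraction:.0%} train: {len(train)} train rows, {len(test)} test rows")
```

Each classifier would then be fitted on `train` and scored on `test`, repeating the evaluation once per ratio.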

    Table 6. Comparison of performance between the proposed method and existing research.

    Author | Model | Accuracy (%)
    Proposed Model | GWO-Stacked Ensemble | 93
    Latif et al. (2020) | ANN | 93
    Sulaiman et al. (2023) | KNN, SVM, DT, NB, RF, GB, XGB | 92.8
    Bhattarai et al. (2021) | KNN, NB, RF, GB, SVM | 92.8
    Alizamir et al. (2021) | Hybrid Bat-ELM | 89
    Knoll et al. (2019) | GBR, CART, MLR, RF | 75

    Economic and Environmental Geology

    pISSN 1225-7281
    eISSN 2288-7962