Special Research Paper on “Applications of Data Science and Artificial Intelligence in Economic and Environmental Geology”

Econ. Environ. Geol. 2024; 57(5): 529-537

Published online October 29, 2024

https://doi.org/10.9719/EEG.2024.57.5.529

© THE KOREAN SOCIETY OF ECONOMIC AND ENVIRONMENTAL GEOLOGY

Application of K-means Clustering Model to XRD Experimental Data in the Korea Plateau

Ju Young Park1,3, Sun Young Park2, Jiyoung Choi2, Sungil Kim2, Yuri Kim2, Bo Yeon Yi2, Kyungbook Lee1,3,*

1Department of Geoenvironmental Sciences, Kongju National University, Gongju-si 32588, Republic of Korea
2Petroleum Energy Research Center, Korea Institute of Geoscience and Mineral Resources, Daejeon 34132, Republic of Korea
3Yellow Sea Institute of Geoenvironmental Sciences, Gongju-si 32588, Republic of Korea

Correspondence to: *kblee@kongju.ac.kr

Received: August 30, 2024; Revised: October 2, 2024; Accepted: October 7, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided original work is properly cited.

Abstract

Mineral composition, which is used to identify the sedimentary environment, can be obtained through X-ray diffraction (XRD) analysis. Because analyzing a large number of samples is time-consuming, a machine learning-based mineral composition analysis model was developed. This model showed reasonable reliability for samples with usual compositions but performed poorly on samples with unusual compositions. Consequently, a clustering model was recently developed to separate out the unusual samples so that experts can analyze them. The purpose of this study is to examine whether the clustering model, developed in a previous study using XRD data from the Ulleung Basin, is applicable to samples from a different region. The research data consist of XRD intensity profiles and the corresponding mineral composition analyses for a total of 54 sediment samples from the Korea Plateau, located northwest of the Ulleung Basin. Because each intensity profile from the Korea Plateau comprises 7,420 values (3.005-64.996°), whereas each profile from the Ulleung Basin comprises 3,100 values (3.01-64.99°), linear interpolation was used to align the input features. A min-max scaler was then applied to the intensity profile of each sample to preserve the trend and peak ratios of the intensity.
When the clustering model was applied to the 54 preprocessed intensity profiles, 35 samples were assigned to the expert group and 19 to the machine learning group. In the machine learning group, none of the 19 samples was a false positive (FP). This means that the clustering model can increase the reliability of mineral compositions obtained from the machine learning model, because no unusual sample was assigned to the machine learning group. Of the 35 samples in the expert group, 31 were false negatives (FN); that is, they were assigned to the expert group even though the machine learning model should be able to analyze them properly. However, when these FN samples were analyzed with the machine learning-based composition analysis model, a high mean absolute error of 2.94% was observed. Therefore, assigning these samples to the expert group is reasonable.

Keywords Korea Plateau, K-means clustering, machine learning, X-ray diffraction, confusion matrix

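Applicability Analysis of a k-means Clustering Model to XRD Experimental Data from the Korea Plateau

Ju Young Park1,3 · Sun Young Park2 · Jiyoung Choi2 · Sungil Kim2 · Yuri Kim2 · Bo Yeon Yi2 · Kyungbook Lee1,3,*

1Department of Geoenvironmental Sciences, Kongju National University
2Petroleum Energy Research Center, Korea Institute of Geoscience and Mineral Resources
3Yellow Sea Institute of Geoenvironmental Sciences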

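Summary

Mineral composition data used to identify the depositional environment of sediments can be obtained through X-ray diffraction (XRD) analysis, and a machine learning-based mineral composition analysis model was developed to make composition analysis of large sample sets more efficient. The model showed acceptable reliability for samples with usual compositions but performed poorly on samples with unusual compositions. Accordingly, a clustering model was recently developed that separates the unusual-composition samples from the full sample set so that experts can analyze them. This study examines whether the clustering model developed with XRD samples from the Ulleung Basin can be applied to samples from another region. The research data consist of XRD measurements and expert mineral composition analyses for 54 sediment samples from the Korea Plateau, located northwest of the Ulleung Basin. The intensity profiles of the Korea Plateau samples contain 7,420 values (3.005-64.996°), unlike the 3,100 values (3.01-64.99°) of the Ulleung Basin samples, so linear interpolation was used to match them. Min-max normalization was then applied to each sample to preserve the intensity ratios and trends.
When the clustering model was applied to the preprocessed data, 35 of the 54 samples were assigned to expert analysis and 19 to machine learning analysis. Among the 19 samples assigned to machine learning analysis, the number of false positives (FP) was 0, confirming that the machine learning cluster contained no unusual-composition samples. An FP is a sample that actually has an unusual composition and needs expert analysis but is judged suitable for machine learning analysis, so the fewer the FPs, the higher the reliability that can be expected when the machine learning model is applied. In the expert analysis group, 31 of the 35 samples were false negatives, meaning that samples which the machine learning model could have analyzed but which must instead be analyzed by experts make up 57% of the total. However, analyzing these samples with the machine learning-based composition analysis model gives a high average mean absolute error of 2.94%, so their assignment to the expert analysis cluster can be considered reasonable.

Keywords Korea Plateau, k-means clustering, machine learning, X-ray diffraction (XRD), confusion matrix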

Figure 1. Location of wellbores from the Korea Plateau (modified from Yoon et al., 2003).

Figure 2. XRD intensity profile of #2 in the Ulleung Basin and #2 in the Korea Plateau.

Figure 3. Box-plots for the 11 minerals in the Ulleung Basin and the Korea Plateau.

Figure 4. Structure of the developed clustering model.

Figure 5. Confusion matrix diagram in this study. Positive indicates usual mineral composition (modified from Park et al., 2024).

Figure 6. Confusion matrix for the 54 data in the Korea Plateau.
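The counts behind this confusion matrix can be cross-checked from the numbers quoted in the abstract. The following is a small sketch, taking "positive" to mean usual composition as in Figure 5 and treating the machine learning group as the predicted-positive class. TP and TN are derived here by subtraction and are not quoted from the paper; the metric names are added for illustration.

```python
# Counts reported in the abstract
n_total = 54
n_ml_group = 19      # predicted positive (machine learning analysis)
n_expert_group = 35  # predicted negative (expert analysis)
fp = 0               # unusual samples sent to machine learning
fn = 31              # usual samples sent to experts

# Derived by subtraction (assumption: every sample is in exactly one group)
tp = n_ml_group - fp
tn = n_expert_group - fn

precision = tp / (tp + fp)  # share of the ML group that is truly usual
recall = tp / (tp + fn)     # share of usual samples that reach the ML group
fn_share = fn / n_total     # share of all samples that are false negatives

print(tp, tn, round(precision, 2), round(recall, 2), round(fn_share, 2))
```

The ratio 31/54 is about 0.57, consistent with the 57% reported for false negatives.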

Figure 7. Intensity profiles of #53 and #41 in the Korea Plateau: (a) raw intensity profile, (b) its enlarged profile below 1,000 cps, and (c) normalized intensity profile by min-max normalization.

Figure 8. Visualization for clustering results for the 54 data using PCA.
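A projection of this kind can be sketched with a plain SVD-based PCA. The example below is an illustration under assumptions: it fits PCA on the 54 profiles themselves (the paper may instead have used axes fitted on Ulleung Basin data), and the input matrix and labels are random placeholders, not the study's data. The function name `pca_project` is hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # shape (n_samples, n_components)

rng = np.random.default_rng(0)
X = rng.random((54, 3100))            # placeholder for 54 scaled profiles
labels = rng.integers(0, 5, size=54)  # placeholder cluster labels

coords = pca_project(X)
# coords[:, 0] and coords[:, 1] can then be scatter-plotted, colored by labels.
print(coords.shape)
```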

Table 1. The number of data for 8 boreholes in the Korea Plateau.

Core        Number of samples
21GHP-P01A  8
21GHP-P02A  4
21GHP-H03A  7
21GHP-H04A  7
21GHP-P05A  8
21GHP-P06A  8
21GHP-P07A  7
21GHP-H09B  5

Table 2. Comparison of statistical factors for each mineral from the 54 data in the Korea Plateau.

Factors  Albite  Calcite  Chlorite  Dolomite  Gypsum  Halite  Hornblende
Avg.     9.90    3.61     2.70      1.18      1.05    2.39    1.94
Std.     3.49    2.38     0.48      0.24      0.05    1.18    0.41
Min      5.50    0.20     1.80      0.80      1.00    0.50    1.00
Max      25.30   10.50    3.80      1.40      1.10    4.60    2.50

Factors  Kaolinite  Microcline  Muscovite+illite  Opal-A  Orthoclase  Pyrite  Quartz
Avg.     1.56       9.00        22.00             36.01   2.39        2.01    17.68
Std.     0.38       3.43        4.52              9.52    1.41        0.58    5.03
Min      0.80       4.60        4.02              11.50   0.10        0.80    11.90
Max      3.00       14.40       36.20             48.30   7.70        3.90    38.30

Table 3. Comparison of data shape in the Ulleung Basin and the Korea Plateau.

Location       Angle range (°)  Interval (°)  Number of intensity values  Number of mineral types
Ulleung Basin  3.01–64.99       0.02          3100                        12
Korea Plateau  3.005–64.996     0.008         7420                        14
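The shape mismatch in Table 3 is what the preprocessing described in the abstract has to resolve. The following is a minimal sketch of that step, assuming evenly spaced angle grids built from the Table 3 ranges and using NumPy's `np.interp` as a stand-in for whatever interpolation routine the authors used. The function names and the synthetic profile are hypothetical.

```python
import numpy as np

def resample_profile(angles_src, intensity_src, angles_dst):
    """Linearly interpolate one intensity profile onto a target angle grid."""
    return np.interp(angles_dst, angles_src, intensity_src)

def min_max_scale(profile):
    """Scale one profile to [0, 1] with its own min and max (per-sample scaling)."""
    lo, hi = profile.min(), profile.max()
    if hi == lo:  # flat profile: avoid division by zero
        return np.zeros_like(profile, dtype=float)
    return (profile - lo) / (hi - lo)

# Angle grids from Table 3; even spacing is an assumption.
angles_kp = np.linspace(3.005, 64.996, 7420)  # Korea Plateau
angles_ub = np.linspace(3.01, 64.99, 3100)    # Ulleung Basin

# Synthetic stand-in for a measured profile (not data from the paper):
# a flat background plus one Gaussian peak.
raw = 200.0 + 3000.0 * np.exp(-((angles_kp - 26.6) / 0.1) ** 2)

aligned = resample_profile(angles_kp, raw, angles_ub)  # 7,420 -> 3,100 values
scaled = min_max_scale(aligned)                        # values in [0, 1]
print(aligned.shape, scaled.min(), scaled.max())
```

Because each profile is scaled by its own range, relative peak heights within a sample are kept while differences in absolute count level between samples are removed.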

Table 4. Average mineral compositions for the samples in each cluster.

Cluster    Quartz (4.05–30.05)  Albite (2.90–14.10)  Calcite (0.00–14.38)  Number of samples
Cluster 1  20.09                11.40                2.85                  35
Cluster 2  -                    -                    -                     0
Cluster 3  13.63                7.39                 0.28                  18
Cluster 4  12.13                6.12                 0.00                  1
Cluster 5  -                    -                    -                     0
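Table 4 reports how many Korea Plateau samples fall into each of the five clusters. Applying an already fitted k-means model to new data amounts to assigning each profile to its nearest centroid. The following is a minimal sketch of that assignment, assuming Euclidean distance (the usual k-means choice). The centroids and profiles are small synthetic arrays, not the model or data from the paper, and `assign_clusters` is a hypothetical helper.

```python
import numpy as np

def assign_clusters(profiles, centroids):
    """Return, for each profile, the index of the nearest centroid (Euclidean)."""
    # Pairwise distances with shape (n_samples, n_clusters)
    dists = np.linalg.norm(profiles[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical centroids: 5 clusters, 6 features (real profiles have 3,100 features).
centroids = np.eye(5, 6)

# Hypothetical preprocessed profiles lying close to centroids 0, 0, 2 and 3.
profiles = centroids[[0, 0, 2, 3]] + 0.01

labels = assign_clusters(profiles, centroids)
counts = np.bincount(labels, minlength=len(centroids))  # samples per cluster
print(labels, counts)
```

In Table 4 the counts are 35, 0, 18, 1 and 0. The 35 and the 18 + 1 = 19 match the sizes of the expert and machine learning groups given in the abstract, which suggests, although this text does not state it, that Cluster 1 forms the expert group.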
