An Efficient Hybrid Data Mining Model for Prognostication of an Imbalanced Data Set of Liver Disorder: A K-Prototype Naïve Bayes Approach

  • Divya, Research Scholar, Department of Statistics, Institute of Social Sciences, Dr Bhimrao Ambedkar University, Agra, India.
  • Vineeta Singh, Professor, Department of Statistics, Institute of Social Sciences, Dr Bhimrao Ambedkar University, Agra, India.
  • Ravins Dohare, Professor, Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, Jamia Nagar, New Delhi, India.
  • Manoj Kumar, Assistant Professor, Centre for Economic Studies and Planning, Jawaharlal Nehru University, India; Postdoctoral Associate, McGowan Institute for Regenerative Medicine (MIRM), Department of Surgery, University of Pittsburgh, Pittsburgh, PA, USA.
Keywords: Data Mining, PCA, K-Prototype Clustering, Naïve Bayes, Imbalanced Data, K-PNB

Abstract

Background: Liver disorders have recently become the deadliest disorders in many countries, with the number of patients rising as a result of alcohol consumption, exposure to toxic gases, and ingestion of contaminated food and drugs. Data mining is the most effective approach for detecting the disease early.
Objective: This study aimed to predict and diagnose early-stage liver disorders.
Method: We used the Indian Liver Patient Dataset from the UCI Machine Learning Repository. The dataset is imbalanced by sex, which we addressed with both oversampling and undersampling strategies; we used principal component analysis (PCA) for feature selection. We constructed a Naïve Bayes model and a new hybrid model combining k-prototype clustering with the Naïve Bayes classifier (K-PNB). In total, we built eight models across four experiments in RStudio with the required packages and compared them on four performance measures: accuracy, sensitivity, specificity, and error rate.
Results: The hybrid model gave a classification accuracy of 94%, a sensitivity of 99%, a specificity of 90% and a low error rate of 0.05%.
Conclusion: The findings showed that the proposed hybrid model (K-PNB) outperformed the other models and can detect and diagnose liver disease at an early stage in very little time.
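
The four performance measures named in the Method can be computed from a binary confusion matrix. The following is a minimal sketch in Python rather than the R code used in the study; the class name `ConfusionCounts`, the function `performance_measures`, and the example counts are hypothetical and chosen only for illustration. They are not the study's data or results. The sketch assumes that "positive" means a patient with a liver disorder and that error rate is the share of misclassified cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (positive = liver disorder)."""
    tp: int  # patients with the disorder, predicted as having it
    fn: int  # patients with the disorder, predicted as healthy
    tn: int  # healthy patients, predicted as healthy
    fp: int  # healthy patients, predicted as having the disorder


def performance_measures(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity, specificity, and error rate as proportions."""
    total = c.tp + c.fn + c.tn + c.fp
    positives = c.tp + c.fn
    negatives = c.tn + c.fp
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to compute all measures")
    return {
        "accuracy": (c.tp + c.tn) / total,      # share of correct predictions
        "sensitivity": c.tp / positives,        # true positive rate
        "specificity": c.tn / negatives,        # true negative rate
        "error_rate": (c.fp + c.fn) / total,    # share of wrong predictions
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not taken from the study.
    example = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
    for name, value in performance_measures(example).items():
        print(f"{name}: {value:.3f}")
```

Under this definition, accuracy and error rate always sum to 1, so either can be derived from the other when both are expressed as proportions.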

How to cite this article:
Divya, Singh V, Dohare R, Kumar M. An Efficient Hybrid Data Mining Model for Prognostication of an Imbalanced Data Set of Liver Disorder: A K-Prototype Naïve Bayes Approach. Chettinad Health City Med J. 2024;13(4):21-33.

DOI: https://doi.org/10.24321/2278.2044.202456

Published
2024-12-31