Comparative Study of Supervised and Unsupervised Machine Learning Algorithms in CKD Diagnosis

Published: Oct 3, 2025

Sana Syed, K. Ravindranath

Abstract

Globally, chronic illnesses place a heavy burden on healthcare systems, and early prediction makes preventive action and better patient outcomes possible. Machine learning provides strong tools for this endeavour, and both supervised and unsupervised methods are essential to it. This work investigates the application of supervised and unsupervised algorithms to the prediction of chronic diseases. Supervised techniques, trained on labelled data, directly learn the relationships between patient features and the presence of disease. To identify the most accurate predictors, we assess several techniques, including Support Vector Machines (SVM), Random Forests, and Logistic Regression.
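To make "learning from labelled data" concrete, the following is a minimal from-scratch sketch of Logistic Regression, one of the supervised techniques named above, trained by batch gradient descent on the log-loss. It is an illustration under stated assumptions, not the authors' implementation: the two features, their values, and the labels are hypothetical (assumed already normalized to [0, 1]), and the learning rate and epoch count are arbitrary choices. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Difference between predicted probability and the true label (0 or 1)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (disease present) if the predicted probability reaches the threshold."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold else 0


# Hypothetical toy data: two normalized features per patient, label 1 = CKD
X = [[0.9, 0.2], [0.8, 0.1], [0.7, 0.3], [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(X, y)
print([predict(w, b, x) for x in X])  # → [1, 1, 1, 0, 0, 0]
```

The sign of each learned weight shows the direction of that feature's association with the positive (CKD) label, which is one reason Logistic Regression is often regarded as more interpretable than Random Forests or kernel SVMs.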

On the other hand, unsupervised methods are used to find hidden patterns in unlabelled data. Principal Component Analysis (PCA) can reveal the dominant structure in the data, and clustering algorithms can identify underlying patient groups with different illness risks; such groupings can guide further research and targeted interventions. This study compares the effectiveness of supervised and unsupervised methods in forecasting the onset of chronic illness and evaluates the benefits and drawbacks of each approach in terms of accuracy, interpretability, and data availability. The results are intended to aid the creation of reliable and insightful prediction models for managing chronic illnesses.
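As an illustration of how a clustering algorithm can group unlabelled patients, here is a minimal sketch of K-Means (Lloyd's algorithm), one of the clustering methods the study compares. It alternates two steps, assigning each point to its nearest centroid and then moving each centroid to the mean of its assigned points, and stops when the assignments no longer change. The patient records are hypothetical two-feature points assumed to be already normalized, and k = 2 is chosen only for this toy example; the study's actual features, choice of k, and initialization are not reproduced here.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster points into k groups with Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k points sampled at random (without replacement)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Hypothetical unlabelled patients: two normalized features each
patients = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15), (0.8, 0.9), (0.85, 0.8), (0.9, 0.85)]
labels, centroids = kmeans(patients, k=2)
print(labels)
```

Cluster IDs are arbitrary, so only the resulting partition is meaningful. Because K-Means relies on Euclidean distance, features measured on larger scales would dominate the distance, which is one reason normalization usually precedes clustering.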

In this paper, we present a comprehensive machine-learning approach for predicting chronic kidney disease (CKD) that combines supervised and unsupervised learning techniques. Our dataset, sourced from Kaggle, includes medical attributes such as age, blood pressure, specific gravity, and other diagnostic features. We preprocess the data to handle missing values and encode categorical variables, then normalize it for consistency. We implement and compare three supervised learning algorithms, namely Random Forest, Support Vector Machine (SVM), and Gradient Boosting, and three unsupervised learning algorithms, namely K-Means Clustering, Hierarchical Clustering, and DBSCAN. Our results demonstrate that the supervised models achieve high accuracy in predicting CKD, while the clustering analyses provide valuable insights into patient groupings and potential risk factors. By combining these methods, we enhance the predictive power and interpretability of the models, contributing to more effective disease management and prevention strategies.
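The preprocessing steps described above (handling missing values, encoding categorical variables, and normalization) can be sketched in plain Python as follows. The specific choices are assumptions for illustration rather than the paper's documented procedure: median imputation for numeric columns, most-frequent-value imputation for categorical ones, alphabetical label encoding, and min-max scaling to [0, 1]. The column names (`age`, `bp`, `sg`, `appetite`) and values are hypothetical, loosely modelled on the attributes listed above, and missing entries are represented as `None`.

```python
from statistics import median, mode


def preprocess(rows, numeric_cols, categorical_cols):
    """Impute missing values (None), label-encode categoricals, min-max scale numerics.

    Assumes every row is a dict containing exactly numeric_cols + categorical_cols.
    """
    cols = numeric_cols + categorical_cols
    # 1. Imputation: median for numeric columns, most frequent value for categorical ones
    fill = {c: median(r[c] for r in rows if r[c] is not None) for c in numeric_cols}
    fill.update({c: mode(r[c] for r in rows if r[c] is not None) for c in categorical_cols})
    data = [{c: fill[c] if r[c] is None else r[c] for c in cols} for r in rows]

    # 2. Label encoding: map each category to an integer (alphabetical order)
    for c in categorical_cols:
        codes = {v: i for i, v in enumerate(sorted({r[c] for r in data}))}
        for r in data:
            r[c] = codes[r[c]]

    # 3. Min-max normalization of numeric columns to [0, 1]
    for c in numeric_cols:
        lo = min(r[c] for r in data)
        hi = max(r[c] for r in data)
        for r in data:
            r[c] = (r[c] - lo) / (hi - lo) if hi > lo else 0.0
    return data


# Hypothetical records with missing values marked as None
rows = [
    {"age": 48, "bp": 80, "sg": 1.020, "appetite": "good"},
    {"age": 62, "bp": None, "sg": 1.010, "appetite": "poor"},
    {"age": None, "bp": 70, "sg": 1.005, "appetite": None},
    {"age": 70, "bp": 90, "sg": None, "appetite": "poor"},
]
clean = preprocess(rows, ["age", "bp", "sg"], ["appetite"])
for row in clean:
    print(row)
```

In a real evaluation, the imputation values and scaling bounds would typically be computed on the training split only and then applied to the test split, so that no test-set information leaks into the model.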

Issue: Vol. 46 No. 04 (2025)

Section: Articles
