India | Computer Science | Volume 14 Issue 5, May 2026 | Pages: 76 - 82
Feature-Optimized and Explainable Machine Learning Framework for Early Diabetes Prediction Using Hybrid Clinical and Lifestyle Data
Abstract: Despite the growing significance of diabetes, the disease is often diagnosed at a relatively advanced stage, at which point associated problems require effective and timely treatment. This study therefore presents a machine learning-based system that predicts diabetes before complications arise by analyzing patients' clinical and behavioral data. The analysis uses a patient dataset with parameters such as age, gender, BMI, hypertension, heart disease, HbA1c, and blood glucose level. During preprocessing, missing values are replaced, categorical features are converted into numerical form, and further steps are taken to improve data quality. Feature selection is then performed to eliminate redundant features before the models are applied. Three algorithms are evaluated: Logistic Regression, Random Forest, and XGBoost. They achieve accuracy scores of 86%, 89%, and 92%, respectively. These results indicate that the ensemble approaches (Random Forest and XGBoost) outperform Logistic Regression, which the study attributes to their better ability to capture patterns in the data. To improve the interpretability of the predictions, explainable AI methods such as SHAP are also applied. SHAP makes it possible to determine the contribution of each variable to a prediction. Blood glucose level, HbA1c level, body mass index (BMI), and age are found to play the most important roles in predicting diabetes. Overall, the proposed methodology achieves a balance between predictive power and interpretability.
Keywords: Machine Learning, Feature Selection, Predictive Modelling, Health Informatics, Classification Algorithm, Data Preprocessing, Clinical Decision Support Systems, Risk Assessment
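The abstract's statement that SHAP determines the role of each variable in a prediction can be illustrated with a minimal sketch of the Shapley-value idea that SHAP builds on. This is not the study's code and does not use the SHAP library: the logistic-model weights, the baseline patient, and the example patient below are hypothetical values chosen only for illustration, and the attributions are computed exactly by enumerating feature subsets, which is practical only for a handful of features.

```python
from itertools import combinations
from math import exp, factorial

# Hypothetical logistic-model weights, for illustration only (not from the study).
FEATURES = ["blood_glucose", "hba1c", "bmi", "age"]
WEIGHTS = {"blood_glucose": 0.03, "hba1c": 0.9, "bmi": 0.08, "age": 0.02}
BIAS = -12.0

# Hypothetical reference patient; a feature that is "absent" from a subset
# takes its value from this baseline.
BASELINE = {"blood_glucose": 100.0, "hba1c": 5.5, "bmi": 25.0, "age": 40.0}


def predict(x):
    """Predicted probability of diabetes under the toy logistic model."""
    z = BIAS + sum(WEIGHTS[f] * x[f] for f in FEATURES)
    return 1.0 / (1.0 + exp(-z))


def value(subset, x):
    """Model output when only the features in `subset` take the patient's values."""
    mixed = {f: (x[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return predict(mixed)


def shapley_values(x):
    """Exact Shapley attribution of predict(x) - predict(BASELINE) to each feature."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # k = size of the subset drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to this subset.
                total += weight * (value(s | {f}, x) - value(s, x))
        phi[f] = total
    return phi


# Hypothetical patient record.
patient = {"blood_glucose": 180.0, "hba1c": 7.5, "bmi": 31.0, "age": 55.0}
phi = shapley_values(patient)

# Rank features by the size of their contribution to this prediction.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {contribution:+.4f}")
```

The contributions sum to the difference between the patient's predicted probability and the baseline prediction, which is the property that lets a per-feature ranking be read as a breakdown of one prediction. For tree ensembles such as Random Forest and XGBoost, the SHAP library provides faster estimators than this brute-force enumeration.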