Leveraging machine learning for diabetes prediction: Ensemble model.

dc.contributor.author Otieno Ogutu, McDonald
dc.contributor.author Nzioka Kituku, Benson
dc.contributor.author Karume, Simon M.
dc.date.accessioned 2026-01-13T09:12:28Z
dc.date.available 2026-01-13T09:12:28Z
dc.date.issued 2025-10-07
dc.identifier.issn eISSN:2582-5003
dc.identifier.uri https://doi.org/10.30574/gjeta.2025.25.1.0267
dc.identifier.uri https://repository.cuk.ac.ke/handle/123456789/1866
dc.description A research article published in the Global Journal of Engineering and Technology Advances. en_US
dc.description.abstract Diabetes presents a major global health challenge, and delayed diagnosis significantly impedes effective management, particularly in resource-constrained regions. This project aimed to enable timelier and more accurate diabetes prediction by developing an ensemble machine learning model. A hybrid dataset of 2,768 instances, compiled from the PIMA Indian (768 instances) and Hospital Frankfurt Germany (2,000 instances) datasets, was used to improve generalizability beyond single-source limitations. The methodology involved comprehensive data preprocessing, including imputation of physiologically impossible zero values and feature standardization. F1-score was selected as the primary performance metric because it balances precision and recall, which is crucial in a medical context where both false positives and false negatives carry significant consequences. Six single-classifier models (Logistic Regression, Decision Tree, K-Nearest Neighbors, Support Vector Machine, Random Forest, and XGBoost) were trained on the data and evaluated after hyperparameter tuning. The F1-scores of these optimized models were: Logistic Regression (0.6328), Decision Tree (0.9843), K-Nearest Neighbors (0.9869), Support Vector Machine (0.9843), Random Forest (0.9947), and XGBoost (0.9974). Based on these results, XGBoost and Random Forest were selected as base learners for a Stacking Classifier ensemble with a Logistic Regression meta-learner. The ensemble model achieved a near-perfect ROC-AUC of 0.9999 and an F1-score of 0.9974. This performance not only surpassed results from recent studies but also highlighted the significant potential of machine learning to predict diabetes accurately. The project recommended further development and integration of the ensemble model into a web application. en_US
dc.language.iso en en_US
dc.publisher Global Journal of Engineering and Technology Advances. en_US
dc.subject Machine learning. en_US
dc.subject Support vector machine. en_US
dc.subject Gradient boosting. en_US
dc.subject Random Forest. en_US
dc.subject Decision Tree. en_US
dc.title Leveraging machine learning for diabetes prediction: Ensemble model. en_US
dc.type Article en_US
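
Illustrative sketch (not part of the deposited record): the abstract mentions imputing physiologically impossible zero values, standardizing features, and scoring models by F1. The Python below shows one way those three steps could look using only the standard library. The median fill strategy, the function names, and the toy values are assumptions made for illustration; the abstract does not state which imputation method or tooling the authors used.

```python
from statistics import mean, median, pstdev


def impute_zeros_with_median(values):
    """Replace zeros with the median of the non-zero entries.

    Median imputation is an assumed strategy; the abstract only says
    that impossible zero values were imputed.
    """
    nonzero = [v for v in values if v != 0]
    if not nonzero:
        return list(values)  # nothing to estimate a fill value from
    fill = median(nonzero)
    return [fill if v == 0 else v for v in values]


def standardize(values):
    """Rescale a feature to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature
    return [(v - mu) / sigma for v in values]


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy glucose-like readings (hypothetical values); a zero stands in for
# a missing measurement.
glucose = [148, 0, 183, 89, 0, 116]
imputed = impute_zeros_with_median(glucose)
scaled = standardize(imputed)

# Toy labels and predictions (hypothetical): 2 true positives,
# 1 false positive, 1 false negative.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

With one false positive and one false negative against two true positives, precision and recall are both 2/3, so the F1-score is also 2/3. This shows why the metric penalizes both error types mentioned in the abstract.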