DSpace Repository

Leveraging machine learning for diabetes prediction: Ensemble model

dc.contributor.author Otieno Ogutu, McDonald
dc.date.accessioned 2026-07-02T09:54:24Z
dc.date.available 2026-07-02T09:54:24Z
dc.date.issued 2025
dc.identifier.uri https://repository.cuk.ac.ke/handle/123456789/1964
dc.description A research project submitted to the Department of Computer Science and Information Technology in the School of Computing and Mathematics in partial fulfillment of the requirements for the award of the degree of Master of Science in Data Science of the Cooperative University of Kenya en_US
dc.description.abstract Diabetes is a major global health challenge, and delayed diagnosis significantly impedes effective management, particularly in resource-constrained regions. The critical shortage of medical professionals in regions like Kenya, where the doctor-to-population ratio is far below the WHO standard, severely hampers timely screening for and diagnosis of diabetes. This deficit calls for innovative, scalable tools, such as machine learning models, to assist in early prediction and intervention. This research project aimed to enhance timely and accurate diabetes prediction by developing an advanced ensemble machine learning model. A hybrid dataset, compiled from the PIMA Indian (762 instances) and Hospital Frankfurt Germany (2000 instances) datasets and totaling 2762 data points, was used to improve generalizability beyond single-source limitations. The research employed a quantitative design involving comprehensive data preprocessing, including the critical imputation of physiologically impossible zero values and feature standardization. After multicollinearity was assessed, all independent variables were retained. Six machine learning algorithms (Logistic Regression, Decision Tree, K-Nearest Neighbors, Support Vector Machine, Random Forest, and XGBoost) were evaluated, each undergoing hyperparameter tuning to optimize its performance. XGBoost and Random Forest consistently achieved the highest F1-scores among individual classifiers (0.9974 and 0.9947, respectively). These two top-performing models were then selected as base learners for a StackingClassifier ensemble with a Logistic Regression meta-learner. The resulting ensemble model achieved an F1-score of 0.9974 and a near-perfect ROC-AUC of 0.9999, matching XGBoost's F1-score and marginally surpassing its ROC-AUC.
Implemented in Python, this research underscores the significant potential of advanced ensemble machine learning to deliver highly accurate and robust diagnostic solutions, thereby contributing to earlier diabetes detection and improved health outcomes, particularly in underserved healthcare environments. en_US
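The preprocessing described in the abstract (imputing physiologically impossible zeros, then standardizing features) can be illustrated with a minimal sketch. The abstract does not state which imputation strategy the thesis used, so the median of the non-zero values is an assumption here. The function names and the sample glucose readings are likewise hypothetical and are not taken from the thesis or its datasets.

```python
from statistics import mean, median, pstdev


def impute_zeros(column):
    """Replace zeros with the median of the non-zero entries.

    Assumption: median imputation. The source only says that impossible
    zero values were imputed, not which statistic was used.
    """
    nonzero = [v for v in column if v != 0]
    fill = median(nonzero)
    return [fill if v == 0 else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit population standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical glucose readings; a 0 stands for a missing measurement,
# since a plasma glucose level of zero is not physiologically possible.
glucose = [148, 85, 0, 89, 137, 0, 78]

imputed = impute_zeros(glucose)
scaled = standardize(imputed)
```

In a real pipeline the fill value, mean, and standard deviation would normally be computed on the training split only and then reused on the test split, so that no information from held-out data leaks into preprocessing.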
dc.language.iso en en_US
dc.publisher Cuk en_US
dc.title Leveraging machine learning for diabetes prediction: Ensemble model en_US
dc.type Thesis en_US

