| dc.description.abstract |
Diabetes presents a major global health challenge, with delayed diagnosis significantly impeding effective management, particularly in resource-constrained regions. The critical shortage of medical professionals in regions like Kenya, where the doctor-to-population ratio is far below the WHO standard, severely hampers timely screening and diagnosis of diabetes. This deficit necessitates innovative, scalable tools, such as machine learning models, to assist in early prediction and intervention. This research project aimed to enhance timely and accurate diabetes prediction by developing an advanced ensemble machine learning model. A hybrid dataset, compiled from the PIMA Indian (762 instances) and Hospital Frankfurt Germany (2000 instances) datasets, totaling 2762 data points, was utilized to improve generalizability beyond single-source limitations. The research employed a quantitative design involving comprehensive data preprocessing, including the imputation of physiologically impossible zero values and feature standardization. After assessing multicollinearity, all independent variables were retained. Six machine learning algorithms (Logistic Regression, Decision Tree, K-Nearest Neighbors, Support Vector Machine, Random Forest, and XGBoost) were evaluated, each undergoing hyperparameter tuning to optimize performance. XGBoost and Random Forest consistently achieved the highest F1-scores (0.9974 and 0.9947, respectively) among individual classifiers. These two top-performing models were then selected as base learners for a StackingClassifier ensemble, which utilized a Logistic Regression meta-learner. The developed ensemble model demonstrated exceptional predictive capabilities, achieving an F1-score of 0.9974 and a near-perfect ROC-AUC of 0.9999. This performance matched XGBoost's F1-score and marginally surpassed its ROC-AUC.
The model was implemented in Python. This research underscores the significant potential of advanced ensemble machine learning to deliver highly accurate and robust diagnostic solutions, thereby contributing to earlier diabetes detection and improved health outcomes, particularly in underserved healthcare environments. |
en_US |