Abstract:
Non-communicable diseases (NCDs) continue to claim lives in Kenya, particularly in low-resource settings such as Kitui. In this study, a predictive intelligence model was developed from the clinical, demographic, and behavioral information in 68,601 patient records to support the early identification of NCDs. Three machine learning models, namely Logistic Regression, Random Forest, and XGBoost, were trained and evaluated using 5-fold cross-validation. Random Forest and XGBoost were more accurate (93% each) than Logistic Regression (74%). A hybrid soft-voting ensemble of Random Forest and XGBoost further improved the balance of the classification, achieving 0.93 accuracy, 0.81 precision, 0.82 recall, and 0.81 F1-score. The most significant predictors included systolic blood pressure, BMI, and fasting blood sugar. SHAP analysis improved interpretability by showing the effect of each predictor on individual risk scores. The results show that hybrid ML models can reliably assist in the early detection of NCD cases in resource-constrained environments.
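
For readers unfamiliar with soft voting, the following is a minimal sketch of the idea, not the study's implementation. It assumes a binary outcome, an unweighted average of the two models' predicted probabilities, and a 0.5 decision threshold; none of these settings are stated in the abstract. The function names (`soft_vote`, `binary_metrics`) are hypothetical and the probabilities are made up for illustration.

```python
from typing import Sequence


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              threshold: float = 0.5) -> list[int]:
    """Average two models' positive-class probabilities, then threshold the mean."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability lists must have the same length")
    return [int((a + b) / 2 >= threshold) for a, b in zip(prob_a, prob_b)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up probabilities for illustration only; these are not data from the study.
rf_probs = [0.90, 0.40, 0.20, 0.70, 0.55]   # stand-in for Random Forest outputs
xgb_probs = [0.80, 0.70, 0.10, 0.20, 0.35]  # stand-in for XGBoost outputs
labels = [1, 1, 0, 0, 1]

preds = soft_vote(rf_probs, xgb_probs)
print(preds)
print(binary_metrics(labels, preds))
```

Because the ensemble averages probabilities rather than counting class votes, a confident prediction from one model can outweigh a marginal prediction from the other, which is one way such a hybrid can shift the precision and recall balance without changing overall accuracy.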