Leveraging ensemble models for optimizing predictive accuracy of low birthweight risk in Kenya.

Otieno Opiyo, Victor

dc.contributor.author	Otieno Opiyo, Victor
dc.date.accessioned	2026-07-01T09:13:41Z
dc.date.available	2026-07-01T09:13:41Z
dc.date.issued	2025
dc.identifier.uri	https://repository.cuk.ac.ke/handle/123456789/1958
dc.description	A project submitted to the Department of Computer Science & Information Technology in the School of Computing and Mathematics in partial fulfilment of the requirements for the award of the degree of Master of Science in Data Science of the Co-operative University of Kenya.	en_US
dc.description.abstract	Low birth weight (LBW) remains a significant public health concern in Kenya, affecting approximately 11.5% of infants and contributing to high infant mortality and poor long-term health. Accurate prediction of LBW risk is crucial for enabling timely interventions and improving neonatal health outcomes. The objective of this study was to develop and evaluate ensemble machine learning models to predict the risk of LBW using nationally representative data from the Kenya Demographic and Health Survey (KDHS) 2022. A comprehensive preprocessing pipeline was used to handle missing values, encode categorical variables, and address class imbalance using the Synthetic Minority Over-sampling Technique (SMOTE). Base learners (Support Vector Machine and Logistic Regression) and ensemble models (Random Forest, Gradient Boosting, and Extreme Gradient Boosting) were trained and compared. The predictive performance of meta-ensemble methods, namely bagging, voting, and stacking classifiers, was also evaluated. Models were assessed using stratified cross-validation, and performance was evaluated on an independent test set using metrics such as ROC AUC, F1-score, and Brier score. The Random Forest classifier achieved the highest ROC AUC (0.957) with decent calibration (Brier score of 0.089), outperforming both the base and meta-ensemble models. The key predictors identified include gestational age, maternal anthropometrics (height, weight), and antenatal care attendance, which supports their biological and contextual relevance to LBW risk in Kenya. The study highlights the significance of contextualized AI solutions and ethical governance in sustainable healthcare innovation. These results indicate that ensemble learning methods, when tailored to a specific target population, can achieve better results in LBW risk prediction in low-resource regions. Developing interpretable and stable models can guide clinical decision-making and focused interventions, with the long-term objective of improving maternal and neonatal health outcomes in Kenya and other contexts.	en_US
dc.language.iso	en	en_US
dc.publisher	CUK	en_US
dc.title	Leveraging ensemble models for optimizing predictive accuracy of low birthweight risk in Kenya.	en_US
dc.type	Thesis	en_US
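
The abstract reports model performance as ROC AUC and Brier score. As a minimal sketch of how these two metrics are commonly defined (not the thesis's own code), the Python below computes both from binary labels and predicted probabilities. The function names and the toy data are hypothetical and are not taken from the KDHS 2022 or from the reported results.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Chance that a random positive case outscores a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only (1 = LBW, 0 = normal birth weight).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]

print(f"Brier score: {brier_score(labels, scores):.3f}")
print(f"ROC AUC:     {roc_auc(labels, scores):.3f}")
```

The pairwise AUC loop is quadratic in the number of cases and is written for readability; a rank-based computation or an established library routine would be the usual choice on survey-sized data.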