Abstract:
Low birth weight (LBW) is a major public health challenge in low- and middle-income countries, including Kenya, where approximately 11.5% of newborns are affected. LBW is associated with increased infant mortality, infections, and long-term developmental problems. Although machine learning (ML), and ensemble learning in particular, has shown potential for improving LBW risk prediction, its application in resource-limited settings such as Kenya remains underexplored. Prior research has largely focused on developed countries, with limited adoption in sub-Saharan Africa, a gap this study aims to address. This study developed and evaluated ensemble machine learning models to predict LBW risk using nationally representative data from the 2022 Kenya Demographic and Health Survey. It combined traditional clinical indicators with computational methods, comparing base classifiers such as Support Vector Machines and Logistic Regression with ensemble methods including Random Forest, Gradient Boosting, and Extreme Gradient Boosting. Meta-ensemble approaches such as bagging, voting, and stacking were also assessed. Data preprocessing included handling missing values, encoding categorical variables, and addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Models were trained and validated using stratified cross-validation and independent testing. Evaluation metrics included ROC AUC, accuracy, F1-score, Matthews Correlation Coefficient, and Brier score, covering both discrimination and calibration. Random Forest outperformed the other models, achieving a ROC AUC of 0.957 and a precision-recall (PR) AUC of 0.971, and was well calibrated (Brier score 0.089), indicating strong predictive capability for LBW risk in the Kenyan context.
The most important predictors were gestational age, maternal height and weight, antenatal care utilization, and socioeconomic factors, consistent with known biological and contextual determinants of LBW. Ethical considerations regarding patient privacy, algorithmic fairness, and transparency were incorporated to promote responsible use of AI in healthcare. The findings suggest that tailored ensemble learning models can provide robust, interpretable, and practical tools for LBW prediction in low-resource settings. By applying these ML methods to Kenyan maternal and child health data, this work offers potential to enhance clinical decision-making and improve maternal and neonatal outcomes. The study underscores the importance of contextualized AI solutions and ethical governance for sustainable healthcare innovation.
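
For readers less familiar with the calibration and discrimination metrics reported above, the following is a minimal sketch of how the Brier score and the Matthews Correlation Coefficient are computed from their standard definitions. The function names, the toy labels and probabilities, and the 0.5 decision threshold are illustrative assumptions, not the study's code or data.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome.

    Lower is better; 0.0 means perfectly confident, perfectly correct predictions.
    """
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels, in the range [-1, 1]."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any margin of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (hypothetical values, not study data): 1 = LBW, 0 = normal weight.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.1, 0.6, 0.4, 0.3, 0.05, 0.7]
# Assumed decision threshold of 0.5 to turn probabilities into class labels.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

brier = brier_score(y_true, y_prob)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"Brier score: {brier:.4f}")
print(f"MCC: {mcc:.4f}")
```

The Brier score is computed on the predicted probabilities themselves, which is why it reflects calibration, whereas the MCC depends on the thresholded class labels and summarizes all four cells of the confusion matrix, which makes it informative when classes are imbalanced.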