Abstract:
Climate variability is a major driver of malaria surges in Kenya’s Lake Victoria Basin, yet routine decision-making often reacts only after peaks have begun. This project develops, validates, and operationalizes an ensemble machine-learning framework that integrates climate and health-system data to provide short-lead malaria early warning for Migori County. The study population comprised public health facilities in Migori; a purposive census of ten facilities yielded 5,000 facility-week records (2015–2024). Secondary data were extracted from routine surveillance (weekly malaria cases and facility capacity indicators) and matched to weekly climate series (rainfall, temperature, humidity, wind). Data were collected with standardized extraction templates and processed and analyzed in reproducible Python notebooks. Pre-processing created short lags (t–1, t–2), harmonic calendar terms, and facility fixed effects; blocked time-series splits and rolling-origin cross-validation (ROCV) were used to avoid leakage. Three base learners (Random Forest, XGBoost, and a feed-forward ANN) were trained and stacked via a Ridge meta-learner; a complementary Negative Binomial regression supported inference. The ensemble outperformed the single models on the independent test set (R² = 0.75; RMSE = 2.87; MAE = 2.22), with stable ROCV performance across seasons (R² = 0.72–0.76; RMSE = 2.98–3.27). Interpretability analyses (permutation importance, SHAP, PDP, ICE) identified recent rainfall, seasonal terms, and recent cases as the dominant drivers, while personnel density and the antimalarial stock index dampened predicted surges.
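The short-lag construction and rolling-origin evaluation described above can be illustrated with a minimal sketch. It is not the study's notebook code: the function names, the expanding-window design, and the fold sizes are assumptions chosen for illustration, and the sketch treats a single facility's weekly series rather than the full facility panel.

```python
from typing import Dict, Iterator, List, Tuple


def add_lags(cases: List[float], lags: Tuple[int, ...] = (1, 2)) -> List[Dict[str, float]]:
    """Pair each week's case count with its lagged values (t-1, t-2 by default).

    Weeks without a full lag history are dropped, so no row borrows
    information from a later week.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(cases)):
        row = {"week": t, "cases": cases[t]}
        for k in lags:
            row[f"cases_lag{k}"] = cases[t - k]
        rows.append(row)
    return rows


def rolling_origin_splits(
    n_weeks: int, initial_train: int, horizon: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs with an expanding training window.

    Every test week lies strictly after every training week, which is
    what keeps future information out of model fitting.
    """
    origin = initial_train
    while origin + horizon <= n_weeks:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step  # move the forecast origin forward in time


# Hypothetical toy sizes: 12 weeks, 6 initial training weeks, 2-week test blocks.
folds = list(rolling_origin_splits(n_weeks=12, initial_train=6, horizon=2, step=2))
lagged = add_lags([5, 7, 9, 11])
```

In a multi-facility panel, the same split would presumably be applied on the calendar week so that all facilities share one forecast origin per fold.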
The study recommends institutionalizing a dual-trigger standard operating procedure (SOP) that flags an alert when weekly rainfall is ≥120 mm and ensemble risk is ≥0.7, cueing stock pre-positioning, surge rosters, and targeted outreach. It further recommends integrating near-real-time climate feeds into dashboards and SMS alerts, strengthening data-quality pipelines, building capacity in model interpretation, and periodically recalibrating thresholds. The framework demonstrates that climate-informed analytics can be embedded in routine governance to shift malaria control from reactive response to anticipatory preparedness in resource-constrained settings.
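The dual-trigger rule can be written as a simple conjunction of the two thresholds. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the ensemble risk is already available as a score between 0 and 1, since the abstract does not state how that score is derived from the model output.

```python
RAIN_THRESHOLD_MM = 120.0  # weekly rainfall trigger stated in the recommendation
RISK_THRESHOLD = 0.7       # ensemble risk trigger stated in the recommendation


def dual_trigger_alert(weekly_rainfall_mm: float, ensemble_risk: float) -> bool:
    """Return True only when both triggers are met in the same week.

    Assumes ensemble_risk is a score in [0, 1]; how it is computed is
    outside the scope of this sketch.
    """
    return weekly_rainfall_mm >= RAIN_THRESHOLD_MM and ensemble_risk >= RISK_THRESHOLD


# Heavy rain alone, or high modelled risk alone, does not raise an alert.
alert_both = dual_trigger_alert(135.0, 0.82)
alert_rain_only = dual_trigger_alert(135.0, 0.40)
alert_risk_only = dual_trigger_alert(60.0, 0.82)
```

Keeping the thresholds as named constants makes the periodic recalibration a change in one place.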