DSpace Repository

Machine learning model for precipitation forecasting in Kenya.


dc.contributor.author Muthoki Mulinge, Damaris
dc.date.accessioned 2026-07-02T14:13:06Z
dc.date.available 2026-07-02T14:13:06Z
dc.date.issued 2025
dc.identifier.uri https://repository.cuk.ac.ke/handle/123456789/1969
dc.description A research project submitted to the Department of Computer Science and Information Technology in the School of Computing and Mathematics in partial fulfillment of the requirements for the award of the degree of Master of Science in Data Science of the Co-operative University of Kenya. en_US
dc.description.abstract Accurate precipitation forecasting is important for mitigating the impacts of climate variability in Kenya, where erratic rainfall considerably affects agriculture, water management, and disaster preparedness. Traditional methods such as ARIMA (Autoregressive Integrated Moving Average) and NWP (Numerical Weather Prediction) struggle with complex weather patterns because of linearity assumptions, high computational demands, and limited spatial resolution. This study developed and evaluated an XGBoost-based machine learning model to improve both short-term and long-term precipitation predictions. It used a 20-year weather dataset (2004-2024) of 7300 daily records sourced online from Visual Crossing Weather Data. Key features included temperature, humidity, wind speed, lagged precipitation (lags 1-7), rolling means, and seasonal encoding to capture the bimodal rainfall pattern of March-May and October-December. Data processing involved min-max normalization to the 0-1 range, feature selection, sine/cosine transformations for seasonal patterns, and temperature-humidity interaction terms for modeling convective processes. The dataset was split 80% for training and 20% for testing using a temporal split (≤ 2020 for training, > 2020 for testing) that preserved chronological order. The initial attempts performed poorly, with a low R² of 0.066 and a high RMSE of 1.06. The task was therefore reformulated as an XGBoost binary classification problem: predicting the likelihood of rain or no rain the next day. Hyperparameters were tuned with Bayesian optimization and GridSearchCV. At the default decision threshold of 0.5, the classifier achieved 76.76% accuracy, 70.14% precision, 33.36% recall, a 45.12% F1 score, and a ROC-AUC of 0.75. After tuning, the threshold was lowered to 0.3 to improve rain-class sensitivity and capture missed rainfall events, giving 73% accuracy, 81% precision and recall for the no-rain class, 53% precision and 54% recall for the rain class, and an F1 score of 54%. The temperature-humidity interaction was the top predictor by feature importance. The results indicate that the XGBoost-based model, with 73% accuracy and 54% recall in forecasting rain/no-rain occurrence, can support agricultural planning, water resource management, and early warning for disaster preparedness in Kenya’s climate-vulnerable regions. en_US
dc.language.iso en en_US
dc.publisher Cuk en_US
dc.title Machine learning model for precipitation forecasting in Kenya. en_US
dc.type Thesis en_US
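
The preprocessing and thresholding steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: it uses only the standard library rather than XGBoost, every function name is hypothetical, and the probabilities in the usage example are made-up toy values, not results from the study. It shows sine/cosine month encoding, min-max scaling, lagged features, a chronological split at 2020, and how lowering the decision threshold from 0.5 to 0.3 changes precision and recall for the rain class.

```python
import math
from datetime import date


def cyclical_month(month):
    """Encode month 1-12 as (sin, cos) so December and January sit close together."""
    angle = 2 * math.pi * (month - 1) / 12
    return math.sin(angle), math.cos(angle)


def min_max(values):
    """Scale values to the 0-1 range; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def lagged(values, lag):
    """Value observed `lag` steps earlier; None where no history exists."""
    return [None if i < lag else values[i - lag] for i in range(len(values))]


def temporal_split(rows, last_train_year=2020):
    """Chronological split: rows up to last_train_year train, later rows test."""
    train = [r for r in rows if r["date"].year <= last_train_year]
    test = [r for r in rows if r["date"].year > last_train_year]
    return train, test


def classify(probabilities, threshold=0.5):
    """Label a day as rain (1) when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (rain) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels and probabilities (assumed for illustration only).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    probs = [0.9, 0.4, 0.35, 0.45, 0.2, 0.1, 0.6, 0.05]
    for threshold in (0.5, 0.3):
        metrics = precision_recall_f1(y_true, classify(probs, threshold))
        print(threshold, [round(m, 2) for m in metrics])
```

In this toy data, lowering the threshold labels more days as rain, so fewer rain days are missed; the abstract reports the same trade being made deliberately, accepting lower overall accuracy for higher rain-class recall.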

