A Hybrid Machine Learning Model for Detecting and Preventing Corruption in Kenya’s Public Procurement Contracts

Ndolo, Melchizedeck; Wanjoya, Anthony; Kasyoka, Philemon

dc.contributor.author	Ndolo, Melchizedeck
dc.contributor.author	Wanjoya, Anthony
dc.contributor.author	Kasyoka, Philemon
dc.date.accessioned	2026-01-08T13:25:39Z
dc.date.available	2026-01-08T13:25:39Z
dc.date.issued	2025-10-10
dc.identifier.uri	https://doi.org/10.11648/j.mlr.20251002.14
dc.identifier.uri	https://repository.cuk.ac.ke/handle/123456789/1859
dc.description	A research article published by Science Publishing Group.	en_US
dc.description.abstract	Corruption in public procurement undermines fiscal sustainability, distorts competition, and reduces service quality. Conventional anti-corruption controls (manual audits, rule-based checks, and ex-post reviews) struggle to flag sophisticated, evolving fraud patterns in real time. This study proposes and empirically evaluates a hybrid machine-learning (ML) framework that integrates an interpretable supervised model (logistic regression) with a high-accuracy ensemble method (random forest) and unsupervised learning (k-means clustering and anomaly detection) to identify corruption-prone contracts within Kenya’s public procurement ecosystem. Using secondary procurement data (contract values, procurement methods, bidder histories, and award timelines) together with text-derived indicators from public audit narratives, we construct features representing red flags such as single-bid tenders, repeated awards, and significant deviations from estimated costs. Logistic regression provides transparent coefficient-level evidence, while random forest captures non-linear interactions; clustering approximates high-risk groupings where labels are incomplete. Results indicate that single-bid tenders, prior supplier allegations, and execution irregularities (e.g., substandard deliveries, unusual extensions) are the strongest predictors of corruption labels. The ensemble achieved strong classification performance (AUC ≈ 0.98 under cross-validation), while the baseline logistic model offered high precision and policy-friendly interpretability. We outline a deployment roadmap for integrating the model into e-procurement workflows (IFMIS/PPRA) with explainable-AI (XAI) dashboards for risk-based audits. The contribution is twofold: a context-aware, reproducible pipeline for low- and middle-income settings, and governance guidance for embedding ML in accountability processes so that procurement corruption is prevented rather than merely detected.	en_US
dc.language.iso	en	en_US
dc.publisher	Science Publishing Group	en_US
dc.relation.ispartofseries	2025, Vol. 10, No. 2;pp. 131-136
dc.subject	Public Procurement.	en_US
dc.subject	Corruption Detection.	en_US
dc.subject	Machine Learning.	en_US
dc.subject	Cybersecurity.	en_US
dc.subject	Logistic Regression.	en_US
dc.subject	Anomaly Detection.	en_US
dc.subject	Explainable AI.	en_US
dc.subject	Kenya.	en_US
dc.subject	Random Forest.	en_US
dc.title	A Hybrid Machine Learning Model for Detecting and Preventing Corruption in Kenya’s Public Procurement Contracts	en_US
dc.type	Article	en_US
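
The abstract mentions red-flag features such as single-bid tenders, repeated awards, and significant deviations from estimated costs. As a rough illustration only, the sketch below shows one way such features might be derived from contract records. It is not the authors' pipeline: the `Contract` fields, the function name `red_flag_features`, the 25% deviation threshold, and the three-award threshold are all hypothetical choices made for this example, and the sample records are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Contract:
    # Hypothetical record layout; the paper does not publish its schema.
    contract_id: str
    supplier: str
    num_bidders: int
    estimated_cost: float  # assumed to be strictly positive
    awarded_value: float


def red_flag_features(contracts, deviation_threshold=0.25, repeat_threshold=3):
    """Return one feature dict per contract with simple red-flag indicators.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Count how many awards each supplier received in this dataset.
    awards_per_supplier = Counter(c.supplier for c in contracts)
    rows = []
    for c in contracts:
        # Relative deviation of the awarded value from the estimated cost.
        deviation = (c.awarded_value - c.estimated_cost) / c.estimated_cost
        rows.append({
            "contract_id": c.contract_id,
            "single_bid": int(c.num_bidders == 1),
            "repeated_awards": int(
                awards_per_supplier[c.supplier] >= repeat_threshold
            ),
            "cost_deviation": round(deviation, 4),
            "large_deviation": int(abs(deviation) > deviation_threshold),
        })
    return rows


# Invented sample records, used only to exercise the function.
contracts = [
    Contract("C1", "Alpha Ltd", 1, 1_000_000, 1_400_000),
    Contract("C2", "Alpha Ltd", 4, 500_000, 510_000),
    Contract("C3", "Alpha Ltd", 2, 200_000, 190_000),
    Contract("C4", "Beta Co", 3, 800_000, 800_000),
]

rows = red_flag_features(contracts)
for row in rows:
    print(row)
```

Indicators of this kind could then serve as inputs to the supervised and unsupervised models named in the abstract; how the authors actually encode and weight them is not stated in this record.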