Abstract:
The second-hand car market in Kenya has grown significantly, but traditional valuation methods remain subjective and inconsistent, creating inefficiencies and information gaps between buyers and sellers. These approaches often ignore the combined impact of brand, model, year of manufacture, mileage, and engine size on resale prices. Machine learning can offer a more accurate and transparent alternative. This study applied Linear Regression, Random Forest, and XGBoost to a dataset of 28,000 vehicle listings from SBT Japan. After extensive preprocessing, the models were evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). Linear Regression performed poorly, while the tree-based ensemble models produced stronger results. Random Forest achieved a testing R² of 0.816 with an MAE of Ksh 683,303, and XGBoost reached a testing R² of 0.837 with an MAE of Ksh 672,930. A Voting Ensemble combining both models performed best, with a testing R² of 0.840, an MAE of Ksh 649,487, and the lowest RMSE of Ksh 1,069,036.
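As an illustration of the evaluation described above, the following minimal sketch computes MAE, RMSE, and R² and combines two sets of predictions by simple averaging, which is one common way a voting ensemble for regression works. It is not the study's code: the function names, the equal weighting of the two models, and the four price values are all assumptions chosen for illustration, and the numbers are hypothetical rather than drawn from the SBT Japan dataset.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalises large errors more than MAE does."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def voting_average(*model_preds):
    """Equal-weight voting (assumed): average each model's prediction per vehicle."""
    return [sum(ps) / len(ps) for ps in zip(*model_preds)]


# Hypothetical prices in Ksh, for illustration only (not study data).
y_true = [1_200_000, 2_500_000, 3_100_000, 900_000]
rf_pred = [1_300_000, 2_300_000, 3_400_000, 1_000_000]   # stand-in for Random Forest
xgb_pred = [1_100_000, 2_600_000, 3_000_000, 700_000]    # stand-in for XGBoost

ens_pred = voting_average(rf_pred, xgb_pred)

for name, pred in [("RF", rf_pred), ("XGB", xgb_pred), ("Ensemble", ens_pred)]:
    print(f"{name}: MAE={mae(y_true, pred):,.0f} "
          f"RMSE={rmse(y_true, pred):,.0f} R2={r2(y_true, pred):.3f}")
```

In this toy example the averaged predictions have a lower MAE than either input because the two models' errors partly cancel, which is the general intuition behind combining models. How much an ensemble helps on real data depends on how correlated the base models' errors are.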