Abstract:
The second-hand car market in Nairobi continues to grow rapidly, creating a need for accurate and transparent price prediction methods. Traditional valuation approaches rely heavily on subjective judgement, which leads to inconsistent and unreliable pricing. This study aimed to develop a data-driven machine learning model that predicts second-hand car prices from structured vehicle characteristics such as year of manufacture, mileage, engine capacity, brand, and model. The population consisted of all vehicles listed on the SBT Japan online platform in Kenya. A total of 29,000 records were collected through web scraping; after cleaning and preprocessing, 20,775 records were retained for analysis and modelling. Feature analysis showed that model, brand, engine capacity, year of manufacture, and mileage were the most influential predictors of price. Two ensemble learning models, Random Forest and Extreme Gradient Boosting, were developed and evaluated. The Extreme Gradient Boosting model performed best, with a mean absolute error (MAE) of 95,696.60 Kenyan shillings, a root mean square error (RMSE) of 190,939.99 Kenyan shillings, and a coefficient of determination of 0.99379, a substantial improvement over the baseline error of 1,839,811.92 Kenyan shillings. The study concludes that machine learning provides a reliable, consistent, and accurate approach for predicting second-hand car prices in Nairobi, offering practical value to car dealers, financial institutions, insurers, online marketplaces, and potential buyers seeking transparent, data-driven pricing.
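For readers less familiar with the three evaluation metrics reported above, the following is a minimal sketch of how the mean absolute error, root mean square error, and coefficient of determination can be computed from actual and predicted prices. It is an illustration only, not the study's code. The function names and the sample prices are hypothetical, and because the abstract does not state how the baseline error was obtained, the mean-price baseline shown here is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired lists of actual and predicted prices."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average size of the pricing error, in shillings.
    mae = sum(abs(e) for e in errors) / n

    # Root mean square error: penalises large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: share of price variance explained.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, rmse, r2


def mean_baseline_mae(y_true):
    """MAE of a naive model that always predicts the mean price.

    Assumption: the abstract does not define its baseline; this is one
    common choice, shown only for comparison.
    """
    mean_true = sum(y_true) / len(y_true)
    return sum(abs(t - mean_true) for t in y_true) / len(y_true)


# Hypothetical prices in Kenyan shillings (not taken from the study's data).
actual = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
predicted = [1_100_000, 1_900_000, 3_200_000, 3_800_000]

mae, rmse, r2 = regression_metrics(actual, predicted)
baseline = mean_baseline_mae(actual)
print(f"MAE={mae:,.2f}  RMSE={rmse:,.2f}  R2={r2:.5f}  baseline MAE={baseline:,.2f}")
```

Because MAE and RMSE are expressed in Kenyan shillings, they can be compared directly against a baseline error in the same unit, whereas the coefficient of determination is unitless.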