Data-driven sentiment analysis model for predicting defacement attacks: a case study using X.

Kariuki Kanja, George

URI: https://repository.cuk.ac.ke/handle/123456789/1950

Date: 2025

Abstract:

This study responds to the growing threat of social media defacement by developing a sentiment analysis model that can predict defacement attacks on the X (formerly Twitter) network. The introduction situates the research within the growing volume of user-generated content and the inadequacy of current reactive cybersecurity strategies. Guided by four research questions, namely (i) which sentiment indicators are associated with defacement attacks on X, (ii) how a data-driven model can be built to predict such attacks, (iii) how effective a real-time implementation is, and (iv) how well the model predicts unseen data, the study set out to (1) identify sentiment-based predictive signals of defacement, (2) develop a predictive model for X, (3) apply the model to real-time streams, and (4) test and evaluate the model's predictive ability. The methodology was grounded in a positivist philosophy and the Design Science Research (DSR) model, which supported iterative design, implementation and evaluation of the artefact. Tweets about previous defacement events were harvested through the X API, then cleaned, de-duplicated, filtered by language and annotated to produce a balanced dataset of just under 45,000 labelled posts spanning several months. Textual and temporal features were extracted using natural language processing, Latent Dirichlet Allocation (LDA) for topical cues, and engineered sentiment features. Naive Bayes, Support Vector Machine (SVM) and bidirectional LSTM (BiLSTM) classifiers, trained and evaluated with stratified 10-fold cross-validation, were combined into a stacked ensemble that was trained and validated alongside an ARIMA temporal predictor. Statistical analysis showed that the ensemble clearly outperformed the individual classifiers, achieving an accuracy of 85.7%, an F1-score of 80.6% and a ROC-AUC of 0.91 at the optimal decision threshold of 0.40.
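The abstract reports accuracy, F1-score and ROC-AUC at a decision threshold of 0.40. As a rough illustration of how such figures are derived from an ensemble's predicted attack probabilities, the sketch below computes the three metrics with the Python standard library and scans candidate thresholds. It is not the thesis's code: the function names, the toy labels and scores, and the use of F1 as the criterion for choosing the "optimal" threshold are all assumptions.

```python
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), labelling a post as an attack (1) when score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_at(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion(y_true, scores, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.40) -> Dict[str, float]:
    """Accuracy and F1 at a fixed threshold, plus threshold-free ROC-AUC."""
    tp, fp, fn, tn = confusion(y_true, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "f1": f1_at(y_true, scores, threshold),
        "roc_auc": roc_auc(y_true, scores),
    }


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   candidates: Sequence[float]) -> float:
    """Hypothetical selection rule: the first candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_at(y_true, scores, t))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]               # toy labels: 1 = defacement-related post
    probs = [0.9, 0.45, 0.42, 0.1, 0.3, 0.2]  # toy ensemble probabilities
    print(evaluate(labels, probs, threshold=0.40))
    print(best_threshold(labels, probs, [i / 100 for i in range(5, 96, 5)]))
```

On these toy values the sketch gives an accuracy and F1 of 4/6 at 0.40 and an AUC of 8/9, while the F1 scan prefers 0.25. This shows that an "optimal" threshold depends on both the data and the selection criterion, whereas ROC-AUC does not depend on any threshold.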
Temporal peaks in emotion and clusters of negative emotions were identified as predictors of an imminent defacement event. The discussion shows how combining lexical, contextual and temporal features can strengthen early-warning capability, and how this helps fill the research gap in proactive social-media security surveillance. The study concludes that sentiment-based predictive modelling is practically viable and operationally useful for predicting defacement attacks. It recommends that platform administrators integrate such models with incident-response processes, continuous monitoring and user-sensitization exercises to strengthen their cyber-defense posture. Further research is needed to build richer multilingual and multimodal datasets, develop adversarially robust sentiment algorithms, investigate transfer learning to other platforms, and test deployment at scale in real-world settings.
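The finding that temporal emotion peaks and clusters of negative emotions precede defacement can be illustrated with a simple burst detector. The sketch below assumes that posts have already been scored by a sentiment model and bucketed into hourly counts of negative posts. It flags an hour whose count exceeds the trailing mean by more than k standard deviations. This rolling z-score is a deliberately simple stand-in, not the ARIMA temporal predictor used in the thesis, and the function name, the 6-hour window and k = 2.0 are hypothetical choices.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def flag_negative_peaks(counts: Sequence[int], window: int = 6,
                        k: float = 2.0) -> List[int]:
    """Return the indices of hours whose negative-post count exceeds the
    mean of the preceding `window` hours by more than k standard deviations."""
    if window < 2:
        raise ValueError("window must be at least 2")
    peaks = []
    for t in range(window, len(counts)):
        history = counts[t - window:t]  # trailing baseline; excludes hour t itself
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat baseline: treat any rise above it as a peak.
            if counts[t] > mu:
                peaks.append(t)
        elif (counts[t] - mu) / sigma > k:
            peaks.append(t)
    return peaks


if __name__ == "__main__":
    # Toy hourly counts of negatively scored posts about a hypothetical target site.
    hourly_negative = [4, 5, 3, 4, 6, 5, 4, 19, 5, 4]
    print(flag_negative_peaks(hourly_negative))
```

For these toy counts only hour 7 is flagged. In practice such a flag would serve as one early-warning feature among the lexical and contextual ones, rather than as a prediction on its own.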

Description:

A thesis submitted to the Department of Computer Science and Information Technology in the School of Computing and Mathematics in partial fulfillment of the requirements for the award of the degree of Master of Science in Cyber Security of the Co-operative University of Kenya.
