Comparative Evaluation of Feature Representation Techniques for Hate Speech Detection in Social Media Text
Abstract
Social media platforms generate an enormous volume of short, highly dynamic text, much of which is posted without effective moderation. Although several computational approaches have been proposed for analyzing social media data, reliable hate speech detection remains difficult because model performance depends strongly on the quality of the feature representation. This study examines how the choice of feature extraction technique affects a deep learning model for hate speech classification. Five feature representation methods were implemented and evaluated using a Sequential Convolutional Neural Network (SCNN): bi-gram features, part-of-speech (PoS) features, count vectorization, TF-IDF features, and word embeddings. The models were compared experimentally on a publicly available hate speech detection dataset. The findings indicate that the count vectorization and TF-IDF representations achieve higher training and validation accuracy than the other representations, and that changes in feature dimensionality significantly influence classification performance. The results also show that higher-dimensional feature spaces increase computational cost in terms of memory usage and execution time. Overall, the study highlights the importance of word-level textual information in the classification of short social media posts. Based on these observations, future work will focus on designing a richer feature descriptor to further improve hate speech detection in short-text environments.
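For readers unfamiliar with the word-level representations named in the abstract, the following is a minimal sketch of how count vectorization, bi-gram features, and TF-IDF weighting turn short posts into numeric vectors. It is an illustration only and not the authors' implementation: the helper names (`tokenize`, `bigrams`, `count_vectors`, `tfidf_vectors`), the whitespace tokenizer, the toy posts, and the smoothed IDF formula are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def bigrams(tokens):
    """Join each pair of adjacent tokens into one bi-gram feature."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def count_vectors(docs, use_bigrams=False):
    """Return (vocabulary, matrix) of raw term counts, one row per document."""
    tokenized = [bigrams(tokenize(d)) if use_bigrams else tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for term, n in Counter(doc).items():
            row[index[term]] = n
        matrix.append(row)
    return vocab, matrix


def tfidf_vectors(docs):
    """Scale raw counts by a smoothed inverse document frequency (assumed variant)."""
    vocab, counts = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in counts]


# Hypothetical toy posts, not taken from the study's dataset.
posts = ["you are awful", "you are kind", "awful awful post"]

uni_vocab, uni_counts = count_vectors(posts)
bi_vocab, bi_counts = count_vectors(posts, use_bigrams=True)
_, tfidf = tfidf_vectors(posts)
```

In this sketch each vector has one entry per vocabulary term, so the feature dimensionality grows with the vocabulary. That growth is the kind of change in dimensionality the abstract links to memory usage and execution time.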