Comparative Evaluation of Feature Representation Techniques for Hate Speech Detection in Social Media Text

Priyanshu Jadon

doi:10.52783/tjjpt.v47.i02.10766


Published: Apr 4, 2026

DOI: https://doi.org/10.52783/tjjpt.v47.i02.10766

Priyanshu Jadon, Deepshikha Bhatia, Durgesh Kumar Mishra

Abstract

Social media platforms generate an enormous volume of short, highly dynamic text, much of which is posted without effective moderation. Although several computational approaches have been proposed for analyzing social media data, reliable hate speech detection remains difficult because model performance depends strongly on the quality of the feature representation. This study examines the impact of multiple feature extraction techniques on a deep learning model for hate speech classification. Five feature representation methods were implemented and evaluated using a Sequential Convolutional Neural Network (SCNN): bi-gram features, part-of-speech (PoS) features, count vectorization, TF-IDF features, and word embeddings. The models were assessed on a publicly available hate speech detection dataset through comparative experimental analysis. The findings indicate that count vectorization and TF-IDF representations achieve superior training and validation accuracy, and that changes in feature dimensionality significantly influence classification performance. The results also show that higher-dimensional feature spaces increase computational cost in terms of memory usage and execution time. Overall, the study highlights the importance of word-level textual information in the classification of short social media posts. Based on these observations, future work will focus on designing a richer feature descriptor to further improve hate speech detection in short-text environments.
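
As a rough illustration of three of the representations named in the abstract (count vectorization, bi-gram features, and TF-IDF), the following minimal Python sketch builds each one for a handful of toy posts. It is not the authors' implementation: the example posts, the whitespace tokenizer, the helper names, and the unsmoothed idf = ln(N / df) weighting are all assumptions made for illustration, and the abstract does not state which tokenizer or TF-IDF variant the study used.

```python
import math
from collections import Counter

# Hypothetical toy posts, not taken from the study's dataset.
posts = [
    "you are awful",
    "you are great",
    "have a great day",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def bigrams(tokens):
    """Return adjacent token pairs, each joined by a single space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]

def count_vectors(docs, analyzer):
    """Build a sorted vocabulary and one raw term-count vector per document."""
    analyzed = [analyzer(d) for d in docs]
    vocab = sorted({term for terms in analyzed for term in terms})
    vectors = []
    for terms in analyzed:
        counts = Counter(terms)
        vectors.append([counts[t] for t in vocab])
    return vocab, vectors

def tfidf_vectors(vocab, counts):
    """Weight raw counts by idf = ln(N / df), one common unsmoothed variant."""
    n_docs = len(counts)
    # Document frequency: number of documents in which each term occurs.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    return [[tf * w for tf, w in zip(row, idf)] for row in counts]

# Word-level counts, bi-gram counts, and word-level TF-IDF for the toy posts.
word_vocab, word_counts = count_vectors(posts, tokenize)
bigram_vocab, bigram_counts = count_vectors(posts, lambda d: bigrams(tokenize(d)))
word_tfidf = tfidf_vectors(word_vocab, word_counts)

# The vocabulary size is the feature dimensionality of each representation.
print(len(word_vocab), len(bigram_vocab))
```

Each vector has one entry per vocabulary term, so the vocabulary size is the feature dimensionality that the abstract ties to memory usage and execution time. On a realistic corpus the vocabulary, and with it the vector length, grows with the number of distinct terms.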

Issue

Vol. 47 No. 02 (2026)

Section

Articles
