YouTube Comment Feature Selection and Classification Using Fused Machine Learning
Abstract
The rapid growth of online platforms, notably YouTube, has produced a massive volume of user-generated content, including video comments. Understanding audience feedback and improving the user experience require analyzing and predicting the sentiment of these comments. This work presents a method for YouTube comment prediction and sentiment classification that combines feature selection using Recursive Feature Elimination (RFE), Elastic Net Random Forest with Logistic Regression (RF with LR), and Principal Component Analysis (PCA). In the first stage, RFE selects the most informative features from the dataset. By eliminating irrelevant or redundant features, RFE improves model performance and reduces computational complexity. The RF with LR technique is then used to build a robust sentiment classification model. Elastic Net regularization combines the advantages of L1 (Lasso) and L2 (Ridge) regularization, which improves feature selection and helps handle multicollinearity. The Random Forest ensemble, by aggregating many decision trees, further improves the model's predictive power. We then apply PCA to make the classification model more efficient and to address the difficulties posed by high-dimensional data. PCA reduces the dimensionality of the dataset while preserving its essential structure, yielding a more manageable and efficient feature space for classification. Finally, we compare the performance of four prominent classifiers on the preprocessed dataset: Linear Support Vector Machine (LSVM), Gaussian Naive Bayes (GNB), Logistic Regression (LR), and Decision Tree (DT). Comparing these classifiers allows us to select the best-performing model for YouTube comment classification.
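As an illustration only, the stages described in the abstract (RFE, then PCA, then a comparison of LSVM, GNB, LR, and DT) might be sketched with scikit-learn as follows. This is not the authors' implementation. The synthetic data from `make_classification` stands in for vectorized comment features, the use of a random forest as the RFE ranking estimator is an assumption, and every numeric setting (`n_selected`, `n_components`, `cv`, the forest size) is hypothetical. The Elastic Net step is left out because the abstract does not specify how it is combined with the other components.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, n_selected=20, n_components=10, cv=3):
    """Return mean cross-validated accuracy for each of the four classifiers.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    classifiers = {
        "LSVM": LinearSVC(dual=False, max_iter=5000),
        "GNB": GaussianNB(),
        "LR": LogisticRegression(max_iter=1000),
        "DT": DecisionTreeClassifier(random_state=0),
    }
    scores = {}
    for name, clf in classifiers.items():
        pipeline = make_pipeline(
            StandardScaler(),
            # RFE drops the 5 lowest-ranked features per round until
            # n_selected remain (ranking estimator is an assumption).
            RFE(
                RandomForestClassifier(n_estimators=50, random_state=0),
                n_features_to_select=n_selected,
                step=5,
            ),
            # PCA projects the selected features onto fewer components.
            PCA(n_components=n_components),
            clf,
        )
        # Selection and projection are refit inside every fold, so the
        # held-out fold never influences them.
        scores[name] = cross_val_score(pipeline, X, y, cv=cv).mean()
    return scores


# Synthetic stand-in for a vectorized comment dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, random_state=0
)
scores = compare_classifiers(X, y)
for name, accuracy in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {accuracy:.3f}")
```

Placing RFE and PCA inside the pipeline, rather than applying them once to the full dataset, keeps the comparison between classifiers free of leakage from the evaluation folds. On real comments, a text vectorizer would replace the synthetic features.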