Evaluating Sentiment Analysis Models on Historical Texts about Black Americans

Publication Date: Jun-18-2025

DOI: 10.70251/HYJR2348.33147154

Author(s):

Vishnu Athreya

Volume/Issue:

Volume 3

Issue 3

(Jun - 2025)

Abstract:

In this research study, four sentiment analysis models were trained on a dataset of primary source documents about the experiences of Black Americans in the twentieth century, specifically their migration to the northern United States and racially oppressive legislation in the South. Each model used a different algorithm for sentiment analysis: Multinomial Naive Bayes, support vector machine (SVM), Generative Pre-trained Transformer 2 (GPT-2), and Bidirectional Encoder Representations from Transformers (BERT). The goal was to determine which algorithm best classifies a twentieth-century document about Black Americans as expressing either a positive or a negative outlook on their experiences. The results of this research, coupled with future, more advanced studies of such algorithms, could enable a more streamlined, objective, and accurate approach to categorizing historical documents, allowing historians to analyze them, generate insights, and support arguments with greater speed and efficiency. Among the four algorithms, BERT achieved the highest accuracy (100%), followed by SVM and GPT-2 (97% each), while Multinomial Naive Bayes had the lowest (95%). However, the dataset is imbalanced in its ratio of positive to negative documents, which raises concern that the models are biased toward classifying documents as positive. In addition, BERT's perfect accuracy suggests that overfitting may have artificially skewed the results.
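The concern about class imbalance can be illustrated with a short sketch. It assumes a hypothetical test set of 90 positive and 10 negative documents (illustrative counts only, not the study's data) and a degenerate classifier that labels every document positive. Plain accuracy still looks high, while balanced accuracy (the mean of per-class recall) exposes that no negative document is ever identified. The helper names below are hypothetical and use only the Python standard library.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted members / all true members of that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for missing keys, so a class with no hits gets recall 0.0.
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recalls; not inflated by the majority class."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical, illustrative test set (NOT the study's data):
# 90 positive documents and 10 negative documents.
y_true = ["positive"] * 90 + ["negative"] * 10

# A degenerate classifier that predicts "positive" for every document.
y_pred = ["positive"] * len(y_true)

print(f"accuracy:          {accuracy(y_true, y_pred):.2f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")
print(per_class_recall(y_true, y_pred))
```

Here accuracy is 0.90 while balanced accuracy is 0.50, because recall on the negative class is 0.0. Reporting per-class recall or balanced accuracy alongside plain accuracy is one way to check whether high scores on an imbalanced dataset reflect genuine discrimination between positive and negative documents.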

American Journal of Student Research®