COVBERT: Enhancing Sentiment Analysis Accuracy in COVID-19 X Data through Customized BERT


Vanshaj Gupta¹, Jaydeep Patel¹, Safa Shubbar¹, and Kambiz Ghazinour², ¹Kent State University, USA, ²State University of New York, USA


At a time when social media is a valuable source of insight, the COVID-19 pandemic has unleashed a flood of public sentiment in the form of unstructured text data. This paper introduces CovBERT, a novel adaptation of the BERT model tailored to the analysis of COVID-19-related discourse on X (formerly Twitter). CovBERT incorporates a custom vocabulary curated from pandemic-related tweets, which raises sentiment analysis accuracy from a baseline of 72% to 78.64%. The paper presents a detailed comparison of CovBERT with the standard BERT model and with traditional machine learning approaches, showing that CovBERT is better at decoding the complex emotional undercurrents in social media data. In addition, a geolocation analysis pipeline adds a geographic dimension, offering a view of global sentiment trends.
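The abstract does not say how the custom vocabulary was curated, so the following is only a minimal sketch of one plausible approach: collect tokens that occur frequently in pandemic-related tweets but are absent from a base vocabulary. The function name `candidate_vocab`, the frequency threshold `min_count`, the toy tweets, and the toy base vocabulary are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def candidate_vocab(tweets, base_vocab, min_count=2):
    """Return frequent tweet tokens that are absent from base_vocab.

    Hypothetical helper: the selection rule (a simple frequency
    threshold) is an assumption, not the authors' stated method.
    """
    # Lowercase word-like tokens; hyphens are kept so "covid-19" stays whole.
    token_re = re.compile(r"[a-z0-9][a-z0-9\-]*")
    counts = Counter()
    for tweet in tweets:
        counts.update(token_re.findall(tweet.lower()))
    return sorted(
        tok for tok, n in counts.items()
        if n >= min_count and tok not in base_vocab
    )


# Toy data, invented for illustration only.
tweets = [
    "Lockdown extended again, stay safe",
    "Got my covid-19 vaccine today",
    "covid-19 lockdown is hard on everyone",
]
base_vocab = {
    "stay", "safe", "got", "my", "today", "is", "hard",
    "on", "everyone", "again", "extended", "vaccine",
}

new_tokens = candidate_vocab(tweets, base_vocab)
print(new_tokens)

# Accuracy figures reported in the abstract.
baseline, covbert = 72.0, 78.64
absolute_gain = covbert - baseline            # percentage points
relative_gain = absolute_gain / baseline * 100  # percent of baseline
print(f"{absolute_gain:.2f} points, {relative_gain:.1f}% relative")
```

In a real pipeline, tokens selected this way would then have to be registered with the model's tokenizer and given embeddings before fine-tuning; whether CovBERT does this, or builds its vocabulary differently, is not stated in the abstract. The last lines simply restate the reported improvement as 6.64 percentage points, or roughly 9.2% relative to the baseline.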


BERT, CovBERT, COVID-19, Sentiment Analysis, X (Twitter) Data Analysis, Natural Language Processing, Machine Learning, Geolocation Analysis, Social Media Analytics, Data Mining

Volume 14, Number 2