Billion-Scale Investigation of COVID-19 Impact on Human Communication in 104 Languages

COVID-19 Research Area(s): Culture, Economics & Business, Environment, Epidemiology & Public Health, Equipment & Technology Innovations, Healthcare Delivery & Policy, Mental Health & Wellbeing, Politics, Governance & Law, Social Impacts

What is the impact of COVID-19 on human communication? We create a billion-scale dataset from social media to enable the study of COVID-19’s impact on human communication in 104 languages. We perform extensive analyses of the data, which allow us to uncover interesting patterns at a global scale, including varying use of news media, interpersonal communication and posting behaviors, and human mobility.


The Footprint of COVID-19: A Global Impact on Human Life

COVID-19 has changed the way we lead our lives. Regardless of age, gender, culture, economic class, education, income, language, place, profession, etc., we are all in this together. When they communicate, however, different people choose to talk about different topics, use different styles, express different emotions, report different experiences, refer to different places, and interact with different people and media. In other words, even though the pandemic is impacting everyone, different people are responding differently. But how can we study that? Which issues should we prioritize? Which places (countries, provinces/states, cities, etc.) should we start with? Another important question is what our point of comparison should be, so that we can measure how much people’s behaviors, hopes, fears, and needs differ from what they used to be. A further question concerns data: which data are useful? Many other questions remain. Ultimately, the impact the pandemic is having on human life, and perhaps on all types of life on our planet, will be studied for a long time.

We create a billion-scale dataset from social media to enable the study of COVID-19’s impact on our daily lives. We wanted a dataset large enough that its investigation could yield reasonably generalizable conclusions. We also wanted a dataset that is diverse (e.g., representing different languages, cultures, communities, and countries). Finally, because we are interested in comparing user behavior and communication patterns over time, we wanted the data to have extended, multi-year temporal coverage. We designed Mega-COV, the dataset we report in our paper, around these principles. We undertake in-depth analyses of the data and discover intriguing patterns that we report in our related paper. We also release our data for research under strict ethical considerations.

