
Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset

The dataset provided with this assignment is called `CMU-MisCOV19'. It comes from a research project at the Centre for Machine Learning and Health at Carnegie Mellon University. The work was presented at CIKM 2020 under the title `Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset'. Part of the paper's abstract is reproduced below to explain why this data was collected and annotated.

"From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hotbed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. In this paper, we present a methodology and analyses to characterize the two competing COVID-19 misinformation communities online: (i) misinformed users or users who are actively posting misinformation, and (ii) informed users or users who are actively spreading true information, or calling out misinformation. The goals of this study are twofold: (i) collecting a diverse set of annotated COVID-19 Twitter dataset that can be used by the research community to conduct meaningful analysis; and (ii) characterizing the two target communities in terms of their network structure, linguistic patterns, and their membership in other communities."

To create this dataset, the authors used a diverse set of keywords to retrieve tweets through the Twitter search API. For the annotation process, 17 categories were identified, and the tweets were annotated manually. A codebook on annotations and categories, created by the authors, has been provided. Please refer to this codebook to familiarize yourself with the categories.

The categories used to annotate these tweets are listed below:


1. Irrelevant

2. Conspiracy

3. True Treatment

4. True Prevention

5. Fake Cure

6. Fake Treatment

7. False Fact or Prevention

8. Correction/Calling out

9. Sarcasm/Satire

10. True Public Health Response

11. False Public Health Response

12. Politics

13. Ambiguous/Difficult to classify

14. Commercial Activity or Promotion

15. Emergency Response

16. News

17. Panic Buying

The study reports that 4573 tweets were annotated and that the annotations were made publicly available. However, at the time of data extraction for this assignment, some of these tweets or their authors’ accounts had been suspended or taken down by Twitter, or their privacy settings had changed. Therefore, the number of tweets provided for this assignment is slightly lower than the number annotated in the study.

The aim is to analyze the tweets and provide insights into the general trends and patterns of tweets by annotation. To achieve this, a series of specific tasks has been outlined.

Task A – Text Mining (25%)

Over the last four weeks you have seen a range of text pre-processing techniques. You are required to utilise these techniques to clean the textual data, extract knowledge and produce informative visualisations.
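As a rough illustration of what cleaning the textual data can involve, the sketch below lower-cases a tweet, strips URLs, mentions, hashtag signs and punctuation, removes stop words, and counts the most frequent remaining terms. It is written in Python purely to show the idea (the submission itself must be an R Markdown report). The stop-word list, the sample tweets and the function names `clean_tweet` and `top_terms` are assumptions made for this example, not part of the assignment or the dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "to", "of",
              "in", "for", "on", "it", "this", "that"}


def clean_tweet(text):
    """Return the cleaned tokens of one tweet."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop user mentions
    text = text.replace("#", " ")               # keep hashtag words, drop the sign
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # drop punctuation and emoji
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def top_terms(tweets, n=3):
    """Return the n most frequent cleaned tokens across all tweets."""
    counts = Counter(tok for tweet in tweets for tok in clean_tweet(tweet))
    return counts.most_common(n)


# Made-up sample tweets, not taken from CMU-MisCOV19.
sample_tweets = [
    "Garlic is NOT a cure for #COVID19! https://example.com @someuser",
    "Wash your hands to prevent covid19",
]

print(clean_tweet(sample_tweets[0]))
print(top_terms(sample_tweets, n=1))
```

The term counts produced this way are the kind of input a bar chart or word cloud per annotation would be built from.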

Task B – Sentiment Analysis (20%)

Sentiment analysis is “the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.”

You are required to draw comparisons between annotations and investigate the ‘sentiment’ in the provided textual data. You will need to utilise the sentiment analysis techniques covered during lectures to extract the required information.
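One simple way to compare sentiment between annotations is a lexicon-based score: sum a polarity value for each word in a tweet, then average the scores within each annotation. The sketch below shows that idea in Python for illustration only (the submission itself must be an R Markdown report). The brief does not say which sentiment technique was covered in lectures, so the lexicon approach is an assumption, and the toy `LEXICON`, the sample rows and the `(annotation, text)` layout are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon (word -> polarity); not a real sentiment dictionary.
LEXICON = {"good": 1, "safe": 1, "effective": 1,
           "bad": -1, "fake": -1, "dangerous": -1}


def sentiment_score(tokens):
    """Sum the polarity of every token; unknown words count as 0."""
    return sum(LEXICON.get(tok, 0) for tok in tokens)


def label(score):
    """Map a numeric score to positive / negative / neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def mean_score_by_annotation(rows):
    """rows is an iterable of (annotation, text) pairs (assumed layout)."""
    scores = defaultdict(list)
    for annotation, text in rows:
        scores[annotation].append(sentiment_score(text.lower().split()))
    return {a: sum(s) / len(s) for a, s in scores.items()}


# Made-up rows, not taken from CMU-MisCOV19.
rows = [
    ("Fake Cure", "this fake cure is dangerous"),
    ("True Prevention", "masks are safe and effective"),
    ("True Prevention", "hand washing is good"),
]

for annotation, mean in mean_score_by_annotation(rows).items():
    print(annotation, mean, label(mean))
```

A real lexicon would replace the six-word dictionary here; the grouping step is what makes the comparison between annotations possible.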

Task C – Topic Modelling (15%)

You are required to cluster different word groups and expressions from the tweets that best characterise the information. You will need to uncover hidden trends within the text by annotation. To perform this task you will need to utilise the topic modelling techniques considered within lectures.
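The brief does not name a specific topic modelling technique. Latent Dirichlet Allocation (LDA) is one common choice, and the sketch below fits a very small LDA model by collapsed Gibbs sampling so the mechanics are visible. It is written in Python for illustration only (the submission itself must be an R Markdown report), and in practice an established library would be used. The function names, hyperparameter values and sample documents are assumptions made for this example.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics=2, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                               # tokens in topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """Return up to n highest-count words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(((w, c) for w, c in counts.items() if c > 0),
                        key=lambda wc: (-wc[1], wc[0]))
        result.append([w for w, _ in ranked[:n]])
    return result


# Made-up, already-cleaned documents; not taken from CMU-MisCOV19.
docs = [
    ["garlic", "cure", "covid", "garlic"],
    ["bleach", "cure", "covid"],
    ["mask", "wash", "hands", "prevent"],
    ["wash", "hands", "mask"],
]

topic_word, doc_topic = lda_gibbs(docs)
print(top_words(topic_word))
```

To look for trends by annotation, the same fit would be run separately on the tweets of each annotation, or the per-document topic counts in `doc_topic` would be aggregated by annotation.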

Task D – Further exploration (5%)

You are required to utilise any further techniques shown to you in lectures or from your own research in order to draw meaningful insight from the text.

Presentation of Code (10%)

You are required to submit your code in a programming notebook (R Markdown report). You will need to submit your .Rmd file along with an HTML or PDF version. Marks are available for students who do the following:

• Break their code into small, meaningful chunks and functions

• Declare all variables using an appropriate naming convention

• Comment code in an appropriate, useful manner

• Create a presentable, professional, easy-to-follow R Markdown report outlining the analysis performed for each task.

Task E – Demonstration (25%)

You are required to deliver a 10-minute demonstration of your code, summarising your main findings for each task. This demonstration will take place following the submission deadline. The lecturer will interact with you and ask questions to gauge your understanding. At this point, you may be asked to provide information or explain parts of your code. The demonstration is used to test that you a) understand your code and b) can explain the algorithms utilised.


