Reproducibility code for "Online misinformation is linked to early COVID-19 vaccination hesitancy and refusal" by Francesco Pierri, Brea Perry, Matthew R. DeVerna, Kai-Cheng Yang, Alessandro Flammini, Filippo Menczer and John Bryden, published in Scientific Reports (Nature Portfolio), 2022: https://www.nature.com/articles/s41598-022-10070-w
.
├── README.md
├── config.ini
├── data
│   ├── county_level
│   ├── covid19
│   ├── misc
│   ├── state_level
│   └── twitter
├── intermediate_files
├── logs
├── output_files
└── src
    └── v1-streaming
- config.ini - configuration file that specifies the paths and filenames used by the scripts
- data - folder that contains subfolders with raw data at the state and county level, as well as the Twitter data. Check the related README files for further details.
- intermediate_files - folder that contains the intermediate data to be merged
- logs - folder that contains the logs written by the scripts
- src - folder that contains the scripts to be executed
- src/v1-streaming - folder that contains the code used to stream the tweets
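The scripts in src are all run with the path to config.ini as their only argument. As a minimal sketch of how an INI file of paths can be read with Python's standard configparser module, the example below parses one section into Path objects. The section name PATHS and the keys twitter_data_dir, intermediate_dir and output_dir are hypothetical and are not taken from the repository's config.ini, so check the actual file for the real names.

```python
import configparser
from pathlib import Path
from typing import Dict

# Hypothetical example contents; the real config.ini in this repository
# may use different section and key names.
EXAMPLE_CONFIG = """
[PATHS]
twitter_data_dir = ../data/twitter
intermediate_dir = ../intermediate_files
output_dir = ../output_files
"""


def load_paths(config_text: str, section: str = "PATHS") -> Dict[str, Path]:
    """Parse INI text and return every entry of one section as a Path."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return {key: Path(value) for key, value in parser[section].items()}
```

Inside a script invoked as `python3 some_script.py ../config.ini`, the same function could be called as `load_paths(Path(sys.argv[1]).read_text())`. Note that configparser lowercases keys by default.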
You can find the keywords used to filter the Twitter stream in src/keywords.txt and the list of low-credibility sources in intermediate_files/low_credibility.csv. Check the GitHub repository associated with our CoVaxxy project for further details.
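To illustrate how these two lists can be used, here is a minimal sketch that checks a tweet's text against a keyword list and a shared URL against a list of low-credibility domains. It assumes keywords.txt holds one keyword per line and low_credibility.csv has a column named domain; both layouts are assumptions, and the plain substring match is a simplification, not the matching rule applied by the Twitter streaming API or by this repository's scripts.

```python
import csv
import io
from typing import Iterable, List, Set, TextIO
from urllib.parse import urlparse


def load_keywords(lines: Iterable[str]) -> List[str]:
    """Return lowercase, non-empty keywords (assumes one keyword per line)."""
    return [line.strip().lower() for line in lines if line.strip()]


def load_domains(csv_file: TextIO, column: str = "domain") -> Set[str]:
    """Return low-credibility domains from a CSV column (column name assumed)."""
    reader = csv.DictReader(csv_file)
    return {row[column].strip().lower() for row in reader if row[column].strip()}


def matches_keywords(text: str, keywords: List[str]) -> bool:
    """Simplified case-insensitive substring match against the keywords."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def url_domain(url: str) -> str:
    """Return the URL's hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_low_credibility(url: str, domains: Set[str]) -> bool:
    """True if the URL's host is a listed domain or one of its subdomains."""
    host = url_domain(url)
    return any(host == d or host.endswith("." + d) for d in domains)


# Tiny in-memory stand-ins for src/keywords.txt and
# intermediate_files/low_credibility.csv; the contents are made up.
keywords = load_keywords(io.StringIO("vaccine\n\npfizer\n"))
domains = load_domains(io.StringIO("domain\nexample-lowcred.com\n"))
```

Matching subdomains with a leading dot avoids false positives such as a different site whose name merely ends with the same characters.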
- Clone this repository into your local directory.
- Put the Twitter data in the data/twitter folder. The data must be stored as .json files with one tweet JSON object per line. Check the GitHub repository associated with our CoVaxxy project to see how to download our dataset and reconstruct it using the Twitter API.
- Go to the src folder and execute the following Python scripts (we used Python 3.8.5) in this order (see the associated src/README.md file for further details):
  - python3 twitter_data_processing.py ../config.ini - to process the Twitter data
  - python3 get_cases_and_deaths.py ../config.ini - to download the numbers of COVID-19 cases and deaths; modify config.ini to set the date range
  - python3 aggregate_cases_and_deaths.py ../config.ini - to aggregate the numbers of COVID-19 cases and deaths for further use
  - python3 merge_datasets.py ../config.ini - to merge the intermediate data into a single dataframe used for the correlation analysis
- Run the Stata script (src/stata_script.do) to get the correlation results using output_files/master_data--{%Y-%m-%d__%H-%M-%S}.csv.
- To run the Granger causality analysis, go to the src folder and execute the following Python scripts (we used Python 3.8.5) in this order (see the associated src/README.md file for further details):
  - python3 get_temporal_data.py ../config.ini - to generate daily aggregates at the user level
  - python3 generate_aggregate_files.py ../config.ini - to aggregate them by county or state
  - python3 causality.py ../config.ini - to run the causality analysis
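The Twitter data in data/twitter must contain one tweet JSON object per line. As a quick way to check that layout before processing, here is a minimal sketch that reads every .json file in a folder and yields one dict per non-empty line, reporting malformed lines instead of failing. The function names are hypothetical, UTF-8 encoding is assumed, and this is not the parsing logic of twitter_data_processing.py.

```python
import json
import sys
from pathlib import Path
from typing import Iterable, Iterator


def parse_tweet_lines(lines: Iterable[str], source: str = "<input>") -> Iterator[dict]:
    """Yield one parsed tweet per non-empty line; report and skip bad lines."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            print(f"Skipping malformed line {line_number} in {source}", file=sys.stderr)


def iter_tweets(folder: Path) -> Iterator[dict]:
    """Yield tweets from every .json file in folder, in file-name order."""
    for path in sorted(folder.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            yield from parse_tweet_lines(handle, source=path.name)
```

For example, `sum(1 for _ in iter_tweets(Path("../data/twitter")))` counts the readable tweets without loading them all into memory.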
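Because the Python steps must run in a fixed order, each receiving ../config.ini, they can be chained by a small driver. The sketch below is a hypothetical helper, not part of the repository: it is meant to be run from inside the src folder, it uses the current interpreter (sys.executable) rather than a literal python3, it stops at the first script that exits with an error, and it does not cover the Stata step, which still has to be run separately.

```python
import subprocess
import sys
from typing import List

# Script order as listed in this README.
CORRELATION_STEPS = [
    "twitter_data_processing.py",
    "get_cases_and_deaths.py",
    "aggregate_cases_and_deaths.py",
    "merge_datasets.py",
]
GRANGER_STEPS = [
    "get_temporal_data.py",
    "generate_aggregate_files.py",
    "causality.py",
]


def build_commands(scripts: List[str], config: str = "../config.ini") -> List[List[str]]:
    """Return one [interpreter, script, config] command per script, in order."""
    return [[sys.executable, script, config] for script in scripts]


def run_steps(scripts: List[str], config: str = "../config.ini",
              dry_run: bool = False) -> List[List[str]]:
    """Run each script in sequence; check=True raises on a non-zero exit code.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = build_commands(scripts, config)
    if not dry_run:
        for command in commands:
            print("Running:", " ".join(command))
            subprocess.run(command, check=True)
    return commands
```

Calling `run_steps(CORRELATION_STEPS)` followed, after the Stata step, by `run_steps(GRANGER_STEPS)` reproduces the order above; `dry_run=True` prints nothing and only shows what would be executed.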
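The Stata step reads output_files/master_data--{%Y-%m-%d__%H-%M-%S}.csv. Assuming the braces stand for a timestamp written with those strftime codes (for example master_data--2021-03-02__08-00-00.csv), the sketch below picks the most recent such file when several runs have left more than one. The helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional

PREFIX = "master_data--"
SUFFIX = ".csv"
TIMESTAMP_FORMAT = "%Y-%m-%d__%H-%M-%S"  # format codes taken from this README


def parse_timestamp(filename: str) -> Optional[datetime]:
    """Return the timestamp encoded in a master_data file name, or None."""
    if not (filename.startswith(PREFIX) and filename.endswith(SUFFIX)):
        return None
    stamp = filename[len(PREFIX):-len(SUFFIX)]
    try:
        return datetime.strptime(stamp, TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_master_file(names: Iterable[str]) -> Optional[str]:
    """Return the name with the most recent valid timestamp, or None."""
    dated = [(parse_timestamp(name), name) for name in names]
    dated = [(stamp, name) for stamp, name in dated if stamp is not None]
    return max(dated)[1] if dated else None


def latest_in_folder(folder: Path) -> Optional[Path]:
    """Return the newest master_data file inside folder, if any."""
    name = latest_master_file(p.name for p in folder.glob(PREFIX + "*" + SUFFIX))
    return folder / name if name else None
```

For example, `latest_in_folder(Path("../output_files"))` gives the path to point the Stata script at.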
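For reference, the textbook bivariate Granger causality test with p lags compares a restricted regression of a series y_t on its own past with an unrestricted one that also includes the past of a second series x_t. This is the standard formulation only; the variables, lag order and any controls actually used by causality.py are not described here and may differ.

```latex
% Restricted model: y_t explained by its own p lags only
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i \, y_{t-i} + \varepsilon_t
% Unrestricted model: p lags of x_t are added
y_t = \beta_0 + \sum_{i=1}^{p} \beta_i \, y_{t-i} + \sum_{j=1}^{p} \gamma_j \, x_{t-j} + u_t
% H_0 (x does not Granger-cause y): \gamma_1 = \gamma_2 = \dots = \gamma_p = 0
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u) / p}{\mathrm{RSS}_u / (T - 2p - 1)}
```

Here RSS_r and RSS_u are the residual sums of squares of the restricted and unrestricted models and T is the number of observations used in both regressions; under the null hypothesis F follows, at least approximately, an F distribution with p and T - 2p - 1 degrees of freedom.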