This dataset contains tweet IDs and stance annotations for 16,616,425 tweets related to two intersecting public-discourse crises - the Russo-Ukrainian conflict and the COVID-19 pandemic / vaccination debate - collected between April 2009 and December 2023. In accordance with Twitter/X Developer Terms of Service, only tweet IDs and annotation labels are provided; researchers wishing to use full tweet content must hydrate the tweet IDs using the Twitter/X API or compatible tools such as twarc.
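Hydration tools generally take a plain list of tweet IDs as input. The following is a minimal sketch of extracting such a list from the labels file using only the Python standard library. It assumes the file has a header row and that the tweet ID is the first column; the function name and the output path are illustrative and not part of the dataset.

```python
import csv
import gzip


def write_tweet_ids(labels_path, ids_path):
    """Copy the first column (assumed to be the tweet ID) to a text file, one ID per line."""
    count = 0
    with gzip.open(labels_path, "rt", newline="", encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        reader = csv.reader(src)
        next(reader, None)  # skip the header row (assumed to be present)
        for row in reader:
            if row and row[0]:
                # IDs are kept as strings so that large integers are never rounded
                dst.write(row[0] + "\n")
                count += 1
    return count
```

For example, write_tweet_ids("ukraine_covid_labels_only_2009_2023.csv.gz", "tweet_ids.txt") would stream through the file without loading it into memory, and the resulting text file could then be passed to a hydration tool such as twarc.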
Research context. This dataset was created as part of a study investigating the relationship between personality traits and ideological stance on two polarizing issues - support for Ukraine and attitudes toward COVID-19 vaccination - among Polish Twitter users, in collaboration with Gazeta Wyborcza.
Annotation. Stance scores were assigned at the account level by 12 human annotators across 3,600 unique accounts, on a continuous scale from 1 (most negative/contra) to 9 (most positive/pro) for two dimensions: pro-Ukrainian stance (pro_ukrainian_num) and pro-vaccine stance (pro_vaccine_num). Where multiple annotators scored the same account, scores were resolved by averaging. Categorical labels (pro_ukrainian, pro_vaccine: anti / neutral / pro) were derived from the numeric scores. Retrospective inter-rater reliability estimation on 160 multiply-rated accounts (960 pooled pairwise comparisons) yielded moderate agreement: weighted Cohen's κ = 0.49 for pro-Ukrainian stance, κ = 0.45 for pro-vaccine stance.
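The resolution step described above can be sketched as follows. This is an illustration only: the account names and ratings are made up, and the cutoffs used to derive the categorical label are hypothetical placeholders, since the actual thresholds are not stated in this description.

```python
from statistics import mean


def resolve_scores(ratings):
    """Average each account's annotator scores (1-9 scale) into a single value."""
    return {account: mean(scores) for account, scores in ratings.items() if scores}


def to_label(score, low=4.0, high=6.0):
    """Map a numeric score to anti / neutral / pro.

    `low` and `high` are hypothetical cutoffs, NOT the thresholds used for this dataset.
    """
    if score < low:
        return "anti"
    if score > high:
        return "pro"
    return "neutral"


# Made-up ratings: account_a and account_c were scored by several annotators
ratings = {"account_a": [8, 9, 7], "account_b": [2], "account_c": [4, 6]}
resolved = resolve_scores(ratings)
labels = {account: to_label(score) for account, score in resolved.items()}
```

Every tweet from an account would then share that account's resolved score and label, since annotation was carried out at the account level.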
Coverage. 31.9% of tweets (5,294,639 of 16,616,425 rows) carry stance annotations. Because stance was annotated at the account level, these are the tweets posted by annotated accounts; the remaining rows are unlabelled.
Files provided.
- ukraine_covid_labels_only_2009_2023.csv.gz - 16,616,425 rows containing tweet ID, pro_ukrainian_num (float, NaN if unlabelled), pro_ukrainian (categorical, NaN if unlabelled), pro_vaccine_num (float, NaN if unlabelled), pro_vaccine (categorical, NaN if unlabelled).
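As a sanity check after download, the share of labelled rows can be recomputed from the file. The sketch below uses only the standard library and streams the file row by row. It assumes the header uses the column names listed above and that unlabelled cells are written either as empty strings or as a textual NaN marker; both are assumptions about the CSV serialization, and the function name is illustrative.

```python
import csv
import gzip

# Spellings treated as "unlabelled" (an assumption about how NaN was written to CSV)
MISSING = {"", "nan", "na", "null"}


def stance_coverage(labels_path, column="pro_ukrainian_num"):
    """Return (labelled_rows, total_rows) for one stance column."""
    total = labelled = 0
    with gzip.open(labels_path, "rt", newline="", encoding="utf-8") as src:
        for row in csv.DictReader(src):
            total += 1
            value = (row.get(column) or "").strip().lower()
            if value not in MISSING:
                labelled += 1
    return labelled, total
```

Calling stance_coverage("ukraine_covid_labels_only_2009_2023.csv.gz") and dividing the two counts gives the coverage for the chosen column. The description reports 5,294,639 labelled rows out of 16,616,425 overall; whether each individual column reproduces that figure exactly is not stated, so small differences between columns would not necessarily indicate a corrupted download.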