Deniers and believers in climate change discourse on twitter, and anti/pro positions in ukraine war and vaccine discourse

Version 1.0

Mahmoudi, Amin; Jemielniak, Dariusz; Ciechanowski, Leon, 2025, "Deniers and believers in climate change discourse on twitter, and anti/pro positions in ukraine war and vaccine discourse", https://doi.org/10.18150/FVIMEK, RepOD, V1

Description

We acquired the data from the George Washington University Libraries Dataverse, specifically the Climate Change Tweets Ids [Data set]. This dataset was collected from the Twitter API using Social Feed Manager and totals 39,622,026 tweets related to climate change. The tweets were collected between September 21, 2017 and May 17, 2019, with a gap in data collection between January 7, 2019 and April 17, 2019. Tweets containing the following hashtags and keywords were collected: climatechange, #climatechangeisreal, #actonclimate, #globalwarming, #climatechangehoax, #climatedeniers, #climatechangeisfalse, #globalwarminghoax, #climatechangenotreal, climate change, global warming, climate hoax.

Due to Twitter's Developer Policy, only the tweet IDs were shared in the database, not the full tweets. We therefore had to hydrate the tweet IDs using the Hydrator application. We carried out the hydration in June 2020 and obtained 22,564,380 tweets. The remaining tweets could not be recovered because tweets and user accounts are deleted or suspended as part of Twitter's standard maintenance procedures; Hydrator allowed us to recover as much data as possible within the constraints of Twitter's Developer Policy.

In order to comprehensively diagnose Polish social networks and to enable automated classification of Twitter users by their attitude towards vaccinations, we collected a database of Twitter users, balanced with respect to importance, for manual annotation. We identified the most important keywords used by groups that spread anti-vaccination propaganda. Using our programming pipeline, we obtained databases of Polish social media posts on the pandemic and attitudes towards vaccinations. The raw data contained over 5 million tweets from almost 3,600 users with the following hashtags related to the COVID-19 pandemic in Poland and the war in Ukraine: stopsegregacjisanitarnej, nieszczepimysie, szczepimysie, szczepienie, szczepienia, koronawirus, koronawiruswpolsce, koronawiruspolska, rozliczymysanitarystow, stopss, covid, covid19, sanitaryzm, epidemia, pandemia, plandemia, zelensky, zelenski, wojna, muremzabraunem, konfederacja, wojnanaukrainie, putin, ukraina, ukraine, rosja, russia, wolyn, bandera, upa. Twelve annotators rated the scraped Twitter users on a nine-point Likert scale based on their posts. The samples evaluated by the annotators partially overlapped so that their consistency and reliability could be examined. Statistical tests performed on the data before and after binning (in three- and two-category versions) confirmed significant annotator agreement. Fleiss' kappa, Randolph's kappa, Krippendorff's alpha, and intraclass correlation coefficients indicate non-random agreement among the competent judges (annotators).
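As an illustration of one of the agreement statistics named above, the following is a minimal sketch of Fleiss' kappa using the standard textbook formula. It is not the authors' code. It assumes every rated user received the same number of ratings, which would require restricting the partially overlapping samples to jointly rated users, and the function name `fleiss_kappa` and the input layout are hypothetical.

```python
import math
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a subjects x categories table of rating counts.

    counts[i][j] is the number of annotators who put subject i in category j.
    Assumption: every subject was rated by the same number of annotators.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject, then averaged over subjects.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return math.nan  # undefined when all ratings fall in one category
    return (p_bar - p_e) / (1.0 - p_e)
```

Each row of the input corresponds to one rated user and each column to one binned category, so the two- and three-category versions mentioned above would simply use tables with two or three columns.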

Our initial data acquisition based on the abovementioned hashtags yielded 5,308,997 posts. To focus specifically on discussions related to COVID-19 and the war in Ukraine, we implemented a filtering process using Polish word stems relevant to these topics. This step reduced our dataset to 4,840,446 posts. The filtering was performed using regular expressions based on lemmatized versions of key terms. For war-related content, we used stems such as 'wojna' (war), 'inwazj' (invasion), 'ukrai' (Ukraine), and 'putin'. For COVID-related content, we used stems like 'mask' (mask), 'szczepi' (vaccine), and 'koronawirus' (coronavirus). This approach allowed us to capture various grammatical forms of these words.
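The stem-based filtering described above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: only the example stems quoted in the text are used (the full stem lists are not given), and the choices to anchor each stem at the start of a word and to match case-insensitively are assumptions. The names `compile_stem_pattern`, `topic_flags`, and `filter_posts` are hypothetical.

```python
import re

# Only the example stems quoted in the description; the full lists are not given.
WAR_STEMS = ("wojna", "inwazj", "ukrai", "putin")
COVID_STEMS = ("mask", "szczepi", "koronawirus")


def compile_stem_pattern(stems):
    """Build a regex matching any word that starts with one of the stems."""
    alternatives = "|".join(re.escape(stem) for stem in stems)
    # \b anchors the stem at a word start; \w* lets any inflectional ending follow.
    return re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)


WAR_RE = compile_stem_pattern(WAR_STEMS)
COVID_RE = compile_stem_pattern(COVID_STEMS)


def topic_flags(text):
    """Report which of the two topics a post touches."""
    return {
        "war": bool(WAR_RE.search(text)),
        "covid": bool(COVID_RE.search(text)),
    }


def filter_posts(posts):
    """Keep only posts that mention at least one war or COVID stem."""
    return [p for p in posts if WAR_RE.search(p) or COVID_RE.search(p)]
```

Because a stem such as 'szczepi' is followed by `\w*`, forms like "szczepienia" and "szczepionka" match the same pattern, which is how a single stem can cover several grammatical forms.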

Following this initial filtering, we removed three users who had no posts related to either COVID-19 or the war in Ukraine, which left 3,597 users and 4,839,995 posts. Finally, to ensure consistency in our analysis, we selected only posts in the Polish language, resulting in a final dataset of 3,577,040 posts from 3,597 users. Before the tweet content was analyzed, the text was lemmatized, and special characters, links, and low-importance words from a stop list (e.g. conjunctions) were removed.

Data preprocessing was carried out in Python using specific libraries and our original code. The hydrated tweets were further cleaned by removing duplicates and all tweets without an English language label. Some characters and technical expressions were then replaced with natural language terms (e.g., changing “&amp” into “and”). We also created several versions of the database for various purposes: in some we replaced emoji with their descriptions (using the demoji library and our original code), and in others we removed the emojis, hyperlinks, and special characters. The resulting dataset comprises 24,083,452 tweets (7,741,602 tweets without retweets), which makes it the biggest database of social media data referring to climate change analyzed to date.
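A minimal sketch of the cleaning steps described above, using only the Python standard library. It is an illustration under stated assumptions rather than the authors' original code: the record keys `id`, `lang`, and `text` are assumed field names for a hydrated tweet, the function names are hypothetical, and emoji handling (done with the demoji library in the original work) is deliberately left out.

```python
import re

URL_RE = re.compile(r"https?://\S+")
AMP_RE = re.compile(r"&amp;?")


def clean_text(text):
    """Replace the '&amp' entity with 'and', drop hyperlinks, tidy whitespace."""
    text = AMP_RE.sub("and", text)
    text = URL_RE.sub("", text)
    return " ".join(text.split())


def preprocess(tweets):
    """Keep English-labelled tweets, drop duplicate IDs, clean the text.

    Each tweet is a dict with the assumed keys 'id', 'lang', and 'text'.
    """
    seen_ids = set()
    cleaned = []
    for tweet in tweets:
        if tweet.get("lang") != "en":
            continue  # no English language label
        if tweet["id"] in seen_ids:
            continue  # duplicate tweet
        seen_ids.add(tweet["id"])
        cleaned.append({**tweet, "text": clean_text(tweet["text"])})
    return cleaned
```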

We created the directed social network graph using the RAPIDS cuGraph library in Python for most of the network statistics calculations, and also using graph-tool. The final graph visualization was created in Gephi after preparing and filtering the data in Python. After removing duplicates and self-loops, the final graph had 4,398,368 nodes and 18,595,472 edges.
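The duplicate and self-loop removal mentioned above can be illustrated with a small standard-library sketch. It does not use cuGraph or graph-tool and is not the authors' code; the function names and the input layout, an iterable of (source, target) pairs, are assumptions made for illustration.

```python
from collections import Counter


def build_graph(interactions):
    """Return (nodes, edges) of a directed graph without duplicates or self-loops.

    `interactions` is an iterable of (source, target) user ID pairs.
    """
    edges = set()
    for source, target in interactions:
        if source == target:
            continue  # drop self-loops
        edges.add((source, target))  # the set collapses duplicate edges
    nodes = {user for edge in edges for user in edge}
    return nodes, edges


def degree_counts(edges):
    """Out-degree and in-degree of every node in a directed edge set."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    return out_degree, in_degree
```

In this sketch a user who appears only in self-loops does not become a node, since nodes are derived from the surviving edges.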

The final label of "believer," "denier," or "neutral/unknown" was assigned to users rated by more than one annotator by averaging the results from those annotators.
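The averaging step can be sketched as below. The description does not state the rating scale used for these labels, its orientation, or the cut points, so all of these are assumptions: the sketch takes numeric scores, treats low averages as "denier" and high averages as "believer", and requires the caller to supply the hypothetical cut points `low_cut` and `high_cut`.

```python
from statistics import mean


def final_label(scores, low_cut, high_cut):
    """Average the annotators' numeric scores and bin the result into a label.

    Assumptions (not stated in the description): low scores mean "denier",
    high scores mean "believer", and the cut points are chosen by the caller.
    """
    if not scores:
        raise ValueError("at least one annotator score is required")
    average = mean(scores)
    if average <= low_cut:
        return "denier"
    if average >= high_cut:
        return "believer"
    return "neutral/unknown"
```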

In the Ukraine dataset, the 'anti-group' comprises tweets that use various tactics of information warfare aimed at discrediting Ukraine's sovereignty and legitimacy, whereas the 'pro-group' consists of tweets that support Ukraine's sovereignty and legitimacy. In the Vaccine dataset, 'anti' denotes users who publish tweets against vaccination, while 'pro' users advocate for vaccination programs. In the Climate Change dataset, 'denier' users dismiss climate change as a conspiracy theory, while 'believer' users perceive it as a serious threat to the future of humanity.

For the ClimateChange dataset, the creationdate field indicates when the connection between two users was established. The user1 and user2 fields are anonymized unique IDs of the source and target users, respectively. The user1status field denotes whether user1 is a believer (1), neutral (2), or denier (3). The creationday field is a numeric value tied to the creation date. The onset and terminus fields mark the first and last days of any recorded interaction between user1 and user2, respectively, and duration captures the total time over which they have interacted. Finally, the w field indicates the number of interactions (such as replies, retweets, or direct messages) exchanged between them in a Twitter context.
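To make the relationship between the per-pair fields concrete, the sketch below derives onset, terminus, duration, and w from a list of individual interactions. It is an illustration only: the input layout of (user1, user2, day) tuples and the function name are hypothetical, and computing duration as terminus minus onset is an assumption, since the description does not give the exact formula.

```python
from collections import defaultdict


def summarize_pairs(interactions):
    """Aggregate (user1, user2, day) records into per-pair edge attributes.

    Assumption: duration is computed as terminus - onset, in days.
    """
    days_by_pair = defaultdict(list)
    for user1, user2, day in interactions:
        days_by_pair[(user1, user2)].append(day)

    summary = {}
    for pair, days in days_by_pair.items():
        onset, terminus = min(days), max(days)
        summary[pair] = {
            "onset": onset,                # first day of recorded interaction
            "terminus": terminus,          # last day of recorded interaction
            "duration": terminus - onset,  # assumed definition
            "w": len(days),                # number of interactions
        }
    return summary
```

Because the graph is directed, the pair (a, b) and the pair (b, a) are summarized separately.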

In the Ukraine war and Vaccine dataset, the “createdate” field indicates the date of the interaction. The “likecount,” “retweetcount,” “replycount,” and “quotecount” columns capture engagement metrics on Twitter: how many times a tweet is liked, retweeted, replied to, or quoted. The “user1” and “user2” fields store unique user IDs, whereas “user1proukraine,” “user1provaccine,” “user2proukraine,” and “user2provaccine” denote each user’s stance (e.g., pro, anti, or unknown) regarding Ukraine and vaccines. The “creationday” field is a numeric value corresponding to the creation date, while “onset” and “terminus” mark the first and last recorded interactions between user1 and user2, respectively. Finally, “duration” shows the total time span across which these interactions took place.

(2025)

Subject

Business and Management; Computer and Information Science; Social Sciences

Keyword

Climate change

Related Publication

Mahmoudi, A., Jemielniak, D. & Ciechanowski, L. Characteristics of two polarized groups in online social networks’ controversial discourse. J Comput Soc Sc 8, 22 (2025). https://doi.org/10.1007/s42001-024-00350-y

License

CC0 Creative Commons Zero 1.0

Files

climateChangeV1.txt - Plain Text - 224.6 MB - Jun 23, 2025 - MD5: 39bca23503ad942bcb17fbbbe8afc10f - License: CC0 Creative Commons Zero 1.0

UkraineVaccineV1.txt - Plain Text - 79.7 MB - Jun 23, 2025 - MD5: 187103b4c057f023a2426880ff677352 - License: CC0 Creative Commons Zero 1.0
