eLife Open Peer Review Corpus
Section for Logic & Cognitive Science, Institute of Philosophy and Sociology, Polish Academy of Sciences
Generated by Ksawery Jasieński under supervision of Marcin Miłkowski (2022)
eLife is committed to the idea of open peer review. However, the reviews are not available for download as a single package, so they must be extracted from eLife's complete data set, which is several gigabytes in size.
The data set contains all peer reviews available as of 24 July 2022: 10853 papers with reviews, or at least with decision letters, which omit minor comments. As stated by eLife:
“In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.”
In addition, the corpus contains metadata about particular reviews (the filename contains ‘r’ and a numerical id before ‘.xml’), author responses (the filename then contains ‘a’ and a number before the ‘.xml’ suffix) and paper metadata in the JSON format. The JSON schema files are available in the schema subdirectory in files with self-explanatory names.
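As an illustration of the naming convention described above, the following Python sketch sorts corpus files into reviews, author responses and paper metadata. It is not part of the corpus tooling. The exact filename layout is an assumption (the letter ‘r’ or ‘a’ followed by digits directly before ‘.xml’, and not preceded by another letter), the function name classify is hypothetical, and the filenames in the usage note are invented for illustration.

```python
import re
from pathlib import Path

# Assumed layout: 'r' or 'a', then digits, directly before '.xml'.
# The lookbehind rejects names where the letter is merely the end of a word
# (for example a hypothetical 'paper1.xml').
_KIND_PATTERN = re.compile(r"(?<![A-Za-z])(?P<kind>[ra])(?P<num>\d+)\.xml$")


def classify(filename: str) -> str:
    """Return 'review', 'author_response', 'metadata' or 'other'."""
    name = Path(filename).name
    if name.endswith(".json"):
        # Paper metadata is distributed in the JSON format.
        return "metadata"
    match = _KIND_PATTERN.search(name)
    if match is None:
        return "other"
    return "review" if match.group("kind") == "r" else "author_response"
```

For example, under these assumptions classify("elife-12345-r1.xml") returns "review" and classify("elife-12345-a1.xml") returns "author_response". The pattern should be adjusted if the actual filenames in the corpus are laid out differently.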
The original files have not been converted or enriched with any linguistic annotation; they remain in the XML format used by eLife.
Additionally, we are making the crawler code available in the review_crawler directory. For eLife reviews, elife_crawler.py should be used (more notes in the subdirectory).
The files are being made available under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).