Dataset Description
This dataset contains the results of queries performed against the MDPI Open Review Corpus 2, provided by SketchEngine. The files include concordances, frequency lists, and collocations derived from the corpus. The dataset is designed for linguistic analysis, text mining, and other research purposes.
Contents
File Types
- XLSX Files: Several Excel spreadsheets containing different types of query results.
- CSV File: A single CSV file with query results, provided for ease of use with plain-text processing tools.
File Details
- Concordances: Lines of text from the corpus showing specific keywords in context, useful for examining word usage or phrase structures.
- Frequency Lists: Lists of words or phrases sorted by their frequency within the corpus.
- Collocations: Pairs or groups of words that frequently occur together, providing insight into common word combinations.
File Naming
The filenames indicate the type of data each file contains (e.g., `concordances.xlsx`, `frequency_list.csv`, `collocations.xlsx`) and correspond to the queries that produced the results.
Query Details
The exact queries used to generate the results are included in the data files themselves, typically at the top or within the header rows.
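Because the query text sits in the first rows of each file, it can help to look at those rows in raw form before loading a file as a table. The following is a minimal sketch, not part of the dataset: the function name `peek_rows` is hypothetical, and the `utf-8-sig` encoding is an assumption about how the CSV file was exported. The XLSX branch assumes `pandas` and an Excel engine such as `openpyxl` are installed.

```python
import csv
from pathlib import Path


def peek_rows(path, n=5):
    """Return the first n raw rows of a CSV or XLSX file as lists of cells.

    Hypothetical helper for locating the query text near the top of a file.
    """
    path = Path(path)
    if path.suffix.lower() == ".csv":
        # Assumption: the CSV is UTF-8 encoded, possibly with a byte order mark.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            reader = csv.reader(handle)
            # zip stops after n rows, or earlier if the file is shorter.
            return [row for _, row in zip(range(n), reader)]
    # XLSX branch: pandas is imported here so the CSV branch works without it.
    # header=None keeps the top rows as data instead of using them as column names.
    import pandas as pd

    frame = pd.read_excel(path, header=None, nrows=n)
    return frame.values.tolist()
```

For example, `peek_rows("frequency_list.csv")` would return up to five rows, which can then be scanned for the query string. The filename is taken from the examples above and may differ from the actual files.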
Usage
1. Exploration:
- Open the XLSX or CSV files in spreadsheet software (e.g., Microsoft Excel, Google Sheets), or load them programmatically with tools such as Python (`pandas`) or R.
- Review the filenames to identify the type of data (e.g., concordances, frequency lists, collocations) for your analysis needs.
2. Analysis:
- Use concordances to study word usage in context.
- Use frequency lists to identify commonly used words or phrases.
- Use collocations to explore patterns of word co-occurrence.
3. Reproducibility:
- The queries are included in the files, enabling you to replicate or refine the results using SketchEngine.
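As an illustration of the analysis step, the sketch below keeps only the rows of a CSV frequency list whose count reaches a chosen threshold. It uses Python's standard `csv` module. The function name `rows_with_min_count`, the `skip_rows` parameter (for any query or metadata lines above the column headers), and the column names `Item` and `Frequency` in the usage note are assumptions for illustration, so check the actual header row of your file first.

```python
import csv


def rows_with_min_count(path, count_column, min_count, skip_rows=0):
    """Return the rows of a CSV file whose count_column value is >= min_count.

    Hypothetical helper; column names must match the file's real header row.
    """
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for _ in range(skip_rows):
            # Skip any query or metadata lines that precede the header row.
            next(handle, None)
        rows = list(csv.DictReader(handle))
    # Preserve the file's original order; only drop rows below the threshold.
    return [row for row in rows if float(row[count_column]) >= min_count]
```

A call such as `rows_with_min_count("frequency_list.csv", "Frequency", 100, skip_rows=1)` would return each matching row as a dictionary keyed by column name. The same filter can be written with `pandas` if you prefer working with data frames.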
Requirements
To work with the dataset, you will need:
- Spreadsheet software for XLSX and CSV files.
- Optionally, programming tools for advanced analysis, such as Python with `pandas` (reading XLSX files also requires an Excel engine such as `openpyxl`) or R with `readxl`.
Licensing and Attribution
- The MDPI Open Review Corpus 2 is provided by SketchEngine. Users must adhere to its terms of use.
- This dataset contains derived data (concordances, frequency lists, and collocations). When using this dataset for research or publications, please cite SketchEngine and acknowledge the original MDPI Open Review Corpus 2.
Contact
For questions or further information, contact the dataset maintainer or refer to SketchEngine’s support resources.