Integrating Approaches to the Role of Metaphor in the Evolutionary Dynamics of Language

Version 1.0

Placiński, Marek; Pleyer, Michael, 2024, "Integrating Approaches to the Role of Metaphor in the Evolutionary Dynamics of Language", https://doi.org/10.18150/7LWW8R, RepOD, V1

Description

POLSKI

1. Informacje ogólne

Tytuł zbioru danych: Zintegrowane podejście do roli metafory w ewolucji języka

Kierownik grantu: dr Michael Pleyer

Współwykonawca: dr Marek Placiński

kontakt: Marek Placiński, marpla@umk.pl

Data zbiórki danych: marzec 2023

Miejsce zbiórki danych: Toruń, Polska

słowa-klucze: wielkie modele językowe, automatyczna identyfikacja metafory, językoznawstwo komputerowe, teoria metafory konceptualnej, ewolucja kulturowa języka

źródło finansowania: Narodowe Centrum Nauki, program Polonez Bis, umowa nr 2021/43/P/HS2/02729.

2. Opis danych

Code.zip zawiera kod napisany w języku Python, który posłużył do 1) dopasowania wielkiego modelu językowego PolBERT do naszego zbioru danych, 2) obliczenia entropii informacyjnej, 3) zidentyfikowania wyrażeń potencjalnie metaforycznych na podstawie entropii

Dataset.zip zawiera dwie bazy danych: zbiór tekstów zawierających interesujące nas słowa-klucze (patrz powiązany artykuł). Teksty pochodzą z dwóch korpusów, Elektroniczny korpus tekstów polskich z XVII i XVIII w. (https://korba.edu.pl/query_corpus/) oraz Mikrokorpus polszczyzny 1830-1918 (http://www.f19.uw.edu.pl/2017/01/korpus-wersja-zaktualizowana/)

data utworzenia danych: 12.06.2024

3. Licencja: CC0 1.0 Universal

4. Zbiórka danych

Dane zostały automatycznie wyekstrahowane z korpusów Elektroniczny korpus tekstów polskich z XVII i XVIII w. oraz Mikrokorpus polszczyzny 1830-1918 (http://www.f19.uw.edu.pl/2017/01/korpus-wersja-zaktualizowana/)

5. Informacja o danych

nazwy zmiennych:

metaphorical - czy dane słowo ma znaczenie metaforyczne

word - lemat słowa-klucza

sent - zdanie, w którym dane słowo występuje

ENGLISH

1. General information

Title: Integrating Approaches to the Role of Metaphor in the Evolutionary Dynamics of Language

PI: Michael Pleyer, PhD

Co-investigator: Marek Placiński, PhD

contact information: Marek Placiński, marpla@umk.pl

Date of data collection: June/July 2024

Geographic location of data collection: Toruń, Poland

keywords: large language models, computational linguistics, automatic metaphor identification, evolutionary linguistics, conceptual metaphor theory

source of funding: National Science Centre, Poland, Polonez Bis program, agreement no 2021/43/P/HS2/02729.

2. Data and file overview

Code.zip - contains the Python code that was used to 1) fine-tune the PolBERT large language model for our downstream task, 2) compute information entropy, 3) identify potentially metaphorical words based on entropy

Dataset.zip - contains two datasets: collections of texts that contain the keywords of interest (see the related paper). The texts come from two corpora, Elektroniczny korpus tekstów polskich z XVII i XVIII w. (https://korba.edu.pl/query_corpus/) and Mikrokorpus polszczyzny 1830-1918 (http://www.f19.uw.edu.pl/2017/01/korpus-wersja-zaktualizowana/)

The files were created on 12.06.2024
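
Steps 2 and 3 listed for the code archive (computing information entropy and flagging potentially metaphorical words) can be illustrated with a minimal sketch. This is not the code shipped in the archive. It assumes that entropy is the Shannon entropy of a probability distribution (for example, a model's predicted distribution over candidate tokens) and that positions above a cut-off are treated as candidates; the function names, the toy distributions, the threshold value, and the direction of the comparison are all hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def flag_high_entropy(distributions, threshold):
    """Return the indices of distributions whose entropy exceeds threshold.

    Assumption: high entropy marks a candidate. The actual criterion used
    in the deposited code may differ.
    """
    return [
        i for i, probs in enumerate(distributions)
        if shannon_entropy(probs) > threshold
    ]


# Toy distributions (made up for illustration, not taken from the dataset).
confident = [0.97, 0.01, 0.01, 0.01]   # one outcome dominates: low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]   # uniform: maximal entropy for 4 outcomes

print(shannon_entropy(uncertain))                                # 2.0
print(flag_high_entropy([confident, uncertain], threshold=1.0))  # [1]
```

A uniform distribution over four outcomes has an entropy of exactly 2 bits, which is why only the second toy distribution passes the hypothetical 1-bit threshold.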

3. Licence: CC0 1.0 Universal

4. Methodological information

Methods: the data was automatically extracted from Elektroniczny korpus tekstów polskich z XVII i XVIII w. (https://korba.edu.pl/query_corpus/) and Mikrokorpus polszczyzny 1830-1918 (http://www.f19.uw.edu.pl/2017/01/korpus-wersja-zaktualizowana/)

5. Data-specific information

names of variables:

metaphorical - whether the word is metaphorical

word - keyword lemma

sent - the sentence in which the keyword is attested
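
The three variables above could be read as in the following sketch. The file format inside Dataset.zip is not stated here, so the sketch assumes a comma-separated file with a header row and a metaphorical column coded as 1/0; the sample rows, the function name, and the coding are hypothetical placeholders, not data from the archive.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows standing in for one of the dataset files (assumed CSV layout).
SAMPLE = """metaphorical,word,sent
1,lemma_a,placeholder sentence one
0,lemma_a,placeholder sentence two
1,lemma_b,placeholder sentence three
"""


def metaphor_share(handle):
    """Return, per keyword lemma, the share of rows marked as metaphorical."""
    counts = defaultdict(lambda: [0, 0])  # lemma -> [metaphorical rows, all rows]
    for row in csv.DictReader(handle):
        counts[row["word"]][1] += 1
        # Assumption: metaphorical uses are coded as the string "1".
        if row["metaphorical"].strip() == "1":
            counts[row["word"]][0] += 1
    return {lemma: hits / total for lemma, (hits, total) in counts.items()}


shares = metaphor_share(io.StringIO(SAMPLE))
print(shares)  # {'lemma_a': 0.5, 'lemma_b': 1.0}
```

To use it on a real file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) instead of the in-memory sample, after checking the actual delimiter and value coding.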

(2024)

Subject

Arts and Humanities

Keyword

conceptual metaphor theory, evolutionary linguistics, automatic metaphor identification, computational linguistics, large language model

Related Publication

"Integrating approaches to the role of metaphor in the evolutionary dynamics of language", doi: https://doi.org/10.31234/osf.io/m56cn

License

CC0 Creative Commons Zero 1.0

Files

readme.txt - Plain Text - 3.3 KB - Oct 16, 2024 - 2 Downloads - MD5: 628575d37dd361859906f6bfed85fd95 - License: CC0 Creative Commons Zero 1.0

Dataset.zip - ZIP Archive - 287.3 KB - Oct 16, 2024 - 4 Downloads - MD5: 09777ed64f5d3e58533f3d3845296f5f - License: CC0 Creative Commons Zero 1.0

Code.zip - ZIP Archive - 5.1 KB - Oct 16, 2024 - 4 Downloads - MD5: e66ec5d5bb4c8b110adffe6f20019818 - License: CC0 Creative Commons Zero 1.0
