Data set for the paper "Turing's Conceptual Engineering", by Marcin Miłkowski, as appearing in Philosophies, 2022, 7(3), 69, available at https://doi.org/10.3390/philosophies7030069
Section for Logic & Cognitive Science
Institute of Philosophy and Sociology
Polish Academy of Sciences
Generated by Marcin Miłkowski (2022) using SketchEngine and gensim from a corpus of Alan Turing's published writings and correspondence, as contained in 'The Essential Turing' [1] and 'Collected Works of A.M. Turing' [2, 3]. A detailed description of the corpus construction follows below.
The source corpus contained the following papers, in alphabetical order:
1. A Diffusion Reaction Theory of Morphogenesis in Plants (1952, with C.W. Wardlaw), from [3]
2. Can Automatic Calculating Machines Be Said To Think? (1952), Radio Discussion including Alan Turing, Richard Braithwaite, Geoffrey Jefferson, and Max Newman, from [1]
3. Can Digital Computers Think? (1951), from [1]
4. Checking a Large Routine (1949), from [2]
5. Chemical Basis of Morphogenesis (1952), from [1]
6. Chess (1951), from [1]
7. Computing Machinery and Intelligence (1950), from [1]
8. Intelligent Machinery (1948), from [1]
9. Intelligent Machinery, A Heretical Theory (circa 1951), from [1]
10. Lecture on the Automatic Computing Engine (1947),
11. Lecture to the London Mathematical Society on 20 February 1947 (1947), from [2]
12. Excerpts from correspondence (1936-1938), from [1]
13. Letters on Logic to Max Newman (circa 1940), from [1]
14. Letter to W. Ross Ashby (circa 1947), from [2]
15. Letter to Winston Churchill (1941), from [1]
16. Memorandum to OP-20-G on Naval Enigma, Chapter 1, from [1], as well as all available textual excerpts from Chapters 5, 6, and 7, from [4]
17. Morphogen Theory of Phyllotaxis (1952), from [3]
18. On Computable Numbers, with an Application to the Entscheidungsproblem (1936), from [1]
19. On Computable Numbers. A Correction (1937), from [1]
20. Outline of the Development of the Daisy (1952), from [3]
21. Proposals for Development in the Mathematics Division of an Automatic Computing Engine (ACE) (1945), from [2]
22. Solvable and Unsolvable Problems (1954), from [1]
23. Systems of Logic Based on Ordinals (1938), from [1]
The corpus does not contain Turing's papers in pure mathematics or some of his work in logic, mostly because these are available only as scans, on which the available OCR software performs poorly. Moreover, textual analysis of mathematical notation seems to be of little value. Papers from [1] and [4] were already available in edited and digitized form in Oxford Scholarship Online and were copied from the XHTML version to UTF-8 text files, whereas papers from [2] and [3] were scanned and OCRed using tesseract v5.0.0-alpha.20201127 in the default LSTM setting for English. The resulting OCRed files were minimally edited by hand to remove page numbers, editorial note numbers, page headers, and hyphenation.
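The OCR cleanup described above was done by hand. As a rough illustration only, part of it could be approximated automatically as sketched here. The function name clean_ocr_text and both heuristics (a digits-only line is a page number; a hyphen at a line end followed by a lowercase letter is a split word) are assumptions made for this example and are not taken from the original workflow.

```python
import re


def clean_ocr_text(text: str) -> str:
    """Approximate two of the manual cleanup steps on OCR output."""
    kept = []
    for line in text.splitlines():
        # Assumption: a line that contains only digits is a page number.
        if re.fullmatch(r"\s*\d+\s*", line):
            continue
        kept.append(line.rstrip())
    joined = "\n".join(kept)
    # Rejoin words hyphenated across a line break ("compu-\nting" -> "computing").
    # Assumption: only lowercase continuations count as split words.
    return re.sub(r"(\w)-\n([a-z])", r"\1\2", joined)


if __name__ == "__main__":
    sample = "The compu-\nting machine\n12\nis universal."
    print(clean_ocr_text(sample))
```

A heuristic like this would also merge genuine compounds that happen to break at a line end (for example "well-" followed by "known"), and it does not handle page headers or editorial note numbers, so its output would still need to be checked by hand.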
The source corpus will be made available in 2025, when Alan Turing's work enters the public domain.
The provided data set includes 7 data files that correspond to figures in the paper:
1. thesaurus intelligence.csv - terms semantically related to "intelligence"
2. thesaurus intelligent.csv - terms semantically related to "intelligent"
3. thesaurus mind.csv - terms semantically related to "mind"
4. thesaurus thinking.csv - terms semantically related to "thinking"
5. thesaurus definition.csv - terms semantically related to "definition"
6. machine_modifiers.csv - modifiers of "machine" as found in the word sketch for "machine"
7. word sketch brain mind.csv - word sketch difference data (the contrasted terms are "brain" and "mind")
The data was produced using SketchEngine. To reproduce:
- To reproduce the data sets in 1-5, load the corpus file in SketchEngine and enter the term of interest on the Thesaurus tab, using the default settings for English.
- To reproduce the data in 6, enter "machine" on the Word Sketch tab.
- To reproduce the data in 7, enter the terms of interest ("brain" and "mind") on the Word Sketch Difference tab.
After entering the term, click GO. The data will be displayed in tabular form, which can be visualized or saved in CSV format.
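A saved CSV file can then be inspected outside SketchEngine. The following is a minimal sketch, assuming the file is a plain comma-separated table whose first non-empty row holds the column names. The column layout of the exported files is not documented here, so the code does not depend on any particular column name. The helper names load_table and load_csv are hypothetical.

```python
import csv


def load_table(lines):
    """Parse CSV lines into a list of dicts keyed by the first non-empty row."""
    rows = [row for row in csv.reader(lines) if any(cell.strip() for cell in row)]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def load_csv(path):
    """Read one exported file; utf-8-sig also tolerates a leading byte-order mark."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return load_table(handle)


if __name__ == "__main__":
    # File name as listed in this data set; print only the first few rows.
    for record in load_csv("thesaurus intelligence.csv")[:5]:
        print(record)
```

If an exported file begins with descriptive lines before the actual table, those lines would need to be skipped before calling load_table.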
Additionally, the data set contains two files:
8. word2vec.py - a Python script (requires Python 3 and gensim) to produce a word2vec model
9. Turing.model - word2vec word embedding model created from the source corpus.
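For orientation, the sketch below shows one way such a model could be trained and queried with gensim. It is not a copy of word2vec.py. It assumes gensim 4.x, a directory of UTF-8 .txt files, a simple lowercase tokenizer, the library's default hyperparameters, and that Turing.model was written with the model's save() method. The function names tokenize, read_sentences, train, and neighbours are hypothetical.

```python
import re
from pathlib import Path


def tokenize(line):
    """Lowercase a line and keep alphabetic tokens only (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", line.lower())


def read_sentences(corpus_dir):
    """Yield one token list per non-empty line of every .txt file in corpus_dir."""
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            tokens = tokenize(line)
            if tokens:
                yield tokens


def train(corpus_dir, out_path="Turing.model"):
    """Train a word2vec model with gensim defaults and save it to out_path."""
    from gensim.models import Word2Vec  # imported lazily; requires gensim

    # Library defaults are used here; word2vec.py may use other settings.
    model = Word2Vec(sentences=list(read_sentences(corpus_dir)))
    model.save(out_path)
    return model


def neighbours(term, model_path="Turing.model", topn=10):
    """Return the topn nearest terms to `term` as (word, cosine similarity) pairs."""
    from gensim.models import Word2Vec  # imported lazily; requires gensim

    model = Word2Vec.load(model_path)
    return model.wv.most_similar(term, topn=topn)
```

With gensim installed and Turing.model in the working directory, neighbours("intelligence") would list the nearest terms in the embedding space. These need not match the SketchEngine thesaurus files, which are computed by a different method.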
References
[1] Turing, A. The Essential Turing. Seminal Writings in Computing, Logic, Philosophy, Artificial Intelligence, and Artificial Life plus The Secrets of Enigma; Copeland, B.J., Ed.; Oxford University Press: Oxford, 2004; ISBN 0-19-825079-7.
[2] Turing, A. Mechanical Intelligence; Ince, D., Ed.; Collected works of A.M. Turing; North-Holland: Amsterdam; New York, NY, U.S.A., 1992; ISBN 978-0-444-88058-1.
[3] Turing, A. Morphogenesis; Saunders, P.T., Ed.; Collected works of A.M. Turing; North-Holland: Amsterdam; New York, NY, U.S.A., 1992; ISBN 978-0-444-88486-2.
[4] Turing's Treatise on Enigma, http://www.ellsbury.com/profsbk/profsbk-080.htm (manually transcribed from archives held in the United Kingdom, The National Archives, Kew, Richmond, Surrey, TW9 4DU.)
The files are being made available under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).