cqp:list-of-coprora
Differences
This shows you the differences between two versions of the page.
| cqp:list-of-coprora [2024/10/29 17:10] – [PENN-HELSINKI PARSED CORPUS OF EARLY MODERN ENGLISH] aamoakuh | cqp:list-of-coprora [2025/05/15 00:52] (current) – fix number of tokens aamoakuh | ||
|---|---|---|---|
| Line 6: | Line 6: | ||
| If you don't have access to CQP just yet, check out the [[inlet: | If you don't have access to CQP just yet, check out the [[inlet: | ||
| - | For more information on the [[inlet: | + | For more information on the INLET system, visit [[inlet: |
| For more detailed information on each of these corpora, select the corpus on CQP, type '' | For more detailed information on each of these corpora, select the corpus on CQP, type '' | ||
| Line 21: | Line 21: | ||
| **Text publication dates:** 1960-1993 (split up into 3 periods) | **Text publication dates:** 1960-1993 (split up into 3 periods) | ||
| - | **Tagset:** CLAWS-5 | + | **Tagset: |
| **Cite as:** | **Cite as:** | ||
| Line 36: | Line 36: | ||
| **Size:** 4,644,834 tokens | **Size:** 4,644,834 tokens | ||
| - | **Tagset:** CLAWS-5 | + | **Tagset: |
| **Corpus documentation: | **Corpus documentation: | ||
| + | |||
| + | ===== BNC2014-S ===== | ||
| + | |||
| + | ==== Spoken British National Corpus 2014 ==== | ||
| + | |||
| + | **Size:** 11,422,615 tokens | ||
| + | |||
| + | **Text publication dates:** 2012-2016 | ||
| + | |||
| + | **Tagset:** [[https:// | ||
| + | |||
| + | **Corpus documentation: | ||
| + | |||
| + | **Cite as:** Love, Robbie, Claire Dembry, Andrew Hardie, Václav Brezina & Tony McEnery. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22(3). 319–344. https:// | ||
| + | |||
| + | |||
| + | ===== CLMET ===== | ||
| + | |||
| + | ==== CORPUS OF LATE MODERN ENGLISH TEXTS ==== | ||
| + | |||
| + | **Size:** 40,340,760 tokens | ||
| + | ||
| + | **Text publication dates:** 1710-1920 (split up into 3 periods) | ||
| + | ||
| + | **Tagset:** [[corpora: | ||
| + | ||
| + | **Corpus documentation:** | ||
| + | |||
| + | **Cite as:** De Smet, Hendrik, Susanne Flach, Jukka Tyrkkö & Hans-Jürgen Diller. 2015. The Corpus of Late Modern English (CLMET), version 3.1: Improved tokenization and linguistic annotation. KU Leuven, FU Berlin, U Tampere, RU Bochum. | ||
| + | |||
| + | |||
| + | ===== BROWN-LEGACY | ||
| + | |||
| + | ==== The Standard Corpus of Present-Day Edited American English ==== | ||
| + | |||
| + | **Size:** 1,137,466 tokens (approx. 1m words) | ||
| + | ||
| + | **Text publication dates:** 1961 | ||
| + | ||
| + | **Corpus documentation:** | ||
| + | ||
| + | **Cite as:** A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island. | ||
| + | |||
| + | |||
| + | ===== FROWN-LEGACY ===== | ||
| + | |||
| + | ==== The Freiburg-Brown corpus of American English ==== | ||
| + | |||
| + | **Size:** 1,180,152 tokens (approx. 1m words) | ||
| + | ||
| + | **Text publication dates:** 1992 | ||
| + | ||
| + | **Corpus documentation:** | ||
| + | ||
| + | **Cite as:** The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster. | ||
| + | |||
| + | |||
| + | ===== LOB-LEGACY ===== | ||
| + | |||
| + | ==== The Lancaster-Oslo/ | ||
| + | |||
| + | **Size:** 1,157,496 tokens (approx. 1m words) | ||
| + | ||
| + | **Text publication dates:** 1961 | ||
| + | ||
| + | **Corpus documentation:** | ||
| + | ||
| + | **Cite as:** The LOB Corpus, POS-tagged version (1981–1986), | ||
| + | |||
| + | |||
| + | ===== FLOB-LEGACY ===== | ||
| + | |||
| + | ==== The Freiburg-LOB Corpus of British English ==== | ||
| + | |||
| + | **Size:** 1,165,747 tokens (approx. 1m words) | ||
| + | ||
| + | **Text publication dates:** 1991 | ||
| + | ||
| + | **Corpus documentation:** | ||
| + | ||
| + | **Cite as:** The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster. | ||
| Line 55: | Line 136: | ||
| **Cite as:** | **Cite as:** | ||
| Granger, Sylviane, Estelle Dagneaux & Fanny Meunier. 2002. // | Granger, Sylviane, Estelle Dagneaux & Fanny Meunier. 2002. // | ||
| + | |||
| ===== COCA-S | ===== COCA-S | ||
| Line 64: | Line 146: | ||
| **Text publication dates:** 1990-2012 | **Text publication dates:** 1990-2012 | ||
| - | **Tagset:** CLAWS-7 | + | **Tagset: |
| **Corpus documentation: | **Corpus documentation: | ||
| Line 78: | Line 160: | ||
| **Size:** 471,427,380 tokens (400m words) | **Size:** 471,427,380 tokens (400m words) | ||
| - | **Tagset:** CLAWS-7 | + | **Tagset: |
| **Corpus documentation: | **Corpus documentation: | ||
| Line 96: | Line 178: | ||
| **Text publication dates:** 1150-1500 (split up into 9 periods) | **Text publication dates:** 1150-1500 (split up into 9 periods) | ||
| - | **Tagset: | + | **Tagset: |
| - | **Corpus documentation: | + | **Corpus documentation: |
| **Cite as:** | **Cite as:** | ||
| Line 112: | Line 194: | ||
| **Text publication dates:** 1500-1710 | **Text publication dates:** 1500-1710 | ||
| - | **Tagset: | + | **Tagset: |
| - | **Corpus documentation: | + | **Corpus documentation: |
| **Cite as:** | **Cite as:** | ||
| Line 129: | Line 211: | ||
| **Text publication dates:** 1700-1914 | **Text publication dates:** 1700-1914 | ||
| - | **Tagset: | + | **Tagset: |
| - | **Corpus documentation: | + | **Corpus documentation: |
| **Cite as:** | **Cite as:** | ||
| Line 147: | Line 229: | ||
| **Text publication dates:** 1350-1710 (split up into 5 periods) | **Text publication dates:** 1350-1710 (split up into 5 periods) | ||
| - | **Tagset: | + | **Tagset: |
| **Corpus documentation: | **Corpus documentation: | ||
| Line 153: | Line 235: | ||
| **Cite as:** | **Cite as:** | ||
| Parsed Corpus of Early English Correspondence, | Parsed Corpus of Early English Correspondence, | ||
| + | |||
| + | |||
| + | ===== CED ===== | ||
| + | |||
| + | ==== Corpus of English Dialogues 1560-1760 ==== | ||
| + | |||
| + | **Size:** 1,458,700 tokens | ||
| + | |||
| + | **Text publication dates:** 1560-1760 (split up into 5 periods) | ||
| + | |||
| + | **Tagset:** untagged | ||
| + | |||
| + | **Corpus documentation: | ||
| + | |||
| + | **Cite as:** A Corpus of English Dialogues 1560–1760. 2006. Compiled under the supervision of Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University). | ||
| + | |||
| + | |||
| + | ===== COOEE ===== | ||
| + | |||
| + | ==== Corpus of Oz Early English ==== | ||
| + | |||
| + | **Size:** 2,243,235 tokens | ||
| + | |||
| + | **Text publication dates:** 1788-1900 | ||
| + | |||
| + | **Tagset:** [[corpora: | ||
| + | |||
| + | **Corpus documentation: | ||
| + | |||
| + | **Cite as:** Fritz, Clemens W. A. 2012. From English in Australia to Australian English: 1788-1900. Frankfurt am Main: Peter Lang. | ||
| + | |||
| **[ Introduction to CQP: [[cqp: | **[ Introduction to CQP: [[cqp: | ||
cqp/list-of-coprora.1730218226.txt.gz · Last modified: by aamoakuh
