Differences

This shows you the differences between two versions of the page.

--- cqp:list-of-coprora [2024/10/29 17:10] – [PENN-HELSINKI PARSED CORPUS OF MIDDLE ENGLISH (Version 2)] aamoakuh
+++ cqp:list-of-coprora [2025/02/05 16:51] (current) – [Corpus of English Dialogues 1560-1760] add different documentation link aamoakuh
@@ Line 6: / Line 6: @@
 If you don't have access to CQP just yet, check out the [[inlet:setup|INLET site]] to install the system on your account.
-For more information on the [[inlet:overview|INLET system]], visit this site.
+For more information on the INLET system, visit [[inlet:overview|this site]].
 For more detailed information on each of these corpora, select the corpus on CQP, type ''info'' and press ''ENTER''.
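As a minimal sketch, a CQP session that does this might look as follows. It assumes the corpus is registered in CQP under the name shown in its heading on this page (''CLMET'' is used here only as an example). Typing a corpus name activates that corpus, and ''info'' then prints its description. Each command is terminated with a semicolon, which is CQP's command terminator.

<code>
CLMET;
info;
</code>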
@@ Line 21: / Line 21: @@
 **Text publication dates:** 1960-1993 (split up into 3 periods)
-**Tagset:** CLAWS-5
+**Tagset:** [[corpora:tagset-claws5|CLAWS-5]]
 **Cite as:**
@@ Line 36: / Line 36: @@
 **Size:** 4,644,834 tokens
-**Tagset:** CLAWS-5
+**Tagset:** [[corpora:tagset-claws5|CLAWS-5]]
 **Corpus documentation:** http://www.natcorp.ox.ac.uk/corpus/baby/manual.pdf
+===== BNC2014-S =====
+==== Spoken British National Corpus 2014 ====
+**Size:** 11,422,615 tokens
+**Text publication dates:** 2012-2016
+**Tagset:** [[https://ucrel.lancs.ac.uk/claws6tags.html|CLAWS-6]]
+**Corpus documentation:** http://corpora.lancs.ac.uk/bnc2014/documentation.php
+**Cite as:** Love, Robbie, Claire Dembry, Andrew Hardie, Václav Brezina & Tony McEnery. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22(3). 319–344. https://doi.org/10.1075/ijcl.22.3.02lov.
+===== CLMET =====
+==== CORPUS OF LATE MODERN ENGLISH TEXTS ====
+**Size:** 40,340,760 tokens
+**Text publication dates:** 1710-1920 (split up into 3 periods)
+**Tagset:** [[corpora:tagset-penn|PENN Corpora]]
+**Corpus documentation:** https://perswww.kuleuven.be/~u0044428/clmet3_0.htm
+**Cite as:** De Smet, Hendrik, Susanne Flach, Jukka Tyrkkö & Hans-Jürgen Diller. 2015. The Corpus of Late Modern English Texts (CLMET), version 3.1: Improved tokenization and linguistic annotation. KU Leuven, FU Berlin, U Tampere, RU Bochum.
+===== BROWN-LEGACY  =====
+==== A Standard Corpus of Present-Day Edited American English ====
+**Size:** 1,137,466 tokens (approx. 1m words)
+**Text publication dates:** 1961
+**Corpus documentation:** https://varieng.helsinki.fi/CoRD/corpora/BROWN/index.html
+**Cite as:** A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island.
+===== FROWN-LEGACY =====
+==== The Freiburg-Brown corpus of American English ====
+**Size:** 1,180,152 tokens (approx. 1m words)
+**Text publication dates:** 1992
+**Corpus documentation:** https://varieng.helsinki.fi/CoRD/corpora/FROWN/index.html
+**Cite as:** The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster.
+===== LOB-LEGACY =====
+==== The Lancaster-Oslo/Bergen Corpus ====
+**Size:** 1,157,496 tokens (approx. 1m words)
+**Text publication dates:** 1961
+**Corpus documentation:** https://varieng.helsinki.fi/CoRD/corpora/LOB/index.html
+**Cite as:** The LOB Corpus, POS-tagged version (1981–1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing).
+===== FLOB-LEGACY =====
+==== The Freiburg–LOB Corpus of British English ====
+**Size:** 1,165,747 tokens (approx. 1m words)
+**Text publication dates:** 1991
+**Corpus documentation:** https://varieng.helsinki.fi/CoRD/corpora/FLOB/index.html
+**Cite as:** The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version) compiled by Christian Mair, Albert-Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster.
@@ Line 55: / Line 136: @@
 **Cite as:**
 Granger, Sylviane, Estelle Dagneaux & Fanny Meunier. 2002. //International Corpus of Learner English (ICLE)//. Louvain: Presses Universitaires de Louvain.
 ===== COCA-S  =====
@@ Line 64: / Line 146: @@
 **Text publication dates:** 1990-2012
-**Tagset:** CLAWS-7
+**Tagset:** [[corpora:tagset-claws7-coxa|CLAWS-7]]
 **Corpus documentation:** http://corpus.byu.edu/coca
@@ Line 78: / Line 160: @@
 **Size:** 471,427,380 tokens (400m words)
-**Tagset:** CLAWS-7
+**Tagset:** [[corpora:tagset-claws7-coxa|CLAWS-7]]
 **Corpus documentation:** http://corpus.byu.edu/coha/
@@ Line 96: / Line 178: @@
 **Text publication dates:** 1150-1500 (split up into 9 periods)
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
-**Corpus documentation:** http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/
+**Corpus documentation:** https://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-4/index.html, https://github.com/beatrice57/ppche-2024/tree/main/PPCME2-RELEASE-5/docs
 **Cite as:**
@@ Line 108: / Line 190: @@
 ==== PENN-HELSINKI PARSED CORPUS OF EARLY MODERN ENGLISH ====
-**Size:** 1,968,483
+**Size:** 1,968,483 tokens
 **Text publication dates:** 1500-1710
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
-**Corpus documentation:** http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-2/
+**Corpus documentation:** https://github.com/beatrice57/ppche-2024/tree/main/PPCEME-RELEASE-4/docs, https://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-3/index.html
 **Cite as:**
@@ Line 129: / Line 211: @@
 **Text publication dates:** 1700-1914
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
-**Corpus documentation:** http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/
+**Corpus documentation:** https://github.com/beatrice57/ppche-2024/tree/main/PPCMBE2-RELEASE-2/docs, https://www.ling.upenn.edu/hist-corpora/PPCMBE2-RELEASE-1/index.html
 **Cite as:**
@@ Line 147: / Line 229: @@
 **Text publication dates:** 1350-1710 (split up into 5 periods)
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
 **Corpus documentation:** http://www-users.york.ac.uk/~lang22/PCEEC-manual/corpus_description/index.htm
@@ Line 153: / Line 235: @@
 **Cite as:**
 Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.
+===== CED =====
+==== Corpus of English Dialogues 1560-1760 ====
+**Size:** 1,458,700 tokens
+**Text publication dates:** 1560-1760 (split up into 5 periods)
+**Tagset:** untagged
+**Corpus documentation:** https://data.ldaca.edu.au/collection?id=arcp%3A%2F%2Fname%2Chdl10.26180~23961609&_crateId=arcp%3A%2F%2Fname%2Chdl10.26180~23961609
+**Cite as:** A Corpus of English Dialogues 1560-1760. 2006. Compiled under the supervision of Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University).
+===== COOEE =====
+==== Corpus of Oz Early English ====
+**Size:** 2,243,235 tokens
+**Text publication dates:** 1788-1900
+**Tagset:** [[corpora:tagset-treetagger|TreeTagger]]
+**Corpus documentation:** https://varieng.helsinki.fi/CoRD/corpora/COOEE/index.html
+**Cite as:** Fritz, Clemens W. A. 2012. From English in Australia to Australian English: 1788-1900. Frankfurt am Main: Peter Lang.
 **[ Introduction to CQP: [[cqp:corpus-structure|Section 1]] -- [[cqp:simple-queries|Section 2]] -- [[cqp:advanced-querying|Section 3]] -- [[cqp:beyond-queries|Section 4]] -- [[cqp:expert-tricks|Section 5]] -- [[cqp:exercises|Section 6]]  -- [[cqp:list-of-coprora|Section 7]] ]**