cqp:list-of-coprora

Differences

This shows you the differences between two versions of the page.

cqp:list-of-coprora [2024/10/29 17:10] – [PENN-HELSINKI PARSED CORPUS OF MIDDLE ENGLISH (Version 2)] aamoakuh
cqp:list-of-coprora [2025/02/05 16:51] (current) – [Corpus of English Dialogues 1560-1760] add different documentation link aamoakuh
Line 6: Line 6:
 If you don't have access to CQP just yet, check out the [[inlet:setup|INLET site]] to install the system on your account.
 
-For more information on the [[inlet:overview|INLET system]], visit this site.
+For more information on the INLET system, visit [[inlet:overview|this site]].
 
 For more detailed information on each of these corpora, select the corpus on CQP, type ''info'' and press ''ENTER''
Line 21: Line 21:
 **Text publication dates:** 1960-1993 (split up into 3 periods)
 
-**Tagset:** CLAWS-5
+**Tagset:** [[corpora:tagset-claws5|CLAWS-5]]
 
 **Cite as:**
Line 36: Line 36:
 **Size:** 4,644,834 tokens
 
-**Tagset:** CLAWS-5
+**Tagset:** [[corpora:tagset-claws5|CLAWS-5]]
 
 **Corpus documentation:** http://www.natcorp.ox.ac.uk/corpus/baby/manual.pdf
  
 +
 +===== BNC2014-S =====
 +
 +==== Spoken British National Corpus 2014 ====
 +
 +**Size:** 11,422,615 tokens
 +
 +**Text publication dates**: 2012-2016
 +
 +**Tagset:** [[https://ucrel.lancs.ac.uk/claws6tags.html|CLAWS-6]]
 +
 +**Corpus documentation:** http://corpora.lancs.ac.uk/bnc2014/documentation.php
 +
 +**Cite as**:  Love, Robbie, Claire Dembry, Andrew Hardie, Vaclav Brezina & Tony McEnery. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics 22(3). 319–344. https://doi.org/10.1075/ijcl.22.3.02lov.
 +
 +
 +===== CLMET =====
 +
 +==== CORPUS OF LATE MODERN ENGLISH TEXTS ====
 +
 +**Size**: 40,340,760 tokens
 +
 +**Text publication dates**: 1710-1920 (split up into 3 periods)
 +
 +**Tagset**: [[corpora:tagset-penn|PENN Corpora]]
 +
 +**Corpus documentation**: https://perswww.kuleuven.be/~u0044428/clmet3_0.htm
 +
 +**Cite as**: De Smet, Hendrik, Susanne Flach, Jukka Tyrkkö & Hans-Jürgen Diller. 2015. The Corpus of Late Modern English (CLMET), version 3.1: Improved tokenization and linguistic annotation. KU Leuven, FU Berlin, U Tampere, RU Bochum.
 +
 +
 +===== BROWN-LEGACY  ===== 
 +
 +==== The Standard Corpus of Present-Day Edited American English ====
 +
 +**Size**: 1,137,466 tokens (approx. 1m words)
 +
 +**Text publication dates**: 1961
 +
 +**Corpus documentation**: https://varieng.helsinki.fi/CoRD/corpora/BROWN/index.html
 +
 +**Cite as**: A Standard Corpus of Present-Day Edited American English, for use with Digital Computers (Brown). 1964, 1971, 1979. Compiled by W. N. Francis and H. Kučera. Brown University. Providence, Rhode Island.
 +
 +
 +===== FROWN-LEGACY ===== 
 +
 +==== The Freiburg-Brown corpus of American English ==== 
 +
 +**Size**: 1,180,152 tokens (approx. 1m words)
 +
 +**Text publication dates**: 1992
 +
 +**Corpus documentation**: https://varieng.helsinki.fi/CoRD/corpora/FROWN/index.html
 +
 +**Cite as**: The Freiburg-Brown Corpus (‘Frown’) (POS-tagged version) compiled by Christian Mair, Albert Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster
 +
 +
 +===== LOB-LEGACY ===== 
 +
 +==== The Lancaster-Oslo/Bergen Corpus ====
 +
 +**Size**: 1,157,496 tokens (approx. 1m words)
 +
 +**Text publication dates**: 1961
 +
 +**Corpus documentation**: https://varieng.helsinki.fi/CoRD/corpora/LOB/index.html
 +
 +**Cite as**: The LOB Corpus, POS-tagged version (1981–1986), compiled by Geoffrey Leech, Lancaster University, Stig Johansson, University of Oslo (project leaders), Roger Garside, Lancaster University, and Knut Hofland, University of Bergen (heads of computing).
 +
 +
 +===== FLOB-LEGACY ===== 
 +
 +==== The Freiburg–LOB Corpus of British English ====
 +
 +**Size**: 1,165,747 tokens (approx. 1m words)
 +
 +**Text publication dates**: 1991
 +
 +**Corpus documentation**: https://varieng.helsinki.fi/CoRD/corpora/FLOB/index.html
 +
 +**Cite as**: The Freiburg-LOB Corpus (‘F-LOB’) (POS-tagged version) compiled by Christian Mair, Albert Ludwigs-Universität Freiburg, and Geoffrey Leech, University of Lancaster
  
  
Line 55: Line 136:
 **Cite as:**
 Granger, Sylviane, Estelle Dagneaux & Fanny Meunier. 2002. //International Corpus of Learner English (ICLE)//. Louvain: Presses Universitaires de Louvain.
 +
 
 ===== COCA-S  =====
Line 64: Line 146:
 **Text publication dates:** 1990-2012
 
-**Tagset:** CLAWS-7
+**Tagset:** [[corpora:tagset-claws7-coxa|CLAWS-7]]
 
 **Corpus documentation:** http://corpus.byu.edu/coca
Line 78: Line 160:
 **Size:** 471,427,380 tokens (400m words)
 
-**Tagset:** CLAWS-7
+**Tagset:** [[corpora:tagset-claws7-coxa|CLAWS-7]]
 
 **Corpus documentation:** http://corpus.byu.edu/coha/
Line 96: Line 178:
 **Text publication dates:** 1150-1500 (split up into 9 periods)
 
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
 
-**Corpus documentation:** http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/
+**Corpus documentation:** https://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-4/index.html , https://github.com/beatrice57/ppche-2024/tree/main/PPCME2-RELEASE-5/docs
 
 **Cite as:**
Line 108: Line 190:
 ==== PENN-HELSINKI PARSED CORPUS OF EARLY MODERN ENGLISH ====
 
-**Size:** 1,968,483
+**Size:** 1,968,483 tokens
 
 **Text publication dates:** 1500-1710
 
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
 
-**Corpus documentation:** http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-2/
+**Corpus documentation:** https://github.com/beatrice57/ppche-2024/tree/main/PPCEME-RELEASE-4/docs , https://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-3/index.html
 
 **Cite as:**
Line 129: Line 211:
 **Text publication dates:** 1700-1914
 
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
 
-**Corpus documentation:** http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/
+**Corpus documentation:** https://github.com/beatrice57/ppche-2024/tree/main/PPCMBE2-RELEASE-2/docs , https://www.ling.upenn.edu/hist-corpora/PPCMBE2-RELEASE-1/index.html
 
 **Cite as:**
Line 147: Line 229:
 **Text publication dates:** 1350-1710 (split up into 5 periods)
 
-**Tagset:** Penn Corpora
+**Tagset:** [[corpora:tagsets|PENN Corpora]]
 
 **Corpus documentation:** http://www-users.york.ac.uk/~lang22/PCEEC-manual/corpus_description/index.htm
Line 153: Line 235:
 **Cite as:**
 Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.
 +
 +
 +===== CED =====
 +
 +==== Corpus of English Dialogues 1560-1760 ====
 +
 +**Size:** 1,458,700 tokens
 +
 +**Text publication dates:** 1560-1760 (split up into 5 periods) 
 +
 +**Tagset**: untagged
 +
 +**Corpus documentation:** https://data.ldaca.edu.au/collection?id=arcp%3A%2F%2Fname%2Chdl10.26180~23961609&_crateId=arcp%3A%2F%2Fname%2Chdl10.26180~23961609
 +
 +**Cite as:** A Corpus of English Dialogues 1560–1760. 2006. Compiled under the supervision of Merja Kytö (Uppsala University) and Jonathan Culpeper (Lancaster University).
 +
 +
 +===== COOEE =====
 +
 +==== Corpus of Oz Early English ====
 +
 +**Size:** 2,243,235 tokens
 +
 +**Text publication dates:** 1788-1900
 +
 +**Tagset:** [[corpora:tagset-treetagger|TreeTagger]]
 +
 +**Corpus documentation:** https://varieng.helsinki.fi/CoRD/corpora/COOEE/index.html
 +
 +**Cite as:** Fritz, Clemens W. A. 2012. From English in Australia to Australian English: 1788-1900. Frankfurt am Main: Peter Lang.
 +
  
  
 **[ Introduction to CQP: [[cqp:corpus-structure|Section 1]] -- [[cqp:simple-queries|Section 2]] -- [[cqp:advanced-querying|Section 3]] -- [[cqp:beyond-queries|Section 4]] -- [[cqp:expert-tricks|Section 5]] -- [[cqp:exercises|Section 6]]  -- [[cqp:list-of-coprora|Section 7]] ]**
  
cqp/list-of-coprora.1730218208.txt.gz · Last modified: 2024/10/29 17:10 by aamoakuh
