[ Collection: Introduction to CQP ]
7. Available Corpora
Once you've set up access to CQP on the university server (that is, the INLET corpus system), you'll have a range of corpora at your disposal. This list introduces some of them that may be of interest to you. If you don't have access to CQP yet, check out the INLET site to install the system on your account.
For more information on the INLET system, visit this site.
For more detailed information on any of these corpora, select the corpus in CQP, type info and press ENTER.
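As a rough illustration of the step above, a CQP session might look like the following sketch. It assumes the corpus is registered under the ID BNC on the INLET server: `show corpora;` lists the corpus IDs actually available on your account, `BNC;` activates the corpus, and `info;` then displays its description. The prompts (`[no corpus]>`, `BNC>`) are what a standard CQP installation typically shows and may differ slightly on your setup; commands are written with the terminating semicolon that CQP generally expects.

```
[no corpus]> show corpora;
[no corpus]> BNC;
BNC> info;
```

The same pattern should work for the other corpora below by substituting the ID listed by `show corpora;`.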
BNC
BRITISH NATIONAL CORPUS
Size: 112,156,361 tokens
Text publication dates: 1960-1993 (split up into 3 periods)
Tagset: CLAWS-5
Cite as: BNC Consortium. 2007. The British National Corpus, version 3 (BNC XML Edition). Oxford: Bodleian Libraries, University of Oxford. URL: http://www.natcorp.ox.ac.uk/
Corpus documentation: http://www.natcorp.ox.ac.uk/
BNC-BABY
BRITISH NATIONAL CORPUS (a smaller version)
Size: 4,644,834 tokens
Tagset: CLAWS-5
Corpus documentation: http://www.natcorp.ox.ac.uk/corpus/baby/manual.pdf
ICLE
INTERNATIONAL CORPUS OF LEARNER ENGLISH
Size: 2,518,276 tokens
Authors' first languages: Bulgarian, Czech, Dutch (Netherlands), Dutch (Belgium), French, German, Italian, Polish, Russian, etc.
Corpus documentation: https://uclouvain.be/en/research-institutes/ilc/cecl/icle.html
Cite as: Granger, Sylviane, Estelle Dagneaux & Fanny Meunier. 2002. International Corpus of Learner English (ICLE). Louvain: Presses Universitaires de Louvain.
COCA-S
CORPUS OF CONTEMPORARY AMERICAN ENGLISH (COCA)
Size: 542,341,719 tokens (440m words)
Text publication dates: 1990-2012
Tagset: CLAWS-7
Corpus documentation: http://corpus.byu.edu/coca
Cite as: Davies, Mark. 2008. The Corpus of Contemporary American English: 450 Million Words, 1990-2012. http://corpus.byu.edu/coca.
COHA-S
CORPUS OF HISTORICAL AMERICAN ENGLISH (COHA)
Size: 471,427,380 tokens (400m words)
Text publication dates: 1810-2009
Tagset: CLAWS-7
Corpus documentation: http://corpus.byu.edu/coha/
Cite as: Davies, Mark. 2010. The Corpus of Historical American English: 400 million words, 1810-2009. http://corpus.byu.edu/coha/.
PPCME2
PENN-HELSINKI PARSED CORPUS OF MIDDLE ENGLISH (Version 2)
Size: 1,354,926 tokens
Text publication dates: 1150-1500 (split up into 9 periods)
Tagset: Penn Corpora
Corpus documentation: http://www.ling.upenn.edu/hist-corpora/PPCME2-RELEASE-3/
Cite as: Kroch, Anthony & Ann Taylor. 2000. The Penn-Helsinki Parsed Corpus of Middle English (PPCME2). Department of Linguistics, University of Pennsylvania. CD-ROM, second edition. http://www.ling.upenn.edu/hist-corpora/.
PPCEME
PENN-HELSINKI PARSED CORPUS OF EARLY MODERN ENGLISH
Size: 1,968,483 tokens
Text publication dates: 1500-1710
Tagset: Penn Corpora
Corpus documentation: http://www.ling.upenn.edu/hist-corpora/PPCEME-RELEASE-2/
Cite as: Kroch, Anthony, Beatrice Santorini & Lauren Delfs. 2004. The Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). Department of Linguistics, University of Pennsylvania. CD-ROM, first edition. http://www.ling.upenn.edu/hist-corpora/.
PPCMBE
PENN-HELSINKI PARSED CORPUS OF MODERN BRITISH ENGLISH
Size: 1,095,044 tokens
Text publication dates: 1700-1914
Tagset: Penn Corpora
Corpus documentation: http://www.ling.upenn.edu/hist-corpora/PPCMBE-RELEASE-1/
Cite as: Kroch, Anthony, Beatrice Santorini & Lauren Delfs. 2010. The Penn-Helsinki Parsed Corpus of Modern British English (PPCMBE). Department of Linguistics, University of Pennsylvania. CD-ROM, first edition. http://www.ling.upenn.edu/hist-corpora/.
PPCEEC
PARSED CORPUS OF EARLY ENGLISH CORRESPONDENCE
Size: 2,371,920 tokens
Text publication dates: 1350-1710 (split up into 5 periods)
Tagset: Penn Corpora
Corpus documentation: http://www-users.york.ac.uk/~lang22/PCEEC-manual/corpus_description/index.htm
Cite as: Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York and Helsinki: University of Helsinki. Distributed through the Oxford Text Archive.