====== Tagsets ======

Many corpora are annotated for word class -- for every word form in the corpus, there is a “pos” tag describing what part of speech it is (see [[cqp:corpus-structure|Corpus Structure]] for more information).

There is no generally agreed-upon set of word classes and no generally agreed-upon way of referring to word classes -- different corpora use different sets of tags. You therefore always have to check the ''info'' file or the manual of a corpus before constructing your queries.

Some widely used tagsets for English are the following:

  * [[corpora:tagset-penn|The Penn tagset]]
  * [[corpora:tagset-penn-historical|The Historical English Penn Treebank tagset]]
  * [[corpora:tagset-treetagger|The TreeTagger tagset]]
  * [[corpora:tagset-claws5|The CLAWS 5 tagset]], used, for example, for the British National Corpus
  * [[corpora:tagset-claws7|The CLAWS 7 tagset]], used, for example, in the BNC 2014
  * [[corpora:tagset-claws7-coxa|The CLAWS 7 tagset (COCA/COHA version)]], a variant of CLAWS 7 used in the Corpus of Contemporary American English and the Corpus of Historical American English

Most German corpora use (some version of) the STTS tagset:

  * [[corpora:tagset-stts-original|The original STTS tagset]]
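
To illustrate why the tagset matters when writing queries, consider a search for singular common nouns. The sketch below assumes that the corpus stores its tags in a positional attribute called ''pos'', as described at the top of this page. The attribute name and the exact tags may differ in a given corpus, so check its ''info'' file first. The first query uses the Penn tag ''NN'', the second uses the CLAWS 5 / CLAWS 7 tag ''NN1'', and the third uses a regular expression to match every tag that begins with ''NN''.

<code>
[pos="NN"];
[pos="NN1"];
[pos="NN.*"];
</code>

Note that the same pattern can select different things depending on the tagset: in a Penn-tagged corpus, ''NN.*'' also matches plural and proper nouns (''NNS'', ''NNP'', ''NNPS''), so it is worth checking the tagset page before relying on a regular expression.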