Tagsets

Many corpora are annotated for word class: every word form in the corpus carries a “pos” tag identifying its part of speech (see Corpus Structure for more information).

There is no generally agreed-upon set of word classes and no generally agreed-upon way of referring to them, so different corpora use different sets of tags. You therefore always have to check the info file or the manual of a corpus before constructing your queries.
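To illustrate why this matters, here is a minimal Python sketch that builds a "match any noun" query for three tagsets. The tag lists are only the noun tags of the Penn Treebank, CLAWS5 and STTS tagsets. The helper name `noun_query` and the CQP-style query syntax are assumptions made for illustration, so check both the tags and the query syntax against the manual of the corpus you are actually using.

```python
# Noun tags in three tagsets (illustrative; verify against the corpus manual).
NOUN_TAGS = {
    "penn": ["NN", "NNS", "NNP", "NNPS"],    # Penn Treebank (English)
    "claws5": ["NN0", "NN1", "NN2", "NP0"],  # CLAWS5, used in the BNC (English)
    "stts": ["NN", "NE"],                    # STTS (German)
}


def noun_query(tagset: str) -> str:
    """Build a CQP-style query matching any noun token in the given tagset.

    Hypothetical helper: assumes the corpus stores its tags in an
    attribute called "pos" and accepts regular expressions as values.
    """
    tags = NOUN_TAGS[tagset]
    return '[pos="' + "|".join(tags) + '"]'


if __name__ == "__main__":
    for name in NOUN_TAGS:
        print(name, noun_query(name))
```

The same linguistic question ("find all nouns") yields a different query string for each tagset, and a tag such as NN covers only singular common nouns in the Penn Treebank tagset but all common nouns in STTS.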

Some widely used tagsets for English are the following:

Most German corpora use (some version of) the STTS tagset:

corpora/tagsets.txt · Last modified: 2024/01/16 15:31 by astefanowitsch