
Tagsets

Many corpora are annotated for word class: every word form in the corpus carries a “pos” tag stating its part of speech (see Corpus Structure for more information).

There is no generally agreed-upon set of word classes and no generally agreed-upon way of referring to them, so different corpora use different sets of tags. Always check the info file or the manual of a corpus before constructing your queries.
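To illustrate why the tagset matters, here is a minimal sketch in Python that builds a “pos” query for common nouns under three different tagsets. The tag values are small excerpts from the Penn Treebank, CLAWS5, and STTS tagsets. The dictionary keys, the function name `pos_query`, and the CQP-style bracket syntax are assumptions made for this example, so check your corpus manual for the tags and query syntax it actually uses.

```python
# Illustrative excerpt only: tags for common nouns in three tagsets.
# The keys ("penn", "claws5", "stts") are hypothetical labels for this sketch.
COMMON_NOUN_TAGS = {
    "penn": ["NN", "NNS"],           # Penn Treebank: singular/mass, plural
    "claws5": ["NN0", "NN1", "NN2"], # CLAWS5: neutral, singular, plural
    "stts": ["NN"],                  # STTS (German): no number distinction
}


def pos_query(tagset: str) -> str:
    """Return a query matching common nouns in the given tagset.

    Assumes a CQP-style syntax in which [pos="..."] takes a regular
    expression; other query tools may use a different notation.
    """
    tags = COMMON_NOUN_TAGS[tagset]  # raises KeyError for unknown tagsets
    return '[pos="' + "|".join(tags) + '"]'


if __name__ == "__main__":
    for name in COMMON_NOUN_TAGS:
        print(name, pos_query(name))
```

The same linguistic category therefore needs a different query string in each corpus, and a query written for one tagset may match nothing, or the wrong words, in another.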

Some widely used tagsets for English are the following:

Most German corpora use (some version of) the STTS tagset:

corpora/tagsets.txt · Last modified: 2024/06/20 13:53 by 127.0.0.1
