Many corpora are annotated for word class – for every word form in the corpus, there is a “pos” tag describing what part of speech it is (see Corpus Structure for more information).
There is no generally agreed-upon set of word classes and no generally agreed-upon way of referring to word classes – different corpora use different sets of tags. Thus, you always have to check the info file or the manual of a corpus in order to construct your queries.
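The following is a minimal, hypothetical Python sketch of why the tagset matters. It is not the query syntax of any particular corpus tool. The toy corpora, the dictionary `NOUN_TAGS` and the function `find_nouns` are illustrative assumptions. The English tokens use Penn Treebank tags and the German tokens use STTS tags. The same goal ("find all nouns") therefore needs a different set of tags for each corpus, and a naive query for `NN` alone silently misses English plurals and proper nouns.

```python
# Hypothetical toy corpora: each token is a (word form, pos tag) pair,
# mirroring a corpus in which every word form carries a "pos" attribute.
english = [("The", "DT"), ("dogs", "NNS"), ("bark", "VBP"), ("at", "IN"), ("Mary", "NNP")]  # Penn Treebank tags
german = [("Die", "ART"), ("Hunde", "NN"), ("bellen", "VVFIN"), ("Maria", "NE"), ("an", "PTKVZ")]  # STTS tags

# Assumed, simplified mapping from a tagset to the tags that count as nouns.
NOUN_TAGS = {
    "penn": {"NN", "NNS", "NNP", "NNPS"},  # common/proper nouns, singular/plural
    "stts": {"NN", "NE"},                  # common nouns, proper nouns
}

def find_nouns(tokens, tagset):
    """Return the word forms whose pos tag is a noun tag in the given tagset."""
    return [word for word, pos in tokens if pos in NOUN_TAGS[tagset]]

# A naive query that looks only for the tag "NN":
naive_english = [word for word, pos in english if pos == "NN"]
naive_german = [word for word, pos in german if pos == "NN"]

print(find_nouns(english, "penn"))  # ['dogs', 'Mary']
print(find_nouns(german, "stts"))   # ['Hunde', 'Maria']
print(naive_english)                # [] : "dogs" is NNS and "Mary" is NNP
print(naive_german)                 # ['Hunde'] : "Maria" is NE
```

The point of the sketch is the lookup table. The list of tags that make up "noun" has to come from the corpus documentation, not from guesswork.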
Some widely used tagsets for English are the following:
Most German corpora use (some version of) the STTS tagset: