Tagsets: Penn

The Penn tagset is widely used, although it is often modified to some extent (for example, in the TreeTagger tagset). Detailed guidelines describe how the tagset should be applied (Santorini 1991). There are two versions of the tagset: the original, and a version for the Penn Treebank (a corpus that includes information about grammatical structure). In the latter version, the tags NP (proper noun) and PP (personal pronoun), together with the related tags NPS and PP$, were renamed to distinguish them from the grammatical labels NP (noun phrase) and PP (prepositional phrase).

Original  Treebank  Description
CC        CC        Coordinating conjunction
CD        CD        Cardinal number
DT        DT        Determiner
EX        EX        Existential there
FW        FW        Foreign word
IN        IN        Preposition or subordinating conjunction
JJ        JJ        Adjective
JJR       JJR       Adjective, comparative
JJS       JJS       Adjective, superlative
LS        LS        List item marker
MD        MD        Modal
NN        NN        Noun, singular or mass
NNS       NNS       Noun, plural
NP        NNP       Proper noun, singular
NPS       NNPS      Proper noun, plural
PDT       PDT       Predeterminer
POS       POS       Possessive ending
PP        PRP       Personal pronoun
PP$       PRP$      Possessive pronoun
RB        RB        Adverb
RBR       RBR       Adverb, comparative
RBS       RBS       Adverb, superlative
RP        RP        Particle
SYM       SYM       Symbol
TO        TO        to
UH        UH        Interjection
VB        VB        Verb, base form
VBD       VBD       Verb, past tense
VBG       VBG       Verb, gerund or present participle
VBN       VBN       Verb, past participle
VBP       VBP       Verb, non-3rd person singular present
VBZ       VBZ       Verb, 3rd person singular present
WDT       WDT       Wh-determiner
WP        WP        Wh-pronoun
WP$       WP$       Possessive wh-pronoun
WRB       WRB       Wh-adverb
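
Only four tags differ between the two versions (NP, NPS, PP and PP$), so converting data tagged with the original tagset to the Treebank version amounts to a small lookup. The following is a minimal sketch in Python, not part of any tagger or corpus toolkit; the names ORIGINAL_TO_TREEBANK, to_treebank and convert_tagged are hypothetical and chosen only for illustration, and the input is assumed to be a list of (word, tag) pairs.

```python
# Tags that differ between the original tagset and the Penn Treebank version,
# as listed in the table. All other tags are the same in both versions.
ORIGINAL_TO_TREEBANK = {
    "NP": "NNP",    # proper noun, singular
    "NPS": "NNPS",  # proper noun, plural
    "PP": "PRP",    # personal pronoun
    "PP$": "PRP$",  # possessive pronoun
}


def to_treebank(tag):
    """Return the Treebank form of an original-version tag (hypothetical helper)."""
    # Tags without an entry in the mapping are returned unchanged.
    return ORIGINAL_TO_TREEBANK.get(tag, tag)


def convert_tagged(tokens):
    """Convert (word, tag) pairs from the original tagset to the Treebank version."""
    return [(word, to_treebank(tag)) for word, tag in tokens]


# Example input is invented for illustration.
tagged = [("She", "PP"), ("visited", "VBD"), ("Paris", "NP")]
print(convert_tagged(tagged))
# prints [('She', 'PRP'), ('visited', 'VBD'), ('Paris', 'NNP')]
```

The mapping should only be applied to word-level tags. In Treebank data, NP and PP also occur as phrase labels, so applying the same lookup to structural labels would wrongly rename noun phrases and prepositional phrases.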

Reference

Beatrice Santorini. 1991. Part-of-speech tagging guidelines for the Penn Treebank Project (3rd Revision). Technical Report No. MS-CIS-90-47. University of Pennsylvania Department of Computer and Information Science.

corpora/tagset-penn.txt · Last modified: 2022/10/05 10:17 by astefanowitsch