====== Tagsets: CLAWS 7 (COCA/COHA) ======

The CLAWS 7 tagset as used in the Corpus of Historical American English (COHA) and the Corpus of Contemporary American English (COCA) differs from the standard CLAWS 7 tagset in a few additions (''OM'', ''y'', ''zzc'' and ''zzq'') and several systematic errors. In addition, all tags except for one (''OM'') are in lowercase! For the standard CLAWS 7 tagset, see the page [[corpora:tagset-claws7|CLAWS 7 tagset]].

^ Tag ^ Description ^
| appge | possessive pronoun, pre-nominal (//my//, //your//, //his//, //her//, //its//, //our//, //their//) |
| at | article (e.g. //the//, //no//) |
| at1 | singular article (e.g. //a//, //an//, //every//) |
| bcl | before-clause marker (e.g. //in order (that)//, //in order (to)//; see the comment about "ditto tags" at the end of this page!) |
| cc | coordinating conjunction (e.g. //and//, //or//) |
| ccb | adversative coordinating conjunction (//but//) |
| cs | subordinating conjunction (e.g. //if//, //because//, //unless//, //so//, //for//) |
| csa | //as// (when used as a conjunction) |
| csn | //than// (when used as a conjunction) |
| cst | //that// (when used as a conjunction) |
| csw | //whether// (when used as a conjunction) |
| da | after-determiner or post-determiner capable of pronominal function (e.g. //such//, //former//, //same//) |
| da1 | singular after-determiner (e.g. //little//, //much//) |
| da2 | plural after-determiner (e.g. //few//, //several//, //many//) |
| dar | comparative after-determiner (e.g. //more//, //less//, //fewer//) |
| dat | superlative after-determiner (e.g. //most//, //least//, //fewest//) |
| db | before-determiner or pre-determiner capable of pronominal function (//all//, //half//) |
| db2 | plural before-determiner (//both//) |
| dd | determiner (capable of pronominal function) (e.g. //any//, //some//) |
| dd1 | singular determiner (e.g. //this//, //that//, //another//) |
| dd2 | plural determiner (//these//, //those//) |
| ddq | wh-determiner (//which//, //what//) |
| ddqge | wh-determiner, genitive (//whose//) |
| ddqv | wh-ever determiner (//whichever//, //whatever//) |
| ex | existential //there// |
| fo | formula |
| fu | unclassified word |
| fw | foreign word (e.g. //de//, //la//, //aqua//, //chakra//) |
| ge | Germanic genitive marker (//'// or //'s//) |
| if | //for// (when used as a preposition) |
| ii | general preposition (all prepositions except //for//, //of//, //with//, //without//) |
| io | //of// (when used as a preposition) |
| iw | //with//, //without// (when used as prepositions) |
| jj | general adjective (e.g. //good//, //nice//, //lovely//, //different//) |
| jjr | general comparative adjective (e.g. //better//, //nicer//) |
| jjt | general superlative adjective (e.g. //best//, //nicest//) |
| jk | catenative adjective (//able// in //be able to//, //willing// in //be willing to//) |
| mc | cardinal number, neutral for number (//two//, //three//, //sixteen//, ...) |
| mc1 | singular cardinal number (//one//) |
| mc2 | plural cardinal number (e.g. //sixes//, //sevens//, //twenties//) |
| mcge | genitive cardinal number, neutral for number (//two's//, //100's//) [**Note:** this tag does not occur in COCA/COHA, presumably because the sequence //'s// has been erroneously split from the stem and misanalyzed as a form of the verb //be// or as a possessive] |
| mcmc | hyphenated number (//5-10//, //1914-1918//) |
| md | ordinal number (e.g. //first//, //second//, //next//, //last//) |
| mf | fraction, neutral for number (e.g. //quarters//, two-//thirds//) |
| nd1 | singular noun of direction (e.g. //north//, //southwest//) |
| nn | common noun, neutral for number (e.g. //people//, //staff//, //tuna//, //aircraft//, //series//, //ethics//) |
| nn1 | singular common noun (e.g. //horse//, //girl//, //love//, //democracy//) |
| nn2 | plural common noun (e.g. //horses//, //girls//, //democracies//) |
| nna | following noun of title (e.g. //M.A.//) |
| nnb | preceding noun of title (e.g. //Mrs.//, //Prof.//) |
| nnl1 | singular locative noun (e.g. //Lake//, //Street//, //Hill//) |
| nnl2 | plural locative noun (e.g. //Lakes//, //Streets//, //Hills//) |
| nno | numeral noun, neutral for number (e.g. //dozen//, //hundred//) |
| nno2 | numeral noun, plural (e.g. //hundreds//, //thousands//) |
| nnt1 | temporal noun, singular (e.g. //day//, //week//, //year//) |
| nnt2 | temporal noun, plural (e.g. //days//, //weeks//, //years//) |
| nnu | unit of measurement, neutral for number (e.g. //mm//, //sec//) |
| nnu1 | singular unit of measurement (e.g. //millimetre//, //second//) |
| nnu2 | plural unit of measurement (e.g. //ins.//, //feet//) |
| np | proper noun, neutral for number (e.g. //Philippines//, //Mercedes//) |
| np1 | singular proper noun (e.g. //Europe//, //BBC//, //Sarah//) |
| np2 | plural proper noun (e.g. //Himalayas//, //Beatles//, //Tudors//) |
| npd1 | singular weekday noun (e.g. //Friday//) |
| npd2 | plural weekday noun (e.g. //Fridays//) |
| npm1 | singular month noun (e.g. //September//) |
| npm2 | plural month noun (e.g. //Septembers//) |
| OM | general tag for omitted words (not part of CLAWS 7) |
| pn | indefinite pronoun, neutral for number (//none//) |
| pn1 | indefinite pronoun, singular (e.g. //anyone//, //everything//, //nobody//, //one//) |
| pnqo | objective wh-pronoun (//whom//) |
| pnqs | subjective wh-pronoun (//who//) |
| pnqv | wh-ever pronoun (//whoever//) |
| pnx1 | reflexive indefinite pronoun (//oneself//) |
| ppge | nominal possessive personal pronoun (e.g. //mine//, //yours//) |
| pph1 | 3rd person sing. neuter personal pronoun (//it//) |
| ppho1 | 3rd person sing. objective personal pronoun (//him//, //her//) |
| ppho2 | 3rd person plural objective personal pronoun (//them//) |
| pphs1 | 3rd person sing. subjective personal pronoun (//he//, //she//) |
| pphs2 | 3rd person plural subjective personal pronoun (//they//) |
| ppio1 | 1st person sing. objective personal pronoun (//me//) |
| ppio2 | 1st person plural objective personal pronoun (//us//) |
| ppis1 | 1st person sing. subjective personal pronoun (//I//) |
| ppis2 | 1st person plural subjective personal pronoun (//we//) |
| ppx1 | singular reflexive personal pronoun (e.g. //yourself//, //itself//) |
| ppx2 | plural reflexive personal pronoun (e.g. //yourselves//, //themselves//) |
| ppy | 2nd person personal pronoun (//you//) |
| ra | adverb, after nominal head (e.g. //ago//, //am//, //pm//) |
| rex | adverb introducing appositional constructions (//namely//, //i.e.//) |
| rg | degree adverb (//very//, //so//, //too//) |
| rgq | wh- degree adverb (//how//) |
| rgqv | wh-ever degree adverb (//however//) |
| rgr | comparative degree adverb (//more//, //less//) |
| rgt | superlative degree adverb (//most//, //least//) |
| rl | locative adverb (e.g. //somewhere//, //forward//, //upstairs//) |
| rp | prepositional adverb, particle (e.g. //up//, //out//, //back//) |
| rpk | prepositional adverb, catenative (e.g. //about// in //be about to//) |
| rr | general adverb (e.g. //just//, //actually//, //always//) |
| rrq | wh- general adverb (//where//, //when//, //why//, //how//) |
| rrqv | wh-ever general adverb (//wherever//, //whenever//) |
| rrr | comparative general adverb (e.g. //more//, //better//, //earlier//) |
| rrt | superlative general adverb (e.g. //most//, //best//, //earliest//) |
| rt | quasi-nominal adverb of time (e.g. //now//, //tomorrow//) |
| to | infinitive marker (//to//) |
| uh | interjection (e.g. //oh//, //yes//, //um//) |
| vb0 | //be//, base form (finite, i.e. imperative, subjunctive) |
| vbdr | //were// |
| vbdz | //was// |
| vbg | //being// |
| vbi | //be//, infinitive (e.g. in //I'll be wrapped around your finger//, //to be honest//) |
| vbm | //am// |
| vbn | //been// |
| vbr | //are// |
| vbz | //is// |
| vd0 | //do//, base form (finite) |
| vdd | //did// |
| vdg | //doing// |
| vdi | //do//, infinitive (e.g. in //I could do...//, //To do...//) |
| vdn | //done// |
| vdz | //does// |
| vh0 | //have//, base form (finite) |
| vhd | //had// (past tense) |
| vhg | //having// |
| vhi | //have//, infinitive |
| vhn | //had// (past participle) |
| vhz | //has// |
| vm | modal auxiliary (//can//, //will//, //would//, etc.) |
| vmk | modal catenative (//ought//, //used//) |
| vv0 | base form of lexical verb (e.g. //say//, //love//) |
| vvd | past tense of lexical verb (e.g. //said//, //loved//) |
| vvg | -ing participle of lexical verb (e.g. //saying//, //loving//) |
| vvgk | -ing participle catenative (//going// in //be going to//) |
| vvi | infinitive of lexical verb (e.g. //to say...//, //I will always love you...//) |
| vvn | past participle of lexical verb (e.g. //given//, //worked//) |
| vvnk | past participle catenative (e.g. //bound// in //be bound to//) |
| vvz | -//s// form of lexical verb (e.g. //says//, //loves//) |
| xx | //not//, //n't// |
| y | major punctuation marks, specifically //: , . " ( ) ? ; !// (not part of CLAWS 7) |
| zz1 | singular letter of the alphabet (e.g. //A//, //b//) |
| zz2 | plural letter of the alphabet (e.g. //A's//, //b's//) |
| zzc, zzq | speaker labels in transcripts of spoken language (not part of CLAWS 7) |

**Errors (COCA)**

In the COCA, some tokens carry malformed tags, in most cases because the last character of the tag is missing. Tokens with such tags will not be found by a standard query for the correct tag:

^ Tag ^ Correct Tag ^ Comment ^
| a | at | occurs only with //ze// as a representation of //the// in a foreign accent |
| c | cs | occurs only with //cept// as a clipped variant of //except// |
| d | dd1 | occurs only with //an-other// as a variant of //another// |
| f | fw | occurs only with //de//, //las//, //dos// |
| j | jj | occurs with 657 types (35663 tokens) |
| m | mc | occurs with 188 types (391150 tokens) |
| m1 | mc1 | occurs with 145 types (63588 tokens) |
| n | nn, nn1, nn2 | occurs with 15 types (1588 tokens) |
| npx | np, np1, np2 | occurs with 243 types (65782 tokens) |
| null | | occurs with tokenization errors where the token was too long for the parser, and with XML entities (approx. 1200 types, 58897 tokens) |
| p | ppho2 | occurs with some cases of //them// and //what-all// |
| ./. | ppy | occurs with some cases of //y'all// |
| vd | vdn | occurs with //don// as a clipped form of //done// (1763 tokens) |
| x | | occurs with various symbols, XML entities, tokenization errors, ... (approx. 5900 types, 895086 tokens) |
| xxy | xx | occurs with tokenization errors involving the word //no// (11 types, 186 tokens) |
| zz | zz1 | occurs with various symbols, abbreviations and tokenization errors (approx. 1200 types, 596697 tokens) |

**Ditto Tags**

The CLAWS 7 tagset uses so-called “ditto” tags for certain sequences of tokens that are analyzed as belonging to a single lexical unit. For example, //in terms of// is analyzed as a preposition (in CLAWS 5, by comparison, it is analyzed as a sequence of a preposition, a noun and another preposition).
In such cases, all words of the sequence are given the same tag (in the case of //in terms of//, the tag ''ii'' for //general preposition//), followed by two digits: the first specifies the length of the sequence, the second the position of the word within the sequence. For example: ''in_ii31 terms_ii32 of_ii33'', ''at_rr21 length_rr22'', ''a_dd21 lot_dd22''. This is unfortunate, as it forces analytical decisions on us that are far from uncontroversial, but we have to live with it!
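The three peculiarities described on this page (lowercase tags, the malformed COCA tags and the two-digit ditto suffixes) all matter when you post-process tagged COCA/COHA data with your own scripts. The following Python snippet is a minimal sketch of such a preprocessing step. It is not part of COCA/COHA or of CLAWS: the names ''COCA_TAG_FIXES'', ''parse_ditto'', ''candidate_tags'' and ''to_standard_claws7'' are hypothetical. It assumes that each tag is available as a separate string (not glued to its word) and that ditto lengths and positions are single digits.

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple

# Malformed COCA tags and their candidate correct tags, transcribed from the
# "Errors (COCA)" table. An empty tuple means the table lists no correct tag
# (these tokens are mostly symbols, XML entities or tokenization errors).
COCA_TAG_FIXES: Dict[str, Tuple[str, ...]] = {
    "a": ("at",),
    "c": ("cs",),
    "d": ("dd1",),
    "f": ("fw",),
    "j": ("jj",),
    "m": ("mc",),
    "m1": ("mc1",),
    "n": ("nn", "nn1", "nn2"),
    "npx": ("np", "np1", "np2"),
    "null": (),
    "p": ("ppho2",),
    "./.": ("ppy",),
    "vd": ("vdn",),
    "x": (),
    "xxy": ("xx",),
    "zz": ("zz1",),
}

# Hypothetical pattern for a ditto tag: a base tag (which may itself end in
# 0, 1 or 2, e.g. "vv0" or "nn1"), then the length of the multi-word unit
# (assumed 2-9) and the position of the word in it (assumed 1-9).
DITTO_RE = re.compile(r"^(?P<base>[a-z]+[0-2]?)(?P<length>[2-9])(?P<pos>[1-9])$")


class ParsedTag(NamedTuple):
    base: str                # tag without ditto digits, e.g. "ii"
    length: Optional[int]    # length of the unit, None for ordinary tags
    position: Optional[int]  # position in the unit, None for ordinary tags


def parse_ditto(tag: str) -> ParsedTag:
    """Split a lowercase tag such as 'ii32' into base tag, length and position."""
    m = DITTO_RE.match(tag)
    if m:
        length, pos = int(m.group("length")), int(m.group("pos"))
        if pos <= length:  # e.g. 'ii34' cannot be a ditto tag
            return ParsedTag(m.group("base"), length, pos)
    return ParsedTag(tag, None, None)


def candidate_tags(tag: str) -> Tuple[str, ...]:
    """Return the correct tag(s) for a malformed COCA tag; other tags are returned unchanged."""
    return COCA_TAG_FIXES.get(tag, (tag,))


def to_standard_claws7(tag: str) -> str:
    """Spell a COCA/COHA tag in the uppercase form used by standard CLAWS 7."""
    return tag.upper()


if __name__ == "__main__":
    # Tags taken from the examples and tables on this page.
    for tag in ["ii31", "ii32", "ii33", "rr21", "nn1", "npx", "j", "x"]:
        parsed = parse_ditto(tag)
        print(tag, parsed, candidate_tags(tag), to_standard_claws7(parsed.base))
```

Note that ''candidate_tags'' can return several candidates (for ''n'' and ''npx'') or none at all (for ''null'' and ''x''), so in these cases the correct tag still has to be determined by inspecting the tokens themselves.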