**[ [[cqp:introduction|Collection: Introduction to CQP]] ]**

====== 3d. Metadata ======

//This section introduces two important ways of working with metadata. It presupposes that you have read [[cqp:corpus-structure|Section 1]] and [[cqp:simple-queries|Section 2]].//

As explained in [[cqp:corpus-structure|Section 1]], corpora often contain metadata in the form of XML tags that carry attributes. When the corpus is compiled, the base name of the tag (for example, ''text'') and its attributes (for example, ''id'', ''genre'', etc.) are combined into attributes like ''text_id'', ''text_genre'', etc.

If you want to know which metadata is available for a given corpus, check the info file: select the corpus, type ''info'' and hit the ''RETURN'' key, then use the ''SPACE'' key to scroll down, the ''b'' key to scroll up, and the ''q'' key to close the info file.

These metadata can be used in different ways.

===== Displaying metadata in the concordance =====

The simplest way to use metadata is to have certain types of information displayed as part of the concordance. This is done with the command ''set PrintStructures'': before running a query, type this command followed by the relevant metadata attribute(s) in quotation marks. For example, in the BNC every text has a three-character id stored in the attribute ''text_id''. To display this text id, type the following and hit ''RETURN'':

<code>
set PrintStructures "text_id"
</code>

When you now run a query like ''[hw="love"%c]'', the text id is shown at the beginning of every concordance line. Try it. You will see that several examples often come from the same text.

You can also have the Corpus Workbench display more than one type of metadata. For example, to display the text id and the genre, type:

<code>
set PrintStructures "text_id text_genre"
</code>

i.e., list the metadata attributes that you want displayed, separated by whitespace.
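Put together, the steps described so far might look like the following short session. This is only a sketch: it assumes that the corpus is installed under the name ''BNC'' and that it provides both ''text_id'' and ''text_genre'' (check the info file if in doubt). The lines starting with ''#'' are comments and do not need to be typed; each command is written with a closing semicolon, as CQP expects in scripts.

<code>
# select the corpus (assumed to be registered as BNC)
BNC;

# ask CQP to print the text id and the genre before each concordance line
set PrintStructures "text_id text_genre";

# run a query; every hit is now preceded by the two metadata values
[hw="love"%c];
</code>

The order matters only in that ''set PrintStructures'' has to be issued before the concordance is displayed.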
To turn off the display, simply type

<code>
set PrintStructures ""
</code>

Your settings for displaying metadata are also reset when you switch to a different corpus or when you quit CQP.

===== Displaying metadata without attributes =====

Recall that there may be metadata that does not have any attribute-value pairs, like ''<p>'' enclosing paragraphs or ''<s>'' enclosing sentences. If you try to display these tags using the ''set PrintStructures'' command, you get a warning:

<code>
BNC> set PrintStructures "p";
Warning: Structure ``p'' does not have any values.
</code>

This is because there is nothing that could be displayed at the beginning of the line. However, you may want to display these tags //inside// the concordance line, i.e., at the point where they occur in the text (for example, to see where a new paragraph begins). To do this, use the ''show'' command, followed by a plus sign and the tag you want to display. For example, to display the paragraph tags, type:

<code>
show +p
</code>

Try it. You will see that ''<p>'' or ''</p>'' tags are now occasionally displayed in a concordance line. To turn the display off again, use the same command with a minus sign, i.e., type

<code>
show -p
</code>

===== Displaying linguistic annotation =====

As discussed in [[cqp:concordances|Section 3f]], the ''show'' command can also be used to display information from the columns following the ''word'' column. For example, if your corpus contains ''pos'' tags (as the BNC does), you can type the following before creating your concordance:

<code>
show +pos
</code>

Try it. You will see that for each word in the concordance line, its ''pos'' tag is now shown, separated from the word by a slash. To turn the display off again, use the same command with a minus sign, i.e., type

<code>
show -pos
</code>

===== Restricting a search by metadata =====

A more sophisticated way to use metadata is to restrict your search to examples that match a particular value of a metadata attribute. This is done by attaching the constraint '':: match.//attribute//="//value//"'' to the end of your query. For example, you may be interested in whether men and women use the word //love// differently. The BNC contains the attribute ''text_author_sex'' with the values ''male'', ''female'' and ''unknown''. To find only uses of the word //love// produced by men, type the following and hit ''RETURN'':

<code>
[hw="love"] :: match.text_author_sex="male"
</code>

To find only uses of the word //love// produced by women, type the following and hit ''RETURN'':

<code>
[hw="love"] :: match.text_author_sex="female"
</code>

===== Summary and outlook =====

This section has shown you how to work with metadata.
Building on this, you can look at the following sections in any order:

  * [[cqp:extending-queries-combinations|Section 3a]]: Extending simple queries: Alternative attributes and values
  * [[cqp:extending-queries-alternatives|Section 3b]]: Extending simple queries: Combinations of attributes and values
  * [[cqp:complex-queries|Section 3c]]: Complex Queries
  * [[cqp:regular-expressions-basics|Section 3e]]: Regular expressions (basics)
  * [[cqp:concordances|Section 3f]]: Working with concordances
  * [[cqp:sorting-sampling|Section 3g]]: Sorting and sampling

**[ Introduction to CQP: [[cqp:corpus-structure|Section 1]] -- [[cqp:simple-queries|Section 2]] -- [[cqp:advanced-querying|Section 3]] -- [[cqp:beyond-queries|Section 4]] -- [[cqp:expert-tricks|Section 5]] -- [[cqp:exercises|Section 6]] ]**