[ Collection: Introduction to CQP ]

3d. Metadata

This section introduces two important ways of working with metadata. It presupposes that you have read Section 1 and Section 2.

As explained in Section 1, corpora often contain metadata in the form of xml tags like the following (simplified) example:

<text id="TBA" genre="novel" author_sex="female" date="2000"> … </text>

When the corpus is compiled, the base name of the tag (in this case, text) and each of its attributes (in this case, id, genre, etc.) are combined into attribute names like text_id, text_genre, etc. To find out which metadata are available for a given corpus, check the info file: select the corpus, type info and hit the RETURN key, then use the SPACE key to scroll down, the b key to scroll up, and the q key to close the info file. These metadata can be used in several ways.
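In addition to the info file, CQP itself can list the attributes of the active corpus. The following is a minimal sketch, assuming that a corpus named BNC is installed under that name on your system (the name is an assumption; use whatever name your installation shows):

```
# activate the corpus (the name BNC is an assumption)
BNC;
# list the positional and structural attributes of the active corpus
show cd;
```

Attributes such as text_id or text_genre should appear among the structural attributes in this list if the corpus was compiled with them.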

Displaying metadata in the concordance

The simplest way to use metadata is to have certain types of information displayed as part of the concordance. This is done using the command set PrintStructures: before running a query, type this command followed by the relevant metadata attribute(s) in quotation marks. For example, in the BNC every text has a three-letter id, which is stored in the attribute text_id. To display this text id, type the following and hit RETURN:

set PrintStructures "text_id"

When you now run a query like [hw="love"%c], the text id will be given at the beginning of every concordance line. Try it. You will see that often more than one example is from the same text.

You can also have the Corpus Workbench display more than one type of metadata. For example, to display the text id and the genre, type:

set PrintStructures "text_id text_genre"

That is, list the metadata attributes that you want displayed, separated by spaces.

To turn off the display, simply type

set PrintStructures ""

Your settings for displaying metadata will also be reset when you switch to a different corpus or when you quit CQP.

Displaying metadata without attributes

Recall that some metadata tags do not have any attribute-value pairs, like <p> … </p> enclosing paragraphs or <s> … </s> enclosing sentences. If you try to display these tags using the set PrintStructures command, you get a warning:

BNC> set PrintStructures "p";                 
Warning:
	Structure ``p'' does not have any values.

This is because there is nothing that could be displayed at the beginning of the line. However, you may want to display these tags inside the concordance line, i.e., at the point where they occur in the text (for example, to see when a new paragraph begins). To do this, you use the show command, followed by a plus sign and the tag you want to display. For example, to display the paragraph tags, type:

show +p

Try it: you will see that <p> or </p> tags are now occasionally displayed in a concordance line. To turn the display off again, use the same command with a minus sign, i.e., type

show -p

Displaying linguistic annotation

As discussed in Section 3f, the show command can also be used to display information from the columns following the word column. For example, if your corpus contains pos tags (as the BNC does), you can type the following before creating your concordance:

show +pos

Try it: you will see that for each word in the concordance line, its pos tag is now shown, separated from the word by a slash. To turn the display off again, use the same command with a minus sign, i.e., type

show -pos
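Several annotation layers can be switched on or off in a single show command. The lines below are a short sketch, assuming that hw (the headword attribute used in the queries in this section) is available as a positional attribute alongside pos in your corpus:

```
# display the pos tag and the headword after each word
show +pos +hw;
# run a query to see the effect
[hw="love"%c];
# switch both displays off again
show -pos -hw;
```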

Restricting a search by metadata

A more sophisticated way to use metadata is to restrict your search to examples that match a particular value of a metadata attribute. This is done by appending :: match.attribute="value" to the end of your query, where attribute is the name of the metadata attribute and value is the value it must have.

For example, you may be interested in whether men and women use the word love differently. The BNC contains the attribute text_author_sex with the values male, female and unknown. To find only uses of the word love produced by men, type the following and hit RETURN:

[hw="love"] :: match.text_author_sex="male"

To find only uses of the word love produced by women, type the following and hit RETURN:

[hw="love"] :: match.text_author_sex="female"
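To compare the two restricted searches, it can help to store each result under a name and ask CQP for its size. The following is a sketch; the names LoveMale and LoveFemale are made up for illustration (names for stored query results must begin with a capital letter):

```
# store the two restricted queries under illustrative names
LoveMale = [hw="love"] :: match.text_author_sex="male";
LoveFemale = [hw="love"] :: match.text_author_sex="female";
# report the number of hits in each stored result
size LoveMale;
size LoveFemale;
# display a stored concordance again
cat LoveMale;
```

Keep in mind that the raw numbers are only meaningful relative to how much text by male and female authors the corpus contains.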

Summary and outlook

This section has shown you how to work with metadata. Building on this, you can look at the following sections in any order:

[ Introduction to CQP: Section 1 | Section 2 | Section 3 | Section 4 | Section 5 | Section 6 ]

cqp/metadata.txt · Last modified: 2024/01/21 07:18 by aamoakuh