[ Collection: Introduction to CQP ]
3f. Working with concordances
This section looks in more detail at concordances and discusses some useful ways of making them fit your needs. It also discusses how a concordance can be saved to an external file. This section presupposes that you have read Section 1 and Section 2; it is also useful (but not necessary) to read Section 3d before reading this section.
Context size and context type
As discussed in Section 2, the most typical way of displaying a concordance is in the form of a KWIC concordance (Key Word In Context), with the search term in the middle and a fixed number of characters to the left and right. The size of this context is 30 characters by default, but we can change it to suit our needs using the set Context command. We simply type this command, followed by a number specifying how many characters we want displayed. For example, if we want 50 characters to the left and to the right, we type the following and hit RETURN:
set Context 50
If we now produce a concordance, the context will be shown as specified.
We can also set different context sizes for the left context and the right context, using the commands set LeftContext and set RightContext. For example, if we want 10 characters to the left and 40 characters to the right, we type:
set LeftContext 10
set RightContext 40
If we now create a concordance, the context will be shown as specified.
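A small Python model may make character-based contexts easier to picture. It is not CQP code: the function kwic_chars and the sample sentence are hypothetical. It assumes the text is a list of tokens joined by single spaces, and it cuts each context at exactly the requested number of characters, even in the middle of a word. CQP may trim its own output differently. Setting left and right to the same value corresponds roughly to set Context; setting them separately corresponds to set LeftContext and set RightContext.

```python
def kwic_chars(tokens, match, left=30, right=30):
    """Return a KWIC line for tokens[match] with `left` and `right` characters of context.

    Simplified model: contexts are cut at exactly the character limit,
    even in the middle of a word.
    """
    before = " ".join(tokens[:match])
    after = " ".join(tokens[match + 1:])
    # before[-0:] would return the whole string, so treat a zero width separately.
    left_ctx = before[-left:] if left > 0 else ""
    right_ctx = after[:right]
    # Right-align the left context so that keywords line up in one column.
    return f"{left_ctx.rjust(left)} {tokens[match]} {right_ctx}"


tokens = "all you need is love and nothing else".split()
print(kwic_chars(tokens, 4, left=10, right=7))  # ou need is love and not
```

Right-aligning the left context mirrors the usual KWIC layout, in which all keywords appear in the same column.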
Instead of specifying a certain number of characters, we can also specify a certain number of words. For example, to set the context to ten words to the left and right, we type:
set Context 10 words
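Word-based contexts are simpler to model, because each one is just a slice of the token list. This is also a hypothetical Python sketch rather than CQP code: kwic_words and the sample sentence are made up, and the angle brackets only mark the keyword.

```python
def kwic_words(tokens, match, n=10):
    """Return (left words, keyword, right words) with up to n words on each side."""
    start = max(0, match - n)  # avoid negative indices near the start of the text
    return tokens[start:match], tokens[match], tokens[match + 1:match + 1 + n]


tokens = "all you need is love and nothing else".split()
left, kw, right = kwic_words(tokens, 4, n=2)
print(" ".join(left), f"<{kw}>", " ".join(right))  # need is <love> and nothing
```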
However, KWIC concordances are just one type of concordance. Instead, we may want to have one or more whole sentences or even paragraphs displayed. In corpora that have metadata tags (see Section 3d for more information) corresponding to sentences and paragraphs, we can use these to set the context. The BNC has a p tag for paragraphs and an s tag for sentences. So, if we want to have whole sentences displayed, we type the following before creating a concordance:
set Context s
If we want an additional sentence before and after the one containing the match, we type the following, and so on:
set Context 2 s
If we want a whole paragraph displayed for every match, we type the following:
set Context p
These settings are lost when you quit CQP; the next time you start it, the context is back to the default of 30 characters on either side.
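Sentence contexts can also be modeled in Python. This sketch assumes that each sentence is stored as a start and end token position, which is roughly how the Corpus Workbench represents structures such as s. The function sentence_span and the sentence boundaries are hypothetical. The extra parameter simply adds neighboring sentences on each side and does not try to reproduce CQP's exact counting.

```python
from bisect import bisect_right


def sentence_span(sentences, match, extra=0):
    """Return the (start, end) token span of the sentence containing `match`,
    widened by `extra` neighboring sentences on each side.

    `sentences` is a sorted list of (start, end) token positions, end inclusive.
    """
    starts = [start for start, _ in sentences]
    i = bisect_right(starts, match) - 1  # last sentence starting at or before match
    first = max(0, i - extra)
    last = min(len(sentences) - 1, i + extra)
    return sentences[first][0], sentences[last][1]


sentences = [(0, 4), (5, 9), (10, 14)]       # three made-up five-token sentences
print(sentence_span(sentences, 7))           # (5, 9)
print(sentence_span(sentences, 7, extra=1))  # (0, 14)
```

The binary search (bisect_right) finds the containing sentence without scanning every sentence, which matters in a corpus with millions of them.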
Displaying linguistic annotation
By default, a concordance always displays the contents of the word column, i.e., the word forms that make up the text sample. As also discussed in Section 3d, we can use the show command to display information from other columns. For example, if our corpus contains pos tags (as the BNC does), we can type the following before creating our concordance:
show +pos
Try it: you will see that for each word in the concordance line, its pos tag is now shown, separated from the word by a slash. To turn the display off again, use the same command with a minus sign, i.e., type:
show -pos
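The slash notation can be mimicked in a few lines of Python. This only sketches the display format. The helper show_attribute is hypothetical, and the tags are illustrative examples in the style of the BNC's CLAWS tagset; the pos values in your corpus may look different.

```python
def show_attribute(words, values):
    """Join each word with its annotation, separated by a slash (as with show +pos)."""
    return " ".join(f"{word}/{value}" for word, value in zip(words, values))


# Illustrative CLAWS-style tags, not taken from an actual corpus query.
print(show_attribute(["love", "is", "in", "the", "air"],
                     ["NN1", "VBZ", "PRP", "AT0", "NN1"]))
# love/NN1 is/VBZ in/PRP the/AT0 air/NN1
```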
Storing concordances internally
When we create a concordance, the information given at the top includes a “Name”, which by default is the name of the corpus followed by a colon and the word Last. This name points to a variable (a kind of virtual container within the Corpus Workbench) that contains our concordance. As the word Last suggests, this variable always contains the last concordance we have created.
This is useful if we have created a complex query that takes some time to run and then realize that we want to change the display settings as described above. Instead of re-running the query, we can simply change the display settings as desired and then use the command cat Last to display the concordance. Try querying the word love, then setting the Context to “sentence”, and then displaying the concordance again:
[hw="love"]
set Context s
cat Last
If you think this is nice, wait for the next bit! Obviously, we may create more than one concordance during a CQP session. Wouldn't it be great if we could store all of these in variables, rather than just the last one? Fortunately, we can. We simply create a concordance, think of a name, and type Name = Last. The concordance will now be stored in a variable called Name and can be displayed at any time during our session using the command cat Name. For example, we can create a concordance for the word love and store it in a variable called Air (because, as Paul Young sings, “love is in the air”):
[hw="love"]
Air = Last
cat Air
(Of course, we can also call the variable Love or anything else. The variable name does not have to begin with a capital letter, but it is a useful habit to write variable names in this way, so that you do not confuse them with commands.)
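The difference between Last and a named variable can be pictured as a Python dictionary that maps names to query results. In this toy model, the names named and run_query and the placeholder match labels are all hypothetical. It only illustrates the point made above: each new query overwrites Last, while a copy stored with Air = Last stays as it was.

```python
# Toy model of named concordances: a dictionary from names to lists of matches.
# The match labels are placeholders, not real corpus data.
named = {}


def run_query(matches):
    """Mimic running a query: the result always replaces whatever is in Last."""
    named["Last"] = list(matches)


run_query(["love#1", "love#2"])     # like [hw="love"]
named["Air"] = list(named["Last"])  # like Air = Last: a copy, not an alias
run_query(["friendship#1"])         # like [hw="friendship"]

print(named["Last"])  # ['friendship#1']
print(named["Air"])   # ['love#1', 'love#2']
```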
We can also store a concordance in a variable immediately, without displaying it first. This is done by giving the variable name, followed by an equals sign, followed by our query: Name = [attribute="value"]. For example, to create a concordance for the lemma love and store it in a variable called Love, we would type:
Love = [hw="love"]
If we hit RETURN, no results are displayed, as they have been stored directly in our variable, which we can display by typing cat Love and hitting RETURN.
These variables are all deleted when you quit CQP. If you want them to be available the next time you use CQP, you have to save them externally (see below).
Combining internally stored concordances
If we have several concordances stored in variables, we can combine them in various ways. Most straightforwardly, we can combine two separate concordances into a single one using the command union, i.e., NewName = union Name1 Name2. For example, if we have a concordance for the word love stored in a variable named Love and a concordance for the word friendship stored in a variable named Friendship…
[hw="love"]
Love = Last
[hw="friendship"]
Friendship = Last
…we can combine them into a concordance named GoodRelations as follows:
GoodRelations = union Love Friendship
If you remember set theory from your math lessons, you will recognize that union is a set-theoretic term, which suggests that we may also perform other set-theoretic operations on named concordances. And we can! In addition to union, we can use intersection, which gives us only the lines that occur in both concordances, and difference, which gives us the lines that occur in the first concordance but not in the second (so the order of the two names matters).
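A small Python model may make the three set operations easier to picture. It treats each concordance as a set of (start, end) match positions. It also assumes that two lines count as the same when their match positions are identical, which is roughly how CQP compares them. The positions and variable names are made up. In CQP itself, the difference shown here would correspond to typing Inflected = difference Lemma Base after storing the two queries.

```python
# Each concordance is modeled as a set of (start, end) match positions.
# The positions are made up for illustration.
Lemma = {(3, 3), (17, 17), (42, 42)}  # e.g. [hw="love"]: love, loves, loved
Base = {(3, 3)}                       # e.g. [word="love"]: only the form "love"


def union(a, b):
    """Lines found in a, in b, or in both."""
    return sorted(a | b)


def intersection(a, b):
    """Lines found in both a and b."""
    return sorted(a & b)


def difference(a, b):
    """Lines found in a but not in b (the order of a and b matters)."""
    return sorted(a - b)


print(difference(Lemma, Base))    # [(17, 17), (42, 42)]
print(difference(Base, Lemma))    # []
print(intersection(Lemma, Base))  # [(3, 3)]
```

Swapping the two arguments of difference gives an empty result here, because every line in Base is also in Lemma.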
Saving a concordance
There are two ways of saving a concordance to an external file. First, we can save it in a format that only the Corpus Workbench can read, so if we want to look at it again, we have to import it back into the Corpus Workbench. This is useful if we have created a complex concordance and want it to be available in CQP the next time we work with it (remember: all concordances are deleted when you quit CQP). Second, and this is probably the more typical case, we can export a concordance to a text file that we can then open with a text editor or spreadsheet program. This is useful if we are satisfied with our concordance and want to keep working with it outside of CQP (for example, in order to add further annotations of our own as part of some research project).
Saving a concordance for the next CQP session
To save a concordance in a way that allows us to re-import it into CQP, we must first tell CQP where to save it. This is done with the command set DataDirectory. In your case, it is easiest to set the data directory to the main directory of your Linux account. This is done as follows:
set DataDirectory "."
Setting the data directory will make CQP exit our current corpus, so we have to select it again by typing its name and hitting RETURN, but any concordances we have saved in a variable will still be there. Now, we can simply type save Name, and the concordance in the variable will be saved. For example, to save the concordance we have named Love, we type:
save Love
Next time we start CQP, we have to set the data directory again…
set DataDirectory "."
… and we can then work with the concordance as we would work with any concordance stored in a variable, e.g., display it by typing
cat Love
If we don't remember what we called our concordance, we can display all our saved concordances by typing the following:
show named
Exporting a concordance to an external file
As mentioned earlier, the more typical case is the one where we want to export a concordance to a text file to open it with a text editor or other program.
This is done using the cat command that we already know. We simply add the symbol > followed by a file name in quotation marks at the end. This will save a text file in the main directory of our Linux account. For example, to save the concordance Love to a file called love.txt, we simply type:
cat Love > "love.txt"
Now, all you have to do is get the file from your Linux account. There are many ways of doing this; the easiest is to go to the ZEDAT start page, click on Datenablage (file storage) in the little grey box on the left side of the screen, and enter your login information. This will open a web interface showing all your files.
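Once the file is outside CQP, a few lines of code can split each line into its parts, for example before loading it into a spreadsheet. The Python sketch below assumes that each exported line follows the plain-text layout CQP uses on screen: the corpus position, a colon, the left context, the match in angle brackets, and the right context. This layout is an assumption, so check it against your own love.txt before relying on the sketch. The function names parse_concordance and to_tsv are hypothetical. Lines that do not fit the pattern are skipped, and contexts that themselves contain angle brackets (for example, displayed tags) would confuse this simple pattern.

```python
import re

# ASSUMED layout of one exported line: "  1234: left context <match> right context"
# (the position prefix is treated as optional).
LINE = re.compile(r"^\s*(?:(\d+):\s*)?(.*?)\s*<(.*?)>\s*(.*)$")


def parse_concordance(lines):
    """Split exported concordance lines into (position, left, match, right) tuples."""
    rows = []
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m is None:
            continue  # line does not fit the assumed layout
        pos, left, match, right = m.groups()
        rows.append((int(pos) if pos else None, left, match, right))
    return rows


def to_tsv(rows):
    """Turn parsed rows into tab-separated text that a spreadsheet can open."""
    return "\n".join("\t".join("" if v is None else str(v) for v in row) for row in rows)


sample = ["  1234: all you need is <love> and nothing else\n"]
rows = parse_concordance(sample)
print(rows)  # [(1234, 'all you need is', 'love', 'and nothing else')]
```

To process the real file, the lines could be read with open("love.txt", encoding="utf-8"). The UTF-8 encoding is also an assumption and may need adjusting.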
Summary and outlook
This section has shown you how to manipulate concordances in various ways and how to export them. Building on this, you can look at the following sections in any order:
- Section 3a: Extending simple queries: Alternative attributes and values
- Section 3b: Extending simple queries: Combinations of attributes and values
- Section 3c: Complex Queries
- Section 3d: Metadata
- Section 3e: Regular expressions (basics)
- Section 3g: Sorting and sampling