cqp:collocates
cqp:collocates [2020/04/20 16:57] – created astefanowitsch | cqp:collocates [2024/06/20 13:53] (current) – external edit 127.0.0.1 | ||
+ | **[ [[cqp: | ||
+ | ====== 4c. Collocate lists and tables ====== | ||
+ | |||
+ | //This section introduces a method for creating collocate lists and tables from a concordance. It presupposes that you have read [[cqp: | ||
+ | |||
+ | ===== Two ways of summarizing collocates ===== | ||
+ | |||
+ | Many concordancers offer some way of quickly summarizing the collocates of a search term -- the words occurring in a certain span around the search term. | ||
+ | |||
+ | One type of summary is an (alphabetical) **collocate list** that gives the frequency of every word in the span on the left of the search term, the span on the right, and every individual position within these spans, as in the following example (for the lemma //love// in the BNC): | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | For example, within a span of four words to the left and four words to the right of the lemma //love//, the word form //ability// occurs nine times -- seven times on the left and twice on the right side. Specifically, | ||
+ | |||
+ | Another type of summary is a **collocate table**, with the words at each position in the span ordered in decreasing frequency: | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | For example, the word form //I// is the most frequent token at the position one word to the left, followed by //in// and //of//; the comma is the most frequent token one word to the right, followed by the period and //to//. | ||
+ | |||
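To make the difference between the two summaries concrete, here is a minimal Python sketch of how such counts could be computed from a concordance. It is not the script used in our installation: the function names, the toy concordance lines, and the representation of each hit as a pair of token lists are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

SPAN = 4  # four words to the left and to the right, as in the examples above


def collocate_counts(hits, span=SPAN):
    """Count collocates by position.

    `hits` is an iterable of (left_tokens, right_tokens) pairs, one pair per
    concordance line. The result maps a position to a Counter of tokens:
    -1 is the word directly left of the search term, +1 the word directly right.
    """
    by_position = defaultdict(Counter)
    for left, right in hits:
        # Walk leftwards from the search term: left[-1] is position -1.
        for offset, token in enumerate(reversed(left[-span:]), start=1):
            by_position[-offset][token] += 1
        for offset, token in enumerate(right[:span], start=1):
            by_position[offset][token] += 1
    return by_position


def collocate_list(by_position):
    """Alphabetical list: (word, total, left, right) for every word in the span."""
    words = sorted({w for counts in by_position.values() for w in counts})
    rows = []
    for word in words:
        left = sum(c[word] for pos, c in by_position.items() if pos < 0)
        right = sum(c[word] for pos, c in by_position.items() if pos > 0)
        rows.append((word, left + right, left, right))
    return rows


def collocate_table(by_position, top=3):
    """Table: for every position, the `top` most frequent tokens."""
    return {pos: counts.most_common(top)
            for pos, counts in sorted(by_position.items())}


# Hypothetical toy concordance: context to the left and right of the search term.
hits = [
    (["because", "I"], ["you", "."]),
    (["think", "I"], ["the", "idea"]),
    (["fell", "in"], ["with", "you"]),
]

by_position = collocate_counts(hits)
for row in collocate_list(by_position):
    print(row)
print(collocate_table(by_position, top=1))
```

The list is organized by word (one row per word, with its counts), while the table is organized by position (one ranking per position); both are derived from the same positional counts.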
+ | ===== Summarizing collocates in CQP ===== | ||
+ | |||
+ | So, how do we create such summaries in CQP? We can't -- the Corpus Workbench does not offer such a function. However, in our installation, | ||
+ | |||
+ | ==== 1. Preparing the concordance ==== | ||
+ | |||
+ | The script creates summaries for a span of four words to the left and to the right of a search term, so the first step is to make sure that our concordance has a context of four words to the left and to the right. This is done using the '' | ||
+ | |||
+ | set Context 4 words | ||
+ | |||
+ | After having set the context to four words, let us create a concordance of the lemma //love//, and save it to a variable called '' | ||
+ | |||
+ | Love = [hw=" | ||
+ | |||
+ | ==== 2. Transforming and exporting the concordance ==== | ||
+ | |||
+ | We then export the concordance as described in [[cqp: | ||
+ | |||
+ | cat Love > "| collocates.pl > love.csv" | ||
+ | |||
+ | This writes the output to a '' | ||
+ | |||
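The pipe in the command above hands every concordance line to an external program, which has to split it into left context, match, and right context before it can count anything. The following Python sketch shows one way such a splitting step could look. It assumes that each line has the form ''left context <match> right context'', optionally preceded by a corpus position and a colon, with angle brackets marking the match; the actual line format depends on your export settings, and the names ''KWIC'' and ''split_kwic'' are hypothetical.

```python
import re

# Assumed line format: optional "1234: " prefix, left context,
# the match enclosed in <...>, then the right context.
KWIC = re.compile(
    r"^\s*(?:\d+:\s*)?(?P<left>.*?)\s*<(?P<match>[^<>]*)>\s*(?P<right>.*)$"
)


def split_kwic(line):
    """Split one concordance line into (left_tokens, match, right_tokens).

    Returns None if the line does not contain a <...> match marker.
    """
    m = KWIC.match(line.rstrip("\n"))
    if m is None:
        return None
    return m["left"].split(), m["match"].strip(), m["right"].split()


# Hypothetical example line, not taken from the BNC.
example = "  4711: because I really do <love> you very much ."
print(split_kwic(example))
```

A filter built on this would read lines from standard input, apply ''split_kwic'' to each one, and skip any line for which it returns ''None''.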
+ | ===== Collocate tables ===== | ||
+ | |||
+ | If we run the script as just shown, it will create a collocate table -- in our view, if you want a simple summary of collocates, this is the best format. The table will be case sensitive, i.e., //In// will be counted separately from //in//, //Of// separately from //of//, etc. Usually, we will want our collocate table to be case insensitive, | ||
+ | |||
+ | cat Love > "| collocates.pl -c > love-case-insensitive.csv" | ||
+ | |||
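The effect of case-insensitive counting can be illustrated with a short Python sketch. It assumes that case is ignored by lowercasing every token before counting, which is a common approach but not necessarily how the script does it; the function name and the token list are made up for illustration.

```python
from collections import Counter


def count_tokens(tokens, case_sensitive=True):
    """Count tokens, optionally folding case by lowercasing them first."""
    if not case_sensitive:
        tokens = (t.lower() for t in tokens)
    return Counter(tokens)


# Hypothetical tokens found one word to the left of the search term.
left_one = ["In", "in", "I", "Of", "of", "in"]

sensitive = count_tokens(left_one)
insensitive = count_tokens(left_one, case_sensitive=False)

print(sensitive)
print(insensitive)
```

In the case-sensitive count, //In// and //in// are separate entries; in the case-insensitive count they are merged. Note that with this approach the pronoun //I// would also be lowercased.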
+ | ===== Collocate lists ===== | ||
+ | |||
+ | If we want the list format instead of the table format, we add '' | ||
+ | |||
+ | cat Love > "| collocates.pl -l > love-list.csv" | ||
+ | |||
+ | This will produce a list like that shown above. Again, this list will be case sensitive. If we do not want this, we have to add both the '' | ||
+ | |||
+ | cat Love > "| collocates.pl -lc > love-list-case-insensitive.csv" | ||
+ | |||
+ | ===== Summary and outlook ===== | ||
+ | |||
+ | This section introduced an extension to our CWB installation that allows you to create collocate lists and tables. You can now read the following two sections in any order: | ||
+ | |||
+ | * [[cqp: | ||
+ | * [[cqp: | ||
+ | |||
+ | ===== Running the script outside of our installation ===== | ||
+ | |||
+ | The script '' | ||
+ | |||
+ | **[ Introduction to CQP: [[cqp: |