**[ [[cqp:introduction|Collection: Introduction to CQP]] ]**

====== 4b. Grouping ======

//This section explains how to create complex frequency lists from a concordance, where the frequencies of one variable are grouped by a second variable. It presupposes that you have read [[cqp:corpus-structure|Section 1]], [[cqp:simple-queries|Section 2]], [[cqp:complex-queries|Section 3c]] and [[cqp:frequency-lists|Section 4a]].//

===== Grouping by structural features =====

The frequency lists introduced in [[cqp:frequency-lists|Section 4a]] are very useful as long as we are interested in the frequency of single words or connected sequences of words. But sometimes we may want to determine the frequency of word combinations that do not occur in connected sequences. An example: Inspired by the line //You tricked me into loving you// from Alexandra Burke's song //You broke my heart//, we might wonder what verbs co-occur in the expression [VERB someone into VERBING]. The following query should find most cases of this expression in the BNC:

<code>
[pos="VV."] [] [word="into"%c] [pos="VVG"];
</code>

It looks for any lexical verb (i.e. any verb except //be//, //have// and //do//), followed by any word (this will capture the pronouns and proper names that most typically occur in this position in the expression), followed by the word form //into//, followed by a lexical verb in the //ing//-form.

If we create a lemmatized frequency list of this query, it looks as follows:

<code>
count Last by hw;
</code>

<code>
3   fool you into think              [#132-#134]
2   aggravate he into produce       [#0-#1]
2   bully i into say                [#39-#40]
2   con i into post                 [#66-#67]
2   delude themselves into believe  [#92-#93]
2   delude yourself into think      [#95-#96]
2   hoodwink we into believe        [#190-#191]
2   mislead you into think          [#253-#254]
2   talk he into give               [#361-#362]
2   talk he into let                [#365-#366]
2   talk i into come                [#371-#372]
1   fool anyone into think          [#144]
...
</code>

This is not useful, because we are not interested in the specific content of the second slot -- for example, the first and the last line of the excerpt shown here are examples of the same combination of verbs, //fool// and //think//, and we would like their frequencies to be combined. We can create a frequency list of just the verb combinations by using the ''group'' command as follows:

<code>
group Last matchend hw by match[0] hw;
</code>

This command creates a special type of frequency list where the last word in our query (''matchend'') is combined with the first word (''match[0]'') and then counted. The result looks like this, giving us a much clearer picture:

<code>
#---------------------------------------------------------------------
fool      think     12
mislead   think      6
deceive   think      5
delude    believe    4
mislead   believe    4
delude    think      3
fool      believe    3
force     make       3
talk      give       3
          go         3
trick     believe    3
...
</code>

(Note that the gap in the first column after //talk// indicates that the preceding verb is repeated.)

Now, this does not really look like a “grouped” frequency list yet, because the combinations are still listed in order of descending frequency.
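By the way, if you want to process such a list of pairs outside of CQP, you should be able to redirect the output of the ''group'' command into a file, just as with ''cat'' and ''count''. A minimal sketch -- the file name ''verb_pairs.txt'' is, of course, just an example:

<code>
group Last matchend hw by match[0] hw > "verb_pairs.txt";
</code>

The pairs should end up in the file as a plain-text table (one combination with its frequency per line), which you can then load into a spreadsheet or statistics program.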
We can //actually// group the list by modifying the command slightly and repeating the word ''group'' before the word ''by'':

<code>
group Last matchend hw group by match[0] hw;
</code>

If we run this command, we get an output like the following, which is sorted by the first verb (the grouping variable) first, and then by order of frequency:

<code>
#---------------------------------------------------------------------
talk      do         3
          give       3
          go         3
          accept     2
          come       2
          let        2
          take       2
          abandon    1
          act        1
          become     1
          believe    1
          buy        1
          care       1
          have       1
          join       1
          leave      1
          make       1
          offer      1
          play       1
          realize    1
          release    1
          ring       1
          stay       1
          try        1
          use        1
          wear       1
#---------------------------------------------------------------------
force     make       3
          accept     1
          agree      1
          choose     1
          commit     1
          concede    1
          discard    1
          do         1
          give       1
          have       1
          hide       1
          lose       1
          probe      1
          see        1
          speak      1
          switch     1
          try        1
          use        1
          work       1
...
</code>

Note that many combinations occur only once; we might not be interested in those. We can add the keyword ''cut'' at the end to specify the minimum frequency that a combination must have in order to be included in the list. For example:

<code>
group Last matchend hw group by match[0] hw cut 2;
</code>

This gives us the following, more readable output that we could now use to think about why we //talk// people into actions like //giving//, //going//, //accepting//, etc., but //fool// them into //thinking// or //believing// something:

<code>
#---------------------------------------------------------------------
talk       give      3
           go        3
           accept    2
           come      2
           let       2
           take      2
#---------------------------------------------------------------------
fool       think    12
           believe   3
#---------------------------------------------------------------------
force      make      3
#---------------------------------------------------------------------
bully      get       2
           say       2
           take      2
#---------------------------------------------------------------------
trick      believe   3
#---------------------------------------------------------------------
mislead    think     6
           believe   4
#---------------------------------------------------------------------
provoke    make      2
#---------------------------------------------------------------------
deceive    think     5
           believe   2
#---------------------------------------------------------------------
delude     believe   4
           think     3
#---------------------------------------------------------------------
lead       define    2
#---------------------------------------------------------------------
trap       make      2
#---------------------------------------------------------------------
con        post      2
#---------------------------------------------------------------------
lure       make      2
#---------------------------------------------------------------------
go         make      2
#---------------------------------------------------------------------
tempt      make      2
#---------------------------------------------------------------------
lull       believe   2
#---------------------------------------------------------------------
aggravate  produce   2
#---------------------------------------------------------------------
hoodwink   believe   2
</code>

Of course, you can switch the order of the variables to group the first verb by the second instead:

<code>
group Last match[0] hw group by matchend hw cut 2;
</code>

Look at the output and think about what this can show you as opposed to the table above.

===== Grouping by metadata =====

We can also use the ''group'' command to create frequency lists that are grouped by metadata -- in fact, this is probably the more typical way of using it.
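Before we look at a concrete example, here is the general pattern: we group a positional attribute (such as ''word'' or ''hw'') at one of the anchors by a structural attribute that carries the metadata. The following sketch assumes a corpus with a structural attribute ''text_genre'' -- a hypothetical name, so substitute whatever attributes your corpus actually defines (''show cd;'' will list them):

<code>
A = [word="(while|whilst)"%c];
group A match word group by match text_genre;
</code>

This would show, for each genre, how often each of the two forms occurs -- structurally the same procedure as the social-class example that follows.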
For example, we might be interested in whether there is a connection between social class and the usage of so-called “non-standard” language features such as the form //ain't//. If our corpus includes metadata about class, we can answer this question using a simple query and the ''group'' command as follows.

First, we create a concordance of the form //ain't// and the standard forms that it can represent -- //isn't// and //aren't// (the query below also allows for the vanishingly rare //amn't//). We limit this query to utterances that contain information about class, which in the BNC is recorded in the attribute ''u_class'' using the labels ''AB'' (roughly ‘upper and middle middle class’), ''C1'' (roughly ‘lower middle class’), ''C2'' (roughly ‘skilled working class’) and ''DE'' (roughly ‘unskilled working class’):

<code>
[word="(am|are|is|ai)"%c] [word="n't"%c] :: match.u_class="(AB|C1|C2|DE)";
</code>

We can then group the frequency of the first word of the query by the values of the variable ''u_class'':

<code>
group Last match word group by match u_class;
</code>

The result looks like this:

<code>
#---------------------------------------------------------------------
C1   is    1107
     are    362
     ai     218
     Is      45
     Are     32
     Ai      13
#---------------------------------------------------------------------
C2   is     690
     ai     592
     are    344
     Is      44
     Are     34
     Ai      31
#---------------------------------------------------------------------
AB   is     968
     are    335
     Is      73
     Are     52
     ai      49
     Ai       6
#---------------------------------------------------------------------
DE   is     362
     ai     330
     are    175
     Ai      21
     Are     17
     Is      11
</code>

Unfortunately, the ''group'' command does not allow the ''%c'' flag, so we have to do a bit of manual counting, adding up the frequencies of the upper-case and lower-case instances of each word. This gives us the following table:

^ Class ^ //isn't// ^ //aren't// ^ //ain't// ^
| AB | 1041 | 387 | 55 |
| C1 | 1152 | 394 | 231 |
| C2 | 734 | 378 | 623 |
| DE | 373 | 192 | 351 |

We can now calculate the percentage of cases of //ain't// in each group of speakers by dividing the frequency of //ain't// by the overall frequency of all three forms in that class and multiplying the result by 100 -- for class AB, for example, 55 ÷ (1041 + 387 + 55) × 100 = 3.71%. This gives us percentages of 3.71% for the upper and middle middle class, 13.00% for the lower middle class, 35.91% for the skilled working class and 38.32% for the unskilled working class. In other words: the lower the social status, the higher the usage of //ain't// -- //Ain't No Love in the Heart of the City//, as Bobby Bland sang, but there sure is a lot of //ain't// in the highrises on the outskirts.

===== Summary and outlook =====

This section introduced the ''group'' command. You can now read [[cqp:collocates|Section 4c]] and then continue on to the expert tricks!

**[ Introduction to CQP: [[cqp:corpus-structure|Section 1]] -- [[cqp:simple-queries|Section 2]] -- [[cqp:advanced-querying|Section 3]] -- [[cqp:beyond-queries|Section 4]] -- [[cqp:expert-tricks|Section 5]] -- [[cqp:exercises|Section 6]] ]**