Contingency tables

Introduction

Contingency tables (also called cross tabulations) are tables showing the frequencies of the intersections of two variables. For example, there are two variants of the preposition toward(s) (“in the direction of”): one with an s at the end and one without. There are several national varieties of English, very prominent among them British English and American English. Both variants of the preposition occur in both varieties, so we have two variables: Variant of the Preposition (with the values toward and towards) and Variety of English (with the values British and American). This gives us four intersections: British ∩ toward, British ∩ towards, American ∩ toward and American ∩ towards. If we check the frequency of these intersections in the LOB and BROWN corpora and represent the results as a contingency table, we get the following:

         toward towards Total
British     318      14   332
American     64     386   450
Total       382     400   782

Prerequisites

  • There are two ways of creating a contingency table, shown below. For the second of these ways, your data must be in the form of a data frame (see Data Frames).

How to create a contingency table

There are two ways of creating a contingency table in R: you can enter the values manually, or you can create the table from a raw data list in the form of a data frame.

Creating a contingency table manually

In order to create a contingency table manually, you first have to create a vector (see Vectors) containing the values, and store this vector in a variable. For the table above, this vector would look like this (if we call the variable myvector – of course, we can give it any name we want):

c(318, 64, 14, 386) -> myvector

These are the values in the first column followed by those in the second column. The totals are not part of the table; if we need them, we can have R calculate them later.

The first step in transforming this vector to a table is to use the command matrix(), which takes a vector as input and transforms it to a table with a certain number of columns, specified using the ncol option. In our case, this would look as follows (if we call the variable mytable):

matrix(myvector, ncol=2) -> mytable

If you display this variable (by typing mytable and hitting return), you get the following:

     [,1] [,2]
[1,]  318   14
[2,]   64  386
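
Note that matrix() fills the table column by column, which is why the vector lists the first column before the second. As a minimal sketch (the variable names byrow_vector and byrow_table are only illustrative), the byrow option lets you enter the values row by row instead, in the order in which you read them off the table:

```r
# Values entered column by column (the default behaviour of matrix())
myvector <- c(318, 64, 14, 386)
mytable <- matrix(myvector, ncol = 2)

# The same values entered row by row; byrow = TRUE fills rows first
byrow_vector <- c(318, 14, 64, 386)
byrow_table <- matrix(byrow_vector, ncol = 2, byrow = TRUE)

# Check whether both approaches produce the same 2 x 2 matrix
identical(mytable, byrow_table)
```

Which of the two you use is a matter of taste; entering the values row by row is often less error-prone when copying from a printed table.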

The values are displayed in the right way, but the rows and columns do not have names yet. You can refer to them by using the indices shown: for example, to display the first row of the table, type mytable[1,], to display the second column, type mytable[,2], and to display a specific cell, give both the row and the column number, e.g. mytable[1,2] to display the second cell in the first row.

Strictly speaking, this is all you need to use this contingency table in other contexts, but you may want to add row and column labels so that you and others know what information is contained in this table. To add row and column labels, you use the functions rownames() and colnames(): as their names suggest, these functions provide access to the parts of a contingency table that contain the row and column labels, so you can simply construct a vector that contains the correct number of text strings and assign this vector to the relevant part of the table:

rownames(mytable) <- c("British", "American")
colnames(mytable) <- c("toward", "towards")

If you now display the table (by typing mytable and hitting return), you get the following:

         toward towards
British     318      14
American     64     386

You can still refer to the columns, rows and cells in the way just described, but you can also use the labels instead of numbers (you need to put them in quotation marks, as they are text strings). For example, to display the first row of the table, you can type mytable["British",], to display the second column, you can type mytable[,"towards"], and to display a specific cell, give both the row and the column label, e.g. mytable["British","towards"] to display the second cell in the first row.
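
If you prefer, the values and the labels can be supplied in a single step. The following is a minimal sketch using the dimnames option of matrix(); the variable name labelled_table is only illustrative:

```r
# Create the matrix and its row and column labels in one call
labelled_table <- matrix(
  c(318, 64, 14, 386),
  ncol = 2,
  dimnames = list(c("British", "American"), c("toward", "towards"))
)

# Cells can be addressed by number or by label
labelled_table[1, 2]
labelled_table["British", "towards"]
```

Both of the last two commands address the same cell, the second cell in the first row.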

From a data frame

If you have imported a raw data table as a data frame (see Data Frames), you can crosstabulate two columns of this data frame to create a contingency table. There is a sample CSV file containing the distribution of the word forms toward and towards across different genres in British and American English (from the LOB and BROWN corpora) here: data-towards.csv. Import it into a data frame called Toward (as described in Importing Data).

You can now create a contingency table using the table() command, which needs two columns from the data frame as input. To see which columns are available, use the command head() to display the first few rows of the data frame:

head(Toward)

You will see the following:

  Variety           Genre Variant
1 British Press_Reportage towards
2 British Press_Reportage towards
3 British Press_Reportage  toward
4 British Press_Reportage towards
5 British Press_Reportage towards
6 British Press_Reportage towards

The first and the third column are relevant to our contingency table. They can be referred to by Toward$Variety and Toward$Variant, so the following command will produce a contingency table:

table(Toward$Variety,Toward$Variant) -> mytable

Type mytable to display it:

	  
         toward towards
American    386      64
British      14     318

As you can see, this is the same table you created manually in the preceding section, but the rows and columns are ordered differently: the table() command orders rows and columns alphabetically. If you don't like this order, you can reorder them (see below).
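
If you do not have the sample file at hand, you can try out table() on a small made-up data frame. The six rows below are invented purely for illustration and are not taken from LOB or BROWN; the names toy_data and toy_table are likewise assumptions:

```r
# A tiny invented data frame with the same column names as Toward
toy_data <- data.frame(
  Variety = c("British", "British", "British",
              "American", "American", "American"),
  Variant = c("towards", "towards", "toward",
              "toward", "toward", "towards")
)

# Cross-tabulate the two columns; naming the arguments labels the dimensions
toy_table <- table(Variety = toy_data$Variety, Variant = toy_data$Variant)
toy_table
```

As with the real data, the rows and columns come out in alphabetical order, regardless of the order in which the values appear in the data frame.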

Adding rows and columns to a matrix

Adding data

You can add rows or columns to an existing matrix, no matter how you created it. For rows, this is done using the rbind() command. First, create a variable containing a vector with the numbers you want to add as a row, and give this variable the name you want the new row to have. For example, to add the frequencies of toward and towards in Indian English to the table (the data are from the KOLHAPUR corpus):

Indian <- c(18,337)

The rbind() command needs two arguments: the matrix to which you want to add a row, and the vector containing the row you want to add. Let us write the result to the same variable mytable:

rbind(mytable, Indian) -> mytable

If you now type mytable, you will get the following:

         toward towards
American    386      64
British      14     318
Indian       18     337

The cbind() command works in the same way. For example, to add a column containing the frequencies of the expression in the direction of, we create a corresponding variable (the frequencies are from BROWN, LOB and KOLHAPUR):

in_the_direction_of <- c(11,12,11)

We then add this column to our table:

cbind(mytable, in_the_direction_of) -> mytable

Typing mytable gives us:

         toward towards in_the_direction_of
American    386      64                  11
British      14     318                  12
Indian       18     337                  11
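
Instead of creating a separately named variable first, you can also name the new row or column directly inside the call. A minimal sketch, rebuilding the table from the values shown above (the variable name demo_table is only illustrative):

```r
# Start from the two-by-two table produced by table() above
demo_table <- matrix(
  c(386, 14, 64, 318),
  ncol = 2,
  dimnames = list(c("American", "British"), c("toward", "towards"))
)

# Name the new row and the new column inside rbind() and cbind()
demo_table <- rbind(demo_table, Indian = c(18, 337))
demo_table <- cbind(demo_table, in_the_direction_of = c(11, 12, 11))
demo_table
```

The vector you add must have as many values as the table has columns (for rbind()) or rows (for cbind()).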

Adding row and column totals

As mentioned above, your contingency table should contain only the intersections of your variables, not the row totals, column totals and table total: statistical procedures expect a matrix to contain only data; if the totals are needed, they will be calculated internally. Also, if you want to create a bar plot from a contingency table (see Bar Plots), it should not contain any totals. However, when you show a table in a research report, it should contain totals, so here is how to add them.

R has special commands for creating these totals: rowSums() and colSums(), which take a matrix as an argument and return a vector containing the row or column totals. For example, typing rowSums(mytable) produces the following output:

American  British   Indian 
     461      344      366 

Let us add these row totals to our table using the cbind() command (note that the row totals must be added as a column and vice versa) and store the result in a new variable mytable_totals:

cbind(mytable,rowSums(mytable)) -> mytable_totals

Typing mytable_totals displays the following:

         toward towards in_the_direction_of    
American    386      64                  11 461
British      14     318                  12 344
Indian       18     337                  11 366

If you want to add the column name Total, use the colnames() command introduced above. Since you only want to change the fourth position, attach [4] to the end:

colnames(mytable_totals)[4] <- "Total"

Now, let us add the column totals to mytable_totals as a new row, storing the result in the same variable:

rbind(mytable_totals,colSums(mytable_totals)) -> mytable_totals

Let's add the row name Total using rownames():

rownames(mytable_totals)[4] <- "Total"

Typing mytable_totals now gives us the following:

         toward towards in_the_direction_of Total
American    386      64                  11   461
British      14     318                  12   344
Indian       18     337                  11   366
Total       418     719                  34  1171
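
For comparison, base R also offers the function addmargins(), which adds both kinds of totals in one step and labels them Sum rather than Total. A minimal sketch, rebuilding the table from the values above (the variable names demo_table and demo_totals are only illustrative):

```r
# The three-by-three table without totals
demo_table <- matrix(
  c(386, 14, 18, 64, 318, 337, 11, 12, 11),
  ncol = 3,
  dimnames = list(
    c("American", "British", "Indian"),
    c("toward", "towards", "in_the_direction_of")
  )
)

# addmargins() appends a row and a column of sums, both labelled "Sum"
demo_totals <- addmargins(as.table(demo_table))
demo_totals
```

The step-by-step method shown above remains useful when you want to control the labels or add only one kind of total.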

Reordering rows and columns of a matrix

As mentioned above, table() puts your rows and columns in alphabetical order. If you want them in a different order, there are various ways of achieving this; none of them is very straightforward, but none is very complicated either. The easiest way is to exploit the possibility of accessing individual cells by giving the row and column number in square brackets, as shown above.

Instead of giving an individual row and column number, you can give a vector of numbers. For example, to display the top left cell of our table (the cell in the first row and first column), you type mytable[1,1]; to display all cells of our table, you type mytable[c(1,2,3), c(1,2,3)] (try it).

Now, if you want rows and/or columns displayed in a different order, you simply order them differently in the vectors. For example, the order in the variable mytable is as follows:

         toward towards in_the_direction_of
American    386      64                  11
British      14     318                  12
Indian       18     337                  11

If you want to change the order of rows to British, Indian, American, and the order of columns to in_the_direction_of, towards, toward, you type:

mytable[c(2,3,1), c(3,2,1)]

This gives you:

         in_the_direction_of towards toward
British                   12     318     14
Indian                    11     337     18
American                  11      64    386

Of course, you can store this new order in a variable, if you want.
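
The same reordering can be written with labels instead of numbers, which is often easier to read. A minimal sketch, again rebuilding the table from the values shown above (demo_table and reordered are illustrative names):

```r
# The three-by-three table in the alphabetical order produced by table()
demo_table <- matrix(
  c(386, 14, 18, 64, 318, 337, 11, 12, 11),
  ncol = 3,
  dimnames = list(
    c("American", "British", "Indian"),
    c("toward", "towards", "in_the_direction_of")
  )
)

# Give the row and column labels in the order you want them displayed
reordered <- demo_table[
  c("British", "Indian", "American"),
  c("in_the_direction_of", "towards", "toward")
]
reordered
```

Using labels has the advantage that the command still selects the intended rows and columns if the table is later extended.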

Additional information

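  • Matrices in R behave much like matrices in the mathematical sense, for example in operations such as addition and subtraction. Note that the operator * multiplies cell by cell; matrix multiplication in the mathematical sense uses the operator %*%.
  • You can transpose a matrix (turn its rows into columns and vice versa) using the command t(), with the matrix as argument.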
  • The best way of displaying the content of a contingency table visually is usually a bar plot (see Bar Plots)
  • There are various contingency tests that use a matrix as input in R, for example, the famous chi-square test (see Chi-Square Test)
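
The first two points can be illustrated briefly. A minimal sketch using the two-by-two table produced by table() above (the name demo_table is only illustrative):

```r
demo_table <- matrix(
  c(386, 14, 64, 318),
  ncol = 2,
  dimnames = list(c("American", "British"), c("toward", "towards"))
)

# t() transposes: rows become columns and columns become rows
t(demo_table)

# Arithmetic operators work cell by cell
demo_table + demo_table
demo_table * 2

# %*% performs matrix multiplication in the mathematical sense
demo_table %*% t(demo_table)
```
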
r/contingency-tables.txt · Last modified: 2022/10/05 10:19 by astefanowitsch