**[ [[R:introduction|Collection: Introduction to R]] ]**

====== Contingency tables ======

===== Introduction =====

Contingency tables (also called //cross tabulations//) are tables showing the intersections of two variables. For example, there are two variants of the preposition //toward(s)// (“in the direction of”): one with an //s// at the end and one without. There are several national varieties of English, very prominent among them British English and American English. Both variants of the preposition occur in both varieties, so we have two variables: Variant of the Preposition (with the values //toward// and //towards//) and Variety of English (with the values British and American). This gives us four intersections: British ∩ //towards//, British ∩ //toward//, American ∩ //towards// and American ∩ //toward//. If we check the frequency of these intersections in the LOB and BROWN corpora and represent the results as a contingency table, we get the following:

^          ^ //towards// ^ //toward// ^ Total ^
^ British  | **318**     | **14**     | 332   |
^ American | **64**      | **386**    | 450   |
^ Total    | 382         | 400        | 782   |

===== Prerequisites =====

  * There are two ways of creating a contingency table, shown below. For the second of these ways, your data must be in the form of a data frame (see [[r:data-frames|Data Frames]]).

===== How to create a contingency table =====

There are two ways of creating a contingency table in R: you can enter the values manually, or you can create the table from a raw data list in the form of a data frame.

==== Creating a contingency table manually ====

In order to create a contingency table manually, you first have to create a vector (see [[r:vectors|Vectors]]) containing the values, and store this vector in a variable.
For the table above, this vector would look like this (if we call the variable ''myvector'' -- of course, we can give it any name we want):

  c(318, 64, 14, 386) -> myvector

These are the values in the first column followed by those in the second column. The totals are not part of the table; if we need them, we can have R calculate them later.

The first step in transforming this vector into a table is to use the command ''matrix()'', which takes a vector as input and transforms it into a table with a certain number of columns, specified using the ''ncol'' option. In our case, this would look as follows (if we call the variable ''mytable''):

  matrix(myvector, ncol=2) -> mytable

If you display this variable (by typing ''mytable'' and hitting return), you get the following:

       [,1] [,2]
  [1,]  318   14
  [2,]   64  386

The values are displayed in the right way, but the rows and columns do not have names yet. You can refer to them by using the indices shown: for example, to display the first row of the table, type ''mytable[1,]''; to display the second column, type ''mytable[,2]''; and to display a specific cell, give both the row and the column number, e.g. ''mytable[1,2]'' to display the second cell in the first row.

Strictly speaking, this is all you need to use this contingency table in other contexts, but you may want to add row and column labels so that you and others know what information is contained in this table.
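
The steps so far can be sketched in one go as follows. This is only a minimal sketch: the helper variables ''first_row'', ''second_col'', ''single_cell'' and ''mytable_byrow'', as well as the ''byrow = TRUE'' variant, are illustrative additions and not part of the walkthrough above.

```r
# Cell counts from the table above: first column first, then second column.
myvector <- c(318, 64, 14, 386)

# matrix() fills the table column by column unless byrow = TRUE is given.
mytable <- matrix(myvector, ncol = 2)

# Index-based access to a row, a column and a single cell.
first_row   <- mytable[1, ]   # first row
second_col  <- mytable[, 2]   # second column
single_cell <- mytable[1, 2]  # second cell in the first row

# Assumed alternative: type the counts row by row and let R fill by row.
mytable_byrow <- matrix(c(318, 14, 64, 386), ncol = 2, byrow = TRUE)

# Both constructions yield the same matrix.
same_matrix <- identical(mytable, mytable_byrow)
```

Typing the counts row by row with ''byrow = TRUE'' can be easier to check against a printed table, since the values then appear in reading order.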
To add row and column labels, you use the functions ''rownames()'' and ''colnames()''. As their names suggest, these functions provide access to the parts of a contingency table that contain the row and column labels, so you can simply construct a vector that contains the correct number of text strings and assign this vector to the relevant part of the table:

  rownames(mytable) <- c("British", "American")
  colnames(mytable) <- c("towards", "toward")

If you now display the table (by typing ''mytable'' and hitting return), you get the following:

           towards toward
  British      318     14
  American      64    386

You can still refer to the columns, rows and cells in the way just described, but you can also use the labels instead of numbers (you need to put them in quotation marks, as they are text strings). For example, to display the first row of the table, you can type ''mytable["British",]''; to display the second column, you can type ''mytable[,"toward"]''; and to display a specific cell, give both the row and the column label, e.g. ''mytable["British","toward"]'' to display the second cell in the first row.

==== From a data frame ====

If you have imported a raw data table as a data frame (see [[r:data-frames|Data Frames]]), you can cross-tabulate two columns of this data frame to create a contingency table.

There is a sample csv file containing the distribution of the word forms //toward// and //towards// across different genres in British and American English (from the LOB and BROWN corpora) here: {{ :r:data-towards.zip |data-towards.csv}}. Import it into a data frame called ''Toward'' (as described in [[r:importing-data|Importing Data]]).

You can now create a contingency table using the ''table()'' command, which needs two columns from the data frame as input.
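
If you do not have the sample file at hand, the idea can be tried out on a toy data frame first. The following is a minimal sketch: the data frame ''Toy'' and its five rows are invented for illustration and are **not** taken from the LOB or BROWN corpora.

```r
# Toy data frame with invented rows (NOT corpus data), one row per token.
Toy <- data.frame(
  Variety = c("British", "British", "American", "American", "American"),
  Variant = c("towards", "toward", "toward", "toward", "towards")
)

# Cross-tabulate the two columns: rows come from the first argument,
# columns from the second; both are sorted alphabetically.
toytable <- table(Toy$Variety, Toy$Variant)
toytable
```

Each cell of ''toytable'' counts how many rows of ''Toy'' share that combination of values, which is exactly what happens with the real data below.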
Use the command ''head()'' to display the first few rows of the data frame:

  head(Toward)

You will see the following:

    Variety           Genre Variant
  1 British Press_Reportage towards
  2 British Press_Reportage towards
  3 British Press_Reportage  toward
  4 British Press_Reportage towards
  5 British Press_Reportage towards
  6 British Press_Reportage towards

The first and the third column are relevant to our contingency table. They can be referred to by ''Toward$Variety'' and ''Toward$Variant'', so the following command will produce a contingency table:

  table(Toward$Variety,Toward$Variant) -> mytable

Type ''mytable'' to display it:

           toward towards
  American    386      64
  British      14     318

As you can see, this is the same table you created manually in the preceding section, but the rows and columns are ordered differently: the ''table()'' command orders rows and columns alphabetically. If you don't like this order, you can reorder them (see below).

===== Adding rows and columns to a matrix =====

==== Adding data ====

You can add rows or columns to an existing matrix, no matter how you created it. For rows, this is done by using the ''rbind()'' command. First, create a variable containing a vector with the numbers you want to add as a row, and give this variable the name you want the new row to have. For example, to add the frequencies of //toward// and //towards// in Indian English to the table (the data are from the KOLHAPUR corpus):

  Indian <- c(18,337)

The ''rbind()'' command needs two arguments: the matrix to which you want to add a row, and the vector containing the row you want to add. Let us write the result to the same variable ''mytable'':

  rbind(mytable, Indian) -> mytable

If you now type ''mytable'', you will get the following:

           toward towards
  American    386      64
  British      14     318
  Indian       18     337

The ''cbind()'' command works in the same way.
For example, to add a column containing the frequencies of the expression //in the direction of//, we create a corresponding variable (the frequencies are from BROWN, LOB and KOLHAPUR):

  in_the_direction_of <- c(11,12,11)

We then add this column to our table:

  cbind(mytable, in_the_direction_of) -> mytable

Typing ''mytable'' gives us:

           toward towards in_the_direction_of
  American    386      64                  11
  British      14     318                  12
  Indian       18     337                  11

==== Adding row and column totals ====

As mentioned above, your contingency table should contain only the intersections of your variables (shown in bold in the introduction), not the row totals, column totals and table total. Statistical procedures expect a matrix to contain only data; if the totals are needed, they will be calculated internally. Also, if you want to create a box plot from a contingency table (see [[r:box-plots|Box Plots]]), it should not contain any totals. However, when you show a table in a research report, it should contain totals, so here is how to add them.

R has special commands for creating these totals: ''rowSums()'' and ''colSums()'', which take a matrix as an argument and return a vector containing the row or column totals.
For example, typing ''rowSums(mytable)'' produces the following output:

  American  British   Indian
       461      344      366

Let us add these row totals to our table using the ''cbind()'' command (note that the //row// totals must be added as a //column// and vice versa) and store the result in a new variable ''mytable_totals'':

  cbind(mytable,rowSums(mytable)) -> mytable_totals

Typing ''mytable_totals'' displays the following:

           toward towards in_the_direction_of
  American    386      64                  11 461
  British      14     318                  12 344
  Indian       18     337                  11 366

If you want to add the column name //Total//, use the ''colnames()'' command introduced above. Since you only want to change the fourth position, attach ''[4]'' to the end:

  colnames(mytable_totals)[4] <- "Total"

Now, let us add the column totals to ''mytable_totals'' as a new row, storing the result in the same variable:

  rbind(mytable_totals,colSums(mytable_totals)) -> mytable_totals

Let's add the row name //Total// using ''rownames()'':

  rownames(mytable_totals)[4] <- "Total"

Typing ''mytable_totals'' now gives us the following:

           toward towards in_the_direction_of Total
  American    386      64                  11   461
  British      14     318                  12   344
  Indian       18     337                  11   366
  Total       418     719                  34  1171

===== Reordering rows and columns of a matrix =====

As mentioned above, ''table()'' puts your rows and columns in alphabetical order. If you want them in a different order, there are various ways of achieving this, none of them very straightforward, but also not very complicated.

The easiest way is to exploit the possibility of accessing individual cells by giving the row and column number in square brackets, as shown above. Instead of giving an individual row and column number, you can give a vector of numbers. For example, to display the top left cell of our table (the cell in the first row and first column), you type ''mytable[1,1]''; to display all cells of our table, you type ''mytable[c(1,2,3), c(1,2,3)]'' (try it).
Now, if you want rows and/or columns displayed in a different order, you simply order them differently in the vectors. For example, the order in the variable ''mytable'' is as follows:

           toward towards in_the_direction_of
  American    386      64                  11
  British      14     318                  12
  Indian       18     337                  11

If you want to change the order of rows to //British//, //Indian//, //American//, and the order of columns to //in_the_direction_of//, //towards//, //toward//, you type:

  mytable[c(2,3,1), c(3,2,1)]

This gives you:

           in_the_direction_of towards toward
  British                   12     318     14
  Indian                    11     337     18
  American                  11      64    386

Of course, you can store this new order in a variable, if you want.

===== Additional information =====

  * ''matrix'' objects in R behave much like matrices in the mathematical sense, for example in addition and subtraction. Note, however, that ''*'' multiplies two matrices cell by cell; matrix multiplication in the mathematical sense is written ''%*%''.
  * You can transpose a matrix (turn its rows into columns and vice versa) using the command ''t()'', with the matrix as argument.
  * The best way of displaying the content of a contingency table visually is usually a bar plot (see [[r:bar-plots|Bar Plots]]).
  * There are various contingency tests that use a matrix as input in R, for example, the famous chi-square test (see [[r:chi-square-test|Chi-Square Test]]).
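
The points in this list can be tried out as follows. This is a minimal sketch that rebuilds the 2 x 2 table produced by ''table()'' above by hand; the variable names ''m'' and ''result'' and the use of the ''dimnames'' argument of ''matrix()'' are illustrative choices, not part of the walkthrough.

```r
# The 2 x 2 table as produced by table() above, rebuilt by hand.
m <- matrix(c(386, 14, 64, 318), ncol = 2,
            dimnames = list(c("American", "British"),
                            c("toward", "towards")))

t(m)      # transpose: variants become rows, varieties become columns
m + m     # cell-by-cell addition
m * 2     # every cell multiplied by 2
m * m     # cell-by-cell product, NOT matrix multiplication
m %*% m   # matrix multiplication in the mathematical sense

# Chi-square test on the table (see the linked page for interpretation).
result <- chisq.test(m)
result$expected   # expected frequencies if variety and variant were independent
```

Note that ''chisq.test()'' expects the table without totals, as described in the section on row and column totals.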