The \(\chi^2\) goodness-of-fit test is applied to perform hypothesis tests about the distribution of a qualitative (categorical) variable or a discrete quantitative variable that has only finitely many possible values.
The basic logic of the \(\chi^2\) goodness-of-fit test is to compare two sets of frequencies: the observed frequencies of a sample and the expected frequencies under the hypothesized distribution.
Consider a simple example:
On September 22, 2013 the German Federal Election 2013 was held. More than 44 million people turned out to vote. 41.5% of German voters decided to vote for the Christian Democratic Union (CDU) and 25.7% for the Social Democratic Party (SPD). For the sake of simplicity we subsume the remaining percentage of votes (32.8%) as Others.
Based on that data, we may build a frequency table.
\[ \begin{array}{|l|c|c|} \hline \ \text{Party} & \text{Percentage} & \text{Relative frequency}\\ \hline \ \text{CDU} & 41.5 & 0.415 \\ \ \text{SPD} & 25.7 & 0.257 \\ \ \text{Others} & 32.8 & 0.328 \\ \hline \ & 100 & 1 \\ \hline \end{array} \]
The third column of the table above corresponds to the relative frequencies of the German population/voters. For this exercise we take a random sample: we ask 123 students of FU Berlin about their party affiliation and record the following answers.
## [1] "SPD" "CDU" "Others" "CDU" "CDU" "CDU" "CDU"
## [8] "CDU" "CDU" "SPD" "CDU" "Others" "SPD" "CDU"
## [15] "SPD" "Others" "CDU" "Others" "CDU" "CDU" "Others"
## [22] "Others" "SPD" "CDU" "CDU" "CDU" "Others" "CDU"
## [29] "CDU" "CDU" "Others" "CDU" "CDU" "SPD" "Others"
## [36] "CDU" "CDU" "CDU" "CDU" "SPD" "CDU" "Others"
## [43] "CDU" "CDU" "Others" "SPD" "Others" "CDU" "SPD"
## [50] "SPD" "Others" "SPD" "Others" "Others" "SPD" "SPD"
## [57] "CDU" "Others" "Others" "Others" "CDU" "CDU" "CDU"
## [64] "SPD" "Others" "SPD" "Others" "Others" "CDU" "SPD"
## [71] "CDU" "SPD" "Others" "CDU" "Others" "CDU" "Others"
## [78] "CDU" "Others" "SPD" "CDU" "CDU" "Others" "Others"
## [85] "CDU" "Others" "Others" "SPD" "CDU" "Others" "CDU"
## [92] "Others" "CDU" "CDU" "CDU" "CDU" "SPD" "SPD"
## [99] "Others" "Others" "SPD" "CDU" "Others" "CDU" "Others"
## [106] "Others" "SPD" "SPD" "CDU" "Others" "CDU" "CDU"
## [113] "SPD" "CDU" "Others" "CDU" "CDU" "CDU" "CDU"
## [120] "CDU" "Others" "Others" "SPD"
In the next step we count the occurrence of each category (party) in our sample. These quantities are the observed frequencies.
## sample.FUB
## CDU Others SPD
## 57 40 26
In the next step we compute the expected frequency, denoted \(E\), for each category.
\[E = n \times p\text{,}\]
where \(n\) is the sample size and \(p\) is the corresponding relative frequency taken from the table above.
\[E_{CDU} = n\times p = 123 \times 0.415 = 51.045\]
\[E_{SPD} = n\times p = 123 \times 0.257 = 31.611\]
\[E_{Others} = n\times p = 123 \times 0.328 = 40.344\]
Note that although we deal with individual counts, represented by integer values, the expected frequency, \(E\), is generally not an integer. That is fine.
Now we put the observed frequencies and the expected frequencies together into one table.
\[ \begin{array}{|l|c|c|} \hline \ \text{Party} & \text{Observed frequency} & \text{Expected frequency}\\ \hline \ \text{CDU} & 57 & 51.045 \\ \ \text{SPD} & 26 & 31.611 \\ \ \text{Others} & 40 & 40.344 \\ \hline \ & 123 & 123 \\ \hline \end{array} \]
Great! Once we know the expected frequencies we have to check two assumptions. First, all expected frequencies must be 1 or greater, and second, at most 20% of the expected frequencies may be less than 5. By looking at the table we may confirm that both assumptions are fulfilled.
Now we have all the ingredients we need, except the test statistic, to perform a \(\chi^2\) goodness-of-fit test.
The \(\chi^2\) test statistic for a goodness-of-fit test is given by
\[\chi^2=\sum \frac{(O-E)^2}{E}\text{,}\]
where \(O\) corresponds to the observed frequencies and \(E\) to the expected frequencies. If the null hypothesis is true, the test statistic \(\chi^2\) approximately follows a chi-square distribution. The number of degrees of freedom is 1 less than the number of possible values (categories), \(c\), of the variable under consideration.
\[df = c-1\]
Based on the observed and expected frequencies given in the table above it is fairly straightforward to calculate the \(\chi^2\)-value. However, to make the calculation procedure clearer, we put all the necessary computational steps together into one table.
\[ \begin{array}{|l|c|c|c|c|c|} \hline \ \text{Party} & \text{Observed} & \text{Expected} & \text{Difference} & \text{Square of difference} & \chi^2\text{ subtotal}\\ \ & \text{frequency} & \text{frequency} & O-E & (O-E)^2 & (O-E)^2/E\\ \hline \ \text{CDU} & 57 & 51.045 & 5.955 & 35.462025 & 0.6947208\\ \ \text{SPD} & 26 & 31.611 & -5.611 & 31.483321 & 0.9959609\\ \ \text{Others} & 40 & 40.344 & -0.344 & 0.118336 & 0.0029332\\ \hline \ & 123 & 123 & 0 & & 1.6936149\\ \hline \end{array} \]
In our example the \(\chi^2\) test statistic for a goodness-of-fit test evaluates to
\[\chi^2=\sum \frac{(O-E)^2}{E} \approx 1.694\]
If the null hypothesis is true, the observed and expected frequencies are roughly equal. This results in a small value of the \(\chi^2\) test statistic, which is consistent with \(H_0\). If, however, the value of the \(\chi^2\) test statistic is large, the data provide evidence against \(H_0\).
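The worked example can be reproduced in a few lines of R. This is a minimal sketch: the observed counts and the relative frequencies are copied from the tables above, while the variable names (observed, p, expected, chi2, p.value) are chosen here for illustration only. The last step additionally computes a p-value with pchisq(), which is not part of the hand calculation.

```r
# Observed counts from the sample of 123 students
observed <- c(CDU = 57, SPD = 26, Others = 40)
# Relative frequencies from the election result
p <- c(CDU = 0.415, SPD = 0.257, Others = 0.328)

n <- sum(observed)    # sample size
expected <- n * p     # expected frequencies, E = n * p

# Assumption checks: all E >= 1 and at most 20% of E below 5
all(expected >= 1)
mean(expected < 5) <= 0.2

# Test statistic and degrees of freedom
chi2 <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1

# Upper-tail probability of the chi-square distribution (illustrative addition)
p.value <- pchisq(chi2, df = df, lower.tail = FALSE)
round(c(chi2 = chi2, df = df, p.value = p.value), 4)
```

Because the expected frequencies are computed from the relative frequencies, they always add up to the sample size, which is a quick sanity check on the table.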
In order to get some hands-on experience we apply the \(\chi^2\) goodness-of-fit test in an exercise. For this purpose we load the students data set. The students.csv file can be downloaded from the URL used in the import call below. Import the data set and assign a proper name to it.
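# stringsAsFactors = TRUE keeps text columns as factors (the default became FALSE in R 4.0.0), so that levels() works below
students <- read.csv("https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv", stringsAsFactors = TRUE)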
The students data set consists of 8239 rows, each of them representing a particular student, and 16 columns, each of them corresponding to a variable/feature related to that particular student. These self-explanatory variables are: stud.id, name, gender, age, height, weight, religion, nc.score, semester, major, minor, score1, score2, online.tutorial, graduated, salary.
Recall that \(\chi^2\) goodness-of-fit tests are applied to qualitative (categorical) variables or discrete quantitative variables. The students data set contains several categorical variables, such as gender, religion, major, minor and graduated, among others.
In order to showcase the \(\chi^2\) goodness-of-fit test we examine whether religion is distributed among students in the same way as among the population of the European Union. The data on the continental scale is provided by the report “Discrimination in the EU in 2012” (European Union: European Commission, Special Eurobarometer, 393, p. 233). The report provides data for 8 categories: 48% of the people are classified as Catholic, 16% as Non believer/Agnostic, 12% as Protestant, 8% as Orthodox, 7% as Atheist, 4% as Other Christian, 3% as Other religion/None stated and 2% as Muslim. We plot the data in the form of a pie chart for a better understanding.
# shares (in %) of the 8 categories reported by the Eurobarometer
data <- c(48, 16, 12, 8, 7, 4, 3, 2)
data.labels <- c('Catholic', 'Non believer/\nAgnostic', 'Protestant',
                 'Orthodox', 'Atheist', 'Other Christian',
                 'Other religion/None stated', 'Muslim')
par(mar = c(3, 2, 3, 2))
library(RColorBrewer)
# one colour per category
cols <- brewer.pal(length(data), "Set3")
pie(x = data,
    labels = data.labels,
    col = cols,
    radius = 1)
We start with data exploration and data preparation.
First, we want to know which categories are available in the data set. To find out, we apply the levels() function, which provides access to the levels (categories) of a factor variable.
levels(students$religion)
## [1] "Catholic" "Muslim" "Orthodox" "Other" "Protestant"
The students data contains 5 different categories, compared to the 8 categories provided by the EU report. Thus, in order to make comparisons, we recode the 8 categories of the EU report into the 5 categories of the religion variable: “Catholic”, “Muslim”, “Orthodox”, “Protestant”, and “Other”. The category “Other” combines Non believer/Agnostic, Atheist, Other Christian and Other religion/None stated. Be careful not to mix up categories during that step!
# set category names
data.labels <- c("Catholic", "Muslim", "Orthodox", "Other", "Protestant")
# recode European data according to category names
data.raw <- c(48, 2, 8, sum(16, 7, 4, 3), 12)
# generate a data.frame object
data <- data.frame(data.raw/100)
row.names(data) <- data.labels
colnames(data) <- 'relative.frequency'
data
## relative.frequency
## Catholic 0.48
## Muslim 0.02
## Orthodox 0.08
## Other 0.30
## Protestant 0.12
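Recoding by position, as above, relies on keeping the order of the numbers straight. As a hedged alternative, the sketch below recodes by name, so that every category of the report is mapped explicitly. The names eu, keep, group and rel.freq are introduced here for illustration and do not appear in the original code.

```r
# Shares (in %) of the 8 categories from the EU report
eu <- c("Catholic" = 48, "Non believer/Agnostic" = 16, "Protestant" = 12,
        "Orthodox" = 8, "Atheist" = 7, "Other Christian" = 4,
        "Other religion/None stated" = 3, "Muslim" = 2)

# Categories that also exist in the students data; everything else becomes "Other"
keep <- c("Catholic", "Muslim", "Orthodox", "Protestant")
group <- ifelse(names(eu) %in% keep, names(eu), "Other")

# Sum the shares within each group and convert to relative frequencies
rel.freq <- tapply(eu, group, sum) / 100
rel.freq

# The relative frequencies must add up to 1
stopifnot(isTRUE(all.equal(sum(rel.freq), 1)))
```

Since tapply() sorts the groups alphabetically, the result comes out in the same order as the levels of the religion variable.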
Now we take a random sample. We pick 256 students at random and use the table() function to count the number of students in each category of the religion variable. Recall that these quantities correspond to the observed frequencies.
n <- 256
students.sample <- sample(students$religion, n)
O.frequencies <- table(students.sample)
O.frequencies
## students.sample
## Catholic Muslim Orthodox Other Protestant
## 83 16 16 88 53
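Because sample() draws at random, the counts will differ from run to run. A minimal sketch of how to make such a draw reproducible with set.seed() follows. The seed value 42 and the toy vector religion.toy are arbitrary assumptions for illustration; they are not the values behind the output shown above.

```r
# Toy stand-in for students$religion (hypothetical data)
religion.toy <- factor(rep(c("Catholic", "Muslim", "Orthodox", "Other", "Protestant"),
                           times = c(40, 10, 10, 25, 15)))

set.seed(42)                         # arbitrary seed, fixes the random number stream
draw.1 <- sample(religion.toy, 20)   # sampling without replacement (the default)

set.seed(42)                         # same seed again
draw.2 <- sample(religion.toy, 20)

identical(draw.1, draw.2)            # compares the two draws element by element
table(draw.1)                        # observed frequencies of the reproducible draw
```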
With one line of code we insert the observed frequencies into data, the data.frame we constructed above.
data$observed.frequencies <- O.frequencies
data
## relative.frequency observed.frequencies
## Catholic 0.48 83
## Muslim 0.02 16
## Orthodox 0.08 16
## Other 0.30 88
## Protestant 0.12 53
In the next step we calculate the expected frequencies. Recall the equation:
\[E = n \times p\]
We insert the expected frequencies as a new column in data, the data.frame we constructed above.
n <- 256
data$expected.frequencies <- n * data$relative.frequency
data
## relative.frequency observed.frequencies expected.frequencies
## Catholic 0.48 83 122.88
## Muslim 0.02 16 5.12
## Orthodox 0.08 16 20.48
## Other 0.30 88 76.80
## Protestant 0.12 53 30.72
Once we know the expected frequencies we have to check two assumptions. First, all expected frequencies must be 1 or greater, and second, at most 20% of the expected frequencies may be less than 5. By looking at the table we may confirm that both assumptions are fulfilled: the smallest expected frequency is 5.12 (Muslim).
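The two assumptions can also be checked programmatically rather than by eye. The following sketch hard-codes the expected frequencies from the table above so that it runs on its own; the names expected, assumption.1 and assumption.2 are assumptions made here.

```r
# Expected frequencies copied from the table above
expected <- c(Catholic = 122.88, Muslim = 5.12, Orthodox = 20.48,
              Other = 76.80, Protestant = 30.72)

# Assumption 1: all expected frequencies are 1 or greater
assumption.1 <- all(expected >= 1)

# Assumption 2: at most 20% of the expected frequencies are less than 5
assumption.2 <- mean(expected < 5) <= 0.2

c(assumption.1, assumption.2)
```

With the data.frame from above, the hard-coded vector could be replaced by data$expected.frequencies.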
Perfect, now we are done! The data set is ready to be analysed with the \(\chi^2\) goodness-of-fit test. Recall the question we are interested in: Is religion distributed among students in the same way as among the population of the European Union?
In order to conduct the \(\chi^2\) goodness-of-fit test we follow the step-wise implementation procedure for hypothesis testing. The \(\chi^2\) goodness-of-fit test follows the same step-wise procedure as hypothesis tests for the population mean.
\[ \begin{array}{ll} \hline \ \text{Step 1} & \text{State the null hypothesis } H_0 \text{ and alternative hypothesis } H_A \text{.}\\ \ \text{Step 2} & \text{Decide on the significance level, } \alpha\text{.} \\ \ \text{Step 3} & \text{Compute the value of the test statistic.} \\ \ \text{Step 4} &\text{Determine the p-value.} \\ \ \text{Step 5} & \text{If } p \le \alpha \text{, reject }H_0 \text{; otherwise, do not reject } H_0 \text{.} \\ \ \text{Step 6} &\text{Interpret the result of the hypothesis test.} \\ \hline \end{array} \]
Step 1: State the null hypothesis \(H_0\) and alternative hypothesis \(H_A\)
The null hypothesis states that religion is distributed among students in the same way as among the population of the European Union.
\[H_0: \quad \text{The variable has the specified distribution}\]
The alternative hypothesis states that the two distributions differ.
\[H_A: \quad \text{The variable does not have the specified distribution} \]
Step 2: Decide on the significance level, \(\alpha\)
\[\alpha = 0.01\]
alpha <- 0.01
Steps 3 and 4: Compute the value of the test statistic and the p-value.
For illustration purposes we manually compute the test statistic in R. Recall the equation for the test statistic from above:
\[\chi^2=\sum \frac{(O-E)^2}{E}\]
# Compute the value of the test statistic
x2 <- sum((data$observed.frequencies - data$expected.frequencies)^2 / data$expected.frequencies)
x2
## [1] 54.83496
The numerical value of the test statistic is \(\approx 54.83\).
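Written out term by term, in the row order of the table above (Catholic, Muslim, Orthodox, Other, Protestant) and rounded to three decimals, the sum reads:
\[\chi^2 = \frac{(83-122.88)^2}{122.88} + \frac{(16-5.12)^2}{5.12} + \frac{(16-20.48)^2}{20.48} + \frac{(88-76.8)^2}{76.8} + \frac{(53-30.72)^2}{30.72} \approx 12.943 + 23.120 + 0.980 + 1.633 + 16.159 \approx 54.835\]
The Muslim, Protestant and Catholic categories contribute most to the statistic.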
In order to calculate the p-value we apply the pchisq() function. Recall how to calculate the degrees of freedom:
\[df = c - 1\]
# Compute df
df <- nrow(data)-1
# Compute the p-value
p <- pchisq(q = x2, df = df, lower.tail = FALSE)
p
## [1] 3.518237e-11
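The argument lower.tail = FALSE asks pchisq() for the upper-tail probability \(P(\chi^2 \ge x)\) directly. The sketch below hard-codes the test statistic and the degrees of freedom from above and shows that this matches one minus the lower-tail probability; the names p.upper and p.complement are illustrative.

```r
x2 <- 54.83496   # test statistic from above
df <- 4          # five categories minus one

# Upper-tail probability, computed directly
p.upper <- pchisq(q = x2, df = df, lower.tail = FALSE)

# Same quantity via the complement of the lower tail
p.complement <- 1 - pchisq(q = x2, df = df)

# Both agree up to floating-point error
abs(p.upper - p.complement) < 1e-12
```

For very small p-values the direct upper-tail call is preferable, since the subtraction from 1 loses precision.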
Step 5: If \(p \le \alpha\), reject \(H_0\); otherwise, do not reject \(H_0\).
p <= alpha
## [1] TRUE
The p-value is less than the specified significance level of 0.01; we reject \(H_0\). The test results are statistically significant at the 1% level and provide very strong evidence against the null hypothesis.
Step 6: Interpret the result of the hypothesis test.
\(p = 3.518237\times 10^{-11}\). At the 1% significance level, the data provides very strong evidence to conclude that the religion distribution among students differs from the religion distribution of the population of the European Union.
We just completed a \(\chi^2\) goodness-of-fit test in R manually. Very cool, but now we redo that example and make use of the R machinery to obtain the same result as above by just one line of code!
In order to conduct a \(\chi^2\) goodness-of-fit test in R we apply the chisq.test() function. We provide the observed frequencies, data$observed.frequencies, as data input and the hypothesized relative frequencies, data$relative.frequency, through the argument p.
chisq.test(data$observed.frequencies, p = data$relative.frequency)
##
## Chi-squared test for given probabilities
##
## data: data$observed.frequencies
## X-squared = 54.835, df = 4, p-value = 3.518e-11
Worked out fine! Compare the output of the chisq.test() function with our result from above. Again, we may conclude that at the 1% significance level, the data provides very strong evidence to conclude that the religion distribution among students differs from the religion distribution of the population of the European Union.