The \(\chi^2\) goodness-of-fit test is applied to perform hypothesis tests on the distribution of a qualitative (categorical) variable or of a discrete quantitative variable that has only finitely many possible values.

The basic logic of the \(\chi^2\) goodness-of-fit test is to compare two sets of frequencies: the frequencies observed in a sample and the frequencies we would expect under a hypothesized distribution.

Consider a simple example:

On September 22, 2013, the German federal election was held. More than 44 million people turned out to vote: 41.5 % of German voters decided to vote for the Christian Democratic Union (CDU) and 25.7 % for the Social Democratic Party (SPD). For the sake of simplicity we subsume the remaining 32.8 % of the votes as Others.

Based on that data, we may build a frequency table:

\[ \begin{array}{|l|c|c|} \hline \text{Party} & \text{Percentage} & \text{Relative frequency}\\ \hline \text{CDU} & 41.5 & 0.415 \\ \text{SPD} & 25.7 & 0.257 \\ \text{Others} & 32.8 & 0.328 \\ \hline & 100 & 1 \\ \hline \end{array} \]

The third column of the table above corresponds to the relative frequencies of the German population/voters. For this exercise we take a random sample. We ask 123 students of FU Berlin about their party affiliation and record the following answers:

##   [1] "CDU"    "SPD"    "Others" "SPD"    "SPD"    "CDU"    "Others" "SPD"   
##   [9] "Others" "Others" "SPD"    "Others" "Others" "Others" "CDU"    "SPD"   
##  [17] "CDU"    "CDU"    "CDU"    "SPD"    "SPD"    "Others" "Others" "SPD"   
##  [25] "Others" "SPD"    "Others" "Others" "CDU"    "CDU"    "SPD"    "SPD"   
##  [33] "Others" "SPD"    "CDU"    "Others" "SPD"    "CDU"    "CDU"    "CDU"   
##  [41] "CDU"    "Others" "Others" "CDU"    "CDU"    "CDU"    "CDU"    "Others"
##  [49] "CDU"    "SPD"    "CDU"    "Others" "SPD"    "CDU"    "Others" "CDU"   
##  [57] "CDU"    "SPD"    "SPD"    "Others" "Others" "CDU"    "Others" "CDU"   
##  [65] "SPD"    "Others" "SPD"    "SPD"    "SPD"    "Others" "SPD"    "Others"
##  [73] "SPD"    "CDU"    "Others" "CDU"    "Others" "Others" "CDU"    "CDU"   
##  [81] "CDU"    "Others" "Others" "SPD"    "CDU"    "Others" "SPD"    "SPD"   
##  [89] "SPD"    "CDU"    "CDU"    "Others" "CDU"    "Others" "CDU"    "CDU"   
##  [97] "SPD"    "CDU"    "Others" "Others" "Others" "CDU"    "Others" "SPD"   
## [105] "Others" "SPD"    "SPD"    "Others" "Others" "CDU"    "SPD"    "CDU"   
## [113] "CDU"    "SPD"    "SPD"    "CDU"    "Others" "SPD"    "Others" "Others"
## [121] "Others" "CDU"    "CDU"

In the next step we count the occurrences of each category (party) in our sample using the table() function. These counts are the observed frequencies:

table(sample_FUB)
## sample_FUB
##    CDU Others    SPD 
##     43     44     36

In the next step we compute the expected frequency, denoted \(E\), for each category:

\[E = n \times p\text{,}\]

where \(n\) is the sample size and \(p\) is the corresponding relative frequency taken from the table above.

\[E_{CDU} = n\times p = 123 \times 0.415 = 51.045\]

\[E_{SPD} = n\times p = 123 \times 0.257 = 31.611\]

\[E_{Others} = n\times p = 123 \times 0.328 = 40.344\]

Note: Although we deal with individual counts, which are integers, the expected frequency \(E\) is in general not an integer. That is fine.
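The same expected frequencies can be computed in R in one step (a minimal sketch; the variable names are our own):

```r
# relative frequencies from the 2013 election result (see table above)
p <- c(CDU = 0.415, SPD = 0.257, Others = 0.328)
n <- 123  # sample size

# expected frequency for each category: E = n * p
E <- n * p
E
##    CDU    SPD Others 
## 51.045 31.611 40.344
```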

Now, we put the observed frequencies and the expected frequencies together into one table:

\[ \begin{array}{|l|c|c|} \hline \text{Party} & \text{Observed frequency} & \text{Expected frequency}\\ \hline \text{CDU} & 43 & 51.045 \\ \text{SPD} & 36 & 31.611 \\ \text{Others} & 44 & 40.344 \\ \hline & 123 & 123 \\ \hline \end{array} \]

Great! Once we have the expected frequencies, we have to check two assumptions: first, all expected frequencies must be 1 or greater, and second, at most 20 % of the expected frequencies may be less than 5. By looking at the table we may confirm that both assumptions are fulfilled.
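These two assumption checks are easy to express in R (a minimal sketch using the expected frequencies from the table above):

```r
# expected frequencies from the table above
E <- c(CDU = 51.045, SPD = 31.611, Others = 40.344)

all(E >= 1)         # assumption 1: every expected frequency is 1 or greater
## [1] TRUE
mean(E < 5) <= 0.2  # assumption 2: at most 20 % of them are less than 5
## [1] TRUE
```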

Now we have all the ingredients we need to perform a \(\chi^2\) goodness-of-fit test, except the test statistic itself.

The \(\chi^2\) test statistic for a goodness-of-fit is given by

\[\chi^2=\sum \frac{(O-E)^2}{E}\text{,}\] where \(O\) denotes the observed frequencies and \(E\) the expected frequencies. If the null hypothesis is true, the test statistic approximately follows a \(\chi^2\) distribution. The number of degrees of freedom is 1 less than the number of possible values (categories), \(c\), of the variable under consideration:

\[df = c-1\]

Based on the observed and expected frequencies given in the table above it is fairly straightforward to calculate the \(\chi^2\)-value. However, to make the calculation procedure easier to follow, we put together all the necessary computational steps into one table:

\[ \begin{array}{|l|c|c|c|c|c|} \hline \text{Party} & \text{Observed} & \text{Expected} & \text{Difference} & \text{Square of difference} & \chi^2\text{ subtotal}\\ & \text{frequency} & \text{frequency} & O-E & (O-E)^2 & (O-E)^2/E\\ \hline \text{CDU} & 43 & 51.045 & -8.045 & 64.722025 & 1.2679405\\ \text{SPD} & 36 & 31.611 & 4.389 & 19.263321 & 0.6093866\\ \text{Others} & 44 & 40.344 & 3.656 & 13.366336 & 0.3313091\\ \hline & 123 & 123 & 0 & & 2.2086363\\ \hline \end{array} \]

In our example the \(\chi^2\) test statistic for a goodness-of-fit evaluates to

\[\chi^2=\sum \frac{(O-E)^2}{E} \approx 2.209\]
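This calculation is easy to reproduce in R (a minimal sketch using the observed and expected frequencies from the table above):

```r
# observed and expected frequencies for CDU, SPD and Others
O <- c(43, 36, 44)
E <- c(51.045, 31.611, 40.344)

# chi-squared test statistic: sum of (O - E)^2 / E
chi2 <- sum((O - E)^2 / E)
chi2
## [1] 2.208636
```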

If the null hypothesis is true, the observed and expected frequencies are roughly equal. This results in a small value of the \(\chi^2\) test statistic, thus, supporting \(H_0\). If, however, the value of the \(\chi^2\) test statistic is large, the data provides evidence against \(H_0\).

In our case, we may compare the empirical \(\chi^2\) test statistic with the corresponding critical \(\chi^2\) value for a significance level of \(\alpha = 0.05\) (the 0.95 quantile of the \(\chi^2\) distribution), with the degrees of freedom given by 3 categories minus 1:

qchisq(0.95, df = 3 - 1)
## [1] 5.991465

Since our empirical \(\chi^2\) value (\(\approx 2.209\)) is smaller than the critical \(\chi^2\) value (\(\approx 5.991\)), we cannot reject the null hypothesis!
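Equivalently, instead of comparing against the critical value we could compute the p-value directly (a minimal sketch; the value of the test statistic is taken from the table above):

```r
chi2 <- 2.2086363               # empirical test statistic from above
crit <- qchisq(0.95, df = 3 - 1)  # critical value at alpha = 0.05

chi2 < crit   # TRUE: do not reject H0
## [1] TRUE

# the corresponding p-value (upper tail of the chi-squared distribution)
pchisq(chi2, df = 3 - 1, lower.tail = FALSE)
```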


\(\chi^2\) goodness-of-fit test: An Example

In order to get some hands-on experience we apply the \(\chi^2\) goodness-of-fit test in an exercise. For this we load the students data set, which you may also download here.

students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")

The students data set consists of 8239 rows, each representing a particular student, and 16 columns, each corresponding to a variable/feature related to that student. These self-explanatory variables are: stud.id, name, gender, age, height, weight, religion, nc.score, semester, major, minor, score1, score2, online.tutorial, graduated, salary.

Recall that \(\chi^2\) goodness-of-fit tests are applied to qualitative (categorical) variables or discrete quantitative variables. There are several categorical variables in the students data set, such as gender, religion, major, minor and graduated, among others.

In order to showcase the \(\chi^2\) goodness-of-fit test we examine whether the distribution of religion among students matches the distribution of religion in the population of the European Union. Data at the continental scale is provided in the report “Discrimination in the EU in 2012” (European Union: European Commission, Special Eurobarometer, 393, p. 233). The report distinguishes 8 categories: 48 % of the people identify as Catholic, 16 % as Non believer/Agnostic, 12 % as Protestant, 8 % as Orthodox, 7 % as Atheist, 4 % as Other Christian, 3 % as Other religion/None stated and 2 % as Muslim. We plot the data as a pie chart for a better overview:

data <- c(48, 16, 12, 8, 7, 4, 3, 2)
data_labels <- c(
  "Catholic", "Non believer/\nAgnostic", "Protestant",
  "Orthodox", "Atheist", "Other Christian",
  "Other religion/None stated", "Muslim"
)

par(mar = c(3, 2, 3, 2))
library(RColorBrewer)
cols <- brewer.pal(length(data), "Set3")

pie(
  x = data,
  labels = data_labels,
  col = cols,
  radius = 1
)


Data preparation

We start with data exploration and data preparation.

First, we want to know which categories are available in the students data set. Therefore, we apply the unique() function, which provides access to the levels (categories) of a variable.

unique(students$religion)
## [1] "Muslim"     "Other"      "Protestant" "Catholic"   "Orthodox"

Obviously, in the students data set there are 5 different categories, compared to the 8 categories provided by the EU report. Thus, in order to make comparisons, we collapse the 8 categories of the EU report into the 5 categories “Catholic”, “Muslim”, “Orthodox”, “Protestant” and “Other”. Be careful not to mix up categories during that step!

# set category names
data_labels <- c("Catholic", "Muslim", "Orthodox", "Other", "Protestant")

# recode European data according to category names
data_raw <- c(48, 2, 8, sum(16, 7, 4, 3), 12)

# generate a data.frame object
data <- data.frame(data_raw / 100)
row.names(data) <- data_labels
colnames(data) <- "relative_frequency"
data
##            relative_frequency
## Catholic                 0.48
## Muslim                   0.02
## Orthodox                 0.08
## Other                    0.30
## Protestant               0.12

Now, we take a random sample: we randomly pick 256 students and count the number of students in each category of the religion variable using the table() function. Recall that this quantity corresponds to the observed frequencies. (Note that sample() draws randomly, so without a fixed seed your counts will differ slightly from the ones shown below.)

n <- 256
students_sample <- sample(students$religion, n)
O_frequencies <- table(students_sample)
O_frequencies
## students_sample
##   Catholic     Muslim   Orthodox      Other Protestant 
##         83         15         16         82         60

With one line of code we insert the observed frequencies into data, the data.frame we constructed above.

data$observed_frequencies <- O_frequencies
data
##            relative_frequency observed_frequencies
## Catholic                 0.48                   83
## Muslim                   0.02                   15
## Orthodox                 0.08                   16
## Other                    0.30                   82
## Protestant               0.12                   60

In the next step we calculate the expected frequencies. Recall the equation:

\[E = n \times p\]

We insert the expected frequencies as a new column in data.

n <- 256
data$expected_frequencies <- n * data$relative_frequency
data
##            relative_frequency observed_frequencies expected_frequencies
## Catholic                 0.48                   83               122.88
## Muslim                   0.02                   15                 5.12
## Orthodox                 0.08                   16                20.48
## Other                    0.30                   82                76.80
## Protestant               0.12                   60                30.72

Once we know the expected frequencies, we have to check for two assumptions. First, we have to make sure that all expected frequencies are 1 or greater. Second, at most 20 % of the expected frequencies should be less than 5. By looking at the table we may confirm that both assumptions are fulfilled.

Perfect, now we are done with the preparation! The data set is ready to be analyzed with the \(\chi^2\) goodness-of-fit test. Recall the question we are interested in: Is the religion equally distributed among students compared to the distribution of the religion among the population of the European Union?


Hypothesis Testing

In order to conduct the \(\chi^2\) goodness-of-fit test we follow the same step-wise procedure as for hypothesis tests for the population mean:

\[ \begin{array}{|l|l|} \hline \text{Step 1} & \text{State the null hypothesis } H_0 \text{ and alternative hypothesis } H_A\text{.}\\ \text{Step 2} & \text{Decide on the significance level, } \alpha\text{.} \\ \text{Step 3} & \text{Compute the value of the test statistic.} \\ \text{Step 4} & \text{Determine the p-value.} \\ \text{Step 5} & \text{If } p \le \alpha \text{, reject } H_0 \text{; otherwise, do not reject } H_0\text{.} \\ \text{Step 6} & \text{Interpret the result of the hypothesis test.} \\ \hline \end{array} \]

Step 1: State the null hypothesis \(H_0\) and alternative hypothesis \(H_A\)

The null hypothesis states that the distribution of religion among students matches the distribution of religion in the population of the European Union:

\[H_0: \quad \text{The variable has the specified distribution}\]

Alternative hypothesis:

\[H_A: \quad \text{The variable does not have the specified distribution} \]


Step 2: Decide on the significance level, \(\alpha\)

\[\alpha = 0.01\]

alpha <- 0.01

Step 3 and 4: Compute the value of the test statistic and the p-value

For illustration purposes we manually compute the test statistic in R. Recall the equation for the test statistic from above:

\[\chi^2=\sum \frac{(O-E)^2}{E}\]

# compute the value of the test statistic
x2 <- sum((data$observed_frequencies - data$expected_frequencies)^2 / data$expected_frequencies)
x2
## [1] 61.24772

The numerical value of the test statistic is \(\approx 61.25\).

In order to calculate the p-value we apply the pchisq() function. Recall how to calculate the degrees of freedom:

\[df = (c - 1)\]

# compute df
df <- nrow(data) - 1

# compute the p-value
p <- pchisq(q = x2, df = df, lower.tail = FALSE)
p
## [1] 1.585774e-12

\(p \approx 1.586\times 10^{-12}\).


Step 5: If \(p \le \alpha\), reject \(H_0\); otherwise, do not reject \(H_0\)

p <= alpha
## [1] TRUE

The p-value is smaller than the specified significance level of 0.01; we reject \(H_0\). The test results are statistically significant at the 1 % level and provide very strong evidence against the null hypothesis.


Step 6: Interpret the result of the hypothesis test

At the 1 % significance level the data provides very strong evidence to conclude, that the religion distribution among students differs from the religion distribution of the population of the European Union.


Hypothesis Testing in R

We just manually completed a \(\chi^2\) goodness-of-fit test in R. Very cool, but now we redo that example and make use of the R machinery to obtain the same result as above in just one line of code!

In order to conduct a \(\chi^2\) goodness-of-fit test in R we apply the chisq.test() function. We provide the observed frequencies (data$observed_frequencies) as input data and pass the hypothesized probabilities (data$relative_frequency) via the argument p.

chisq.test(data$observed_frequencies, p = data$relative_frequency)
## 
##  Chi-squared test for given probabilities
## 
## data:  data$observed_frequencies
## X-squared = 61.248, df = 4, p-value = 1.586e-12

Worked out fine! Compare the output of the chisq.test() function with our result above. Again, at the 1 % significance level the data provides very strong evidence that the religion distribution among students differs from the religion distribution of the population of the European Union.

 

Exercise: With his famous pea plant experiments the Augustinian monk Gregor Mendel discovered the inheritance laws of recessive and dominant traits in genes. His results show a 1:3 ratio of green to yellow peas from cross-bred seeds. Assume we repeated his experiment and got 123 green and 355 yellow pea plants. Does our observation confirm Mendel’s inheritance law? Perform a test at the 5 % significance level!

### your code here
chisq.test(c(123, 355), p = c(0.25, 0.75))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(123, 355)
## X-squared = 0.13668, df = 1, p-value = 0.7116

The p-value (\(\approx 0.71\)) is greater than the significance level of 0.05, so we cannot reject the null hypothesis: our observation is consistent with Mendel’s 1:3 ratio.

 


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.