The $\chi^{2}$ goodness-of-fit test is applied to perform hypothesis tests on the distribution of a qualitative (categorical) variable or a discrete quantitative variable that has only finitely many possible values.
The basic logic of the $\chi^{2}$ goodness-of-fit test is to compare two sets of frequencies for one variable: the frequencies observed in a sample and the frequencies we would expect under the hypothesized distribution.
Consider a simple example:
On September 22, 2013, the German Federal Election 2013 was held. More than 44 million people turned up to vote. 41.5 % of German voters decided to vote for the Christian Democratic Union (CDU) and 25.7 % for the Social Democratic Party (SPD). For simplicity, we subsume the remaining percentage of votes (32.8 %) as Others.
Based on that data, we may build a frequency table:
Party | Percentage | Relative frequency |
---|---|---|
CDU | 41.5 | 0.415 |
SPD | 25.7 | 0.257 |
Others | 32.8 | 0.328 |
$\sum$ | 100 | 1 |
The third column of the table above corresponds to the relative frequencies of the German population/voters. For this exercise, we take a random sample. We asked 123 students of FU Berlin about their party affiliation and recorded their answers. Afterwards, we counted the occurrence of each category (party) in our sample. These quantities are the observed frequencies. The actual corresponding counts are:
Party | Observed sample frequencies |
---|---|
CDU | 43 |
SPD | 36 |
Others | 44 |
$\sum$ | 123 |
In the next step we compute the expected frequency, denoted $E$, for each category:
$$E = n \times p \text{,}$$where $n$ is the sample size and $p$ is the corresponding relative population frequency taken from the election results given in the table above. Applying this information, we expect the following absolute frequencies per party:
$$E_{CDU} = n \times p = 123 \times 0.415 = 51.045$$
$$E_{SPD} = n \times p = 123 \times 0.257 = 31.611$$
$$E_{Others} = n \times p = 123 \times 0.328 = 40.344$$
Note: Although we deal with individual counts, represented by integer values, the expected frequency, $E$, is in general not an integer. That is fine.
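If you want to reproduce these numbers in Python, here is a minimal sketch (using numpy for the vectorized multiplication; the shares are taken from the election table above):
import numpy as np
n = 123                                # sample size
p = np.array([0.415, 0.257, 0.328])    # population shares: CDU, SPD, Others
E = n * p                              # expected frequencies E = n * p
print(E)
[51.045 31.611 40.344]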
Now, we put the observed frequencies and the expected frequencies together into one table:
Party | Observed sample frequencies | Expected sample frequencies |
---|---|---|
CDU | 43 | 51.045 |
SPD | 36 | 31.611 |
Others | 44 | 40.344 |
$\sum$ | 123 | 123 |
Great! Once we have the expected frequencies, we have to check two assumptions: all expected frequencies should be 1 or greater, and at most 20 % of the expected frequencies should be less than 5.
By looking at the table, we may confirm that both assumptions are fulfilled, as the smallest expected frequency is 31.611.
Now we have all the ingredients we need to perform a $\chi^{2}$ goodness-of-fit test, except the test statistic.
The $\chi^{2}$ test statistic for a goodness-of-fit is given by:
$$\chi^{2} = \sum \frac {(O - E)^{2}} {E}$$where $O$ corresponds to the observed frequencies and $E$ to the expected frequencies. If the null hypothesis is true, the test statistic $\chi^{2}$ approximately follows a chi-square distribution.
The number of degrees of freedom is one less than the number of possible values (categories), $c$, of the variable under consideration. Hence:
$$df = c - 1$$
In our example, $df = 3 - 1 = 2$. Based on the observed and expected frequencies given in the table above it is fairly straightforward to calculate the $\chi^{2}$-value. However, to make the calculation procedure easier, we put all the necessary computational steps into one table. The observed sample frequencies are abbreviated as $O$ and the expected sample frequencies as $E$:
Party | $$O$$ | $$E$$ | $$O - E$$ | $$(O - E)^{2}$$ | $$\frac {(O - E)^{2}} {E}$$ |
---|---|---|---|---|---|
CDU | 43 | 51.045 | -8.045 | 64.722 | 1.268 |
SPD | 36 | 31.611 | 4.389 | 19.263 | 0.609 |
Others | 44 | 40.344 | 3.656 | 13.366 | 0.331 |
$\sum$ | 123 | 123 | 0 | - | 2.209 |
Thus, the $\chi^{2}$ test statistic for the goodness-of-fit test evaluates to approximately 2.2086 for our sample data.
$$\chi^{2} = \sum \frac {(O - E)^{2}} {E} \approx 2.209$$
The observed and expected frequencies are roughly equal if the null hypothesis is true. This results in a small value of the $\chi^{2}$ test statistic, thus supporting $H_{0}$. If, however, the value of the $\chi^{2}$ test statistic is large, the data provide evidence against $H_{0}$.
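As a quick cross-check, a minimal sketch that lets Python do the summation with the observed and expected frequencies from the table above:
import numpy as np
O = np.array([43, 36, 44])               # observed frequencies
E = np.array([51.045, 31.611, 40.344])   # expected frequencies
chi_squared = np.sum((O - E) ** 2 / E)
print(round(chi_squared, 4))
2.2086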
In our case, we may compare the empirical $\chi^{2}$ test statistic with the corresponding critical $\chi^{2}$ value for a significance level of 5 % (i.e., the 0.95 quantile), with $df = 3 - 1 = 2$ degrees of freedom. To derive the critical $\chi^{2}$ value with Python, we apply the chi2.ppf function from the stats module of the scipy package:
Note: Make sure the scipy package is part of your mamba environment!
from scipy.stats import chi2
chi2.ppf(0.95, df = 2)
5.991464547107979
Since our empirical $\chi^{2}$ value ($\approx 2.209$) is smaller than the critical $\chi^{2}$ value ($\approx 5.991$), we cannot reject the null hypothesis!
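Equivalently, we could base the decision on the p-value instead of the critical value; a small sketch under the same numbers:
from scipy.stats import chi2
p = 1 - chi2.cdf(2.2086, df = 2)   # probability of a test statistic at least this large
print(round(p, 4))
0.3314
Since $0.3314 > 0.05$, we arrive at the same conclusion and cannot reject $H_{0}$.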
In order to get some hands-on experience, we apply the $\chi^{2}$ goodness-of-fit test in an exercise. For this, we load the students data set. You may download the students.csv file here and import it from your local file system, or you may load it directly as a web resource. In either case, you import the data set into Python as a pandas DataFrame object by using the read_csv method:
Note: Make sure the numpy and pandas packages are part of your mamba environment!
import pandas as pd
import numpy as np
students = pd.read_csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")
The students data set consists of 8239 rows, each representing a particular student, and 16 columns, each corresponding to a variable/feature related to that student. The variable names are largely self-explanatory.
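As an optional sanity check, you may verify these dimensions yourself via the DataFrame's shape attribute:
print(students.shape)
(8239, 16)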
Recall that $\chi^{2}$ goodness-of-fit tests are applied to qualitative (categorical) or discrete quantitative variables. There are several categorical variables in the students data set, such as gender, religion, major, minor and graduated.
In order to showcase the $\chi^{2}$ goodness-of-fit test, we examine whether the distribution of religions among students matches the distribution of religions among the population of the European Union. The data on the continental scale is provided in the report "Discrimination in the EU in 2012" (European Union: European Commission, Special Eurobarometer, 393, p. 233).
The report provides data for eight categories of how people ascribed themselves: Catholic (48 %), Non believer/Agnostic (16 %), Protestant (12 %), Orthodox (8 %), Atheist (7 %), Other Christian (4 %), Other religion/None stated (3 %) and Muslim (2 %).
We plot the data in the form of a pie chart for a better understanding:
Note: Make sure the matplotlib and seaborn packages are part of your mamba environment!
import seaborn as sns
data = [48, 16, 12, 8, 7, 4, 3, 2]
religions = ["Catholic", "Non believer/\nAgnostic", "Protestant",
"Orthodox", "Atheist", "Other Christian",
"Other religion/None stated", "Muslim"]
data = pd.Series(data, index = religions)
data.plot.pie(colors = sns.color_palette("Set3", 8))
<Axes: >
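One remark on the plotting call above: in a Jupyter notebook the chart is rendered automatically, but when running the code as a plain Python script you need to render the figure explicitly:
import matplotlib.pyplot as plt
plt.show()   # render the pie chart when running outside a notebook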
We start with data exploration and data preparation.
First, we want to know which categories are available in the students data set for the column religion. Therefore, we apply the unique() method, which provides access to the levels (categories) of a variable:
print(students["religion"].unique())
['Muslim' 'Other' 'Protestant' 'Catholic' 'Orthodox']
Obviously, in the students data set there are 5 different categories, compared to the 8 categories provided by the EU report. Thus, in order to make comparisons, we aggregate the 8 categories of the EU report into these 5 categories:
"Catholic"
"Muslim"
"Orthodox"
"Protestant"
"Other"
Be careful not to mix up categories during that step!
data_raw = [48, 2, 8, (16 + 7 + 4 + 3), 12]
religions = ["Catholic", "Muslim", "Orthodox", "Other", "Protestant"]
data = pd.Series(data_raw, index = religions, name = "relative_frequency") / 100
data.to_frame()
 | relative_frequency |
---|---|
Catholic | 0.48 |
Muslim | 0.02 |
Orthodox | 0.08 |
Other | 0.30 |
Protestant | 0.12 |
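Before moving on, it is good practice to verify that the aggregated relative frequencies still sum to one:
print(round(data.sum(), 10))
1.0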
Now, we take a random sample from the students data set. The sample size is $n = 256$. Afterwards, we count the number of students in each particular religion category using the groupby() method.
Recall that this quantity corresponds to the observed frequencies.
n = 256
sample = students.sample(n, random_state = 8).groupby(["religion"])
sample.size().to_frame("Observed Frequencies")
religion | Observed Frequencies |
---|---|
Catholic | 80 |
Muslim | 12 |
Orthodox | 21 |
Other | 104 |
Protestant | 39 |
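As a side note, not part of the workflow above: pandas offers an equivalent one-liner via the value_counts() method, which yields the same five counts as the table above:
sample_counts = students.sample(n, random_state = 8)["religion"].value_counts().sort_index()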
Let's combine both pieces of information into a nice-looking table by converting them to a pandas DataFrame object:
df = pd.DataFrame({'relative frequencies' : data,
'observed frequencies' : sample.size()})
df
 | relative frequencies | observed frequencies |
---|---|---|
Catholic | 0.48 | 80 |
Muslim | 0.02 | 12 |
Orthodox | 0.08 | 21 |
Other | 0.30 | 104 |
Protestant | 0.12 | 39 |
In the next step we calculate the expected frequencies and add them as a separate column to our existing dataframe df. Recall the equation:
$$E = n \times p$$
df["expected frequencies"] = df["relative frequencies"] * 256
df
 | relative frequencies | observed frequencies | expected frequencies |
---|---|---|---|
Catholic | 0.48 | 80 | 122.88 |
Muslim | 0.02 | 12 | 5.12 |
Orthodox | 0.08 | 21 | 20.48 |
Other | 0.30 | 104 | 76.80 |
Protestant | 0.12 | 39 | 30.72 |
Once we know the expected frequencies, we must again check the two assumptions: all expected frequencies should be 1 or greater, and at most 20 % of the expected frequencies should be less than 5.
By looking at the table, we may confirm that both assumptions are fulfilled, as the smallest expected frequency is 5.12.
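If you prefer to check the two assumptions programmatically rather than by eye, a small sketch:
E = df["expected frequencies"]
print((E >= 1).all())          # assumption 1: every expected frequency is at least 1
print((E < 5).mean() <= 0.2)   # assumption 2: at most 20 % of them are below 5
True
True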
Perfect, now we are done with the preparation! The data set can be analyzed with the $\chi^{2}$ goodness-of-fit test. Recall the question we are interested in: Is religion distributed among students in the same way as among the population of the European Union?
In order to conduct the $\chi^{2}$ goodness-of-fit test, we follow the generalized step-wise scheme for hypothesis testing:
Step 1: State the null hypothesis $H_{0}$ and alternative hypothesis $H_{A}$
The null hypothesis states that religion is distributed among students in the same way as among the population of the European Union:
$$H_{0}: \quad \text {The variable has the specified distribution}$$
Alternative hypothesis:
$$H_{A}: \quad \text {The variable does not have the specified distribution}$$
Step 2: Decide on the significance level, $\alpha$
$$\alpha = 0.01$$
alpha = 0.01
Step 3 and 4: Compute the value of the test statistic and the p-value
For illustration purposes, we first compute the test statistic manually with Python. Recall the equation for the test statistic from above:
$$\chi^{2} = \sum \frac {(O - E)^{2}} {E}$$
O_E = (df["observed frequencies"] - df["expected frequencies"]) ** 2
chi_squared = np.sum(O_E / df["expected frequencies"])
chi_squared
36.086588541666664
The numerical value of the test statistic is $\approx 36.0866$.
In order to calculate the p-value, we apply the chi2.cdf function from the stats module of the scipy package to calculate the probability of obtaining a test statistic at least as large as the observed one under the $\chi^{2}$ distribution. To do so, we also need the degrees of freedom. Recall how to calculate them: $df = c - 1 = 5 - 1 = 4$.
from scipy.stats import chi2
p = 1 - chi2.cdf(chi_squared, df = df.shape[0] - 1)
p
2.7774032629324097e-07
$p = 2.77740326 \times 10^{-7}$.
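A brief aside: for very small tail probabilities, scipy's survival function chi2.sf, which computes $1 - \text{CDF}$ directly, is numerically more reliable than the subtraction above:
p = chi2.sf(chi_squared, df = df.shape[0] - 1)   # equivalent to 1 - chi2.cdf(...)
p
2.777403262517103e-07
This matches the p-value that scipy's chisquare() function reports below.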
Step 5: If $p \le \alpha$, reject $H_{0}$; otherwise, do not reject $H_{0}$
# reject H0?
p < alpha
True
The p-value is smaller than the specified significance level of 0.01; we reject $H_{0}$. The test results are statistically significant at the 1 % level and provide very strong evidence against the null hypothesis.
Step 6: Interpret the result of the hypothesis test
At the 1 % significance level the data provides very strong evidence to conclude that the religion distribution among students differs from the religion distribution of the population of the European Union.
The $\chi^{2}$ goodness-of-fit test with scipy
We manually completed a $\chi^{2}$ goodness-of-fit test in Python. Very cool, but now we redo that example and use the power of Python's package universe, namely the scipy package, to obtain the same result as above in just one line of code!
In order to conduct a $\chi^{2}$ goodness-of-fit test in Python with the stats module from the scipy package, we apply the chisquare() function. We only have to provide the observed and the expected frequencies, which we pass as column selections from our dataframe. Additional information regarding the function's usage can be found directly in the documentation of scipy.
from scipy import stats
test_result = stats.chisquare(df["observed frequencies"], df["expected frequencies"])
test_result
Power_divergenceResult(statistic=36.086588541666664, pvalue=2.777403262517103e-07)
The chisquare() function returns an object which provides the test statistic as well as the corresponding p-value of the test result. Those values can be retrieved via the following attributes:
<object>.statistic holds the test statistic and represents the empirical test value.
<object>.pvalue represents the p-value of the performed significance test.
Consequently, the test statistic ($\chi^{2}_{emp}$) is retrieved via:
test_result.statistic
36.086588541666664
The p-value is retrieved via:
test_result.pvalue
2.777403262517103e-07
Lastly, we want to provide a nicely printed output of the test results:
print("Teststatistic = {}".format(round(test_result.statistic, 5)))
print("p-value = {}".format(round(test_result.pvalue, 7)))
Teststatistic = 36.08659 p-value = 3e-07
These values match the manually calculated $\chi^{2}_{emp}$ value and p-value perfectly. Again, at the 1 % significance level, the data provide very strong evidence to conclude that the religion distribution among students differs from the religion distribution of the population of the European Union.
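For completeness, a short sketch showing that the critical-value approach from the first example leads to the same decision here:
from scipy.stats import chi2
crit = chi2.ppf(1 - alpha, df = 4)    # critical value at the 1 % significance level
print(round(crit, 4))
print(test_result.statistic > crit)   # True -> reject H0
13.2767
True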
Exercise: With his famous pea plant experiments, Augustinian monk Gregor Mendel discovered the inheritance law of recessive and dominant traits in genes. His results show a 1:3 ratio of green to yellow peas from cross-bred seeds. Assume we repeated his experiment and got 123 green and 355 yellow pea plants. Does our observation confirm Mendel's inheritance law? Perform a test at the 5 % significance level!
observed = [123, 355]
expected = [np.sum(observed) * 0.25, np.sum(observed) * 0.75 ]
test_result = stats.chisquare(observed, expected)
print("Chi_squared = {}".format(round(test_result.statistic, 5)))
print("p-value = {}".format(round(test_result.pvalue, 5)))
print("Because the p value ({}) is less than alpha (0.05) we do not have any evidence to reject H0.".
format(round(test_result.pvalue, 3)))
Chi_squared = 0.13668 p-value = 0.7116 Because the p value (0.712) is less than alpha (0.05) we do not have any evidence to reject H0.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.