Inferences for one population standard deviation are based on the *chi-square* (\(\chi^2\)) distribution. A \(\chi^2\)-distribution is a right-skewed probability density curve. The shape of the \(\chi^2\)-curve is determined by its degrees of freedom \((df)\).
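One way to see how the degrees of freedom shape the curve is to plot a few densities with R's `dchisq()`. The minimal sketch below uses df = 2, 5, 10 and 20, chosen purely for illustration. It also evaluates the known skewness of a \(\chi^2\)-distribution, \(\sqrt{8/df}\), which shows that the right skew weakens as \(df\) grows.

```r
# Degrees of freedom chosen only for illustration
df.values <- c(2, 5, 10, 20)
cols <- c("black", "red", "blue", "darkgreen")

# Draw the first density curve, then overlay the others
curve(dchisq(x, df = df.values[1]), from = 0, to = 40, n = 500,
      col = cols[1], lwd = 2, ylim = c(0, 0.5),
      xlab = expression(chi^2), ylab = "Density")
for (i in 2:length(df.values)) {
  curve(dchisq(x, df = df.values[i]), from = 0, to = 40, n = 500,
        add = TRUE, col = cols[i], lwd = 2)
}
legend("topright", legend = paste("df =", df.values), col = cols, lwd = 2)

# Skewness of a chi^2-distribution is sqrt(8 / df): it shrinks as df grows
skew <- sqrt(8 / df.values)
round(skew, 2)  # 2.00 1.26 0.89 0.63
```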

To perform a hypothesis test for one population standard deviation, we relate a \(\chi^2\)-value to a specified area under a \(\chi^2\)-curve. We either look up that value in a \(\chi^2\)-table or compute it in R.

Given \(\alpha\), where \(\alpha\) corresponds to a probability between 0 and 1, \(\chi^2_{\alpha}\) denotes the \(\chi^2\)-value having the area \(\alpha\) to its right under a \(\chi^2\)-curve.
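In R, the table lookup is replaced by `qchisq()` (area to value) and `pchisq()` (value to area). Setting `lower.tail = FALSE` makes both functions work with the area to the *right*, which matches the \(\chi^2_{\alpha}\) notation. The values \(\alpha = 0.05\) and \(df = 29\) below are only an example.

```r
alpha <- 0.05   # area to the right
dof <- 29       # degrees of freedom, e.g. for a sample of n = 30

# chi^2_alpha: the chi^2-value with area alpha to its right
chi2.alpha <- qchisq(alpha, df = dof, lower.tail = FALSE)
round(chi2.alpha, 3)  # 42.557

# chi^2_{1-alpha}: area 1 - alpha to its right (alpha to its left)
chi2.one.minus.alpha <- qchisq(1 - alpha, df = dof, lower.tail = FALSE)
round(chi2.one.minus.alpha, 3)  # 17.708

# Reverse direction: the area to the right of a given chi^2-value
pchisq(chi2.alpha, df = dof, lower.tail = FALSE)  # 0.05
```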

The \(100(1-\alpha)\)% confidence interval for \(\sigma\) is

\[s\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}} \le \sigma \le s\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}}\] where \(n\) is the sample size, \(s\) is the standard deviation of the sample data, and the \(\chi^2\)-values refer to a \(\chi^2\)-curve with \(n-1\) degrees of freedom.
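The interval can be computed directly from the formula. The function name `sd.ci()` and the simulated sample below are hypothetical and serve only as a sketch; the formula assumes that the data come from a normally distributed population.

```r
# Confidence interval for sigma based on the chi^2-distribution
# x: numeric sample (assumed normal), conf.level: 1 - alpha
sd.ci <- function(x, conf.level = 0.95) {
  n <- length(x)
  s <- sd(x)
  alpha <- 1 - conf.level
  # chi^2_{alpha/2} (the larger value) and chi^2_{1-alpha/2} (the smaller value)
  chi2.a2 <- qchisq(alpha / 2, df = n - 1, lower.tail = FALSE)
  chi2.1a2 <- qchisq(1 - alpha / 2, df = n - 1, lower.tail = FALSE)
  c(lower = s * sqrt((n - 1) / chi2.a2),
    upper = s * sqrt((n - 1) / chi2.1a2))
}

# Usage with hypothetical heights (in cm) simulated from a normal distribution
set.seed(1)
x <- rnorm(30, mean = 165, sd = 8)
sd.ci(x, conf.level = 0.95)
```

Because \(\chi^2_{\alpha/2} > \chi^2_{1-\alpha/2}\), the larger critical value produces the lower bound of the interval.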

The hypothesis testing procedure for one standard deviation is called the **one standard deviation \(\chi^2\)-test**. Hypothesis testing for variances follows the same step-wise procedure as hypothesis tests for the mean.

\[
\begin{array}{ll}
\hline
\ \text{Step 1} & \text{State the null hypothesis } H_0 \text{ and alternative hypothesis } H_A \text{.}\\
\ \text{Step 2} & \text{Decide on the significance level, } \alpha\text{.} \\
\ \text{Step 3} & \text{Compute the value of the test statistic.} \\
\ \text{Step 4} & \text{Determine the p-value.} \\
\ \text{Step 5} & \text{If } p \le \alpha \text{, reject }H_0 \text{; otherwise, do not reject } H_0 \text{.} \\
\ \text{Step 6} & \text{Interpret the result of the hypothesis test.} \\
\hline
\end{array}
\]

The test statistic for a hypothesis test with the null hypothesis \(H_0: \,\sigma = \sigma_0\) for a normally distributed variable is given by

\[\chi^2 = \frac{n-1}{\sigma^2_0}s^2 \]

Under \(H_0\), this test statistic follows a \(\chi^2\)-distribution with \(n - 1\) degrees of freedom.
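Steps 3 to 5 translate into a few lines of R. The function `sd.chisq.test()` below is a hypothetical helper written for this sketch, not a function from base R. For the two-sided case it doubles the smaller tail area, which is one common convention for the asymmetric \(\chi^2\)-distribution.

```r
# One standard deviation chi^2-test (sketch)
# x: numeric sample (assumed normal), sigma0: hypothesised standard deviation
# alternative: "less" (H_A: sigma < sigma0), "greater" or "two.sided"
sd.chisq.test <- function(x, sigma0,
                          alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  n <- length(x)
  s <- sd(x)
  # Step 3: test statistic chi^2 = (n - 1) s^2 / sigma0^2
  stat <- (n - 1) * s^2 / sigma0^2
  # Step 4: tail areas under the chi^2-curve with n - 1 degrees of freedom
  p.left <- pchisq(stat, df = n - 1)
  p.right <- pchisq(stat, df = n - 1, lower.tail = FALSE)
  p.value <- switch(alternative,
                    less = p.left,
                    greater = p.right,
                    two.sided = min(1, 2 * min(p.left, p.right)))
  list(statistic = stat, df = n - 1, p.value = p.value)
}

# Usage with hypothetical simulated data (true sigma = 5, tested against 10)
set.seed(123)
x <- rnorm(30, mean = 0, sd = 5)
res <- sd.chisq.test(x, sigma0 = 10, alternative = "less")
res$p.value <= 0.05  # Step 5: reject H0 at alpha = 0.05?
```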

Be aware that the one standard deviation \(\chi^2\)-test is not robust against violations of the normality assumption (Weiss 2010).
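A small Monte Carlo simulation illustrates this lack of robustness. The sketch below assumes \(n = 30\), \(\alpha = 0.05\), 10000 replications and a two-sided test of \(\sigma_0 = 1\). It compares normal data with skewed exponential data, both with a true standard deviation of 1, so \(H_0\) is true in both cases. For normal data the rejection rate should stay close to \(\alpha\); for the exponential data it is expected to be clearly larger.

```r
set.seed(42)
n <- 30
alpha <- 0.05
n.sim <- 10000
sigma0 <- 1

# Fraction of simulated samples for which the two-sided test rejects H0
reject.rate <- function(draw) {
  rejections <- replicate(n.sim, {
    x <- draw(n)
    stat <- (n - 1) * var(x) / sigma0^2
    p.left <- pchisq(stat, df = n - 1)
    p <- 2 * min(p.left, 1 - p.left)   # two-sided p-value
    p <= alpha
  })
  mean(rejections)
}

rate.normal <- reject.rate(function(m) rnorm(m, mean = 0, sd = 1))
rate.exp <- reject.rate(function(m) rexp(m, rate = 1))  # sd is also 1
c(normal = rate.normal, exponential = rate.exp)
```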

To get some hands-on experience, we apply the **one standard deviation \(\chi^2\)-test** in an exercise. For this purpose we load the *students* data set. You may download the `students.csv` file from the URL given in the code below. Import the data set and assign a proper name to it.

```
students <- read.csv("https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv")
```

The *students* data set consists of 8239 rows, each of them representing a particular student, and 16 columns, each of them corresponding to a variable/feature related to that particular student. These self-explanatory variables are: *stud.id, name, gender, age, height, weight, religion, nc.score, semester, major, minor, score1, score2, online.tutorial, graduated, salary*.

To showcase the **one standard deviation \(\chi^2\)-test**, we examine the spread of the height (in cm) of female students and compare it to the spread of the height of all students (our population). **We want to test whether the standard deviation of the height of female students is less than the standard deviation of the height of all students**.
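Written in the notation of Step 1, this question corresponds to a left-tailed pair of hypotheses (our formalization of the sentence above), where \(\sigma_0\) is the standard deviation of the height of all students and \(\sigma\) that of female students:

\[
H_0: \sigma = \sigma_0 \qquad \text{versus} \qquad H_A: \sigma < \sigma_0
\]

Because small values of the test statistic \(\chi^2\) speak against \(H_0\) here, the p-value is the area under the \(\chi^2\)-curve with \(n-1\) degrees of freedom to the left of the observed \(\chi^2\)-value.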

We start with data preparation.

- First, we define the standard deviation of the population. In our example the population corresponds to the height of all 8239 students in the data set. We calculate the standard deviation of the `height` variable and assign it to the variable `sigma0`.
- Second, we subset the data set based on the variable `gender`.
- Third, we sample 30 female students and extract the statistic of interest, the standard deviation of the height of female students in our sample.

```
sigma0 <- sd(students$height)
sigma0
```

```
## [1] 11.07753
```

The standard deviation of the population of interest (\(\sigma_0\)) is \(\approx\) 11.08 cm.

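```
# Keep only the female students
female <- subset(students, gender == 'Female')
# Sample size
n <- 30
# Randomly draw 30 heights (no seed is set, so results vary between runs)
female.sample <- sample(female$height, n)
# Sample standard deviation s
sample.sd <- sd(female.sample)
```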

Further, we check the normality assumption with a Q-Q plot. In R we apply the `qqnorm()` and the `qqline()` functions for plotting Q-Q plots.

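```
# Set plot margins (bottom, left, top, right) in lines of text
par(mar = c(5, 5, 4, 2))
# Sample data: sampled heights against theoretical normal quantiles
qqnorm(female.sample, main = 'Q-Q plot for height of\n sampled female students', cex.main = 0.9)
# Reference line through the first and third quartiles
qqline(female.sample, col = 3, lwd = 2)
```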
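To see what the two plotting functions compute, the sketch below reproduces their coordinates by hand. It uses simulated, hypothetical heights instead of the *students* data so that it runs on its own. `qqnorm()` pairs each observation with a standard normal quantile at the plotting positions `ppoints(n)`, and `qqline()` by default draws the straight line through the first and third quartiles of the data and of the normal distribution.

```r
set.seed(7)
y <- rnorm(30, mean = 165, sd = 7)  # hypothetical heights in cm

# qqnorm(): theoretical normal quantiles at the plotting positions ppoints(n)
qq <- qqnorm(y, plot.it = FALSE)
theo <- qnorm(ppoints(length(y)))
all.equal(sort(qq$x), theo)  # TRUE: same quantiles, matched to the data's ranks
identical(qq$y, y)           # TRUE: the data are returned unchanged

# qqline(): line through the first and third quartiles
y.q <- quantile(y, c(0.25, 0.75), names = FALSE)
x.q <- qnorm(c(0.25, 0.75))
slope <- diff(y.q) / diff(x.q)
intercept <- y.q[1] - slope * x.q[1]
c(intercept = intercept, slope = slope)
```

For normally distributed data the slope is roughly the standard deviation and the intercept roughly the mean, which is why points close to this line support the normality assumption.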