Inferences for one population standard deviation are based on the *chi-square* (\(\chi^2\)) distribution, whose probability density curve is right-skewed. The shape of the \(\chi^2\)-curve is determined by its degrees
of freedom \((df)\).

In order to perform a hypothesis test for one population standard deviation, we relate a \(\chi^2\)-value to a specified area under a \(\chi^2\)-curve. Either we consult a \(\chi^2\)-table to look up that value or we make use of the R machinery.

Given \(\alpha\), where \(\alpha\) corresponds to a probability between 0 and 1, \(\chi^2_{\alpha}\) denotes the \(\chi^2\)-value having the area \(\alpha\) to its right under a \(\chi^2\)-curve.
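In R, such a value can be obtained with the quantile function `qchisq()`. The following is a minimal sketch; the values chosen for `alpha` and `df` are illustrative assumptions and are not taken from the example further below.

```r
# chi-square value with area alpha to its RIGHT under the curve
alpha <- 0.05   # illustrative right-tail area
df <- 29        # illustrative degrees of freedom

# lower.tail = FALSE tells qchisq() that alpha is a right-tail area
chi2_alpha <- qchisq(alpha, df = df, lower.tail = FALSE)
chi2_alpha

# sanity check: the area to the right of chi2_alpha is alpha again
pchisq(chi2_alpha, df = df, lower.tail = FALSE)
```

Note that `qchisq()` works with the left-tail area by default, so `lower.tail = FALSE` is needed to match the right-tail convention used here.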

The \(100(1-\alpha)\) % confidence interval for \(\sigma\) is

\[\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}} \cdot s \le \sigma \le \sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}} \cdot s \text{,}\]

where \(n\) is the sample size and \(s\) the standard deviation of the sample data.
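The interval can be computed in R with `qchisq()`, scaling the sample standard deviation \(s\) by the two \(\chi^2\)-based factors. The helper `sigma_ci()` and the simulated data below are assumptions made for illustration only; the sketch presumes normally distributed data.

```r
# Sketch: confidence interval for sigma, assuming normally distributed data.
# sigma_ci() is a hypothetical helper, not part of base R.
sigma_ci <- function(x, conf.level = 0.95) {
  n <- length(x)
  s <- sd(x)
  alpha <- 1 - conf.level
  # chi-square values with area alpha/2 and 1 - alpha/2 to their right
  chi2_a <- qchisq(alpha / 2, df = n - 1, lower.tail = FALSE)
  chi2_b <- qchisq(1 - alpha / 2, df = n - 1, lower.tail = FALSE)
  c(lower = sqrt((n - 1) / chi2_a) * s,
    upper = sqrt((n - 1) / chi2_b) * s)
}

set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(30, mean = 170, sd = 10)  # simulated stand-in data
ci <- sigma_ci(x)
ci
```

Because the \(\chi^2\)-distribution is skewed, the interval is not symmetric around \(s\).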

The hypothesis testing procedure for one standard deviation is called the
**one standard deviation \(\chi^2\)-test**. Hypothesis testing
for one standard deviation follows the same step-wise procedure as hypothesis tests
for the mean:

\[ \begin{array}{ll} \hline \text{Step 1} & \text{State the null hypothesis } H_0 \text{ and alternative hypothesis } H_A \text{.} \\ \text{Step 2} & \text{Decide on the significance level, } \alpha\text{.} \\ \text{Step 3} & \text{Compute the value of the test statistic.} \\ \text{Step 4} & \text{Determine the p-value.} \\ \text{Step 5} & \text{If } p \le \alpha \text{, reject } H_0 \text{; otherwise, do not reject } H_0 \text{.} \\ \text{Step 6} & \text{Interpret the result of the hypothesis test.} \\ \hline \end{array} \]

The test statistic for a hypothesis test with the null hypothesis \(H_0: \,\sigma = \sigma_0\) for a normally distributed variable is given by

\[\chi^2 = \frac{n-1}{\sigma^2_0}s^2 \text{.}\]

If \(H_0\) is true, the test statistic follows a \(\chi^2\)-distribution with \(n - 1\) degrees of freedom.
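This distributional claim can be checked with a small simulation. The sketch below draws many normal samples for which the null hypothesis holds and compares the mean and variance of the simulated test statistics with those of a \(\chi^2\)-distribution (\(df\) and \(2 \cdot df\), respectively). The seed, the number of repetitions and the population parameters are arbitrary assumptions.

```r
set.seed(42)      # arbitrary seed, for reproducibility only
n <- 30           # sample size
sigma0 <- 10      # hypothesized (and here also true) standard deviation
n_sim <- 10000    # number of simulated samples

# test statistic for many samples drawn under H0 (sigma = sigma0)
x2_sim <- replicate(n_sim, {
  x <- rnorm(n, mean = 170, sd = sigma0)
  (n - 1) / sigma0^2 * var(x)
})

# a chi-square variable has mean df and variance 2 * df
mean(x2_sim)   # should be close to n - 1 = 29
var(x2_sim)    # should be close to 2 * (n - 1) = 58
```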

Be aware that the one standard deviation \(\chi^2\)-test is **not
robust** against violations of the normality assumption (Weiss, 2010).
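To get a feeling for what this lack of robustness means in practice, the following simulation sketch estimates how often a two-sided test at \(\alpha = 0.05\) rejects a true null hypothesis, once for normal data and once for strongly skewed (exponential) data. The helper `p_two_sided()`, the choice of the exponential distribution and all numeric settings are assumptions made for illustration.

```r
set.seed(123)   # arbitrary seed, for reproducibility only
n <- 30
alpha <- 0.05
n_sim <- 5000

# two-sided p-value of the chi-square test for H0: sigma = sigma0
# (hypothetical helper, not part of base R)
p_two_sided <- function(x, sigma0) {
  df <- length(x) - 1
  x2 <- df / sigma0^2 * var(x)
  2 * min(pchisq(x2, df = df), pchisq(x2, df = df, lower.tail = FALSE))
}

# H0 is true in both settings: the true standard deviation is 1
reject_normal <- mean(replicate(n_sim, p_two_sided(rnorm(n, sd = 1), 1) <= alpha))
reject_skewed <- mean(replicate(n_sim, p_two_sided(rexp(n, rate = 1), 1) <= alpha))

c(normal = reject_normal, exponential = reject_skewed)
```

For normal data the rejection rate should stay near the nominal 5 %, whereas for the skewed data it is expected to be clearly higher, i.e. the test rejects a true \(H_0\) far too often.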

In order to get some hands-on experience we apply the **one
standard deviation \(\chi^2\)-test** in an exercise. For
this we load the *students* data set from the URL given below.

`students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")`

The *students* data set consists of 8239 rows, each of them
representing a particular student, and 16 columns, each of them
corresponding to a variable/feature related to that particular student.
These self-explanatory variables are: *stud.id, name, gender, age,
height, weight, religion, nc.score, semester, major, minor, score1,
score2, online.tutorial, graduated, salary*.

In order to showcase the **one standard deviation \(\chi^2\)-test** we examine the
spread of the height in cm of female students and compare it to the
spread of the height of all students (our population). **We want
to test whether the standard deviation of the height of female students is
significantly smaller than the standard deviation of the height of all
students**.

We start with data preparation.

- First, we define the standard deviation of the population. In our example the population corresponds to the height of all 8239 students in the data set. We calculate the standard deviation of the `height` variable and assign it to the variable `sigma0`.
- Second, we subset the data set based on the variable `gender`.
- Third, we sample 30 female students and extract the statistic of interest, the standard deviation of the height of female students in our sample.

```
sigma0 <- sd(students$height)
sigma0
```

`## [1] 11.07753`

The standard deviation of the population of interest (\(\sigma_0\)) is \(\approx\) 11.08 cm.

```
female <- subset(students, gender == "Female")
n <- 30
# random draw without replacement; no seed is set, so results vary between runs
female_sample <- sample(female$height, n)
sample_sd <- sd(female_sample)
```

Further, we check the normality assumption by plotting a Q-Q plot. In R we apply the `qqnorm()` and the `qqline()` functions for plotting Q-Q plots.

```
par(mar = c(5, 5, 4, 2))
# sample data
qqnorm(female_sample, main = "Q-Q plot for height of\n sampled female students", cex.main = 0.9)
qqline(female_sample, col = 3, lwd = 2)
```

As we can see, the data falls roughly onto a straight line. Based on the graphical evaluation approach we conclude that the variable of interest is roughly normally distributed.

In order to conduct the **one standard deviation \(\chi^2\)-test** we follow the
step-wise implementation procedure for hypothesis testing.

**Step 1: State the null hypothesis \(H_0\) and alternative hypothesis \(H_A\)**

The null hypothesis states that the standard deviation of the height of female students (\(\sigma\)) equals the standard deviation of the population (\(\sigma_0 \approx\) 11.08 cm):

\[H_0: \quad \sigma = \sigma_0\]

Alternative hypothesis:

\[H_A: \quad \sigma < \sigma_0 \]

This formulation results in a left-tailed hypothesis test.

**Step 2: Decide on the significance level, \(\alpha\)**

\[\alpha = 0.05\]

`alpha <- 0.05`

**Steps 3 and 4: Compute the value of the test statistic and the
p-value**

For illustration purposes we manually compute the test statistic in R. Recall the equation for the test statistic from above:

\[\chi^2 = \frac{n-1}{\sigma^2_0}s^2 \]

```
# compute the value of the test statistic
n <- length(female_sample)
s_2 <- var(female_sample)
sigma0_2 <- var(students$height)
x2 <- ((n - 1) / sigma0_2) * s_2
x2
```

`## [1] 12.80536`

The numerical value of the test statistic is 12.8053583.

In order to calculate the *p*-value we apply the `pchisq()` function. Recall how to calculate the degrees of
freedom:

\[ df=n-1\]

```
# compute df
df <- n - 1
# compute the p-value
p <- pchisq(x2, df = df, lower.tail = TRUE)
p
```

`## [1] 0.004049209`

\(p = 0.0040492\).

**Step 5: If \(p \le \alpha\),
reject \(H_0\); otherwise, do not
reject \(H_0\)**

`p <= alpha`

`## [1] TRUE`

The *p*-value is smaller than the specified significance level
of 0.05; we reject \(H_0\). The test
results are statistically significant at the 5 % level and provide very
strong evidence against the null hypothesis.
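The decision can be cross-checked with the critical value approach: for a left-tailed test, \(H_0\) is rejected if the test statistic falls below the \(\chi^2\)-value that has the area \(\alpha\) to its left. The sketch below hard-codes the test statistic and sample size reported above as an assumption; re-running the random sampling step would give different numbers.

```r
x2 <- 12.80536   # test statistic reported above (depends on the random sample)
n <- 30          # sample size
alpha <- 0.05    # significance level
df <- n - 1

# left-tailed test: critical value with area alpha to its LEFT
critical_value <- qchisq(alpha, df = df, lower.tail = TRUE)
critical_value

# reject H0 if the test statistic falls below the critical value
x2 < critical_value
```

Both approaches always lead to the same decision, because \(p \le \alpha\) holds exactly when the test statistic lies at or beyond the critical value.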

**Step 6: Interpret the result of the hypothesis
test**

At the 5 % significance level the data provides very strong evidence to conclude that the standard deviation of the height of female students is less than the standard deviation of the height of all students (\(\sigma_0 \approx\) 11.08 cm).

We just completed a one standard deviation \(\chi^2\)-test in R manually. To our
knowledge, base R does not provide a built-in function for the one
standard deviation \(\chi^2\)-test.
However, we may implement such a function ourselves. Our function `simple.x2.test()` takes as input a sample vector, `x`, the standard deviation of the population, `sigma0`, the significance level, `alpha`, and the specified method, `"right"`, `"left"` or `"two-sided"`.

```
simple.x2.test <- function(x, sigma0, alpha, method = "two-sided") {
  df <- length(x) - 1
  v <- var(x)
  # calculate test statistic
  testchi <- df / (sigma0^2) * v
  # tail areas below and above the test statistic
  p_lower <- pchisq(q = testchi, df = df, lower.tail = TRUE)
  p_upper <- pchisq(q = testchi, df = df, lower.tail = FALSE)
  if (method == "left") {
    # left-tailed test
    p <- p_lower
  } else if (method == "right") {
    # right-tailed test
    p <- p_upper
  } else {
    # two-sided test (default): twice the smaller tail area
    p <- 2 * min(p_lower, p_upper)
  }
  # decision rule from Step 5: reject H0 if p <= alpha
  reject <- p <= alpha
  # print out summary and evaluation
  print(paste("Significance level:", alpha))
  print(paste("Degrees of freedom:", df))
  print(paste("Test statistic:", round(testchi, 4)))
  print(paste("p-value:", p))
  print(paste("Reject H0:", reject))
}
```

Let us apply our self-built function `simple.x2.test()` to
the example data from above.

```
sigma0 <- sd(students$height)
simple.x2.test(female_sample, sigma0, alpha = 0.05, "left")
```

```
## [1] "Significance level: 0.05"
## [1] "Degrees of freedom: 29"
## [1] "Test statistic: 12.8054"
## [1] "p-value: 0.00404920914882534"
## [1] "Reject H0: TRUE"
```

Perfect! Compare the output of the `simple.x2.test()` function to our result from above. Again, at the 5
% significance level the data provides very strong evidence to conclude
that the standard deviation of the height of female students is less
than the standard deviation of the height of all students (\(\sigma_0 \approx\) 11.08 cm).
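Because `simple.x2.test()` only prints its results, they cannot be reused in later computations. The following sketch shows one possible variant that returns a list instead. The function name `x2_test_result()`, the argument name `alternative` and the simulated data are assumptions made for illustration; they are not part of the original material.

```r
# Hypothetical variant of simple.x2.test() that returns its results
x2_test_result <- function(x, sigma0,
                           alternative = c("two-sided", "left", "right")) {
  alternative <- match.arg(alternative)   # validates the chosen alternative
  df <- length(x) - 1
  statistic <- df / sigma0^2 * var(x)
  p_lower <- pchisq(statistic, df = df, lower.tail = TRUE)
  p_upper <- pchisq(statistic, df = df, lower.tail = FALSE)
  p_value <- switch(alternative,
    "left" = p_lower,
    "right" = p_upper,
    "two-sided" = min(1, 2 * min(p_lower, p_upper))
  )
  list(statistic = statistic, df = df,
       p.value = p_value, alternative = alternative)
}

set.seed(7)                          # arbitrary seed, for reproducibility only
x <- rnorm(30, mean = 165, sd = 7)   # simulated stand-in data
res <- x2_test_result(x, sigma0 = 11, alternative = "left")
res$statistic
res$p.value
res$p.value <= 0.05                  # decision at the 5 % level
```

Returning a list keeps the computation separate from the reporting, so the decision rule \(p \le \alpha\) can be applied by the caller with any significance level.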

**Citation**

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: *Hartmann,
K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis
using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.*