Inferences for one population standard deviation are based on the chi-square (\(\chi^2\)) distribution. A \(\chi^2\)-distribution is a right-skewed probability density curve. The shape of the \(\chi^2\)-curve is determined by its degrees of freedom \((df)\).

In order to perform a hypothesis test for one population standard deviation, we relate a \(\chi^2\)-value to a specified area under a \(\chi^2\)-curve. Either we consult a \(\chi^2\)-table to look up that value or we make use of the R machinery.

Given \(\alpha\), where \(\alpha\) corresponds to a probability between 0 and 1, \(\chi^2_{\alpha}\) denotes the \(\chi^2\)-value having the area \(\alpha\) to its right under a \(\chi^2\)-curve.
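In R, such a value can be obtained with qchisq() instead of a table. The following is a minimal sketch; alpha = 0.05 and df = 10 are arbitrary illustrative choices, not values taken from the text.

```r
# Illustrative values (assumptions, not taken from the text)
alpha <- 0.05
df <- 10

# chi-square value with area alpha to its right
chi2_alpha <- qchisq(alpha, df = df, lower.tail = FALSE)
chi2_alpha

# check: the area to the right of chi2_alpha is alpha again
pchisq(chi2_alpha, df = df, lower.tail = FALSE)
```

Setting lower.tail = FALSE makes both functions work with the area to the right, which matches the definition of \(\chi^2_{\alpha}\) used here.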


Interval Estimation of \(\sigma\)

The \(100(1-\alpha)\) % confidence interval for \(\sigma\) is

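\[\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}}\, s \le \sigma \le \sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}}\, s\text{,}\]

where \(n\) is the sample size, \(s\) is the standard deviation of the sample data, and the \(\chi^2\)-values are taken from a \(\chi^2\)-curve with \(n-1\) degrees of freedom.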
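As a sketch of how such an interval could be computed in R, the lower limit is \(s\) multiplied by \(\sqrt{(n-1)/\chi^2_{\alpha/2}}\) and the upper limit is \(s\) multiplied by \(\sqrt{(n-1)/\chi^2_{1-\alpha/2}}\). The data below are simulated stand-ins (an assumption), not the data set used later in this section.

```r
# Simulated stand-in data (assumption): 30 draws from a normal distribution
set.seed(1)
x <- rnorm(30, mean = 170, sd = 10)

n <- length(x)
s <- sd(x)
alpha <- 0.05

# chi-square values with area alpha/2 and 1 - alpha/2 to their right
chi2_upper <- qchisq(alpha / 2, df = n - 1, lower.tail = FALSE)
chi2_lower <- qchisq(1 - alpha / 2, df = n - 1, lower.tail = FALSE)

# 95 % confidence interval for sigma
ci <- c(lower = sqrt((n - 1) / chi2_upper) * s,
        upper = sqrt((n - 1) / chi2_lower) * s)
ci
```

Because the \(\chi^2\)-distribution is right-skewed, the interval is not symmetric around \(s\).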


One standard deviation \(\chi^2\)-test

The hypothesis testing procedure for one standard deviation is called the one standard deviation \(\chi^2\)-test. It follows the same step-wise procedure as hypothesis tests for the mean:

\[ \begin{array}{ll} \hline \text{Step 1} & \text{State the null hypothesis } H_0 \text{ and alternative hypothesis } H_A \text{.} \\ \text{Step 2} & \text{Decide on the significance level, } \alpha\text{.} \\ \text{Step 3} & \text{Compute the value of the test statistic.} \\ \text{Step 4} & \text{Determine the p-value.} \\ \text{Step 5} & \text{If } p \le \alpha \text{, reject } H_0 \text{; otherwise, do not reject } H_0 \text{.} \\ \text{Step 6} & \text{Interpret the result of the hypothesis test.} \\ \hline \end{array} \]

The test statistic for a hypothesis test with the null hypothesis \(H_0: \,\sigma = \sigma_0\) for a normally distributed variable is given by

\[\chi^2 = \frac{n-1}{\sigma^2_0}s^2 \text{.}\]

If the null hypothesis is true, this test statistic follows a \(\chi^2\)-distribution with \(n - 1\) degrees of freedom.
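A small simulation can make this distributional statement plausible. The sketch below draws many normal samples under \(H_0\) and compares the mean and variance of the simulated test statistics with the theoretical values \(n-1\) and \(2(n-1)\) of a \(\chi^2\)-distribution. The sample size, \(\sigma_0\), seed and number of repetitions are arbitrary assumptions.

```r
set.seed(42)     # arbitrary seed for reproducibility
n <- 30          # sample size (assumption)
sigma0 <- 10     # hypothesized standard deviation (assumption)

# simulate the test statistic for 10000 normal samples drawn under H0
x2_sim <- replicate(10000, {
  x <- rnorm(n, mean = 0, sd = sigma0)
  (n - 1) / sigma0^2 * var(x)
})

# a chi-square distribution with n - 1 df has mean n - 1 and variance 2 * (n - 1)
c(mean = mean(x2_sim), variance = var(x2_sim))
```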

Be aware that the one standard deviation \(\chi^2\)-test is not robust against violations of the normality assumption (Weiss, 2010).
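The lack of robustness can be illustrated with a simulation. The sketch below estimates how often a two-sided test at \(\alpha = 0.05\) rejects a true null hypothesis, once for normal data and once for heavy-tailed data from a t-distribution with 5 degrees of freedom. All settings, including the helper name rejection_rate(), are assumptions chosen for illustration.

```r
set.seed(123)   # arbitrary seed
n <- 30
alpha <- 0.05
reps <- 5000

# two-sided rejection rate when H0 is true, for a given data generator
rejection_rate <- function(rgen, sigma0) {
  mean(replicate(reps, {
    x2 <- (n - 1) / sigma0^2 * var(rgen(n))
    p <- 2 * min(pchisq(x2, df = n - 1),
                 pchisq(x2, df = n - 1, lower.tail = FALSE))
    p <= alpha
  }))
}

# normal data with standard deviation 1
rate_normal <- rejection_rate(function(n) rnorm(n), sigma0 = 1)
# heavy-tailed data: t-distribution with 5 df, true sd = sqrt(5 / 3)
rate_t <- rejection_rate(function(n) rt(n, df = 5), sigma0 = sqrt(5 / 3))

c(normal = rate_normal, t5 = rate_t)
```

For normal data the rejection rate should be close to the nominal 5 %, whereas for the heavy-tailed data it should be noticeably higher, even though \(H_0\) is true in both cases.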


One standard deviation \(\chi^2\)-test: An example

In order to get some hands-on experience we apply the one standard deviation \(\chi^2\)-test in an exercise. For this we load the students data set, which you may also download here.

students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")

The students data set consists of 8239 rows, each of them representing a particular student, and 16 columns, each of them corresponding to a variable/feature related to that particular student. These self-explaining variables are: stud.id, name, gender, age, height, weight, religion, nc.score, semester, major, minor, score1, score2, online.tutorial, graduated, salary.

In order to showcase the one standard deviation \(\chi^2\)-test we examine the spread of the height in cm of female students and compare it to the spread of the height of all students (our population). We want to test whether the standard deviation of the height of female students is significantly smaller than the standard deviation of the height of all students.


Data preparation

We start with data preparation.

sigma0 <- sd(students$height)
sigma0
## [1] 11.07753

The standard deviation of the population of interest (\(\sigma_0\)) is \(\approx\) 11.08 cm.

female <- subset(students, gender == "Female")

n <- 30
female_sample <- sample(female$height, n)
sample_sd <- sd(female_sample)
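Because sample() draws at random and no seed is set above, rerunning the code will generally give a different sample, so the numbers will differ from those printed in this section. A minimal sketch of how a draw could be made repeatable with set.seed() follows. The vector heights is a simulated stand-in for female$height, and the seed values are arbitrary assumptions.

```r
# Stand-in for female$height (assumption): simulated heights in cm
set.seed(2023)
heights <- rnorm(500, mean = 165, sd = 8)

# fixing the seed immediately before sample() makes the draw repeatable
set.seed(100)
sample_a <- sample(heights, 30)
set.seed(100)
sample_b <- sample(heights, 30)

identical(sample_a, sample_b)
```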

Further, we check the normality assumption by plotting a Q-Q plot. In R we apply the qqnorm() and the qqline() functions for plotting Q-Q plots.

par(mar = c(5, 5, 4, 2))

# sample data
qqnorm(female_sample, main = "Q-Q plot for height of\n sampled female students", cex.main = 0.9)
qqline(female_sample, col = 3, lwd = 2)

As we can see, the data fall roughly onto a straight line. Based on this graphical evaluation we conclude that the variable of interest is roughly normally distributed.
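A graphical check can be supplemented by a formal test. The sketch below uses the Shapiro-Wilk test via shapiro.test() from base R; this test is not part of the procedure described in this section and is shown only as an optional addition. The data are a simulated stand-in for female_sample (an assumption).

```r
# Simulated stand-in for female_sample (assumption)
set.seed(7)
x <- rnorm(30, mean = 165, sd = 8)

# Shapiro-Wilk test: H0 is that the data come from a normal distribution
sw <- shapiro.test(x)
sw$p.value
```

A small p-value would speak against normality; a large p-value does not prove normality, it only means that the test found no evidence against it.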


Hypothesis testing

In order to conduct the one standard deviation \(\chi^2\)-test we follow the step-wise implementation procedure for hypothesis testing.

Step 1: State the null hypothesis \(H_0\) and alternative hypothesis \(H_A\)

The null hypothesis states that the standard deviation of the height of female students (\(\sigma\)) equals the standard deviation of the population (\(\sigma_0 \approx\) 11.08 cm):

\[H_0: \quad \sigma = \sigma_0\]

Alternative hypothesis:

\[H_A: \quad \sigma < \sigma_0 \]

This formulation results in a left-tailed hypothesis test.


Step 2: Decide on the significance level, \(\alpha\)

\[\alpha = 0.05\]

alpha <- 0.05

Step 3 and 4: Compute the value of the test statistic and the p-value

For illustration purposes we manually compute the test statistic in R. Recall the equation for the test statistic from above:

\[\chi^2 = \frac{n-1}{\sigma^2_0}s^2 \]

# compute the value of the test statistic
n <- length(female_sample)
s_2 <- var(female_sample)
sigma0_2 <- var(students$height)
x2 <- ((n - 1) / sigma0_2) * s_2
x2
## [1] 12.80536

The numerical value of the test statistic is 12.8053583.

In order to calculate the p-value we apply the pchisq() function. Recall how to calculate the degrees of freedom:

\[ df=n-1\]

# compute df
df <- n - 1
# compute the p-value
p <- pchisq(x2, df = df, lower.tail = TRUE)
p
## [1] 0.004049209

\(p = 0.0040492\).


Step 5: If \(p \le \alpha\), reject \(H_0\); otherwise, do not reject \(H_0\)

p <= alpha
## [1] TRUE

The p-value is smaller than the specified significance level of 0.05; we reject \(H_0\). The test results are statistically significant at the 5 % level and provide very strong evidence against the null hypothesis.
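The same decision can be reached with the critical-value approach. For a left-tailed test, \(H_0\) is rejected if the test statistic is smaller than the \(\chi^2\)-value that has the area \(\alpha\) to its left. The sketch below reuses the rounded test statistic and degrees of freedom reported in this example.

```r
# values reported in this example
x2 <- 12.80536   # test statistic
df <- 29         # degrees of freedom
alpha <- 0.05

# left-tailed critical value: area alpha to its left
crit <- qchisq(alpha, df = df, lower.tail = TRUE)
crit

# reject H0 if the test statistic falls below the critical value
x2 < crit
```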


Step 6: Interpret the result of the hypothesis test

At the 5 % significance level the data provide very strong evidence to conclude that the standard deviation of the height of female students is less than the population standard deviation of \(\sigma_0 \approx\) 11.08 cm.
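This interpretation can be complemented by a one-sided upper confidence bound for \(\sigma\), which is the interval counterpart of the left-tailed test. The sketch below recovers the sample standard deviation from the reported test statistic, so its results are only as precise as those rounded values. It assumes the bound \(s\sqrt{(n-1)/\chi^2_{1-\alpha}}\), where \(\chi^2_{1-\alpha}\) has the area \(1-\alpha\) to its right.

```r
# values reported in this example
x2 <- 12.80536       # test statistic
sigma0 <- 11.07753   # population standard deviation in cm
df <- 29
alpha <- 0.05

# sample standard deviation recovered from the test statistic
s <- sqrt(x2 * sigma0^2 / df)

# one-sided 95 % upper confidence bound for sigma;
# qchisq(alpha, df) has area alpha to its left, i.e. 1 - alpha to its right
upper <- sqrt(df / qchisq(alpha, df = df)) * s
c(s = s, upper = upper)

# the bound lies below sigma0, in line with rejecting H0
upper < sigma0
```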


Hypothesis testing in R

We just completed a one standard deviation \(\chi^2\)-test in R manually. To our knowledge base R does not provide a built-in function for the one standard deviation \(\chi^2\)-test. However, we may implement such a function ourselves. Our function simple.x2.test() takes as input a sample vector, x, the standard deviation of the population, sigma0, the significance level, alpha, and the type of test, method, which is one of "right", "left" or "two-sided" (the default).

simple.x2.test <- function(x, sigma0, alpha, method = "two-sided") {
  df <- length(x) - 1
  v <- var(x)

  # calculate test statistic
  testchi <- df / (sigma0^2) * v

  # for left-tailed test
  if (method == "left") {
    p <- pchisq(q = testchi, df = df, lower.tail = TRUE)
  }
  # for right-tailed test
  else if (method == "right") {
    p <- pchisq(q = testchi, df = df, lower.tail = FALSE)
  }
  # for two-sided test (default)
  else {
    p_upper <- pchisq(q = testchi, df = df, lower.tail = FALSE)
    p_lower <- pchisq(q = testchi, df = df, lower.tail = TRUE)
    if (p_upper * 2 > 1) {
      p <- p_lower * 2
    } else {
      p <- p_upper * 2
    }
  }
  # evaluate p <= alpha (decision rule of Step 5)
  reject <- p <= alpha
  # print out summary and evaluation
  print(paste("Significance level:", alpha))
  print(paste("Degrees of freedom:", df))
  print(paste("Test statistic:", round(testchi, 4)))
  print(paste("p-value:", p))
  print(paste("Reject H0:", reject))
}

Let us apply our self-built function simple.x2.test() to the example data from above.

sigma0 <- sd(students$height)
simple.x2.test(female_sample, sigma0, alpha = 0.05, "left")
## [1] "Significance level: 0.05"
## [1] "Degrees of freedom: 29"
## [1] "Test statistic: 12.8054"
## [1] "p-value: 0.00404920914882534"
## [1] "Reject H0: TRUE"

The output of the simple.x2.test() function matches our manual result from above. Again, at the 5 % significance level the data provide very strong evidence to conclude that the standard deviation of the height of female students is less than the population standard deviation of \(\sigma_0 \approx\) 11.08 cm.
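If the results are needed for further computation, a function that returns them may be more convenient than one that only prints them. The following is a sketch of such a variant; the name x2_test() and the structure of the returned list are hypothetical choices, and the data are simulated (an assumption).

```r
# Hypothetical variant of simple.x2.test() that returns its results as a list
x2_test <- function(x, sigma0, alpha = 0.05,
                    method = c("two-sided", "left", "right")) {
  method <- match.arg(method)
  df <- length(x) - 1
  statistic <- df / sigma0^2 * var(x)
  p_lower <- pchisq(statistic, df = df, lower.tail = TRUE)
  p_upper <- pchisq(statistic, df = df, lower.tail = FALSE)
  p <- switch(method,
              "left" = p_lower,
              "right" = p_upper,
              "two-sided" = 2 * min(p_lower, p_upper))
  list(statistic = statistic, df = df, p.value = p, reject = p <= alpha)
}

# simulated stand-in data (assumption): true sd 7, tested against sigma0 = 11
set.seed(1)
x <- rnorm(30, mean = 165, sd = 7)
res <- x2_test(x, sigma0 = 11, method = "left")
str(res)
```

Using match.arg() restricts method to the three supported values, so a misspelled method raises an error instead of silently falling back to the two-sided test.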


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.