Instead of assigning a single value to a population parameter, an interval estimate makes a probabilistic statement: it relates a given interval to the probability that this interval actually contains the true (unknown) population parameter.

The level of confidence is chosen a priori and thus depends on the user's preferences. It is denoted by

$100(1-\alpha) \%$

Although any confidence level can be chosen, the most common values are 90 %, 95 % and 99 %. When expressed as a probability, the confidence level is called the confidence coefficient and is denoted by $1 - \alpha$. The most common confidence coefficients are 0.90, 0.95 and 0.99, respectively.

A $100(1-\alpha)\%$ confidence interval is an interval estimate around a population parameter $\theta$ that, under repeated random samples of size $n$, is expected to include the true value of $\theta$ $100(1-\alpha)\%$ of the time (Lovric 2010). Here, the Greek letter $\theta$ (theta) is a placeholder for any population parameter of interest, such as the mean $\mu$ or the standard deviation $\sigma$.
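The repeated-sampling interpretation can be illustrated with a small simulation. The following is a minimal sketch, not part of the original material: the population values `mu` and `sigma`, the sample size `n` and the number of repetitions `n_rep` are hypothetical and chosen only for illustration. It assumes a normally distributed population with known $\sigma$ and counts how often a 95 % interval around the sample mean contains the true mean.

```r
# Illustrative simulation: share of 95 % intervals that contain the true mean.
# All numbers below are assumptions made for this sketch.
set.seed(1)

mu    <- 50    # true population mean (unknown in practice)
sigma <- 10    # population standard deviation, assumed to be known
n     <- 30    # size of each random sample
n_rep <- 1000  # number of repeated samples
alpha <- 0.05  # corresponds to a 95 % confidence level

# Critical value for the chosen confidence level
z_crit <- qnorm(alpha / 2, lower.tail = FALSE)

# For each repetition: draw a sample, build the interval, check coverage
covered <- replicate(n_rep, {
  x_bar <- mean(rnorm(n, mean = mu, sd = sigma))
  me    <- z_crit * sigma / sqrt(n)
  (x_bar - me <= mu) && (mu <= x_bar + me)
})

# Proportion of intervals that contain mu
coverage <- mean(covered)
coverage
```

The resulting proportion should be close to 0.95, though it varies somewhat with the seed and the number of repetitions.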

The actual number added to and subtracted from the point estimate is called the margin of error.

$CI: \text{Point estimate} \pm \text{Margin of error}$

The margin of error consists of two components: first, the so-called critical value and, second, a measure of variability of the sampling distribution. The critical value is a numerical value that corresponds to the level of confidence set a priori. It is denoted as $z^*$. The relation to the level of confidence is made explicit by the subscript: $z^*_{\alpha/2}$.

Note: The confidence interval has a lower and an upper limit. Consequently, $\alpha$ is split evenly between the two tails: the area under the curve beyond each limit is $\frac{\alpha}{2}$, and both tails together account for $\frac{\alpha}{2} \times 2 = \alpha$.

The measure of variability is the standard error, denoted as $\frac{\sigma}{\sqrt{n}}$, if $\sigma$ is known. If $\sigma$ is not known, the estimated standard error $\frac{s}{\sqrt{n}}$, where $s$ is the standard deviation of the sample and $n$ is the sample size, may be used instead.
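As a minimal sketch, both versions of the standard error can be computed in R as follows. The sample `x` and the value of `sigma` are invented for illustration and do not come from the original material.

```r
# Hypothetical sample; the values are made up for illustration only
x <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9)

n <- length(x)  # sample size

# Case 1: sigma is known (assumed value for this sketch)
sigma <- 0.5
se_known <- sigma / sqrt(n)

# Case 2: sigma is unknown and replaced by the sample standard deviation s
s <- sd(x)
se_estimated <- s / sqrt(n)

se_known
se_estimated
```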

Thus, the margin of error (ME) is expressed as

$ME = z^*_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$
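A short worked example may help to make the formula concrete. The numbers are assumptions chosen for illustration: a known $\sigma = 10$, a sample size of $n = 25$ and a confidence level of 95 % ( $\alpha = 0.05$ ).

```r
# Margin of error for assumed values: sigma = 10, n = 25, alpha = 0.05
sigma <- 10
n     <- 25
alpha <- 0.05

# Critical value z*_{alpha/2}: the quantile with an area of alpha/2 to its right
z_crit <- qnorm(alpha / 2, lower.tail = FALSE)

# Standard error and margin of error
se <- sigma / sqrt(n)  # 10 / 5 = 2
me <- z_crit * se

round(me, 2)  # 3.92
```

With these values, the interval extends about 3.92 units on either side of the point estimate.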

Let us look at a figure for better comprehension:

Accordingly, the full equation for the confidence interval is given by

$CI: \text{Point estimate} \pm z^*_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$

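In order to get the corresponding value for $z^*_{\alpha/2}$ one may look it up in a table or make use of the qnorm() function in R. Make sure to set the lower.tail argument of the qnorm() function properly: with lower.tail = TRUE (the default), qnorm(p) returns the z-score with an area of p to its left, whereas with lower.tail = FALSE it returns the z-score with an area of p to its right.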
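The effect of the lower.tail argument can be checked directly. The sketch below uses $\alpha = 0.05$ as an example value; the variable names are chosen for illustration only.

```r
alpha <- 0.05

# Upper critical value, obtained in two equivalent ways
z_upper_a <- qnorm(alpha / 2, lower.tail = FALSE)  # area alpha/2 to the right
z_upper_b <- qnorm(1 - alpha / 2)                  # area 1 - alpha/2 to the left

# Lower critical value; by symmetry it is the negative of the upper one
z_lower <- qnorm(alpha / 2)

all.equal(z_upper_a, z_upper_b)
all.equal(z_lower, -z_upper_a)

# The area between the two limits equals the confidence coefficient 1 - alpha
area <- pnorm(z_upper_a) - pnorm(z_lower)
area
```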

Let us construct some confidence intervals for practice:

Confidence level of 90 % ( $\alpha = 0.1$ )

lower_90 <- qnorm(0.05, lower.tail = TRUE)
upper_90 <- qnorm(0.05, lower.tail = FALSE)

paste(
  "The lower and upper limits of the interval that covers an area of 90% around the mean are given by z-scores of",
  round(lower_90, 2), "and", round(upper_90, 2), "respectively."
)

## [1] "The lower and upper limits of the interval that covers an area of 90% around the mean are given by z-scores of -1.64 and 1.64 respectively."

For a confidence level of 90 % ( $\alpha = 0.1$ ) the equation from above evaluates to

$CI_{90\%}: \text{Point estimate} \pm 1.64 \times \frac{\sigma}{\sqrt{n}}$

Confidence level of 95 % ( $\alpha = 0.05$ )

lower_95 <- qnorm(0.025, lower.tail = TRUE)
upper_95 <- qnorm(0.025, lower.tail = FALSE)

paste(
  "The lower and upper limits of the interval that covers an area of 95% around the mean are given by z-scores of",
  round(lower_95, 2), "and", round(upper_95, 2), "respectively."
)

## [1] "The lower and upper limits of the interval that covers an area of 95% around the mean are given by z-scores of -1.96 and 1.96 respectively."

For a confidence level of 95 % ( $\alpha = 0.05$ ) the equation from above evaluates to

$CI_{95\%}: \text{Point estimate} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$

Confidence level of 99 % ( $\alpha = 0.01$ )

lower_99 <- qnorm(0.005, lower.tail = TRUE)
upper_99 <- qnorm(0.005, lower.tail = FALSE)

paste(
  "The lower and upper limits of the interval that covers an area of 99% around the mean are given by z-scores of",
  round(lower_99, 2), "and", round(upper_99, 2), "respectively."
)

## [1] "The lower and upper limits of the interval that covers an area of 99% around the mean are given by z-scores of -2.58 and 2.58 respectively."

For a confidence level of 99 % ( $\alpha = 0.01$ ) the equation from above evaluates to

$CI_{99\%}: \text{Point estimate} \pm 2.58 \times \frac{\sigma}{\sqrt{n}}$
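The three cases can be wrapped in a small helper function. The function `z_confint()` below is a hypothetical helper written for this sketch (it is not part of base R or of the original material), and the input values (a sample mean of 100, a known $\sigma$ of 15 and $n = 36$) are assumptions chosen for illustration.

```r
# Hypothetical helper: z-based confidence interval for a mean with known sigma
z_confint <- function(x_bar, sigma, n, conf_level = 0.95) {
  alpha  <- 1 - conf_level
  z_crit <- qnorm(alpha / 2, lower.tail = FALSE)  # critical value z*_{alpha/2}
  me     <- z_crit * sigma / sqrt(n)              # margin of error
  c(lower = x_bar - me, upper = x_bar + me)
}

# Assumed example values: sample mean 100, sigma 15, n 36
ci_90 <- z_confint(x_bar = 100, sigma = 15, n = 36, conf_level = 0.90)
ci_95 <- z_confint(x_bar = 100, sigma = 15, n = 36, conf_level = 0.95)
ci_99 <- z_confint(x_bar = 100, sigma = 15, n = 36, conf_level = 0.99)

round(rbind(ci_90, ci_95, ci_99), 2)
```

For the same data, a higher confidence level yields a wider interval, because the critical value grows from about 1.64 to 1.96 to 2.58.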

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.