Instead of assigning a single value to a population parameter, an
**interval estimate** gives a probabilistic statement that relates
an interval to the probability that this interval
actually contains the true (unknown) population parameter.

The **level of confidence** is chosen a priori and thus
depends on the user's preferences. It is denoted by

\[100(1-\alpha) \%\]

Although any confidence level can be chosen, the most common
values are 90 %, 95 % and 99 %. When expressed as a probability, the
confidence level is called the **confidence coefficient**
and is denoted by \(1 - \alpha\).
The most common confidence coefficients are 0.90, 0.95 and 0.99,
respectively.

A \(100(1-\alpha)\%\) **confidence interval** is an interval
estimate around a population parameter \(\theta\) that, under
repeated random samples of size \(n\),
is expected to include \(\theta\)’s
true value \(100(1-\alpha)\%\) of the
time (Lovric 2010). Here, the Greek letter \(\theta\) (theta) is a placeholder for any
population parameter of interest, such as the mean \(\mu\) or the standard deviation \(\sigma\).
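The "repeated random samples" reading of this definition can be checked with a small simulation. The following is a minimal sketch, assuming a normally distributed population with a known standard deviation; the values of `mu`, `sigma`, `n`, `n_sim` and the seed are made up for illustration and do not come from a real data set.

```r
# Hypothetical population: normal with known mean and standard deviation
set.seed(42)      # arbitrary seed, only for reproducibility
mu <- 10          # assumed true population mean
sigma <- 2        # assumed known population standard deviation
n <- 30           # sample size
alpha <- 0.05     # corresponds to a 95 % confidence level
n_sim <- 10000    # number of repeated random samples

# Critical value for the chosen confidence level
z_crit <- qnorm(alpha / 2, lower.tail = FALSE)

# For each repetition: draw a sample, build the interval around the
# sample mean and record whether it contains the true mean
covered <- replicate(n_sim, {
  x_bar <- mean(rnorm(n, mean = mu, sd = sigma))
  me <- z_crit * sigma / sqrt(n)
  (x_bar - me <= mu) && (mu <= x_bar + me)
})

# Share of intervals that contain the true mean
coverage <- mean(covered)
coverage
```

Under these assumptions the share of intervals that contain `mu` should be close to 0.95, although it will vary slightly with the seed and the number of repetitions.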

The actual number added to and subtracted from the point estimate is
called the **margin of error**.

\[CI: \text{Point estimate} \pm \text{Margin of error}\]

The margin of error consists of two components: first, the so-called
critical value and, second, a measure of variability of the **sampling distribution**. The
**critical value** is a numerical value that corresponds to
the a priori set level of confidence. It is denoted by \(z^*\). The relation to the level of
confidence is made explicit by the subscript: \(z^*_{\alpha/2}\).

Note: The confidence interval has a lower and an upper limit. Consequently, \(\alpha\) is split evenly between the two tails: the area under the curve beyond each limit is \(\frac{\alpha}{2}\), and both tails together account for \(\frac{\alpha}{2} \times 2 = \alpha\).
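The split of \(\alpha\) into two tails can be verified numerically. The sketch below assumes a standard normal distribution and an illustrative \(\alpha = 0.05\); the variable names are chosen here for readability only.

```r
alpha <- 0.05  # illustrative value, corresponds to a 95 % confidence level

# z-scores that cut off an area of alpha / 2 in each tail
z_lower <- qnorm(alpha / 2, lower.tail = TRUE)
z_upper <- qnorm(alpha / 2, lower.tail = FALSE)

# Area below the lower limit and above the upper limit
tail_lower <- pnorm(z_lower)
tail_upper <- pnorm(z_upper, lower.tail = FALSE)

# Both tails together give alpha, the area in between gives 1 - alpha
tail_total <- tail_lower + tail_upper
central_area <- pnorm(z_upper) - pnorm(z_lower)
c(tail_total = tail_total, central_area = central_area)
```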

The **measure of variability** is the **standard error**, given by \(\frac{\sigma}{\sqrt{n}}\) if \(\sigma\) is known. If \(\sigma\) is not known, the sample-based standard
error \(\frac{s}{\sqrt{n}}\),
where \(s\) is the standard deviation
of the sample, may be used instead.

Thus, the margin of error (*ME*) is expressed as

\[ ME = z^*_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
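As a worked example of this formula, consider the following sketch. The values \(\sigma = 12\), \(n = 36\) and \(\alpha = 0.05\) are hypothetical and chosen only to keep the arithmetic simple.

```r
# Hypothetical inputs, not taken from a real data set
sigma <- 12    # assumed known population standard deviation
n <- 36        # sample size
alpha <- 0.05  # corresponds to a 95 % confidence level

# Critical value and standard error
z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
se <- sigma / sqrt(n)  # 12 / 6 = 2

# Margin of error
me <- z_crit * se
round(me, 2)  # 1.96 * 2, i.e. approximately 3.92
```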

Let us look at a figure for better comprehension:

Accordingly, the full equation for the confidence interval is given by

\[CI: \text{Point estimate} \pm z^*_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}\]
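This equation translates directly into a few lines of R. The function `z_interval()` below is a hypothetical helper written for this illustration, not part of base R, and it assumes that the point estimate is the sample mean and that \(\sigma\) is known; the sample values are made up.

```r
# Hypothetical helper (not part of base R): z-based confidence interval
# for a mean, assuming the population standard deviation sigma is known
z_interval <- function(x, sigma, conf_level = 0.95) {
  alpha <- 1 - conf_level
  z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
  me <- z_crit * sigma / sqrt(length(x))
  c(lower = mean(x) - me, estimate = mean(x), upper = mean(x) + me)
}

# Made-up sample values, for illustration only
x <- c(9.8, 10.4, 11.1, 9.5, 10.9, 10.2, 9.9, 10.6)
ci_95 <- z_interval(x, sigma = 0.5, conf_level = 0.95)
ci_95
```

The returned vector holds the lower limit, the point estimate and the upper limit, so the interval is always symmetric around the sample mean.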

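To obtain the value of \(z^*_{\alpha/2}\), one may look it up in a table or use the `qnorm()` function in R. Make sure to set the `lower.tail` argument of `qnorm()` properly: with `lower.tail = TRUE` (the default), the given probability is the area to the left of the returned quantile; with `lower.tail = FALSE`, it is the area to the right.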

Let us construct some confidence intervals for practice:

**Confidence level of 90 % (\(\alpha = 0.1\))**

```
lower_90 <- qnorm(0.05, lower.tail = TRUE)
upper_90 <- qnorm(0.05, lower.tail = FALSE)
paste(
"The lower and upper limits of the interval that covers an area of 90% around the mean are given by z-scores of",
round(lower_90, 2), "and", round(upper_90, 2), "respectively."
)
```

`## [1] "The lower and upper limits of the interval that covers an area of 90% around the mean are given by z-scores of -1.64 and 1.64 respectively."`

For a confidence level of 90 % (\(\alpha = 0.1\)) the equation from above evaluates to

\[CI_{90\%}: \text{Point estimate} \pm 1.64 \times \frac{\sigma}{\sqrt{n}}\]

**Confidence level of 95 % (\(\alpha = 0.05\))**

```
lower_95 <- qnorm(0.025, lower.tail = TRUE)
upper_95 <- qnorm(0.025, lower.tail = FALSE)
paste(
"The lower and upper limits of the interval that covers an area of 95% around the mean are given by z-scores of",
round(lower_95, 2), "and", round(upper_95, 2), "respectively."
)
```

`## [1] "The lower and upper limits of the interval that covers an area of 95% around the mean are given by z-scores of -1.96 and 1.96 respectively."`

For a confidence level of 95 % (\(\alpha = 0.05\)) the equation from above evaluates to

\[CI_{95\%}: \text{Point estimate} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}\]

**Confidence level of 99 % (\(\alpha = 0.01\))**

```
lower_99 <- qnorm(0.005, lower.tail = TRUE)
upper_99 <- qnorm(0.005, lower.tail = FALSE)
paste(
"The lower and upper limits of the interval that covers an area of 99% around the mean are given by z-scores of",
round(lower_99, 2), "and", round(upper_99, 2), "respectively."
)
```

`## [1] "The lower and upper limits of the interval that covers an area of 99% around the mean are given by z-scores of -2.58 and 2.58 respectively."`

For a confidence level of 99 % (\(\alpha = 0.01\)) the equation from above evaluates to

\[CI_{99\%}: \text{Point estimate} \pm 2.58 \times \frac{\sigma}{\sqrt{n}}\]
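The three cases can also be computed in one step, because `qnorm()` is vectorized. The following sketch shows how the critical value, and with it the margin of error, grows with the confidence level. The values \(\sigma = 5\) and \(n = 100\) are hypothetical and serve only to make the margins of error concrete.

```r
# The three confidence levels discussed above
conf_levels <- c(0.90, 0.95, 0.99)
alphas <- 1 - conf_levels

# qnorm() is vectorized, so all critical values are obtained at once
z_crit <- qnorm(alphas / 2, lower.tail = FALSE)

# Hypothetical values, only to make the margins of error concrete
sigma <- 5
n <- 100
me <- z_crit * sigma / sqrt(n)

data.frame(
  conf_level = conf_levels,
  z_crit = round(z_crit, 2),
  margin_of_error = round(me, 2)
)
```

For a fixed \(\sigma\) and \(n\), a higher confidence level always yields a wider interval: more confidence is paid for with less precision.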

**Citation**

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: *Hartmann,
K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis
using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.*