The **standard normal distribution** is a special case of the normal distribution. For the standard normal distribution, the value of the mean is equal to zero (\(\mu = 0\)), and the value of the standard deviation is equal to 1 (\(\sigma = 1\)).

Thus, by plugging \(\mu = 0\) and \(\sigma = 1\) into the PDF of the normal distribution, the equation simplifies to

\[\begin{align} f(x)& = \frac{1}{\sigma \sqrt{2 \pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \\ & =\frac{1}{1 \times \sqrt{2 \pi}}e^{-\frac{1}{2}\left(\frac{x-0}{1}\right)^2} \\ & = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2} \end{align}\]
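As a quick sanity check, the simplified density can be compared against R's built-in `dnorm` function, which evaluates the normal PDF and uses `mean = 0` and `sd = 1` by default. This is only a minimal sketch; the helper name `std_normal_pdf` is a hypothetical name introduced here for illustration.

```r
# Hand-written density of the standard normal distribution
# (std_normal_pdf is an illustrative name, not part of base R)
std_normal_pdf <- function(x) {
  1 / sqrt(2 * pi) * exp(-0.5 * x^2)
}

# A few arbitrary points on the horizontal axis
x <- seq(-3, 3, by = 0.5)

# Compare against the built-in density; all.equal() tolerates
# tiny floating-point differences
all.equal(std_normal_pdf(x), dnorm(x))
```

If the formula is implemented correctly, the comparison should return `TRUE`.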

The random variable that follows the standard normal distribution is denoted by \(z\). Consequently, the units of the standard normal distribution curve are denoted by \(z\) and are called **\(z\)-values** or **\(z\)-scores**. They are also called **standard units** or **standard scores**.

The **cumulative distribution function (CDF)** of the standard normal distribution, corresponding to the area under the curve for the interval \((-\infty, z]\), usually denoted with the capital Greek letter \(\Phi\), is given by

\[P(x \le z) = \Phi (z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z}e^{-\frac{1}{2}x^2}dx\]

where \(e \approx 2.71828\) and \(\pi \approx 3.14159\).
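The integral above has no closed-form solution, but it can be approximated numerically. The following sketch uses base R's `integrate` function to integrate the density `dnorm` from \(-\infty\) to an arbitrarily chosen \(z\) (the value `1.5` is just an example) and compares the result with the built-in CDF `pnorm`.

```r
# Upper limit of integration; 1.5 is an arbitrary example value
z <- 1.5

# Numerically integrate the standard normal density from -Inf to z
area <- integrate(dnorm, lower = -Inf, upper = z)

# integrate() returns a list; the estimate is stored in $value
area$value

# The built-in CDF should agree up to the numerical integration error
pnorm(z)
```

Both calls should print practically the same number, which illustrates that the CDF is nothing more than the accumulated area under the density curve.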

The standard normal curve is a special case of the normal distribution curve and is therefore a probability distribution curve as well. Consequently, the basic properties of the normal distribution also hold for the standard normal curve (Weiss 2010).

- The total area under the standard normal curve is 1 (this property is shared by all density curves).
- The standard normal curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis as it does so.
- The standard normal curve is bell-shaped and centered at \(z=0\). Almost all the area under the standard normal curve lies between \(z=-3\) and \(z=3\).
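These properties can be checked numerically. The sketch below is one possible way to do so, using `integrate` on the density `dnorm`; the variable names are chosen for illustration only, and the results are approximations subject to numerical integration error.

```r
# Total area under the density curve (numerical integration)
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Centered at z = 0: the density at -z equals the density at z
z <- c(0.5, 1, 2, 3)
symmetric <- all.equal(dnorm(-z), dnorm(z))

# Share of the area between z = -3 and z = 3
central_area <- integrate(dnorm, lower = -3, upper = 3)$value

total_area
symmetric
central_area
```

The total area should come out very close to 1, the symmetry check should return `TRUE`, and the central area should be only slightly below 1.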

The \(z\)-values on the right side of the mean are positive and those on the left side are negative. The \(z\)-value for a point on the horizontal axis gives the distance between the mean (\(z=0\)) and that point in terms of the standard deviation. For example, a point with a value of \(z=2\) is two standard deviations to the right of the mean. Similarly, a point with a value of \(z=-2\) is two standard deviations to the left of the mean.
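To make the idea of "distance in terms of the standard deviation" concrete, the following small sketch converts a few values of a normally distributed variable into \(z\)-values via \(z = (x - \mu)/\sigma\). The numbers used for `mu`, `sigma`, and `x` are made up purely for illustration.

```r
# Hypothetical parameters of a normal distribution (illustrative values)
mu    <- 100
sigma <- 15

# Three points on the horizontal axis of that distribution
x <- c(70, 100, 130)

# Distance from the mean, expressed in standard deviations
z <- (x - mu) / sigma
z
```

The points 70 and 130 lie two standard deviations to the left and to the right of the mean, so their \(z\)-values are \(-2\) and \(2\); the mean itself maps to \(z = 0\).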

The concept of determining probabilities by calculating the area under the standard normal curve is extensively applied. That is why there exist probability tables to look up the area for a particular \(z\)-value. However, R is such a powerful tool that we can calculate the area under the curve for any particular \(z\)-score.

To calculate the area under the curve for a standard normal distribution, we apply the `pnorm` function. The `pnorm` function is defined as `pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)`. We disregard the `log.p = FALSE` argument. For the moment, we keep the default `lower.tail = TRUE`. Further, we see that the defaults for the mean and the standard deviation are \(0\) and \(1\), respectively. Thus, the `pnorm` function, applied to the standard normal distribution, simplifies to `pnorm(q)`. We calculate the area under the curve for \(z = -3, -2, -1, 0, 1, 2, 3\), or written more formally:

\[P(x\le z) \qquad \text{for } z \in \{-3, -2, -1, 0, 1, 2, 3\}\]

`pnorm(-3)`

`## [1] 0.001349898`

`pnorm(-2)`

`## [1] 0.02275013`

`pnorm(-1)`

`## [1] 0.1586553`

`pnorm(0)`

`## [1] 0.5`

`pnorm(1)`

`## [1] 0.8413447`

`pnorm(2)`

`## [1] 0.9772499`

`pnorm(3)`

`## [1] 0.9986501`
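Because `pnorm` is vectorized, the seven calls above can also be written as a single call. The following sketch collects the results in a data frame; the column names `z` and `area` are arbitrary choices for this example.

```r
# All z-values of interest in one vector
z <- -3:3

# pnorm() is vectorized, so it returns one area per z-value
area <- pnorm(z)

# Show the z-values next to the corresponding lower-tail areas
data.frame(z = z, area = area)
```

The values in the `area` column should match the individual results shown above.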

Perfect! We confirmed some of the above-stated properties of the standard normal curve. Recall that we kept the default value for the `lower.tail` argument in the `pnorm` function, `lower.tail = TRUE`. This means we calculated the area under the curve for the interval \((-\infty, z]\). Calling `pnorm(-3)` yields a very low number. Only about 0.1% of the total area under the curve is found to the left of \(z=-3\), which corresponds to a distance of 3 standard deviations from the mean. Moreover, `pnorm(0)` yields 50%. Awesome! Because the total area under the curve is \(1\), we conclude that the area under the curve for the interval \((-\infty, 0]\) is the same as the area under the curve for the interval \([0, \infty)\). Again, this is consistent with the above-stated properties of the standard normal curve. And finally, calling `pnorm(3)` yields a high number close to 1. Thus, approximately 99.9% of the area under the curve can be found in the interval \((-\infty, 3]\). Only little is left for the area beyond \(z = 3\).
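The small area beyond \(z = 3\) can be computed directly by switching the `lower.tail` argument to `FALSE`, which makes `pnorm` return the area to the right of \(z\). The sketch below compares this with the complement of the lower-tail area and, using the symmetry of the curve, with the area to the left of \(z = -3\).

```r
# Area to the right of z = 3 (upper tail)
upper <- pnorm(3, lower.tail = FALSE)
upper

# Same quantity as the complement of the lower-tail area
all.equal(upper, 1 - pnorm(3))

# By symmetry, it also equals the area to the left of z = -3
all.equal(upper, pnorm(-3))
```

Both comparisons should return `TRUE`.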

Recall that we may explicitly calculate the area under the curve for any interval of interest:

\[\begin{align} P(a \le z \le b) & = P(z \le b) - P(z \le a) \\ & =\int_{a}^{b}f(z)dz \\ & = \int_{-\infty}^{b}f(z)dz - \int_{-\infty}^{a}f(z)dz \end{align}\]

Let us calculate the area under the curve for the following intervals: \([-1,1], [-2,2], [-3,3]\). Or in words, let us determine the area under the curve for \(\pm 1\) standard deviation, for \(\pm 2\) standard deviations, and for \(\pm 3\) standard deviations.

```
# 1 standard deviation
pnorm(1) - pnorm(-1)
```

`## [1] 0.6826895`

```
# 2 standard deviations
pnorm(2) - pnorm(-2)
```

`## [1] 0.9544997`

```
# 3 standard deviations
pnorm(3) - pnorm(-3)
```

`## [1] 0.9973002`
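The same three results can be obtained in one step by wrapping the difference of two `pnorm` calls in a small function. This is just a sketch; `area_between` is a hypothetical helper name, not a function provided by R.

```r
# Area under the standard normal curve between a and b
# (area_between is an illustrative helper, not part of base R)
area_between <- function(a, b) {
  pnorm(b) - pnorm(a)
}

# Number of standard deviations on either side of the mean
k <- 1:3

# Areas for the intervals [-1, 1], [-2, 2], and [-3, 3]
areas <- area_between(-k, k)
areas
```

Since `pnorm` is vectorized, the function returns all three areas at once, and they should agree with the values computed above.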

Awesome, we just confirmed the Empirical Rule, also known as the **68-95-99.7 rule**, which is related to Chebyshev's theorem. For a bell-shaped distribution, the three rules state that approximately

- 68% of the observations lie within one standard deviation of the mean,
- 95% of the observations lie within two standard deviations of the mean, and
- 99.7% of the observations lie within three standard deviations of the mean.
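To see how the Empirical Rule relates to Chebyshev's theorem, the following sketch places the two side by side. Chebyshev's theorem states that, for any distribution, at least \(1 - 1/k^2\) of the observations lie within \(k\) standard deviations of the mean (informative for \(k > 1\)). The variable and column names are chosen for illustration only.

```r
# Number of standard deviations around the mean
k <- 1:3

# Lower bound from Chebyshev's theorem (valid for any distribution;
# for k = 1 the bound is 0 and therefore not informative)
chebyshev_bound <- 1 - 1 / k^2

# Exact area under the standard normal curve within k standard deviations
normal_area <- pnorm(k) - pnorm(-k)

data.frame(k = k, chebyshev_bound = chebyshev_bound, normal_area = normal_area)
```

For the normal distribution, the actual share of observations is considerably larger than the general lower bound, which is why the Empirical Rule gives much sharper statements for bell-shaped distributions.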

To strengthen our intuition, the Empirical Rule is visualized below.