So far, we have used $z$-scores to calculate areas under the curve. Now we do it the other way around: we calculate the $z$-score(s) corresponding to a specified area under the standard normal curve. Finding the $z$-score that has a specified area is so common that there is a special notation for it. The symbol $z_\alpha$ denotes the $z$-score that has an area of $\alpha$ (alpha) to its right under the standard normal curve.
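The same definition can be stated as a formula. Here $Z$ is a standard normal random variable and $\Phi$ is its cumulative distribution function (the area to the left of a value); $\Phi$ is a symbol introduced only for this restatement. Because the area to the right of $z_\alpha$ is $\alpha$, the area to its left is $1 - \alpha$, and $z_\alpha$ follows by inverting $\Phi$:

$$P(Z \geq z_\alpha) = \alpha \quad\Longleftrightarrow\quad \Phi(z_\alpha) = 1 - \alpha \quad\Longleftrightarrow\quad z_\alpha = \Phi^{-1}(1 - \alpha)$$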
Let us find $z_{0.05}$, the $z$-score that has an area of $0.05$ to its right under the standard normal curve. The value of $\alpha$ is the probability of obtaining a value in the interval $[z_{0.05}, \infty)$. Because the area to its right is $0.05$, the area to its left is $1 - 0.05 = 0.95$, corresponding to the interval $(-\infty, z_{0.05}]$ (see plot below).
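As a quick numerical check of this complement rule, the sketch below computes $z_{0.05}$ and then measures the area on each side of it. It assumes scipy.stats is available and uses norm.cdf() for the area to the left and norm.sf() (the survival function, i.e. 1 minus the cdf) for the area to the right; the variable names are our own.

```python
import scipy.stats as stats

tail_area = 0.05  # alpha: area to the right of z_alpha

# norm.ppf() expects the area to the LEFT, which is 1 - alpha
z_alpha = stats.norm.ppf(1 - tail_area)

area_left = stats.norm.cdf(z_alpha)  # P(Z <= z_alpha), should be 0.95
area_right = stats.norm.sf(z_alpha)  # P(Z >= z_alpha), should be 0.05

print(round(z_alpha, 4), round(area_left, 4), round(area_right, 4))
```

The two areas add up to 1, which is exactly the $1 - 0.05 = 0.95$ step in the text.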
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

mu = 0  # mean of the standard normal distribution
sigma = 1  # standard deviation of the standard normal distribution
tail_area = 0.05  # area to the right of z_0.05
z_alpha = stats.norm.ppf(1 - tail_area, mu, sigma)  # z_0.05, approximately 1.64

x = np.arange(-4, 4.01, 0.01)
yy = stats.norm.pdf(x, mu, sigma)

plt.figure(figsize=(10, 5))
plt.plot(x, yy, color="black")
# Shade the upper tail x >= z_0.05, whose area is exactly 0.05
plt.fill_between(
    x=x,
    y1=yy,
    where=(x >= z_alpha),
    color="red",
    edgecolor="black",
    alpha=0.75,
)
plt.text(2.5, 0.15, "area = 0.05", fontsize=16)
plt.text(-0.7, 0.2, "area = 0.95", fontsize=16)
plt.text(1.9, 0.07, "$z_{0.05}$", fontsize=16)
# Arrow from the area label to the shaded upper tail
plt.arrow(
    2.7,
    0.13,
    -0.5,
    -0.12,
    length_includes_head=True,
    head_width=0.02,
    head_length=0.1,
    color="black",
)
plt.xlabel("z score")
plt.yticks([])
plt.show()
To get the corresponding $z$-score, we may either look it up in a probability table or use Python. In Python we apply the norm.ppf() method, the percent point function, which is the inverse of the cumulative distribution function. It is called as norm.ppf(q, loc=0, scale=1). We keep the default values for the arguments loc (mean) and scale (standard deviation). However, we have to be careful to distinguish between the lower and the upper tail, because the q argument is always the area to the left of $z$. For the lower tail, q is the area to the left of $z$, so we calculate norm.ppf(q). For the upper tail, q is the area to the right of $z$, so we calculate norm.ppf(1 - q). Let us turn to Python to make this clearer.
stats.norm.ppf(0.05) # lower tail
-1.6448536269514729
stats.norm.ppf(1 - 0.05) # upper tail
1.6448536269514722
Because the standard normal distribution is symmetric about zero, we get the same number twice, but with opposite signs. For a $z$-value of approximately 1.64, which is $z_{0.05}$, 95% of all values lie to its left and 5% to its right. In contrast, for a $z$-value of approximately -1.64, which is $z_{0.95} = -z_{0.05}$, 5% of all values lie to its left and 95% to its right. Combining the two, the interval $z \in [-1.64, 1.64]$ covers $1 - 0.05 - 0.05 = 0.90$, i.e. 90% of all values.
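The 90% figure can be checked numerically. The sketch below (re-importing scipy.stats so it runs on its own) subtracts the cumulative distribution function at the two cutoffs and compares the cutoffs with norm.interval(), which returns the endpoints of the central interval containing the given probability. The variable names are our own.

```python
import scipy.stats as stats

tail_area = 0.05  # area in each tail
z_upper = stats.norm.ppf(1 - tail_area)  # z_0.05, approximately 1.64
z_lower = stats.norm.ppf(tail_area)  # z_0.95 = -z_0.05, approximately -1.64

# Area between the two cutoffs: P(z_lower <= Z <= z_upper)
central_area = stats.norm.cdf(z_upper) - stats.norm.cdf(z_lower)

# Endpoints of the central interval that holds 90% of the probability
lo, hi = stats.norm.interval(0.90)

print(f"[{z_lower:.2f}, {z_upper:.2f}] covers {central_area:.2f}")
print(f"norm.interval(0.90) -> ({lo:.4f}, {hi:.4f})")
```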
mu = 0  # mean of the standard normal distribution
sigma = 1  # standard deviation of the standard normal distribution
tail_area = 0.05  # area in each tail
z_upper = stats.norm.ppf(1 - tail_area, mu, sigma)  # approximately 1.64
z_lower = stats.norm.ppf(tail_area, mu, sigma)  # approximately -1.64

x = np.arange(-4, 4.01, 0.01)
yy = stats.norm.pdf(x, mu, sigma)

plt.figure(figsize=(10, 5))
plt.plot(x, yy, color="black")
# Shade the upper tail x >= 1.64 (area 0.05)
plt.fill_between(
    x=x,
    y1=yy,
    where=(x >= z_upper),
    color="red",
    edgecolor="black",
    alpha=0.75,
)
# Shade the lower tail x <= -1.64 (area 0.05)
plt.fill_between(
    x=x,
    y1=yy,
    where=(x <= z_lower),
    color="red",
    edgecolor="black",
    alpha=0.75,
)
plt.text(2.5, 0.15, "area = 0.05", fontsize=16)
plt.text(-1, 0.13, "area = 0.9 = 90%", fontsize=16)
plt.text(-4, 0.15, "area = 0.05", fontsize=16)
plt.text(1.9, 0.07, "1.64", fontsize=16)
plt.text(-2.5, 0.07, "-1.64", fontsize=16)
# Arrows from the area labels to the shaded tails
plt.arrow(
    2.7,
    0.13,
    -0.5,
    -0.12,
    length_includes_head=True,
    head_width=0.02,
    head_length=0.1,
    color="black",
)
plt.arrow(
    -2.7,
    0.13,
    0.5,
    -0.12,
    length_includes_head=True,
    head_width=0.02,
    head_length=0.1,
    color="black",
)
plt.xlabel("z score")
plt.yticks([])
plt.show()
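The lower-/upper-tail rule for norm.ppf() described above can also be wrapped in a small function. z_from_area() below is a hypothetical helper written for illustration, not part of scipy: it converts an upper-tail area with 1 - area before calling norm.ppf(). The loop compares the result with norm.isf(), scipy's inverse survival function, which accepts the upper-tail area directly. The last line uses arbitrary example values (mean 10, standard deviation 2) to show how loc and scale shift and stretch the cutoff.

```python
import scipy.stats as stats


def z_from_area(area, tail="upper", loc=0.0, scale=1.0):
    """Hypothetical helper: cutoff whose lower or upper tail has the given area.

    norm.ppf() always expects the area to the LEFT of the cutoff,
    so an upper-tail area is converted with 1 - area.
    """
    if not 0 < area < 1:
        raise ValueError("area must lie strictly between 0 and 1")
    if tail == "lower":
        return stats.norm.ppf(area, loc=loc, scale=scale)
    if tail == "upper":
        return stats.norm.ppf(1 - area, loc=loc, scale=scale)
    raise ValueError("tail must be 'lower' or 'upper'")


# z_alpha for a few upper-tail areas, next to norm.isf() for comparison
for a in (0.10, 0.05, 0.025, 0.01):
    print(a, round(z_from_area(a), 3), round(stats.norm.isf(a), 3))

# For a normal distribution with mean mu and standard deviation sigma,
# the cutoff equals mu + sigma * z (mu = 10, sigma = 2 are example values)
print(z_from_area(0.05, loc=10, scale=2), 10 + 2 * z_from_area(0.05))
```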
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.