The shape of the distribution of a continuous random variable can be visualized with a smooth curve. Such curves are called probability density functions (PDF), or simply density functions. Probability density functions have three main properties (Mann 2012; Weiss 2010):
The total area under the curve, obtained by integrating the density function $f(x)$ over all values of $x$ from $-\infty$ to $+\infty$, equals 1:

$$\int_{-\infty}^{+\infty} f(x)\,dx = 1$$
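```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Gamma distribution with shape 7 and rate 1
# (SciPy's third positional argument is loc, so the rate is passed as scale = 1 / rate)
shape = 7
rate = 1
x = np.arange(0, 30, 0.1)
y = stats.gamma.pdf(x, a=shape, scale=1 / rate)

plt.figure(figsize=(10, 5))
plt.plot(x, y, color="black")
# Shade the entire area under the density curve
plt.fill_between(x=x, y1=y, color="red", alpha=0.9)
plt.axhline(0, color="black")
plt.axis("off")
plt.text(14, 0.12, "Colored area is 1.0\nor 100%", fontsize=14)
# Raw string so that the LaTeX backslashes are not read as escape sequences
plt.text(14, 0.07, r"$\int_{-\infty}^{+\infty} f(x)dx = 1$", fontsize=14)
plt.show()
```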
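The total-area property can also be checked numerically. The sketch below assumes the same gamma density as in the figure (shape 7, rate 1); SciPy parameterizes the gamma distribution by a shape `a` and a `scale`, so the rate enters as `scale = 1 / rate`. It integrates the density with `scipy.integrate.quad` over $[0, \infty)$, which covers the whole real line because the gamma density is zero for negative $x$.

```python
import numpy as np
from scipy import integrate, stats

# Gamma density with shape 7 and rate 1 (SciPy uses scale = 1 / rate)
shape = 7
rate = 1
dist = stats.gamma(a=shape, scale=1 / rate)

# The gamma density is zero for x < 0, so integrating from 0 to infinity
# is equivalent to integrating from -infinity to +infinity
total_area, abs_error = integrate.quad(dist.pdf, 0, np.inf)

print(f"Total area under the density: {total_area:.6f}")
print(f"Estimated absolute integration error: {abs_error:.1e}")
```

The result agrees with 1 up to the small numerical error reported by `quad`.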
The probability that a continuous random variable $x$ takes a value within a certain interval is given by the area under the curve between the two limits of the interval. The colored area under the curve for the interval $(-\infty, a]$ (left panel) and for the interval $[a,\infty)$ (right panel) is shown in the figure below.
The probability that $x$ falls in the interval $(-\infty, a]$ is

$$P(X \le a) = \int_{-\infty}^{a} f(x)\,dx$$

and the probability that $x$ falls in the interval $[a, \infty)$ is

$$P(X \ge a) = 1 - P(X \le a) = \int_{a}^{\infty} f(x)\,dx$$

```python
# Same gamma distribution as above; cut_a is the cutoff a shown in both panels
shape = 7
rate = 1
cut_a = 4.5
x = np.arange(0, 30, 0.1)
y = stats.gamma.pdf(x, a=shape, scale=1 / rate)

fig, ax = plt.subplots(1, 2, figsize=(15, 5))

## plot 1: P(X <= a), the area to the left of the cutoff
ax[0].plot(x, y, color="black")
ax[0].fill_between(x=x, y1=y, where=(x <= cut_a), color="red", alpha=0.9)
ax[0].axhline(0, color="black")
ax[0].axis("off")
ax[0].text(14, 0.12, "Colored area is \nbetween 0 and 1", fontsize=14)
ax[0].text(14, 0.07, r"$P(X\leq a) = \int_{-\infty}^{a}f(x)dx$", fontsize=14)

## plot 2: P(X >= a), the area to the right of the cutoff
ax[1].plot(x, y, color="black")
ax[1].fill_between(x=x, y1=y, where=(x >= cut_a), color="red", alpha=0.9)
ax[1].axhline(0, color="black")
ax[1].axis("off")
ax[1].text(14, 0.12, "Colored area is \nbetween 0 and 1", fontsize=14)
ax[1].text(14, 0.07, r"$P(X\geq a) = \int_{a}^{\infty}f(x)dx$", fontsize=14)

plt.show()
```
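The two shaded areas can also be computed directly. As a sketch, assuming the gamma distribution with shape 7 and rate 1 and the cutoff $a = 4.5$ from the two-panel figure, SciPy's `cdf` method returns $P(X \le a)$ and its `sf` (survival function) method returns $P(X \ge a) = 1 - P(X \le a)$. Integrating the density with `quad` over the same limits gives the same values.

```python
import numpy as np
from scipy import integrate, stats

shape = 7
rate = 1
cut_a = 4.5  # the cutoff a from the two-panel figure
dist = stats.gamma(a=shape, scale=1 / rate)

# P(X <= a): area to the left of a (the density is zero below 0)
p_left = dist.cdf(cut_a)
p_left_quad, _ = integrate.quad(dist.pdf, 0, cut_a)

# P(X >= a) = 1 - P(X <= a): area to the right of a
p_right = dist.sf(cut_a)
p_right_quad, _ = integrate.quad(dist.pdf, cut_a, np.inf)

print(f"P(X <= a) = {p_left:.4f} (quad: {p_left_quad:.4f})")
print(f"P(X >= a) = {p_right:.4f} (quad: {p_right_quad:.4f})")
print(f"Sum of both areas: {p_left + p_right:.4f}")
```

Both areas lie between 0 and 1, and together they make up the total area of 1.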
The same principle applies to a bounded interval: the colored area under the curve from $a$ to $b$ in the figure below gives the probability that $x$ falls in the interval $[a,b]$,

$$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$$
```python
# Same gamma distribution as above; cut_a and cut_b are the limits a and b
shape = 7
rate = 1
cut_a = 3.7
cut_b = 11
x = np.arange(0, 30, 0.1)
y = stats.gamma.pdf(x, a=shape, scale=1 / rate)

plt.figure(figsize=(10, 5))
plt.plot(x, y, color="black")
# Shade only the area between the two limits a and b
plt.fill_between(
    x=x,
    y1=y,
    where=(x >= cut_a) & (x <= cut_b),
    color="red",
    alpha=0.9,
)
plt.axhline(0, color="black")
plt.axis("off")
plt.text(
    14,
    0.12,
    "Colored area gives\n the probability\n $P(a \\leq x \\leq b)$",
    fontsize=14,
)
plt.text(14, 0.07, r"$P(a \leq x \leq b) = \int_a^b f(x)dx$", fontsize=14)
# Label the limits just below the horizontal axis
plt.text(cut_a, -0.01, "a", fontsize=14)
plt.text(cut_b, -0.01, "b", fontsize=14)
plt.show()
```
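Numerically, the shaded area equals the difference of the cumulative distribution function at the two limits, $P(a \le x \le b) = F(b) - F(a)$, where $F(t) = P(x \le t)$. A minimal sketch, assuming the same gamma distribution (shape 7, rate 1) and the limits $a = 3.7$ and $b = 11$ from the figure:

```python
from scipy import integrate, stats

shape = 7
rate = 1
cut_a, cut_b = 3.7, 11  # the limits a and b from the figure
dist = stats.gamma(a=shape, scale=1 / rate)

# Area between a and b as a difference of CDF values: F(b) - F(a)
p_interval = dist.cdf(cut_b) - dist.cdf(cut_a)

# The same area obtained by numerically integrating the density from a to b
p_interval_quad, _ = integrate.quad(dist.pdf, cut_a, cut_b)

print(f"P(a <= x <= b) = {p_interval:.4f} (quad: {p_interval_quad:.4f})")
```

Using the CDF avoids numerical integration altogether, which is why statistical software usually reports interval probabilities this way.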
Note that the inequality $a \le x \le b$ means that $x$ is greater than or equal to $a$ and less than or equal to $b$.
For a continuous probability distribution, probabilities are always calculated for intervals. The probability that a continuous random variable $x$ assumes any single value is zero. Intuitively, a single value is only one out of uncountably many possible values in $\mathbb{R}$. Geometrically, the region above a single point is a vertical line segment, which has no width and therefore zero area:

$$P(x = a) = \int_{a}^{a} f(x)\,dx = 0 \, .$$

From this we can deduce that for a continuous random variable

$$P(a \le x \le b) = P(a < x < b) \, .$$

In other words, the probability that $x$ assumes a value in the interval from $a$ to $b$ is the same whether or not the endpoints $a$ and $b$ are included in the interval.
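One way to see why a single value has probability zero is to shrink an interval around a point and watch its probability vanish. The sketch below assumes the gamma distribution from the figures (shape 7, rate 1) and an arbitrarily chosen point $a = 6$ (a hypothetical value for illustration). For a small half-width $h$, the probability of $[a - h, a + h]$ is roughly $2h\,f(a)$, which tends to zero as $h \to 0$.

```python
from scipy import stats

shape = 7
rate = 1
point = 6.0  # hypothetical point a, chosen only for illustration
dist = stats.gamma(a=shape, scale=1 / rate)

# Probabilities of intervals [point - h, point + h] shrinking towards the point
widths = [1.0, 0.1, 0.01, 0.001, 0.0]
probs = []
for h in widths:
    p = dist.cdf(point + h) - dist.cdf(point - h)
    probs.append(p)
    print(f"h = {h:<6} P({point - h:.3f} <= x <= {point + h:.3f}) = {p:.6f}")

# For small h the interval probability is close to 2 * h * f(point)
print(f"2 * 0.001 * f(a) = {2 * 0.001 * dist.pdf(point):.6f}")
```

With $h = 0$ the interval collapses to the single point $a$, and the computed probability is exactly zero.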
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.