Let us apply the Poisson distribution to an example. We focus on the one-hundred-year flood, a concept that is widely used in river engineering to design flood control measures.
Let us recall the probability mass function of a Poisson random variable:
$$ P(X = x) = e^{-\lambda}\frac{\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots ,$$
where $\lambda$ is a positive real number, which represents the average number of events occurring during a fixed time interval, and $e \approx 2.718$.
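To make the formula concrete, the following minimal sketch evaluates it directly with Python's standard library. The helper name `poisson_pmf` and the choice $\lambda = 1$ are assumptions made for illustration only.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 1.0  # illustrative value, not prescribed by the formula
probs = [poisson_pmf(x, lam) for x in range(5)]

for x, p in enumerate(probs):
    print(f"P(X = {x}) = {p:.4f}")
```

Because $\lambda^x / x!$ shrinks quickly as $x$ grows, the probabilities decay rapidly for larger counts.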
The 100-year flood is a shorthand term for a flood with an annual exceedance probability of 1% and an average recurrence interval of 100 years. The term can be confusing because people, not unreasonably, take it to describe floods that occur once every 100 years. This is not true. An annual exceedance probability of 1% means that, in any given year, a flood of a magnitude corresponding to a 100-year flood occurs with a probability of 0.01.
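The distinction can be illustrated with a short calculation. The sketch below assumes that floods in different years are independent (an assumption that is not stated above) and computes the probability that no flood, or at least one flood, of this magnitude occurs within 100 years. The variable names are illustrative.

```python
p_annual = 0.01  # annual exceedance probability of a 100-year flood
n_years = 100    # length of the period considered

# Assuming independent years: no flood in any of the n_years
p_none = (1 - p_annual) ** n_years
# Complement: at least one flood within n_years
p_at_least_one = 1 - p_none

print(f"P(no flood in {n_years} years)        = {p_none:.3f}")
print(f"P(at least one flood in {n_years} years) = {p_at_least_one:.3f}")
```

Under this assumption the chance of seeing at least one such flood in a century is roughly 63%, which is far from a certainty.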
Putting this in the context of a Poisson distribution, the expected number of such floods during the fixed interval of 100 years is $E(X) = \lambda = 100 \times 0.01 = 1$. The Poisson random variable $X$ is the number of occurrences, and the value of interest depends on our question. We may be interested in the probability that such a flood event will not occur during the 100-year interval, $P(X=0)$, or in the probability that such a flood event will occur exactly once during the 100-year interval, $P(X=1)$, or in the probability that two or more such flood events occur during the 100-year interval, $P(X \ge 2)$. Plugging these values into the equation from above yields
$\lambda = 1, \quad x = 0, 1, 2, \ldots$
$$ P(X = 0) = e^{-1}\frac{1^0}{0!}, \qquad \text{for } x = 0$$
$$ P(X = 1) = e^{-1}\frac{1^1}{1!}, \qquad \text{for } x = 1$$
$$ P(X \ge 2) = \sum_{x=2}^{\infty} e^{-1}\frac{1^x}{x!} = 1 - P(X = 0) - P(X = 1), \qquad \text{for } x = 2, 3, \ldots$$
We turn to Python to do the calculations. We will make use of the poisson functions implemented in the scipy.stats module. Note that poisson comes with different methods, such as pmf() and cdf(). Call help(stats.poisson) for further information.
Note: poisson.pmf(k, mu) takes $\lambda$ as the shape parameter, denoted mu.
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
lambda_pois = 1 # set expected value
k = [0, 1] # set Poisson random variables of interest
stats.poisson.pmf(k[0], mu=lambda_pois)
0.36787944117144233
stats.poisson.pmf(k[1], mu=lambda_pois)
0.36787944117144233
Recall: The cdf function has no additional argument for choosing between the lower and the upper tail probability. By default, Python calculates the left/lower tail probability, $P(X \le x)$. To calculate the upper tail probability, $P(X > x)$, one has to compute 1 - poisson.cdf().
1 - stats.poisson.cdf(k[1], mu=lambda_pois)
0.26424111765711533
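As a possible alternative to subtracting from 1, the distributions in scipy.stats also provide the survival function `sf()`, which returns $P(X > x)$ directly. The following minimal sketch compares both routes; the variable names are illustrative.

```python
from scipy import stats

lambda_pois = 1  # expected number of floods in 100 years

# Upper tail P(X > 1) = P(X >= 2), computed in two equivalent ways
upper_via_cdf = 1 - stats.poisson.cdf(1, mu=lambda_pois)
upper_via_sf = stats.poisson.sf(1, mu=lambda_pois)

print(upper_via_cdf, upper_via_sf)
```

For values this far from zero both routes agree to floating point precision. The survival function mainly helps when the tail probability is very small and the subtraction would lose accuracy.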
The results indicate that the probability that no flood of a magnitude corresponding to a 100-year flood occurs during a period of 100 years, $P(X = 0)$, is approximately 0.368. Interestingly, this is exactly as likely as the occurrence of exactly one such flood, $P(X = 1)$. The probability that two or more such flood events occur during the 100-year interval, $P(X \ge 2)$, is lower, but at approximately 0.264 (26.4%) it is far from negligible.
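As a rough cross-check of these numbers, one could model the 100 years as 100 independent trials with a success probability of 0.01 each, i.e. a binomial distribution, which the Poisson distribution with $\lambda = np$ approximates. The binomial view is an assumption brought in here for comparison only, and the helper names `binom_pmf` and `poisson_pmf` are hypothetical.

```python
import math

n, p = 100, 0.01  # number of years, annual exceedance probability
lam = n * p       # Poisson mean used for the approximation


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


# P(X = 0), P(X = 1), and the remainder P(X >= 2) under each model
binom = [binom_pmf(0, n, p), binom_pmf(1, n, p)]
binom.append(1 - sum(binom))
pois = [poisson_pmf(0, lam), poisson_pmf(1, lam)]
pois.append(1 - sum(pois))

for label, b, q in zip(["P(X = 0)", "P(X = 1)", "P(X >= 2)"], binom, pois):
    print(f"{label}: binomial {b:.4f}, Poisson {q:.4f}")
```

The two models differ only in the third decimal place, which suggests that the Poisson description is a reasonable simplification for this example.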
As a sanity check we sum up the probabilities $P(X=0)$, $P(X=1)$, and $P(X \ge 2)$, which should yield 1:
(
stats.poisson.pmf(k[0], mu=lambda_pois)
+ stats.poisson.pmf(k[1], mu=lambda_pois)
+ (1 - stats.poisson.cdf(k[1], mu=lambda_pois))
)
1.0
For a better intuition we plot the probabilities of the Poisson random variable for $x = 0, 1, 2, 3, 4$ and $x \ge 5$.
lambda_pois = 1 # set expected value
out = []
k = np.arange(0, 6)
out.append(stats.poisson.pmf(k=k[0], mu=lambda_pois))
out.append(stats.poisson.pmf(k=k[1], mu=lambda_pois))
out.append(stats.poisson.pmf(k=k[2], mu=lambda_pois))
out.append(stats.poisson.pmf(k=k[3], mu=lambda_pois))
out.append(stats.poisson.pmf(k=k[4], mu=lambda_pois))
out.append(1 - stats.poisson.cdf(k=k[4], mu=lambda_pois))  # P(X >= 5) = 1 - P(X <= 4)
# plot
plt.figure(figsize=(9, 5))
plt.bar(k, out, color="lightblue", edgecolor="black")
plt.xlabel("Number of floods (x)")
plt.ylabel("Probability $P(X = x)$")
plt.title(
'Probability of the occurrence of "100 year floods" \nduring a period of 100 years'
)
plt.xticks(k, (["0", "1", "2", "3", "4", ">= 5"]))
plt.show()
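The six bar heights could also be computed in a vectorised way, since pmf() accepts arrays. The following is a minimal sketch, not the original notebook's code: it evaluates $P(X = x)$ for $x = 0, \ldots, 4$ in one call and uses the survival function sf() for the remaining tail $P(X \ge 5) = P(X > 4)$. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

lambda_pois = 1  # expected number of floods in 100 years

k = np.arange(0, 5)                           # x = 0, 1, 2, 3, 4
probs = stats.poisson.pmf(k, mu=lambda_pois)  # P(X = x) for each x in k
tail = stats.poisson.sf(4, mu=lambda_pois)    # P(X > 4) = P(X >= 5)

out = np.append(probs, tail)  # six values, one per bar

print(out)
print(out.sum())
```

Because the six categories cover all possible outcomes, the values in `out` sum to 1 up to floating point error.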
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.