For situations in which $n$ is large and $p$ is very small, the Poisson distribution can be used to approximate the binomial distribution. Recall the binomial probability distribution:
$$ P(X = x) = {n \choose x}p^x(1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n \, .$$
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import math
import scipy.stats as stats
import random
import pandas as pd
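pp = 0.01  # probability of the event in a single trial (p)
pn = 100  # number of trials (n)
px = 1  # number of occurrences of interest (x)
stats.poisson.pmf(k=px, mu=pn * pp)  # lambda = n * p = 1, i.e. pmf(k=1, mu=1)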
0.36787944117144233
As in the "100 year flood" example above, $n$ is a large number (100) and $p$ is a small number (0.01). Plugging $n = 100$, $p = 0.01$ and $x = 1$ into the binomial equation above yields
\begin{align} P(X = 1) & = {100 \choose 1}\times 0.01^1\times (1 - 0.01)^{100 -1} \\ & = 100 \times 0.01 \times 0.3697296 \\ & = 0.3697296 \end{align}
The result is very close to the value obtained above with stats.poisson.pmf(k=1, mu=1) $= 0.36787944117144233$. The appropriate Poisson distribution is the one whose mean is the same as that of the binomial distribution; that is, $\lambda = np$, which in our example is $\lambda = 100 \times 0.01 = 1$.
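To see how the quality of the approximation depends on $n$, the following minimal sketch keeps $\lambda = np = 1$ fixed and compares the binomial and Poisson probabilities for increasing $n$. The helper name max_pmf_gap and the range $x = 0, \ldots, 10$ are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy import stats

lam = 1.0  # fixed mean, lambda = n * p
k = np.arange(0, 11)  # values of x to compare (illustrative range)


def max_pmf_gap(n, lam=lam):
    """Largest absolute difference between the Binomial(n, lam / n)
    and Poisson(lam) probabilities over the values in k."""
    p = lam / n
    binom_pmf = stats.binom.pmf(k, n, p)
    pois_pmf = stats.poisson.pmf(k, lam)
    return float(np.max(np.abs(binom_pmf - pois_pmf)))


# The gap should shrink as n grows while p = lam / n shrinks
gaps = {n: max_pmf_gap(n) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n = {n:>5}, p = {lam / n:.4f}, max |binomial - Poisson| = {gap:.6f}")
```

For $n = 100$ and $p = 0.01$ the largest difference is of the same order as the difference between 0.3697296 and 0.3678794 computed above.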
To conclude this section, and to give you some intuition for the shapes of different Poisson probability distributions, three Poisson probability distributions and their corresponding cumulative distributions, for $\lambda = 2.5$, $\lambda = 7$ and $\lambda = 12$, are shown below.
lambda_pois = 2.5  # mean number of occurrences (rate parameter lambda)
n = 100000  # number of random samples
random_poisson_numbers = pd.Series(stats.poisson.rvs(lambda_pois, size=n))
fig, ax = plt.subplots(figsize=(10, 5))
plt.title(r"$\lambda$ = 2.5", fontsize=16)
ax.hist(
    random_poisson_numbers,
    # one unit-width bin centred on each integer, so bar heights are probabilities
    bins=np.arange(random_poisson_numbers.max() + 2) - 0.5,
    color="white",
    edgecolor="black",
    density=True,
)
ax.set_xlabel("Number of occurrences (x)", fontsize=14)
ax.set_ylabel("Probability P(X = x)", fontsize=14)
# twin object for two different y-axes on the same plot
ax2 = ax.twinx()
# plot the cumulative distribution against the second y-axis
ax2.plot(
    np.arange(0, 31),
    stats.poisson.cdf(k=np.arange(0, 31), mu=lambda_pois),
    linewidth=3,
    color="black",
)
ax2.set_ylabel("Cumulative probability", fontsize=14)
plt.show()
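The histogram above is built from random samples, so its bar heights only approximate the true probabilities. As a hedged check, the sketch below compares the relative frequencies of such a sample with the exact values from stats.poisson.pmf. The seed (random_state=42) and the variable names are assumptions made for reproducibility and illustration.

```python
import numpy as np
from scipy import stats

lam = 2.5  # mean of the Poisson distribution
n_samples = 100_000  # number of random samples

# Draw a reproducible sample (the seed is an arbitrary choice)
samples = stats.poisson.rvs(lam, size=n_samples, random_state=42)

# Relative frequency of each observed count 0, 1, 2, ...
counts = np.bincount(samples)
empirical = counts / n_samples

# Exact probabilities for the same counts
x = np.arange(len(counts))
exact = stats.poisson.pmf(x, lam)

max_dev = float(np.max(np.abs(empirical - exact)))
print(f"largest deviation between sample and exact pmf: {max_dev:.5f}")
```

With 100,000 samples the deviation is small, which is why the sampled histogram is a reasonable stand-in for the exact distribution in these plots.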
lambda_pois = 7  # mean number of occurrences (rate parameter lambda)
n = 100000  # number of random samples
random_poisson_numbers = pd.Series(stats.poisson.rvs(lambda_pois, size=n))
fig, ax = plt.subplots(figsize=(10, 5))
plt.title(r"$\lambda$ = 7", fontsize=16)
ax.hist(
    random_poisson_numbers,
    # one unit-width bin centred on each integer, so bar heights are probabilities
    bins=np.arange(random_poisson_numbers.max() + 2) - 0.5,
    color="white",
    edgecolor="black",
    density=True,
)
ax.set_xlabel("Number of occurrences (x)", fontsize=14)
ax.set_ylabel("Probability P(X = x)", fontsize=14)
# twin object for two different y-axes on the same plot
ax2 = ax.twinx()
# plot the cumulative distribution against the second y-axis
ax2.plot(
    np.arange(0, 31),
    stats.poisson.cdf(k=np.arange(0, 31), mu=lambda_pois),
    linewidth=3,
    color="black",
)
ax2.set_ylabel("Cumulative probability", fontsize=14)
plt.show()
lambda_pois = 12  # mean number of occurrences (rate parameter lambda)
n = 100000  # number of random samples
random_poisson_numbers = pd.Series(stats.poisson.rvs(lambda_pois, size=n))
fig, ax = plt.subplots(figsize=(10, 5))
plt.title(r"$\lambda$ = 12", fontsize=16)
ax.hist(
    random_poisson_numbers,
    # one unit-width bin centred on each integer, so bar heights are probabilities
    bins=np.arange(random_poisson_numbers.max() + 2) - 0.5,
    color="white",
    edgecolor="black",
    density=True,
)
ax.set_xlabel("Number of occurrences (x)", fontsize=14)
ax.set_ylabel("Probability P(X = x)", fontsize=14)
# twin object for two different y-axes on the same plot
ax2 = ax.twinx()
# plot the cumulative distribution against the second y-axis
ax2.plot(
    np.arange(0, 31),
    stats.poisson.cdf(k=np.arange(0, 31), mu=lambda_pois),
    linewidth=3,
    color="black",
)
ax2.set_ylabel("Cumulative probability", fontsize=14)
plt.show()
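The change in shape across the three plots can also be summarized numerically. The sketch below, which is an illustrative addition rather than part of the original material, uses stats.poisson.stats to obtain the mean, variance and skewness for each $\lambda$. The helper name poisson_shape is hypothetical.

```python
from scipy import stats


def poisson_shape(lam):
    """Return mean, variance and skewness of a Poisson(lam) distribution."""
    mean, var, skew = stats.poisson.stats(lam, moments="mvs")
    return {"mean": float(mean), "variance": float(var), "skewness": float(skew)}


# Same lambda values as in the three plots
shapes = {lam: poisson_shape(lam) for lam in (2.5, 7, 12)}
for lam, s in shapes.items():
    print(
        f"lambda = {lam:>4}: mean = {s['mean']:.2f}, "
        f"variance = {s['variance']:.2f}, skewness = {s['skewness']:.3f}"
    )
```

Mean and variance both equal $\lambda$, while the skewness $1/\sqrt{\lambda}$ decreases as $\lambda$ grows, which matches the increasingly symmetric histograms.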
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.