When \(n\) is large and \(p\) is very small, the Poisson distribution can be used to approximate the binomial distribution. Recall the binomial probability distribution:
\[ P(X = x) = {n \choose x}p^x(1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n\]
In the “100-year flood” example from the previous section, \(n\) is a large number (100) and \(p\) is a small number (0.01). Plugging these values into the equation above for \(P(X = 1)\) yields
\[\begin{align} \ P(X = 1) & = {100 \choose 1}\times 0.01^1\times (1 - 0.01)^{100 -1} \\ & = 100\times 0.01\times 0.3697296 \\ & = 0.3697296 \end{align}\]
The result is very close to the result obtained using the Poisson distribution dpois(x = 1, lambda = 1) \(= 0.3678794\). The appropriate Poisson distribution is the one whose mean is the same as that of the binomial distribution; that is \(\lambda = np\), which in our example is \(\lambda = 100 \times 0.01 = 1\).
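The comparison can also be checked numerically. The following is a minimal sketch using the base R functions dbinom() and dpois(); the range of \(x\) values and the name of the data frame (comparison) are choices made here for illustration only.

```r
# parameters of the "100-year flood" example
n <- 100
p <- 0.01
lambda <- n * p  # the Poisson mean is matched to the binomial mean

# compare both probability mass functions for a few values of x
x <- 0:5  # illustrative range, chosen here
binom_p <- dbinom(x, size = n, prob = p)
pois_p <- dpois(x, lambda = lambda)

comparison <- data.frame(
  x = x,
  binomial = binom_p,
  poisson = pois_p,
  abs_diff = abs(binom_p - pois_p)
)
print(round(comparison, 7))
```

The row for \(x = 1\) contains the two values discussed above, so the size of the approximation error can be read off directly for each \(x\).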
Exercise: In a certain region within the Sahel zone, with a population of roughly 2.5 million people, an average of 500 cases of tuberculosis per year has been reported in a long-term record. We assume that the occurrence frequency can be described as a Poisson process. Last year the responsible public health office reported 531 cases. Is the probability of this many cases or more occurring small enough to suspect an emerging epidemic?
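### your code here
lambda <- 500 # long-term mean number of cases per year
q <- 530 # P(X > 530) is the same as P(X >= 531)
# upper-tail probability of the Poisson distribution
prob <- ppois(q = q, lambda = lambda, lower.tail = FALSE)
paste("The probability to record 531 or more cases of tuberculosis per year is ",
  round(prob, digits = 4) * 100, " %.",
  sep = ""
)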
## [1] "The probability to record 531 or more cases of tuberculosis per year is 8.73 %."
## [1] "If we assume a significance level of 5 % an emerging epidemic cannot be confirmed."
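The second output line has no matching code in the listing above. One way such a decision could be scripted is sketched below, assuming a significance level of 5 % as stated in the output. The variable names alpha and decision, and the wording of the message for the significant case, are hypothetical.

```r
alpha <- 0.05  # assumed significance level (5 %)

# probability of observing 531 or more cases when the mean is 500
prob <- ppois(q = 530, lambda = 500, lower.tail = FALSE)

# compare the tail probability with the significance level
if (prob < alpha) {
  # hypothetical wording, not taken from the original page
  decision <- "If we assume a significance level of 5 % an emerging epidemic is suspected."
} else {
  decision <- "If we assume a significance level of 5 % an emerging epidemic cannot be confirmed."
}
print(decision)
```

Because the tail probability is larger than the assumed 5 % level, the second branch is taken, in agreement with the output shown above.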
To conclude this section and to give you some intuition for the shapes of different Poisson probability distributions, three Poisson probability distributions and their corresponding cumulative Poisson probability distributions for \(\lambda = 2.5\), \(\lambda = 7\) and \(\lambda = 12\) are given below.
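Plots of this kind can be reproduced with base R graphics. The following is a minimal sketch, not necessarily the code used for the original figures; the plotting range 0 to 25 and the panel layout are assumptions made here.

```r
# Poisson parameters taken from the text
lambdas <- c(2.5, 7, 12)
x <- 0:25  # plotting range, an arbitrary choice

# top row: probability mass functions, bottom row: cumulative distributions
op <- par(mfrow = c(2, length(lambdas)))
for (l in lambdas) {
  plot(x, dpois(x, lambda = l),
    type = "h", lwd = 2,
    main = paste("PMF, lambda =", l), xlab = "x", ylab = "P(X = x)"
  )
}
for (l in lambdas) {
  plot(x, ppois(x, lambda = l),
    type = "s",
    main = paste("CDF, lambda =", l), xlab = "x", ylab = "P(X <= x)"
  )
}
par(op)  # restore the previous graphics settings
```

As \(\lambda\) grows, the probability mass moves to the right and the distribution becomes wider and more symmetric.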
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.