We often want to determine the proportion (percentage) of members of a finite population that have a specified attribute. Generally, the population under consideration is too large for the population proportion to be found by taking a census. Suppose that a simple random sample of size $n$ is taken from a population in which the proportion of members that have a specified attribute is $p$. Then a random variable of primary importance in the estimation of $p$ is the number of members sampled that have the specified attribute, which we denote $X$. The exact probability distribution of $X$ depends on whether the sampling is done with or without replacement. If sampling is done with replacement, the sampling process constitutes Bernoulli trials: each selection of a member from the population corresponds to a trial. A success occurs on a trial if the member selected in that trial has the specified attribute; otherwise, a failure occurs. The trials are independent because the sampling is done with replacement. The success probability remains the same from trial to trial because it always equals the proportion of the population that has the specified attribute. Therefore, the random variable $X$ has the binomial distribution with parameters $n$ (the sample size) and $p$ (the population proportion) (Weiss 2010).
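To make the with-replacement case concrete, the following minimal sketch simulates repeated sampling with replacement and compares the observed relative frequencies of $X$ with the binomial probabilities. The population (1000 members, 300 of which have the attribute), the sample size, the seed and the helper name `binomial_pmf` are all assumptions chosen for illustration; they do not come from the text above.

```python
import random
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical population: 1 = has the attribute, 0 = does not (p = 0.3)
population = [1] * 300 + [0] * 700
p = sum(population) / len(population)
n = 10  # sample size

rng = random.Random(42)  # fixed seed so the run is reproducible
n_repetitions = 20_000
counts = [0] * (n + 1)

for _ in range(n_repetitions):
    # random.choices draws WITH replacement, so the trials are independent
    sample = rng.choices(population, k=n)
    counts[sum(sample)] += 1

# Compare simulated relative frequencies with the binomial probabilities
for k in range(n + 1):
    simulated = counts[k] / n_repetitions
    print(f"k={k:2d}  simulated={simulated:.4f}  binomial={binomial_pmf(k, n, p):.4f}")
```

With enough repetitions, the two columns should agree closely, which is what the Bernoulli-trial argument predicts.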
In reality, however, sampling is ordinarily done without replacement. Under these circumstances, the sampling process does not constitute Bernoulli trials because the trials are not independent and the success probability varies from trial to trial. In other words, the random variable $X$ does not have a binomial distribution. Its distribution is referred to as a hypergeometric distribution (Weiss 2010).
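As a rough illustration of the without-replacement case, the sketch below computes hypergeometric probabilities directly from binomial coefficients and shows how the success probability on the second draw depends on the outcome of the first. The function name `hypergeom_pmf`, the symbols `N` (population size) and `K` (number of members with the attribute), and the small example population are assumptions made for this example.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Return P(X = k) when n members are drawn without replacement
    from a population of size N in which K members have the attribute."""
    # Outside this range the event is impossible
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Hypothetical population: N = 20 members, K = 6 with the attribute
N, K, n = 20, 6, 5

# The success probability changes from draw to draw
p_first = K / N                          # first draw
p_second_after_success = (K - 1) / (N - 1)  # second draw, first was a success
p_second_after_failure = K / (N - 1)        # second draw, first was a failure
print(p_first, p_second_after_success, p_second_after_failure)

# Full distribution of X for a sample of size n
for k in range(n + 1):
    print(f"P(X = {k}) = {hypergeom_pmf(k, N, K, n):.4f}")
```

Because the three draw probabilities differ, the draws are neither independent nor identically conditioned, which is why the binomial model does not apply exactly here.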
In practice, however, a hypergeometric distribution can usually be approximated by a binomial distribution. The reason is that, if the sample size does not exceed 5% of the population size, there is little difference between sampling with and without replacement: removing a few sampled members barely changes the proportion of the remaining population that has the specified attribute (Weiss 2010).
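One way to get a feeling for the 5% rule is to compare the two distributions numerically for different sampling fractions. The sketch below uses the largest absolute difference between the hypergeometric and binomial probabilities as a simple, illustrative measure of disagreement. The population size, the proportion, the sample sizes and the helper names are assumptions for this example, and the chosen measure is only one of several reasonable options.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """Return P(X = k) for n draws without replacement from a population
    of size N in which K members have the attribute."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def max_abs_difference(N: int, K: int, n: int) -> float:
    """Largest absolute gap between the two probability functions."""
    p = K / N
    return max(
        abs(hypergeom_pmf(k, N, K, n) - binomial_pmf(k, n, p))
        for k in range(n + 1)
    )


# Hypothetical population: N = 1000 members, K = 300 with the attribute
N, K = 1000, 300

differences = {}
for n in (10, 50, 500):  # 1%, 5% and 50% of the population
    differences[n] = max_abs_difference(N, K, n)
    print(f"n={n:3d} ({n / N:.0%} of population): max difference = {differences[n]:.5f}")
```

In this setup, the disagreement for samples of at most 5% of the population should come out clearly smaller than for the 50% sample, in line with the rule of thumb.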
Suppose that a simple random sample of size $n$ is taken from a finite population in which the proportion of members that have a specified attribute is $p$. Then the number of members sampled that have the specified attribute
- has exactly a binomial distribution with parameters $n$ and $p$ if the sampling is done with replacement, and
- has approximately a binomial distribution with parameters $n$ and $p$ if the sampling is done without replacement and the sample size does not exceed 5% of the population size (Weiss 2010).
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.