An important application

OK, enough of gambling, advertising and exam preparation. The hypergeometric distribution also has a serious application: it can help to decide whether an unusual distribution of gender or minority groups, for example among the applicants for a job, is still likely to occur by chance or whether something is influencing their chance to participate.

So let us look at the R help page for the hypergeometric distribution (?dhyper). There we find the following explanation of the arguments:

x, q: vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.

m: the number of white balls in the urn.

n: the number of black balls in the urn.

k: the number of balls drawn from the urn.
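To make the roles of the arguments concrete, here is a minimal sketch that recomputes the probabilities from the counting formula of the urn model and compares them with dhyper. The values of m, n and k are chosen only for illustration, and the variable names p_formula and p_dhyper are not part of the original material.

```r
# Urn with m white and n black balls; k balls are drawn without replacement.
# The values below are illustrative assumptions.
m <- 10
n <- 10
k <- 10
x <- 0:k  # possible numbers of white balls among the k drawn

# Counting formula: ways to pick x white and k - x black balls,
# divided by all ways to pick k balls from the urn
p_formula <- choose(m, x) * choose(n, k - x) / choose(m + n, k)

# The same probabilities from R's built-in function
p_dhyper <- dhyper(x = x, m = m, n = n, k = k)

all.equal(p_formula, p_dhyper)
```

Both vectors should agree up to floating-point precision, and the probabilities sum to one.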

How can we use this description for a meaningful application?

Imagine we have a job offer for an executive position, and the pool of qualified people (the urn) consists of 10 women (m, white balls) and 10 men (n, black balls). In total, k = 10 people of both genders finally apply for the job. How likely is it that x = 0, 1, …, 10 women are among the applicants? Let us plot the probabilities for this experiment:

barplot(height = dhyper(0:10,10,10,10),names.arg = 0:10,xlab = "Number of female applicants",ylab = "Probability")

Nice: with a balanced pool of candidates, an even split is the most likely outcome. Thus we would expect five women among the ten applicants.
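The reading of the plot can also be checked numerically. The following sketch computes the expected number of women and the most likely outcome for the balanced pool; the names expected and mode_x are introduced here only for illustration.

```r
# Balanced pool: 10 women (m), 10 men (n), 10 applicants (k)
x <- 0:10
p <- dhyper(x = x, m = 10, n = 10, k = 10)

# Expected number of women: probability-weighted mean of x,
# which for the hypergeometric distribution equals k * m / (m + n)
expected <- sum(x * p)

# Most likely number of women (position of the highest bar)
mode_x <- x[which.max(p)]

expected
mode_x
```

For this symmetric case both values should be 5.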

But what happens if the “pool” of qualified men and women is not balanced?

Let us assume that four times as many women as men are qualified (40 women and 10 men) and that 20 people apply:

barplot(height = dhyper(10:20,40,10,20),names.arg = 10:20,xlab = "Number of female applicants",ylab = "Probability")
# reference line at a probability of 0.025
abline(h = 0.025)

If we again apply the 95% rule for the range of usual outcomes, up to 19 women among the 20 applicants still appear likely. Of course, the expected number of 16 women corresponds to the 4/5 ratio of qualified women. Even 13 women do not indicate an unusual proportion. Thus, the range of outcomes we would expect by chance at the 95% level runs from 13 to 19 women.
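As a cross-check of the range read from the plot, the quantile function qhyper can be used. This is a sketch of one possible way to state the range, not necessarily the procedure intended by the authors. It cuts off 2.5% in each tail, based on cumulative probabilities rather than on the heights of individual bars. The names lower, upper and coverage are illustrative.

```r
# Unbalanced pool: 40 women (m), 10 men (n), 20 applicants (k)
m <- 40
n <- 10
k <- 20

# Central range that leaves at most 2.5% in each tail
lower <- qhyper(0.025, m = m, n = n, k = k)
upper <- qhyper(0.975, m = m, n = n, k = k)

# Probability that the number of women falls inside this range
coverage <- sum(dhyper(lower:upper, m = m, n = n, k = k))

c(lower = lower, upper = upper)
coverage
```

For these values the quantiles should give 13 and 19, in agreement with the plot. Because the distribution is discrete, the probability covered by this range is somewhat larger than 95%.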

Exercise: Analyse the probability distribution for an unbalanced gender ratio of 3 women per man within a qualified group of 40 people and of 400 people, over a suitable range of x. Compare the results with the binomial distribution using prob = m/(m+n) and size = k.

### your code here
# Qualified group of 40: 30 women (m) and 10 men (n), k = 10 applicants
x <- 0:10
m <- 30
n <- 10
k <- 10

barplot(height = dhyper(x = x, m = m, n = n, k = k),
        names.arg = x, xlab = "Number of female applicants", ylab = "Probability",
        col = '#00CCCC')
# Qualified group of 400: 300 women (m) and 100 men (n), k = 10 applicants
x <- 0:10
m <- 300
n <- 100
k <- 10

barplot(height = dhyper(x = x, m = m, n = n, k = k),
        names.arg = x, xlab = "Number of female applicants", ylab = "Probability",
        col = '#00CCCC')

Exercise: Compare the results with the binomial distribution using prob = m/(m+n) and size = k.

### your code here
# Binomial distribution: prob = m/(m+n) is 3/4 for both group sizes
x <- 0:10
m <- 300
n <- 100
k <- 10
prob <- m/(m+n)
size <- k

barplot(height = dbinom(x = x, size = size, prob = prob),
        names.arg = x, xlab = "Number of female applicants", ylab = "Probability",
        col = '#00CCCC')
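Besides comparing the bar plots by eye, the difference between the distributions can be put into numbers. The sketch below computes the largest absolute difference between the hypergeometric and the binomial probabilities for both group sizes. The variable names are chosen for this illustration only.

```r
x <- 0:10
k <- 10

# Hypergeometric probabilities for the two qualified groups (3 women per man)
p_hyper_40  <- dhyper(x = x, m = 30,  n = 10,  k = k)
p_hyper_400 <- dhyper(x = x, m = 300, n = 100, k = k)

# Binomial probabilities with the same share of women, prob = 3/4
p_binom <- dbinom(x = x, size = k, prob = 0.75)

# Largest absolute difference to the binomial distribution
max_diff_40  <- max(abs(p_hyper_40  - p_binom))
max_diff_400 <- max(abs(p_hyper_400 - p_binom))

c(group_40 = max_diff_40, group_400 = max_diff_400)
```

The difference should be clearly smaller for the group of 400. When the pool is large compared with the number of applicants, drawing without replacement hardly changes the ratio in the pool, so the binomial distribution becomes a good approximation.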

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.