The main functions for working with the $F$-distribution are f.rvs(), f.pdf(), f.cdf() and f.ppf() from the scipy.stats package. The f.pdf() function gives the density, the f.cdf() function gives the cumulative distribution function, the f.ppf() function gives the quantile function (the percent point function, i.e. the inverse of the cdf), and the f.rvs() function generates random deviates.
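
As a quick orientation, the four functions can be called side by side. The following is a minimal sketch, not part of the original notebook; the variable names (`f_dist`, `x_back`, `sample`) and the seed are illustrative assumptions. It freezes the distribution once with `stats.f(dfn=10, dfd=20)` so that the degrees of freedom do not have to be repeated in every call.

```python
import numpy as np
import scipy.stats as stats

# Freeze the distribution once; dfn and dfd are then fixed for all calls.
f_dist = stats.f(dfn=10, dfd=20)

x = 1.2
density = f_dist.pdf(x)    # height of the density curve at x
prob = f_dist.cdf(x)       # P(X <= x): area under the curve to the left of x
x_back = f_dist.ppf(prob)  # quantile function: maps the probability back to x
sample = f_dist.rvs(size=5, random_state=0)  # five seeded random deviates

print(density, prob, x_back)
print(sample)
```

Since f.ppf() is the inverse of f.cdf(), `x_back` should agree with `x` up to floating-point error.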

We use f.pdf(x, dfn, dfd) to calculate the density at the value 1.2 of an $F$-curve with $dfn=10$ and $dfd=20$.

In [2]:
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
In [3]:
stats.f.pdf(1.2, dfn=10, dfd=20)
Out[3]:
0.5626124566227062
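
The density value can be cross-checked against the textbook closed form of the $F$ density, $f(x) = \frac{1}{B(dfn/2,\, dfd/2)} \left(\frac{dfn}{dfd}\right)^{dfn/2} x^{dfn/2 - 1} \left(1 + \frac{dfn}{dfd}\,x\right)^{-(dfn+dfd)/2}$ for $x > 0$, where $B$ is the beta function. The sketch below is an illustration only; the helper `f_density` is a hypothetical name introduced here and is not how SciPy computes the density internally.

```python
import numpy as np
from scipy.special import beta
import scipy.stats as stats


def f_density(x, dfn, dfd):
    """Closed-form F density for x > 0 (textbook formula, illustrative helper)."""
    ratio = dfn / dfd
    return (
        ratio ** (dfn / 2)
        * x ** (dfn / 2 - 1)
        * (1 + ratio * x) ** (-(dfn + dfd) / 2)
        / beta(dfn / 2, dfd / 2)
    )


manual = f_density(1.2, 10, 20)             # evaluated from the formula
library = stats.f.pdf(1.2, dfn=10, dfd=20)  # evaluated by SciPy

print(manual, library)
```

Both numbers should agree with the value returned by stats.f.pdf() up to floating-point error.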

Next, we use f.cdf() to calculate the area under the curve for the interval $[0,1.5]$ and the interval $[1.5, +\infty)$ of an $F$-curve with $dfn=10$ and $dfd=20$. We then ask Python whether the areas of the intervals $[0,1.5]$ and $[1.5, +\infty)$ sum to 1:

In [4]:
x = 1.5
dfn = 10
dfd = 20

lower = stats.f.cdf(x, dfn, dfd)  ## lower tail of the distribution --> [0,1.5]
In [5]:
upper = 1 - stats.f.cdf(x, dfn, dfd)  ## upper tail of the distribution --> [1.5, + infinity]
In [6]:
bool(np.isclose(lower + upper, 1))  ## compare with a floating-point tolerance
Out[6]:
True
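
The area under the curve is the integral of the density, so the tail probabilities can also be checked by integrating f.pdf() numerically. The following sketch assumes `scipy.integrate.quad` for the integration and uses f.sf() (the survival function, $1 - $ cdf) for the upper tail; the variable names are illustrative.

```python
import numpy as np
from scipy import integrate
import scipy.stats as stats

x, dfn, dfd = 1.5, 10, 20

# Integrate the density over [0, x] and over [x, +infinity).
lower_area, _ = integrate.quad(stats.f.pdf, 0, x, args=(dfn, dfd))
upper_area, _ = integrate.quad(stats.f.pdf, x, np.inf, args=(dfn, dfd))

lower_cdf = stats.f.cdf(x, dfn, dfd)  # P(X <= x)
upper_sf = stats.f.sf(x, dfn, dfd)    # P(X > x), i.e. 1 - cdf

print(lower_area, lower_cdf)
print(upper_area, upper_sf)
```

The integrated areas should match f.cdf() and f.sf() up to the numerical tolerance of the integrator, and the two tails together should cover the whole distribution.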

We use f.ppf() to calculate the quantile for a given area (= probability) under the curve of an $F$-curve with $dfn=10$ and $dfd=20$, for the probabilities $q = 0.25, 0.5, 0.75$ and $0.999$. Because f.ppf() expects the area of the interval $[0, x]$, which is the lower tail of the distribution, we pass $q$ directly and do not take the complement 1 - q.

In [7]:
q = [0.25, 0.5, 0.75, 0.999]
dfn = 10
dfd = 20
stats.f.ppf(q[0], dfn, dfd)  ## quantile for the lower-tail probability 0.25
In [8]:
stats.f.ppf(q[1], dfn, dfd)  ## quantile for 0.5 (the median)
In [9]:
stats.f.ppf(q[2], dfn, dfd)  ## quantile for 0.75
In [10]:
stats.f.ppf(q[3], dfn, dfd)  ## quantile for 0.999
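
Quantiles for several probabilities can be computed in one vectorized call, and feeding them back into f.cdf() should recover the probabilities. The sketch below also shows f.isf(), the inverse survival function, which takes an upper-tail probability directly; the significance level `alpha = 0.05` and the variable names are assumptions chosen for illustration.

```python
import numpy as np
import scipy.stats as stats

dfn, dfd = 10, 20
probs = np.array([0.25, 0.5, 0.75, 0.999])

# Vectorized quantile function: one quantile per lower-tail probability.
quantiles = stats.f.ppf(probs, dfn, dfd)

# Round trip: the cdf evaluated at each quantile returns the probability.
round_trip = stats.f.cdf(quantiles, dfn, dfd)

# Upper-tail critical value, computed in two equivalent ways.
alpha = 0.05
crit_from_isf = stats.f.isf(alpha, dfn, dfd)      # upper-tail area alpha
crit_from_ppf = stats.f.ppf(1 - alpha, dfn, dfd)  # lower-tail area 1 - alpha

print(quantiles)
print(round_trip)
print(crit_from_isf, crit_from_ppf)
```

Using f.isf() avoids writing the complement by hand when a test is specified in terms of the upper tail.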

We use the f.rvs() function to generate 100,000 random values from the $F$-distribution with $v_1=10$ and $v_2=20$ degrees of freedom (the dfn and dfd arguments). We then plot a histogram and compare it to the probability density function of the same $F$-distribution (orange line).

In [11]:
rand_f_samples = stats.f.rvs(dfn=10, dfd=20, size=100000)

plt.figure(figsize=(10, 5))
plt.hist(
    rand_f_samples,
    density=True,
    color="lightgrey",
    edgecolor="darkgrey",
    bins="scott",
)


x_grid = np.arange(0, 4, 0.1)  # evaluation points for the theoretical density
plt.plot(
    x_grid,
    stats.f.pdf(x_grid, dfn=10, dfd=20),
    "-",
    linewidth=2,
    color="orange",
)

plt.xlim(0, 4)

plt.title(
    "Histogram for an $F$-distribution with $v_1 = 10$ and $v_2 = 20$ degrees of freedom (df)"
)

plt.show()
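
Besides the visual comparison, the random sample can be compared to the theoretical distribution numerically. The following sketch is an illustration under assumptions: it seeds a NumPy generator (seed 42, chosen arbitrarily) for reproducibility and compares the sample mean and a few sample quantiles with f.mean() and f.ppf(). For $dfd > 2$ the theoretical mean is $dfd / (dfd - 2)$.

```python
import numpy as np
import scipy.stats as stats

dfn, dfd = 10, 20
rng = np.random.default_rng(42)  # seeded generator, so the draw is reproducible
samples = stats.f.rvs(dfn, dfd, size=100_000, random_state=rng)

# Mean: empirical versus theoretical, dfd / (dfd - 2).
sample_mean = samples.mean()
theory_mean = stats.f.mean(dfn, dfd)

# Quartiles: empirical versus the quantile function.
probs = [0.25, 0.5, 0.75]
sample_q = np.quantile(samples, probs)
theory_q = stats.f.ppf(probs, dfn, dfd)

print(sample_mean, theory_mean)
print(sample_q)
print(theory_q)
```

With 100,000 draws the empirical values should lie close to the theoretical ones; the remaining differences are sampling noise.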

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.