The main functions for the $F$-distribution are f.rvs(), f.pdf(), f.cdf() and f.ppf() from the scipy.stats package. The f.pdf() function gives the density, the f.cdf() function gives the cumulative distribution function, the f.ppf() function gives the quantile function (the percent point function, which is the inverse of the cdf), and the f.rvs() function generates random deviates.
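As a quick orientation, the four functions can be tried side by side. The following is a minimal sketch that assumes a "frozen" distribution object (stats.f(dfn, dfd)), so that the degrees of freedom are stated only once; the values dfn=10, dfd=20, x=1.2 and the seed 42 are illustrative choices.

```python
import numpy as np
import scipy.stats as stats

# Freeze an F-distribution so the degrees of freedom are fixed once.
# dfn=10 and dfd=20 are illustrative values.
f_dist = stats.f(dfn=10, dfd=20)

x = 1.2
density = f_dist.pdf(x)       # height of the density curve at x
prob = f_dist.cdf(x)          # P(X <= x), the area to the left of x
quantile = f_dist.ppf(prob)   # inverts the cdf, so this recovers x
draws = f_dist.rvs(size=5, random_state=42)  # five random deviates (seed is arbitrary)

print(density, prob, quantile)
print(draws)
```

Since f.ppf() inverts f.cdf(), the value of quantile should agree with x up to floating-point error.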
We use f.pdf(x, dfn, dfd) to calculate the density at $x = 1.2$ of an $F$-curve with $dfn=10$ and $dfd=20$.
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
stats.f.pdf(1.2, dfn=10, dfd=20)
0.5626124566227062
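To see where this number comes from, the density can also be evaluated from the closed-form expression of the $F$-distribution, $f(x) = \sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1 + d_2}}} \Big/ \left(x \, B(d_1/2, d_2/2)\right)$, where $B$ is the beta function. The sketch below is only a cross-check; the helper name f_density is hypothetical and not part of SciPy.

```python
import numpy as np
from scipy import special, stats


def f_density(x, dfn, dfd):
    """Density of the F-distribution from its closed-form expression (x > 0).

    Hypothetical helper for illustration; it is not a SciPy function.
    """
    root = np.sqrt((dfn * x) ** dfn * dfd ** dfd / (dfn * x + dfd) ** (dfn + dfd))
    return root / (x * special.beta(dfn / 2, dfd / 2))


manual = f_density(1.2, dfn=10, dfd=20)
reference = stats.f.pdf(1.2, dfn=10, dfd=20)

print(manual, reference)
print(np.isclose(manual, reference))
```

For large degrees of freedom the powers in this direct formula can overflow, so f.pdf() remains the practical choice.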
Next, we use f.cdf() to calculate the area under the curve for the interval $[0,1.5]$ and for the interval $[1.5, +\infty)$ of an $F$-curve with $dfn=10$ and $dfd=20$. The upper-tail area is one minus the lower-tail area. We then ask Python whether the two areas sum to 1:
x = 1.5
dfn = 10
dfd = 20
stats.f.cdf(x, dfn, dfd)  ## lower tail of the distribution --> [0, 1.5]
1 - stats.f.cdf(x, dfn, dfd)  ## upper tail of the distribution --> [1.5, +infinity)
(1 - stats.f.cdf(x, dfn, dfd)) + stats.f.cdf(x, dfn, dfd) == 1
True
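Areas under the density curve can be cross-checked by integrating f.pdf() numerically. The sketch below assumes scipy.integrate.quad as the integrator (it is not used elsewhere in this section) and compares the two integrals with f.cdf() and with the survival function f.sf(), which returns the upper-tail area directly.

```python
import numpy as np
import scipy.stats as stats
from scipy import integrate

x, dfn, dfd = 1.5, 10, 20

# Integrate the density over [0, x] and over [x, +infinity).
# quad returns (value, estimated absolute error); only the value is kept.
lower_area, _ = integrate.quad(stats.f.pdf, 0, x, args=(dfn, dfd))
upper_area, _ = integrate.quad(stats.f.pdf, x, np.inf, args=(dfn, dfd))

print(np.isclose(lower_area, stats.f.cdf(x, dfn, dfd)))  # lower tail matches the cdf
print(np.isclose(upper_area, stats.f.sf(x, dfn, dfd)))   # upper tail matches 1 - cdf
print(np.isclose(lower_area + upper_area, 1.0))          # both areas sum to 1
```

Comparing with np.isclose() instead of == avoids spurious failures caused by floating-point rounding.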
We use f.ppf() to calculate the quantile for a given area (= probability) under the $F$-curve with $dfn=10$ and $dfd=20$, for the probabilities $q = 0.25, 0.5, 0.75$ and $0.999$. Because f.ppf() is the inverse of f.cdf(), each probability refers to the lower tail of the distribution: the returned value $x_q$ bounds the interval $[0, x_q]$ whose area is $q$, so no subtraction from 1 is needed.
q = [0.25, 0.5, 0.75, 0.999]
dfn = 10
dfd = 20
stats.f.ppf(q[0], dfn, dfd)
stats.f.ppf(q[1], dfn, dfd)
stats.f.ppf(q[2], dfn, dfd)
stats.f.ppf(q[3], dfn, dfd)
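Quantiles for several probabilities can also be requested in a single vectorized call. The following sketch passes a NumPy array to f.ppf(), confirms the result by feeding it back into f.cdf(), and shows that an upper-tail critical value can be obtained with f.isf(); the significance level alpha = 0.05 is an illustrative assumption.

```python
import numpy as np
import scipy.stats as stats

probs = np.array([0.25, 0.5, 0.75, 0.999])
dfn, dfd = 10, 20

# One call returns the quantile for every lower-tail probability.
quantiles = stats.f.ppf(probs, dfn, dfd)
print(quantiles)

# Round trip: the cdf evaluated at the quantiles returns the probabilities.
print(np.allclose(stats.f.cdf(quantiles, dfn, dfd), probs))

# Upper-tail critical value: isf(alpha) is equivalent to ppf(1 - alpha).
alpha = 0.05  # illustrative significance level
critical_value = stats.f.isf(alpha, dfn, dfd)
print(np.isclose(critical_value, stats.f.ppf(1 - alpha, dfn, dfd)))
```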
We use the f.rvs() function to generate 100,000 random values from the $F$-distribution with $v_1=10$ and $v_2=20$ degrees of freedom (dfn and dfd in SciPy). We then plot a histogram of the sample and compare it to the probability density function of the same distribution (orange line).
rand_f_samples = stats.f.rvs(dfn=10, dfd=20, size=100000)
x_grid = np.arange(0, 4, 0.1)  # evaluation points for the density curve
plt.figure(figsize=(10, 5))
plt.hist(
    rand_f_samples,
    density=True,
    color="lightgrey",
    edgecolor="darkgrey",
    bins="scott",
)
plt.plot(
    x_grid,
    stats.f.pdf(x_grid, dfn=10, dfd=20),
    "-",
    linewidth=2,
    color="orange",
)
plt.xlim(0, 4)
plt.title(
    "Histogram for an $F$-distribution with $v_1 = 10$ and $v_2 = 20$ degrees of freedom (df)"
)
plt.show()
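Besides the visual comparison, the sample can be checked numerically against the theoretical distribution. The sketch below is one possible check, assuming a fixed seed (random_state=1) for reproducibility: it compares the sample mean with the theoretical mean $v_2 / (v_2 - 2)$, which exists for $v_2 > 2$, and the share of samples at or below 1.5 with f.cdf(1.5).

```python
import numpy as np
import scipy.stats as stats

dfn, dfd = 10, 20

# The seed is an arbitrary choice that makes the draw reproducible.
samples = stats.f.rvs(dfn=dfn, dfd=dfd, size=100_000, random_state=1)

# Theoretical mean of the F-distribution, defined for dfd > 2.
theoretical_mean = dfd / (dfd - 2)
sample_mean = samples.mean()
print(theoretical_mean, stats.f.mean(dfn, dfd), sample_mean)

# Empirical share of samples <= 1.5 versus the theoretical lower-tail area.
empirical_share = np.mean(samples <= 1.5)
print(empirical_share, stats.f.cdf(1.5, dfn, dfd))
```

With 100,000 draws, both pairs of numbers should agree to about two decimal places.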
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.