The main functions for working with the \(F\)-distribution in R are df(), pf(), qf() and rf(). The df() function gives the density, the pf() function gives the distribution function, the qf() function gives the quantile function and the rf() function generates random deviates.
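The four functions describe the same distribution, so they should agree with each other. The following minimal sketch checks two of these relations for an illustrative probability `p = 0.3` (an arbitrary choice, as are the variable names `x_p`, `draws` and `share`). qf() undoes pf(), and the share of rf() draws below the quantile approximates the same probability.

```r
# Sketch: df(), pf(), qf() and rf() describe one and the same F-distribution.
# p, x_p, draws and share are illustrative names chosen for this example.
v1 <- 10
v2 <- 20
p  <- 0.3

# qf() is the inverse of pf(): the p-quantile has lower-tail area p
x_p    <- qf(p, df1 = v1, df2 = v2)
p_back <- pf(x_p, df1 = v1, df2 = v2)
p_back

# rf() draws follow the distribution: the share of draws <= x_p is close to p
set.seed(42)
draws <- rf(10000, df1 = v1, df2 = v2)
share <- mean(draws <= x_p)
share
```

The agreement of `share` with `p` is only approximate, because it is based on a finite random sample.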
We use the df() function to calculate the density of an \(F\)-curve with \(v_1=10\) and \(v_2=20\) at the value 1.2:
df(1.2, df1 = 10, df2 = 20)
## [1] 0.5626125
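To see what df() computes, the density can also be evaluated from the closed-form expression of the \(F\)-density,
\(f(x) = \frac{1}{B(v_1/2,\, v_2/2)} \left(\frac{v_1}{v_2}\right)^{v_1/2} x^{v_1/2 - 1} \left(1 + \frac{v_1 x}{v_2}\right)^{-(v_1+v_2)/2}\),
where \(B\) is the beta function. The helper `f_density()` below is a hypothetical name used only for this sketch; it is not part of base R.

```r
# Sketch: evaluate the F density from its closed-form expression.
# f_density() is a hypothetical helper written for this example only.
f_density <- function(x, d1, d2) {
  (d1 / d2)^(d1 / 2) * x^(d1 / 2 - 1) *
    (1 + d1 * x / d2)^(-(d1 + d2) / 2) / beta(d1 / 2, d2 / 2)
}

manual  <- f_density(1.2, d1 = 10, d2 = 20)
builtin <- df(1.2, df1 = 10, df2 = 20)
c(manual = manual, builtin = builtin)
all.equal(manual, builtin)  # TRUE up to floating-point tolerance
```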
We use the pf() function to calculate the area under the curve over the interval \([0, 1.5]\) and over the interval \([1.5, +\infty)\) of an \(F\)-curve with \(v_1=10\) and \(v_2=20\):
x <- 1.5
v1 <- 10
v2 <- 20
# interval [0,1.5]
pf(x, df1 = v1, df2 = v2, lower.tail = TRUE)
## [1] 0.7890535
# interval [1.5,+inf)
pf(x, df1 = v1, df2 = v2, lower.tail = FALSE)
## [1] 0.2109465
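Because pf() returns an area under the density curve, the same two values can be approximated by numerically integrating df() with base R's integrate(). The following sketch does this; the tightened tolerance `rel.tol = 1e-8` is an illustrative choice.

```r
# Sketch: pf() equals the area under df(), here obtained by numerical integration
dens <- function(t) df(t, df1 = 10, df2 = 20)

area_lower <- integrate(dens, lower = 0,   upper = 1.5, rel.tol = 1e-8)$value
area_upper <- integrate(dens, lower = 1.5, upper = Inf, rel.tol = 1e-8)$value

# Compare with the closed-form results of pf()
c(integrated = area_lower, pf = pf(1.5, df1 = 10, df2 = 20))
c(integrated = area_upper, pf = pf(1.5, df1 = 10, df2 = 20, lower.tail = FALSE))
```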
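Further, we check that the areas of the intervals \([0,1.5]\) and \([1.5, +\infty)\) sum to 1. Since both values are floating-point numbers, we compare with all.equal() rather than ==, which allows for small rounding errors:
all.equal(pf(x, df1 = v1, df2 = v2, lower.tail = TRUE) + pf(x, df1 = v1, df2 = v2, lower.tail = FALSE), 1)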
## [1] TRUE
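Although the two tails always sum to 1, computing the upper tail as 1 - pf() is not equivalent in practice. Far out in the right tail the lower-tail probability rounds to exactly 1, so the subtraction loses the result entirely, whereas lower.tail = FALSE keeps the tiny upper-tail probability. The value `x_far = 1000` in this sketch is an arbitrary illustrative choice.

```r
# Sketch: far in the right tail, 1 - pf() loses all precision,
# while lower.tail = FALSE returns the small upper-tail probability directly
x_far <- 1000
upper_direct     <- pf(x_far, df1 = 10, df2 = 20, lower.tail = FALSE)
upper_complement <- 1 - pf(x_far, df1 = 10, df2 = 20)
c(direct = upper_direct, complement = upper_complement)
```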
We use the qf() function to calculate the quantile \(x\) for a given area (= probability) under the curve. Here the probabilities are \(q = 0.25, 0.5, 0.75\) and \(0.999\), for an \(F\)-curve with \(v_1=10\) and \(v_2=20\). We set lower.tail = TRUE so that \(q\) is the area over the interval \([0, x]\).
q <- c(0.25, 0.5, 0.75, 0.999)
v1 <- 10
v2 <- 20
qf(q[1], df1 = v1, df2 = v2, lower.tail = TRUE)
## [1] 0.6563936
qf(q[2], df1 = v1, df2 = v2, lower.tail = TRUE)
## [1] 0.9662639
qf(q[3], df1 = v1, df2 = v2, lower.tail = TRUE)
## [1] 1.399487
qf(q[4], df1 = v1, df2 = v2, lower.tail = TRUE)
## [1] 5.075246
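Like the other distribution functions in R, qf() is vectorised, so all four quantiles can be obtained in a single call. The sketch below also shows that asking for the upper-tail probability \(1 - q\) with lower.tail = FALSE returns the same quantiles. The names `x_lower` and `x_upper` are illustrative.

```r
# Sketch: one vectorised qf() call instead of four separate calls,
# plus the equivalent formulation via the upper tail
q <- c(0.25, 0.5, 0.75, 0.999)
x_lower <- qf(q, df1 = 10, df2 = 20)
x_upper <- qf(1 - q, df1 = 10, df2 = 20, lower.tail = FALSE)
x_lower
all.equal(x_lower, x_upper)  # TRUE: both formulations give the same quantiles
```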
Finally, we use the rf() function to generate 100,000 random values from the \(F\)-distribution with \(v_1=10\) and \(v_2=20\). We then plot a histogram of these values and compare it to the probability density function of the same \(F\)-distribution (pink line).
library(latex2exp) # provides TeX() for the plot title
x <- rf(100000, df1 = 10, df2 = 20)
hist(x,
breaks = "Scott",
freq = FALSE,
xlim = c(0, 3),
ylim = c(0, 1),
xlab = "",
main = TeX("Histogram for an $F$-distribution with $v_1 = 10$ and $v_2 = 20$ degrees of freedom (df)"),
cex.main = 0.9
)
# overlay the theoretical density of the same F-distribution
curve(df(x, df1 = 10, df2 = 20), from = 0, to = 4, n = 5000, col = "pink", lwd = 2, add = TRUE)
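The histogram can also be checked numerically. An \(F(v_1, v_2)\) random variable is the ratio of two independent \(\chi^2\) variables, each divided by its degrees of freedom, and for \(v_2 > 2\) its mean is \(v_2/(v_2-2)\). The sketch below builds such ratios with rchisq() and compares both sample means with this theoretical value. The seed and the names `ratio` and `samples` are illustrative choices.

```r
# Sketch: an F(v1, v2) variable as a ratio of scaled chi-squared variables
set.seed(1)
n  <- 100000
v1 <- 10
v2 <- 20

ratio   <- (rchisq(n, df = v1) / v1) / (rchisq(n, df = v2) / v2)
samples <- rf(n, df1 = v1, df2 = v2)

# Both sample means should be close to the theoretical mean v2 / (v2 - 2)
c(ratio = mean(ratio), rf = mean(samples), theory = v2 / (v2 - 2))
```

Because both approaches sample from the same distribution, their means differ only by sampling noise. For \(n = 100{,}000\), that noise is on the order of \(10^{-3}\).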
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.