Python provides access to the t-distribution within the scipy.stats package by the t.pdf(), t.cdf(), t.ppf() and t.rvs() functions. Apply the help() function on these functions for further information or see the documentation here.

The t.rvs() function generates random deviates of the t-distribution and is written as t.rvs(df, loc=0, scale=1, size=1). We may easily generate n number of random samples (size). Recall that he number of degrees of freedom (df) for a t-distribution is equal to the sample size minus one, that is,

$$df = n - 1\text{.}$$
In [2]:
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
In [3]:
n = 30
df = n - 1
stats.t.rvs(df, loc=0, scale=1, size=n)
Out[3]:
array([ 0.95544828,  0.37078233, -0.18390197, -1.12389442,  0.66589859,
        1.35468731,  0.4142192 ,  0.9110111 , -0.37727128,  0.81129497,
        0.34850557,  0.60532365, -0.20578076, -1.29671915,  0.13190164,
       -2.16378482, -0.26282869, -0.2769083 , -1.54400695,  0.28656318,
       -0.22978442, -0.01746411, -0.23847418, -0.83386572,  1.06076188,
        0.91941202, -0.45496332,  0.29924681, -0.63563942,  1.19456363])

Further we may generate a very large number of random samples and plot them as a histogram.

In [4]:
n = 10000
df = n - 1
rand_t_samples = stats.t.rvs(df, loc=0, scale=1, size=n)
plt.figure(figsize=(10, 5))
plt.hist(
    rand_t_samples, density=True, color="lightgrey", edgecolor="darkgrey", bins="scott"
)
plt.xlabel("samples")
plt.ylabel("Density")
plt.show()

By using the t.pdf() function we may calculate the probability density function, and thus, the vertical distance between the horizontal axis and the t-curve at any point. For the purpose of demonstration we construct a t-distribution with $df=5$ and calculate the probability density function at $t = -4,-2,0,2,4$.

In [5]:
x = np.arange(-4, 5, 2)
stats.t.pdf(x, df=5)
Out[5]:
array([0.00512373, 0.06509031, 0.37960669, 0.06509031, 0.00512373])
In [6]:
x = np.arange(-5, 5.01, 0.01)
yy = stats.t.pdf(x, df=5)

plt.figure(figsize=(10, 5))
plt.plot(x, yy, color="black")

plt.title("t-Distribution, df = 5")
plt.ylim(-0.01, 0.5)

plt.xlabel("t-values")
plt.ylabel("Probability density")


xpos = [-4, -2, 0, 2, 4]
ypos = [0.005, 0.065, round(max(yy), 3), 0.065, 0.005]
for px, py in zip(xpos, ypos):
    plt.vlines(x=px, ymin=0, ymax=py, color="red", linestyle="--")
    plt.text(px - 0.3, py + 0.025, f"{py}", fontsize=14)


plt.show()

Another very useful function is the t.cdf() function, which returns the area under the t-curve for any given interval. Let us calculate the area under the curve for the intervals $j_i = (-\infty, -2], (-\infty, 0], (-\infty, 2]$ and $k_i = [-2, \infty),[0, \infty), [2, \infty)$ for a random variable following a t-distribution with $df=5$.

In [7]:
df = 5
ji = [-2, 0, 2]
stats.t.cdf(ji, df=df)
Out[7]:
array([0.05096974, 0.5       , 0.94903026])
In [8]:
df = 5
fx = [-2, 0, 2]
fy = stats.t.pdf(fx, df)
out = stats.t.cdf(ji, df=df)
x = np.arange(-6, 6.01, 0.05)
yy = stats.t.pdf(x, df)

fig, ax = plt.subplots(1, 3, figsize=(14, 5))
plt.title("Area under the curve for $j_i$")

###### Plot 1
ax[0].plot(x, yy, color="black")
ax[0].set_ylim(-0.01, 0.45)
ax[0].set_ylabel("Probability density")
ax[0].fill_between(
    x=x,
    y1=yy,
    where=(x <= -2),
    color="red",
    alpha=0.75,
)
ax[0].text(
    1,
    0.25,
    f"Colored area is \n {round(out[0]*100,2)}%",
    fontsize=12,
)

ax[0].text(
    -4,
    0.4,
    "$P(X \leq j_i) =\\int_{-\infty}^{j_i}f(x)dx $",
    fontsize=12,
)

#### Plot 2
ax[1].plot(x, yy, color="black")

ax[1].set_ylim(-0.01, 0.45)
ax[1].fill_between(
    x=x,
    y1=yy,
    where=(x <= 0),
    color="red",
    alpha=0.75,
)
ax[1].text(
    1,
    0.25,
    f"Colored area is \n {round(out[1]*100,2)}%",
    fontsize=12,
)

ax[1].text(
    -4,
    0.4,
    "$P(X \leq j_i) =\\int_{-\infty}^{j_i}f(x)dx $",
    fontsize=12,
)

##### Plot 3
ax[2].plot(x, yy, color="black")
ax[2].set_ylim(-0.01, 0.45)
ax[2].fill_between(
    x=x,
    y1=yy,
    where=(x <= 2),
    color="red",
    alpha=0.75,
)
ax[2].text(
    1,
    0.25,
    f"Colored area is \n {round(out[2]*100,2)}%",
    fontsize=12,
)

ax[2].text(
    -4,
    0.4,
    "$P(X \leq j_i) =\\int_{-\infty}^{j_i}f(x)dx $",
    fontsize=12,
)

plt.show()
In [9]:
df = 5
fx = [-2, 0, 2]
fy = stats.t.pdf(fx, df)
out = stats.t.cdf(ji, df=df)
x = np.arange(-6, 6.01, 0.05)
yy = stats.t.pdf(x, df)

fig, ax = plt.subplots(1, 3, figsize=(14, 5))
plt.title("Area under the curve for $k_i$")

###### Plot 1
ax[0].plot(x, yy, color="black")
ax[0].set_ylim(-0.01, 0.45)
ax[0].set_ylabel("Probability density")
ax[0].fill_between(
    x=x,
    y1=yy,
    where=(x >= -2),
    color="red",
    alpha=0.75,
)
ax[0].text(
    1,
    0.25,
    f"Colored area is \n {round(out[2]*100,2)}%",
    fontsize=12,
)

ax[0].text(
    -4,
    0.4,
    "$P(X \leq j_i) =\\int_{k_i}^{\infty}f(x)dx $",
    fontsize=12,
)

#### Plot 2
ax[1].plot(x, yy, color="black")

ax[1].set_ylim(-0.01, 0.45)
ax[1].fill_between(
    x=x,
    y1=yy,
    where=(x >= 0),
    color="red",
    alpha=0.75,
)
ax[1].text(
    1,
    0.25,
    f"Colored area is \n {round(out[1]*100,2)}%",
    fontsize=12,
)

ax[1].text(
    -4,
    0.4,
    "$P(X \leq j_i) =\\int_{k_i}^{\infty}f(x)dx $",
    fontsize=12,
)

##### Plot 3
ax[2].plot(x, yy, color="black")
ax[2].set_ylim(-0.01, 0.45)
ax[2].fill_between(
    x=x,
    y1=yy,
    where=(x >= 2),
    color="red",
    alpha=0.75,
)
ax[2].text(
    1,
    0.25,
    f"Colored area is \n {round(out[0]*100,2)}%",
    fontsize=12,
)

ax[2].text(
    -4,
    0.4,
    "$P(X \leq j_i) =\\int_{k_i}^{\infty}f(x)dx $",
    fontsize=12,
)

plt.show()

The t.ppf(q, df) function returns the quantile function, and thus it is the reversing function of t.cdf(). For the intervals $j_i = (-\infty, -2], (-\infty, 0], (-\infty, 2]$ of a random variable following a t-distribution with $df=5$, the t.ppf() function yields...

In [10]:
x = np.arange(-2, 2.01, 2)
ji = stats.t.cdf(x, 5)
ji
Out[10]:
array([0.05096974, 0.5       , 0.94903026])
In [11]:
stats.t.ppf(ji, df=5)
Out[11]:
array([-2.00000000e+00,  6.97600362e-17,  2.00000000e+00])

... and for the intervals $k_i = [-2, \infty),[0, \infty), [2, \infty)$ of a random variable following a t-distribution with $df=5$, the t.ppf() function returns

In [12]:
x = np.arange(-2, 2.01, 2)
ki = 1 - stats.t.cdf(x, 5)
ki
Out[12]:
array([0.94903026, 0.5       , 0.05096974])
In [13]:
stats.t.ppf(ki, df=5)
Out[13]:
array([ 2.00000000e+00,  6.97600362e-17, -2.00000000e+00])

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.