Python provides access to the uniform distribution within the scipy.stats package through the uniform.pdf(), uniform.cdf(), uniform.ppf() and uniform.rvs() functions. Apply the help() function on these functions for further information or see the scipy.stats documentation.
The uniform.rvs() function generates random deviates of the uniform distribution and is written as uniform.rvs(loc=0, scale=1, size=n). It draws n random samples from the interval [loc, loc + scale], so loc is the lower bound and scale is the width of the interval, unlike the normal distribution, where they are the mean and the standard deviation. For $X \sim U(a, b)$ we therefore set loc=a and scale=b-a.
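In scipy.stats the uniform distribution is defined on the interval [loc, loc + scale], which can be checked directly. The following is a minimal sketch; the bounds `a` and `b`, the variable name `samples` and the fixed seed are assumptions chosen only for illustration.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical interval bounds, chosen only for illustration
a, b = 2.0, 5.0

# For U(a, b): loc is the lower bound, scale is the interval width b - a
samples = stats.uniform.rvs(loc=a, scale=b - a, size=1000, random_state=42)

# Every sample should lie inside [a, b]
print(samples.min() >= a, samples.max() <= b)
```

Passing random_state makes the draw reproducible; without it, each call returns a different sample.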
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
stats.uniform.rvs(loc=0, scale=1, size=40)
array([0.45403509, 0.70278742, 0.21910807, 0.09913479, 0.85098736, 0.48649517, 0.98790227, 0.9543566 , 0.78594082, 0.77364634, 0.96023167, 0.46980974, 0.08692602, 0.15617203, 0.19281265, 0.11422663, 0.03041832, 0.24657492, 0.37807096, 0.29216436, 0.66778093, 0.21880304, 0.717221 , 0.99501647, 0.37157025, 0.58162895, 0.45139485, 0.09517478, 0.81554555, 0.54348508, 0.48010381, 0.70338999, 0.54389102, 0.31159956, 0.50374936, 0.50493137, 0.41412099, 0.47522944, 0.36498424, 0.97895737])
We may approximate the density function for $X \sim U(-2, 0.8)$ by using the uniform.rvs() function and plotting the results as a histogram.
rand_unif = stats.uniform.rvs(loc=-2, scale=2.8, size=1000)
plt.figure(figsize=(10, 5))
plt.hist(rand_unif, density=True, color="lightgrey", edgecolor="darkgrey")
plt.xlabel("x")
plt.ylabel("Density")
plt.title('Histogram of rand_unif')
plt.show()
Further, we plot both the density histogram from above and the uniform probability density for the interval [-2, 0.8], by applying the uniform.pdf() function.
a = -2
b = 0.8
plt.figure(figsize=(10, 5))
plt.hist(rand_unif, density=True, color="lightgrey", edgecolor="darkgrey")
plt.title("Uniform distribution for the interval [-2,0.8]", fontsize =16)
x = np.arange(-5, 5, 0.001)
plt.plot(x, stats.uniform.pdf(x, loc=a, scale=b - a), color="darkblue")
plt.ylim(-0.01, 0.4)
plt.xlim(-3, 3)
plt.xlabel("x", fontsize =14)
plt.ylabel("Density", fontsize =14)
plt.show()
The figure indicates that the 1000 samples randomly drawn from a uniform distribution (histogram plot) approximate the uniform probability distribution $X \sim U(-2, 0.8)$ (line plot) well.
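The height of the line in such a plot follows from the definition of the uniform density: it is constant at $1/(b-a)$ inside the interval and zero outside, which for $U(-2, 0.8)$ is about 0.357. A small sketch to confirm this numerically; the evaluation points 0.0 and 2.0 and the variable names are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as stats

a, b = -2, 0.8

# Constant density of U(a, b) inside the interval
height = 1 / (b - a)

# Evaluate the pdf at one point inside and one point outside [a, b]
inside = stats.uniform.pdf(0.0, loc=a, scale=b - a)
outside = stats.uniform.pdf(2.0, loc=a, scale=b - a)

print(height, inside, outside)
```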
Further, we can use the uniform.cdf() function to calculate the area under the curve up to a given threshold value, or we can use the uniform.ppf() function to return the threshold value for a particular probability.
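The two functions are inverses of each other: cdf maps a threshold to a probability, and ppf maps that probability back to the threshold. A brief sketch of the round trip, assuming the $U(-2, 0.8)$ distribution from the plots and an arbitrarily chosen threshold of 0; the names `dist`, `p` and `x_back` are illustrative.

```python
import numpy as np
import scipy.stats as stats

# Frozen distribution object for U(-2, 0.8)
dist = stats.uniform(loc=-2, scale=2.8)

# P(X <= 0): area under the curve to the left of x = 0
p = dist.cdf(0.0)

# Threshold whose cumulative probability equals p
x_back = dist.ppf(p)

print(p, x_back)
```

Freezing the distribution with stats.uniform(loc=..., scale=...) avoids repeating the parameters in every call.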
unif_lower = -3
unif_upper = 5.5
Consider the uniform probability distribution given by $X \sim U(-3, 5.5)$.
Question 1
What is the mean, $\mu$, for the given uniform distribution?
unif_mean = (-3 + 5.5) / 2
unif_mean
1.25
The mean, $\mu$, for the uniform probability distribution given by $X \sim U(-3, 5.5)$ is 1.25.
Question 2
What is the value of $x$ that divides the given uniform distribution into two equal parts, or written more formally $P(X<?) = 0.5$?
px_05 = stats.uniform.ppf(0.5, loc=-3, scale=8.5)
px_05
1.25
Not a surprise at all. The value of $x$ that divides the uniform distribution into two equal parts is 1.25, and is thus equal to $\mu$.
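Because the uniform distribution is symmetric, its mean and its median coincide at $(a+b)/2$. As a cross-check, both can also be read from a frozen scipy.stats distribution object. This is a sketch rather than the only way to do it; the names `dist`, `mean_value` and `median_value` are illustrative.

```python
import scipy.stats as stats

# Frozen distribution object for U(-3, 5.5): loc = -3, scale = 5.5 - (-3)
dist = stats.uniform(loc=-3, scale=8.5)

mean_value = dist.mean()      # expected value of the distribution
median_value = dist.median()  # same as dist.ppf(0.5)

print(mean_value, median_value)
```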
Question 3
Suppose that the distribution from above describes a physical phenomenon. If we take a measurement of the physical process governing that phenomenon, what is the probability of measuring a value $\geq 4$, or written more formally $P(X \geq 4)$? Owing to the nature of a uniform distribution, any value within the interval $[-3, 5.5]$ is equally likely to be measured.
We will solve that question in two ways, numerically and analytically. First, in order to solve the question numerically we need to conduct an experiment. We will repeat our measurement a large number of times, and then count the number of times we registered a value $\geq 4$. Thanks to the power of Python and the integrated random number generator (uniform.rvs() for uniformly distributed data) the repetition task is very easy. Be aware, however, that in real-life applications quite often only a very limited number of measurements is available.
measurements = stats.uniform.rvs(
    size=10000, loc=-3, scale=8.5
)  # take 10,000 measurements
above_threshold = np.sum(
    measurements >= 4
)  # count the number of values >= 4
above_threshold / len(measurements)  # calculate the proportion of values >= 4
The result shows that approximately 18% of measurements yield values $\geq 4$; the exact proportion varies slightly from run to run because the sample is random.
Second, in order to solve the question analytically, we make use of the cumulative distribution function, which is implemented in Python for uniform distributions by the uniform.cdf() function. Make sure to compute 1 - uniform.cdf() to obtain the result for the upper tail: we are looking for the probability of measuring values $\geq 4$, so we are interested in the area under the curve to the right of $x = 4$.
result = 1 - stats.uniform.cdf(4, loc=-3, scale=8.5)
result
0.17647058823529416
The analytic approach yields a result of approximately 0.176, or in other words, with a probability of approximately 17.6% we obtain values $\geq 4$, thus $P(X \geq 4)\approx 0.18$.
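For a uniform distribution the upper-tail probability also has a simple closed form, $P(X \geq x) = (b - x)/(b - a)$ for $a \leq x \leq b$, which here gives $1.5/8.5$. The sketch below compares this with the survival function uniform.sf(), which scipy.stats provides as an equivalent of 1 - cdf; the variable names are illustrative.

```python
import numpy as np
import scipy.stats as stats

a, b, threshold = -3, 5.5, 4

# Closed form: length of [threshold, b] relative to the length of [a, b]
closed_form = (b - threshold) / (b - a)

# Survival function, i.e. 1 - cdf, evaluated at the threshold
upper_tail = stats.uniform.sf(threshold, loc=a, scale=b - a)

print(closed_form, upper_tail)
```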
Both approaches yield very similar results. However, be aware that the result of the numerical approach is an approximation of the analytic result. Keep in mind that the quality of such an approximation is very sensitive to the size of the sample, in our case the number of measurements.
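The effect of the sample size can be illustrated by repeating the numerical experiment with different numbers of measurements and comparing each estimate with the analytic value. This is a minimal sketch; the sample sizes, the seed and the names `rng`, `exact` and `errors` are assumptions made for illustration.

```python
import numpy as np
import scipy.stats as stats

# Seeded generator so that the experiment is reproducible
rng = np.random.default_rng(0)

# Analytic upper-tail probability P(X >= 4) for U(-3, 5.5)
exact = stats.uniform.sf(4, loc=-3, scale=8.5)

errors = {}
for n in (100, 10_000, 1_000_000):
    sample = stats.uniform.rvs(loc=-3, scale=8.5, size=n, random_state=rng)
    estimate = np.mean(sample >= 4)  # proportion of values >= 4
    errors[n] = abs(estimate - exact)
    print(n, estimate, errors[n])
```

The absolute error tends to shrink as n grows, although for any single run a smaller sample can happen to land closer to the analytic value.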
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.