If the random variables $w_t$ that make up a series are uncorrelated and have mean 0 and variance $\sigma^2_w$, then the series is stationary with the autocovariance function
$$ \gamma_w(s,t) = \text{cov}(w_s, w_t)= \begin{cases} \sigma^2_w & s=t \\ 0 & \text{otherwise} \end{cases} $$
This type of series is referred to as white noise. The designation white originates from the analogy with white light and indicates that all possible periodic oscillations are present with equal strength (Shumway and Stoffer 2011).
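The autocovariance function above can be checked numerically. The following is a minimal sketch, not part of the original material: the helper `sample_autocovariance`, the seed, and the value `sigma_w = 2.0` are assumptions chosen for illustration. For a long simulated white noise series, the estimate at lag 0 should be close to $\sigma^2_w$ and the estimates at other lags close to 0.

```python
import numpy as np


def sample_autocovariance(x, lag):
    """Biased sample autocovariance of x at a non-negative lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    centered = x - x.mean()
    # Sum of products of values `lag` steps apart, divided by n
    return np.sum(centered[lag:] * centered[: n - lag]) / n


rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
sigma_w = 2.0                    # assumed standard deviation of the noise
w = rng.normal(0.0, sigma_w, size=10_000)

gamma_0 = sample_autocovariance(w, 0)  # expected to be near sigma_w**2 = 4
gamma_1 = sample_autocovariance(w, 1)  # expected to be near 0
gamma_5 = sample_autocovariance(w, 5)  # expected to be near 0
print(gamma_0, gamma_1, gamma_5)
```

The estimates only approach the theoretical values as the series grows longer, so small deviations from 4 and 0 are expected.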
A particularly useful white noise series is Gaussian white noise, wherein the $w_t$ are independent and identically distributed (iid) normal random variables with mean 0 and variance $\sigma^2_w$:
$$w_t \sim \text{iid } N(0, \sigma^2_w) $$
We can easily generate Gaussian white noise in Python:
import matplotlib.pyplot as plt
import pandas as pd
import random
from pandas import Series
from pandas.plotting import autocorrelation_plot
# seed random number generator
random.seed(1)
# create white noise series
series = [random.gauss(0.0, 1.0) for i in range(100)]
series = Series(series)
# summary stats
print(series.describe())
count    100.000000
mean      -0.050576
std        0.944013
min       -2.835791
25%       -0.609139
50%        0.010431
75%        0.556549
max        2.389112
dtype: float64
# plot the series and its histogram in two stacked panels
fig, axs = plt.subplots(2, 1, figsize=(10, 8))
series.plot(ax=axs[0])
axs[0].set_title("Gaussian white noise series")
axs[0].set_xlabel("Time")
axs[0].set_ylabel("w")
# histogram plot
series.hist(ax=axs[1])
axs[1].set_title("Gaussian white noise histogram")
plt.tight_layout()
plt.show()
Finally, we check for autocorrelation at different lags by plotting the correlogram with the autocorrelation_plot() function.
# autocorrelation
plt.figure(figsize=(18, 6))
autocorrelation_plot(series)
plt.title("Serial Correlation of Gaussian white noise")
plt.show()
The correlogram does not show any obvious autocorrelation pattern.
A few spikes exceed the 95% and 99% confidence bands, but these are consistent with chance: even for pure white noise, about 5% of the lags are expected to fall outside the 95% band.
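To make the point about chance exceedances concrete, the sketch below computes sample autocorrelations of a simulated white noise series and counts how many fall outside an approximate 95% band. It is an illustration only: the seed, `max_lag = 40`, and the bound $1.96/\sqrt{n}$ (the usual large-sample approximation for white noise) are assumptions, and the series is not the one generated above.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 100
w = rng.normal(0.0, 1.0, size=n)

centered = w - w.mean()
denom = np.sum(centered**2)
max_lag = 40  # assumed number of lags to inspect

# Sample autocorrelation at lags 1..max_lag
acf = np.array(
    [np.sum(centered[k:] * centered[: n - k]) / denom for k in range(1, max_lag + 1)]
)

# Approximate 95% band for white noise
bound = 1.96 / np.sqrt(n)
n_outside = int(np.sum(np.abs(acf) > bound))

print(f"95% bound: +/-{bound:.3f}")
print(f"lags outside the band: {n_outside} of {max_lag}")
```

With 40 lags, about two exceedances are expected by chance alone, so a small non-zero count does not contradict the white noise assumption.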
Summary:
Autocorrelation is simply the correlation of a series with its own lags. If a series is significantly autocorrelated, its previous values (lags) may be helpful in predicting the current value.
Partial autocorrelation conveys similar information, but it measures the pure correlation between a series and a given lag, excluding the contributions of the intermediate lags.
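The difference between the two measures can be illustrated with a simulated AR(1) process, in which each value depends directly only on its immediate predecessor. The sketch below is an assumption-laden illustration: the helpers `acf_at` and `pacf_at`, the coefficient `phi = 0.8`, and the regression-based estimate of the partial autocorrelation are choices made here, not taken from the original material. The lag-2 autocorrelation is clearly non-zero (roughly `phi**2`), while the lag-2 partial autocorrelation is close to 0 once lag 1 is accounted for.

```python
import numpy as np


def acf_at(x, lag):
    """Sample autocorrelation of x at the given lag."""
    c = np.asarray(x, dtype=float)
    c = c - c.mean()
    return np.sum(c[lag:] * c[: len(c) - lag]) / np.sum(c**2)


def pacf_at(x, lag):
    """Partial autocorrelation at `lag`, estimated as the last coefficient
    of a least-squares regression of x_t on x_{t-1}, ..., x_{t-lag}."""
    c = np.asarray(x, dtype=float)
    c = c - c.mean()
    n = len(c)
    target = c[lag:]
    # Column j holds the series shifted back by j steps (j = 1..lag)
    lagged = np.column_stack([c[lag - j : n - j] for j in range(1, lag + 1)])
    coef, *_ = np.linalg.lstsq(lagged, target, rcond=None)
    return coef[-1]


# Simulate an AR(1) process y_t = phi * y_{t-1} + noise_t
rng = np.random.default_rng(0)  # arbitrary seed
phi = 0.8                       # assumed AR coefficient
n = 5000
noise = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + noise[t]

print("ACF  lag 1, 2:", acf_at(y, 1), acf_at(y, 2))
print("PACF lag 1, 2:", pacf_at(y, 1), pacf_at(y, 2))
```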
A lag plot is a scatter plot of a time series against a lag of itself. It is normally used to check for autocorrelation. If the points form a pattern, such as the one in the lag plots below, the series is autocorrelated. If there is no such pattern, the series is likely to be random white noise. The pandas.plotting module provides the lag_plot function for easy plotting.
from pandas.plotting import lag_plot
plt.rcParams.update({"ytick.left": False, "axes.titlepad": 10})
# Import example time series: Antarctic ice core data
ice_core = (
pd.read_json(
"http://userpage.fu-berlin.de/soga/soga-py/300/307000_time_series/Antartica_Ice_Core.json"
)
.set_index("ky_BP_AICC2012")
.squeeze()
)
ice_core
# Plot
fig, axes = plt.subplots(1, 4, figsize=(18, 6), sharex=True, sharey=True, dpi=100)
for i, ax in enumerate(axes.flatten()[:4]):
    lag_plot(ice_core, lag=i + 1, ax=ax, c="firebrick")
    ax.set_title("Lag " + str(i + 1))
fig.suptitle("Lag Plots of $CO_2$ (ppm) in the Antarctic ice core", y=1.05)
plt.show()
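As a numerical counterpart to a lag plot, one can compute the correlation of the point cloud it displays. The sketch below uses synthetic data rather than the ice core series, so that it runs without a download: the helper `lag_correlation`, the seed, and the two example series are assumptions made for illustration. A random walk yields correlations near 1 at small lags (a tight diagonal band in a lag plot), whereas white noise yields correlations near 0 (a structureless cloud).

```python
import numpy as np


def lag_correlation(x, lag):
    """Pearson correlation between x_t and x_{t+lag}, i.e. between the
    two axes of a lag plot with the given positive lag."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-lag], x[lag:])[0, 1]


rng = np.random.default_rng(7)  # arbitrary seed
white_noise = rng.normal(size=1000)
random_walk = np.cumsum(white_noise)  # strongly autocorrelated by construction

for lag in range(1, 5):
    print(
        f"lag {lag}: white noise {lag_correlation(white_noise, lag):+.3f}, "
        f"random walk {lag_correlation(random_walk, lag):+.3f}"
    )
```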
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.