The moving average process of order $q$ is denoted MA$(q)$ and defined by
$$y_t = w_t+\theta_1w_{t-1}+...+\theta_qw_{t-q},$$which can be rewritten as
$$y_t = \sum_{j=0}^q\theta_jw_{t-j},$$where $\theta_0 =1$, $\theta_1, \theta_2, ..., \theta_q$ are fixed constants, and $w_t$ denotes a random variable with mean $0$ and variance $\sigma^2$. In the MA model the current value of the series is a weighted sum of past white noise terms. Think of the white noise series as being stochastically uncorrelated information which appears at each time step, and which is combined with other information to provide the observation at $y_t$. Hence, moving average models are useful for modeling series that are affected by a variety of unpredictable events where the effect of these events have an immediate effect as well as possible long-term effects.
The moving average process is a stationary process with serial correlation zero at lags greater than $q$.
$$ \gamma(k) = \begin{cases} 0, & \vert k\vert > q \\ \sigma^2\sum_{j=0}^{q-\vert k\vert}\theta_j\theta_{j+k} & \vert k\vert \le q \end{cases} $$We already got to know Python's DataFrame.rolling()
function to calculate moving average series.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import random
# seed random number generator
random.seed(250)
# create white noise series
mean = 0
std = 1
n = 200
w = np.random.normal(mean, std, size=n)
## moving average
w_series = pd.Series(w) # Convert array of integers to pandas series
window_size = 3
windows = w_series.rolling(window_size)
moving_averages = windows.mean() # Create a series of moving averages of each window
## plot
plt.figure(figsize=(18, 6))
plt.plot(moving_averages)
plt.title("Moving Average Series", fontsize=14)
plt.xlabel("Time", fontsize=14)
plt.show()
Another, very convenient way to simulate moving average processes in Python is to use the class statsmodels.tsa.arima.model.ARIMA
.
Where ARIMA stands for:
We start by using statsmodels.tsa.arima_process.ArmaProcess
, which can handle AR and MA models.
We start with generating two MA(1) series (MA model of order $q=1$):
$$y_t = w_t+\theta_1w_{t-1}\text{,}$$where we consider once $\theta_1 = 0.6$ and the other time $\theta_1 = -0.6$.
We store the weights in ma1
. Finally, we simulate the process by generating a 100 data points:
from statsmodels.tsa.arima_process import ArmaProcess
# MA parameter: 0.6
ar1 = np.array([1]) # we must include the zero-lag coefficient of 1
ma1 = np.array([1, +0.6])
MA_object1 = ArmaProcess(ar1, ma1)
simulated_ma1_1 = MA_object1.generate_sample(nsample=100)
# MA parameter: -0.6
ar2 = np.array([1])
ma2 = np.array([1, -0.6])
MA_object2 = ArmaProcess(ar2, ma2)
simulated_ma1_2 = MA_object2.generate_sample(nsample=100)
We plot the two series:
fig, ax = plt.subplots(2, 1, figsize=(18, 6))
ax[0].plot(simulated_ma1_1)
ax[0].set_title("MA(1), $\\theta$ = 0.6")
ax[1].plot(simulated_ma1_2)
ax[1].set_title("MA(1), $\\theta$ = -0.6")
ax[0].grid()
ax[1].grid()
plt.show()
For $\theta = 0.6$, $y_t$ and $y_{t-1}$ are positively correlated, and for $\theta = -0.6$, $y_t$ and $y_{t-1}$ are negatively correlated. This correlational structure can be shown by plotting the correlogram.
from statsmodels.graphics.tsaplots import plot_acf
fig, ax = plt.subplots(2, 1, figsize=(18, 6))
plot_acf(simulated_ma1_1, ax=ax[0])
ax[0].set_title("Serial Correlation MA(1), $\\theta$ = 0.6")
plot_acf(simulated_ma1_2, ax=ax[1])
ax[1].set_title("Serial Correlation MA(1), $\\theta$ = -0.6")
plt.show()
A typical feature of the autocorrelation function of a MA process is that it drops to nearly zero beyond $q$.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.