A symmetric moving average $(m_t)$ for the observation $x_t$ is given by
$$m_t = \sum_{j=-k}^k a_jx_{t-j}\text{,}$$where $a_j = a_{-j}$ and $\sum_{j=-k}^ka_j = 1$.
Consider a monthly time series. A 3-month low pass filter can be written in mathematical notation as
$$m_t = \frac{1}{3}(x_{t-1}+x_t+x_{t+1})$$# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
The simple moving average is the unweighted mean of the previous data points (3 in the example above). The selection of M (moving window) depends on the degree of smoothing desired. Increasing the value of M improves the smoothing, but at the same time decreases the accuracy.
A symmetric 3-month low pass filter works according to the following principle:
In contrast, a symmetric one-year filter is specified by:
In Python calculating the moving average is fairly easy accomplished by using the pandas.Series.rolling()
function. On the resulting windows, we can perform calculations using a statistical function. The size of the window is specified by the argument window
.
Read more about the functionality of the method here.
Let us build three low pass filters:
# read the Earth Surface Temperature data t_global.json, processed in the previous section
# read the data from .json file
t_global = pd.read_json(
"http://userpage.fu-berlin.de/soga/soga-py/300/307000_time_series/t_global.json"
)
t_global["Date"] = pd.to_datetime(t_global["Date"], format="%Y-%m-%d", errors="coerce")
# subset
temp_global = t_global.set_index("Date")["1850-01-01":"2021-12-31"][
"Monthly Anomaly_global"
]
temp_global
Date 1850-01-01 -1.886 1850-02-01 -0.149 1850-03-01 -0.536 1850-04-01 -1.130 1850-05-01 -1.358 ... 2021-08-01 1.198 2021-09-01 1.275 2021-10-01 1.577 2021-11-01 1.307 2021-12-01 1.351 Name: Monthly Anomaly_global, Length: 2064, dtype: float64
Rolling window operations are an important transformation for time series data. Rolling windows overlap and "roll" along at the same frequency as the data, so the transformed time series is at the same frequency as the original time series. The window
size is the same as the frequency in the data. E.g. here we are dealing with monthly aggregated data, this means a one-year rolling window has the window size 12.
By default, all data points within a window are equally weighted in the aggregation, but this can be changed by specifying window types such as Gaussian, triangular, and others. We will stick to the standard equally weighted window here.
We use the center=True
argument to label each window at its midpoint, so the rolling windows are:
# Compute the centered 1 year rolling mean
temp_global_f1 = temp_global.rolling(min_periods=1, window=12, center=True).mean()
# Compute the centered 5 year rolling mean
temp_global_f5 = temp_global.rolling(min_periods=1, window=12 * 5, center=True).mean()
# Compute the centered 10 year rolling mean
temp_global_f10 = temp_global.rolling(min_periods=1, window=12 * 10, center=True).mean()
## plotting
fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(
temp_global,
linestyle="-",
linewidth=0.3,
label="Monthly Data",
color="grey",
)
ax.plot(
temp_global_f1,
marker="o",
markersize=1,
linestyle="-",
label="one-year filter",
)
ax.plot(
temp_global_f5, marker=".", markersize=1, linestyle="-", label="five-year filter"
)
ax.plot(
temp_global_f10, marker=".", markersize=1, linestyle="-", label="five-year filter"
)
ax.legend()
<matplotlib.legend.Legend at 0x1aaaeb4ab00>
The rolling()
function is quite flexible, hence it is straightforward to define a custom filter.
Exercise: Design a custom symmetric seven-year filter for a monthly time series that gives full weight to the measurement year, three-quarters weight to adjacent year, half weight to years two removed, and quarter weight to years three removed.
Unfortunately, there is no built-in function if we want to use custom weights on the data. But we can apply a weighted moving average with a corresponding rolling window (make use of the apply()
function).
## Your code here...
# construct the pattern
weights = np.array([0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25])
Let us apply our custom symmetric seven-year filter and plot the result.
temp_global_weigths = temp_global.rolling(7).apply(
lambda x: np.average(x, weights=weights)
)
## plotting
fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(
temp_global,
linestyle="-",
linewidth=0.3,
label="Monthly Data",
color="grey",
)
ax.plot(
temp_global_weigths,
marker="o",
markersize=1,
linestyle="-",
label="custom seven-year filter",
)
ax.legend()
<matplotlib.legend.Legend at 0x1aaaec5f430>
With the functionality of the rolling()
we may for example analyse the time series of the earth surface temperature anomalies with respect to the question if the variability in the monthly mean temperature changes within a window of 30 years.
Therefore, we first define our reference period. Let us say our reference period is the 30-year period from 1986 to 2021.
# reference period
ref_period = t_global.set_index("Date")["1986-01-01":"2021-12-31"][
"Monthly Anomaly_global"
]
# reference period standard deviation
ref_period_sd = ref_period.std()
ref_period_sd
0.44274650376216834
In the next step we apply the rolling()
function in combination with the mean()
function.
The rolling()
function has no additional argument for applying a non-overlapping sliding window. We got a few options for adding an non-overlapping window.
For one, we could compute the usual rolling()
function and then skip every $n^{th}$ row (here: n = 30).
w = 30 * 12
## not overlapping rolling window:
std_30years = temp_global.rolling(min_periods=0, window=w).apply(lambda x: np.std(x))[
w - 1 :: w
]
std_30years
Date 1879-12-01 0.516120 1909-12-01 0.378498 1939-12-01 0.348118 1969-12-01 0.308923 1999-12-01 0.393496 Name: Monthly Anomaly_global, dtype: float64
fig, ax = plt.subplots(figsize=(18, 6))
std_30years.plot(kind="bar", ax=ax, color="lightgrey", edgecolor="grey")
# generate more informative labeling
ax.set_xticklabels(
["1850-1879", "1880-1909", "1910-1939", "1940-1969", "1970-1999"],
rotation=0,
)
plt.axhline(
y=ref_period_sd,
linewidth=1,
linestyle="dashed",
color="r",
label="stdev reference period (1986-2015)",
)
plt.title(
"Standard deviation of mean monthly temperature anomalies for \na 30-year non-overlapping sliding window"
)
plt.legend()
plt.show()