STL is an acronym for "Seasonal and Trend decomposition using Loess", while loess (locally weighted regression and scatterplot smoothing) is a method for estimating nonlinear relationships.
In Python the seasonal_decompose() function from the statsmodels.tsa.seasonal module, which performs decomposition of a time series into seasonal, trend and irregular components using Loess. The function provides the optional argument model, which can be either "additive" or "multiplicative" and specifies the type of seasonal component.
Let us apply the seasonal_decompose() function and plot the result with the convenient .plot() function.
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
df_co2 = pd.read_json(
"http://userpage.fu-berlin.de/soga/soga-py/300/307000_time_series/KeelingCurve.json"
)
df_co2 = df_co2.set_index("Date") # set datetimeindex
from statsmodels.tsa.seasonal import seasonal_decompose
# Time Series Decomposition
stl_co2 = seasonal_decompose(
df_co2["interpolated"], model="additive", extrapolate_trend="freq"
)
print(stl_co2.trend)
print(stl_co2.seasonal)
print(stl_co2.resid)
print(stl_co2.observed)
Date
1958-03-01 315.017963
1958-04-01 315.078159
1958-05-01 315.138354
1958-06-01 315.198549
1958-07-01 315.258744
...
2022-05-01 418.189994
2022-06-01 418.356774
2022-07-01 418.523554
2022-08-01 418.690335
2022-09-01 418.857115
Name: trend, Length: 775, dtype: float64
Date
1958-03-01 1.426096
1958-04-01 2.578268
1958-05-01 3.023456
1958-06-01 2.324759
1958-07-01 0.661465
...
2022-05-01 3.023456
2022-06-01 2.324759
2022-07-01 0.661465
2022-08-01 -1.482930
2022-09-01 -3.167891
Name: seasonal, Length: 775, dtype: float64
Date
1958-03-01 -0.744059
1958-04-01 -0.206427
1958-05-01 -0.651809
1958-06-01 -0.283308
1958-07-01 -0.060210
...
2022-05-01 -0.223449
2022-06-01 0.308467
2022-07-01 -0.285020
2022-08-01 -0.017404
2022-09-01 0.260776
Name: resid, Length: 775, dtype: float64
Date
1958-03-01 315.70
1958-04-01 317.45
1958-05-01 317.51
1958-06-01 317.24
1958-07-01 315.86
...
2022-05-01 420.99
2022-06-01 420.99
2022-07-01 418.90
2022-08-01 417.19
2022-09-01 415.95
Name: interpolated, Length: 775, dtype: float64
These four time series can be plotted directly from the result object by calling the plot() function. For example:
## with seasonal_decompose().plot change the figure size after creating:
fig = stl_co2.plot()
fig.set_size_inches((18, 8)) ##
plt.show()
The plot shows four panels. In the uppermost we see the raw data, the Keeling curve. In the subsequent plot we see the trend component, the seasonal component and the remainder. We realize a very strong linear trend in the data set. Let us redo the analysis, however this time we focus on the 21st century.
df_co2_2000_2020 = df_co2["2000-01-01":"2021-12-31"]
# Time Series Decomposition
stl_co2_21 = seasonal_decompose(
df_co2_2000_2020["interpolated"], model="additive", extrapolate_trend="freq"
)
fig = stl_co2_21.plot()
fig.set_size_inches((18, 8)) ##
plt.show()
Nice plot! This time it is much easier to spot the seasonal oscillation in the CO2 concentration. The remainder seems quite noisy and is devoid of a particular pattern. This indicates that the seasonal decomposition did a good job in extracting the trend and seasonal components.
Exercise: Plot the data set for the 21st and overlay it with the linear trend extracted by STL.
## Your code here...
plt.figure(figsize=(18, 6))
plt.plot(stl_co2_21.observed, color="grey", linewidth=1)
plt.plot(stl_co2_21.trend, color="red", linewidth=1, linestyle="dashed")
plt.ylabel("$CO_2$ (ppm)")
plt.xlabel("Time")
plt.show()
A very typical use case is to apply STL for detrending a time series. We detrend our data set by subtracting the trend from the original data.
$$y_t^* = y_t- T_t$$detrended = df_co2_2000_2020["interpolated"] - stl_co2_21.trend
plt.figure(figsize=(18, 6))
detrended.plot()
plt.title("Seasonal variability of CO2 in the atmosphere")
plt.ylabel("CO2")
Text(0, 0.5, 'CO2')
Let us investigate the remainder and review if the residuals are normally distributed.
plt.figure(figsize=(18, 6))
plt.plot(stl_co2_21.resid)
plt.title("Residbual/Remainder")
plt.ylabel("CO2")
Text(0, 0.5, 'CO2')
Exercise: Create a QQ-plot using the
scipy.stats.probplot()function.
## Your code here...
import scipy.stats as stats
plt.figure(figsize=(18, 6))
plt.figure()
stats.probplot(stl_co2_21.resid, dist="norm", plot=plt)
plt.title("QQ-Plot")
plt.show()
<Figure size 1800x600 with 0 Axes>
# histogram plot
plt.figure(figsize=(9, 3))
plt.hist(stl_co2_21.resid)
plt.title("Residuals")
plt.show()
The residuals seem fairly well normally distributed. However, let us apply a Shapiro–Wilk test and check for normality.
Exercise: Apply a Shapiro–Wilk test and check the remainder for normality.
## Your code here...
# Shapiro-Wilk Test
from scipy.stats import shapiro
stat, p = shapiro(stl_co2_21.resid)
print("Statistics=%.3f, p=%.3f" % (stat, p))
# interpret
alpha = 0.05
if p > alpha:
print("Sample looks Gaussian (fail to reject H0)")
else:
print("Sample does not look Gaussian (reject H0)")
Statistics=0.987, p=0.017 Sample does not look Gaussian (reject H0)
The small p-value supports our believe, that the seasonal decomposition did a good job in extracting the trend and the seasonal components, as the residuals seems to be just noise.
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Please cite as follow: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.