STL is an acronym for "Seasonal and Trend decomposition using Loess". Loess (locally estimated scatterplot smoothing) is a method for estimating nonlinear relationships by fitting weighted local regressions.

In Python, the statsmodels.tsa.seasonal module provides the seasonal_decompose() function, which decomposes a time series into seasonal, trend and irregular components. Note that seasonal_decompose() performs a classical decomposition based on moving averages rather than Loess; the Loess-based procedure is available in the same module as the STL class. The function provides the optional argument model, which can be either "additive" or "multiplicative" and specifies how the components are combined.
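
Written out, the two model types look as follows. The symbols are chosen here for illustration: $y_t$ is the observed series, $T_t$ the trend, $S_t$ the seasonal component and $R_t$ the remainder.

$$y_t = T_t + S_t + R_t \quad \text{(additive)}$$

$$y_t = T_t \times S_t \times R_t \quad \text{(multiplicative)}$$

As a rule of thumb, the additive form suits series whose seasonal swing stays roughly constant in size, while the multiplicative form suits series whose swing grows with the level of the series.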

Let us apply the seasonal_decompose() function and plot the result with the convenient .plot() method of the result object.

In [2]:
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
In [3]:
df_co2 = pd.read_json(
    "http://userpage.fu-berlin.de/soga/soga-py/300/307000_time_series/KeelingCurve.json"
)
df_co2 = df_co2.set_index("Date")  # set datetimeindex
In [4]:
from statsmodels.tsa.seasonal import seasonal_decompose

# Time Series Decomposition
stl_co2 = seasonal_decompose(
    df_co2["interpolated"], model="additive", extrapolate_trend="freq"
)
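
To get a feeling for what a moving-average decomposition does, here is a minimal sketch of the additive case written with NumPy only. It is not the statsmodels implementation: the series is synthetic and noise-free, all variable names are made up for this illustration, and the edges of the trend are simply left out (the extrapolate_trend argument used above fills them by linear extrapolation instead).

```python
import numpy as np

period = 12
n_obs = 120  # ten years of hypothetical monthly data
t = np.arange(n_obs)

# Synthetic, noise-free series: linear trend plus a 12-month sine cycle
true_trend = 370.0 + 0.17 * t
true_seasonal = 3.0 * np.sin(2 * np.pi * t / period)
y = true_trend + true_seasonal

# Step 1: centered 2x12 moving average as the trend estimate.
# The half weights at both ends keep the window symmetric for an even period.
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
half = period // 2
trend_est = np.convolve(y, weights, mode="valid")  # covers t = half .. n_obs - half - 1

# Step 2: remove the trend, then average the detrended values per position in the cycle
detrended = y[half : n_obs - half] - trend_est
position = t[half : n_obs - half] % period
seasonal_means = np.array([detrended[position == m].mean() for m in range(period)])
seasonal_means -= seasonal_means.mean()  # center the seasonal figure around zero

# Step 3: repeat the seasonal figure over the whole series; what is left is the remainder
seasonal_est = seasonal_means[t % period]
resid_est = detrended - seasonal_est[half : n_obs - half]
```

Because the synthetic series contains no noise, the moving average recovers the linear trend in the interior of the series and the remainder is numerically zero. With real data such as the Keeling curve, the remainder collects everything that the trend and the repeating seasonal figure cannot explain.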
In [5]:
print(stl_co2.trend)
print(stl_co2.seasonal)
print(stl_co2.resid)
print(stl_co2.observed)
Date
1958-03-01    315.017963
1958-04-01    315.078159
1958-05-01    315.138354
1958-06-01    315.198549
1958-07-01    315.258744
                 ...    
2022-05-01    418.189994
2022-06-01    418.356774
2022-07-01    418.523554
2022-08-01    418.690335
2022-09-01    418.857115
Name: trend, Length: 775, dtype: float64
Date
1958-03-01    1.426096
1958-04-01    2.578268
1958-05-01    3.023456
1958-06-01    2.324759
1958-07-01    0.661465
                ...   
2022-05-01    3.023456
2022-06-01    2.324759
2022-07-01    0.661465
2022-08-01   -1.482930
2022-09-01   -3.167891
Name: seasonal, Length: 775, dtype: float64
Date
1958-03-01   -0.744059
1958-04-01   -0.206427
1958-05-01   -0.651809
1958-06-01   -0.283308
1958-07-01   -0.060210
                ...   
2022-05-01   -0.223449
2022-06-01    0.308467
2022-07-01   -0.285020
2022-08-01   -0.017404
2022-09-01    0.260776
Name: resid, Length: 775, dtype: float64
Date
1958-03-01    315.70
1958-04-01    317.45
1958-05-01    317.51
1958-06-01    317.24
1958-07-01    315.86
               ...  
2022-05-01    420.99
2022-06-01    420.99
2022-07-01    418.90
2022-08-01    417.19
2022-09-01    415.95
Name: interpolated, Length: 775, dtype: float64
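
Since we chose the additive model, trend, seasonal component and remainder should add up to the observed series. A quick sanity check, using only the first five rows copied from the printed output above (the list names are chosen for this illustration):

```python
# First five rows (1958-03 to 1958-07) copied from the printed output above
trend = [315.017963, 315.078159, 315.138354, 315.198549, 315.258744]
seasonal = [1.426096, 2.578268, 3.023456, 2.324759, 0.661465]
resid = [-0.744059, -0.206427, -0.651809, -0.283308, -0.060210]
observed = [315.70, 317.45, 317.51, 317.24, 315.86]

# Additive model: observed = trend + seasonal + remainder
reconstructed = [t + s + r for t, s, r in zip(trend, seasonal, resid)]

# Largest deviation between the reconstruction and the observed values
max_abs_error = max(abs(o - rec) for o, rec in zip(observed, reconstructed))
print(max_abs_error)
```

The deviation stays within the rounding of the printed values. Note also that the seasonal component repeats exactly from year to year: the value for May 1958 is the same as the value for May 2022.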

These four time series can be plotted directly from the result object by calling its plot() method. For example:

In [6]:
# The plot() method creates its own figure, so we change the figure size afterwards.
fig = stl_co2.plot()
fig.set_size_inches((18, 8))
plt.show()

The plot shows four panels. The uppermost panel shows the raw data, the Keeling curve. The panels below show the trend component, the seasonal component and the remainder. The data set is dominated by a very strong, almost linear upward trend, which makes the comparatively small seasonal oscillation hard to see. Let us redo the analysis, this time focusing on the 21st century.

In [7]:
df_co2_2000_2020 = df_co2["2000-01-01":"2021-12-31"]
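
Label-based slicing on a DatetimeIndex includes both end points, so the slice above runs from January 2000 up to and including December 2021. The following sketch illustrates this on a hypothetical stand-in for df_co2 (a straight line on the same monthly date range as the printed output; the names demo and subset are made up here), using the explicit .loc form of the same slice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_co2: one row per month, 1958-03 to 2022-09
index = pd.date_range("1958-03-01", "2022-09-01", freq="MS")
demo = pd.DataFrame({"interpolated": np.linspace(315.0, 416.0, len(index))}, index=index)

# Both end points of a label-based slice are inclusive
subset = demo.loc["2000-01-01":"2021-12-31"]

first_month, last_month = subset.index[0], subset.index[-1]
n_months = len(subset)
print(first_month.date(), last_month.date(), n_months)
```

The subset therefore holds 22 complete years of monthly values, which is convenient for a decomposition with a period of 12.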
In [8]:
# Time Series Decomposition
stl_co2_21 = seasonal_decompose(
    df_co2_2000_2020["interpolated"], model="additive", extrapolate_trend="freq"
)
In [9]:
fig = stl_co2_21.plot()
fig.set_size_inches((18, 8))
plt.show()

Nice plot! This time it is much easier to spot the seasonal oscillation in the CO2 concentration. The remainder seems quite noisy and is devoid of a particular pattern. This indicates that the seasonal decomposition did a good job in extracting the trend and seasonal components.


Exercise: Plot the data set for the 21st century and overlay it with the trend component extracted by the decomposition.

In [10]:
## Your code here...
In [11]:
plt.figure(figsize=(18, 6))


plt.plot(stl_co2_21.observed, color="grey", linewidth=1)
plt.plot(stl_co2_21.trend, color="red", linewidth=1, linestyle="dashed")

plt.ylabel("$CO_2$ (ppm)")
plt.xlabel("Time")

plt.show()

A very typical use case for such a decomposition is detrending a time series. We detrend our data set by subtracting the trend component $T_t$ from the original data $y_t$.

$$y_t^* = y_t- T_t$$
In [12]:
detrended = df_co2_2000_2020["interpolated"] - stl_co2_21.trend
In [13]:
plt.figure(figsize=(18, 6))

detrended.plot()
plt.title("Seasonal variability of $CO_2$ in the atmosphere")
plt.ylabel("Detrended $CO_2$ (ppm)")
plt.xlabel("Time")

plt.show()