LOWESS (locally weighted scatterplot smoothing) is a non-parametric smoothing method, closely related to nearest neighbor regression. The method uses points that are nearby in time to obtain the smoothed estimate of $f_t$.

First, a certain proportion of the nearest neighbors of $x_t$ are weighted, with values closer to $x_t$ in time receiving more weight. Then, a robust weighted regression is fitted to these neighbors and used to predict $x_t$, which gives the smoothed estimate of $f_t$. The degree of smoothing is controlled by the size of the neighborhood: the larger the fraction of nearest neighbors included, the smoother the estimate.
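The two steps above can be sketched for a single time point as follows. This is a simplified illustration, not the routine used later in this section: the helper name local_linear_fit is hypothetical, the tricube weight function is an assumption (it is the classic choice for LOWESS, but the text above does not name one), and the robustness iterations that down-weight outliers are left out.

```python
import numpy as np


def local_linear_fit(x, y, x0, frac=0.5):
    """Smoothed value at x0 from a weighted line through its nearest neighbors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = max(2, int(np.ceil(frac * len(x))))  # neighborhood size

    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]               # the k nearest neighbors of x0
    h = dist[idx].max()                      # radius of the neighborhood
    if h == 0:                               # all neighbors coincide with x0
        return y[idx].mean()

    # Tricube weights (assumed): 1 at x0, falling to 0 at the neighborhood edge
    w = (1 - (dist[idx] / h) ** 3) ** 3

    # Weighted least squares for the line y = b0 + b1 * (x - x0)
    X = np.column_stack([np.ones(k), x[idx] - x0])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
    return beta[0]                           # intercept = fitted value at x0


# Synthetic example: a linear trend plus noise (values are made up)
rng = np.random.default_rng(0)
t = np.arange(100.0)
series = 0.05 * t + rng.normal(scale=0.5, size=t.size)
smoothed = np.array([local_linear_fit(t, series, t0, frac=0.3) for t0 in t])
```

Repeating the fit at every time point yields the smoothed curve. Increasing frac widens each neighborhood, so each estimate averages over more points.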

In [2]:
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
import statsmodels.api as sm  # module to build a LOWESS model

In Python, LOWESS is implemented in the lowess() function of the statsmodels package (available as sm.nonparametric.lowess). The argument frac gives the fraction of the data points used to compute the smoothed estimate at each value. Larger values give a smoother result. The default value is frac = 2/3.

See the statsmodels documentation of the function (statsmodels.nonparametric.smoothers_lowess.lowess) for the remaining arguments.

In [3]:
# read the Earth_Surface_Temperature data, processed in the previous section
# read the data from .json file
t_global = pd.read_json(
    "http://userpage.fu-berlin.de/soga/soga-py/300/307000_time_series/t_global.json"
)
t_global["Date"] = pd.to_datetime(t_global["Date"], format="%Y-%m-%d", errors="coerce")

# subset
temp_global = t_global.set_index("Date")["1850-01-01":"2021-12-31"][
    "Monthly Anomaly_global"
]
In [4]:
lowess = sm.nonparametric.lowess

## note, default frac=2/3
temp_lowess = lowess(endog=temp_global.values, exog=temp_global.index)
## convert to pd.Series with DatetimeIndex for easy plotting
temp_lowess = pd.Series(temp_lowess[:, 1], index=temp_global.index)


## frac = 0.1
temp_lowess_1 = lowess(endog=temp_global.values, exog=temp_global.index, frac=0.1)
temp_lowess_1 = pd.Series(temp_lowess_1[:, 1], index=temp_global.index)

## frac = 0.01
temp_lowess_2 = lowess(endog=temp_global.values, exog=temp_global.index, frac=0.01)
temp_lowess_2 = pd.Series(temp_lowess_2[:, 1], index=temp_global.index)
In [5]:
# Plot the monthly anomalies together with the three LOWESS fits
fig, ax = plt.subplots(figsize=(18, 6))

ax.plot(
    temp_global,
    marker=".",
    linestyle="-",
    linewidth=0.5,
    label="Monthly anomaly",
    color="lightgrey",
)

ax.plot(
    temp_lowess,
    marker="o",
    markersize=2,
    linestyle="-",
    label="frac = 2/3 (default)",
)


ax.plot(
    temp_lowess_1,
    marker="o",
    markersize=2,
    linestyle="-",
    label="frac = 0.1",
)

ax.plot(
    temp_lowess_2,
    marker="o",
    markersize=2,
    linestyle="-",
    label="frac = 0.01",
)


ax.set_title("Locally weighted scatterplot smoothing (LOWESS)")
ax.legend()

plt.show()

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.