In many time series applications we are interested in the behavior of a series without its short-term fluctuations. Smoothing, sometimes referred to as low-pass filtering, comprises methods that attenuate the high-frequency components of a signal while retaining the low-frequency components. Smoothing involves averaging each data point with its neighbors in the series, which softens sharp edges. These methods are especially useful in time series analysis because, after low-pass filtering, overall trends and deviations from them become easier to detect.
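As a minimal sketch of the idea of averaging each point with its neighbors, the following example applies a centered moving average to a synthetic series. The data, the 12-month window, and the variable names are assumptions chosen for illustration; they are not part of the Berkeley Earth data set.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (hypothetical data): slow linear trend plus random noise.
rng = np.random.default_rng(42)
dates = pd.date_range("2000-01-01", periods=240, freq="MS")
trend = np.linspace(0.0, 1.0, len(dates))
noisy = pd.Series(trend + rng.normal(0.0, 0.3, len(dates)), index=dates)

# Centered 12-month moving average: each value is replaced by the mean of
# the window of neighboring values around it.
smoothed = noisy.rolling(window=12, center=True).mean()

# Month-to-month changes are much smaller after smoothing,
# while the slow trend is retained.
print(noisy.diff().std(), smoothed.diff().std())
```

The first and last few values of `smoothed` are missing because the window is incomplete at the edges of the series.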

For the purpose of demonstration we discuss some smoothing methods by applying them to the global mean monthly temperature anomalies provided by the Berkeley Earth Surface Temperature Study. Revisit the section on data sets used to remind yourself how we downloaded and extracted the data of interest.

First, we load the data set (t_global), examine its structure, and then plot it.

In [2]:
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
In [3]:
t_global = pd.read_json("../data/t_global.json")
t_global["Date"] = pd.to_datetime(t_global["Date"], format="%Y-%m", errors="coerce")
t_global
Out[3]:
           Date  Monthly Anomaly_global  Monthly Unc_global
0    1750-01-01                  -0.927               3.253
1    1750-02-01                  -1.645               3.612
2    1750-03-01                  -0.150               3.448
3    1750-04-01                  -0.427               1.731
4    1750-05-01                  -1.750               1.331
...         ...                     ...                 ...
3273 2022-10-01                   1.581               0.059
3274 2022-11-01                   0.731               0.074
3275 2022-12-01                   0.971               0.073
3276 2023-01-01                   1.209               0.097
3277 2023-02-01                   1.360               0.059

3278 rows × 3 columns

Next, we plot the time series. Because the Date column holds datetime values, pandas formats the tick labels on the x-axis as dates automatically:

In [4]:
### plot ###

t_global.plot(x="Date", y="Monthly Anomaly_global", legend=False)
plt.xlabel("Date", fontsize=12)
plt.grid(color="lightgrey", linestyle="-", linewidth=0.5)
plt.title("Global mean monthly temperature anomalies", fontsize=12)
Out[4]:
Text(0.5, 1.0, 'Global mean monthly temperature anomalies')

Temperatures are given in degrees Celsius and are reported as anomalies relative to the January 1951 to December 1980 average.
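To make the notion of an anomaly relative to a reference period concrete, here is a small sketch using synthetic absolute temperatures. It shows one common way to compute anomalies, namely subtracting the reference-period mean of the same calendar month. The data and all variable names are hypothetical, and the procedure Berkeley Earth actually uses may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical absolute monthly temperatures in degrees Celsius
# (synthetic values, not Berkeley Earth data).
dates = pd.date_range("1940-01-01", "1990-12-01", freq="MS")
rng = np.random.default_rng(0)
months = dates.month.to_numpy()
seasonal = 10.0 * np.sin(2 * np.pi * (months - 1) / 12)
temps = pd.Series(14.0 + seasonal + rng.normal(0.0, 0.5, len(dates)), index=dates)

# Mean of each calendar month over the reference period 1951-01 to 1980-12.
baseline = temps["1951-01":"1980-12"]
climatology = baseline.groupby(baseline.index.month).mean()

# Anomaly = observation minus the reference mean of the same calendar month.
anomaly = temps - climatology.loc[months].to_numpy()

# By construction, the anomalies average to zero over the reference period.
print(anomaly["1951-01":"1980-12"].mean())
```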

Exercise: Subset the time series to the period from 1850 until 2015 and plot it.

In [5]:
## Your code here...

In [7]:
### period from 1850 to 2015, monthly data ###
t_global_1850_2015 = t_global.set_index(["Date"])
t_global_1850_2015 = t_global_1850_2015["1850-01-01":"2015-12-31"][
    "Monthly Anomaly_global"
]

### plot ###
plt.figure(figsize=(18, 6))
# t_global_1850_2015 is a Series indexed by date, so no x/y arguments are needed
t_global_1850_2015.plot(legend=False)
plt.xlabel("Date", size=12)
plt.grid(color="lightgrey", linestyle="-", linewidth=0.5)
plt.title("Global mean monthly temperature anomalies", size=12)
Out[7]:
Text(0.5, 1.0, 'Global mean monthly temperature anomalies')

We store the data set in a .json file for further processing.

In [8]:
# convert series to dataframe
t_global_1850_2015 = t_global_1850_2015.to_frame(name="temp").reset_index()
In [9]:
# store as json
t_global_1850_2015.to_json("../data/t_global_1850_2015.json", date_format="iso")
In [10]:
# if you want to use the file:
# read the respective json file
t_global_1850_2015 = pd.read_json("../data/t_global_1850_2015.json")
t_global_1850_2015
Out[10]:
           Date   temp
0    1850-01-01 -1.859
1    1850-02-01 -0.104
2    1850-03-01 -0.478
3    1850-04-01 -1.079
4    1850-05-01 -1.311
...         ...    ...
1987 2015-08-01  0.948
1988 2015-09-01  0.952
1989 2015-10-01  1.586
1990 2015-11-01  1.373
1991 2015-12-01  1.861

1992 rows × 2 columns
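When a data frame is written to JSON and read back, it can be worth checking that the date column is still parsed as datetime values. The sketch below does this with a small stand-in data frame and an in-memory buffer instead of a file on disk. The three rows and the names `df`, `json_text`, and `restored` are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical stand-in for t_global_1850_2015 with the same column names.
df = pd.DataFrame(
    {
        "Date": pd.date_range("1850-01-01", periods=3, freq="MS"),
        "temp": [-1.859, -0.104, -0.478],
    }
)

# Serialize to a JSON string with ISO 8601 dates (no file is written).
json_text = df.to_json(date_format="iso")

# Read the string back and name the column to parse as dates explicitly.
restored = pd.read_json(io.StringIO(json_text), convert_dates=["Date"])

# Inspect the column types after the round trip.
print(restored.dtypes)
```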


Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.