In many time series applications we are interested in the behavior of a series without its short-term fluctuations. Smoothing, sometimes referred to as low-pass filtering, attenuates the high-frequency components of a signal while passing its low-frequency components largely unchanged. Smoothing replaces each data point with an average of itself and its neighbors in the series, which softens sharp jumps and spikes. These methods are especially useful in time series analysis because, once a low-pass filter has been applied, overall trends and deviations from those trends become easier to detect.
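To make the idea of "averaging neighbors acts as a low-pass filter" concrete, here is a minimal sketch using a centered moving average, one of the simplest smoothing methods, via pandas `Series.rolling`. The data are synthetic and hypothetical (a linear trend plus a fast oscillation whose period equals the window length); they are not the temperature record used below.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic series: a slow linear trend plus a fast oscillation
# with a period of 7 samples (the "high-frequency" part we want to remove).
n = 120
t = np.arange(n)
trend = 0.01 * t
fast = 0.5 * np.sin(2 * np.pi * t / 7)
series = pd.Series(trend + fast)

# Centered moving average over 7 samples: each value becomes the mean of
# itself, the 3 values before it and the 3 values after it.
# The first and last 3 positions lack a full window and become NaN.
smoothed = series.rolling(window=7, center=True).mean()

# Averaging a full period of the oscillation cancels it, while the mean of a
# linear trend over a symmetric window equals the trend at the center.
residual = smoothed - pd.Series(trend)
print("largest fast component before smoothing:", np.abs(fast).max())
print("largest deviation from trend after smoothing:", residual.abs().max())
```

The oscillation is removed almost perfectly here only because the window length matches its period; for real data with mixed frequencies, a moving average damps fast fluctuations rather than eliminating them.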
To demonstrate, we apply several smoothing methods to the global mean monthly temperature anomalies provided by the Berkeley Earth Surface Temperature Study. Revisit the section on the data sets used to recall how we downloaded and extracted the data of interest.
First, we load the data set (`t_global`), examine its structure and then plot it.
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
t_global = pd.read_json("../data/t_global.json")
t_global["Date"] = pd.to_datetime(t_global["Date"], format="%Y-%m", errors="coerce")
t_global
|  | Date | Monthly Anomaly_global | Monthly Unc_global |
|---|---|---|---|
| 0 | 1750-01-01 | -0.927 | 3.253 |
| 1 | 1750-02-01 | -1.645 | 3.612 |
| 2 | 1750-03-01 | -0.150 | 3.448 |
| 3 | 1750-04-01 | -0.427 | 1.731 |
| 4 | 1750-05-01 | -1.750 | 1.331 |
| ... | ... | ... | ... |
| 3273 | 2022-10-01 | 1.581 | 0.059 |
| 3274 | 2022-11-01 | 0.731 | 0.074 |
| 3275 | 2022-12-01 | 0.971 | 0.073 |
| 3276 | 2023-01-01 | 1.209 | 0.097 |
| 3277 | 2023-02-01 | 1.360 | 0.059 |
3278 rows × 3 columns
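The loading step converts the `Date` column with `pd.to_datetime(..., format="%Y-%m", errors="coerce")`. The sketch below shows what those two arguments do on a few hypothetical strings (the malformed entry is made up for illustration): a `YYYY-MM` value becomes the first day of that month, and a value that does not match the format becomes `NaT` instead of stopping the import with an error.

```python
import pandas as pd

# Hypothetical raw values in "YYYY-MM" form, plus one malformed entry.
raw = pd.Series(["1750-01", "1750-02", "not a date"])

# format="%Y-%m" reads year and month; the day defaults to the 1st.
# errors="coerce" turns strings that do not match the format into NaT.
dates = pd.to_datetime(raw, format="%Y-%m", errors="coerce")
print(dates)

# With the default errors="raise", the malformed entry raises a ValueError.
try:
    pd.to_datetime(raw, format="%Y-%m")
except ValueError as exc:
    print("raised:", type(exc).__name__)
```

Coercing is convenient for a quick import, but it silently hides bad rows, so counting the resulting `NaT` values (e.g. with `dates.isna().sum()`) is a cheap sanity check.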
Next, we plot the time series directly with pandas, which formats the dates on the x-axis automatically:
### plot ###
t_global.plot(x="Date", y="Monthly Anomaly_global", legend=False)
plt.xlabel("Date", fontsize=12)
plt.grid(color="lightgrey", linestyle="-", linewidth=0.5)
plt.title("Global mean monthly temperature anomalies", fontsize=12)
Figure: Global mean monthly temperature anomalies (x-axis: Date)
Temperatures are given in degrees Celsius and are reported as anomalies relative to the average over the period January 1951 to December 1980.
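As an illustration of what "anomaly relative to a reference-period average" means, the sketch below subtracts the January 1951 to December 1980 mean from a series of hypothetical absolute monthly temperatures (random made-up numbers, not Berkeley Earth values). It follows the single reference average described above; climate data sets often compute the reference separately for each calendar month instead, and this sketch does not claim which variant the Berkeley Earth record uses.

```python
import numpy as np
import pandas as pd

# Hypothetical absolute monthly temperatures in degrees Celsius (made-up data).
idx = pd.date_range("1950-01-01", "1982-12-01", freq="MS")
rng = np.random.default_rng(0)
temp = pd.Series(14.0 + rng.normal(0.0, 0.3, len(idx)), index=idx)

# Reference average over January 1951 to December 1980 (360 monthly values).
reference = temp.loc["1951-01-01":"1980-12-31"]
baseline = reference.mean()

# Anomaly: departure of each monthly value from the reference average.
anomaly = temp - baseline
print(f"reference average: {baseline:.3f} degrees C")
print(anomaly.head())
```

By construction, the anomalies average to zero over the reference period, so positive values mark months warmer than the 1951 to 1980 average.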
Exercise: Subset the time series to the period from 1850 until 2015 and plot it.
## Your code here...
### period from 1850 to 2015, monthly data ###
t_global_1850_2015 = t_global.set_index("Date").loc[
    "1850-01-01":"2015-12-31", "Monthly Anomaly_global"
]
### plot ###
plt.figure(figsize=(18, 6))
# the subset is a Series indexed by Date, so no x/y arguments are needed
t_global_1850_2015.plot(legend=False)
plt.xlabel("Date", fontsize=12)
plt.grid(color="lightgrey", linestyle="-", linewidth=0.5)
plt.title("Global mean monthly temperature anomalies", fontsize=12)
Figure: Global mean monthly temperature anomalies, 1850 to 2015 (x-axis: Date)
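The solution above slices by date after turning `Date` into a `DatetimeIndex`. A minimal sketch of two equivalent ways to do this subset, on a hypothetical all-zero monthly frame that only mimics the column layout of `t_global`, is shown below: label slicing with `.loc` (both end points inclusive) and a boolean mask built with `Series.between` on the unindexed `Date` column.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly frame with the same column names as t_global.
dates = pd.date_range("1840-01-01", "2020-12-01", freq="MS")
df = pd.DataFrame({"Date": dates, "Monthly Anomaly_global": np.zeros(len(dates))})

start, end = pd.Timestamp("1850-01-01"), pd.Timestamp("2015-12-31")

# Option 1: DatetimeIndex plus label slicing with .loc (inclusive bounds).
sub_loc = df.set_index("Date").loc[start:end, "Monthly Anomaly_global"]

# Option 2: keep Date as a column and filter with a boolean mask.
mask = df["Date"].between(start, end)
sub_mask = df.loc[mask, "Monthly Anomaly_global"]

# 166 years x 12 months in both cases
print(len(sub_loc), len(sub_mask))
```

The `.loc` variant keeps the dates as the index, which pandas plotting then uses for the x-axis; the mask variant keeps the original integer index and the `Date` column.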
We store the data set in a `.json` file for further processing.
# convert series to dataframe
t_global_1850_2015 = t_global_1850_2015.to_frame(name="temp").reset_index()
# store as json
t_global_1850_2015.to_json("../data/t_global_1850_2015.json", date_format="iso")
# to reuse the stored data later, read the json file back in
t_global_1850_2015 = pd.read_json("../data/t_global_1850_2015.json")
t_global_1850_2015
|  | Date | temp |
|---|---|---|
| 0 | 1850-01-01 | -1.859 |
| 1 | 1850-02-01 | -0.104 |
| 2 | 1850-03-01 | -0.478 |
| 3 | 1850-04-01 | -1.079 |
| 4 | 1850-05-01 | -1.311 |
| ... | ... | ... |
| 1987 | 2015-08-01 | 0.948 |
| 1988 | 2015-09-01 | 0.952 |
| 1989 | 2015-10-01 | 1.586 |
| 1990 | 2015-11-01 | 1.373 |
| 1991 | 2015-12-01 | 1.861 |
1992 rows × 2 columns
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.