Smoothing

In many applications of time series analysis we are interested in the behavior of a series without its short-term fluctuations. Smoothing, sometimes referred to as low-pass filtering, comprises methods that attenuate the high-frequency part of a signal while preserving the low-frequency part. Smoothing involves averaging data points with their neighbors in a series, such as a time series, which reduces sharp edges. In time series analysis these methods are especially useful because, after low-pass filtering, overall trends and deviations from them become easier to detect.
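
The idea of averaging each data point with its neighbors can be sketched with a centered moving average. This is only an illustration on synthetic data: the series, the noise level and the 12-month window are assumptions chosen for the example, not part of the data set used in this section.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (assumption): a slow linear trend plus random noise.
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
trend = np.linspace(0.0, 1.0, num=120)
noisy = pd.Series(trend + rng.normal(scale=0.3, size=120), index=dates)

# Centered 12-month moving average: each value is replaced by the mean of
# the window around it. Near both ends the window is incomplete, so the
# result is NaN there.
smoothed = noisy.rolling(window=12, center=True).mean()

# Compare the scatter around the underlying trend before and after smoothing.
print((noisy - trend).std(), (smoothed - trend).std())
```

A wider window removes more of the short-term variation, but it also loses more values at both ends of the series.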

For the purpose of demonstration we discuss some smoothing methods by applying them to the global mean monthly temperature anomalies provided by the Berkeley Earth Surface Temperature Study. Revisit the section on data sets used to remind yourself how we downloaded and extracted the data of interest.

First, we load the data set (t_global), examine its structure, and then plot it.

In [2]:

# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime

In [3]:

t_global = pd.read_json("../data/t_global.json")
t_global["Date"] = pd.to_datetime(t_global["Date"], format="%Y-%m", errors="coerce")
t_global

Out[3]:

	Date	Monthly Anomaly_global	Monthly Unc_global
0	1750-01-01	-0.927	3.253
1	1750-02-01	-1.645	3.612
2	1750-03-01	-0.150	3.448
3	1750-04-01	-0.427	1.731
4	1750-05-01	-1.750	1.331
...	...	...	...
3273	2022-10-01	1.581	0.059
3274	2022-11-01	0.731	0.074
3275	2022-12-01	0.971	0.073
3276	2023-01-01	1.209	0.097
3277	2023-02-01	1.360	0.059

3278 rows × 3 columns
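
A short note on the date parsing used above: with format="%Y-%m" the missing day defaults to the first of the month, and errors="coerce" turns values that do not match the format into NaT instead of raising an error. The following sketch uses a small made-up series to show this behavior.

```python
import pandas as pd

# Made-up input (assumption): two valid year-month strings and one invalid entry.
raw = pd.Series(["1750-01", "1750-02", "not a date"])

# Entries that cannot be parsed with the given format become NaT.
parsed = pd.to_datetime(raw, format="%Y-%m", errors="coerce")
print(parsed)

# Counting missing values after parsing reveals silently coerced entries.
print(parsed.isna().sum())
```

Because coercion hides malformed entries, it may be worth checking the number of NaT values before continuing.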

Next, we plot the time series directly from the data frame, so that pandas formats the dates on the x-axis automatically:

In [4]:

### plot ###

t_global.plot(x="Date", y="Monthly Anomaly_global", legend=False)
plt.xlabel("Date", fontsize=12)
plt.grid(color="lightgrey", linestyle="-", linewidth=0.5)
plt.title("Global mean monthly temperature anomalies", fontsize=12)

Out[4]:

Text(0.5, 1.0, 'Global mean monthly temperature anomalies')

Temperatures are given in degrees Celsius and are reported as anomalies relative to the average over the period from January 1951 to December 1980.
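
To make the notion of an anomaly relative to a reference period more concrete, the sketch below computes anomalies from synthetic absolute temperatures. It shows one common approach (subtracting the mean of each calendar month over the reference period) and is not necessarily the procedure used by Berkeley Earth; the data and all variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic absolute monthly temperatures (assumption, for illustration only).
dates = pd.date_range("1940-01-01", "1999-12-01", freq="MS")
rng = np.random.default_rng(seed=1)
months = dates.month.to_numpy()
seasonal = 10.0 * np.sin(2 * np.pi * (months - 1) / 12)
temps = pd.Series(14.0 + seasonal + rng.normal(scale=0.2, size=len(dates)), index=dates)

# Reference climatology: mean of each calendar month over 1951-1980.
baseline = temps.loc["1951-01":"1980-12"]
climatology = baseline.groupby(baseline.index.month).mean()

# Anomaly = observation minus the reference mean of the same calendar month.
anomalies = temps - climatology.reindex(temps.index.month).to_numpy()

# By construction the anomalies average to zero over the reference period.
print(anomalies.loc["1951-01":"1980-12"].mean())
```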

Exercise: Subset the time series to the period from 1850 until 2015 and plot it.

In [5]:

## Your code here...

In [7]:

### subset the monthly data to the period from 1850 to 2015 ###
t_global_1850_2015 = t_global.set_index("Date")
t_global_1850_2015 = t_global_1850_2015.loc[
    "1850-01-01":"2015-12-31", "Monthly Anomaly_global"
]

### plot ###
plt.figure(figsize=(18, 6))
# the Series carries the dates in its index, so no x/y arguments are needed
t_global_1850_2015.plot()
plt.xlabel("Date", size=12)
plt.grid(color="lightgrey", linestyle="-", linewidth=0.5)
plt.title("Global mean monthly temperature anomalies", size=12)

Out[7]:

Text(0.5, 1.0, 'Global mean monthly temperature anomalies')
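
The subsetting step relies on label-based slicing of a DatetimeIndex, which includes both end points and also accepts partial date strings such as a year. A minimal sketch on a made-up monthly series (the values are placeholders, not temperature data):

```python
import pandas as pd

# Made-up monthly series (assumption) that starts before 1850 and ends after 2015.
dates = pd.date_range("1849-11-01", "2016-02-01", freq="MS")
series = pd.Series(range(len(dates)), index=dates)

# A partial string such as "1850" covers the whole year, and the slice
# includes both end points.
subset = series.loc["1850":"2015"]

# 166 years with 12 months each give 1992 values.
print(subset.index.min(), subset.index.max(), len(subset))
```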

We store the data set in a .json file for further processing.

In [8]:

# convert series to dataframe
t_global_1850_2015 = t_global_1850_2015.to_frame(name="temp").reset_index()

In [9]:

# store as json
t_global_1850_2015.to_json("../data/t_global_1850_2015.json", date_format="iso")

In [10]:

# if you want to use the file:
# read the respective json file
t_global_1850_2015 = pd.read_json("../data/t_global_1850_2015.json")
t_global_1850_2015

Out[10]:

	Date	temp
0	1850-01-01	-1.859
1	1850-02-01	-0.104
2	1850-03-01	-0.478
3	1850-04-01	-1.079
4	1850-05-01	-1.311
...	...	...
1987	2015-08-01	0.948
1988	2015-09-01	0.952
1989	2015-10-01	1.586
1990	2015-11-01	1.373
1991	2015-12-01	1.861

1992 rows × 2 columns

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.