In the following sections we introduce basic operations for time series analysis. We discuss the following topics:

  • Subsetting and indexing
  • Summary statistics
  • Aggregation of time series data

As mentioned in the previous section, there are several different ways to work with time series in Python. Hence, it is important to be aware of the object class and, accordingly, the data representation. This representation dictates which functions are available for loading, processing, analyzing, printing, and plotting the time series data.


Loading the sample data

For the purpose of demonstration, we load the monthly (ts_FUB_monthly), daily (ts_FUB_daily) and hourly (ts_FUB_hourly) time series data for the weather station Berlin-Dahlem (FU) into Python using the pandas.read_json() function. See the previous section on the data sets used for a reminder of how we processed the data.

In [2]:
# First, let's import the needed libraries.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
In [3]:
ts_FUB_monthly = pd.read_json("../data/ts_FUB_monthly.json")
ts_FUB_monthly["Date"] = pd.to_datetime(
    ts_FUB_monthly["Date"], format="%Y-%m-%d", errors="coerce"
)

ts_FUB_daily = pd.read_json("../data/ts_FUB_daily.json")
ts_FUB_daily["MESS_DATUM"] = pd.to_datetime(
    ts_FUB_daily["MESS_DATUM"], format="%Y-%m-%d", errors="coerce"
)

ts_FUB_hourly = pd.read_json("../data/ts_FUB_hourly.json")
ts_FUB_hourly["MESS_DATUM"] = pd.to_datetime(
    ts_FUB_hourly["MESS_DATUM"], format="%Y-%m-%d", errors="coerce"
)
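
The loading pattern used above (read the JSON, then parse the date column explicitly) can be tried without the data files. The following is a minimal sketch under stated assumptions: the JSON text is a hypothetical two-row stand-in written in "records" layout, which may differ from the layout of the real files, and demo_json and demo_df are illustrative names only.

```python
import io

import pandas as pd

# Hypothetical stand-in for a JSON file; the real files may use another layout.
demo_json = (
    '[{"Date": "2021-11-01", "rainfall": 6.28},'
    ' {"Date": "2021-12-01", "rainfall": 2.19}]'
)

# convert_dates=False keeps the dates as plain strings, so the explicit
# pd.to_datetime() call below is the step that actually parses them.
demo_df = pd.read_json(io.StringIO(demo_json), orient="records", convert_dates=False)

# errors="coerce" turns values that do not match the format into NaT
# instead of raising an exception.
demo_df["Date"] = pd.to_datetime(demo_df["Date"], format="%Y-%m-%d", errors="coerce")

print(demo_df.dtypes)
```

Note that errors="coerce" silently replaces unparseable values with NaT, so it can be worth checking demo_df["Date"].isna().sum() after loading.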

First, we check the object classes of the three data sets:

In [4]:
print(type(ts_FUB_monthly))
print(str(ts_FUB_monthly))
<class 'pandas.core.frame.DataFrame'>
           Date  rainfall
0    1719-01-01      2.80
1    1719-02-01      1.10
2    1719-03-01      5.20
3    1719-04-01      9.00
4    1719-05-01     15.10
...         ...       ...
3631 2021-08-01     17.43
3632 2021-09-01     15.55
3633 2021-10-01     10.49
3634 2021-11-01      6.28
3635 2021-12-01      2.19

[3636 rows x 2 columns]
In [5]:
print(type(ts_FUB_daily))
print(str(ts_FUB_daily))
<class 'pandas.core.frame.DataFrame'>
      MESS_DATUM  Temp  Rain
0     1950-01-01  -3.2   2.2
1     1950-01-02   1.0  12.6
2     1950-01-03   2.8   0.5
3     1950-01-04  -0.1   0.5
4     1950-01-05  -2.8  10.3
...          ...   ...   ...
26293 2021-12-27  -3.7   0.0
26294 2021-12-28  -0.5   1.5
26295 2021-12-29   4.0   0.3
26296 2021-12-30   9.0   3.2
26297 2021-12-31  12.8   5.5

[26298 rows x 3 columns]
In [6]:
print(type(ts_FUB_hourly))
print(str(ts_FUB_hourly))
<class 'pandas.core.frame.DataFrame'>
                MESS_DATUM  rainfall
0      2002-01-28 11:00:00       0.0
1      2002-01-28 13:00:00       0.0
2      2002-01-28 15:00:00       1.7
3      2002-01-28 18:00:00       1.1
4      2002-01-28 21:00:00       0.0
...                    ...       ...
174018 2021-12-31 19:00:00       0.7
174019 2021-12-31 20:00:00       0.7
174020 2021-12-31 21:00:00       0.1
174021 2021-12-31 22:00:00       0.1
174022 2021-12-31 23:00:00       0.0

[174023 rows x 2 columns]

All three data sets are of class pandas.DataFrame.
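
The printed types above show pandas.core.frame.DataFrame. A single column of a DataFrame, by contrast, is a pandas.Series, and the dates only become a DatetimeIndex once they are moved from a column into the index. The following minimal sketch illustrates this by rebuilding the last three daily rows shown above by hand; demo_df and demo_series are hypothetical names and not part of the tutorial data.

```python
import pandas as pd

# Hand-built copy of the last three rows of the daily output shown above.
demo_df = pd.DataFrame(
    {
        "MESS_DATUM": pd.to_datetime(["2021-12-29", "2021-12-30", "2021-12-31"]),
        "Temp": [4.0, 9.0, 12.8],
    }
)

print(type(demo_df))          # the table as a whole
print(type(demo_df["Temp"]))  # one column of the table

# Moving the date column into the index gives a date-indexed Series.
demo_series = demo_df.set_index("MESS_DATUM")["Temp"]
print(type(demo_series.index))
```

Which representation is more convenient depends on the task; the code cells in this section keep the dates in an ordinary column.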


Plotting

Now let us plot the monthly data with the plt.plot() function from matplotlib.

In [7]:
plt.figure(figsize=(18, 6))
plt.plot(ts_FUB_monthly.Date, ts_FUB_monthly.rainfall)
plt.show()
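
The plot above carries no axis labels or title. A labelled variant might look like the following sketch. It uses twelve made-up monthly values rather than the station data, and the name demo as well as the label texts are assumptions chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Synthetic monthly values, for illustration only.
demo = pd.DataFrame(
    {
        "Date": pd.date_range("2021-01-01", periods=12, freq="MS"),
        "rainfall": [2.0, 1.5, 3.1, 4.0, 5.2, 6.3, 7.1, 6.8, 5.0, 3.9, 2.7, 1.8],
    }
)

fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(demo["Date"], demo["rainfall"])
ax.set_xlabel("Date")
ax.set_ylabel("rainfall")
ax.set_title("Synthetic monthly series")
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
```

In a notebook the backend line can be dropped and plt.show() called at the end, as in the cells of this section.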

Exercise: Plot the daily and hourly data sets using the plt.plot() function.

In [8]:
## Your code here...
In [9]:
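fig, ax = plt.subplots(2, 1, figsize=(18, 8))

# Plot against the date column so the x-axis shows dates, not row numbers.
ax[0].plot(ts_FUB_daily["MESS_DATUM"], ts_FUB_daily["Temp"])
ax[0].set_title("Temp")

ax[1].plot(ts_FUB_daily["MESS_DATUM"], ts_FUB_daily["Rain"], color="orange")
ax[1].set_title("Rain")
plt.show()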
In [10]:
## Your code here...
In [11]:
plt.figure(figsize=(18, 6))
plt.plot(ts_FUB_hourly.MESS_DATUM, ts_FUB_hourly.rainfall)
plt.show()