Sometimes it is necessary to reshape the time series to coarser intervals than given. Another very useful bundle of functions for the aggregation of time series data is provided by the xts package. These functions are:

This family of functions applies a function to each distinct period in a given time series object. Thus, we may easily calculate a time series of monthly or annual rainfall based on daily data:

library(xts)
library(lubridate)
## load the data
load(url("https://userpage.fu-berlin.de/soga/300/30100_data_sets/DWD_FUB.RData"))
## subset the data
daily.2000.2009 <- ts.FUB.daily['2000/2009', ]

### monthly time series ###
mothly.rain <- apply.monthly(daily.2000.2009[, 'Rain'], sum)

barplot(mothly.rain, 
        main = "Time series of monthly rainfall \nfor Berlin-Dahlem for the period 2000-2009", 
        col = 'blue',
        cex.main = 0.8, # font size title
        ylab = 'Rainfall',
        cex.names = 0.7) # font size x-labels

### annual time series ###
annual.rain <- apply.yearly(daily.2000.2009[, 'Rain'], sum)

barplot(annual.rain, 
        main = "Time series of annual rainfall \nfor the Berlin-Dahlem for the period 2000-2009", 
        col = 'blue',
        cex.main = 0.8, 
        ylab = 'Rainfall',
        cex.names = 0.7, 
        las = 2) # rotate label x-axis


Split-Apply-Combine

So far we learned about summarizing and aggregating time series data. Now we combine these approaches in order to get a more sophisticated representation of our data. Therefore we incorporate basic ideas of the so called split-apply-combine strategy for data wrangling.

We elaborate this analytic strategy in two exercises using hourly rainfall data for the station Berlin-Dahlem.

library(ggfortify)
autoplot(ts.FUB.hourly) + 
  labs(y = 'Rainfall') +
  ggtitle('Hourly rainfall data for the station Berlin-Dahlem') +
  theme_bw()

1. Exercise

Our goal is calculate the mean monthly rainfall at the weather station Berlin-Dahlem for the period 2002 to 2016.

Our analytic strategy can be summarized as

For sure we could achieve that task by writing for loops, but we will probably spend a lot of time on bookkeeping tasks. In base R there exists the family of apply functions, such as apply(), aggregate(), tapply(), and by(), among others. These are, no doubt, excellent tools, however, thanks to Hadley Wickham the author of “The split-apply-combine strategy for data analysis”, and software developer of major data analysis tools for R, we may use a collection of R packages, sometimes referred to as tidyverse packages. Out of those we pick the dplyr package, a package for data manipulation. You can learn more about the dplyr package by typing vignette("dplyr") into your console.

In addition to the functionality of the dplyr package we make use of the pipe operator (%>%). This allows us to write less code which is even more readable. Learn more on the pipe operator by visiting the free online book R for Data Science by Garrett Grolemund and Hadley Wickham (Chapter 18: Pipes). First we load the required packages.

library(dplyr)

Then we stack the time series data, which is of object class xts, into a data frame, as the dplyr package is optimized to work on data.frame objects.

# stack time series data into a data frame 
df.hour <- data.frame('Datetime' = index(ts.FUB.hourly), 
                      coredata(ts.FUB.hourly))

Now we run the analysis:

# initial dataframe
mean.monthly.rainfall <- df.hour %>%
  #### DATA WRANGLING ####
  # generate two more columns and populate them with the year and month
  mutate(year = year(Datetime),
         month = month(Datetime)) %>%
  # group the data by these new columns
  group_by(year, month) %>%
  # calculate the monthly rainfall for each year
  summarise(year.month.rainfall = sum(Rain))  %>%
  # group the data by the month
  group_by(month) %>%
  # calculate the mean annual monthly rainfall
  summarise(months.count = n(),
            mean.month.rainfall = mean(year.month.rainfall, na.rm = T))
mean.monthly.rainfall
## # A tibble: 12 x 3
##    month months.count mean.month.rainfall
##    <dbl>        <int>               <dbl>
##  1     1           15                49.3
##  2     2           15                31.3
##  3     3           15                33.2
##  4     4           15                26.4
##  5     5           15                60.4
##  6     6           15                53.3
##  7     7           15                83.5
##  8     8           15                65.9
##  9     9           15                43.1
## 10    10           15                47.0
## 11    11           15                44.7
## 12    12           15                42.8

We wrote in total five lines of code and calculated the mean monthly rainfall at the weather station Berlin-Dahlem for the period of 15 years based on hourly data.

Now we visualize the data using the ggplot() function.

mean.monthly.rainfall %>%
  #### PLOTTING ####
  # add column with month names for labeling
  mutate(label.month = month(month,label = TRUE, abbr = TRUE)) %>%
  # plot using ggplot()
  ggplot(aes(x = label.month, y = mean.month.rainfall)) + 
  # design the bar plot
  geom_bar(stat = "identity", width = 0.6, color = "black") + 
  # label axes
  labs(y = "Rainfall", x = "") +
  # add title
  ggtitle(paste('Mean monthly rainfall at Berlin-Dahlem for the period',
          year(xts::first(ts.FUB.hourly)), # calculate start year of series
          'to',
          year(xts::last(ts.FUB.hourly)))) + # calculate end year of series
  # pick black and white background theme
  theme_bw()


2. Exercise

Our goal of the second exercise is to calculate the mean diurnal monthly rainfall at Berlin-Dahlem for the period 2002 to 2016.

Again we make use of the dplyr package and the pipe operator %>%. The code is very similar to the code from above, however with minor modifications we can solve a very distinct data analysis task.

df.hour %>% 
  ### DATA WRANGLING ###
  mutate(month = month(Datetime), 
         hour = hour(Datetime)) %>% 
  group_by(month, hour) %>%
  summarize(sum.hour = sum(Rain, na.rm = TRUE)) %>%
  mutate(sum.hour = sum.hour/15) %>% # normalize to annual mean
  ### PLOTTING ###
  mutate(label.month = month(month, label = TRUE, abbr = TRUE)) %>%
  ggplot(aes(x = hour, y = sum.hour)) + 
  geom_bar(stat = "identity", width = 0.1, color = "black") + 
  facet_wrap( ~ label.month, ncol = 4) + 
  labs(y = "Rainfall (mm)", x = "Hour of the day") +
  ggtitle(paste('Mean diurnal monthly rainfall at Berlin-Dahlem for the period',
          year(xts::first(ts.FUB.hourly)),
          'to',
          year(xts::last(ts.FUB.hourly)))) +
  theme_bw()