A recurrent task in time series analysis is the calculation of summary statistics. To introduce summary statistics for time series, we revisit the monthly, daily and hourly data sets from the weather station Berlin-Dahlem (FU). See the previous section on the data sets used for a reminder of how we processed the data.

Let us load the monthly (ts_FUB_monthly), daily (ts_FUB_daily) and hourly (ts_FUB_hourly) time series data for the weather station Berlin-Dahlem (FU) into R.

load(url("https://userpage.fu-berlin.de/soga/data/r-data/DWD_FUB.RData"))

For the sake of simplicity we reduce the monthly and daily data sets and focus on the 10-year period from 2000 to 2009.

library(xts)
library(lubridate)
library(ggfortify)
library(gridExtra)

### 10-year period from 2000 to 2009 daily data ###
daily_2000_2009 <- ts_FUB_daily["2000/2009", ]


### 10-year period from 2000 to 2009 monthly data ###
monthly_2000_2009 <- ts_FUB_monthly["2000/2009"]

### PLOTTING ###
p1 <- autoplot(daily_2000_2009,
  main = "Daily rainfall and mean daily \ntemperature at Berlin-Dahlem"
) +
  theme(plot.title = element_text(size = 12))

p2 <- autoplot(monthly_2000_2009,
  main = "Mean monthly temperature \nat Berlin-Dahlem"
) + theme(plot.title = element_text(size = 12))

grid.arrange(p1, p2, ncol = 2)
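The range string "2000/2009" used above is the date-range subsetting that xts objects support. As a minimal sketch, the example below builds a small stand-in series (the name demo_monthly and its values are hypothetical, not part of the DWD data) to show how closed ranges, open-ended ranges and single periods select rows.

```r
library(xts)

# Hypothetical stand-in series: one value per month, 1998 to 2011
dates <- seq(as.Date("1998-01-01"), as.Date("2011-12-01"), by = "month")
demo_monthly <- xts(seq_along(dates), order.by = dates)

# "from/to" keeps everything from the start of 2000 to the end of 2009
demo_2000_2009 <- demo_monthly["2000/2009"]
range(index(demo_2000_2009))

# An open-ended range keeps everything from 2005 onwards
demo_from_2005 <- demo_monthly["2005/"]

# A single period selects just that month
demo_march_2003 <- demo_monthly["2003-03"]
```

The same strings work with or without a trailing comma, so x["2000/2009"] and x["2000/2009", ] select the same rows.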

To get a quick overview of the statistical characteristics of a time series we use the summary() function. The function returns basic statistics (minimum, quartiles, median, mean and maximum) for the whole data set, including the time index.

summary(monthly_2000_2009)
##      Index            monthly_2000_2009
##  Min.   :2000-01-01   Min.   :-3.590   
##  1st Qu.:2002-06-23   1st Qu.: 4.093   
##  Median :2004-12-16   Median : 9.780   
##  Mean   :2004-12-15   Mean   : 9.938   
##  3rd Qu.:2007-06-08   3rd Qu.:15.678   
##  Max.   :2009-12-01   Max.   :23.200
summary(daily_2000_2009)
##      Index                 Temp              Rain       
##  Min.   :2000-01-01   Min.   :-15.100   Min.   : 0.000  
##  1st Qu.:2002-07-02   1st Qu.:  4.000   1st Qu.: 0.000  
##  Median :2004-12-31   Median : 10.300   Median : 0.000  
##  Mean   :2004-12-31   Mean   :  9.976   Mean   : 1.684  
##  3rd Qu.:2007-07-02   3rd Qu.: 16.000   3rd Qu.: 1.500  
##  Max.   :2009-12-31   Max.   : 27.200   Max.   :63.200
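summary() describes the whole period at once. If statistics per calendar year are of interest, one possible approach is the apply.yearly() function from the xts package. The sketch below uses simulated daily temperatures (demo_daily is a hypothetical stand-in, not the Berlin-Dahlem record) so that it runs without downloading the data.

```r
library(xts)

# Hypothetical stand-in for a daily temperature series, 2000 to 2009
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2009-12-31"), by = "day")
day_of_year <- as.numeric(format(dates, "%j"))
temp <- 10 + 10 * sin(2 * pi * (day_of_year - 105) / 365.25) +
  rnorm(length(dates), sd = 3)
demo_daily <- xts(temp, order.by = dates)

# One value per calendar year for each statistic
yearly_mean <- apply.yearly(demo_daily, mean)
yearly_sd <- apply.yearly(demo_daily, sd)

# Combine both statistics into one object with readable column names
yearly_stats <- merge(yearly_mean, yearly_sd)
colnames(yearly_stats) <- c("mean", "sd")
yearly_stats
```

With the real data, replacing demo_daily by daily_2000_2009$Temp should give the corresponding yearly statistics, assuming the column is named Temp as in the summary output above.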

Another useful function is the monthplot() function, which plots seasonal (monthly by default) sub-series of a time series. For each season (month) a time series is plotted and a defined function, such as the mean (default), the median or the standard deviation, among others, is applied to the sub-series. The default method assumes observations come in groups of 12.

monthplot(monthly_2000_2009,
  ylab = "Temperature",
  xlab = "Month",
  base = mean,
  col.base = "red"
)

The horizontal bars represent the mean monthly temperature and the lines represent the time series for each particular sub-series (month).
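The values behind the horizontal bars can also be computed directly. As a minimal sketch, the example below uses nottem, a monthly temperature series that ships with R (mean air temperatures at Nottingham, 1920 to 1939, in degrees Fahrenheit), instead of the Berlin-Dahlem data, and splits it into its twelve monthly sub-series with cycle().

```r
# nottem is a ts object with frequency 12 from the datasets package
month_id <- cycle(nottem) # 1 = January, ..., 12 = December

# Mean of each monthly sub-series: what the bars show for base = mean
monthly_means <- tapply(nottem, month_id, mean)

# The same split with another summary function, as with base = sd
monthly_sds <- tapply(nottem, month_id, sd)

round(monthly_means, 1)
round(monthly_sds, 1)
```

cycle() only works on ts objects, so an xts series would first need to be converted with ts(), as is done for ggfreqplot() further below.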

Exercise: Create a monthplot of the time series monthly_2000_2009 with the standard deviation for each month

## Your code here...
monthplot(monthly_2000_2009,
  ylab = "Temperature",
  xlab = "Month",
  base = sd,
  col.base = "red"
)


A more recent implementation of the same logic is provided by the ggfreqplot() function from the ggfortify package, which allows us to specify a confidence interval. To showcase the function we first subset the monthly time series to the 21st century (2000 to 2015) using xts date-range indexing and convert the result to a ts object. Then we plot this new time series using the ggfreqplot() function.

### subsetting ###
monthly_21th <- ts_FUB_monthly["2000/2015"]

# function expects a ts object
monthly_21th_ts <- ts(monthly_21th,
  start = year(xts::first(index(monthly_21th))),
  frequency = 12
)

### plotting ###
ggfreqplot(monthly_21th_ts,
  freq = 12,
  conf.int = TRUE,
  conf.int.value = 0.95
) +
  ggtitle(expression("21"^st ~ "century mean monthly temperatures at Berlin-Dahlem")) +
  theme(plot.title = element_text(size = 12))

The resulting plot shows at a glance which months vary most from year to year, and the confidence bands help us assess whether the differences in variability are statistically significant.
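To put numbers on such a comparison, one common option is a t-based confidence interval for the mean of each month. The sketch below is an illustration under that assumption and may not match the interval that ggfreqplot() draws. It again uses the built-in nottem series (Nottingham, degrees Fahrenheit) rather than the Berlin-Dahlem data.

```r
month_id <- cycle(nottem)

# Sample size, mean and standard deviation of each monthly sub-series
n <- as.numeric(tapply(nottem, month_id, length))
m <- as.numeric(tapply(nottem, month_id, mean))
s <- as.numeric(tapply(nottem, month_id, sd))

# Two-sided 95% t interval for each month's mean
conf_level <- 0.95
half_width <- qt(1 - (1 - conf_level) / 2, df = n - 1) * s / sqrt(n)

ci <- data.frame(
  month = month.abb,
  mean = m,
  lower = m - half_width,
  upper = m + half_width
)
ci
```

Months whose intervals are wide relative to the others are the ones with the largest year-to-year spread, since every month has the same number of observations here.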


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.