A very important and recurrent task in time series analysis is the calculation of summary statistics. In order to introduce summary statistics for time series analysis we revisit the monthly, daily and hourly data sets from the weather station Berlin-Dahlem (FU). Check out the previous section on data sets used to remind yourself how we processed the data.
Let us load the monthly (ts_FUB_monthly), daily (ts_FUB_daily) and hourly (ts_FUB_hourly) time series data for the weather station Berlin-Dahlem (FU) into R.
load(url("https://userpage.fu-berlin.de/soga/data/r-data/DWD_FUB.RData"))
For the sake of simplicity we reduce the monthly and daily data sets and focus on the 10-year period from 2000 to 2009.
library(xts)
library(lubridate)
library(ggfortify)
library(gridExtra)
### 10-year period from 2000 to 2009 daily data ###
daily_2000_2009 <- ts_FUB_daily["2000/2009", ]
### 10-year period from 2000 to 2009 monthly data ###
monthly_2000_2009 <- ts_FUB_monthly["2000/2009"]
### PLOTTING ###
p1 <- autoplot(daily_2000_2009,
  main = "Daily rainfall and mean daily \ntemperature at Berlin-Dahlem"
) +
  theme(plot.title = element_text(size = 12))
p2 <- autoplot(monthly_2000_2009,
  main = "Mean monthly temperature \nat Berlin-Dahlem"
) +
  theme(plot.title = element_text(size = 12))
grid.arrange(p1, p2, ncol = 2)
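The date-range subsetting used above can be tried out on synthetic data. The following is a minimal sketch, assuming the xts package is installed; the object names toy_daily and toy_2000_2009 and the random values are made up for illustration and are not part of the Berlin-Dahlem data set.

```r
library(xts)

# Synthetic daily series (hypothetical values, not the Berlin-Dahlem record)
dates <- seq(as.Date("1999-01-01"), as.Date("2010-12-31"), by = "day")
set.seed(1)
toy_daily <- xts(rnorm(length(dates)), order.by = dates)

# Range string "from/to": everything from the start of 2000 to the end of 2009
toy_2000_2009 <- toy_daily["2000/2009"]

# First and last time stamp of the subset, and the number of days it contains
range(index(toy_2000_2009))
nrow(toy_2000_2009)
```

Both end points of the range string are inclusive, so the subset runs from 1 January 2000 to 31 December 2009.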
To get a quick overview of the statistical characteristics of a time series we use the summary() function. The function returns basic statistics for the whole data set.
summary(monthly_2000_2009)
##      Index            monthly_2000_2009
##  Min.   :2000-01-01   Min.   :-3.590
##  1st Qu.:2002-06-23   1st Qu.: 4.093
##  Median :2004-12-16   Median : 9.780
##  Mean   :2004-12-15   Mean   : 9.938
##  3rd Qu.:2007-06-08   3rd Qu.:15.678
##  Max.   :2009-12-01   Max.   :23.200
summary(daily_2000_2009)
##      Index                 Temp              Rain
##  Min.   :2000-01-01   Min.   :-15.100   Min.   : 0.000
##  1st Qu.:2002-07-02   1st Qu.:  4.000   1st Qu.: 0.000
##  Median :2004-12-31   Median : 10.300   Median : 0.000
##  Mean   :2004-12-31   Mean   :  9.976   Mean   : 1.684
##  3rd Qu.:2007-07-02   3rd Qu.: 16.000   3rd Qu.: 1.500
##  Max.   :2009-12-31   Max.   : 27.200   Max.   :63.200
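To make clear what the six numbers in each column are, they can be reproduced by hand with base R. The sketch below uses a made-up numeric vector vals (hypothetical values, not the station data) in place of a real temperature series.

```r
# Hypothetical temperature-like values standing in for a real series
set.seed(42)
vals <- round(rnorm(120, mean = 10, sd = 7), 1)

# The same six statistics that summary() reports for a numeric vector
by_hand <- c(
  "Min."    = min(vals),
  "1st Qu." = unname(quantile(vals, 0.25)),
  "Median"  = median(vals),
  "Mean"    = mean(vals),
  "3rd Qu." = unname(quantile(vals, 0.75)),
  "Max."    = max(vals)
)

by_hand
summary(vals) # should agree with by_hand up to rounding in the printout
```

For an xts object the underlying values can be extracted first, for example with as.numeric(coredata(x)), before applying the same functions.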
Another useful function is monthplot(), which plots the seasonal (monthly by default) sub-series of a time series. For each season (month) the sub-series is plotted, and a summary function, such as the mean (default), the median or the standard deviation, is applied to it. The default method assumes that observations come in groups of 12.
monthplot(monthly_2000_2009,
  ylab = "Temperature",
  xlab = "Month",
  base = mean,
  col.base = "red"
)
The horizontal bars represent the mean temperature of each calendar month over the 10-year period, and the lines represent the year-to-year time series of each particular sub-series (month).
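The values behind the horizontal bars can also be computed directly. The following is a minimal base R sketch on a synthetic monthly series; toy_ts and its values are hypothetical. It groups the observations by their position in the year and applies a summary function to each group, which is the same idea monthplot() visualises.

```r
# Synthetic monthly series (hypothetical values): 3 years, seasonal cycle plus noise
set.seed(123)
toy_ts <- ts(10 + 10 * sin(2 * pi * (1:36 - 4) / 12) + rnorm(36),
  start = c(2000, 1), frequency = 12
)

# cycle() gives the position within the year (1 = January, ..., 12 = December)
month_id <- cycle(toy_ts)

# One sub-series per month, each summarised by a function of our choice
monthly_mean <- tapply(toy_ts, month_id, mean)
monthly_sd <- tapply(toy_ts, month_id, sd)

round(cbind(mean = monthly_mean, sd = monthly_sd), 2)
```

Swapping mean for median or sd in the tapply() call corresponds to changing the base argument of monthplot().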
Exercise: Create a monthplot of the time series monthly_2000_2009 with the standard deviation for each month.
## Your code here...
monthplot(monthly_2000_2009,
  ylab = "Temperature",
  xlab = "Month",
  base = sd,
  col.base = "red"
)
A more recent implementation of the same logic is provided by the ggfreqplot() function from the ggfortify package, which allows us to specify a confidence interval. To showcase the function we first subset the monthly time series to the 21st century (2000 to 2015) using date-range subsetting and convert the result to a ts object. Then we plot this new time series using the ggfreqplot() function.
### subsetting ###
monthly_21th <- ts_FUB_monthly["2000/2015"]
# function expects a ts object
monthly_21th_ts <- ts(monthly_21th,
  start = year(xts::first(index(monthly_21th))),
  frequency = 12
)
### plotting ###
ggfreqplot(monthly_21th_ts,
  freq = 12,
  conf.int = TRUE,
  conf.int.value = 0.95
) +
  ggtitle(expression("21"^{st} ~ "century mean monthly temperatures at Berlin-Dahlem")) +
  theme(plot.title = element_text(size = 12))
The resulting plot lets us see immediately which months show more variability and, with the help of the confidence bands, assess whether that variability is statistically significant.
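To get a feeling for such per-month intervals, one common construction is a t-based confidence interval for each monthly mean. The sketch below is an illustration on synthetic data only: toy_ts, n_years and level are hypothetical names, and the formula is an assumption that does not necessarily match how ggfreqplot() computes its bands internally.

```r
# Synthetic monthly series (hypothetical values): 16 years, seasonal cycle plus noise
set.seed(7)
n_years <- 16
n_obs <- 12 * n_years
toy_ts <- ts(10 + 10 * sin(2 * pi * (seq_len(n_obs) - 4) / 12) + rnorm(n_obs, sd = 2),
  start = c(2000, 1), frequency = 12
)

# Group by calendar month and compute mean, standard deviation and group size
month_id <- cycle(toy_ts)
m <- as.numeric(tapply(toy_ts, month_id, mean))
s <- as.numeric(tapply(toy_ts, month_id, sd))
n <- as.numeric(tapply(toy_ts, month_id, length))

# Assumed construction: two-sided t interval for the mean of each month
level <- 0.95
half_width <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)

ci <- data.frame(
  month = month.abb,
  mean  = m,
  lower = m - half_width,
  upper = m + half_width
)
ci
```

Months whose intervals are wide relative to the others are those with larger year-to-year variability in this toy series.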
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.