A symmetric moving average \((m_t)\) for the observation \(x_t\) is given by

\[m_t = \sum_{i=-k}^k a_ix_{t-i}\text{,}\]

where \(a_i = a_{-i}\) and \(\sum_{i=-k}^ka_i = 1\).

Consider a monthly time series. A 3-month low pass filter can be written in mathematical notation as

\[m_t = \frac{1}{3}(x_{t-1}+x_t+x_{t+1})\]

In order to construct a symmetric 3-month low pass filter with R we build a vector of that form:

rep(1, 3) / 3
## [1] 0.3333333 0.3333333 0.3333333

In contrast, a symmetric one-year filter is specified by

rep(1, 12) / (12)
##  [1] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
##  [7] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333


A simple way to apply a moving average smoother (low-pass filter) in R is to apply a linear filter, using the filter() function. By default the method uses a moving average of values on both sides (before and after) each measurement.

To specify the nature of our low-pass filter we have to specify a vector of filter coefficients, which we provide to the filter argument in the filter() function.

Let us build three low pass filters:

As the function name filter is a quite common word, we explicitly state that we refer to the filter() function of the stats package by typing stats::filter().

library(xts)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/Earth_Surface_Temperature.RData"))
temp_global_f1 <- stats::filter(temp_global,
  filter = rep(1, 12) / 12
)
temp_global_f5 <- stats::filter(temp_global,
  filter = rep(1, 12 * 5) / (12 * 5)
)
temp_global_f10 <- stats::filter(temp_global,
  filter = rep(1, 12 * 10) / (12 * 10)
)

## plotting ##
plot.ts(temp_global, lwd = 2, col = "gray")
lines(temp_global_f1, col = "red")
lines(temp_global_f5, col = "green")
lines(temp_global_f10, col = "blue")
abline(h = 0)
legend("topleft",
  legend = c(
    "one-year filter",
    "five-year filter",
    "ten-year filter"
  ),
  col = c("red", "green", "blue"),
  lty = 1, cex = 0.7
)

The filter() function is quite flexible, hence it is straightforward to define a custom filter.

Exercise: Design a custom symmetric seven-year filter for a monthly time series that gives full weight to the measurement year, three-quarters weight to adjacent year, half weight to years two removed, and quarter weight to years three removed.

To solve this task the rep() function and its additional argument each may be useful.

## Your code here...
f <- NULL

Our custom filter should look like this:

Show code
# construct pattern
f <- c(0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25)
# account for monthly data structure of the time series
f <- rep(f, each = 12)
# normalization
f <- f / sum(f) # weights
plot(f, xlab = "", xaxt = "n") # plot for a better understanding


Let us apply our custom symmetric seven-year filter and plot the result.

temp_global_f <- stats::filter(temp_global, filter = f)
plot.ts(temp_global, lwd = 2, col = "gray")
lines(temp_global_f, col = "red")
legend("topleft",
  legend = c(
    "monthly data",
    "custom seven-year filter"
  ),
  col = c("gray", "red"),
  lty = 1,
  lwd = c(2, 1),
  cex = 0.7
)
abline(h = 0)

The concept of filtering time series is implemented in many different R packages and often referred to as moving average.

rollmean()

In the zoo package the low-pass filter / moving average is implemented in the rollmean() function. Let us redo the example from above and plot moving averages of one, 5 and 10 years. The nice thing about the rollmean() function is, that it returns a zoo object. Hence, it is easy to plot them using the advanced machinery of the zoo package.

library(zoo)
temp_global_f1 <- rollmean(temp_global, 12)
temp_global_f5 <- rollmean(temp_global, 12 * 5)
temp_global_f10 <- rollmean(temp_global, 12 * 10)

plot.zoo(cbind(
  temp_global,
  temp_global_f1,
  temp_global_f5,
  temp_global_f10
),
plot.type = "single",
col = c("gray", "red", "green", "blue"),
main = "Earth Surface Temperature Anomalies",
ylab = "", xlab = ""
)
legend("topleft",
  legend = c(
    "one-year filter",
    "five-year filter",
    "ten-year filter"
  ),
  col = c("red", "green", "blue"),
  lty = 1, cex = 0.65
)

If we skip the plot.type argument in the plot.zoo() function we get back by default a panel plot.

plot.zoo(cbind(
  temp_global,
  temp_global_f1,
  temp_global_f5,
  temp_global_f10
),
# plot.type = "single",
col = c("gray", "red", "green", "blue"),
main = "Earth Surface Temperature Anomalies",
ylab = "", xlab = ""
)

Be aware that rollmean() needs full k observations to compute the mean, thus the output is normally missing a few initial data points. We may pass the na.pad = TRUE argument to the function and it pad the missing output with NA values.


rollapply()

Another, very similar, but more flexible function from the zoo package is the rollapply() function. This function calls a function on a window of the data set, saves the result, and moves to the next window, and repeats this pattern for the entire input. By default, rollapply() recalculates the function at successive data points. However, by the additional by = n argument we define a step size for the sequence of calculations. This causes the calculations being applied on separate, not overlapping, segments of the time series.

With the functionality of the rollapply() we may for example analyse the time series of the earth surface temperature anomalies with respect to the question if the variability in the monthly mean temperature changes within a window of 30 years.

Therefore, we first define our reference period. Let us say our reference period is the 30-year period from 1986 to 2015.

# reference period
ref_period <- window(temp_global,
  start = "1986-01-01",
  end = "2015-12-31"
)

# reference period standard deviation
ref_period_sd <- sd(ref_period)
ref_period_sd
## [1] 0.3982009

In the next step we apply the rollapply() function.

# define integration region

width <- by <- 12 * 30 # 30 years
sd_30years <- rollapply(
  data = temp_global,
  width = width, #  window width (in numbers of observations)
  by = by, # calculate function (FUN) at every by-th time point
  FUN = sd,
  align = "left"
)
res_30years <- as.data.frame(sd_30years[!is.na(sd_30years)])
res_30years
##          Monthly_Anomaly_Global
## Jan 1850              0.5172489
## Jan 1880              0.3816578
## Jan 1910              0.3498584
## Jan 1940              0.3089930
## Jan 1970              0.3936729
library(lubridate)

# generate more informative labeling
x <- year(as.yearmon(row.names(res_30years)))
y <- year(as.yearmon(row.names(res_30years))) + 29

# plot
barplot(res_30years$Monthly_Anomaly_Global,
  names.arg = paste0(x, "-", y),
  cex.names = 0.70,
  main = "Standard deviation of mean monthly temperatur anomalies for \na 30-year non-overlapping sliding window",
  cex.main = 0.9
)
abline(h = ref_period_sd, col = "red", lty = 2)
legend(
  x = 4, y = 0.48,
  legend = c("stdev reference period (1986-2015)"),
  col = "red",
  lty = 2,
  cex = 0.5
)

A nice plot. The plot suggest that the variability in mean monthly temperatures during the reference period 1986-2015 was higher, compared to the 30-year time periods of 1880-1909, 1910-1939, 1940-1969, 1970-1999. However, during the 30-year periods of 1850-1879 the variability of mean monthly temperatures appears to be even higher than during the reference period of 1986-2015.

Now we can take this analysis even a step further.

Exercise: Repeat the analysis from above, this time however, evaluate the standard deviation of mean monthly temperature anomalies for a 30-year period for each year in the time series (overlapping sliding window)

The code is nearly the same as the code above.

# define integration region
## Your code here...
width <- NULL
by <- NULL
Show code
# define integration region
width <- 12 * 30 # 30 year
by <- 12 # one years
sd_30years_multiple <- rollapply(
  data = temp_global,
  width = width,
  by = by,
  FUN = sd,
  align = "center"
)

# save results
res_30years_multiple <- sd_30years_multiple[!is.na(sd_30years_multiple)]

# plotting
plot(year(res_30years_multiple),
  coredata(res_30years_multiple),
  type = "l",
  xlab = "", ylab = "",
  main = "Standard deviation of mean monthly
     temperatur\nanomalies for a 30-year sliding window", cex.main = 0.9
)
abline(h = ref_period_sd, col = "red", lty = 2)
legend("topright",
  legend = c("stdev reference period (1986-2015)"),
  col = "red",
  lty = 2,
  cex = 0.8
)


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.