A symmetric moving average \((m_t)\) for the observation \(x_t\) is given by
\[m_t = \sum_{i=-k}^k a_ix_{t-i}\text{,}\]
where \(a_i = a_{-i}\) and \(\sum_{i=-k}^ka_i = 1\).
Consider a monthly time series. A 3-month low pass filter can be written in mathematical notation as
\[m_t = \frac{1}{3}(x_{t-1}+x_t+x_{t+1})\]
In order to construct a symmetric 3-month low pass filter with R we build a vector of that form:
rep(1, 3) / 3
## [1] 0.3333333 0.3333333 0.3333333
In contrast, a symmetric one-year filter is specified by
rep(1, 12) / (12)
## [1] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
## [7] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
A simple way to apply a moving average smoother (low-pass filter) in
R is to apply a linear filter, using the filter() function.
By default the method uses a moving average of values on both sides
(before and after) each measurement.
To specify the nature of our low-pass filter we have to specify a
vector of filter coefficients, which we provide to the
filter argument in the filter() function.
Let us build three low pass filters:
As the function name filter is a quite common word, we
explicitly state that we refer to the filter() function of
the stats package by typing
stats::filter().
library(xts)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/Earth_Surface_Temperature.RData"))
temp_global_f1 <- stats::filter(temp_global,
filter = rep(1, 12) / 12
)
temp_global_f5 <- stats::filter(temp_global,
filter = rep(1, 12 * 5) / (12 * 5)
)
temp_global_f10 <- stats::filter(temp_global,
filter = rep(1, 12 * 10) / (12 * 10)
)
## plotting ##
plot.ts(temp_global, lwd = 2, col = "gray")
lines(temp_global_f1, col = "red")
lines(temp_global_f5, col = "green")
lines(temp_global_f10, col = "blue")
abline(h = 0)
legend("topleft",
legend = c(
"one-year filter",
"five-year filter",
"ten-year filter"
),
col = c("red", "green", "blue"),
lty = 1, cex = 0.7
)
The filter() function is quite flexible, hence it is
straightforward to define a custom filter.
Exercise: Design a custom symmetric seven-year filter for a monthly time series that gives full weight to the measurement year, three-quarters weight to adjacent year, half weight to years two removed, and quarter weight to years three removed.
To solve this task the rep() function and its additional
argument each may be useful.
## Your code here...
f <- NULL
Our custom filter should look like this:
# construct pattern
f <- c(0.25, 0.5, 0.75, 1, 0.75, 0.5, 0.25)
# account for monthly data structure of the time series
f <- rep(f, each = 12)
# normalization
f <- f / sum(f) # weights
plot(f, xlab = "", xaxt = "n") # plot for a better understanding
Let us apply our custom symmetric seven-year filter and plot the result.
temp_global_f <- stats::filter(temp_global, filter = f)
plot.ts(temp_global, lwd = 2, col = "gray")
lines(temp_global_f, col = "red")
legend("topleft",
legend = c(
"monthly data",
"custom seven-year filter"
),
col = c("gray", "red"),
lty = 1,
lwd = c(2, 1),
cex = 0.7
)
abline(h = 0)
The concept of filtering time series is implemented in many different R packages and often referred to as moving average.
rollmean()In the zoo package the low-pass filter / moving average
is implemented in the rollmean() function. Let us redo the
example from above and plot moving averages of one, 5 and 10 years. The
nice thing about the rollmean() function is, that it
returns a zoo object. Hence, it is easy to plot them using
the advanced machinery of the zoo package.
library(zoo)
temp_global_f1 <- rollmean(temp_global, 12)
temp_global_f5 <- rollmean(temp_global, 12 * 5)
temp_global_f10 <- rollmean(temp_global, 12 * 10)
plot.zoo(cbind(
temp_global,
temp_global_f1,
temp_global_f5,
temp_global_f10
),
plot.type = "single",
col = c("gray", "red", "green", "blue"),
main = "Earth Surface Temperature Anomalies",
ylab = "", xlab = ""
)
legend("topleft",
legend = c(
"one-year filter",
"five-year filter",
"ten-year filter"
),
col = c("red", "green", "blue"),
lty = 1, cex = 0.65
)
If we skip the plot.type argument in the
plot.zoo() function we get back by default a panel
plot.
plot.zoo(cbind(
temp_global,
temp_global_f1,
temp_global_f5,
temp_global_f10
),
# plot.type = "single",
col = c("gray", "red", "green", "blue"),
main = "Earth Surface Temperature Anomalies",
ylab = "", xlab = ""
)
Be aware that rollmean() needs full k
observations to compute the mean, thus the output is normally missing a
few initial data points. We may pass the na.pad = TRUE
argument to the function and it pad the missing output with
NA values.
rollapply()Another, very similar, but more flexible function from the
zoo package is the rollapply() function. This
function calls a function on a window of the data set, saves the result,
and moves to the next window, and repeats this pattern for the entire
input. By default, rollapply() recalculates the function at
successive data points. However, by the additional by = n
argument we define a step size for the sequence of calculations. This
causes the calculations being applied on separate, not overlapping,
segments of the time series.
With the functionality of the rollapply() we may for
example analyse the time series of the earth surface temperature
anomalies with respect to the question if the variability in the monthly
mean temperature changes within a window of 30 years.
Therefore, we first define our reference period. Let us say our reference period is the 30-year period from 1986 to 2015.
# reference period
ref_period <- window(temp_global,
start = "1986-01-01",
end = "2015-12-31"
)
# reference period standard deviation
ref_period_sd <- sd(ref_period)
ref_period_sd
## [1] 0.3982009
In the next step we apply the rollapply() function.
# define integration region
width <- by <- 12 * 30 # 30 years
sd_30years <- rollapply(
data = temp_global,
width = width, # window width (in numbers of observations)
by = by, # calculate function (FUN) at every by-th time point
FUN = sd,
align = "left"
)
res_30years <- as.data.frame(sd_30years[!is.na(sd_30years)])
res_30years
## Monthly_Anomaly_Global
## Jan 1850 0.5172489
## Jan 1880 0.3816578
## Jan 1910 0.3498584
## Jan 1940 0.3089930
## Jan 1970 0.3936729
library(lubridate)
# generate more informative labeling
x <- year(as.yearmon(row.names(res_30years)))
y <- year(as.yearmon(row.names(res_30years))) + 29
# plot
barplot(res_30years$Monthly_Anomaly_Global,
names.arg = paste0(x, "-", y),
cex.names = 0.70,
main = "Standard deviation of mean monthly temperatur anomalies for \na 30-year non-overlapping sliding window",
cex.main = 0.9
)
abline(h = ref_period_sd, col = "red", lty = 2)
legend(
x = 4, y = 0.48,
legend = c("stdev reference period (1986-2015)"),
col = "red",
lty = 2,
cex = 0.5
)
A nice plot. The plot suggest that the variability in mean monthly temperatures during the reference period 1986-2015 was higher, compared to the 30-year time periods of 1880-1909, 1910-1939, 1940-1969, 1970-1999. However, during the 30-year periods of 1850-1879 the variability of mean monthly temperatures appears to be even higher than during the reference period of 1986-2015.
Now we can take this analysis even a step further.
Exercise: Repeat the analysis from above, this time however, evaluate the standard deviation of mean monthly temperature anomalies for a 30-year period for each year in the time series (overlapping sliding window)
The code is nearly the same as the code above.
# define integration region
## Your code here...
width <- NULL
by <- NULL
# define integration region
width <- 12 * 30 # 30 year
by <- 12 # one years
sd_30years_multiple <- rollapply(
data = temp_global,
width = width,
by = by,
FUN = sd,
align = "center"
)
# save results
res_30years_multiple <- sd_30years_multiple[!is.na(sd_30years_multiple)]
# plotting
plot(year(res_30years_multiple),
coredata(res_30years_multiple),
type = "l",
xlab = "", ylab = "",
main = "Standard deviation of mean monthly
temperatur\nanomalies for a 30-year sliding window", cex.main = 0.9
)
abline(h = ref_period_sd, col = "red", lty = 2)
legend("topright",
legend = c("stdev reference period (1986-2015)"),
col = "red",
lty = 2,
cex = 0.8
)
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.