STL is an acronym for “Seasonal and Trend decomposition using Loess”, while loess (locally weighted regression and scatterplot smoothing) is a method for estimating nonlinear relationships.
We start our analysis with the creation of a ts
object.
This is done by the ts()
function. We make use of the
first
and last
function from the
xts
package and the year
and
month
function from the lubridate
package to
extract the starting year/month (start
) and the ending
year/month (end
) combinations of the data set
programmatically.
# load the co2 data set
library(xts)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/KeelingCurve.Rdata"))
library(lubridate)
start <- c(year(xts::first(co2)), month(xts::first(co2)))
start
## [1] 1958 3
end <- c(year(xts::last(co2)), month(xts::last(co2)))
end
## [1] 2022 6
# creation of a ts object
co2 <- ts(data = as.vector(coredata(co2)),
start = start,
end = end, frequency = 12)
In R the stl()
function performs decomposition of a time
series into seasonal, trend and irregular components using Loess.
The function requires the s.window
argument, which is
either the character string "periodic"
or the span (in
lags), an odd number, of the loess window for seasonal extraction. The
default value is 7. Type help(stl)
into your console for
more information on the stl()
function.
Let us apply the stl()
function and plot the result with
the convenient autoplot()
function.
# set up stl function
fit <- stl(co2, s.window = "periodic")
# plot stl
library(ggfortify)
autoplot(fit)
The plot shows four panels. In the uppermost we see the raw data, the Keeling curve. In the subsequent plot we see the trend component, the seasonal component and the remainder (in opposite order). We realize a very strong linear trend in the data set. Let us redo the analysis, however this time we focus on the 21st century.
co2_2000 <- window(co2,
start = c(2000, 1),
end = c(2015, 12)
)
fit_2000 <- stl(co2_2000, s.window = "periodic")
autoplot(fit_2000)
Nice plot! The bars at its right side are of equal heights in user coordinates. This time it is much easier to spot the seasonal oscillation in the CO2 concentration. The remainder seems quite noisy and is devoid of a particular pattern. This indicates that the seasonal decomposition did a good job in extracting the trend and seasonal components.
Exercise: Plot the data set for the 21st and overlay it with the linear trend extracted by STL.
The decomposed parts of the time series are stored in the
stl
object and can be assessed with the
$time.series
notation.
## Your code here...
plot.ts(co2_2000,
col = "gray",
ylab = expression("CO"[2] * " (ppm)")
)
lines(fit_2000$time.series[, "trend"],
col = "red", lwd = 1, lty = 2
)
A very typical use case is to apply STL for detrending a time series. We detrend our data set by subtracting the trend from the original data.
\[y_t^* = y_t- T_t\]
detrended <- co2_2000 - fit_2000$time.series[, "trend"]
plot.ts(detrended,
ylab = "CO2",
main = "Seasonal variability of CO2 in the atmosphere",
cex.main = 0.85
)
Let us investigate the remainder and review if the residuals are normally distributed.
plot(fit_2000$time.series[, "remainder"], main = "Remainder", ylab = "")
Exercise: Create a QQ-plot using the
qqnorm()
and theqqline()
function.
## Your code here...
qqnorm(fit_2000$time.series[, "remainder"])
qqline(fit_2000$time.series[, "remainder"])
The residuals seem fairly well normally distributed. However, let us
apply a Shapiro–Wilk test and check for normality.
Exercise: Apply a Shapiro–Wilk test and check the remainder for normality.
## Your code here...
shapiro.test(fit_2000$time.series[, "remainder"])
##
## Shapiro-Wilk normality test
##
## data: fit_2000$time.series[, "remainder"]
## W = 0.99013, p-value = 0.2099
The large p-value supports our believe, that the seasonal decomposition did a good job in extracting the trend and the seasonal components, as the residuals seems to be just noise.
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.