STL is an acronym for “Seasonal and Trend decomposition using Loess”, while loess (locally weighted regression and scatterplot smoothing) is a method for estimating nonlinear relationships.

We start our analysis with the creation of a ts object. This is done by the ts() function. We make use of the first and last function from the xts package and the year and month function from the lubridate package to extract the starting year/month (start) and the ending year/month (end) combinations of the data set programmatically.

# load the co2 data set
library(xts)
load(url("https://userpage.fu-berlin.de/soga/data/r-data/KeelingCurve.Rdata"))

library(lubridate)
start <- c(year(xts::first(co2)), month(xts::first(co2)))
start
## [1] 1958    3
end <- c(year(xts::last(co2)), month(xts::last(co2)))
end
## [1] 2022    6
# creation of a ts object
co2 <- ts(data = as.vector(coredata(co2)),
          start = start,
          end = end, frequency = 12)

In R the stl() function performs decomposition of a time series into seasonal, trend and irregular components using Loess.

The function requires the s.window argument, which is either the character string "periodic" or the span (in lags), an odd number, of the loess window for seasonal extraction. The default value is 7. Type help(stl) into your console for more information on the stl() function.

Let us apply the stl() function and plot the result with the convenient autoplot() function.

# set up stl function
fit <- stl(co2, s.window = "periodic")

# plot stl
library(ggfortify)
autoplot(fit)

The plot shows four panels. In the uppermost we see the raw data, the Keeling curve. In the subsequent plot we see the trend component, the seasonal component and the remainder (in opposite order). We realize a very strong linear trend in the data set. Let us redo the analysis, however this time we focus on the 21st century.

co2_2000 <- window(co2,
  start = c(2000, 1),
  end = c(2015, 12)
)

fit_2000 <- stl(co2_2000, s.window = "periodic")
autoplot(fit_2000)

Nice plot! The bars at its right side are of equal heights in user coordinates. This time it is much easier to spot the seasonal oscillation in the CO2 concentration. The remainder seems quite noisy and is devoid of a particular pattern. This indicates that the seasonal decomposition did a good job in extracting the trend and seasonal components.


Exercise: Plot the data set for the 21st and overlay it with the linear trend extracted by STL.

The decomposed parts of the time series are stored in the stl object and can be assessed with the $time.series notation.

## Your code here...
Show code
plot.ts(co2_2000,
  col = "gray",
  ylab = expression("CO"[2] * " (ppm)")
)

lines(fit_2000$time.series[, "trend"],
  col = "red", lwd = 1, lty = 2
)



A very typical use case is to apply STL for detrending a time series. We detrend our data set by subtracting the trend from the original data.

\[y_t^* = y_t- T_t\]

detrended <- co2_2000 - fit_2000$time.series[, "trend"]
plot.ts(detrended,
  ylab = "CO2",
  main = "Seasonal variability of CO2 in the atmosphere",
  cex.main = 0.85
)


Let us investigate the remainder and review if the residuals are normally distributed.

plot(fit_2000$time.series[, "remainder"], main = "Remainder", ylab = "")

Exercise: Create a QQ-plot using the qqnorm() and the qqline() function.

## Your code here...
Show code
qqnorm(fit_2000$time.series[, "remainder"])
qqline(fit_2000$time.series[, "remainder"])


The residuals seem fairly well normally distributed. However, let us apply a Shapiro–Wilk test and check for normality.

Exercise: Apply a Shapiro–Wilk test and check the remainder for normality.

## Your code here...
Show code
shapiro.test(fit_2000$time.series[, "remainder"])
## 
##  Shapiro-Wilk normality test
## 
## data:  fit_2000$time.series[, "remainder"]
## W = 0.99013, p-value = 0.2099

The large p-value supports our believe, that the seasonal decomposition did a good job in extracting the trend and the seasonal components, as the residuals seems to be just noise.


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.