Subsetting is one of the most common tasks in time series analysis. Let us first load our working data sets, the monthly (ts_FUB_monthly), daily (ts_FUB_daily) and hourly (ts_FUB_hourly) time series for the weather station Berlin-Dahlem (FU), into R, together with the libraries we need.

load(url("https://userpage.fu-berlin.de/soga/data/r-data/DWD_FUB.RData"))

library(xts)
library(lubridate)

Again, for the sake of this tutorial, we transform the monthly data set (ts_FUB_monthly) into a ts object and present subsetting and indexing for both object types, xts and ts. See the previous section for details.

class(ts_FUB_monthly)
## [1] "xts" "zoo"
start <- c(
  year(index(ts_FUB_monthly)[1]),
  month(index(ts_FUB_monthly)[1])
)
ts_FUB_monthly <- ts(coredata(ts_FUB_monthly),
  start = start,
  frequency = 12
)
class(ts_FUB_monthly)
## [1] "ts"
class(ts_FUB_daily)
## [1] "xts" "zoo"
class(ts_FUB_hourly)
## [1] "xts" "zoo"

The oldest or newest observations

To extract the oldest or newest observations, we can apply the head() and tail() functions to both object types.

# ts object
head(ts_FUB_monthly, 10) # print the first 10 observations
##       Series 1
##  [1,]      2.8
##  [2,]      1.1
##  [3,]      5.2
##  [4,]      9.0
##  [5,]     15.1
##  [6,]     19.0
##  [7,]     21.4
##  [8,]     18.8
##  [9,]     13.9
## [10,]      9.0
# xts object
tail(ts_FUB_daily, 10) # print the last 10 observations
##            Temp Rain
## 2021-12-22 -0.8  0.0
## 2021-12-23 -1.1  8.8
## 2021-12-24  2.0  6.1
## 2021-12-25 -6.1  0.0
## 2021-12-26 -6.8  0.0
## 2021-12-27 -3.7  0.0
## 2021-12-28 -0.5  1.5
## 2021-12-29  4.0  0.3
## 2021-12-30  9.0  3.2
## 2021-12-31 12.8  5.5

For the oldest and newest observations, the xts package provides additional functionality. The functions first() and last() accept, instead of a number of observations, a character string that combines a count with a period (hours, days, weeks, months, quarters or years), such as "3 days" or "1 week".

xts::first(ts_FUB_daily, "3 days")
##            Temp Rain
## 1950-01-01 -3.2  2.2
## 1950-01-02  1.0 12.6
## 1950-01-03  2.8  0.5
xts::last(ts_FUB_daily, "1 week") # a week starts with Monday
##            Temp Rain
## 2021-12-27 -3.7  0.0
## 2021-12-28 -0.5  1.5
## 2021-12-29  4.0  0.3
## 2021-12-30  9.0  3.2
## 2021-12-31 12.8  5.5

Please note that first() and last() are very common function names, so another attached package may mask them. We therefore use the :: notation to state explicitly that we refer to the functions from the xts package.
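As a minimal sketch of the two calling styles, the following block builds a small synthetic daily series (made-up values, not the DWD record) and calls first() and last() once with an integer and once with a period string. The object names x, first_five, first_3d and last_week are chosen for illustration only.

```r
library(xts)

# Synthetic daily series for December 2021 (hypothetical values)
dates <- seq(as.Date("2021-12-01"), as.Date("2021-12-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

# Integer argument: a number of observations
first_five <- xts::first(x, 5)

# Character argument: a count combined with a calendar period
first_3d <- xts::first(x, "3 days")
last_week <- xts::last(x, "1 week")

nrow(first_five)
nrow(first_3d)
index(last_week)
```

Writing xts::first() and xts::last() keeps the calls unambiguous even if a package attached later defines functions of the same name.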


Slicing

In addition to selecting the oldest or newest observations, we may index our time series in the same way as vectors, matrices and data frames. This basic indexing works on both object types, xts and ts. Note that positional indexing of a ts object returns a plain numeric vector without time attributes, whereas an xts object keeps its time index.

# ts object
ts_FUB_monthly[518:521]
## [1]  2.0  2.5 12.0 15.4
# xts object
ts_FUB_daily[78:81]
##            Temp Rain
## 1950-03-19 11.0    0
## 1950-03-20 10.3    0
## 1950-03-21  9.6    0
## 1950-03-22  9.1    0

As an xts object may store multiple time series, in our case the daily temperature and daily rainfall series, we may index the columns as well.

ts_FUB_daily[78:81, "Temp"]
##            Temp
## 1950-03-19 11.0
## 1950-03-20 10.3
## 1950-03-21  9.6
## 1950-03-22  9.1

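Further, xts objects allow indexing based on ISO 8601 style date strings ("yyyy-mm-dd"). A partial date such as "2001-09" selects the whole month, and two dates separated by a slash select a range: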

ts_FUB_daily["1989-07-12"]
##            Temp Rain
## 1989-07-12 20.3    0
ts_FUB_daily["2001-09"]
##            Temp Rain
## 2001-09-01 17.0  0.0
## 2001-09-02 14.7  1.5
## 2001-09-03 17.0  3.7
## 2001-09-04 14.4  1.7
## 2001-09-05 13.4  0.0
## 2001-09-06 14.4  0.0
## 2001-09-07 13.3  9.7
## 2001-09-08 12.2  0.6
## 2001-09-09 10.0 27.0
## 2001-09-10 12.9  6.4
## 2001-09-11 10.6 10.5
## 2001-09-12 11.8  1.6
## 2001-09-13 11.5  8.4
## 2001-09-14 11.3  2.0
## 2001-09-15 12.6  0.4
## 2001-09-16 11.4  2.4
## 2001-09-17 10.0  0.0
## 2001-09-18  9.5 24.1
## 2001-09-19 11.6  0.0
## 2001-09-20 12.0  4.3
## 2001-09-21 12.5  6.5
## 2001-09-22 11.3  2.3
## 2001-09-23 12.3  0.0
## 2001-09-24 11.3  4.9
## 2001-09-25 11.7  0.0
## 2001-09-26 11.0  0.0
## 2001-09-27  9.6 23.0
## 2001-09-28 12.1  0.0
## 2001-09-29 14.1  0.0
## 2001-09-30 13.0  4.5
ts_FUB_daily["1989-07/1989-07-12"]
##            Temp Rain
## 1989-07-01 17.8  6.9
## 1989-07-02 17.6  2.3
## 1989-07-03 13.8  0.6
## 1989-07-04 21.0  0.0
## 1989-07-05 22.4  0.0
## 1989-07-06 25.2  0.0
## 1989-07-07 25.3  0.0
## 1989-07-08 24.2  0.0
## 1989-07-09 21.8  0.2
## 1989-07-10 18.4  0.0
## 1989-07-11 17.9  0.0
## 1989-07-12 20.3  0.0
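The string forms used above can be tried on a small synthetic series. The sketch below assumes a made-up daily series for the year 2010 (not the DWD record) and additionally shows open-ended ranges, where one side of the slash is left empty. All object names are illustrative.

```r
library(xts)

# Synthetic daily series for 2010 (hypothetical values)
dates <- seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

feb <- x["2010-02"]                 # one whole month
up_to <- x["/2010-01-05"]           # from the start of the series to 5 January
from_dec <- x["2010-12/"]           # from 1 December to the end of the series
rng <- x["2010-03-30/2010-04-02"]   # closed range, both ends included

c(nrow(feb), nrow(up_to), nrow(from_dec), nrow(rng))
```

Both ends of a range are inclusive, and a partial date on either side is expanded to the full period it names.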

An xts object can also be indexed by a vector of dates. For example, we may select every seventh day in a certain time period.

dates <- seq(as.Date("2010-01-15"),
  as.Date("2010-04-12"),
  by = 7
)

ts_FUB_daily[dates]
##            Temp Rain
## 2010-01-15 -3.4  0.0
## 2010-01-22 -7.7  0.0
## 2010-01-29  0.2  4.2
## 2010-02-05  0.5  0.0
## 2010-02-12 -1.2  0.5
## 2010-02-19  2.1  0.4
## 2010-02-26  6.1  0.0
## 2010-03-05 -2.6  0.0
## 2010-03-12  1.7  3.2
## 2010-03-19 10.9  0.2
## 2010-03-26 14.8 10.2
## 2010-04-02  6.0  0.0
## 2010-04-09  8.7  0.2

Subsetting a particular part of the series

Another widely used approach is to extract a contiguous part of the series between a start and an end time. In R we use the window() function for that task; it works on both of our object types.

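Let us apply the window() function to extract the monthly mean temperature for the station Berlin-Dahlem (FU) for the 1990s. Note that for a monthly ts object a bare year refers to its first month, so start = 1990 and end = 1999 select January 1990 to January 1999; to include all of 1999, pass end = c(1999, 12). Please also note the nicely structured representation of our data given by the ts object.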

# ts object
temp_1990_1999 <- window(ts_FUB_monthly,
  start = 1990,
  end = 1999
)

temp_1990_1999
##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
## 1990  3.86  6.23  7.88  9.14 14.88 16.36 17.44 18.75 12.27 10.52  5.26  1.13
## 1991  2.30 -2.34  6.77  8.23 10.52 14.52 20.59 18.33 15.07  8.98  4.67  1.71
## 1992  1.56  3.73  5.25  8.96 15.56 20.29 19.78 19.91 13.75  5.97  5.05  1.13
## 1993  2.35  0.43  4.29 11.67 16.40 16.02 16.48 16.17 12.72  8.78  0.61  3.51
## 1994  3.37 -0.36  5.88  9.59 13.68 16.26 22.80 18.48 13.75  7.25  6.83  3.75
## 1995  0.73  4.70  3.87  9.19 13.31 15.41 21.42 19.64 13.63 11.62  2.35 -2.53
## 1996 -3.69 -2.34  1.16  9.95 12.19 16.32 16.25 18.70 11.47  9.89  5.27 -2.35
## 1997 -2.06  4.41  5.35  6.99 13.78 17.66 19.24 21.27 14.55  8.40  4.07  2.31
## 1998  3.17  6.02  5.11 10.59 15.48 17.41 17.41 17.00 14.28  8.68  1.60  0.80
## 1999  3.16
plot(temp_1990_1999)
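The effect of the end argument can be checked on a synthetic monthly series. The following sketch uses made-up values and the hypothetical names m, w_jan and w_dec; it compares a bare year with an explicit c(year, month) pair.

```r
# Synthetic monthly series, January 1985 to December 2004 (hypothetical values)
m <- ts(seq_len(240), start = c(1985, 1), frequency = 12)

# A bare year points at the first month of that year
w_jan <- window(m, start = 1990, end = 1999)

# A c(year, month) pair states the last month explicitly
w_dec <- window(m, start = 1990, end = c(1999, 12))

length(w_jan) # 9 full years plus January 1999
length(w_dec) # 10 full years
end(w_dec)
```

With end = c(1999, 12) the window holds 120 monthly values, that is, ten complete years.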

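We also apply the window() function to the ts_FUB_hourly xts object to extract the hourly precipitation for the station Berlin-Dahlem (FU) during a period of heavy rainfall at the end of July 2011. As the output shows, a date-only end value is read as midnight, so the window runs from 29 July 00:00 to 31 July 00:00 and includes only the first hour of 31 July.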

temp_2011_JUL_29_31 <- window(ts_FUB_hourly,
  start = "2011-07-29",
  end = "2011-07-31"
)

temp_2011_JUL_29_31
##                     Rain
## 2011-07-29 00:00:00  0.0
## 2011-07-29 01:00:00  0.0
## 2011-07-29 02:00:00  0.0
## 2011-07-29 03:00:00  0.0
## 2011-07-29 04:00:00  1.2
## 2011-07-29 05:00:00  3.3
## 2011-07-29 06:00:00  2.5
## 2011-07-29 07:00:00  2.5
## 2011-07-29 08:00:00  4.8
## 2011-07-29 09:00:00  1.2
## 2011-07-29 10:00:00  2.2
## 2011-07-29 11:00:00  3.4
## 2011-07-29 12:00:00  1.6
## 2011-07-29 13:00:00  5.3
## 2011-07-29 14:00:00  1.1
## 2011-07-29 15:00:00  0.7
## 2011-07-29 16:00:00  1.2
## 2011-07-29 17:00:00  2.0
## 2011-07-29 18:00:00  1.1
## 2011-07-29 19:00:00  0.5
## 2011-07-29 21:00:00  1.4
## 2011-07-29 22:00:00  1.0
## 2011-07-29 23:00:00  1.7
## 2011-07-30 00:00:00  3.2
## 2011-07-30 01:00:00  3.8
## 2011-07-30 02:00:00  4.5
## 2011-07-30 03:00:00  6.3
## 2011-07-30 04:00:00  1.7
## 2011-07-30 05:00:00  0.7
## 2011-07-30 06:00:00  0.8
## 2011-07-30 07:00:00  0.3
## 2011-07-30 08:00:00  0.3
## 2011-07-30 09:00:00  0.1
## 2011-07-30 10:00:00  0.7
## 2011-07-30 11:00:00  0.6
## 2011-07-30 12:00:00  0.2
## 2011-07-30 13:00:00  0.3
## 2011-07-30 14:00:00  0.1
## 2011-07-30 15:00:00  0.8
## 2011-07-30 16:00:00  4.4
## 2011-07-30 17:00:00  2.9
## 2011-07-30 18:00:00  3.4
## 2011-07-30 19:00:00  2.3
## 2011-07-30 20:00:00  2.1
## 2011-07-30 21:00:00  0.5
## 2011-07-30 22:00:00  0.4
## 2011-07-30 23:00:00  0.4
## 2011-07-31 00:00:00  0.6
plot(temp_2011_JUL_29_31)
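If all hours of the last day are wanted, the end of the window has to be stated down to the hour. The sketch below assumes a synthetic hourly series in UTC (made-up values, not the DWD record) and shows two ways to cover three complete days: window() with explicit POSIXct bounds and an ISO 8601 range string. The names h, win_full and full_days are illustrative.

```r
library(xts)

# Synthetic hourly series, 28 July to 1 August 2011 (hypothetical values)
times <- seq(as.POSIXct("2011-07-28 00:00:00", tz = "UTC"),
  as.POSIXct("2011-08-01 23:00:00", tz = "UTC"),
  by = "hour"
)
h <- xts(seq_along(times), order.by = times)

# window() with explicit time stamps for the first and last hour
win_full <- window(h,
  start = as.POSIXct("2011-07-29 00:00:00", tz = "UTC"),
  end = as.POSIXct("2011-07-31 23:00:00", tz = "UTC")
)

# Range string: each date is expanded to the full day it names
full_days <- h["2011-07-29/2011-07-31"]

nrow(win_full)
nrow(full_days)
```

Both selections contain 72 hourly values, 24 for each of the three days.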

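In addition, xts objects offer a more convenient way to subset a particular part of the series: a range string of the form "from/to" inside the square brackets. Here we select the daily temperature for the years 2005 to 2015 and plot it.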

library(ggfortify)
autoplot(ts_FUB_daily["2005/2015", "Temp"])

Exercise: Extract the rainfall data for the station Berlin-Dahlem (FU) for the 10-year period from 2006 to 2015 from the daily and hourly data sets. Compute the cumulative sums and plot the two curves.

# Your code here...
rf_daily <- NULL
rf_hourly <- NULL
# Solution
rf_daily <- cumsum(ts_FUB_daily["2006/2015", "Rain"])
rf_hourly <- cumsum(ts_FUB_hourly["2006/2015"])
p1 <- autoplot(rf_daily) +
  ggtitle("Daily rainfall data") + theme_bw() +
  theme(plot.title = element_text(size = 8))
p2 <- autoplot(rf_hourly) +
  ggtitle("Hourly rainfall data") + theme_bw() +
  theme(plot.title = element_text(size = 8))
library("gridExtra")
grid.arrange(p1, p2,
  ncol = 2,
  top = "Cumulative Rainfall for the weather station Berlin-Dahlem 2006-2015"
)
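A simple plausibility check for such a comparison is that both cumulative curves end at the same total. The sketch below illustrates this on synthetic hourly rainfall (made-up values in mm, not the DWD record), which is aggregated to daily sums with apply.daily() from the xts package. All object names are hypothetical, and real data may contain gaps or missing values that this sketch does not handle.

```r
library(xts)

# Synthetic hourly rainfall for three days (hypothetical values, in mm)
times <- seq(as.POSIXct("2011-07-29 00:00:00", tz = "UTC"),
  by = "hour", length.out = 72
)
rain_hourly <- xts(rep(c(0, 0.5, 1, 0), 18), order.by = times)

# Aggregate to daily totals, then accumulate both series
rain_daily <- apply.daily(rain_hourly, sum)
cum_hourly <- cumsum(rain_hourly)
cum_daily <- cumsum(rain_daily)

# Final value of each cumulative curve
total_hourly <- as.numeric(tail(coredata(cum_hourly), 1))
total_daily <- as.numeric(tail(coredata(cum_daily), 1))

all.equal(total_hourly, total_daily)
```

If the two totals differ for real data, missing observations in one of the series are a likely cause.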


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.