Working with time series data in R is relatively straightforward. However, one issue needs special attention. R is an object-oriented programming language. This means that we have to be aware of the data representation, referred to as the object class, because this representation dictates which functions are available for loading, processing, analysing, printing, and plotting our data.
The core data object for holding data in R is the data.frame object. The data.frame object, however, is not designed to work efficiently with time series data.
Fortunately, besides the ts class of base R, there are several R packages, such as zoo, xts, lubridate, and forecast, among others, that provide functions for creating, manipulating and visualizing date, time and time series objects. However, along with the variety of available packages comes a variety of different object classes and data representations.
The base distribution of R includes a time series class called ts. This object class is widely used for representing time series data; however, the associated functions are limited in scope, usefulness, and power. These deficits led to the emergence of many additional and alternative third-party packages, which either extend the functionality of the ts object class or provide very different forms of data representation by introducing their own object classes.
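As a minimal sketch of what a ts object carries beyond its values, the code below wraps twelve arbitrary placeholder values (simply 1 to 12, not real data) in a monthly ts object starting in January 2022. The base R accessors start(), end() and frequency() then read the time attributes back.

```r
# Twelve placeholder values (1 to 12), stored as a monthly series starting January 2022
x <- ts(1:12, start = c(2022, 1), frequency = 12)
class(x)       # "ts"
start(x)       # 2022 1   (year, period)
end(x)         # 2022 12
frequency(x)   # 12 observations per year
# A plain vector with the same values carries no time attributes at all
class(1:12)    # "integer"
```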
An excellent overview of functionality and applications for time series analysis, and of the available R packages, is given by the CRAN Task View: Time Series Analysis. At the time of writing, there were 233 R packages available that deal with, or are at least associated with, time series analysis.
In the subsequent sections we will mostly work with the following:
ts (base R)
xts
zoo
lubridate
forecast
Please be aware that the functions we apply in the subsequent sections may be implemented in other packages too. Depending on your particular application, it might be useful to look for other packages as well.
Date class
The base R Date class handles dates without times. By default, the Date class represents dates internally as the number of days since January 1, 1970. The as.Date() function allows us to create Date objects from a character string. The default format is "YYYY/m/d" or "YYYY-m-d".
my_Date <- as.Date("2022/6/01")
my_Date
## [1] "2022-06-01"
class(my_Date)
## [1] "Date"
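Because a Date object is stored as a number of days, we can inspect that number directly and do day arithmetic on it. The sketch below uses base R only; 19144 is the day count of 2022-06-01 counted from 1970-01-01.

```r
my_Date <- as.Date("2022/6/01")
# Strip the class to see the stored number of days since 1970-01-01
unclass(my_Date)                      # 19144
# The origin itself is day 0
as.numeric(as.Date("1970-01-01"))     # 0
# Adding a number moves a Date by that many days
my_Date + 30                          # "2022-07-01"
# Subtracting two Dates gives a time difference in days
my_Date - as.Date("2022-01-01")       # Time difference of 151 days
```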
The additional argument format allows more flexibility in creating Date objects.
as.Date("12/31/1999", format = "%m/%d/%Y")
## [1] "1999-12-31"
as.Date("April 13, 1978", format = "%B %d, %Y")
## [1] "1978-04-13"
as.Date("25JAN17", format = "%d%b%y")
## [1] "2017-01-25"
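One practical point when using the format argument: if the format string does not match the input, as.Date() returns NA rather than raising an error. The sketch below swaps day and month in the format string so that "31" would have to be read as a month, which is invalid.

```r
good <- as.Date("12/31/1999", format = "%m/%d/%Y")
bad  <- as.Date("12/31/1999", format = "%d/%m/%Y")  # day and month swapped: month 31 does not exist
good          # "1999-12-31"
bad           # NA
# So it is worth checking parsed dates for missing values
is.na(bad)    # TRUE
```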
The standard date format codes are given below:
\[
\begin{array}{|c|l|}
\hline
\text{Code} & \text{Value} \\
\hline
\mathtt{\%d} & \text{Day of the month (number)} \\
\mathtt{\%m} & \text{Month (number)} \\
\mathtt{\%b} & \text{Month (abbreviated)} \\
\mathtt{\%B} & \text{Month (full name)} \\
\mathtt{\%y} & \text{Year (2 digit)} \\
\mathtt{\%Y} & \text{Year (4 digit)} \\
\hline
\end{array}
\]
The format() function is used to extract a component, as a character string, from a Date object.
my_Date
## [1] "2022-06-01"
format(my_Date, "%Y")
## [1] "2022"
as.numeric(format(my_Date, "%Y"))
## [1] 2022
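The format codes from the table can also be combined into one pattern, and the same pattern used in format() and as.Date() makes a round trip. This is a small base R sketch; the month name produced by %B depends on the locale (English is assumed in the comment).

```r
my_Date <- as.Date("2022-06-01")
# Several codes combined in one format string
format(my_Date, "%d %B %Y")   # "01 June 2022" (English locale)
format(my_Date, "%m/%d/%y")   # "06/01/22"
# Round trip: write the Date with a pattern, then parse it back with the same pattern
s <- format(my_Date, "%m/%d/%Y")
as.Date(s, format = "%m/%d/%Y") == my_Date   # TRUE
```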
In addition, the weekdays(), months() and quarters() functions can be used to extract specific components of Date objects.
weekdays(my_Date)
## [1] "Wednesday"
months(my_Date)
## [1] "June"
quarters(my_Date)
## [1] "Q2"
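These functions are vectorized, so they work on a whole vector of dates at once. The three dates below are arbitrary examples; weekday names depend on the locale (English is assumed in the comments), and abbreviate = TRUE returns the short names.

```r
dates <- as.Date(c("2022-06-01", "2022-06-04", "2022-12-24"))
weekdays(dates)                     # "Wednesday" "Saturday" "Saturday"
weekdays(dates, abbreviate = TRUE)  # "Wed" "Sat" "Sat"
quarters(dates)                     # "Q2" "Q2" "Q4"
```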
A sequence of dates can be created with the seq() function. To do so, we specify the starting date (from), the ending date (to) and the increment (by) of the sequence. The by increment is a character string containing one of "day", "week", "month", "quarter" or "year", and can be preceded by a (positive or negative) integer and a space.
seq(
from = as.Date("2022/6/1"),
to = as.Date("2022/7/31"),
by = "1 week"
)
## [1] "2022-06-01" "2022-06-08" "2022-06-15" "2022-06-22" "2022-06-29"
## [6] "2022-07-06" "2022-07-13" "2022-07-20" "2022-07-27"
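Two further variants of the same seq() call are sketched below: monthly steps, and a negative increment that counts backwards. In the second call, length.out fixes the number of dates instead of an ending date.

```r
# First day of each month from July to December 2022
m <- seq(from = as.Date("2022-07-01"), to = as.Date("2022-12-01"), by = "month")
m          # six dates: "2022-07-01" ... "2022-12-01"
# Count backwards one day at a time, three dates in total
b <- seq(from = as.Date("2022-06-01"), by = "-1 day", length.out = 3)
b          # "2022-06-01" "2022-05-31" "2022-05-30"
```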
POSIXt classes
The base R POSIXt classes represent dates together with times and support time zones. There are two POSIXt subclasses available in R: POSIXct and POSIXlt.
POSIXct object
The POSIXct class represents date-time values as the signed number of seconds since midnight GMT (UTC, Coordinated Universal Time) on 1970-01-01.
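To make the "signed number of seconds" concrete, the sketch below goes from a number of seconds to a POSIXct object and back. It relies on as.POSIXct() also accepting a number together with an origin; the time zone is fixed to UTC so that the printed result does not depend on the local system.

```r
# Seconds since 1970-01-01 00:00:00 UTC -> POSIXct
secs <- 1654085435
t <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
t                  # "2022-06-01 12:10:35 UTC"
# ...and back to the underlying number of seconds
as.numeric(t)      # 1654085435
# Negative values lie before the origin
as.POSIXct(-3600, origin = "1970-01-01", tz = "UTC")   # "1969-12-31 23:00:00 UTC"
```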
The as.POSIXct() function allows us to create POSIXct objects from a character string representation of a date-time. The default format of the date-time is "YYYY-mm-dd hh:mm:ss" or "YYYY/mm/dd hh:mm:ss", with the hour, minute and second information being optional.
my_Date_Time <- "2022-06-01 12:10:35"
my_Date_Time
## [1] "2022-06-01 12:10:35"
as.POSIXct(my_Date_Time)
## [1] "2022-06-01 12:10:35 CEST"
If no time zone is specified in the optional tz argument, the local system time zone is used, as returned by the Sys.timezone() function.
Sys.timezone()
## [1] "Europe/Berlin"
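The tz argument matters because the same wall-clock string denotes different instants in different time zones. The sketch below assumes that the time zone database on your system knows "Europe/Berlin" (CEST, two hours ahead of UTC in June).

```r
t_utc    <- as.POSIXct("2022-06-01 12:10:35", tz = "UTC")
t_berlin <- as.POSIXct("2022-06-01 12:10:35", tz = "Europe/Berlin")
# Berlin reaches 12:10:35 two hours earlier than UTC does
as.numeric(t_utc) - as.numeric(t_berlin)              # 7200 seconds
# format() with tz shows one instant in another time zone without changing it
format(t_utc, tz = "Europe/Berlin", usetz = TRUE)     # "2022-06-01 14:10:35 CEST"
```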
Again, the optional format argument is needed if the date-time string is not in the default format.
as.POSIXct("30-6-2021 23:25", format = "%d-%m-%Y %H:%M")
## [1] "2021-06-30 23:25:00 CEST"
The format codes for representing date-times as character strings, such as %H (hour, 00-23), %M (minute) and %S (second), are listed in the help file of the strptime() function (type help(strptime) into your console).
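A few of the time-related codes from that help file are sketched below on a POSIXct object in UTC. The AM/PM indicator %p is locale-dependent (an English or C locale is assumed in the comments), and the same codes can be used for parsing.

```r
x <- as.POSIXct("2021-06-30 23:25:00", tz = "UTC")
format(x, "%H:%M:%S")   # "23:25:00"  (24-hour clock)
format(x, "%j")         # "181"       (day of the year)
format(x, "%I:%M %p")   # "11:25 PM"  (12-hour clock with AM/PM indicator)
# Parsing a 12-hour clock string with the matching codes
as.POSIXct("06/30/2021 11:25 PM", format = "%m/%d/%Y %I:%M %p", tz = "UTC")
```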
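POSIXlt object
The POSIXlt class represents date-time values as a named list with elements for the second (sec), minute (min), hour (hour), day of the month (mday), month (mon), year (year), day of the week (wday), day of the year (yday), and daylight saving time flag (isdst). Note that some of these elements are offsets: mon counts months from 0 (January) to 11, year counts years since 1900, wday runs from 0 (Sunday) to 6, and yday runs from 0 to 365.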
A POSIXlt object can be created using the as.POSIXlt() or strptime() functions. Because a POSIXlt object is a list, we can extract a particular component using the $ notation.
my_Date_Time_POSIXlt <- as.POSIXlt(my_Date_Time)
my_Date_Time_POSIXlt
## [1] "2022-06-01 12:10:35 CEST"
my_Date_Time_POSIXlt$sec
## [1] 35
my_Date_Time_POSIXlt$min
## [1] 10
my_Date_Time_POSIXlt$hour
## [1] 12
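The offset-based elements are easy to misread, so the sketch below extracts them for the same date-time (in UTC, to keep the output independent of the local time zone) and converts them back to calendar values.

```r
lt <- as.POSIXlt("2022-06-01 12:10:35", tz = "UTC")
lt$mon          # 5    (months counted from 0, so June is 5)
lt$mon + 1      # 6
lt$year         # 122  (years since 1900)
lt$year + 1900  # 2022
lt$wday         # 3    (0 = Sunday, so 3 = Wednesday)
lt$yday         # 151  (0 = January 1)
```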
Converting POSIXt objects to Date objects removes the time of day as well as the time zone information.
as.Date(my_Date_Time_POSIXlt)
## [1] "2022-06-01"
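Since the time zone is dropped, the calendar date you get depends on the time zone used for the conversion. The sketch below passes it explicitly through the tz argument of as.Date(); it assumes "Europe/Berlin" is known to your system's time zone database.

```r
x <- as.POSIXct("2022-06-01 23:30:00", tz = "UTC")
as.Date(x, tz = "UTC")            # "2022-06-01"
# 23:30 UTC is already 01:30 CEST on the next day in Berlin
as.Date(x, tz = "Europe/Berlin")  # "2022-06-02"
```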
In the following we introduce the lubridate package, which provides a variety of functions that make it easier to work with dates and times in R. For further information, type vignette("lubridate") into your console. The lubridate package makes parsing date-times easy and fast by providing functions such as ymd(), ymd_hms(), dmy(), dmy_hms() and mdy(), among others.
library(lubridate)
# convert numbers or character strings into Date or POSIXct objects
ymd(19991215) # year-month-day
## [1] "1999-12-15"
ymd_hm(199912151533) # year-month-day-hour-minute
## [1] "1999-12-15 15:33:00 UTC"
mdy("April 13, 1978") # month-day-year
## [1] "1978-04-13"
dmy(241216) # day-month-year
## [1] "2016-12-24"
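The lubridate parsers only need the order of the components, not the exact separators, so one call can handle mixed input. The sketch below also shows that the *_hms() parsers return POSIXct in UTC unless a tz argument is given; "Europe/Berlin" is assumed to be available on your system.

```r
library(lubridate)
# One parser copes with different separators within the same vector
d <- ymd(c("2022-06-01", "2022/06/02", "20220603"))
d          # "2022-06-01" "2022-06-02" "2022-06-03"
class(d)   # "Date"
# Parse a date-time directly into a given time zone
ymd_hms("2022-06-01 12:10:35", tz = "Europe/Berlin")   # "2022-06-01 12:10:35 CEST"
```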
Further, the lubridate package provides simple functions to get and set components of a date-time, such as year(), month(), week(), mday(), wday(), yday(), hour(), minute(), and second():
today <- Sys.time()
today
## [1] "2023-06-05 11:16:39 CEST"
year(today) # year
## [1] 2023
month(today) # month
## [1] 6
month(today, label = TRUE) # labelled month
## [1] Jun
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
month(today, label = TRUE, abbr = FALSE) # full month name
## [1] June
## 12 Levels: January < February < March < April < May < June < ... < December
week(today) # week
## [1] 23
mday(today) # day of the month
## [1] 5
wday(today) # day of the week (1 = Sunday by default)
## [1] 2
wday(today, label = TRUE) # labelled weekday
## [1] Mon
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(today, label = TRUE, abbr = FALSE) # full weekday name
## [1] Monday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
yday(today) # day of the year
## [1] 156
hour(today) # hour
## [1] 11
minute(today) # minute
## [1] 16
second(today) # second
## [1] 39.96189
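The examples above only get components. The same lubridate functions can also be used on the left-hand side of an assignment to set them, as in this sketch with a fixed date-time in UTC (chosen so that the result does not depend on when the code is run).

```r
library(lubridate)
x <- ymd_hms("2023-06-05 11:16:39", tz = "UTC")
# Set individual components in place
year(x)  <- 2020
month(x) <- 1
hour(x)  <- 0
x   # "2020-01-05 00:16:39 UTC"
```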
In addition to the variety of functions listed above, the as.yearmon() and as.yearqtr() functions from the zoo package are convenient when working with regularly spaced monthly and quarterly data.
library(zoo)
as.yearmon(today)
## [1] "Jun 2023"
format(as.yearmon(today), "%B %Y")
## [1] "June 2023"
strftime(as.yearmon(today), format = "%B %Y")
## [1] "June 2023"
as.yearqtr(today)
## [1] "2023 Q2"
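A yearmon value is stored as the year plus (month - 1)/12, which makes month arithmetic simple. The sketch below builds June 2023 from that numeric form, steps one month ahead, and converts back to Date objects at the start and end of the month via the frac argument of as.Date().

```r
library(zoo)
ym <- as.yearmon(2023 + 5/12)   # year + (month - 1)/12, i.e. June 2023
ym                              # "Jun 2023"
ym + 1/12                       # one month later: "Jul 2023"
as.Date(ym)                     # first day of the month: "2023-06-01"
as.Date(ym, frac = 1)           # last day of the month: "2023-06-30"
as.yearqtr(ym)                  # "2023 Q2"
```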
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.