
Vectors

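A vector is an ordered collection of elements of the same type. The most common types are character, logical, integer and numeric; factors, which represent categorical data, are also stored as vectors.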

There are many ways to initialize a vector object.

x <- numeric(0)
class(x)

## [1] "numeric"

y <- "cat"
class(y)

## [1] "character"
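
As a small supplementary sketch (the variable names n0, c0 and l0 are chosen here only for illustration), the constructor functions numeric(), character() and logical() create vectors of a given length filled with a default value, and typeof() reports how the values are stored internally.

```r
# Pre-allocated vectors are filled with a type-specific default value
n0 <- numeric(3)    # 0 0 0
c0 <- character(2)  # "" ""
l0 <- logical(2)    # FALSE FALSE

# typeof() reports the internal storage type
t_int <- typeof(1:3)         # "integer"
t_dbl <- typeof(c(1, 2, 3))  # "double" (its class is "numeric")
```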

We can combine elements into a vector using the c() (combine) function.

aa <- c("John", "Sahra", "I")
aa

## [1] "John"  "Sahra" "I"
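
Because all elements of a vector share one type, c() silently converts mixed input to the most general type present. The following sketch illustrates this; the names mixed and num_lgl are hypothetical and not used elsewhere in the tutorial.

```r
# Numbers and logicals become strings when a string is present
mixed <- c(1, "a", TRUE)
mixed
# [1] "1"    "a"    "TRUE"
class(mixed)
# [1] "character"

# Logicals become numbers (TRUE -> 1, FALSE -> 0) next to numeric values
num_lgl <- c(1.5, TRUE, FALSE)
class(num_lgl)
# [1] "numeric"
```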

Or create vectors from a sequence of numbers using either the colon operator : or the seq() function.

s1 <- 1:25
s1

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25

s2 <- seq(from = 1, to = 25, by = 2)
s2

##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25
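
As a further sketch, seq() can also be given the desired number of values via its length.out argument instead of a step size, and seq_len() builds the sequence 1, 2, ..., n. The names s3 and s4 are introduced here only for illustration.

```r
# Five evenly spaced values between 0 and 1
s3 <- seq(from = 0, to = 1, length.out = 5)
s3
# [1] 0.00 0.25 0.50 0.75 1.00

# The integers from 1 to 5
s4 <- seq_len(5)
s4
# [1] 1 2 3 4 5
```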

Vector Arithmetic

A very convenient feature in R is that arithmetic operations on vectors are performed member-by-member.

Suppose we have two vectors $\mathbf u$ and $\mathbf v$.

u <- c(10, 30, 50, 70, 90)
v <- c(20, 40, 60, 80, 100)

If we multiply vector $\mathbf u$ by 100, we would get a vector with each of its members multiplied by 100.

u * 100

## [1] 1000 3000 5000 7000 9000

Similarly, if we add $\mathbf u$ and $\mathbf v$, the result corresponds to the sum of the corresponding members from $\mathbf u$ and $\mathbf v$.

u + v

## [1]  30  70 110 150 190

The same holds true for subtraction, multiplication and division.

u - v

## [1] -10 -10 -10 -10 -10

u * v

## [1]  200 1200 3000 5600 9000

u / v

## [1] 0.5000000 0.7500000 0.8333333 0.8750000 0.9000000
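
Member-by-member evaluation is not limited to the arithmetic operators. As a brief sketch, many built-in functions such as sqrt() are applied to each element, whereas functions such as sum() reduce the whole vector to a single value. The names u_sq, u_root and u_total are hypothetical.

```r
u <- c(10, 30, 50, 70, 90)

# Exponentiation is applied to each member
u_sq <- u^2
u_sq
# [1]  100  900 2500 4900 8100

# sqrt() is vectorized as well: one result per member
u_root <- sqrt(u)
length(u_root)
# [1] 5

# sum() collapses the vector into one number
u_total <- sum(u)
u_total
# [1] 250
```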

The Recycling Rule

What happens if we combine two vectors of unequal length in an arithmetic operation?

In such a case R applies the recycling rule: the shorter operand is repeated (recycled) until it matches the length of the longer one. If the longer length is not a multiple of the shorter one, R still computes the result but issues a warning.

For example, the following vectors $\mathbf v$ and $\mathbf w$ have different lengths and their sum is computed by recycling values of the shorter vector $\mathbf v$.

v <- c(1, 2, 3)
w <- c(100, 200, 300, 400, 500, 600, 700, 800, 900)
v + w

## [1] 101 202 303 401 502 603 701 802 903
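
In the example above the length of the longer vector (9) is a multiple of the length of the shorter one (3). The following sketch, using the hypothetical vectors a and b, shows what happens otherwise: the result is still computed, but R raises a warning, which is captured here with tryCatch() so that its message can be inspected.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30, 40)

# a is recycled as 1, 2, 3, 1; the warning is suppressed for a clean result
ab <- suppressWarnings(a + b)
ab
# [1] 11 22 33 41

# Capture the warning message instead of printing it
msg <- tryCatch(a + b, warning = function(w) conditionMessage(w))
msg
# [1] "longer object length is not a multiple of shorter object length"
```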

Consider one more example, which showcases a nice application of the recycling rule. Let us create a vector $\mathbf m$, a vector of fives of length 20. We may apply the handy rep() function to construct such a vector.

l <- 20 # length
m <- rep(5, l)
m

##  [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

Now, if we want to turn every second item into the number $-5$, we simply apply the recycling rule by multiplying $\mathbf m$ by the two-element vector c(1, -1).

c(1, -1) * m

##  [1]  5 -5  5 -5  5 -5  5 -5  5 -5  5 -5  5 -5  5 -5  5 -5  5 -5
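
For comparison, the same alternating vector can be built directly with rep(), without relying on recycling. This is only an alternative sketch; the names alt and blocks are hypothetical. The times argument repeats the whole pattern, whereas each repeats every element in turn.

```r
# Repeat the pattern (5, -5) ten times
alt <- rep(c(5, -5), times = 10)

# Same result as the recycling approach
identical(alt, c(1, -1) * rep(5, 20))
# [1] TRUE

# each = 2 repeats every element before moving on
blocks <- rep(c(5, -5), each = 2)
blocks
# [1]  5  5 -5 -5
```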

Vector Indexing

In order to retrieve values of a vector we use the single square bracket [] operator.

Let us create a vector $\mathbf s$ with five entries.

s <- c("aa", "bb", "cc", "dd", "ee")

In order to access the second member of the vector we use the index position 2.

s[2]

## [1] "bb"

If we provide a negative index, the member whose position equals the absolute value of that index is stripped from the result.

s[-2]

## [1] "aa" "cc" "dd" "ee"

If an index is out-of-range, a missing value will be reported via the symbol NA.

s[10]

## [1] NA

A vector can be sliced by a numeric index vector.

s[c(2, 5)]

## [1] "bb" "ee"

To produce a vector slice between two indexes, we apply the colon operator :.

s[2:5]

## [1] "bb" "cc" "dd" "ee"

A vector can also be sliced with a logical index vector that has the same length as the original vector. Its members are TRUE where the corresponding members of the original vector are to be included in the slice, and FALSE otherwise.

n <- c(FALSE, TRUE, FALSE, TRUE, FALSE)
s[n]

## [1] "bb" "dd"
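
In practice the logical index vector is rarely typed by hand; it usually results from a comparison. The following sketch assumes a hypothetical numeric vector x of the same length as s and uses the condition x > 10 to select members.

```r
s <- c("aa", "bb", "cc", "dd", "ee")
x <- c(4, 8, 15, 16, 23)  # hypothetical values, one per member of s

# A comparison returns a logical vector
keep <- x > 10
keep
# [1] FALSE FALSE  TRUE  TRUE  TRUE

# Use it to slice x itself or any vector of the same length
big <- x[keep]
big
# [1] 15 16 23
s_big <- s[keep]
s_big
# [1] "cc" "dd" "ee"

# which() converts the logical vector into index positions
pos <- which(keep)
pos
# [1] 3 4 5
```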

Matrices

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. In R we create a matrix with the matrix() function. We specify the data argument, the desired number of rows and columns via the nrow and ncol arguments, and the byrow argument, which determines whether the matrix is filled by columns (the default, byrow = FALSE) or by rows (byrow = TRUE).

M <- matrix(
  data = c(2, 4, 3, 1, 5, 7), # the data elements
  nrow = 2, # number of rows
  ncol = 3, # number of columns
  byrow = TRUE # fill matrix by rows
)
M

##      [,1] [,2] [,3]
## [1,]    2    4    3
## [2,]    1    5    7

An element in the $m^{th}$ row and $n^{th}$ column of $\mathbf M$ can be accessed by the expression $\mathbf M[m,n]$.

M[1, 3] # element at 1st row, 3rd column

## [1] 3

The entire $m^{th}$ row can be extracted as $\mathbf M[m,]$.

M[2, ] # the 2nd row

## [1] 1 5 7

Similarly, the entire $n^{th}$ column can be extracted as $\mathbf M[,n]$.

M[, 3] # the 3rd column

## [1] 3 7

Of course we can also extract more than one row or column at a time.

M[, c(1, 3)]

##      [,1] [,2]
## [1,]    2    3
## [2,]    1    7
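
One detail worth knowing is that extracting a single row or column returns a plain vector, whereas extracting several returns a matrix. As a sketch, the drop = FALSE argument of the bracket operator keeps the matrix structure in the single-column case as well. The names col3 and col3_mat are hypothetical.

```r
M <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)

# A single column is simplified to a vector without dimensions
col3 <- M[, 3]
dim(col3)
# NULL

# drop = FALSE preserves the two-dimensional layout
col3_mat <- M[, 3, drop = FALSE]
dim(col3_mat)
# [1] 2 1
```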

By applying the cbind() and rbind() functions we combine matrices horizontally and vertically, respectively. Both functions return a matrix. For reference, here is the matrix $\mathbf M$ again.

M

##      [,1] [,2] [,3]
## [1,]    2    4    3
## [2,]    1    5    7

cbind(M, M)

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    4    3    2    4    3
## [2,]    1    5    7    1    5    7

rbind(M, M)

##      [,1] [,2] [,3]
## [1,]    2    4    3
## [2,]    1    5    7
## [3,]    2    4    3
## [4,]    1    5    7

Note that by using these commands we may easily extend any matrix with a vector of appropriate length. We check the dimensions of a matrix object by applying the dim() function, or by using the nrow() and ncol() commands.

no_of_rows <- dim(M)[1]
no_of_cols <- dim(M)[2]
dim(M)

## [1] 2 3

nrow(M)

## [1] 2

ncol(M)

## [1] 3

Let us create two appropriate vectors …

v1 <- rep(1, no_of_rows)
v2 <- rep(2, no_of_cols)

… and combine them with the matrix $\mathbf M$.

cbind(M, v1)

##            v1
## [1,] 2 4 3  1
## [2,] 1 5 7  1

rbind(M, v2)

##    [,1] [,2] [,3]
##       2    4    3
##       1    5    7
## v2    2    2    2

Matrix Algebra

R provides a rich environment for matrix algebra, also referred to as linear algebra.

To illustrate, we construct two matrices: $\mathbf M$, a 2 by 2 matrix, and $\mathbf N$, a 3 by 2 matrix.

M <- matrix(c(1, 2, 3, 4), nrow = 2)
M

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

N <- matrix(c(9, 8, 7, 6, 5, 4), nrow = 3)
N

##      [,1] [,2]
## [1,]    9    6
## [2,]    8    5
## [3,]    7    4

Basic algebraic operations (+ and -)

M + M

##      [,1] [,2]
## [1,]    2    6
## [2,]    4    8

N - N

##      [,1] [,2]
## [1,]    0    0
## [2,]    0    0
## [3,]    0    0

Scalar multiplication and scalar division

M * 10

##      [,1] [,2]
## [1,]   10   30
## [2,]   20   40

N / 10

##      [,1] [,2]
## [1,]  0.9  0.6
## [2,]  0.8  0.5
## [3,]  0.7  0.4

Transpose of a matrix

The t() function returns the transpose of a matrix, that is, its rows and columns are swapped. We can confirm the change of shape by applying the dim() function.

dim(N)

## [1] 3 2

t(N)

##      [,1] [,2] [,3]
## [1,]    9    8    7
## [2,]    6    5    4

dim(t(N))

## [1] 2 3

Element-wise multiplication and division

M * M

##      [,1] [,2]
## [1,]    1    9
## [2,]    4   16

M / M

##      [,1] [,2]
## [1,]    1    1
## [2,]    1    1

Matrix multiplication (inner product)

M %*% M

##      [,1] [,2]
## [1,]    7   15
## [2,]   10   22

Note that M %*% N will cause an error,

Error in base::"%*%"(x, y) : non-conformable arguments

because the inner dimensions do not match: the number of columns of $\mathbf M$ (2) differs from the number of rows of $\mathbf N$ (3).

dim(M)

## [1] 2 2

dim(N)

## [1] 3 2

Transposing the matrix $\mathbf N$ solves that issue.

dim(M)

## [1] 2 2

dim(t(N))

## [1] 2 3

M %*% t(N)

##      [,1] [,2] [,3]
## [1,]   27   23   19
## [2,]   42   36   30
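
The conformability requirement can be checked before multiplying. The sketch below compares ncol() of the left factor with nrow() of the right factor; it also shows that, as an alternative to transposing, the product in the reverse order N %*% M is defined for these two matrices. The name P is hypothetical.

```r
M <- matrix(c(1, 2, 3, 4), nrow = 2)
N <- matrix(c(9, 8, 7, 6, 5, 4), nrow = 3)

# M %*% N: columns of M (2) vs. rows of N (3)
ncol(M) == nrow(N)
# [1] FALSE

# N %*% M: columns of N (2) vs. rows of M (2)
ncol(N) == nrow(M)
# [1] TRUE

# The result has nrow(N) rows and ncol(M) columns
P <- N %*% M
P
#      [,1] [,2]
# [1,]   21   51
# [2,]   18   44
# [3,]   15   37
```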

Inverse of a square matrix

solve(M)

##      [,1] [,2]
## [1,]   -2  1.5
## [2,]    1 -0.5
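
As a quick sanity check, multiplying a matrix by its inverse should give the identity matrix, up to floating-point rounding. The sketch below verifies this with all.equal() and also shows the two-argument form solve(M, b), which solves the linear system $\mathbf M \mathbf x = \mathbf b$ directly. The names M_inv, b and x are introduced only for this illustration.

```r
M <- matrix(c(1, 2, 3, 4), nrow = 2)
M_inv <- solve(M)

# M times its inverse equals the 2 by 2 identity matrix
# (all.equal() tolerates tiny rounding errors)
all.equal(M %*% M_inv, diag(2))
# [1] TRUE

# Solve M x = b without forming the inverse explicitly
b <- c(5, 6)
x <- solve(M, b)

# Check the solution by substituting it back
all.equal(as.vector(M %*% x), b)
# [1] TRUE
```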

Row means and column means

As computing the sums and means of the rows and columns of a matrix is a very common task, R provides the convenience functions rowMeans(), rowSums(), colMeans() and colSums().

N # Matrix N

##      [,1] [,2]
## [1,]    9    6
## [2,]    8    5
## [3,]    7    4

rowMeans(N) # Returns vector of row means.

## [1] 7.5 6.5 5.5

rowSums(N) # Returns vector of row sums.

## [1] 15 13 11

colMeans(N) # Returns vector of column means.

## [1] 8 5

colSums(N) # Returns vector of column sums.

## [1] 24 15
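
These four functions are shortcuts for a more general pattern. As a sketch, the apply() function applies an arbitrary function over the rows (second argument 1) or columns (second argument 2) of a matrix, so it reproduces the results above and also covers summaries for which no dedicated function is shown here, such as the row maxima. The names row_means, col_sums and row_max are hypothetical.

```r
N <- matrix(c(9, 8, 7, 6, 5, 4), nrow = 3)

# Same result as rowMeans(N)
row_means <- apply(N, 1, mean)
row_means
# [1] 7.5 6.5 5.5

# Same result as colSums(N)
col_sums <- apply(N, 2, sum)
col_sums
# [1] 24 15

# Any other function works too, e.g. the maximum of each row
row_max <- apply(N, 1, max)
row_max
# [1] 9 8 7
```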

Lists

Lists are R objects whose elements may be of different types, such as numbers, strings, vectors, matrices, functions or even other lists. Hence, this data structure is often used as a container to store or organize R objects. A list is created by using the list() function.

Let us create a list L containing strings, numbers, vectors, logical values and a matrix.

v1 <- 1978
v2 <- c("Love", "Hate")
v3 <- seq(10, 100, 10)
v4 <- c(TRUE, TRUE, FALSE)
v5 <- matrix(v3, nrow = 2)
L <- list(v1, v2, v3, v4, v5)
class(L)

## [1] "list"

By calling the str() command we may inspect the structure of the list object.

str(L)

## List of 5
##  $ : num 1978
##  $ : chr [1:2] "Love" "Hate"
##  $ : num [1:10] 10 20 30 40 50 60 70 80 90 100
##  $ : logi [1:3] TRUE TRUE FALSE
##  $ : num [1:2, 1:5] 10 20 30 40 50 60 70 80 90 100

Using the single square bracket [] operator we retrieve a slice of the list.

L[5] returns a list containing the $5^{th}$ element of the list object L, which in our case corresponds to the matrix object.

L[5]

## [[1]]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   10   30   50   70   90
## [2,]   20   40   60   80  100

Note that the slice is still a list object.

class(L[5])

## [1] "list"

In order to reference a list element directly, we have to use the double square bracket [[]] operator.

L[[5]]

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   10   30   50   70   90
## [2,]   20   40   60   80  100

class(L[[5]])

## [1] "matrix" "array"
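
Because the double square bracket operator returns the element itself, further indexing can be chained directly onto it. The sketch below rebuilds the same list and picks individual values out of the character vector and the matrix; the names first_word and cell are hypothetical.

```r
v3 <- seq(10, 100, 10)
L <- list(1978, c("Love", "Hate"), v3, c(TRUE, TRUE, FALSE), matrix(v3, nrow = 2))

# First member of the character vector stored at position 2
first_word <- L[[2]][1]
first_word
# [1] "Love"

# Element in row 2, column 3 of the matrix stored at position 5
cell <- L[[5]][2, 3]
cell
# [1] 60
```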

A very convenient feature of lists is that we may assign names to list elements, and hence, reference them by name instead of a numeric index. We can either provide the names of the elements during the construction of the list or use the names() function.

list(
  "scalar" = v1,
  "character_vector" = v2,
  "numeric_vector" = v3,
  "logical_vector" = v4,
  "matrix" = v5
)

## $scalar
## [1] 1978
## 
## $character_vector
## [1] "Love" "Hate"
## 
## $numeric_vector
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## $logical_vector
## [1]  TRUE  TRUE FALSE
## 
## $matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   10   30   50   70   90
## [2,]   20   40   60   80  100

However, for the purpose of this tutorial we apply the names() function.

names(L) <- c(
  "scalar", "character_vector",
  "numeric_vector", "logical_vector",
  "matrix"
)

Now we can slice the list by using the element names.

L[c("scalar", "numeric_vector")]

## $scalar
## [1] 1978
## 
## $numeric_vector
##  [1]  10  20  30  40  50  60  70  80  90 100

Again, in order to reference a list element directly, we have to use the double square bracket [[]] operator.

L[["scalar"]]

## [1] 1978

Alternatively, a named list element can also be referenced directly with the $ operator.

L$matrix

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   10   30   50   70   90
## [2,]   20   40   60   80  100

We can also manipulate list objects by adding, deleting and updating list elements. Note that assigning to a name that does not yet exist appends the new element at the end of the list.

L["new_element"] <- "I am a new element at the end of the list"
L

## $scalar
## [1] 1978
## 
## $character_vector
## [1] "Love" "Hate"
## 
## $numeric_vector
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## $logical_vector
## [1]  TRUE  TRUE FALSE
## 
## $matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   10   30   50   70   90
## [2,]   20   40   60   80  100
## 
## $new_element
## [1] "I am a new element at the end of the list"

L[1] <- NULL # Remove the first element.
L

## $character_vector
## [1] "Love" "Hate"
## 
## $numeric_vector
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## $logical_vector
## [1]  TRUE  TRUE FALSE
## 
## $matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   10   30   50   70   90
## [2,]   20   40   60   80  100
## 
## $new_element
## [1] "I am a new element at the end of the list"

L[3] <- "updated element"
L

## $character_vector
## [1] "Love" "Hate"
## 
## $numeric_vector
##  [1]  10  20  30  40  50  60  70  80  90 100
## 
## $logical_vector
## [1] "updated element"
## 
## $matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]   10   30   50   70   90
## [2,]   20   40   60   80  100
## 
## $new_element
## [1] "I am a new element at the end of the list"
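
If a new element should end up at a position other than the end, the base R function append() with its after argument is one option. The following sketch uses a small hypothetical list lst, so that the tutorial's list L is left untouched, and also shows updating and deleting an element by name.

```r
lst <- list(a = 1, b = "two", c = TRUE)

# Insert a new named element after the first position
lst <- append(lst, list(new = "inserted"), after = 1)
names(lst)
# "a" "new" "b" "c"

# Update an element by name
lst[["b"]] <- 2

# Delete an element by name
lst$c <- NULL
names(lst)
# "a" "new" "b"
```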

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.