Data preparation

In this section we use two data sets:

First, we load these data sets into memory:

### LOAD DWD DATA ###
load(url("https://userpage.fu-berlin.de/soga/data/r-data/dwd_sp.RData"))
####################################

### LOAD RASTER DATA ###
## build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "srtm_germany_ETRS89.zip"
## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file
unzip(temp)
unlink(temp)
## read in the data
library(raster)
srtm_germany_masked_ETRS89 <- raster("srtm_germany_ETRS89.tif")
####################################
plot(srtm_germany_masked_ETRS89,
  main = "SRTM DEM and DWD weather stations"
)
plot(dwd_sp,
  add = TRUE,
  pch = 16, cex = 0.2
)

Before we proceed we make sure that all of our data shares the same coordinate reference system.

library(rgdal)
newProj <- proj4string(srtm_germany_masked_ETRS89)

# re-project spatial data sets
dwd_sp <- spTransform(dwd_sp,
  CRSobj = newProj
)


# Retrieve Federal States by the the getData() function from the raster package
germany <- getData(country = "Germany", level = 1)
germany <- spTransform(germany,
  CRSobj = newProj
)

Further, we cast the RasterLayer object srtm_germany_masked_ETRS89 to a SpatialPixelsDataFrame object and rename the column in the attribute table to Altitude.

# cast to SpatialPixelsDataFrame
srtm_germany_masked_ETRS89_spdf <- as(
  srtm_germany_masked_ETRS89,
  "SpatialPixelsDataFrame"
)
colnames(srtm_germany_masked_ETRS89_spdf@data) <- "Altitude"

Sample variogram

We can explore the spatial correlation of our predictor variable by plotting a variogram cloud. The variogram cloud is obtained by plotting all possible squared differences of observation pairs \((Z(s_i)−Z(s_j))^2\) against their separation distance \(h_{ij}\). As any point in the variogram cloud refers to a pair of points in the data set, the variogram cloud is used to point us to areas with unusually high or low variability (Bivand et al. 2008).

library(gstat)
ev <- variogram(mean_annual_rainfall ~ 1, dwd_sp, cloud = TRUE)
plot(ev)

The sample variogram plot below is nothing but a plot of averages of semivariogram cloud values over distance intervals \(h\). The variogram() is an extremely versatile function and accepts a ton of additional arguments. For the purpose of this tutorial we keep the defaults. Type ?variogram into your console to review a detailed description of the variogram() function.

ev <- variogram(mean_annual_rainfall ~ 1, dwd_sp)
plot(ev)


Variogram modelling

There are different approaches to select a variogram model:

The eyefit() function from the geoR package offers the possibility to fit a function to the empirical variogram by interactively varying the function type and its parameters. Here, we do not cover this approach, however the interested reader may visit the online documentation for more details (here).

The gstat package ships with the vgm() function, which allows us to specify a wide range of potential variogram models and specific parameters. A list of available models is printed by using the vgm() command.

vgm()
##    short                                      long
## 1    Nug                              Nug (nugget)
## 2    Exp                         Exp (exponential)
## 3    Sph                           Sph (spherical)
## 4    Gau                            Gau (gaussian)
## 5    Exc        Exclass (Exponential class/stable)
## 6    Mat                              Mat (Matern)
## 7    Ste Mat (Matern, M. Stein's parameterization)
## 8    Cir                            Cir (circular)
## 9    Lin                              Lin (linear)
## 10   Bes                              Bes (bessel)
## 11   Pen                      Pen (pentaspherical)
## 12   Per                            Per (periodic)
## 13   Wav                                Wav (wave)
## 14   Hol                                Hol (hole)
## 15   Log                         Log (logarithmic)
## 16   Pow                               Pow (power)
## 17   Spl                              Spl (spline)
## 18   Leg                            Leg (Legendre)
## 19   Err                   Err (Measurement error)
## 20   Int                           Int (Intercept)

The vgm() function is specified by a range of additional arguments such as psill, which corresponds to the (partial) sill of the variogram model, model, which specifies the model type, e.g. “Exp”, “Sph”, “Gau”, “Mat” (see above), range, which specifies the range parameter of the variogram model, kappa, which is a smoothness parameter for the Matern class of variogram models, nugget, which adds the nugget component to the model, anis, a set of parameters to account for anisotropy, among others. Depending on the data set and on the model of choice we have to specify initial values for the different parameters. If we choose initial values too far off from reasonable values, the fit will not succeed.

If we consider the sample variogram from above, an exponential model appears to be a good fit. By eyeballing the sample variogram we set psill = 30000, model = "Exp", range = 100000, and nugget= 25000.

vm <- vgm(psill = 30000, model = "Exp", range = 100000, nugget = 25000)
vm
##   model psill range
## 1   Nug 25000 0e+00
## 2   Exp 30000 1e+05

To verify our parameter choice we plot the sample variogram and overlay it with the variogram model as specified above.

plot(ev, vm)

The variogram model looks reasonable, but for sure not perfect.

We may select the variogram model and its parameters statically, by minimizing some goodness-of-fit criterion. For example the fit.variogram() function from the gstat package allows us to find an appropriate variogram model.

The fit.variogram() function need as arguments at least a sample variogram (object), e.g. the output of the variogram() function, and a model, which specifies the model type, e.g. “Exp”, “Sph”, “Gau”, “Mat”, among others (see output of vgm()). The function fit.variogram() returns an object of class variogramModel extending data.frame. The data.frame object has a logical attribute singular, which indicates whether the non-linear fit converged (FALSE).

Let us apply the fit.variogram() to obtain the parameters of an exponential variogram model.

vm_exp <- fit.variogram(ev, vgm("Exp"))
vm_exp
##   model    psill    range
## 1   Nug 26296.91      0.0
## 2   Exp 31955.10 113131.6

OK, compare the result with our eyeballed parametrization. We have not been too much off.

Let us check whether the fit converges.

attr(vm_exp, "singular")
## [1] FALSE

Perfect, recall that FALSE indicates model convergence.

Note that we may specify the fitting method, used by gstat by setting the additional arguments fit.method. There are four choices implemented:

\[ \begin{array}{c|c} \mathtt{fit.method} & \text{Weight} \\ \hline 1 & N_j \\ 2 & N_j\{\gamma(h_j)\}^2 \\ 6 & 1 \\ 7 & N_j/h_j^2 \end{array} \]

The default value chosen by gstat is fit.method = 7.

Exercise: Apply the fit.variogram() fuction to obtain the parameters for the spherical and the Gaussian variogram model and check each of them for convergence.

## Your code here...
vm_sph <- NULL
vm_gau <- NULL
Show code
vm_sph <- fit.variogram(ev, vgm("Sph"))
attr(vm_exp, "singular")
## [1] FALSE
vm_gau <- fit.variogram(ev, vgm("Gau"))
attr(vm_exp, "singular")
## [1] FALSE

Nice! Both models converge.


Now, let us visualize our three different variogram models we built above.

plot(ev, vm_exp, main = "Exponential variogram model")

Exercise: Visualize the spherical and the Gaussian variogram model. Which of the three models seems to be the best fit by visual inspection?

## Your code here...
Show code
plot(ev, vm_sph, main = "Spherical variogram model")

plot(ev, vm_gau, main = "Gaussian variogram model")

Based on the visual inspection, the exponential variogram model (vm_exp) appears to be the best fit.


In addition to the fit.variogram() function the automap package provides the autofitVariogram(). The function accepts a formula (formula) and a data argument (input_data). The following expression fits a variogram model to our data:

library(automap)
vm_auto <- autofitVariogram(
  formula = mean_annual_rainfall ~ 1,
  input_data = dwd_sp
)
plot(vm_auto)

In the plot above the points show the sample empirical variogram. The numbers indicate the number of pairs assigned to the bin, and the continuous line is the fitted variogram model. In this case, the selected model was Matérn, M. Stein’s parameterization (code name “Ste”), with the four parameter values stated in the bottom-right corner of the image.

The function returns an object of type autofitVariogram. This object contains the fitted variogram model, which can be accessed with the $ operator.

vm_auto$var_model
##   model    psill    range kappa
## 1   Nug 15442.86      0.0   0.0
## 2   Ste 50425.68 283865.8   0.2

Anisotropy

So far we assumed that the spatial correlation structure is the same in all directions (isotropic). Hence, the covariance function depends only on the magnitude of the lag vector \(h\) and not on the direction. In many cases, however, the autocorrelation shows a structure that is different in different directions. This property of a spatial process is referred to as anisotropy.

The most commonly employed model for anisotropy is geometric anisotropy, with the semivariogram reaching the same sill in all directions, but at different ranges. To check for directional dependence, we use directional empirical semivariograms, which include only those values for data pairs falling within certain directional bands as well as falling within the prescribed lag limits.

With the gstat package we can account for anisotropy by specifying directional bands by an azimuthal direction. The variogram() function accepts an alpha parameter, which indicates the direction in plane in positive degrees clockwise. For example, we may specify the main directions to be \(0^\circ\), \(45^\circ\), \(90^\circ\), and \(135^\circ\) with angular tolerance of \(22.5^\circ\).

alpha <- (0:3) * 45
alpha
## [1]   0  45  90 135

If we plug the additional argument alpha into the variogram() function call and plot it, we see a panel plot, where each plot corresponds to a certain directional band. Note that directional semivariograms are often noisier due to the reduced number of data pairs used for the estimation.

ev_anis <- variogram(mean_annual_rainfall ~ 1, dwd_sp, alpha = alpha)
plot(ev_anis)

The plot does not show overwhelming evidence of anisotropy. Let us add our exponential variogram model we fitted in the previous section (vm_exp) to see if it fits the data in all four directional bands.

plot(ev_anis, vm_exp)