R includes several functions that can be used for reading, visualizing, and analyzing spatial data. However, only the contribution of the open-source software community makes R such an extremely powerful analysis and visualization tool for spatial data. CRAN Task View - Analysis of Spatial Data lists and describes many of those contributed packages. In the subsequent sections, we deal primarily with the following packages:

Please be aware that the methods we apply in the subsequent sections may also be implemented in other packages. Depending on your particular application it might be useful to look out for other packages.


1. The terra package

The terra package provides bindings to the Geospatial Data Abstraction Library (GDAL) and access to projection and transformation operations from the PROJ library. This software is the basis for spatial software applications and geodata analysis within R and is the successor of the raster package. Its bindings allow fast I/O (input/output) operations for GDAL-supported raster formats, OGR-supported vector formats and numerous utility functions to convert geographic longitude and latitude coordinates into Cartesian coordinates (and vice versa). Furthermore, the terra package simplifies the access to large raster data sets and extends the analytic tools available for both raster and vector data. Additionally, the rasterVis package provides enhanced visualization opportunities.

Working with Vectordata over the SpatVector class

Vector data is handled within the terra package over the SpatVector class representation. Since we need to deal with vector-based spatial data for point pattern analysis, we want to introduce some basic programming concepts using the terra package.

Firstly, let us create a SpatVector class object by ourselves. Therefore, we need point coordinates and to decide on the coordinate reference system for our data. For the first task we use the geocode_OSM() [function]((https://rdrr.io/cran/tmaptools/man/geocode_OSM.html) in the tmaptools() package. This function allows us to geocode a location (i.e., to find its latitude and longitude) using the OpenStreetMap-API.

Exercise: Retrieve the coordinates for the GeoCampus Lankwitz using the geocode_OSM(...,projection = "+init=epsg:4326",...) function for the OSM API. Hint: You will get a list with three entries. Extract the xy-coordinates and transform them into a data frame with “long” and “lat” column headers.

# your code here
geocampus <- NULL
Show code
library(tmaptools)
geocampus <- geocode_OSM("Malteserstrasse 74-100, Berlin,Germany",
  projection = "+init=epsg:4326",
  return.first.only = TRUE,
  details = FALSE,
  as.data.frame = NA,
  as.sf = FALSE,
  server = "http://nominatim.openstreetmap.org"
)

geocampus <- as.data.frame(t(as.matrix(geocampus$coords)))

colnames(geocampus) <- c("long", "lat")
geocampus
##       long      lat
## 1 13.35803 52.42625

Nice!

In order to convert the string-based coordinates as a data.frame representation to a SpatVector object, we have to add a coordinate reference system (CRS). In general, a CRS can be defined in several ways, for example as Well-known text (WKT) format, JSON format, GML format or in the Proj4 format, among many others.

In R, we use the Proj4 format for spatial objects. This format is a generic, string-based description of a CRS, understood by the proj.4 library. It defines projection types and defines parameter values for particular projections. For instance, the Proj4 format string for the European Terrestrial Reference System 1989 (ETRS89) is \[\mathtt {+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no\_defs }.\]

With respect to the enormous number of existing CRSs, it is impossible to remember all of those specifications. For historical reasons the International Association of Oil & Gas Producers (IOGP), formerly known as European Petroleum Survey Group (EPSG), built a collection of definitions for global, regional, national and local coordinate reference systems and coordinate transformations, the EPSG Geodetic Parameter Dataset. Each coordinate reference system gets a unique integer identifier within this collection, commonly denoted as EPSG. For instance, the EPSG identifier for the ETRS89 is simply \(\mathtt 4258\) (for other commonly used representations of ETRS89 go here or here).

To conclude, we feed the Spatial objects with the Proj4 format string (proj4string) or simply provide the EPSG identifier. For our example, we use the EPSG identifier \(4326\), which corresponds to the latest revision of the World Geodetic System (WGS84).

The function crs() of the terra package transforms the character string into a coordinate reference system (crs) object.

library(terra)
latlong <- crs("+init=epsg:4326")

geocampus_vect <- vect(geocampus, 
                       geom = c("long", "lat"), 
                       crs = latlong)
geocampus_vect
##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 1, 0  (geometries, attributes)
##  extent      : 13.35803, 13.35803, 52.42625, 52.42625  (xmin, xmax, ymin, ymax)
##  coord. ref. : lon/lat WGS 84

Geographic coordinates are difficult to understand without a geographical context. Thus, we plot the location of the GeoCampus Lankwitz using the mapview package.

library(mapview)
library(sf)
img <- "http://www.osa.fu-berlin.de/geographie_bsc/_medien/bilder_geocampus/bild_010/010_546.jpeg"

mapview(sf::st_as_sf(geocampus_vect),
  map.types = "Esri.WorldImagery",
  popup = leafpop::popupImage(img, src = "remote")
)

Well, the geocoding did a quite reasonable job!

Working with attribute values within SpatVector objects

Furthermore, the SpatVector class allows us to store additional information, similar to the concept of an attribute table in a GIS.

The additional information is kept in the spatial data representation as data.frame. To showcase the data handling, we read in an ESRI shapefile with point information.

The shapefile gis_osm_pois_free_1.shp stores points of interests in Berlin provided by OpenStreetMap (OSM). The data was downloaded from GEOFABRIK on August 27, 2022, and contains OpenStreetMap data as of August 26, 2022 (see here).

# build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "berlin-latest-free.shp.zip"

## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file(s)
unzip(temp)
## close file connection
unlink(temp)
library(terra)
pois <- vect("gis_osm_pois_free_1.shp")

Several methods are available to access the associated data within the spatial SpatVector object. Firstly, over the ext() function the object’s spatial extent is retrieved.

ext(pois)
## SpatExtent : 13.0840658, 13.7602896, 52.3379366, 52.6726975 (xmin, xmax, ymin, ymax)

The str() function shows the structure of the Spatial class object.

str(pois)
## S4 class 'SpatVector' [package "terra"]

The summary() method shows the bounding box, whether the object is projected, and the number of rows of coordinates.

summary(pois)
##     osm_id               code         fclass              name          
##  Length:83123       Min.   :2001   Length:83123       Length:83123      
##  Class :character   1st Qu.:2303   Class :character   Class :character  
##  Mode  :character   Median :2701   Mode  :character   Mode  :character  
##                     Mean   :2604                                        
##                     3rd Qu.:2902                                        
##                     Max.   :2964

We can assess the object’s attribute data via data.frame syntax, for example over the $ sign.

Exercise: List the first 15 entries of the attribute table stored in the data slot using the head function.

# your code here
Show code
head(pois, 15)
##      osm_id code              fclass                            name
## 1  16541597 2907 camera_surveillance                            Aral
## 2  26735749 2301          restaurant                            Aida
## 3  26735753 2006           telephone                            <NA>
## 4  26735759 2301          restaurant                      Madame Ngo
## 5  26735763 2301          restaurant                      Thanh Long
## 6  26754448 2701        tourist_info                            <NA>
## 7  26865440 2307          biergarten                   Spinnerbrücke
## 8  26867409 2031     recycling_glass                            <NA>
## 9  26972366 2724            memorial                     Konrad Zuse
## 10 27001739 2907 camera_surveillance                            <NA>
## 11 27318009 2307          biergarten                         Loretta
## 12 27378775 2002        fire_station Freiwillige Feuerwehr Bohnsdorf
## 13 27431777 2724            memorial              Luftbrückendenkmal
## 14 27431777 2725             artwork              Luftbrückendenkmal
## 15 28968292 2601                bank              Berliner Volksbank

The str() function reveals that we are actually dealing with a data.frame object.

str(pois)
## S4 class 'SpatVector' [package "terra"]

Consequently we can apply methods on that data.frame object.

Exercise: Count the number of points of interest corresponding to surveillance cameras.

# your code here 
Show code
sum(pois$fclass == "camera_surveillance")
## [1] 2755

Exercise: Subset the SpatVector object to only include points which are surveillance cameras.

# your code here 
surveillance_spdf <- NULL
summary(surveillance_spdf)
Show code
surveillance_spdf <- pois[pois$fclass == "camera_surveillance", ]
summary(surveillance_spdf)
##     osm_id               code         fclass              name          
##  Length:2755        Min.   :2907   Length:2755        Length:2755       
##  Class :character   1st Qu.:2907   Class :character   Class :character  
##  Mode  :character   Median :2907   Mode  :character   Mode  :character  
##                     Mean   :2907                                        
##                     3rd Qu.:2907                                        
##                     Max.   :2907

The generic plot() function, internally linked to the terra package association when a SpatVector object is given as argument, provides a visualization of the data.

plot(surveillance_spdf)

Let us contextualize the data by plotting it onto an interactive web map. For this purpose, a typecast to a sf object is required.

mapview(sf::st_as_sf(surveillance_spdf))

Working with raster data over using terra and rasterVIS

To showcase some of the features provided by the terra and rasterVis packages in terms of handling, manipulation and visualization of raster data, we download the SRTM digital elevation model (DEM) for Germany. The data was retrieved from OPENDEM on November 07, 2022. Please note the zipped file is about 90 MB in size.

## build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "srtm_germany_dtm.zip"
## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file
unzip(temp)
unlink(temp)

To load the raster data set we apply the rast() function provided by the terra package.

# read the data
srtm <- rast("srtm_germany_dtm.tif")

Printing the SpatRaster object summarizes the raster data set.

srtm
## class       : SpatRaster 
## dimensions  : 10801, 13201, 1  (nrow, ncol, nlyr)
## resolution  : 0.0008333333, 0.0008333333  (x, y)
## extent      : 4.999583, 16.00042, 46.99958, 56.00042  (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326) 
## source      : srtm_germany_dtm.tif 
## name        : srtm_germany_dtm

When working with large data sets, checking the file size in advance is recommended. The processing time of large data may be quite long.

fs <- file.size("srtm_germany_dtm.tif")
print(paste("The file size is", as.numeric(fs) / 1e6, "MB"))
## [1] "The file size is 285.254776 MB"

Owing to the large file size, we downscaled the raster data set using the aggregate() function. The aggregate() function performs aggregation by grouping adjacent pixels into new ones, given an input raster and the aggregation factor fact. For example, using fact = 5 means that each set of 5 \(\times\) 5 cells in the original raster becomes a single cell in the new raster. In addition, the fun argument specifies which function will be used to calculate the new values based on existing ones (the default is mean).

Exercise: Apply the aggregate() function on our SRTM raster, aggregating 5 \(\times\) 5 sets of cells and plot it.

# your code here
srtm_small <- NULL
plot(srtm_small)
Show code
srtm_small <- aggregate(srtm, 5, fun = mean)
## 
|---------|---------|---------|---------|
=========================================
                                          
plot(srtm_small, col = viridisLite::plasma(3500))

We want to clean up the data set and exclude the area that covers the ocean.

Exercise: Set all values in the SRTM data set that are equal to 0 to NA.

# your code here
Show code
srtm_small[srtm_small == 0] <- NA

Now we plot the aggregated and cleaned up data set.

plot(srtm_small,
  main = "DEM-SRTM Germany",
  col = viridisLite::plasma(3500)
)

Well, still there could be done some further adaptations, but we keep it for now.

Now, we want to subset the raster data set to reflect our area of interest (AOI). Therefore, we apply the crop() function and feed it with a SpatVect object, in our case the district border of the city of Berlin (berlin_vect).

## build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "Berlin_vect.zip"
## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file
unzip(temp)
unlink(temp)
berlin_vect <- vect("Berlin_vect.shp")
srtm_berlin <- crop(srtm, berlin_vect)

We may plot the resulting subset by the generic plot() function. In order to add additional layers we simply call the plot() function once again, this time however, we add the argument add = TRUE to the function call. To showcase this functionality we plot our AOI (the area of Berlin) and add the a spatialPoints object, representing the location of the GeoCampus Lankwitz (geocampus_vect).

plot(srtm_berlin,
  main = "DEM-SRTM Berlin and GeoCampus Lankwitz",
  col = viridisLite::viridis(100)
)
plot(geocampus_vect, col = "#FF3300", add = TRUE, pch = 16)

Another useful function is the mask() function, which creates a new Raster object with the same values as the input Raster object, except for the masked cells, which are set to NA by default. Therefore, we need to transform the sp object to the equivalent terra package representation by using the vect() function.

Exercise: Use the mask() function to mask out the area outside the district of Berlin. Use the objects srtm_berlin and berlin_sp_wgs84. Plot the result.

# your code here
srtm_berlin_masked <- NULL

plot(srtm_berlin_masked,
  main = "DEM-SRTM Berlin and GeoCampus Lankwitz",
  cex.main = 0.7,
  col = viridisLite::viridis(100)
)
plot(geocampus_vect, col = "#FF3300", add = TRUE, pch = 16)
Show code
srtm_berlin_masked <- mask(srtm_berlin, berlin_vect)
plot(srtm_berlin_masked,
  main = "DEM-SRTM Berlin and GeoCampus Lankwitz",
  cex.main = 0.7,
  col = viridisLite::viridis(10)
)
plot(geocampus_vect, col = "#FF3300", add = TRUE, pch = 16)

The rasterVis package defines additional methods for spatial raster data access, manipulation and visualization (for more information on the rasterVis package, go here).

Here, we apply the levelplot() function which is optimized for displaying raster data and provides some very nice additional graphic features. Type ??rasterVis::levelplot in your console for further information.

Exercise: Tweak the arguments of the levelplot() function to change the visualization.

library(rasterVis)

# your code here
Show code
library(rasterVis)
m <- 20
srtm_berlin_reproj <- terra::project(srtm_berlin_masked, "epsg:25832")
levelplot(srtm_berlin_reproj,
  margin = FALSE,
  contour = TRUE,
  at = seq(0, minmax(srtm_berlin)[2], m),
  main = paste("DEM-SRTM, contour lines", m, "m"),
  par.settings = rasterTheme()
)

Another useful function in the terra package is the terrain() function. This function computes the slope, aspect and other terrain characteristics from a raster with elevation data.

slope <- terrain(srtm_berlin,
                 v = "slope", neighbors=8,
                 unit = "degrees")

levelplot(slope,
  margin = FALSE,
  par.settings = BuRdTheme(),
  main = "Slope"
)

The levelpot() function returns a trellis object, which relates to the lattice graphics package. In order to overlay different trellis objects we make use of the layer() function of the latticeExtra package. For example, let us add the administrative border of Berlin. We overlay trellis objects with the + sign.

levelplot(slope,
          margin = FALSE,
          par.settings = BuRdTheme(),
          main = "Slope", 
          panel=function(...){
            panel.levelplot(...)
            panel.polygon(geom(terra::project(berlin_vect,
                                              crs(slope)))[, c(3,4)])
            })


2. The sf package

The sf package encodes spatial vector data according to the Simple Features standard described by the Open Geospatial Consortium (OGC) and International Organization for Standardization (ISO). The sf package provides bindings to GDAL for reading and writing data, to GEOS for geometrical operations, and to proj.4 for projection conversions and datum transformations.

The sf package represents simple features as native R objects, using simple data structures (S3 classes, lists, matrices, vectors). All functions and methods in sf that operate on spatial data are prefixed by st_, which refers to spatial and temporal. There are three classes used to represent simple features:

sf is the R package for spatial analysis that gradually replaced the well known sp package, which retired together with the rgdal package in October 2023. For more details on the sf package refer to the documentation on CRAN, and visit the vignettes (here, here, and here).

sf: objects with simple features

An sf object is a data set consisting of sets of features with attributes.

To showcase the functionality of the sf package we download a polygon data set (ESRI shapefile) and make it available within the R workspace over st_read() function.

The shapefile stores the geometry of districts in Germany and was downloaded from the Federal Agency for Cartography and Geodesy on October 30, 2022. The data is found here, and the file description is found here (GeoBasis-DE/BKG 2022).

We also want to add statistical information about the area, the population size and the population density of each district. This data was obtained from the The Federal Statistical Office on October 30, 2022 (here).

The procedure of how to download the data and merge it into a single shapefile is described in the code below. The completed file can be downloaded here. If you want to use the completed file directly, download and read it as described for the shapefile in the following code.

# download of the shapefile
temp <- tempfile()
download_url <- "https://daten.gdz.bkg.bund.de/produkte/vg/vg250_ebenen_0101/aktuell/"
zipfile <- "vg250_01-01.utm32s.shape.ebenen.zip"

# download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
# unzip the file(s)
unzip(temp)
# close file connection
unlink(temp)
# download of the population data
download.file(url = "https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/Administrativ/Archiv/Standardtabellen/04_KreiseVorjahr.xlsx?__blob=publicationFile", destfile = "./germany_dist_population.xlsx", mode = "wb")
# read the shapefile for German districts
library(sf)
district_geometry <- st_read("./vg250_01-01.utm32s.shape.ebenen/vg250_ebenen_0101/VG250_KRS.shp", quiet = TRUE)
colnames(district_geometry)[9] <- "district"

# read population and area statistics for German districts
library(xlsx)
district_stats <- read.xlsx2("germany_dist_population.xlsx", 2, header = TRUE)
colnames(district_stats)[c(4, 5, 6, 7, 8, 9)] <- c("NUTS", "area", "pop", "pop_m", "pop_f", "pop_dens")
district_stats[, c(5, 6, 7, 8, 9)] <- as.data.frame(apply(district_stats[, c(5, 6, 7, 8, 9)], 2, as.numeric))

# merge both data frames
total <- merge(district_geometry, district_stats, by = "NUTS")

# select relevant data
germany_districts <- total[, c("district", "area", "pop", "pop_m", "pop_f", "pop_dens", "geometry")]

Now, let us take a closer look at our sf object which contains the data about the German districts. The class() function shows that the sf object class extends the data.frame object class.

class(germany_districts)
## [1] "sf"         "data.frame"
colnames(germany_districts)
## [1] "district" "area"     "pop"      "pop_m"    "pop_f"    "pop_dens" "geometry"

The column with name geometry holds the geometries, and the other columns refer to the additional statistical information available for each record.

The str() function provides more information about the number of records (430), data types and the geometry attributes.

str(germany_districts)
## Classes 'sf' and 'data.frame':   430 obs. of  7 variables:
##  $ district: chr  "Stuttgart" "Böblingen" "Esslingen" "Göppingen" ...
##  $ area    : num  207 618 641 642 687 ...
##  $ pop     : num  626275 393195 533388 259046 544679 ...
##  $ pop_m   : num  312886 195513 266348 128916 270057 ...
##  $ pop_f   : num  313389 197682 267040 130130 274622 ...
##  $ pop_dens: num  3021 636 832 403 793 ...
##  $ geometry:sfc_MULTIPOLYGON of length 430; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:532, 1:2] 516514 516502 516484 516871 516824 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA
##   ..- attr(*, "names")= chr [1:6] "district" "area" "pop" "pop_m" ...

Exercise: Access the first 20 rows of the column in the data.frame object that contains the names of the districts of Germany.

# your code here
Show code
head(germany_districts$district, 20)
##  [1] "Stuttgart"             "Böblingen"             "Esslingen"            
##  [4] "Göppingen"             "Ludwigsburg"           "Rems-Murr-Kreis"      
##  [7] "Heilbronn"             "Heilbronn"             "Hohenlohekreis"       
## [10] "Schwäbisch Hall"       "Main-Tauber-Kreis"     "Heidenheim"           
## [13] "Ostalbkreis"           "Baden-Baden"           "Karlsruhe"            
## [16] "Karlsruhe"             "Rastatt"               "Heidelberg"           
## [19] "Mannheim"              "Neckar-Odenwald-Kreis"

The sf object class offers a vast range of functions and spatial data operations to be performed.

methods(class = "sf")
##  [1] $<-                          [                           
##  [3] [[<-                         aggregate                   
##  [5] as.data.frame                cbind                       
##  [7] coerce                       crs                         
##  [9] dbDataType                   dbWriteTable                
## [11] distance                     duplicated                  
## [13] ext                          extract                     
## [15] filter                       identify                    
## [17] initialize                   lines                       
## [19] mapView                      mask                        
## [21] merge                        plot                        
## [23] points                       polys                       
## [25] print                        rasterize                   
## [27] rbind                        show                        
## [29] slotsFromS3                  st_agr                      
## [31] st_agr<-                     st_area                     
## [33] st_as_s2                     st_as_sf                    
## [35] st_as_sfc                    st_bbox                     
## [37] st_boundary                  st_break_antimeridian       
## [39] st_buffer                    st_cast                     
## [41] st_centroid                  st_collection_extract       
## [43] st_concave_hull              st_convex_hull              
## [45] st_coordinates               st_crop                     
## [47] st_crs                       st_crs<-                    
## [49] st_difference                st_drop_geometry            
## [51] st_filter                    st_geometry                 
## [53] st_geometry<-                st_inscribed_circle         
## [55] st_interpolate_aw            st_intersection             
## [57] st_intersects                st_is                       
## [59] st_is_valid                  st_join                     
## [61] st_line_merge                st_m_range                  
## [63] st_make_valid                st_minimum_rotated_rectangle
## [65] st_nearest_points            st_node                     
## [67] st_normalize                 st_point_on_surface         
## [69] st_polygonize                st_precision                
## [71] st_reverse                   st_sample                   
## [73] st_segmentize                st_set_precision            
## [75] st_shift_longitude           st_simplify                 
## [77] st_snap                      st_sym_difference           
## [79] st_transform                 st_triangulate              
## [81] st_triangulate_constrained   st_union                    
## [83] st_voronoi                   st_wrap_dateline            
## [85] st_write                     st_z_range                  
## [87] st_zm                        svc                         
## [89] transform                    vect                        
## see '?methods' for accessing help and source code

Let us try the plot() function on the sf object germany_districts:

plot(germany_districts)

By default the plot() function plots up to 9 features. We can change that behavior by setting the max.plot argument.

plot(germany_districts, max.plot = 2)

Or we explicitly state which column we want to plot:

plot(germany_districts["district"])

Owing to the simplicity of the sf object structure we can easily subset and manipulate our data.

Exercise: Subset the data set to only include the geometry of the city of Berlin and plot it thereafter.

# your code here
Show code
berlin_sf <- germany_districts["district"][germany_districts$district == "Berlin", ]
berlin_sf
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 777974.1 ymin: 5808837 xmax: 823510.5 ymax: 5845580
## Projected CRS: ETRS89 / UTM zone 32N
##     district                       geometry
## 142   Berlin MULTIPOLYGON (((802831.7 58...
Show code
plot(berlin_sf, main = "Berlin", col = NA)

Using the dplyr package for data manipulation

Exercise: Plot the districts of Germany such that only those districts are filled with colour where the difference between male (pop_m) and female (pop_f) population is larger than 3 % of the overall population.

# replace the NULL values within the following code accordingly
p <- germany_districts %>%
  # calculate the absolute difference between male and female population
  dplyr::mutate(NULL) %>%
  # calculate the percentage with respect to the total population
  dplyr::mutate(relative_difference = NULL) %>%
  # select desired column
  dplyr::select(NULL) %>%
  # filter districts according to threshold of 3%
  dplyr::filter(NULL)

# plot
library(viridis)
p %>%
  ggplot2::ggplot() +
  ggplot2::geom_sf(data = germany_districts, color = "grey") +
  ggplot2::geom_sf(ggplot2::aes(fill = relative_difference)) +
  scale_fill_viridis("relative_difference", direction = -1, option = "plasma") +
  ggplot2::theme_bw()
Show code
p <- germany_districts %>%
  # calculate the absolute difference between male and female population
  dplyr::mutate(abs_diff = abs(pop_m - pop_f)) %>%
  # calculate the percentage with respect to the total population
  dplyr::mutate(relative_difference = abs_diff / pop * 100) %>%
  # select desired column
  dplyr::select(relative_difference) %>%
  # filter districts according to threshold of 3%
  dplyr::filter(relative_difference > 3)

# plot
library(viridis)
p %>%
  ggplot2::ggplot() +
  ggplot2::geom_sf(data = germany_districts, color = "grey") +
  ggplot2::geom_sf(ggplot2::aes(fill = relative_difference)) +
  scale_fill_viridis("Relative Difference in %", direction = -1, option = "plasma") +
  ggplot2::theme_bw()

sfc: simple feature geometry list-column

The geometry column in an sf object stores the geometries. It may be accessed by $geometry, but the more general way is to use the st_geometry() function.

germany_districts_geom <- st_geometry(germany_districts)
class(germany_districts_geom)
## [1] "sfc_MULTIPOLYGON" "sfc"
germany_districts_geom
## Geometry set for 430 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101487
## Projected CRS: ETRS89 / UTM zone 32N
## First 5 geometries:
## MULTIPOLYGON (((516514.1 5412585, 516501.7 5412...
## MULTIPOLYGON (((495454.7 5412605, 495762.8 5412...
## MULTIPOLYGON (((529621.7 5403177, 529823.7 5402...
## MULTIPOLYGON (((543224.2 5403286, 543489.1 5403...
## MULTIPOLYGON (((509140.4 5434509, 509292.2 5434...

The sfc object class offers a vast range of functions and operations for spatial data.

methods(class = "sfc")
##  [1] [                            [<-                         
##  [3] as.data.frame                c                           
##  [5] coerce                       format                      
##  [7] identify                     initialize                  
##  [9] mapView                      Ops                         
## [11] print                        rep                         
## [13] show                         slotsFromS3                 
## [15] st_area                      st_as_binary                
## [17] st_as_grob                   st_as_s2                    
## [19] st_as_sf                     st_as_text                  
## [21] st_bbox                      st_boundary                 
## [23] st_break_antimeridian        st_buffer                   
## [25] st_cast                      st_centroid                 
## [27] st_collection_extract        st_concave_hull             
## [29] st_convex_hull               st_coordinates              
## [31] st_crop                      st_crs                      
## [33] st_crs<-                     st_difference               
## [35] st_geometry                  st_inscribed_circle         
## [37] st_intersection              st_intersects               
## [39] st_is                        st_is_valid                 
## [41] st_line_merge                st_m_range                  
## [43] st_make_valid                st_minimum_rotated_rectangle
## [45] st_nearest_points            st_node                     
## [47] st_normalize                 st_point_on_surface         
## [49] st_polygonize                st_precision                
## [51] st_reverse                   st_sample                   
## [53] st_segmentize                st_set_precision            
## [55] st_shift_longitude           st_simplify                 
## [57] st_snap                      st_sym_difference           
## [59] st_transform                 st_triangulate              
## [61] st_triangulate_constrained   st_union                    
## [63] st_voronoi                   st_wrap_dateline            
## [65] st_write                     st_z_range                  
## [67] st_zm                        str                         
## [69] summary                      vec_cast.sfc                
## [71] vec_ptype2.sfc               vect                        
## see '?methods' for accessing help and source code

For instance, the function st_bbox() retrieves the coordinate bounding box.

st_bbox(germany_districts)
##      xmin      ymin      xmax      ymax 
##  280371.1 5235856.0  921292.4 6101486.9

sfg: simple feature geometry

Simple feature geometry (sfg) objects carry the geometry for a single feature, e.g. a point, linestring or polygon.

Simple feature geometries are implemented as R native data, using the following rules:

  • a single point is a numeric vector
  • a set of points, e.g. in a linestring or ring of a polygon is a matrix, each row containing a point
  • any other set is a list.

Coordinate reference systems and transformations

The coordinate reference system (CRS) specifies the location on Earth of a particular pair of coordinates. The sfc object (geometry list-columns) has two attributes for storing a CRS: epsg and proj4string.

st_crs(germany_districts)
## Coordinate Reference System:
##   User input: ETRS89 / UTM zone 32N 
##   wkt:
## PROJCRS["ETRS89 / UTM zone 32N",
##     BASEGEOGCRS["ETRS89",
##         ENSEMBLE["European Terrestrial Reference System 1989 ensemble",
##             MEMBER["European Terrestrial Reference Frame 1989"],
##             MEMBER["European Terrestrial Reference Frame 1990"],
##             MEMBER["European Terrestrial Reference Frame 1991"],
##             MEMBER["European Terrestrial Reference Frame 1992"],
##             MEMBER["European Terrestrial Reference Frame 1993"],
##             MEMBER["European Terrestrial Reference Frame 1994"],
##             MEMBER["European Terrestrial Reference Frame 1996"],
##             MEMBER["European Terrestrial Reference Frame 1997"],
##             MEMBER["European Terrestrial Reference Frame 2000"],
##             MEMBER["European Terrestrial Reference Frame 2005"],
##             MEMBER["European Terrestrial Reference Frame 2014"],
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[0.1]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4258]],
##     CONVERSION["UTM zone 32N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",9,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["Europe between 6°E and 12°E: Austria; Belgium; Denmark - onshore and offshore; Germany - onshore and offshore; Norway including - onshore and offshore; Spain - offshore."],
##         BBOX[38.76,6,84.33,12]],
##     ID["EPSG",25832]]

CRS transformation operations for an sf object are carried out using the st_transform() function.

We want to project our sf object of Berlin (berlin_sf) onto the European Terrestrial Reference System 1989 (ETRS89/UTM zone 32N). Therefore, we also need provide the EPSG identifier for ERTS89/UTM zone 32N, which is \(25832\) (see here)

berlin_sf <- st_transform(berlin_sf, 25832)
st_crs(berlin_sf)
## Coordinate Reference System:
##   User input: EPSG:25832 
##   wkt:
## PROJCRS["ETRS89 / UTM zone 32N",
##     BASEGEOGCRS["ETRS89",
##         ENSEMBLE["European Terrestrial Reference System 1989 ensemble",
##             MEMBER["European Terrestrial Reference Frame 1989"],
##             MEMBER["European Terrestrial Reference Frame 1990"],
##             MEMBER["European Terrestrial Reference Frame 1991"],
##             MEMBER["European Terrestrial Reference Frame 1992"],
##             MEMBER["European Terrestrial Reference Frame 1993"],
##             MEMBER["European Terrestrial Reference Frame 1994"],
##             MEMBER["European Terrestrial Reference Frame 1996"],
##             MEMBER["European Terrestrial Reference Frame 1997"],
##             MEMBER["European Terrestrial Reference Frame 2000"],
##             MEMBER["European Terrestrial Reference Frame 2005"],
##             MEMBER["European Terrestrial Reference Frame 2014"],
##             ELLIPSOID["GRS 1980",6378137,298.257222101,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[0.1]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4258]],
##     CONVERSION["UTM zone 32N",
##         METHOD["Transverse Mercator",
##             ID["EPSG",9807]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",9,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["Scale factor at natural origin",0.9996,
##             SCALEUNIT["unity",1],
##             ID["EPSG",8805]],
##         PARAMETER["False easting",500000,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["(E)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["(N)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Engineering survey, topographic mapping."],
##         AREA["Europe between 6°E and 12°E: Austria; Belgium; Denmark - onshore and offshore; Germany - onshore and offshore; Norway including - onshore and offshore; Spain - offshore."],
##         BBOX[38.76,6,84.33,12]],
##     ID["EPSG",25832]]

For later usage we store the geometry of the district Berlin on disk.

save(file = "berlin_district.RData", berlin_sf)

Now it’s your turn!

Exercise: Transform the data set germany_districts into WGS 84/Pseudo-Mercator, the so called Google Web Mercator. The EPSG identifier is \(3857\).

# your code here
Show code
germany_districts <- st_transform(germany_districts, 3857)
st_crs(germany_districts)
## Coordinate Reference System:
##   User input: EPSG:3857 
##   wkt:
## PROJCRS["WGS 84 / Pseudo-Mercator",
##     BASEGEOGCRS["WGS 84",
##         ENSEMBLE["World Geodetic System 1984 ensemble",
##             MEMBER["World Geodetic System 1984 (Transit)"],
##             MEMBER["World Geodetic System 1984 (G730)"],
##             MEMBER["World Geodetic System 1984 (G873)"],
##             MEMBER["World Geodetic System 1984 (G1150)"],
##             MEMBER["World Geodetic System 1984 (G1674)"],
##             MEMBER["World Geodetic System 1984 (G1762)"],
##             MEMBER["World Geodetic System 1984 (G2139)"],
##             ELLIPSOID["WGS 84",6378137,298.257223563,
##                 LENGTHUNIT["metre",1]],
##             ENSEMBLEACCURACY[2.0]],
##         PRIMEM["Greenwich",0,
##             ANGLEUNIT["degree",0.0174532925199433]],
##         ID["EPSG",4326]],
##     CONVERSION["Popular Visualisation Pseudo-Mercator",
##         METHOD["Popular Visualisation Pseudo Mercator",
##             ID["EPSG",1024]],
##         PARAMETER["Latitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8801]],
##         PARAMETER["Longitude of natural origin",0,
##             ANGLEUNIT["degree",0.0174532925199433],
##             ID["EPSG",8802]],
##         PARAMETER["False easting",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8806]],
##         PARAMETER["False northing",0,
##             LENGTHUNIT["metre",1],
##             ID["EPSG",8807]]],
##     CS[Cartesian,2],
##         AXIS["easting (X)",east,
##             ORDER[1],
##             LENGTHUNIT["metre",1]],
##         AXIS["northing (Y)",north,
##             ORDER[2],
##             LENGTHUNIT["metre",1]],
##     USAGE[
##         SCOPE["Web mapping and visualisation."],
##         AREA["World between 85.06°S and 85.06°N."],
##         BBOX[-85.06,-180,85.06,180]],
##     ID["EPSG",3857]]

We can use the coord_sf() function and plot sf objects with different projection.

For example, let us plot our data set in the WGS84/World Equidistant Cylindrical projection (EPSG 4087).

library(ggplot2)
library(viridis)
ggplot(germany_districts) +
  geom_sf(aes(fill = pop)) +
  scale_fill_viridis("pop") +
  coord_sf(crs = st_crs(4087)) +
  ggtitle("World Equidistant Cylindrical (EPSG 4087)") +
  theme_bw()

Frequently used in Germany is the Gauß-Krüger projection(also known as Transverse Mercator projection).

Exercise: Plot the dataset germany_districts in DHDN / 3-degree Gauss-Kruger zone 3 transformation (EPSG 31467).

# your code here
Show code
ggplot(germany_districts) +
  geom_sf(aes(fill = pop)) +
  scale_fill_viridis("pop", direction = -1, option = "mako") +
  coord_sf(crs = st_crs(31467)) +
  ggtitle("DHDN / 3-degree Gauss-Kruger zone 3 (EPSG 31467)") +
  theme_bw()

A nice look up page for different coordinate systems is found here and a fancy visualization of many prominent map projections is found here.


3. The spatstat package

The R package spatstat supports statistical analysis for spatial point patterns. Its main focus is the analysis of spatial patterns of points in two-dimensional space. The points may carry auxiliary data (marks), and the spatial region in which the points were recorded may have arbitrary shape.

The package supports

An introduction to spatstat is provided by the package vignette Getting started with spatstat. For further information visit the project on CRAN, the project homepage or dig through the over 1500-paged reference manual.


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.