The sf package encodes spatial vector data according to the Simple Features standard described by the Open Geospatial Consortium (OGC) and International Organization for Standardization (ISO). The sf package provides bindings to GDAL for reading and writing data, to GEOS for geometrical operations, and to PROJ for projection conversions and datum transformations.

library(sf)

The sfpackage represents simple features as native R objects, using simple data structures (S3 classes, lists, matrices, vectors). All functions and methods in sf that operate on spatial data are prefixed by st_, which refers to spatial and temporal.

In the sf package three classes are used to represent simple features:

Note that the sf gradually replaces the well known sp package for spatial analysis. For more details on the sf package refer to the documentation on CRAN, and visit the vignettes (here, here, and here).


sf: objects with simple features

A sf object is a data set consisting of sets of features with attributes.

To showcase the functionality of the sf package we download a polygon data set, an ESRI shapefile, and load it into memory with the st_read() function.

The shapefile stores the geometry of districts in Germany and additional information on the population size. The shapefile was downloaded from the Federal Agency for Cartography and Geodesy on June 25, 2017. The data is found here, and the file description here (GeoBasis-DE/BKG 2017). Additional information with respect to the population size was obtained from the The Federal Statistical Office on June 26, 2017 (here).

## build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "germany_districts.zip"
## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file(s)
unzip(temp)
## close file connection
unlink(temp)
## read the data
dists <- st_read("./germany_districts.shp")
## Reading layer `germany_districts' from data source 
##   `C:\Users\Jana\Documents\Uni\SOGA\soga\Soga-R\100\11100_spatial_data_in_R\germany_districts.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 401 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
## Projected CRS: ETRS89 / UTM zone 32N

The short printed report gives the file name, the driver (ESRI Shapefile), shows that there are 401 features (records, represented as rows) and 6 fields (attributes, represented as columns) in the data set.

The class() function shows that the sf object class extends the data.frame object class.

class(dists)
## [1] "sf"         "data.frame"

The column named geometry holds the geometries and the other columns refer to the additional information available for each record.

colnames(dists)
## [1] "district" "area"     "pop"      "pop_m"    "pop_w"    "pop_dens" "geometry"

We can easily access the first 20 rows of the column that contains the names of the districts of Germany.

head(dists$district, 20)
##  [1] "Ahrweiler"                 "Aichach-Friedberg"        
##  [3] "Alb-Donau-Kreis"           "Altenburger Land"         
##  [5] "Altenkirchen (Westerwald)" "Altmarkkreis Salzwedel"   
##  [7] "Altötting"                 "Alzey-Worms"              
##  [9] "Amberg"                    "Amberg-Sulzbach"          
## [11] "Ammerland"                 "Anhalt-Bitterfeld"        
## [13] "Ansbach"                   "Ansbach"                  
## [15] "Aschaffenburg"             "Aschaffenburg"            
## [17] "Augsburg"                  "Augsburg"                 
## [19] "Aurich"                    "Bad Dürkheim"

The sf object class offers a vast range of functions and spatial data operations to be performed.

methods(class = "sf")
##  [1] $<-                   [                     [[<-                 
##  [4] aggregate             as.data.frame         cbind                
##  [7] coerce                dbDataType            dbWriteTable         
## [10] filter                identify              initialize           
## [13] merge                 plot                  print                
## [16] rbind                 show                  slotsFromS3          
## [19] st_agr                st_agr<-              st_area              
## [22] st_as_s2              st_as_sf              st_bbox              
## [25] st_boundary           st_buffer             st_cast              
## [28] st_centroid           st_collection_extract st_convex_hull       
## [31] st_coordinates        st_crop               st_crs               
## [34] st_crs<-              st_difference         st_filter            
## [37] st_geometry           st_geometry<-         st_inscribed_circle  
## [40] st_interpolate_aw     st_intersection       st_intersects        
## [43] st_is                 st_is_valid           st_join              
## [46] st_line_merge         st_m_range            st_make_valid        
## [49] st_nearest_points     st_node               st_normalize         
## [52] st_point_on_surface   st_polygonize         st_precision         
## [55] st_reverse            st_sample             st_segmentize        
## [58] st_set_precision      st_shift_longitude    st_simplify          
## [61] st_snap               st_sym_difference     st_transform         
## [64] st_triangulate        st_union              st_voronoi           
## [67] st_wrap_dateline      st_write              st_z_range           
## [70] st_zm                 transform            
## see '?methods' for accessing help and source code

Let us try the plot() function on the sf object dists:

plot(dists)

By default the plot() function plots up to 9 features. We can change that behavior by adding the max.plot = 2 argument.

plot(dists, max.plot = 2)

Or, we explicitly state which column we want to plot:

plot(dists["district"])

Owing to the simplicity of the sf object structure we can easily subset and manipulate our data.

Let us subset our data to only include the district of the city of Berlin.

berlin_sf <- dists["district"][dists$district == "Berlin", ]
berlin_sf
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 777974.1 ymin: 5808837 xmax: 823510.5 ymax: 5845512
## Projected CRS: ETRS89 / UTM zone 32N
##    district                       geometry
## 33   Berlin MULTIPOLYGON (((802565.3 58...
plot(berlin_sf, main = "Berlin", col = NA)

In the next step we project the berlin_sf data set to the European Terrestrial Reference System 1989 (ETRS89/UTM zone 32N). Therefore, we apply the st_transform() function and provide the EPSG identifier for ERTS89/UTM zone 32N, which is \(25832\) (see here)

berlin_sf <- st_transform(berlin_sf, 25832)
berlin_sf
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 777974.1 ymin: 5808837 xmax: 823510.5 ymax: 5845512
## Projected CRS: ETRS89 / UTM zone 32N
##    district                       geometry
## 33   Berlin MULTIPOLYGON (((802565.3 58...

For the purpose of further analysis we store the geometry of the district of Berlin on disk.

save(file = "berlin_district.RData", berlin_sf)

Using the dplyr package for data manipulation

Let us find out for which districts of Germany the number of male (pop_m) and female (pop_w) inhabitants differs more than 3 %. This can be done nicely using functions provided by the dplyr package, which is part of the tidyverse. If you have not done this already, install it by typing install.packages("tidyverse").

library(tidyverse)

p <- dists %>%
  # calculate absolute difference of male and female population
  dplyr::mutate(abs_dif = abs(pop_m - pop_w)) %>%
  # calculate percentage
  dplyr::mutate(abs_dif_per = abs_dif / pop * 100) %>%
  # select desired columns, note geometry column not explicitly selected
  dplyr::select(abs_dif_per) %>%
  # filter to districts where percentage > 3%
  dplyr::filter(abs_dif_per > 3)

sf objects can be plotted with ggplot2 by using geom_sf(). Hence, we can plot our sf object p and visualize the districts of Germany with a gender gap in inhabitants.

library(viridis) # needed for custom scale color

ggplot(p) +
  geom_sf(data = dists, color = "grey") +
  geom_sf(aes(fill = abs_dif_per)) +
  scale_fill_viridis("abs_dif_per") +
  ggplot2::theme_bw()