The sf package encodes spatial vector data according to the Simple Features standard described by the Open Geospatial Consortium (OGC) and International Organization for Standardization (ISO). The sf package provides bindings to GDAL for reading and writing data, to GEOS for geometrical operations, and to PROJ for projection conversions and datum transformations.

library(sf)

The sfpackage represents simple features as native R objects, using simple data structures (S3 classes, lists, matrices, vectors). All functions and methods in sf that operate on spatial data are prefixed by st_, which refers to spatial and temporal.

In the sf package three classes are used to represent simple features:

Note that the sf gradually replaces the well known sp package for spatial analysis. For more details on the sf package refer to the documentation on CRAN, and visit the vignettes (here, here, and here).


sf: objects with simple features

A sf object is a data set consisting of sets of features with attributes.

To showcase the functionality of the sf package we download a polygon data set, an ESRI shapefile, and load it into memory with the st_read() function.

The shapefile stores the geometry of districts in Germany and additional information on the population size. The shapefile was downloaded from the Federal Agency for Cartography and Geodesy on June 25, 2017. The data is found here, and the file description here (GeoBasis-DE/BKG 2017). Additional information with respect to the population size was obtained from the The Federal Statistical Office on June 26, 2017 (here).

## build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "germany_districts.zip"
## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file(s)
unzip(temp)
## close file connection
unlink(temp)
## read the data
dists <- st_read("./germany_districts.shp")
## Reading layer `germany_districts' from data source 
##   `C:\Users\Jana\Documents\Uni\SOGA\soga\Soga-R\100\11100_spatial_data_in_R\germany_districts.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 401 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
## Projected CRS: ETRS89 / UTM zone 32N

The short printed report gives the file name, the driver (ESRI Shapefile), shows that there are 401 features (records, represented as rows) and 6 fields (attributes, represented as columns) in the data set.

The class() function shows that the sf object class extends the data.frame object class.

class(dists)
## [1] "sf"         "data.frame"

The column named geometry holds the geometries and the other columns refer to the additional information available for each record.

colnames(dists)
## [1] "district" "area"     "pop"      "pop_m"    "pop_w"    "pop_dens" "geometry"

We can easily access the first 20 rows of the column that contains the names of the districts of Germany.

head(dists$district, 20)
##  [1] "Ahrweiler"                 "Aichach-Friedberg"        
##  [3] "Alb-Donau-Kreis"           "Altenburger Land"         
##  [5] "Altenkirchen (Westerwald)" "Altmarkkreis Salzwedel"   
##  [7] "Altötting"                 "Alzey-Worms"              
##  [9] "Amberg"                    "Amberg-Sulzbach"          
## [11] "Ammerland"                 "Anhalt-Bitterfeld"        
## [13] "Ansbach"                   "Ansbach"                  
## [15] "Aschaffenburg"             "Aschaffenburg"            
## [17] "Augsburg"                  "Augsburg"                 
## [19] "Aurich"                    "Bad Dürkheim"

The sf object class offers a vast range of functions and spatial data operations to be performed.

methods(class = "sf")
##  [1] $<-                   [                     [[<-                 
##  [4] aggregate             as.data.frame         cbind                
##  [7] coerce                dbDataType            dbWriteTable         
## [10] filter                identify              initialize           
## [13] merge                 plot                  print                
## [16] rbind                 show                  slotsFromS3          
## [19] st_agr                st_agr<-              st_area              
## [22] st_as_s2              st_as_sf              st_bbox              
## [25] st_boundary           st_buffer             st_cast              
## [28] st_centroid           st_collection_extract st_convex_hull       
## [31] st_coordinates        st_crop               st_crs               
## [34] st_crs<-              st_difference         st_filter            
## [37] st_geometry           st_geometry<-         st_inscribed_circle  
## [40] st_interpolate_aw     st_intersection       st_intersects        
## [43] st_is                 st_is_valid           st_join              
## [46] st_line_merge         st_m_range            st_make_valid        
## [49] st_nearest_points     st_node               st_normalize         
## [52] st_point_on_surface   st_polygonize         st_precision         
## [55] st_reverse            st_sample             st_segmentize        
## [58] st_set_precision      st_shift_longitude    st_simplify          
## [61] st_snap               st_sym_difference     st_transform         
## [64] st_triangulate        st_union              st_voronoi           
## [67] st_wrap_dateline      st_write              st_z_range           
## [70] st_zm                 transform            
## see '?methods' for accessing help and source code

Let us try the plot() function on the sf object dists:

plot(dists)

By default the plot() function plots up to 9 features. We can change that behavior by adding the max.plot = 2 argument.

plot(dists, max.plot = 2)

Or, we explicitly state which column we want to plot:

plot(dists["district"])

Owing to the simplicity of the sf object structure we can easily subset and manipulate our data.

Let us subset our data to only include the district of the city of Berlin.

berlin_sf <- dists["district"][dists$district == "Berlin", ]
berlin_sf
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 777974.1 ymin: 5808837 xmax: 823510.5 ymax: 5845512
## Projected CRS: ETRS89 / UTM zone 32N
##    district                       geometry
## 33   Berlin MULTIPOLYGON (((802565.3 58...
plot(berlin_sf, main = "Berlin", col = NA)

In the next step we project the berlin_sf data set to the European Terrestrial Reference System 1989 (ETRS89/UTM zone 32N). Therefore, we apply the st_transform() function and provide the EPSG identifier for ERTS89/UTM zone 32N, which is \(25832\) (see here)

berlin_sf <- st_transform(berlin_sf, 25832)
berlin_sf
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 777974.1 ymin: 5808837 xmax: 823510.5 ymax: 5845512
## Projected CRS: ETRS89 / UTM zone 32N
##    district                       geometry
## 33   Berlin MULTIPOLYGON (((802565.3 58...

For the purpose of further analysis we store the geometry of the district of Berlin on disk.

save(file = "berlin_district.RData", berlin_sf)

Using the dplyr package for data manipulation

Let us find out for which districts of Germany the number of male (pop_m) and female (pop_w) inhabitants differs more than 3 %. This can be done nicely using functions provided by the dplyr package, which is part of the tidyverse. If you have not done this already, install it by typing install.packages("tidyverse").

library(tidyverse)

p <- dists %>%
  # calculate absolute difference of male and female population
  dplyr::mutate(abs_dif = abs(pop_m - pop_w)) %>%
  # calculate percentage
  dplyr::mutate(abs_dif_per = abs_dif / pop * 100) %>%
  # select desired columns, note geometry column not explicitly selected
  dplyr::select(abs_dif_per) %>%
  # filter to districts where percentage > 3%
  dplyr::filter(abs_dif_per > 3)

sf objects can be plotted with ggplot2 by using geom_sf(). Hence, we can plot our sf object p and visualize the districts of Germany with a gender gap in inhabitants.

library(viridis) # needed for custom scale color

ggplot(p) +
  geom_sf(data = dists, color = "grey") +
  geom_sf(aes(fill = abs_dif_per)) +
  scale_fill_viridis("abs_dif_per") +
  ggplot2::theme_bw()


sfc: simple feature geometry list-column

The geometry column in a sf object, storing the geometries, may be accessed by $geometry, but the more general way is to use the st_geometry() function.

dists_geom <- st_geometry(dists)
class(dists_geom)
## [1] "sfc_MULTIPOLYGON" "sfc"
dists_geom
## Geometry set for 401 features 
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
## Projected CRS: ETRS89 / UTM zone 32N
## First 5 geometries:
## MULTIPOLYGON (((373017.7 5612406, 373074 561229...
## MULTIPOLYGON (((649708.3 5386593, 650041.3 5386...
## MULTIPOLYGON (((569873.3 5386882, 570015 538679...
## MULTIPOLYGON (((734500.1 5665909, 734539.2 5665...
## MULTIPOLYGON (((416304.5 5643695, 416558.5 5643...

The sfc object class offers a vast range of functions and spatial data operations to be performed.

methods(class = "sfc")
##  [1] [                     [<-                   as.data.frame        
##  [4] c                     coerce                format               
##  [7] fortify               identify              initialize           
## [10] obj_sum               Ops                   print                
## [13] rep                   scale_type            show                 
## [16] slotsFromS3           st_area               st_as_binary         
## [19] st_as_grob            st_as_s2              st_as_sf             
## [22] st_as_text            st_bbox               st_boundary          
## [25] st_buffer             st_cast               st_centroid          
## [28] st_collection_extract st_convex_hull        st_coordinates       
## [31] st_crop               st_crs                st_crs<-             
## [34] st_difference         st_geometry           st_inscribed_circle  
## [37] st_intersection       st_intersects         st_is                
## [40] st_is_valid           st_line_merge         st_m_range           
## [43] st_make_valid         st_nearest_points     st_node              
## [46] st_normalize          st_point_on_surface   st_polygonize        
## [49] st_precision          st_reverse            st_sample            
## [52] st_segmentize         st_set_precision      st_shift_longitude   
## [55] st_simplify           st_snap               st_sym_difference    
## [58] st_transform          st_triangulate        st_union             
## [61] st_voronoi            st_wrap_dateline      st_write             
## [64] st_z_range            st_zm                 str                  
## [67] summary               type_sum              vec_cast.sfc         
## [70] vec_ptype2.sfc       
## see '?methods' for accessing help and source code

For instance, the function st_bbox() retrieves the coordinates of the bounding box.

st_bbox(dists)
##      xmin      ymin      xmax      ymax 
##  280371.1 5235856.0  921292.4 6101443.7

sfg: simple feature geometry

Simple feature geometry (sfg) objects carry the geometry for a single feature, e.g. a point, linestring or polygon.

Simple feature geometries are implemented as R native data, using the following rules:


Coordinate reference systems and transformations

The coordinate reference system (CRS) specifies the location on Earth of a particular pair of coordinates. The sfc object (geometry list-columns) has two attributes to store a CRS: epsg and proj4string. Using the st_crs() function and the proj4string method we can retrieve the coordinate reference system from a sf or sfc object.

st_crs(dists)$proj4string
## [1] "+proj=utm +zone=32 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

CRS transformation operations for a sf object are carried out using the st_transform() function.

Let us transform our data (dists) into WGS 84/Pseudo-Mercator , the so called Google Web Mercator. The EPSG identifier is \(3857\).

st_crs(st_transform(dists, 3857))$proj4string
## [1] "+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs"

Using the ggplot2 package, we can plot sf objects with different projections by specifying the coord_sf() function.

For example, let us plot our data set in the WGS84/World Equidistant Cylindrical projection (EPSG \(4087\)).

ggplot(dists) +
  geom_sf(aes(fill = pop)) +
  scale_fill_viridis("pop") +
  coord_sf(crs = st_crs(4087)) +
  ggtitle("World Equidistant Cylindrical (EPSG 4087)") +
  theme_bw()

For the sake of comparison we plot our data set in the WGS 84/Pseudo-Mercator projection (EPSG \(3857\)).

ggplot(dists) +
  geom_sf(aes(fill = pop)) +
  scale_fill_viridis("pop") +
  coord_sf(crs = st_crs(3857)) +
  ggtitle("WGS 84/Pseudo-Mercator (EPSG 3857)") +
  theme_bw()

A nice look-up page for different coordinate systems is found here and a fancy visualization of many prominent map projections is found here.


Conversion from and to sp

library(sp)

Spatial objects of class sp can be converted into simple feature objects or geometries by st_as_sf() and st_as_sfc(), respectively:

methods(st_as_sf)
##  [1] st_as_sf.data.frame*   st_as_sf.lpp*          st_as_sf.map*         
##  [4] st_as_sf.owin*         st_as_sf.ppp*          st_as_sf.ppplist*     
##  [7] st_as_sf.psp*          st_as_sf.s2_geography* st_as_sf.sf*          
## [10] st_as_sf.sfc*          st_as_sf.Spatial*      st_as_sf.SpatVector*  
## see '?methods' for accessing help and source code
methods(st_as_sfc)
##  [1] st_as_sfc.bbox*               st_as_sfc.blob*              
##  [3] st_as_sfc.character*          st_as_sfc.dimensions*        
##  [5] st_as_sfc.factor*             st_as_sfc.list*              
##  [7] st_as_sfc.map*                st_as_sfc.owin*              
##  [9] st_as_sfc.pq_geometry*        st_as_sfc.psp*               
## [11] st_as_sfc.raw*                st_as_sfc.s2_geography*      
## [13] st_as_sfc.SpatialLines*       st_as_sfc.SpatialMultiPoints*
## [15] st_as_sfc.SpatialPixels*      st_as_sfc.SpatialPoints*     
## [17] st_as_sfc.SpatialPolygons*    st_as_sfc.tess*              
## [19] st_as_sfc.WKB*               
## see '?methods' for accessing help and source code

Simple feature objects can be converted into a sp class using the command as(<your-sf-object-here>, "Spatial").

Let us transform the geometry of the district border of Berlin (berlin_sf) into a sp object with the coordinate reference system WGS84 in one line of code.

berlin_sp_wgs84 <- as(st_transform(berlin_sf, 4326), "Spatial")
berlin_sp_wgs84@proj4string@projargs
## [1] "+proj=longlat +datum=WGS84 +no_defs"

We can save the resulting object for further usage.

save(berlin_sp_wgs84, file = "berlin_districts_sp_wgs84.RData")

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.