The sf
package encodes spatial vector data according to the Simple Features standard described by the Open Geospatial Consortium (OGC) and International Organization for Standardization (ISO). The sf
package provides bindings to GDAL for reading and writing data, to GEOS for geometrical operations, and to PROJ for projection conversions and datum transformations.
library(sf)
The sf
package represents simple features as native R objects, using simple data structures (S3 classes, lists, matrices, vectors). All functions and methods in sf
that operate on spatial data are prefixed by st_
, which refers to spatial and temporal.
In the sf
package three classes are used to represent simple features:
sf
, the table (data.frame
) with feature attributes and feature geometries, which containssfc
, the list-column with the geometries for each feature (record), which is composed ofsfg
, the feature geometry of an individual simple feature.Note that the sf
gradually replaces the well known sp
package for spatial analysis. For more details on the sf
package refer to the documentation on CRAN, and visit the vignettes (here, here, and here).
sf
: objects with simple featuresA sf
object is a data set consisting of sets of features with attributes.
To showcase the functionality of the sf
package we download a polygon data set, an ESRI shapefile, and load it into memory with the st_read()
function.
The shapefile stores the geometry of districts in Germany and additional information on the population size. The shapefile was downloaded from the Federal Agency for Cartography and Geodesy on June 25, 2017. The data is found here, and the file description here (GeoBasis-DE/BKG 2017). Additional information with respect to the population size was obtained from the The Federal Statistical Office on June 26, 2017 (here).
## build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "germany_districts.zip"
## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file(s)
unzip(temp)
## close file connection
unlink(temp)
## read the data
dists <- st_read("./germany_districts.shp")
## Reading layer `germany_districts' from data source
## `C:\Users\Jana\Documents\Uni\SOGA\soga\Soga-R\100\11100_spatial_data_in_R\germany_districts.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 401 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
## Projected CRS: ETRS89 / UTM zone 32N
The short printed report gives the file name, the driver (ESRI Shapefile), shows that there are 401 features (records, represented as rows) and 6 fields (attributes, represented as columns) in the data set.
The class()
function shows that the sf
object class extends the data.frame
object class.
class(dists)
## [1] "sf" "data.frame"
The column named geometry
holds the geometries and the other columns refer to the additional information available for each record.
colnames(dists)
## [1] "district" "area" "pop" "pop_m" "pop_w" "pop_dens" "geometry"
We can easily access the first 20 rows of the column that contains the names of the districts of Germany.
head(dists$district, 20)
## [1] "Ahrweiler" "Aichach-Friedberg"
## [3] "Alb-Donau-Kreis" "Altenburger Land"
## [5] "Altenkirchen (Westerwald)" "Altmarkkreis Salzwedel"
## [7] "Altötting" "Alzey-Worms"
## [9] "Amberg" "Amberg-Sulzbach"
## [11] "Ammerland" "Anhalt-Bitterfeld"
## [13] "Ansbach" "Ansbach"
## [15] "Aschaffenburg" "Aschaffenburg"
## [17] "Augsburg" "Augsburg"
## [19] "Aurich" "Bad Dürkheim"
The sf
object class offers a vast range of functions and spatial data operations to be performed.
methods(class = "sf")
## [1] $<- [ [[<-
## [4] aggregate as.data.frame cbind
## [7] coerce dbDataType dbWriteTable
## [10] filter identify initialize
## [13] merge plot print
## [16] rbind show slotsFromS3
## [19] st_agr st_agr<- st_area
## [22] st_as_s2 st_as_sf st_bbox
## [25] st_boundary st_buffer st_cast
## [28] st_centroid st_collection_extract st_convex_hull
## [31] st_coordinates st_crop st_crs
## [34] st_crs<- st_difference st_filter
## [37] st_geometry st_geometry<- st_inscribed_circle
## [40] st_interpolate_aw st_intersection st_intersects
## [43] st_is st_is_valid st_join
## [46] st_line_merge st_m_range st_make_valid
## [49] st_nearest_points st_node st_normalize
## [52] st_point_on_surface st_polygonize st_precision
## [55] st_reverse st_sample st_segmentize
## [58] st_set_precision st_shift_longitude st_simplify
## [61] st_snap st_sym_difference st_transform
## [64] st_triangulate st_union st_voronoi
## [67] st_wrap_dateline st_write st_z_range
## [70] st_zm transform
## see '?methods' for accessing help and source code
Let us try the plot()
function on the sf
object dists
:
plot(dists)
By default the plot()
function plots up to 9 features. We can change that behavior by adding the max.plot = 2
argument.
plot(dists, max.plot = 2)
Or, we explicitly state which column we want to plot:
plot(dists["district"])
Owing to the simplicity of the sf
object structure we can easily subset and manipulate our data.
Let us subset our data to only include the district of the city of Berlin.
berlin_sf <- dists["district"][dists$district == "Berlin", ]
berlin_sf
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 777974.1 ymin: 5808837 xmax: 823510.5 ymax: 5845512
## Projected CRS: ETRS89 / UTM zone 32N
## district geometry
## 33 Berlin MULTIPOLYGON (((802565.3 58...
plot(berlin_sf, main = "Berlin", col = NA)
In the next step we project the berlin_sf
data set to the European Terrestrial Reference System 1989 (ETRS89/UTM zone 32N). Therefore, we apply the st_transform()
function and provide the EPSG identifier for ERTS89/UTM zone 32N, which is \(25832\) (see here)
berlin_sf <- st_transform(berlin_sf, 25832)
berlin_sf
## Simple feature collection with 1 feature and 1 field
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 777974.1 ymin: 5808837 xmax: 823510.5 ymax: 5845512
## Projected CRS: ETRS89 / UTM zone 32N
## district geometry
## 33 Berlin MULTIPOLYGON (((802565.3 58...
For the purpose of further analysis we store the geometry of the district of Berlin on disk.
save(file = "berlin_district.RData", berlin_sf)
Using the dplyr
package for data manipulation
Let us find out for which districts of Germany the number of male (pop_m
) and female (pop_w
) inhabitants differs more than 3 %. This can be done nicely using functions provided by the dplyr
package, which is part of the tidyverse
. If you have not done this already, install it by typing install.packages("tidyverse")
.
library(tidyverse)
p <- dists %>%
# calculate absolute difference of male and female population
dplyr::mutate(abs_dif = abs(pop_m - pop_w)) %>%
# calculate percentage
dplyr::mutate(abs_dif_per = abs_dif / pop * 100) %>%
# select desired columns, note geometry column not explicitly selected
dplyr::select(abs_dif_per) %>%
# filter to districts where percentage > 3%
dplyr::filter(abs_dif_per > 3)
sf
objects can be plotted with ggplot2
by using geom_sf()
. Hence, we can plot our sf
object p
and visualize the districts of Germany with a gender gap in inhabitants.
library(viridis) # needed for custom scale color
ggplot(p) +
geom_sf(data = dists, color = "grey") +
geom_sf(aes(fill = abs_dif_per)) +
scale_fill_viridis("abs_dif_per") +
ggplot2::theme_bw()
sfc
: simple feature geometry list-columnThe geometry column in a sf
object, storing the geometries, may be accessed by $geometry
, but the more general way is to use the st_geometry()
function.
dists_geom <- st_geometry(dists)
class(dists_geom)
## [1] "sfc_MULTIPOLYGON" "sfc"
dists_geom
## Geometry set for 401 features
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6101444
## Projected CRS: ETRS89 / UTM zone 32N
## First 5 geometries:
## MULTIPOLYGON (((373017.7 5612406, 373074 561229...
## MULTIPOLYGON (((649708.3 5386593, 650041.3 5386...
## MULTIPOLYGON (((569873.3 5386882, 570015 538679...
## MULTIPOLYGON (((734500.1 5665909, 734539.2 5665...
## MULTIPOLYGON (((416304.5 5643695, 416558.5 5643...
The sfc
object class offers a vast range of functions and spatial data operations to be performed.
methods(class = "sfc")
## [1] [ [<- as.data.frame
## [4] c coerce format
## [7] fortify identify initialize
## [10] obj_sum Ops print
## [13] rep scale_type show
## [16] slotsFromS3 st_area st_as_binary
## [19] st_as_grob st_as_s2 st_as_sf
## [22] st_as_text st_bbox st_boundary
## [25] st_buffer st_cast st_centroid
## [28] st_collection_extract st_convex_hull st_coordinates
## [31] st_crop st_crs st_crs<-
## [34] st_difference st_geometry st_inscribed_circle
## [37] st_intersection st_intersects st_is
## [40] st_is_valid st_line_merge st_m_range
## [43] st_make_valid st_nearest_points st_node
## [46] st_normalize st_point_on_surface st_polygonize
## [49] st_precision st_reverse st_sample
## [52] st_segmentize st_set_precision st_shift_longitude
## [55] st_simplify st_snap st_sym_difference
## [58] st_transform st_triangulate st_union
## [61] st_voronoi st_wrap_dateline st_write
## [64] st_z_range st_zm str
## [67] summary type_sum vec_cast.sfc
## [70] vec_ptype2.sfc
## see '?methods' for accessing help and source code
For instance, the function st_bbox()
retrieves the coordinates of the bounding box.
st_bbox(dists)
## xmin ymin xmax ymax
## 280371.1 5235856.0 921292.4 6101443.7
sfg
: simple feature geometrySimple feature geometry (sfg
) objects carry the geometry for a single feature, e.g. a point, linestring or polygon.
Simple feature geometries are implemented as R native data, using the following rules:
The coordinate reference system (CRS) specifies the location on Earth of a particular pair of coordinates. The sfc
object (geometry list-columns) has two attributes to store a CRS: epsg
and proj4string
. Using the st_crs()
function and the proj4string
method we can retrieve the coordinate reference system from a sf
or sfc
object.
st_crs(dists)$proj4string
## [1] "+proj=utm +zone=32 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
CRS transformation operations for a sf
object are carried out using the st_transform()
function.
Let us transform our data (dists
) into WGS 84/Pseudo-Mercator , the so called Google Web Mercator. The EPSG identifier is \(3857\).
st_crs(st_transform(dists, 3857))$proj4string
## [1] "+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs"
Using the ggplot2
package, we can plot sf
objects with different projections by specifying the coord_sf()
function.
For example, let us plot our data set in the WGS84/World Equidistant Cylindrical projection (EPSG \(4087\)).
ggplot(dists) +
geom_sf(aes(fill = pop)) +
scale_fill_viridis("pop") +
coord_sf(crs = st_crs(4087)) +
ggtitle("World Equidistant Cylindrical (EPSG 4087)") +
theme_bw()
For the sake of comparison we plot our data set in the WGS 84/Pseudo-Mercator projection (EPSG \(3857\)).
ggplot(dists) +
geom_sf(aes(fill = pop)) +
scale_fill_viridis("pop") +
coord_sf(crs = st_crs(3857)) +
ggtitle("WGS 84/Pseudo-Mercator (EPSG 3857)") +
theme_bw()
A nice look-up page for different coordinate systems is found here and a fancy visualization of many prominent map projections is found here.
sp
library(sp)
Spatial objects of class sp
can be converted into simple feature objects or geometries by st_as_sf()
and st_as_sfc()
, respectively:
methods(st_as_sf)
## [1] st_as_sf.data.frame* st_as_sf.lpp* st_as_sf.map*
## [4] st_as_sf.owin* st_as_sf.ppp* st_as_sf.ppplist*
## [7] st_as_sf.psp* st_as_sf.s2_geography* st_as_sf.sf*
## [10] st_as_sf.sfc* st_as_sf.Spatial* st_as_sf.SpatVector*
## see '?methods' for accessing help and source code
methods(st_as_sfc)
## [1] st_as_sfc.bbox* st_as_sfc.blob*
## [3] st_as_sfc.character* st_as_sfc.dimensions*
## [5] st_as_sfc.factor* st_as_sfc.list*
## [7] st_as_sfc.map* st_as_sfc.owin*
## [9] st_as_sfc.pq_geometry* st_as_sfc.psp*
## [11] st_as_sfc.raw* st_as_sfc.s2_geography*
## [13] st_as_sfc.SpatialLines* st_as_sfc.SpatialMultiPoints*
## [15] st_as_sfc.SpatialPixels* st_as_sfc.SpatialPoints*
## [17] st_as_sfc.SpatialPolygons* st_as_sfc.tess*
## [19] st_as_sfc.WKB*
## see '?methods' for accessing help and source code
Simple feature objects can be converted into a sp
class using the command as(<your-sf-object-here>, "Spatial")
.
Let us transform the geometry of the district border of Berlin (berlin_sf
) into a sp
object with the coordinate reference system WGS84 in one line of code.
berlin_sp_wgs84 <- as(st_transform(berlin_sf, 4326), "Spatial")
berlin_sp_wgs84@proj4string@projargs
## [1] "+proj=longlat +datum=WGS84 +no_defs"
We can save the resulting object for further usage.
save(berlin_sp_wgs84, file = "berlin_districts_sp_wgs84.RData")
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.