The sp package provides classes and methods for spatial data analysis and utility functions for plotting maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc. Many other packages have become dependent on the data representation provided by the sp package.

library(sp)

The foundation class of the sp package is the Spatial class with two slots. The first is a bounding box, a matrix of numerical coordinates. The second is a CRS class object defining the coordinate reference system.

By calling getClass() we can inspect the complete definition of a class, including its slot names and the types of their contents:

getClass("Spatial")
## Class "Spatial" [package "sp"]
## 
## Slots:
##                               
## Name:         bbox proj4string
## Class:      matrix         CRS
## 
## Known Subclasses: 
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2

SpatialPoints class

The SpatialPoints class is the first subclass of Spatial. It extends the Spatial class by adding a coords slot for a matrix of point coordinates.

getClass("SpatialPoints")
## Class "SpatialPoints" [package "sp"]
## 
## Slots:
##                                           
## Name:       coords        bbox proj4string
## Class:      matrix      matrix         CRS
## 
## Extends: "Spatial"
## 
## Known Subclasses: 
## Class "SpatialPointsDataFrame", directly
## Class "SpatialPixels", directly
## Class "SpatialPixelsDataFrame", by class "SpatialPixels", distance 2

Let us create a SpatialPoints class object by ourselves. For this we need point coordinates and have to decide on the coordinate reference system of our data. For the first task we make use of the geocode_OSM() function in the tmaptools() package. This function allows us to geocode a location (find its latitude and longitude) using Nominatim.

Let us retrieve the coordinates for the GeoCampus Lankwitz using the geocode_OSM() function and the Nominatim API. Note that we specifically save the output as a data frame, so that we can extract only the lat and lon coordinates easily.

library(tmaptools)

geocampus <- geocode_OSM(q = "Malteserstrasse 74-100, Berlin", as.data.frame = TRUE)
geocampus_latlon <- geocampus[, c("lon", "lat")]
geocampus_latlon
##        lon     lat
## 1 13.35803 52.4262

Nice! In the next step we have to decide on a coordinate reference system (CRS). In general the CRS may be defined in several ways, for example as Well-known text (WKT) format, or JSON format, or GML format, or in the PROJ format, among many others.

The PROJ, formerly Proj4, format is a generic, string-based description of a CRS. It defines projection types and parameter values for particular projections. For instance the PROJ format string for the European Terrestrial Reference System 1989 (ETRS89) is \(\text {+proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no\_defs}\).

With respect to the enormous amount of existing CRS the International Association of Oil & Gas Producers (IOGP), formerly known as European Petroleum Survey Group (EPSG), built a collection of definitions for global, regional, national and local coordinate reference systems and coordinate transformations, the EPSG Geodetic Parameter Dataset. Within this collection each particular coordinate reference systems gets an unique integer identifier, commonly denoted as EPSG. For instance, the EPSG identifier for the ETRS89 is simply \(\mathtt 4258\) (for other commonly used representations of ETRS89 go here or here).

In R we may either provide Spatial objects the PROJ format string (proj4string) or simply the EPSG identifier. In our example we use the EPSG identifier \(4326\), which corresponds to the latest revision of the World Geodetic System (WGS 84).

The CRS() function of the sp package transforms the character string into a coordinate reference system (CRS) object.

latlong <- CRS("+init=epsg:4326")
geocampus_sp <- SpatialPoints(geocampus_latlon,
  proj4string = latlong
)
geocampus_sp
## SpatialPoints:
##           lon     lat
## [1,] 13.35803 52.4262
## Coordinate Reference System (CRS) arguments: +proj=longlat +datum=WGS84
## +no_defs

Without a geographical context geographical coordinates are difficult to understand, thus we plot the location of the GeoCampus Lankwitz by making use of the mapview package.

library(mapview)

mapview(geocampus_sp,
  map.types = "Esri.WorldImagery",
  label = "GeoCampus Lankwitz"
)

Well, the geocoding did quite a reasonable job!

We can save the geocampus_sp object for later usage.

save(geocampus_sp, file = "berlin_geocampus.RData")

SpatialPointsDataFrame class

Another Spatial class, the SpatialPointsDataFrame, extends the SpatialPoints class. The SpatialPointsDataFrame class acts like a container that allows us to store additional information, similar to the concept of an attribute table in a GIS.

getClass("SpatialPointsDataFrame")
## Class "SpatialPointsDataFrame" [package "sp"]
## 
## Slots:
##                                                                   
## Name:         data  coords.nrs      coords        bbox proj4string
## Class:  data.frame     numeric      matrix      matrix         CRS
## 
## Extends: 
## Class "SpatialPoints", directly
## Class "Spatial", by class "SpatialPoints", distance 2
## Class "SpatialVector", by class "SpatialPoints", distance 2
## 
## Known Subclasses: 
## Class "SpatialPixelsDataFrame", directly, with explicit coerce

The data slot is where the additional information is kept in form of a data.frame object.

Let us build a SpatialPointsDataFrame object. This time, however, we do not build it from scratch, but we read in an ESRI shapefile with point information.

The shapefile osm_pois_p.shp stores points of interest in Berlin provided by OpenStreetMap (OSM). The data was downloaded from GEOFABRIK on June 25, 2017, and contains OpenStreetMap data as of June 22, 2017 (see here).

# build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "osm_pois_p.zip"

## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file(s)
unzip(temp)
## close file connection
unlink(temp)
library(rgdal)
## read in the data
pois <- readOGR(
  dsn = ".", "osm_pois_p",
  use_iconv = TRUE, # enable encoding
  encoding = "UTF-8"
)

Several methods are available to access the values of the slots of a Spatial class object.

The bbox() method returns the bounding box of the object.

bbox(pois)
##                min      max
## coords.x1 13.08407 13.75799
## coords.x2 52.33794 52.66951

The proj4string() method reports the projection string, but it also has an assignment form, allowing the user to alter the current value.

proj4string(pois)
## [1] "+proj=longlat +datum=WGS84 +no_defs"

The str() functions shows the structure of the Spatial class object.

str(pois)
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  49091 obs. of  4 variables:
##   .. ..$ osm_id: chr [1:49091] "16541597" "21677342" "26735749" "26735753" ...
##   .. ..$ code  : int [1:49091] 2907 2701 2301 2006 2301 2301 2701 2307 2031 2503 ...
##   .. ..$ fclass: chr [1:49091] "camera_surveillance" "tourist_info" "restaurant" "telephone" ...
##   .. ..$ name  : chr [1:49091] "Aral" NA "Aida" NA ...
##   ..@ coords.nrs : num(0) 
##   ..@ coords     : num [1:49091, 1:2] 13.3 13.4 13.3 13.3 13.3 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
##   ..@ bbox       : num [1:2, 1:2] 13.1 52.3 13.8 52.7
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +no_defs"
##   .. .. ..$ comment: chr "GEOGCRS[\"WGS 84\",\n    DATUM[\"World Geodetic System 1984\",\n        ELLIPSOID[\"WGS 84\",6378137,298.257223"| __truncated__

The summary() method shows the bounding box, whether the object is projected and the number of rows of coordinates.

summary(pois)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                min      max
## coords.x1 13.08407 13.75799
## coords.x2 52.33794 52.66951
## Is projected: FALSE 
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Number of points: 49091
## Data attributes:
##     osm_id               code         fclass              name          
##  Length:49091       Min.   :2001   Length:49091       Length:49091      
##  Class :character   1st Qu.:2301   Class :character   Class :character  
##  Mode  :character   Median :2512   Mode  :character   Mode  :character  
##                     Mean   :2490                                        
##                     3rd Qu.:2901                                        
##                     Max.   :2964

We can access the additional information in the data slot by using the @ operator.

head(pois@data, 15)
##      osm_id code              fclass                            name
## 1  16541597 2907 camera_surveillance                            Aral
## 2  21677342 2701        tourist_info                            <NA>
## 3  26735749 2301          restaurant                            Aida
## 4  26735753 2006           telephone                            <NA>
## 5  26735759 2301          restaurant                      Madame Ngo
## 6  26735763 2301          restaurant                          Sakana
## 7  26754448 2701        tourist_info                            <NA>
## 8  26865440 2307          biergarten                   Spinnerbrücke
## 9  26867409 2031     recycling_glass                            <NA>
## 10 26882423 2503               kiosk                 Aral Tankstelle
## 11 26972366 2724            memorial                     Konrad Zuse
## 12 27318009 2307          biergarten                         Loretta
## 13 27378775 2002        fire_station Freiwillige Feuerwehr Bohnsdorf
## 14 27431777 2724            memorial              Luftbrückendenkmal
## 15 28968292 2601                bank           Berliner Volksbank eG

The str() function reveals that we are actually dealing with a data.frame object.

str(pois@data)
## 'data.frame':    49091 obs. of  4 variables:
##  $ osm_id: chr  "16541597" "21677342" "26735749" "26735753" ...
##  $ code  : int  2907 2701 2301 2006 2301 2301 2701 2307 2031 2503 ...
##  $ fclass: chr  "camera_surveillance" "tourist_info" "restaurant" "telephone" ...
##  $ name  : chr  "Aral" NA "Aida" NA ...

Consequently, we can apply methods on that data.frame object.

Let us count the number of points of interest corresponding to surveillance cameras.

sum(pois@data$fclass == "camera_surveillance")
## [1] 1320

We can easily subset our SpatialPointsDataFrame object to include only points which are referenced as surveillance cameras.

surveillance_spdf <- pois[pois@data$fclass == "camera_surveillance", ]
summary(surveillance_spdf)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                min      max
## coords.x1 13.10776 13.72867
## coords.x2 52.38887 52.60603
## Is projected: FALSE 
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Number of points: 1320
## Data attributes:
##     osm_id               code         fclass              name          
##  Length:1320        Min.   :2907   Length:1320        Length:1320       
##  Class :character   1st Qu.:2907   Class :character   Class :character  
##  Mode  :character   Median :2907   Mode  :character   Mode  :character  
##                     Mean   :2907                                        
##                     3rd Qu.:2907                                        
##                     Max.   :2907

The generic plot() function gives us a view of the data.

plot(surveillance_spdf)

Let us contextualize the data by plotting it onto an interactive web-map.

mapview(surveillance_spdf)

More on classes

In addition to the classes discussed so far, the sp package provides many more classes:

For more details on these classes and on the sp package in general, refer to the documentation on CRAN, the vignettes (here, here and here) or the book of Bivand et al. (2013).


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.