R provides a rich environment for cartographic plotting.
ggplot2
The powerful ggplot2
library is capable of producing
quite nice looking maps. In order to showcase these capacities we will
visualize the location of weather stations maintained by Deutscher Wetterdienst (DWD) (German Weather
Service). The data was downloaded from the DWD data portal on
April 21, 2017.
First, before plotting any data we have to prepare our data set.
Hence, we start by importing the data set and assign it a proper name.
In this example we use the read.csv2()
function for loading
the data.
dwd <- read.csv2("https://userpage.fu-berlin.de/soga/data/raw-data/DWD.csv",
stringsAsFactors = FALSE, sep = ","
)
The dwd data set consists of 599 rows, each of them
representing a particular weather station in Germany, and 22 columns,
each of them corresponding to a variable/feature related to that
particular weather station. For the purpose of this tutorial we are only
interested in the variables LAT
and LON
,
corresponding to the geographic coordinates of each particular weather
station in the data set, as well as the record length in years
(RECORD.LENGTH
). Hence, we subset our data set based on
these variables. Further, we make sure that we exclude all missing
values.
## subset data
dwd <- dwd[, c("LAT", "LON", "RECORD_LENGTH")]
## drop missing values
dwd <- dwd[complete.cases(dwd), ]
dwd$LAT <- as.numeric(dwd$LAT)
dwd$LON <- as.numeric(dwd$LON)
nrow(dwd)
## [1] 599
head(dwd)
## LAT LON RECORD_LENGTH
## 1 47.8413 8.8493 55
## 2 50.7827 6.0941 160
## 3 52.9335 8.2370 45
## 4 48.2156 8.9784 30
## 5 48.6159 13.0506 64
## 6 52.4853 7.9126 55
For the purpose of visualization we categorize the variable
RECORD.LENGTH
into five groups:
To achieve this task we apply the mutate()
function
provided by the dplyr
package. If you get an error message,
type install.packages('dplyr')
to install the package.
library(dplyr)
dwd <- dwd %>%
mutate(RECORD_LENGTH_CATEGORY = cut(RECORD_LENGTH,
breaks = c(-Inf, 10, 30, 60, 90, Inf),
labels = c(
"very short (<10)",
"short (10-30)",
"middle (30-60)",
"long (60-90)",
"very long (>90)"
)
))
Now we are ready to plot a map to visualize the spatial distribution
of the DWD weather stations. For this we rely on the raster
package and on the
ggplot2
package.
library(raster)
library(ggplot2)
If you get an error message saying that one or both of the packages
are not found, type
install.packages(c('raster', 'ggplot2'))
to install the
packages.
Basically, we only use the getData()
function from the
raster
package. This function provides us with geographic
data for Germany. Type help(getData)
into your R console
for further details.
# Retrieve Federal States by the the getData() function from the raster package
germany <- getData(country = "Germany", level = 1)
## Warning in getData(country = "Germany", level = 1): getData will be removed in a future version of raster
## . Please use the geodata package instead
## Please note that rgdal will be retired during October 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html and https://github.com/r-spatial/evolution
## rgdal: version: 1.6-7, (SVN revision 1203)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.6.2, released 2023/01/02
## Path to GDAL shared files: C:/Users/mceck/AppData/Local/R/cache/R/renv/cache/v5/R-4.3/x86_64-w64-mingw32/rgdal/1.6-7/10b777236c9e7855bc9dea8e347e30b7/rgdal/gdal
## GDAL does not use iconv for recoding strings.
## GDAL binary built with GEOS: TRUE
## Loaded PROJ runtime: Rel. 9.2.0, March 1st, 2023, [PJ_VERSION: 920]
## Path to PROJ shared files: C:/Users/mceck/AppData/Local/R/cache/R/renv/cache/v5/R-4.3/x86_64-w64-mingw32/rgdal/1.6-7/10b777236c9e7855bc9dea8e347e30b7/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.6-1
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
In the next step we build a ggplot2()
graph, layer by
layer.
ggplot()
).geom
layer, more specifically a
geom_polygon()
. Here, we specify the variables to find
longitude and latitude as well as defining the color of the borders and
the filling of the polygon.geom
layer
(geom_point()
), representing the locations of the DWD
weather stations. Again, we specify latitude and longitude, as well as
color and size parameters.coord_map()
).theme_bw()
,xlab()
,ylab()
,labs()
and ggtitle()
).library(mapproj)
ggplot() +
geom_polygon(
data = germany,
aes(x = long, y = lat, group = group),
colour = "grey10", fill = "#fff7bc"
) +
geom_point(
data = dwd,
aes(x = LON, y = LAT, col = RECORD_LENGTH_CATEGORY),
alpha = .5,
size = 1.5
) +
coord_map() +
theme_bw() +
xlab("Longitude") +
ylab("Latitude") +
labs(color = "Record length") +
ggtitle("DWD Weather Stations")
ggmap
packageSource: Kahle and Wickham (2013) ggmap : Spatial Visualization with ggplot2, The R Journal Vol. 5/1
The basic idea driving ggmap
is to take a downloaded map
image, plot it as a context layer using ggplot2
and then
plot additional content layers of data, statistics or models on top of
the map. In ggmap
this process is broken down into two
pieces:
get_map()
or get_stamenmap()
, andggmap()
.Once we have installed the ggmap
package using
install.packages("ggmap")
we can load it to use its
functions
library(ggmap)
Get the map
One way to obtain a raster
object for plotting with
ggmap
is the get_map()
function. Since this
function uses the Google API which requires registration with the Google
Cloud Platform and obtaining an API key, we will not enlarge upon it
here. Instead, we will use the function get_stamenmap()
which accesses a tile server for Stamen Maps, based on data by OpenStreetMap.
First, we have to decide on the extent of our map. For the purpose of
this tutorial we map the neighborhood of GeoCampus Lankwitz, Berlin, Germany. In order to
retrieve the geographic coordinates of GeoCampus Lankwitz we apply the
geocode_OSM()
function from the tmaptools()
package. All we have to do
is to feed the function with a string of the postal address and it
returns a pair of coordinates using a public API from the OpenStreetMap
called Nominatim.
After installing the tmaptools()
package using
install.packages("tmaptools")
, we retrieve the coordinates
for the GeoCampus Lankwitz using the geocode_OSM()
function. Note that we specifically save the output as a data frame, so
that we can extract only the lat and lon coordinates easily.
library(tmaptools)
geocampus <- geocode_OSM(q = "Malteserstrasse 74-100, Berlin", as.data.frame = TRUE)
geocampus_latlon <- geocampus[, c("lat", "lon")]
geocampus_latlon
## lat lon
## 1 52.4262 13.35803
The function get_stamenmap()
requires a bounding box in
the format c(lowerleftlon, lowerleftlat, upperrightlon,
upperrightlat) as an input. Thus, we need to create such a bounding
box around the geographic coordinates of GeoCampus Lankwitz.
berlin_bb <- c(
left = geocampus_latlon$lon - 0.05,
bottom = geocampus_latlon$lat - 0.02,
right = geocampus_latlon$lon + 0.05,
top = geocampus_latlon$lat + 0.04
)
Then we download the actual map as a raster
object by
calling the get_stamenmap()
function. The zoom argument
controls the level of detail in the map. The larger the number, the
greater the detail. You can also play around with the appearance of the
map by for example specifying maptype = "watercolor"
.
Finally, we plot the map using the ggmap()
function.
berlin <- get_stamenmap(bbox = berlin_bb, zoom = 14)
berlin_map <- ggmap(berlin, extent = "device")
berlin_map
Nice, there it is a map of the neighborhood of the GeoCampus Lankwitz.
Make the plot
The next steps aim to add content to the basemap above. We start by adding the actual location of GeoCampus Lankwitz, mark it with a red dot and annotate it.
p <- berlin_map +
## add point
geom_point(aes(x = lon, y = lat),
data = geocampus_latlon,
col = "red",
size = 2
) +
## add text
geom_text(
x = geocampus_latlon$lon + 0.005, # add a small number
y = geocampus_latlon$lat + 0.0015, # add a small number
label = "GeoCampus",
size = 2
)
p
Looks good!
Now we add even more geographic content. We read in an ESRI
shapefile with point data information. The shapefile
osm_pois_p.shp
stores points of interest in Berlin
provided by OpenStreetMap (OSM). The data was
downloaded from GEOFABRIK on June 25, 2017, and contains
OpenStreetMap data as of June 22, 2017 (see here).
First we download the zipped shapefile to a temporary folder and extract it afterwards.
# build a temporary folder on disk
temp <- tempfile()
download_url <- "https://userpage.fu-berlin.de/soga/data/raw-data/spatial/"
zipfile <- "osm_pois_p.zip"
## download the file
download.file(paste0(download_url, zipfile), temp, mode = "wb")
## unzip the file(s)
unzip(temp)
## close file connection
unlink(temp)
After downloading and unzipping of the file, we open it by calling
the readOGR()
function from the rgdal
package.
library(rgdal)
## read in the data
pois <- readOGR(
dsn = ".", "osm_pois_p",
use_iconv = TRUE, # enable encoding
encoding = "UTF-8"
)
Finally, we use the fortify()
function from the
ggfortify
package to transform the shapefile into a
data.frame
object.
library(ggfortify)
pois_df <- fortify(pois)
class(pois_df)
## [1] "data.frame"
The data set pois_df
is quite large. It has 49091 rows
(observations) and 6 feature columns.
nrow(pois_df)
## [1] 49091
ncol(pois_df)
## [1] 6
head(pois_df)
## osm_id code fclass name long lat
## 1 16541597 2907 camera_surveillance Aral 13.34544 52.54644
## 2 21677342 2701 tourist_info <NA> 13.37086 52.51940
## 3 26735749 2301 restaurant Aida 13.32283 52.50691
## 4 26735753 2006 telephone <NA> 13.32212 52.50647
## 5 26735759 2301 restaurant Madame Ngo 13.31809 52.50621
## 6 26735763 2301 restaurant Sakana 13.32078 52.50734
For now we are only interested in the variable fclass
,
which refers to the category of the points in space.
unique(pois_df$fclass)
## [1] "camera_surveillance" "tourist_info" "restaurant"
## [4] "telephone" "biergarten" "recycling_glass"
## [7] "kiosk" "memorial" "fire_station"
## [10] "bank" "post_office" "toilet"
## [13] "cinema" "library" "police"
## [16] "hospital" "post_box" "chemist"
## [19] "hotel" "motel" "arts_centre"
## [22] "cafe" "fast_food" "optician"
## [25] "convenience" "car_sharing" "shelter"
## [28] "viewpoint" "attraction" "pub"
## [31] "supermarket" "bakery" "sports_shop"
## [34] "pharmacy" "atm" "playground"
## [37] "theatre" "school" "car_rental"
## [40] "gift_shop" "museum" "guesthouse"
## [43] "beverages" "bar" "travel_agent"
## [46] "embassy" "butcher" "pitch"
## [49] "stationery" "outdoor_shop" "kindergarten"
## [52] "fountain" "university" "monument"
## [55] "beauty_shop" "sports_centre" "hostel"
## [58] "tower" "bench" "waste_basket"
## [61] "recycling_clothes" "artwork" "nightclub"
## [64] "windmill" "drinking_water" "computer_shop"
## [67] "furniture_shop" "recycling" "community_centre"
## [70] "bicycle_shop" "car_dealership" "vending_any"
## [73] "public_building" "bookshop" "florist"
## [76] "doityourself" "laundry" "park"
## [79] "comms_tower" "hairdresser" "clothes"
## [82] "mobile_phone_shop" "bicycle_rental" "video_shop"
## [85] "picnic_site" "jeweller" "college"
## [88] "veterinary" "dentist" "water_tower"
## [91] "hunting_stand" "vending_machine" "swimming_pool"
## [94] "track" "newsagent" "shoe_shop"
## [97] "mall" "doctors" "vending_cigarette"
## [100] "courthouse" "greengrocer" "town_hall"
## [103] "toy_shop" "nursing_home" "recycling_paper"
## [106] "garden_centre" "department_store" "vending_parking"
## [109] "theme_park" "car_wash" "graveyard"
## [112] "water_well" "water_works" "chalet"
## [115] "ruins" "prison" "zoo"
## [118] "battlefield" "stadium" "archaeological"
## [121] "camp_site" "observation_tower" "wayside_shrine"
## [124] "alpine_hut" "castle" "food_court"
## [127] "dog_park" "water_mill" "caravan_site"
## [130] "golf_course"
In order to shrink the volume of the data set we restrict the entries
to be either of type biergarten
(German for
open-air pub) or fast_food
.
selection <- pois_df[pois_df$fclass %in% c("biergarten", "fast_food"), ]
Now let us plot our selection, by adding a geom_point()
to our graph p
.
p <- p +
geom_point(aes(x = long, y = lat, colour = fclass),
data = selection
)
p
Interestingly, there are quite a few locations in the near distance, however, very close to GeoCampus Lankwitz there are very few.
Okay, then where to? Should we head North, South, East or West?
Well, let us make a density map which shows where the entries for our
selection (biergarten
and fast_food
) are
somehow clustered. ggplot2
offers the
stat_density2d()
function, which performs a 2D kernel density estimation and displays the results
with contours.
p +
stat_density2d(aes(x = long, y = lat),
bins = 2.6, alpha = 0.3,
data = selection, geom = "polygon"
)
Well, I guess we should probably head north-west!
mapview
packageThe mapview
package provides functions to
very quickly and conveniently create interactive visualizations of
spatial data. It allows quick interactive plotting to examine and
visually investigate both aspects of spatial data, the geometries and
their attributes. Visit the project’s website for more information and a quick
tour through the functionality of this awesome package.
Our goal is to localize GeoCampus Lankwitz, Berlin, Germany and plot the location onto an interactive map.
First, we have to determine the geographical location of GeoCampus
Lankwitz. Therefore, we make use of the geocode_OSM()
function in the tmaptools
package. This function allows
us to geocode a location (find its latitude and longitude)
using OpenStreetMap Nominatim. If you have not already, install the
tmaptools
package using
install.packages("tmaptools")
.
Let us retrieve the coordinates for the GeoCampus Lankwitz using the
geocode_OSM()
function as a data frame. We then extract
only the lat and lon coordinates from the result.
library(tmaptools)
geocampus <- geocode_OSM(q = "Malteserstrasse 74-100, Berlin", as.data.frame = TRUE)
geocampus_latlon <- geocampus[, c("lat", "lon")]
geocampus_latlon
## lat lon
## 1 52.4262 13.35803
Nice!
In the next step we transform the geocampus
date frame
object into a simple feature
object. Therefore we apply the
st_as_sf()
function from the sf
package. In
addition we provide that object with a spatial coordinate reference
system. In our example we use the EPSG identifier \(4326\), which corresponds to the World Geodetic System (WGS84).
library(sf)
geocampus_sf <- st_as_sf(
x = geocampus_latlon, coords = c("lon", "lat"),
crs = 4326
)
geocampus_sf
## Simple feature collection with 1 feature and 0 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 13.35803 ymin: 52.4262 xmax: 13.35803 ymax: 52.4262
## Geodetic CRS: WGS 84
## geometry
## 1 POINT (13.35803 52.4262)
Once we have a spatial feature it is very easy to plot it using the
mapview()
function from the same-titled
mapview
package. Note that the mapview()
function has many additional arguments (type help(mapview)
into the R console), allowing for customization. Additionally, we can
use the leafpop
package to embed tables, images
or graphs in pop-ups of our interactive map (Feel free to scroll over
the blue marker in the map and click on it!).
library(mapview)
img <- "http://www.osa.fu-berlin.de/geographie_bsc/_medien/bilder_geocampus/bild_010/010_546.jpeg"
mapview(geocampus_sf,
map.types = "Esri.WorldImagery",
popup = leafpop:::popupIframe(img),
label = "GeoCampus Lankwitz"
)
leaflet
packageAnother great package for producing interactive maps in R is the leaflet
package. For more information
and a thorough introduction we recommend to review its website Leaflet for
R.
We will reproduce the same example as we did above using the
mapview
package.
In order to get started we install the leaflet
package
by typing install.packages("leaflet")
. Again, we retrieve
the coordinates for the GeoCampus Lankwitz using the
geocode_OSM()
function from the tmaptools
package. In this case the extracted lat and lon coordinates are already
enough to plot the map and we do not need to convert them to a spatial
feature.
library(tmaptools)
geocampus <- geocode_OSM(q = "Malteserstrasse 74-100, Berlin", as.data.frame = TRUE)
geocampus_latlon <- geocampus[, c("lat", "lon")]
geocampus_latlon
## lat lon
## 1 52.4262 13.35803
We then create a map widget using the leaflet()
function. Note that leaflet
maps are commonly produced
using the pipe operator %>%
of the magrittr
package. This makes our code
much more readable and allows us to gradually add elements to our
map.
library(leaflet)
img <- "http://www.osa.fu-berlin.de/geographie_bsc/_medien/bilder_geocampus/bild_010/010_546.jpeg"
geocampus_latlon %>%
leaflet() %>%
addProviderTiles("Esri.WorldImagery") %>%
addMarkers(~lon, ~lat, label = "GeoCampus Lankwitz", popup = paste0("<img src = ", img, ">"))
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.