ggplot2 is a plotting system for R created by Hadley Wickham. It offers
a consistent and systematic approach to generating graphics, based on
the book The Grammar of Graphics by Leland Wilkinson. Its building
blocks are data, aesthetic mappings, geoms, stats, scales, facets and
themes.
ggplot2 takes care of many of the fiddly details that make
plotting a hassle (like drawing legends) as well as providing a powerful
model of graphics that makes it easy to produce complex multi-layered
graphics. However, it is a comprehensive system and therefore takes
some time to learn. Visit the ggplot2 website or take a quick tour
through the R graph gallery featuring ggplot2 to
get an overview of the functionality it provides.
Further, the Cookbook for R is a valuable source, as it provides
solutions to common tasks and problems in analyzing data. Moreover, RStudio provides a
helpful cheat sheet summarizing the main
ggplot2 functions.
Please be aware that within the scope of this tutorial we will only
scratch the surface of what ggplot2 can do.
However, we highly recommend applying ggplot2 to as many of your
plotting tasks as possible. You will learn to master
the grammar of graphics step by step and will eventually be able to
leverage the full power of this plotting library.
The grammar of graphics with ggplot2
ggplot2 is based on the grammar of graphics. The basic
idea of the grammar of graphics, and hence of ggplot2, is that
you can build every graph from the same few components:
a data set,
a set of geoms: visual marks that represent data points, and
a coordinate system.
In addition, there are some more components, such as statistical
transformations (stats), scales, position adjustments and faceting,
which allow fine-tuning and extend the range of graphs that can be
generated.
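To make the components listed above concrete, the following minimal sketch spells each one out explicitly, including those ggplot2 normally fills in with defaults. It uses the built-in mtcars data set purely as a stand-in, and the object name p_components is hypothetical.

```r
library(ggplot2)

# mtcars ships with R and is used here only as a stand-in data set.
p_components <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) + # data and aesthetic mapping
  geom_point(stat = "identity", position = "identity") + # geom with its default stat and position
  scale_x_continuous() + # default scale for a numeric x variable
  scale_y_continuous() + # default scale for a numeric y variable
  coord_cartesian() +    # default coordinate system
  facet_null()           # default faceting: a single panel

p_components
```

Leaving out everything except ggplot() and geom_point() should give the same picture, since the remaining calls only restate the defaults.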
A plot is started with the ggplot() function and completed by adding
layers with the + operator.
ggplot(data, aes(x, y))
To display data values, variables in the data set are mapped to
aesthetic properties of the geom (an aesthetic is “something you can
see”), such as size, color, shape, and x and y locations.
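One consequence of the mapping idea is the difference between mapping an aesthetic to a variable (inside aes()) and setting it to a fixed value (outside aes()). The sketch below illustrates this with the built-in mtcars data set as a stand-in; the object names p_mapped and p_fixed are hypothetical.

```r
library(ggplot2)

# Mapping: color depends on a variable, so it goes inside aes() and gets a legend.
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Setting: color is a fixed value, so it goes outside aes() and gets no legend.
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", size = 3)

p_mapped
p_fixed
```

Writing aes(color = "steelblue") by mistake would treat the string as a one-level variable and produce a legend entry instead of steel-blue points.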
Note that there is a helper function called qplot() (for
quick plot) that can hide much of this complexity when creating standard
graphs. It has been deprecated since ggplot2 3.4.0, so new code should
prefer ggplot().
Once again, we make use of the students data set. The
students.csv file can be read directly from the URL below.
students <- read.csv("https://userpage.fu-berlin.de/soga/data/raw-data/students.csv")
For a less packed visualization we restrict the data set to the first 100 entries.
students100 <- students[1:100, ]
Let us construct a ggplot2 plot layer by layer. We start with a
scatter plot of the variables weight and
height.
First we load the ggplot2 library. If you have not
installed ggplot2 yet, type
install.packages("ggplot2") into your console.
library(ggplot2)
We initialize the plot using the ggplot() function and
add a geom, more specifically geom_point(), because we want
points to be plotted. We use aes() to
specify which variables in the data are mapped to visual properties
(aesthetics) of the geometric object (geom_point()).
p <- ggplot(
  data = students100,
  aes(x = weight, y = height)
) +
  geom_point()
p
With just a few lines of code we get a quite nice looking plot!
In addition, there are several other aesthetic properties available, such
as color, shape and size. By
adding color = gender or shape = gender to the
aes() function we can distinguish the data points by the categorical
variable gender.
ggplot(
  data = students100,
  aes(x = weight, y = height)
) +
  geom_point(aes(color = gender))
ggplot(
  data = students100,
  aes(x = weight, y = height)
) +
  geom_point(aes(shape = gender))
We may easily expand this approach and add size = age to
the aes() function in order to relate the size of the
points to the age of the particular student. Please note how
ggplot2 takes care of adding a nice looking legend to the
graph. Finally, we assign our graph to the variable p to save
it for further processing.
p <- ggplot(
  data = students100,
  aes(x = weight, y = height)
) +
  geom_point(aes(size = age, color = gender))
p
We can also apply more than one geom to our graph. For
this, we now add geom_smooth() to our graph p
defined above. We set method = 'loess' to add a locally weighted
scatter plot smoothing. (If no method is given, geom_smooth() picks one
automatically: loess for fewer than 1,000 observations, otherwise a
generalized additive model; a linear model requires method = 'lm'.)
Note that by default (se = TRUE) a confidence band around the smooth
will be plotted. We can avoid this behavior by
setting se = FALSE.
p <- p +
  geom_smooth(
    method = "loess"
    # , se = FALSE ## uncomment to hide the confidence band
  )
p
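The effect of the method and se arguments can be compared directly. The following sketch uses the built-in mtcars data set as a stand-in; the object names base, p_lm and p_loess are hypothetical, and the formula is passed explicitly only to avoid the informational message geom_smooth() otherwise prints.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Straight-line fit from a linear model, without the confidence band.
p_lm <- base +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local regression (loess) with the default 95% confidence band.
p_loess <- base +
  geom_smooth(method = "loess", formula = y ~ x, se = TRUE)

p_lm
p_loess
```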
Finally, we add a title (ggtitle()) to the graph and
change the theme to theme_bw(), which gives a white background with
grid lines.
p <- p +
  ggtitle("my first graph with ggplot2") +
  theme_bw()
p
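Titles, axis labels and legend titles can also be set together with labs(), and individual theme elements can be adjusted with theme(). The sketch below is one possible way to do this, again using the built-in mtcars data set as a stand-in; the label texts and the object name p_labeled are hypothetical.

```r
library(ggplot2)

p_labeled <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +
  labs(
    title = "Fuel economy by weight",
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    color = "Cylinders" # legend title for the color aesthetic
  ) +
  theme_bw() +
  theme(legend.position = "bottom") # adjust one element after the complete theme

p_labeled
```

The order matters here: a complete theme such as theme_bw() replaces earlier theme settings, so theme() tweaks should come after it.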
To conclude this section we show some more simple graph examples
using the ggplot2 package. Hopefully, this quick tour
through ggplot2 encourages you to apply this plotting
library to your own tasks.
Boxplot
ggplot(
  data = students100,
  aes(x = religion, y = age)
) +
  stat_boxplot() +
  geom_point() +
  theme_bw()
Conditioned boxplot
ggplot(
  data = students100,
  aes(x = religion, y = age)
) +
  stat_boxplot(aes(fill = religion)) +
  geom_point() +
  facet_wrap(~gender) +
  theme_bw()
Barplot
ggplot(data = students100, aes(fill = major, y = salary, x = religion)) +
  geom_bar(stat = "identity") +
  theme_classic()
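As a side note on the bar plot, geom_col() is shorthand for geom_bar(stat = "identity"), and the position argument controls whether bars are stacked or placed side by side. The sketch below shows both variants on a small table of counts derived from the built-in mtcars data set, which serves only as a stand-in; the object names counts, p_stacked and p_dodged are hypothetical.

```r
library(ggplot2)

# Count the cars for every cylinder/gear combination (columns: cyl, gear, Freq).
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear))

# geom_col() takes the bar height from y; bars are stacked by default.
p_stacked <- ggplot(counts, aes(x = cyl, y = Freq, fill = gear)) +
  geom_col() +
  theme_classic()

# position = "dodge" places the bars side by side instead of stacking them.
p_dodged <- ggplot(counts, aes(x = cyl, y = Freq, fill = gear)) +
  geom_col(position = "dodge") +
  theme_classic()

p_stacked
p_dodged
```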
Density plot
ggplot(data = students100, aes(weight, colour = gender, fill = gender)) +
  geom_density(alpha = 0.55) +
  theme_minimal()
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.