ggplot2 is a plotting system for R created by Hadley Wickham. It offers a consistent and systematic approach to generating graphics, based on the book The Grammar of Graphics by Leland Wilkinson, in which a plot is described in terms of data, aesthetic mappings, geoms, stats, scales, facets and themes. ggplot2 takes care of many of the fiddly details that make plotting a hassle (like drawing legends) and provides a powerful model of graphics that makes it easy to produce complex multi-layered graphics. However, because it is a comprehensive system, it takes some time to learn. Visit the ggplot2 website or take a quick tour through the R graph gallery featuring ggplot2 to get an overview of the functionality provided by ggplot2. Further, the Cookbook for R is a valuable source, as it provides solutions to common tasks and problems in analyzing data. Moreover, RStudio provides a helpful cheat sheet summarizing the main ggplot2 functions.

Please be aware that within the scope of this tutorial we will only scratch the surface of the functionality of ggplot2. However, we highly recommend applying ggplot2 to as many of the plotting tasks you face as possible: you will learn to master the grammar of graphics step by step, and eventually you will be able to leverage the full power of this plotting library.

The grammar of graphics with ggplot2

ggplot2 is based on the grammar of graphics. The basic idea of the grammar of graphics, and hence of ggplot2, is that you can build every graph from the same few components: data, aesthetic mappings, geoms, stats, scales, facets and themes.

We begin a plot with the ggplot() function, which takes the data set and the aesthetic mapping. The plot is then completed by adding layers with the + operator.

ggplot(data, aes(x, y))
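The layering idea can be sketched with a small example. This is only an illustration, assuming ggplot2 is installed; it uses mtcars, a data set that ships with R, as a stand-in, and the object names (base, with_points, with_smooth) are made up for this sketch.

```r
library(ggplot2)

# mtcars ships with R and is used here only as a stand-in data set.
# ggplot() alone stores the data and the aesthetic mapping, but draws no geoms.
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Each + adds one layer on top of what is already there.
with_points <- base + geom_point()
with_smooth <- with_points + geom_smooth(method = "lm", se = FALSE)

# The layers are stored in the plot object and can be counted.
length(base$layers)
length(with_points$layers)
length(with_smooth$layers)

# Printing the object draws the plot: points plus a fitted straight line.
with_smooth
```

Because every intermediate result is itself a plot object, a plot can be stored in a variable and extended later.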

To display data values, variables in the data set are mapped to aesthetic properties of the geom (an aesthetic is “something you can see”), such as size, color, shape, and x and y locations.
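A common point of confusion is the difference between mapping an aesthetic to a variable and setting it to a constant. The following sketch illustrates both, again using the built-in mtcars data set as a stand-in; the names mapped and fixed are chosen for this illustration only.

```r
library(ggplot2)

# Mapping: color is given inside aes(), so it varies with a variable
# (here the number of cylinders) and ggplot2 adds a legend.
mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Setting: color and size are given outside aes(), so every point
# gets the same constant value and no legend is drawn.
fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", size = 3)

mapped
fixed
```

As a rule of thumb, anything that should depend on the data belongs inside aes(), and anything that is purely cosmetic belongs outside it.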

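Note that there is a helper function called qplot() (for quick plot) that can hide much of this complexity when creating standard graphs. Be aware, however, that qplot() has been deprecated since ggplot2 version 3.4.0, so ggplot() is the better choice for new code.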

Once again we make use of the students data set. You may download the students.csv file here.

students <- read.csv("https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv")

To keep the visualization less cluttered, we restrict the data set to the first 100 entries.

students100 <- students[1:100, ]
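As a side note, head() is an alternative way to take the first rows of a data frame. The two approaches agree as long as the data frame has at least as many rows as requested, but they differ when it has fewer. The sketch below shows this with the built-in mtcars data set (32 rows), used only as a stand-in; the variable names are made up for this illustration.

```r
# mtcars has 32 rows and serves only as a stand-in data set.

# When enough rows exist, both approaches select the same rows.
first_ten_index <- mtcars[1:10, ]
first_ten_head  <- head(mtcars, 10)
identical(first_ten_index, first_ten_head)

# When more rows are requested than exist, index-based subsetting
# pads the result with rows of NA, whereas head() simply stops.
nrow(mtcars[1:100, ])     # 100 rows, 68 of them filled with NA
nrow(head(mtcars, 100))   # 32 rows
```

For the students data set this makes no difference, provided it contains at least 100 rows.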

Let us construct a ggplot2 plot layer by layer. We start with a scatter plot of the variables weight and height.

First we load the ggplot2 library. If you have not installed ggplot2 yet, type install.packages("ggplot2") into your console.

library(ggplot2)

We initialize the plot using the ggplot() function and add a geom, more specifically geom_point(), because we want the data to be plotted as points. We use aes() to specify which variables in the data are mapped to visual properties (aesthetics) of the geometric object (geom_point()).

p <- ggplot(data = students100, 
            aes(x = weight, y = height)) +
  geom_point()
p

Just three lines of code and we get back a quite nice looking plot!

Several other aesthetic properties are available, such as color, shape and size. By adding color = gender or shape = gender to an aes() call, here the one inside geom_point(), we can distinguish the data points by the categorical variable gender.

ggplot(data = students100,
       aes(x = weight, y = height)) +
  geom_point(aes(color = gender))

ggplot(data = students100,
       aes(x = weight, y = height)) +
  geom_point(aes(shape = gender))

We can easily extend this approach and add size = age to the aes() function in order to relate the size of the points to the age of the particular student. Note how ggplot2 takes care of adding a legend to the graph. Finally, we assign our graph to the variable p for further processing.

p <- ggplot(data = students100,
            aes(x = weight, y = height)) +
  geom_point(aes(size = age, color = gender))
p
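To illustrate what further processing of a stored plot can look like, here is a minimal sketch that adds labels and a theme to an existing plot object. To keep it self-contained it does not use the students data; instead it builds a small, randomly generated data frame (demo) with the same column names. All values in demo, including the gender labels, are invented for this illustration, and the title and axis labels are likewise just examples.

```r
library(ggplot2)

# Hypothetical stand-in for students100: random values, same column names.
set.seed(1)
demo <- data.frame(
  weight = rnorm(20, mean = 70, sd = 10),
  height = rnorm(20, mean = 170, sd = 10),
  age    = sample(18:30, 20, replace = TRUE),
  gender = rep(c("Female", "Male"), each = 10)
)

# Same construction as in the tutorial, stored in a variable.
p_demo <- ggplot(data = demo, aes(x = weight, y = height)) +
  geom_point(aes(size = age, color = gender))

# The stored plot can be extended later with further components,
# for example a title, axis labels and a different theme.
p_labeled <- p_demo +
  labs(title = "Height versus weight", x = "Weight", y = "Height") +
  theme_minimal()

p_labeled
```

The original object p_demo is left unchanged by this; adding components with + always returns a new plot object.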