To illustrate the most popular types of compositional graphics, we use a data set on the geochemistry of a sediment core recovered from the eastern Juyanze palaeolake in north-western China (Hartmann and Wünnemann, 2009). We load the data set and prepare it for the analysis.
library(compositions)
# load data set
g36 <- read.csv("http://userpages.fu-berlin.de/soga/data/raw-data/G36chemical.txt",
                sep = '\t',
                row.names = 'Sample')
# exclude columns of no interest
g36 <- g36[, -c(1:5, ncol(g36)-1, ncol(g36))]
# rename columns for better readability
colnames(g36) <- gsub(".mg.g", "", colnames(g36))
The data set consists of 177 samples (rows) and 11 variables (columns).
str(g36)
## 'data.frame': 177 obs. of 11 variables:
## $ Ba : num 0.178 0.172 0.204 0.167 0.122 0.181 0.22 0.101 0.255 0.114 ...
## $ Ca : num 61.3 55.2 96.4 53 36 ...
## $ Fe : num 15.2 15.2 14 16.1 10.9 ...
## $ K : num 3.59 2.79 3.9 4.23 2.46 ...
## $ Mg : num 77.3 31.5 59 55.9 97.5 ...
## $ Mn : num 0.326 0.324 0.317 0.356 0.218 0.45 0.575 0.221 0.501 0.222 ...
## $ Na : num 18.78 9.84 18.33 14.19 16.34 ...
## $ PO4: num 0.865 0.785 0.976 1.002 0.667 ...
## $ S : num 24.6 15.4 18.2 14 22.7 ...
## $ Sr : num 0.485 0.446 0.74 0.352 0.324 0.496 0.507 0.081 0.782 0.068 ...
## $ Cl : num 27.6 14.9 27.4 24.7 26.1 ...
The elements Ca, Mg, Sr, Fe, Mn, K, Na, S, Ba, and PO4 were measured by inductively coupled plasma optical emission spectrometry (ICP-OES) after aqua regia digestion, and chloride (Cl) was determined by Mohr titration of the extracted fluid. All concentrations are given in mg/g.
In many cases a scatterplot is the first choice for exploratory data analysis. For compositional data, however, it is important to realize that scatterplots are neither scaling invariant nor perturbation invariant, and that they are not subcompositionally coherent (Aitchison and Egozcue, 2005). There is no guarantee that the plot of a closed subcomposition exhibits patterns similar to, or even compatible with, the plot of the original data set; thus, a regression line drawn in such a plot cannot be trusted (van den Boogaart and Tolosana-Delgado, 2013).
par(mfrow = c(1,3))
# plot 1
plot(g36[, c("Ca", "Mg")], pch = 16, main = 'No transformation')
abline(lm(Mg~Ca, data = g36), col = 'red', lty = 2)
legend('topright', legend = 'OLS', lty = 2, col = 'red', cex = 0.7)
# plot 2
plot(clo(g36)[, c("Ca", "Mg")], pch = 16, main = 'Closure, full data set')
abline(lm(Mg~Ca, data = as.data.frame(clo(g36))), col = 'red', lty = 2)
legend('topright', legend = 'OLS', lty = 2, col = 'red', cex = 0.7)
# plot 3
subcomp <- c("Ba", "Ca", "Mg", "K", "Mn", "PO4", "Sr")
plot(clo(g36[, subcomp])[, c("Ca", "Mg")], pch = 16, main = 'Closure, subcomposition')
abline(lm(Mg~Ca, data = as.data.frame(clo(g36[, subcomp]))), col = 'red', lty = 2)
legend('topright', legend = 'OLS', lty = 2, col = 'red', cex = 0.7)
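Note that this is another example of subcompositional incoherence: the apparent relationship between Ca and Mg, and with it the fitted regression line, depends on whether the raw data, the closed full data set or the closed subcomposition is plotted.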
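The effect can also be checked numerically. The following is a minimal sketch on synthetic data, not on the G36 data set: the column names are borrowed from the text for readability only, the helper closure() is a hypothetical base-R stand-in for clo(), and the distribution parameters are arbitrary assumptions. It compares the correlation of Ca and Mg in the raw data, after closure of all parts and after closure of a subcomposition, and contrasts this with the log-ratio of the two parts.

```r
# Synthetic, independent positive parts (assumed values, not the G36 data)
set.seed(42)
n <- 200
raw <- data.frame(
  Ca = rlnorm(n, meanlog = 4,  sdlog = 0.4),
  Mg = rlnorm(n, meanlog = 4,  sdlog = 0.4),
  Na = rlnorm(n, meanlog = 3,  sdlog = 0.8),
  Sr = rlnorm(n, meanlog = -1, sdlog = 0.3)
)

# Hypothetical helper: rescale every row to sum to 1 (base-R stand-in for clo())
closure <- function(x) {
  m <- as.matrix(x)
  as.data.frame(m / rowSums(m))
}

full <- closure(raw)                        # closure of all parts
sub  <- closure(raw[, c("Ca", "Mg", "Sr")]) # closure of a subcomposition

# The correlation of Ca and Mg depends on which parts enter the closure
r_raw  <- cor(raw$Ca,  raw$Mg)
r_full <- cor(full$Ca, full$Mg)
r_sub  <- cor(sub$Ca,  sub$Mg)
print(round(c(raw = r_raw, full = r_full, sub = r_sub), 2))

# The log-ratio of the two parts is unaffected by closure or subsetting
lr_raw <- log(raw$Ca / raw$Mg)
lr_sub <- log(sub$Ca / sub$Mg)
print(all.equal(lr_raw, lr_sub))
```

In this sketch Ca and Mg are generated independently, yet in the subcomposition they account for almost the entire total, so the closure alone induces a strongly negative correlation. The log-ratio of the two parts, by contrast, is the same in all three versions of the data.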
A compositional data set may consist of many variables; for graphical purposes, however, one seldom represents all parts at once. In many cases the visualization of compositional data is restricted to three-part (sub)compositions. For three parts, the simplex is an equilateral triangle with vertices at \(A = [\kappa, 0, 0]\), \(B = [0, \kappa, 0]\) and \(C = [0, 0, \kappa]\), where \(\kappa\) is the closure constant; its graphical representation is the so-called ternary diagram. Ternary diagrams represent the data as compositional and relative.
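To make the geometry concrete, the following minimal base-R sketch maps three-part compositions to planar coordinates of an equilateral triangle. The function name ternary_xy() and the chosen vertex layout (A at the bottom left, B at the bottom right, C at the top, unit side length, \(\kappa = 1\)) are assumptions made for illustration; plotting packages may orient the triangle differently.

```r
# Hypothetical helper: map three-part compositions to 2D triangle coordinates.
# Assumed layout: A = (0, 0), B = (1, 0), C = (0.5, sqrt(3)/2).
ternary_xy <- function(comp) {
  comp <- as.matrix(comp)
  p <- comp / rowSums(comp)            # closure to kappa = 1
  cbind(x = p[, 2] + 0.5 * p[, 3],     # share of B plus half the share of C
        y = sqrt(3) / 2 * p[, 3])      # height is proportional to the share of C
}

pts <- rbind(A      = c(1, 0, 0),
             B      = c(0, 1, 0),
             C      = c(0, 0, 1),
             centre = c(1, 1, 1),
             scaled = c(10, 10, 10))   # same composition as 'centre', rescaled

xy <- ternary_xy(pts)
print(round(xy, 3))
```

Because the closure is applied first, the rows centre and scaled land on the same point, the barycentre of the triangle. This is the sense in which a ternary diagram shows only relative information.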
To plot a ternary diagram in R we make use of the compositions package. The package ships with the acomp() function, which applies a closure to the data vector or matrix passed to it and assigns the resulting object to a special class, the acomp class. An acomp object provides the means to analyse compositions in the framework of the Aitchison simplex. If we apply the generic plot() function to a three-part acomp object, we get a ternary diagram in return.
xc <- acomp(g36, c("Mg", "Ca", "Fe"))
plot(xc)
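To see what acomp() does to the numbers, the following small sketch applies it to a hand-made data frame. The values are the first three entries of Mg, Ca, Fe and K from the str() output above; the object names x, m and totals are introduced here for illustration only. It assumes that the compositions package is installed and that, as in the call above, the second argument selects the parts.

```r
library(compositions)

# First three samples of four elements, copied from the str() output above
x <- data.frame(Mg = c(77.3, 31.5, 59.0),
                Ca = c(61.3, 55.2, 96.4),
                Fe = c(15.2, 15.2, 14.0),
                K  = c(3.59, 2.79, 3.90))

# Select a three-part subcomposition; acomp() closes it and sets the class
xc <- acomp(x, c("Mg", "Ca", "Fe"))
print(class(xc))

# Strip the class to inspect the plain numbers
m <- unclass(xc)
totals <- rowSums(m)
print(totals)                       # each row is closed to the same total

# Ratios between parts are preserved by the closure
print(m[1, "Mg"] / m[1, "Ca"])
print(77.3 / 61.3)
```

The closure changes the absolute values but leaves the ratios between parts untouched, which is why the ternary diagram of xc carries the same relative information as the selected raw columns.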
Another possibility to plot ternary diagrams is provided by the ggtern package, an extension of the ggplot2 package. First, we install the package by typing install.packages('ggtern') into the console, and then we make the package available by calling the library() function.
For illustration we use the same data as in the example above. After selecting the three components of interest (Mg, Ca, Fe), we apply the closure to the selection and then plot a ternary diagram using the ggtern() function. The package comes with many additional graphical features, which are described in more detail in the package documentation.
library(ggtern)
# data preparation
data <- as.data.frame(clo(g36[, c("Mg","Ca","Fe")]))
# plotting
ggtern(data = data, aes(Mg, Fe, Ca)) +
  geom_point(alpha = 0.5, size = 2, color = "black") +
  theme_rgbw()
Note that if the three parts represented differ strongly in magnitude, the data tend to plot close to a border or a vertex.
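This can be verified numerically without drawing anything. The sketch below uses synthetic data with assumed magnitudes of roughly 100, 1 and 0.01 for three hypothetical parts called major, minor and trace; none of these values come from the G36 data set.

```r
# Three synthetic parts that differ by orders of magnitude (assumed values)
set.seed(1)
n <- 50
x <- cbind(major = rlnorm(n, meanlog = log(100),  sdlog = 0.2),
           minor = rlnorm(n, meanlog = log(1),    sdlog = 0.2),
           trace = rlnorm(n, meanlog = log(0.01), sdlog = 0.2))

# Closure to 1, as applied before drawing a ternary diagram
p <- x / rowSums(x)

# Shares of each part after closure
print(round(apply(p, 2, range), 4))
```

After closure the part major takes up almost the entire total in every sample, so in a ternary diagram all points would be crowded next to the major vertex and the variation in the other two parts would be hard to see.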
Important notice
The ggtern package was removed from the CRAN repository on May 7th, 2023 because a maintenance request was ignored. You can use the package Ternary instead, with its functions TernaryPlot() and TernaryPoints():
#install.packages("Ternary")
library(Ternary)
par(mar = c(2,2,2,2))
TernaryPlot(
  alab = "Mg",
  blab = "Ca",
  clab = "Fe",
  lab.offset = 0.16,
  lab.col = "black",
  point = "up",
  clockwise = TRUE,
  isometric = TRUE,
  padding = 0.08,
  col = "#FFFFFF",
  grid.lines = 10,
  grid.col = "pink",
  grid.lty = "solid",
  grid.minor.col = "lightblue",
  grid.minor.lty = "solid",
  grid.minor.lwd = 1,
  axis.lty = "solid",
  axis.labels = TRUE,
  axis.cex = 0.8,
  axis.font = 1,
  axis.rotate = TRUE,
  axis.tick = TRUE,
  ticks.length = 0.025,
  axis.col = "black",
  ticks.col = "lightgrey"
)
TernaryPoints(g36[, c("Mg", "Ca", "Fe")],
  type = "p",
  cex = 1.2,
  pch = 16,
  lwd = 1,
  lty = "solid",
  col = "lightgreen"
)
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.