The scale of a measurement describes which values are possible outcomes and which mathematical operations on these values are meaningful in the context of the application. Classical statistical procedures distinguish several scales of measurement, such as categorical, ordinal, interval, and ratio.
Compositional data analysis adds additional complexity to this subject, as compositions are multivariate objects on themselves. Each component cannot be treated as an isolated variable because its value is only meaningful in relation to the amounts of the other variables. Thus, the whole composition has to be considered as a unique object with a particular scale and properties. In this scale, denoted as Aitchison geometry, the classical algebraic/geometric operations, such as addition/translation, product/scaling, scalar product/orthogonal projection, and Euclidean distance are neither subcompositionally coherent nor scaling invariant. Thus an alternative set of operations was proposed by Aitchison in order to replace these conventional ones in compositional geometry (van den Boogaart and Tolosana-Delgado 2013).
Perturbation plays the role of sum or
translation and is a closed component-wise product of the
compositions involved:
\[\mathbb z = x\oplus y = \mathscr C[x_1 \cdot y_1, ... x_D \cdot y_D]\]
Perturbation by the opposite element plays the role of subtraction:
\[\mathbf x \ominus \mathbf y = \mathbf x \oplus \ominus \mathbf y\]
Exercise: Write a function for a pertubutation of a composition. If you can’t remember how to define your own function cf.here
# Your code here
pert<-function(){
}
# Your code here
pert<-function(x,y){
res<-(x*y)/sum(x*y)
print(res)
}
x<-c(1,2,1)
x<-x/sum(x) #closure of x
y<-c(1,3,1)
y<-y/sum(y)
pert(x,y)
With the compositions
package, one can perturb a pair of
compositions in two ways: by using the command
perturbe(x,y)
or by adding or subtracting two vectors of
class acomp
.
library(compositions)
x = acomp(c(1,2,1))
y = acomp(c(1,3,1))
# perturbation
x+y
## [1] "0.125" "0.750" "0.125"
## attr(,"class")
## [1] "acomp"
perturbe(x,y)
## [1] "0.125" "0.750" "0.125"
## attr(,"class")
## [1] "acomp"
# inverse perturbation
x-y
## [1] "0.375" "0.250" "0.375"
## attr(,"class")
## [1] "acomp"
perturbe(x,-y)
## [1] "0.375" "0.250" "0.375"
## attr(,"class")
## [1] "acomp"
A very neat use case of perturbation in compositional data analysis is related to the change of units of a composition. Let us consider once again the data set of the geochemistry of a sediment core recovered from the eastern Juyanze palaeolake in north-western China (Hartmann and Wünnemann, 2009). We load the data set and prepare if for the analysis.
# load data set
g36 <- read.csv("http://userpages.fu-berlin.de/soga/data/raw-data/G36chemical.txt",
sep='\t',
row.names = 'Sample')
# exclude columns of no interest
g36 <- g36[, -c(1:5, ncol(g36)-1, ncol(g36))]
# rename columns for better readability
colnames(g36) <- gsub(".mg.g", "", colnames(g36))
str(g36)
## 'data.frame': 177 obs. of 11 variables:
## $ Ba : num 0.178 0.172 0.204 0.167 0.122 0.181 0.22 0.101 0.255 0.114 ...
## $ Ca : num 61.3 55.2 96.4 53 36 ...
## $ Fe : num 15.2 15.2 14 16.1 10.9 ...
## $ K : num 3.59 2.79 3.9 4.23 2.46 ...
## $ Mg : num 77.3 31.5 59 55.9 97.5 ...
## $ Mn : num 0.326 0.324 0.317 0.356 0.218 0.45 0.575 0.221 0.501 0.222 ...
## $ Na : num 18.78 9.84 18.33 14.19 16.34 ...
## $ PO4: num 0.865 0.785 0.976 1.002 0.667 ...
## $ S : num 24.6 15.4 18.2 14 22.7 ...
## $ Sr : num 0.485 0.446 0.74 0.352 0.324 0.496 0.507 0.081 0.782 0.068 ...
## $ Cl : num 27.6 14.9 27.4 24.7 26.1 ...
The data set consists of 177 samples (rows) and 11 variables (columns). The variables are the elements Ca, Mg, Sr, Fe, Mn, K, Na, S, Ba, PO4, and Cl, which are given as mass units (mg/g).
We now want to recast our data set from a composition of mass
proportions to a composition of molar proportions. This can be easily
achieved by (inverse) perturbation. All we have to do is to divide each
component by its molar weight. In order find the molar
weight for each particular element in our data set we may look it up
in a table or make use of R. For the R software there exist several
packages which ship with a function to return the molar weight of any
particular element. Here, for this example we install the
[marelac
package](https://mran.microsoft.com/snapshot/2023-02-17/web/packages/marelac/index.html.
This package provides the molweight()
function, which
returns the molecular weight in g/mol for each particular element given
to the function.
library(marelac)
molweight(colnames(g36)) # molecular weight in g/mol
## Ba Ca Fe K Mg Mn Na PO4
## 137.32700 40.07800 55.84500 39.09830 24.30500 54.93805 22.98977 94.97136
## S Sr Cl
## 32.06500 87.62000 35.45300
g36.comp.mas <- acomp(g36) # composition of mass proportions
mw <- acomp(molweight(colnames(g36))) # molar weights
g36.comp.mol <- g36.comp.mas - mw # inverse perturbation
g36.comp.mol[1:5,] # first 5 rows of composition of molar proportions
## Ba Ca Fe K Mg
## G36-1 "0.0001738203" "0.2051454" "0.03654827" "0.012326978" "0.4263909"
## G36-10 "0.0002868280" "0.3153576" "0.06216757" "0.016359200" "0.2969885"
## G36-103 "0.0002021487" "0.3274182" "0.03401714" "0.013573861" "0.3303898"
## G36-108 "0.0002099890" "0.2282233" "0.04984445" "0.018681793" "0.3972189"
## G36-11 "0.0001210366" "0.1225157" "0.02651906" "0.008575641" "0.5467080"
## Mn Na PO4 S Sr
## G36-1 "0.0007957577" "0.10954622" "0.0012214055" "0.10267310" "0.0007422922"
## G36-10 "0.0013505815" "0.09801882" "0.0018928927" "0.10984343" "0.0011656834"
## G36-103 "0.0007852044" "0.10849870" "0.0013984721" "0.07728147" "0.0011492781"
## G36-108 "0.0011189556" "0.10658199" "0.0018218437" "0.07544717" "0.0006937060"
## G36-11 "0.0005406250" "0.09683454" "0.0009568547" "0.09657863" "0.0005037959"
## Cl
## G36-1 "0.10443591"
## G36-10 "0.09656888"
## G36-103 "0.10528572"
## G36-108 "0.12015787"
## G36-11 "0.10014615"
## attr(,"class")
## [1] "acomp"
Now, g36.comp.mol
contains the composition of elements
in terms of molar proportions. Most changes of compositional units may
be expressed as a perturbation (molar proportions, mass percentages,
volumetric proportions, molality, ppm, etc.). Perturbation is considered
as the fundamental operation in compositional data analysis, equally
important as the vector addition is in the usual real vector space \(\mathbb R^D\) (van
den Boogaart and Tolosana-Delgado 2013).
Powering or power transformation
replaces the product of a vector by a scalar (geometrically,
this is equivalent to scaling) and is defined as the closed powering of
the components by a given scalar:
\[\mathbf z = \lambda \odot \mathbf x = \mathscr C[x_1^\lambda, .., x_D^\lambda]\]
Exercise: Write a function for powering of a composition. If you can’t remember how to define your own function cf.here
# Your code here
pow<-function(){
}
# Your code here
pow<-function(x,lambda=1){
res<-(x^lambda)/sum(x^lambda)
print(res)
}
x<-c(1,2,1)
x<-x/sum(x) #closure
pow(x,2)
The compositions
package offers two ways to power a
composition by a scalar: with the generic function
power.acomp()
or by multiplying a scalar by a vector of
class acomp
.
library(compositions)
x = acomp(c(1,2,1))
x
## [1] "0.25" "0.50" "0.25"
## attr(,"class")
## [1] "acomp"
s <- 2 # scalar
power.acomp(x,s)
## [1] "0.1666667" "0.6666667" "0.1666667"
## attr(,"class")
## [1] "acomp"
x*s
## [1] "0.1666667" "0.6666667" "0.1666667"
## attr(,"class")
## [1] "acomp"
The Aitchison scalar product for compositions replaces
the conventional scalar product:
\[\langle \mathbf x, \mathbf y \rangle_A=\frac{1}{D}\sum_{i<j}^D \ln\frac{x_i}{x_j} \ln \frac{y_i}{y_j}\]
The generic function scalar(x,y)
of the
compositions
packages computes the scalar product.
library(compositions)
x = acomp(c(1,2,1))
y = acomp(c(1,3,1))
scalar(x,y)
## [1] 0.5076667
The norm of a vector, its length, is
\[\Vert \mathbf x \Vert_A = \sqrt{\langle \mathbf x, \mathbf x \rangle_A}\text{.}\]
The norm of a vector can be obtained by the norm()
function. If x
is a data matrix of class
acomp
, then the result of norm(x)
will be a
vector giving the norm of each row.
u <- c(1,2,3)
v <- c(2,2,2)
w <- c(3,1,2)
uvw <- acomp(rbind(u,v,w))
norm(uvw)
## [1] 0.785664 0.000000 0.785664
The normalized vector of \(x\) (or its direction, or a unit-norm
vector) is
\[\mathbf v = \Vert \mathbf x\Vert^{-1}_A \odot \mathbf x\]
The generic function normalize()
provides this
normalization, either for a single composition or for the rows of a data
matrix.
normalize(uvw)
## [,1] [,2] [,3]
## u "0.1339635" "0.3236980" "0.5423385"
## v " MAR" " MAR" " MAR"
## w "0.5423385" "0.1339635" "0.3236980"
## attr(,"class")
## [1] "acomp"
The angle between two compositions allows to
assess if two compositions are orthogonal (scalar product is \(0\), or their angle is \(90^\circ\)).
\[ \alpha (\mathbf x, \mathbf y) = \cos^{-1}\frac{\langle \mathbf x, \mathbf y\rangle_A}{\Vert\mathbf x \Vert_A \cdot \Vert \mathbf y \Vert_A}=\cos^{-1} \langle\Vert \mathbf x \Vert_A^{-1} \odot \mathbf x, \Vert \mathbf y \Vert_A^{-1}\odot \mathbf y \rangle_A\]
acos(scalar(normalize(u),normalize(v)))
## [1] 0.3875967
The Aitchison distance is the distance between
two compositions:
\[ d_A(\mathbf x, \mathbf y) = \Vert \mathbf x \ominus \mathbf x, \mathbf y\Vert_A = \sqrt{\frac{1}{D}\sum_{i=1}^D\sum_{j>i}^D \left( \ln\frac{x_i}{x_j}-\ln\frac{y_i}{y_j}\right)^2}\]
It can be computed by norm(x-y)
, or by the generic
dist()
function of the compositions
package.
The dist()
function computes the Aitchison distances
between all rows of an acomp
data matrix.
norm(acomp(u)-acomp(v))
## [1] 0.785664
dist(uvw)
## u v
## v 0.785664
## w 1.360810 0.785664
Note that other compositional distances, such as the Manhattan distance, are also available through
the extra argument method
:
dist(uvw, method = 'manhattan')
## u v
## v 1.194506
## w 2.197225 1.194506
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.
Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.