The scale of a measurement describes which values are possible outcomes and which mathematical operations on these values are meaningful in the context of the application. Classical statistical procedures distinguish several scales of measurement, such as categorical, ordinal, interval, and ratio.

Compositional data analysis adds additional complexity to this subject, as compositions are multivariate objects on themselves. Each component cannot be treated as an isolated variable because its value is only meaningful in relation to the amounts of the other variables. Thus, the whole composition has to be considered as a unique object with a particular scale and properties. In this scale, denoted as Aitchison geometry, the classical algebraic/geometric operations, such as addition/translation, product/scaling, scalar product/orthogonal projection, and Euclidean distance are neither subcompositionally coherent nor scaling invariant. Thus an alternative set of operations was proposed by Aitchison in order to replace these conventional ones in compositional geometry (van den Boogaart and Tolosana-Delgado 2013).


Perturbation


Perturbation plays the role of sum or translation and is a closed component-wise product of the compositions involved:

\[\mathbb z = x\oplus y = \mathscr C[x_1 \cdot y_1, ... x_D \cdot y_D]\]

Perturbation by the opposite element plays the role of subtraction:

\[\mathbf x \ominus \mathbf y = \mathbf x \oplus \ominus \mathbf y\]

Exercise: Write a function for a pertubutation of a composition. If you can’t remember how to define your own function cf.here

# Your code here
pert<-function(){
  
}
Show code
# Your code here
pert<-function(x,y){
 res<-(x*y)/sum(x*y)
  print(res) 
}
x<-c(1,2,1)
x<-x/sum(x) #closure of x
y<-c(1,3,1)
y<-y/sum(y)
pert(x,y)


With the compositions package, one can perturb a pair of compositions in two ways: by using the command perturbe(x,y) or by adding or subtracting two vectors of class acomp.

library(compositions)
x = acomp(c(1,2,1))
y = acomp(c(1,3,1))
# perturbation
x+y
## [1] "0.125" "0.750" "0.125"
## attr(,"class")
## [1] "acomp"
perturbe(x,y)
## [1] "0.125" "0.750" "0.125"
## attr(,"class")
## [1] "acomp"
# inverse perturbation
x-y
## [1] "0.375" "0.250" "0.375"
## attr(,"class")
## [1] "acomp"
perturbe(x,-y)
## [1] "0.375" "0.250" "0.375"
## attr(,"class")
## [1] "acomp"

A very neat use case of perturbation in compositional data analysis is related to the change of units of a composition. Let us consider once again the data set of the geochemistry of a sediment core recovered from the eastern Juyanze palaeolake in north-western China (Hartmann and Wünnemann, 2009). We load the data set and prepare if for the analysis.

# load data set
g36 <- read.csv("http://userpages.fu-berlin.de/soga/data/raw-data/G36chemical.txt",
                sep='\t',
                row.names = 'Sample')
# exclude columns of no interest 
g36 <- g36[, -c(1:5, ncol(g36)-1, ncol(g36))]
# rename columns for better readability
colnames(g36) <- gsub(".mg.g", "", colnames(g36))
str(g36)
## 'data.frame':    177 obs. of  11 variables:
##  $ Ba : num  0.178 0.172 0.204 0.167 0.122 0.181 0.22 0.101 0.255 0.114 ...
##  $ Ca : num  61.3 55.2 96.4 53 36 ...
##  $ Fe : num  15.2 15.2 14 16.1 10.9 ...
##  $ K  : num  3.59 2.79 3.9 4.23 2.46 ...
##  $ Mg : num  77.3 31.5 59 55.9 97.5 ...
##  $ Mn : num  0.326 0.324 0.317 0.356 0.218 0.45 0.575 0.221 0.501 0.222 ...
##  $ Na : num  18.78 9.84 18.33 14.19 16.34 ...
##  $ PO4: num  0.865 0.785 0.976 1.002 0.667 ...
##  $ S  : num  24.6 15.4 18.2 14 22.7 ...
##  $ Sr : num  0.485 0.446 0.74 0.352 0.324 0.496 0.507 0.081 0.782 0.068 ...
##  $ Cl : num  27.6 14.9 27.4 24.7 26.1 ...

The data set consists of 177 samples (rows) and 11 variables (columns). The variables are the elements Ca, Mg, Sr, Fe, Mn, K, Na, S, Ba, PO4, and Cl, which are given as mass units (mg/g).

We now want to recast our data set from a composition of mass proportions to a composition of molar proportions. This can be easily achieved by (inverse) perturbation. All we have to do is to divide each component by its molar weight. In order find the molar weight for each particular element in our data set we may look it up in a table or make use of R. For the R software there exist several packages which ship with a function to return the molar weight of any particular element. Here, for this example we install the [marelac package](https://mran.microsoft.com/snapshot/2023-02-17/web/packages/marelac/index.html. This package provides the molweight() function, which returns the molecular weight in g/mol for each particular element given to the function.

library(marelac)
molweight(colnames(g36)) # molecular weight in g/mol
##        Ba        Ca        Fe         K        Mg        Mn        Na       PO4 
## 137.32700  40.07800  55.84500  39.09830  24.30500  54.93805  22.98977  94.97136 
##         S        Sr        Cl 
##  32.06500  87.62000  35.45300
g36.comp.mas <- acomp(g36) # composition of mass proportions
mw <- acomp(molweight(colnames(g36))) # molar weights
g36.comp.mol <- g36.comp.mas - mw # inverse perturbation
g36.comp.mol[1:5,] # first 5 rows of composition of molar proportions
##         Ba             Ca          Fe           K             Mg         
## G36-1   "0.0001738203" "0.2051454" "0.03654827" "0.012326978" "0.4263909"
## G36-10  "0.0002868280" "0.3153576" "0.06216757" "0.016359200" "0.2969885"
## G36-103 "0.0002021487" "0.3274182" "0.03401714" "0.013573861" "0.3303898"
## G36-108 "0.0002099890" "0.2282233" "0.04984445" "0.018681793" "0.3972189"
## G36-11  "0.0001210366" "0.1225157" "0.02651906" "0.008575641" "0.5467080"
##         Mn             Na           PO4            S            Sr            
## G36-1   "0.0007957577" "0.10954622" "0.0012214055" "0.10267310" "0.0007422922"
## G36-10  "0.0013505815" "0.09801882" "0.0018928927" "0.10984343" "0.0011656834"
## G36-103 "0.0007852044" "0.10849870" "0.0013984721" "0.07728147" "0.0011492781"
## G36-108 "0.0011189556" "0.10658199" "0.0018218437" "0.07544717" "0.0006937060"
## G36-11  "0.0005406250" "0.09683454" "0.0009568547" "0.09657863" "0.0005037959"
##         Cl          
## G36-1   "0.10443591"
## G36-10  "0.09656888"
## G36-103 "0.10528572"
## G36-108 "0.12015787"
## G36-11  "0.10014615"
## attr(,"class")
## [1] "acomp"

Now, g36.comp.mol contains the composition of elements in terms of molar proportions. Most changes of compositional units may be expressed as a perturbation (molar proportions, mass percentages, volumetric proportions, molality, ppm, etc.). Perturbation is considered as the fundamental operation in compositional data analysis, equally important as the vector addition is in the usual real vector space \(\mathbb R^D\) (van den Boogaart and Tolosana-Delgado 2013).


Powering


Powering or power transformation replaces the product of a vector by a scalar (geometrically, this is equivalent to scaling) and is defined as the closed powering of the components by a given scalar:

\[\mathbf z = \lambda \odot \mathbf x = \mathscr C[x_1^\lambda, .., x_D^\lambda]\]

Exercise: Write a function for powering of a composition. If you can’t remember how to define your own function cf.here

# Your code here
pow<-function(){
  
}

Show code
# Your code here
pow<-function(x,lambda=1){
 res<-(x^lambda)/sum(x^lambda)
  print(res) 
}
x<-c(1,2,1)
x<-x/sum(x) #closure
pow(x,2)


The compositions package offers two ways to power a composition by a scalar: with the generic function power.acomp() or by multiplying a scalar by a vector of class acomp.

library(compositions)
x = acomp(c(1,2,1))
x
## [1] "0.25" "0.50" "0.25"
## attr(,"class")
## [1] "acomp"
s <- 2 # scalar
power.acomp(x,s)
## [1] "0.1666667" "0.6666667" "0.1666667"
## attr(,"class")
## [1] "acomp"
x*s
## [1] "0.1666667" "0.6666667" "0.1666667"
## attr(,"class")
## [1] "acomp"

Aitchison scalar product


The Aitchison scalar product for compositions replaces the conventional scalar product:

\[\langle \mathbf x, \mathbf y \rangle_A=\frac{1}{D}\sum_{i<j}^D \ln\frac{x_i}{x_j} \ln \frac{y_i}{y_j}\]

The generic function scalar(x,y) of the compositions packages computes the scalar product.

library(compositions)
x = acomp(c(1,2,1))
y = acomp(c(1,3,1))
scalar(x,y)
## [1] 0.5076667

Norm


The norm of a vector, its length, is

\[\Vert \mathbf x \Vert_A = \sqrt{\langle \mathbf x, \mathbf x \rangle_A}\text{.}\]

The norm of a vector can be obtained by the norm() function. If x is a data matrix of class acomp, then the result of norm(x) will be a vector giving the norm of each row.

u <- c(1,2,3)
v <- c(2,2,2)
w <- c(3,1,2)
uvw <- acomp(rbind(u,v,w))
norm(uvw)
## [1] 0.785664 0.000000 0.785664

Unit vector


The normalized vector of \(x\) (or its direction, or a unit-norm vector) is

\[\mathbf v = \Vert \mathbf x\Vert^{-1}_A \odot \mathbf x\]

The generic function normalize() provides this normalization, either for a single composition or for the rows of a data matrix.

normalize(uvw)
##   [,1]        [,2]        [,3]       
## u "0.1339635" "0.3236980" "0.5423385"
## v "      MAR" "      MAR" "      MAR"
## w "0.5423385" "0.1339635" "0.3236980"
## attr(,"class")
## [1] "acomp"

Angle


The angle between two compositions allows to assess if two compositions are orthogonal (scalar product is \(0\), or their angle is \(90^\circ\)).

\[ \alpha (\mathbf x, \mathbf y) = \cos^{-1}\frac{\langle \mathbf x, \mathbf y\rangle_A}{\Vert\mathbf x \Vert_A \cdot \Vert \mathbf y \Vert_A}=\cos^{-1} \langle\Vert \mathbf x \Vert_A^{-1} \odot \mathbf x, \Vert \mathbf y \Vert_A^{-1}\odot \mathbf y \rangle_A\]

acos(scalar(normalize(u),normalize(v)))
## [1] 0.3875967

Aitchison distance


The Aitchison distance is the distance between two compositions:

\[ d_A(\mathbf x, \mathbf y) = \Vert \mathbf x \ominus \mathbf x, \mathbf y\Vert_A = \sqrt{\frac{1}{D}\sum_{i=1}^D\sum_{j>i}^D \left( \ln\frac{x_i}{x_j}-\ln\frac{y_i}{y_j}\right)^2}\]

It can be computed by norm(x-y), or by the generic dist() function of the compositions package. The dist() function computes the Aitchison distances between all rows of an acomp data matrix.

norm(acomp(u)-acomp(v))
## [1] 0.785664
dist(uvw)
##          u        v
## v 0.785664         
## w 1.360810 0.785664

Note that other compositional distances, such as the Manhattan distance, are also available through the extra argument method:

dist(uvw, method = 'manhattan')
##          u        v
## v 1.194506         
## w 2.197225 1.194506




Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.