The scale of a measurement describes which values are
possible outcomes and which mathematical operations on these values are
meaningful in the context of the application. Classical statistical
procedures distinguish several scales of measurement, such as
*categorical*, *ordinal*, *interval*, and
*ratio*.

Compositional data analysis adds additional complexity to this
subject, as compositions are multivariate objects on themselves. Each
component cannot be treated as an isolated variable because its value is
only meaningful in relation to the amounts of the other variables. Thus,
the whole composition has to be considered as a unique object with a
particular scale and properties. In this scale, denoted as
**Aitchison geometry**, the classical algebraic/geometric
operations, such as addition/translation, product/scaling, scalar
product/orthogonal projection, and Euclidean distance are neither
subcompositionally coherent nor scaling invariant. Thus an alternative
set of operations was proposed by Aitchison in order to replace these
conventional ones in compositional geometry (van
den Boogaart and Tolosana-Delgado 2013).

**Perturbation** plays the role of **sum or
translation** and is a closed component-wise product of the
compositions involved:

\[\mathbb z = x\oplus y = \mathscr C[x_1 \cdot y_1, ... x_D \cdot y_D]\]

Perturbation by the opposite element plays the role of subtraction:

\[\mathbf x \ominus \mathbf y = \mathbf x \oplus \ominus \mathbf y\]

Exercise: Write a function for a pertubutation of a composition. If you can’t remember how to define your own function cf.here

```
# Your code here
pert<-function(){
}
```

```
# Your code here
pert<-function(x,y){
res<-(x*y)/sum(x*y)
print(res)
}
x<-c(1,2,1)
x<-x/sum(x) #closure of x
y<-c(1,3,1)
y<-y/sum(y)
pert(x,y)
```

With the `compositions`

package, one can perturb a pair of
compositions in two ways: by using the command
`perturbe(x,y)`

or by adding or subtracting two vectors of
class `acomp`

.

```
library(compositions)
x = acomp(c(1,2,1))
y = acomp(c(1,3,1))
# perturbation
x+y
```

```
## [1] "0.125" "0.750" "0.125"
## attr(,"class")
## [1] "acomp"
```

`perturbe(x,y)`

```
## [1] "0.125" "0.750" "0.125"
## attr(,"class")
## [1] "acomp"
```

```
# inverse perturbation
x-y
```

```
## [1] "0.375" "0.250" "0.375"
## attr(,"class")
## [1] "acomp"
```

`perturbe(x,-y)`

```
## [1] "0.375" "0.250" "0.375"
## attr(,"class")
## [1] "acomp"
```

A very neat use case of perturbation in compositional data analysis is related to the change of units of a composition. Let us consider once again the data set of the geochemistry of a sediment core recovered from the eastern Juyanze palaeolake in north-western China (Hartmann and Wünnemann, 2009). We load the data set and prepare if for the analysis.

```
# load data set
g36 <- read.csv("http://userpages.fu-berlin.de/soga/data/raw-data/G36chemical.txt",
sep='\t',
row.names = 'Sample')
# exclude columns of no interest
g36 <- g36[, -c(1:5, ncol(g36)-1, ncol(g36))]
# rename columns for better readability
colnames(g36) <- gsub(".mg.g", "", colnames(g36))
str(g36)
```

```
## 'data.frame': 177 obs. of 11 variables:
## $ Ba : num 0.178 0.172 0.204 0.167 0.122 0.181 0.22 0.101 0.255 0.114 ...
## $ Ca : num 61.3 55.2 96.4 53 36 ...
## $ Fe : num 15.2 15.2 14 16.1 10.9 ...
## $ K : num 3.59 2.79 3.9 4.23 2.46 ...
## $ Mg : num 77.3 31.5 59 55.9 97.5 ...
## $ Mn : num 0.326 0.324 0.317 0.356 0.218 0.45 0.575 0.221 0.501 0.222 ...
## $ Na : num 18.78 9.84 18.33 14.19 16.34 ...
## $ PO4: num 0.865 0.785 0.976 1.002 0.667 ...
## $ S : num 24.6 15.4 18.2 14 22.7 ...
## $ Sr : num 0.485 0.446 0.74 0.352 0.324 0.496 0.507 0.081 0.782 0.068 ...
## $ Cl : num 27.6 14.9 27.4 24.7 26.1 ...
```

The data set consists of 177 samples (rows) and 11 variables (columns). The variables are the elements Ca, Mg, Sr, Fe, Mn, K, Na, S, Ba, PO4, and Cl, which are given as mass units (mg/g).

We now want to recast our data set from a composition of mass
proportions to a composition of molar proportions. This can be easily
achieved by (inverse) perturbation. All we have to do is to divide each
component by its molar weight. In order find the molar
weight for each particular element in our data set we may look it up
in a table or make use of R. For the R software there exist several
packages which ship with a function to return the molar weight of any
particular element. Here, for this example we install the
[`marelac`

package](https://mran.microsoft.com/snapshot/2023-02-17/web/packages/marelac/index.html.
This package provides the `molweight()`

function, which
returns the molecular weight in g/mol for each particular element given
to the function.

```
library(marelac)
molweight(colnames(g36)) # molecular weight in g/mol
```

```
## Ba Ca Fe K Mg Mn Na PO4
## 137.32700 40.07800 55.84500 39.09830 24.30500 54.93805 22.98977 94.97136
## S Sr Cl
## 32.06500 87.62000 35.45300
```

```
g36.comp.mas <- acomp(g36) # composition of mass proportions
mw <- acomp(molweight(colnames(g36))) # molar weights
g36.comp.mol <- g36.comp.mas - mw # inverse perturbation
g36.comp.mol[1:5,] # first 5 rows of composition of molar proportions
```

```
## Ba Ca Fe K Mg
## G36-1 "0.0001738203" "0.2051454" "0.03654827" "0.012326978" "0.4263909"
## G36-10 "0.0002868280" "0.3153576" "0.06216757" "0.016359200" "0.2969885"
## G36-103 "0.0002021487" "0.3274182" "0.03401714" "0.013573861" "0.3303898"
## G36-108 "0.0002099890" "0.2282233" "0.04984445" "0.018681793" "0.3972189"
## G36-11 "0.0001210366" "0.1225157" "0.02651906" "0.008575641" "0.5467080"
## Mn Na PO4 S Sr
## G36-1 "0.0007957577" "0.10954622" "0.0012214055" "0.10267310" "0.0007422922"
## G36-10 "0.0013505815" "0.09801882" "0.0018928927" "0.10984343" "0.0011656834"
## G36-103 "0.0007852044" "0.10849870" "0.0013984721" "0.07728147" "0.0011492781"
## G36-108 "0.0011189556" "0.10658199" "0.0018218437" "0.07544717" "0.0006937060"
## G36-11 "0.0005406250" "0.09683454" "0.0009568547" "0.09657863" "0.0005037959"
## Cl
## G36-1 "0.10443591"
## G36-10 "0.09656888"
## G36-103 "0.10528572"
## G36-108 "0.12015787"
## G36-11 "0.10014615"
## attr(,"class")
## [1] "acomp"
```

Now, `g36.comp.mol`

contains the composition of elements
in terms of molar proportions. Most changes of compositional units may
be expressed as a perturbation (molar proportions, mass percentages,
volumetric proportions, molality, ppm, etc.). Perturbation is considered
as the fundamental operation in compositional data analysis, equally
important as the vector addition is in the usual real vector space \(\mathbb R^D\) (van
den Boogaart and Tolosana-Delgado 2013).

**Powering** or **power transformation**
replaces the product of a *vector by a scalar* (geometrically,
this is equivalent to scaling) and is defined as the closed powering of
the components by a given scalar:

\[\mathbf z = \lambda \odot \mathbf x = \mathscr C[x_1^\lambda, .., x_D^\lambda]\]

Exercise: Write a function for powering of a composition. If you can’t remember how to define your own function cf.here

```
# Your code here
pow<-function(){
}
```

```
# Your code here
pow<-function(x,lambda=1){
res<-(x^lambda)/sum(x^lambda)
print(res)
}
x<-c(1,2,1)
x<-x/sum(x) #closure
pow(x,2)
```

The `compositions`

package offers two ways to power a
composition by a scalar: with the generic function
`power.acomp()`

or by multiplying a scalar by a vector of
class `acomp`

.

```
library(compositions)
x = acomp(c(1,2,1))
x
```

```
## [1] "0.25" "0.50" "0.25"
## attr(,"class")
## [1] "acomp"
```

```
s <- 2 # scalar
power.acomp(x,s)
```

```
## [1] "0.1666667" "0.6666667" "0.1666667"
## attr(,"class")
## [1] "acomp"
```

`x*s`

```
## [1] "0.1666667" "0.6666667" "0.1666667"
## attr(,"class")
## [1] "acomp"
```

The **Aitchison scalar product** for compositions replaces
the conventional scalar product:

\[\langle \mathbf x, \mathbf y \rangle_A=\frac{1}{D}\sum_{i<j}^D \ln\frac{x_i}{x_j} \ln \frac{y_i}{y_j}\]

The generic function `scalar(x,y)`

of the
`compositions`

packages computes the scalar product.

```
library(compositions)
x = acomp(c(1,2,1))
y = acomp(c(1,3,1))
scalar(x,y)
```

`## [1] 0.5076667`

The ** norm** of a vector, its length, is

\[\Vert \mathbf x \Vert_A = \sqrt{\langle \mathbf x, \mathbf x \rangle_A}\text{.}\]

The norm of a vector can be obtained by the `norm()`

function. If `x`

is a data matrix of class
`acomp`

, then the result of `norm(x)`

will be a
vector giving the norm of each row.

```
u <- c(1,2,3)
v <- c(2,2,2)
w <- c(3,1,2)
uvw <- acomp(rbind(u,v,w))
norm(uvw)
```

`## [1] 0.785664 0.000000 0.785664`

The ** normalized vector** of \(x\) (or its direction, or a unit-norm
vector) is

\[\mathbf v = \Vert \mathbf x\Vert^{-1}_A \odot \mathbf x\]

The generic function `normalize()`

provides this
normalization, either for a single composition or for the rows of a data
matrix.

`normalize(uvw)`

```
## [,1] [,2] [,3]
## u "0.1339635" "0.3236980" "0.5423385"
## v " MAR" " MAR" " MAR"
## w "0.5423385" "0.1339635" "0.3236980"
## attr(,"class")
## [1] "acomp"
```

The ** angle** between two compositions allows to
assess if two compositions are orthogonal (scalar product is \(0\), or their angle is \(90^\circ\)).

\[ \alpha (\mathbf x, \mathbf y) = \cos^{-1}\frac{\langle \mathbf x, \mathbf y\rangle_A}{\Vert\mathbf x \Vert_A \cdot \Vert \mathbf y \Vert_A}=\cos^{-1} \langle\Vert \mathbf x \Vert_A^{-1} \odot \mathbf x, \Vert \mathbf y \Vert_A^{-1}\odot \mathbf y \rangle_A\]

`acos(scalar(normalize(u),normalize(v)))`

`## [1] 0.3875967`

The ** Aitchison distance** is the distance between
two compositions:

\[ d_A(\mathbf x, \mathbf y) = \Vert \mathbf x \ominus \mathbf x, \mathbf y\Vert_A = \sqrt{\frac{1}{D}\sum_{i=1}^D\sum_{j>i}^D \left( \ln\frac{x_i}{x_j}-\ln\frac{y_i}{y_j}\right)^2}\]

It can be computed by `norm(x-y)`

, or by the generic
`dist()`

function of the `compositions`

package.
The `dist()`

function computes the Aitchison distances
between all rows of an `acomp`

data matrix.

`norm(acomp(u)-acomp(v))`

`## [1] 0.785664`

`dist(uvw)`

```
## u v
## v 0.785664
## w 1.360810 0.785664
```

Note that other compositional distances, such as the *Manhattan distance*, are also available through
the extra argument `method`

:

`dist(uvw, method = 'manhattan')`

```
## u v
## v 1.194506
## w 2.197225 1.194506
```

**Citation**

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: *Hartmann,
K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis
using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.*