The particular properties and scale of compositional data imply a relative geometry, the so-called Aitchison geometry, in which only the ratios between components matter. With this geometry, the sample space of compositions, the simplex \((\mathbb S)\), has a vector space structure that is isometrically equivalent to \(\mathbb R^{D-1}\). Owing to this equivalence, we can convert any statistical problem involving compositions of \(D\) parts into a classical multivariate problem involving real vectors of \(D-1\) coordinates (van den Boogaart and Tolosana-Delgado 2013). The transformation from the simplex to the usual Euclidean space is carried out with the family of log-ratio transformations. This technique, referred to as the principle of working in coordinates, means that we apply classical multivariate methods to log-ratio-transformed compositional data and afterwards back-transform the results into compositions for interpretation. These transformations, such as the additive, centered, and isometric log-ratio transformations, remove the problem of a constrained sample space and open up all available standard multivariate techniques (Filzmoser et al. 2010).


Centered log-ratio transformation clr

The centered log-ratio transformation (clr) divides each compositional part by the geometric mean of all parts. It is written as

\[\text{clr}(\mathbf x)= \left(\log \frac{x_i}{g(\mathbf x)} \right)_{i=1,\ldots,D} \qquad \text{with} \quad g(\mathbf x) = \left(\prod_{i=1}^D x_i\right)^{1/D} = \exp\left(\frac{1}{D}\sum_{i=1}^D \log x_i\right)\text{,} \] or in compact form

\[ \text{clr}(\mathbf x) = \log\frac{\mathbf x}{ g(\mathbf x)}\text{.}\]
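To make the formula concrete, the following base R lines are a minimal sketch of the clr for a single composition. The helper name clr_manual and the example vector are assumptions made for illustration only; they are not part of the compositions package.

```r
# Hypothetical helper (not part of the compositions package):
# clr of one composition given as a strictly positive numeric vector
clr_manual <- function(x) {
  stopifnot(is.numeric(x), all(x > 0))
  log_x <- log(x)
  log_x - mean(log_x)   # log(x_i / g(x)), because log g(x) = mean(log x)
}

x <- c(0.2, 0.3, 0.5)           # illustrative 3-part composition
g <- prod(x)^(1 / length(x))    # geometric mean, product form of g(x)
z <- clr_manual(x)

all.equal(z, log(x / g))        # both ways of writing the clr agree
sum(z)                          # the clr coefficients sum to zero (up to rounding)
```

The helper uses the exponential form of the geometric mean, which avoids very small products when a composition has many parts.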

The clr-transformation provides a one-to-one link between the original and the transformed parts, which helps in interpreting the results.
For this reason the transformation is frequently used in practice. However, it has two severe drawbacks:  
(a) the resulting parts are still linearly dependent, because the clr coefficients of each composition sum to zero, and
(b) the transformation is sub-compositionally incoherent.

The first property yields singular covariance matrices, i.e. matrices with a determinant of zero, which becomes problematic if the statistical method applied needs to invert the covariance matrix (e.g. LDA or the Mahalanobis distance).
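The singularity can be checked numerically. The sketch below uses base R only and simulated data; the seed, the sample size, and the variable names are arbitrary assumptions for illustration.

```r
set.seed(1)                                    # arbitrary seed, for repeatability only
raw  <- matrix(rexp(60), nrow = 20, ncol = 3)  # illustrative positive data
comp <- raw / rowSums(raw)                     # close each row to sum 1

# clr by hand: subtract the row mean of the logs from each row
clr_mat <- log(comp) - rowMeans(log(comp))

range(rowSums(clr_mat))   # every row sums to zero (up to rounding)
S <- cov(clr_mat)
det(S)                    # numerically zero
qr(S)$rank                # rank D - 1 instead of D
```

Because every clr-transformed row sums to zero, one column is always a linear combination of the others, and a call such as solve(S) is expected to fail or to be numerically unstable.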

The second property means that the clr-transformed values of a part change when a different subset of variables (parts) is considered. This has serious consequences for the data analysis, because any chosen bivariate subset of interest would not reflect the original data. However, this problem may be overcome by right-multiplying the clr-transformed data by a \(D\times(D-1)\) matrix whose columns are orthonormal (cf. the ilr-transformation; Egozcue, 2003).
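A small numerical illustration of sub-compositional incoherence, again in base R. The helper clr_manual and the example values are assumptions for this sketch.

```r
clr_manual <- function(x) log(x) - mean(log(x))   # hypothetical helper

x   <- c(0.1, 0.2, 0.3, 0.4)      # illustrative 4-part composition
sub <- x[1:2] / sum(x[1:2])       # subcomposition of parts 1 and 2, re-closed

clr_full <- clr_manual(x)[1:2]    # parts 1 and 2 within the full composition
clr_sub  <- clr_manual(sub)       # the same parts within the subcomposition

rbind(clr_full, clr_sub)          # the clr coefficients differ

# the pairwise log-ratio, in contrast, does not depend on the chosen subset
c(full = log(x[1] / x[2]), sub = log(sub[1] / sub[2]))
```

The clr coefficients change because the geometric mean in the denominator is computed from a different set of parts, whereas the ratio of two parts is untouched by the closure.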

In the R package compositions, the function clr() computes the centered log-ratio transformation and the function clrInv() computes the back-transformation. Both accept either a vector (considered as a composition), or a matrix or data frame (where each row is taken as a composition).

library(compositions)
# create a 6x3 matrix of random draws from an exponential distribution
# (18 values fill the 6 x 3 matrix exactly) and declare it as compositional data
X <- acomp(matrix(rexp(18), nrow = 6, ncol = 3))
X
##      [,1]        [,2]         [,3]        
## [1,] "0.7347311" "0.09482023" "0.17044870"
## [2,] "0.1203206" "0.29272074" "0.58695870"
## [3,] "0.5904114" "0.17726618" "0.23232245"
## [4,] "0.3232672" "0.40421835" "0.27251443"
## [5,] "0.0923048" "0.89121717" "0.01647803"
## [6,] "0.7630236" "0.12583647" "0.11113994"
## attr(,"class")
## [1] "acomp"
# centered log-ratio transformation
clr(X)
##             [,1]       [,2]       [,3]
## [1,]  1.16953063 -0.8779911 -0.2915395
## [2,] -0.82461820  0.0644414  0.7601768
## [3,]  0.71195347 -0.4912137 -0.2207398
## [4,] -0.01756271  0.2059132 -0.1883505
## [5,] -0.18147455  2.0860174 -1.9045429
## [6,]  1.24293486 -0.5593709 -0.6835640
## attr(,"class")
## [1] "rmult"
# back-transformation
clrInv(clr(X))
##      [,1]        [,2]         [,3]        
## [1,] "0.7347311" "0.09482023" "0.17044870"
## [2,] "0.1203206" "0.29272074" "0.58695870"
## [3,] "0.5904114" "0.17726618" "0.23232245"
## [4,] "0.3232672" "0.40421835" "0.27251443"
## [5,] "0.0923048" "0.89121717" "0.01647803"
## [6,] "0.7630236" "0.12583647" "0.11113994"
## attr(,"class")
## [1] "acomp"

Note that the compositions package is aware of the change of the sample space. The original and the back-transformed compositions are of class acomp, corresponding to the Aitchison compositional scale, whereas the clr-transformed data matrix is of class rmult, corresponding to the real multivariate scale of the Euclidean sample space.


Isometric log-ratio transformation ilr

The isometric log-ratio transformation (ilr) maps a composition \(\mathbf x\) of \(D\) parts to a real vector \(\mathbf z=(z_1,\ldots,z_{D-1})\) in a \((D-1)\)-dimensional Euclidean space. One common choice of coordinates is

\[z_i=\sqrt{\frac{i}{i+1}} \log \frac{\sqrt[i]{\prod_{j=1}^{i}x_j}}{x_{i+1}}\text{,} \quad i= 1,\ldots,D-1\text{.}\]

It is appropriate whenever distances between observations are of importance (Filzmoser et al. 2010).
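The following base R sketch illustrates the point about distances. It assumes that the \(i\)-th ilr coordinate is \(\sqrt{i/(i+1)}\) times the log of the geometric mean of the first \(i\) parts divided by part \(i+1\). The helper names are hypothetical, and the two compositions are the first two rows of the matrix X printed above.

```r
# Hypothetical helpers (not part of the compositions package)
clr_manual <- function(x) log(x) - mean(log(x))
ilr_manual <- function(x) {
  stopifnot(is.numeric(x), all(x > 0), length(x) >= 2)
  D <- length(x)
  sapply(seq_len(D - 1), function(i) {
    gm_i <- exp(mean(log(x[1:i])))             # geometric mean of x_1, ..., x_i
    sqrt(i / (i + 1)) * log(gm_i / x[i + 1])
  })
}
euclid <- function(a, b) sqrt(sum((a - b)^2))

x <- c(0.7347311, 0.09482023, 0.17044870)      # row 1 of X
y <- c(0.1203206, 0.29272074, 0.58695870)      # row 2 of X

z <- ilr_manual(x)
z

# isometry: distances in ilr coordinates equal distances in clr coordinates
d_ilr <- euclid(ilr_manual(x), ilr_manual(y))
d_clr <- euclid(clr_manual(x), clr_manual(y))
all.equal(d_ilr, d_clr)
```

Compared with the first row of the ilr(X) output shown further below, the values of z agree in magnitude but have the opposite sign. This suggests that ilr() works with a basis of reversed orientation, which leaves distances unchanged.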

If only two parts \((x_1, x_2)\) are considered the equation simplifies to

\[z = \frac{1}{\sqrt{2}}\log\frac{x_1}{x_2}\]

In that particular case the ilr variable \(z\) is univariate, but it contains all the relevant information about \(x_1\) and \(x_2\), which is in fact carried by their (log-)ratio. In the general case, however, each coordinate may involve many parts (potentially all), which makes the ilr-transformed values very difficult to interpret.

However, the ilr is an isometric transformation and its transformed values yield full-rank covariance matrices. The generic ilr transformation is thus a perfect black box: compute ilr coordinates, apply your method to the coordinates, and recast the results to compositions with the inverse ilr (Filzmoser et al. 2010).
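The three steps of this black box can be sketched in base R as follows. The basis built from normalised Helmert contrasts, the simulated data, and all variable names are assumptions for this illustration; any other orthonormal basis would serve equally well.

```r
set.seed(42)                                   # arbitrary seed
raw <- matrix(rexp(18), nrow = 6, ncol = 3)    # illustrative positive data
X   <- raw / rowSums(raw)                      # closed compositions, one per row
D   <- ncol(X)

# orthonormal basis as a D x (D-1) matrix (assumption: normalised Helmert contrasts)
V <- unname(contr.helmert(D))
V <- sweep(V, 2, sqrt(colSums(V^2)), "/")

clr_X <- log(X) - rowMeans(log(X))
Z     <- clr_X %*% V                 # step 1: ilr coordinates

z_bar <- colMeans(Z)                 # step 2: classical method (here: the mean)

y      <- exp(drop(V %*% z_bar))     # step 3: inverse ilr ...
center <- y / sum(y)                 # ... followed by closure

# the result equals the closed vector of column-wise geometric means
gm <- exp(colMeans(log(X)))
all.equal(center, gm / sum(gm))
```

The arithmetic mean was chosen as the classical method because its back-transformed result has a known closed form, which makes the round trip easy to verify.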

In the R package compositions, the isometric log-ratio transformation is available through the ilr() function, and the ilrInv() function provides the inverse transformation. To pass from ilr to clr coordinates or vice versa, we can use the functions ilr2clr() and clr2ilr().

# original composition
X
##      [,1]        [,2]         [,3]        
## [1,] "0.7347311" "0.09482023" "0.17044870"
## [2,] "0.1203206" "0.29272074" "0.58695870"
## [3,] "0.5904114" "0.17726618" "0.23232245"
## [4,] "0.3232672" "0.40421835" "0.27251443"
## [5,] "0.0923048" "0.89121717" "0.01647803"
## [6,] "0.7630236" "0.12583647" "0.11113994"
## attr(,"class")
## [1] "acomp"
# isometric log-ratio transformation (D-1)
ilr(X)
##            [,1]       [,2]
## [1,] -1.4478165 -0.3570615
## [2,]  0.6286601  0.9310226
## [3,] -0.8507676 -0.2703500
## [4,]  0.1580213 -0.2306813
## [5,]  1.6033589 -2.3325791
## [6,] -1.2744226 -0.8371915
## attr(,"class")
## [1] "rmult"
# back-transformation
ilrInv(ilr(X))
##      [,1]        [,2]         [,3]        
## [1,] "0.7347311" "0.09482023" "0.17044870"
## [2,] "0.1203206" "0.29272074" "0.58695870"
## [3,] "0.5904114" "0.17726618" "0.23232245"
## [4,] "0.3232672" "0.40421835" "0.27251443"
## [5,] "0.0923048" "0.89121717" "0.01647803"
## [6,] "0.7630236" "0.12583647" "0.11113994"
## attr(,"class")
## [1] "acomp"

Transformations between clr and ilr

Since the ilr-transformation is a projection of the clr-transformed coordinates onto an orthonormal basis, we can write \[ \text{ilr}(\mathbf x)=\text{clr}(\mathbf x)\cdot V \]

where \(V\) is a \(D\times(D-1)\) matrix with \(V^T\cdot V=I_{(D-1)\times(D-1)}\), the identity matrix.
The column vectors of \(V\) are the clr-transformed elements of an orthonormal basis of \(\mathbb S^D\); any such basis can be used (Egozcue, 2003), e.g. one derived by singular value decomposition (SVD).
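As a hedged illustration, the base R lines below derive one such matrix by SVD and check the stated properties. The sketch assumes that \(V\) is stored with \(D\) rows and \(D-1\) orthonormal columns, so that the product reads clr_x %*% V in R. Taking the SVD of the centring matrix is one possible construction, not necessarily the one used by the compositions package.

```r
D <- 3
H <- diag(D) - matrix(1 / D, D, D)      # centring matrix, rank D - 1
V <- svd(H)$u[, seq_len(D - 1)]         # D x (D-1) matrix with orthonormal columns

all.equal(crossprod(V), diag(D - 1))    # t(V) %*% V is the identity
all.equal(V %*% t(V), H)                # V %*% t(V) projects onto the clr plane

x     <- c(0.2, 0.3, 0.5)               # illustrative composition
clr_x <- log(x) - mean(log(x))
ilr_x <- drop(clr_x %*% V)              # from clr to ilr coordinates
all.equal(drop(V %*% ilr_x), clr_x)     # and back from ilr to clr coordinates
```

The basis obtained this way is only one of infinitely many; a different orthonormal basis gives different ilr coordinates but the same distances between compositions.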


Additive log-ratio transformation alr

The additive log-ratio transformation (alr) takes the logarithm of the ratio of each of the first \(D-1\) parts to the last part \(x_D\), which serves as the common denominator.

\[\text{alr}(\mathbf x) = \left(\log \frac{x_i}{x_D} \right)_{i=1,\ldots,D-1} \]

The alr transformation is a non-isometric transformation and thus should never be used for the computation of distances, angles, and shapes.
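The lack of isometry can be seen in a short base R comparison. The helper names, the argument d for the denominator part, and the two example compositions are assumptions for this sketch.

```r
# Hypothetical helpers (not part of the compositions package)
clr_manual <- function(x) log(x) - mean(log(x))
alr_manual <- function(x, d = length(x)) log(x[-d] / x[d])   # d: index of the denominator part
euclid     <- function(a, b) sqrt(sum((a - b)^2))

x <- c(0.2, 0.3, 0.5)    # illustrative compositions
y <- c(0.6, 0.3, 0.1)

d_clr  <- euclid(clr_manual(x), clr_manual(y))         # Aitchison distance
d_alr3 <- euclid(alr_manual(x), alr_manual(y))         # alr with part 3 as denominator
d_alr1 <- euclid(alr_manual(x, 1), alr_manual(y, 1))   # alr with part 1 as denominator

c(clr = d_clr, alr_den3 = d_alr3, alr_den1 = d_alr1)   # three different values
```

The Euclidean distance in alr coordinates depends on which part is chosen as the denominator, and neither choice reproduces the distance obtained from the clr coordinates.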

In the R package compositions, the additive log-ratio transformation is available through the alr() function, and its inverse through the alrInv() function.

# original composition
X
##      [,1]        [,2]         [,3]        
## [1,] "0.7347311" "0.09482023" "0.17044870"
## [2,] "0.1203206" "0.29272074" "0.58695870"
## [3,] "0.5904114" "0.17726618" "0.23232245"
## [4,] "0.3232672" "0.40421835" "0.27251443"
## [5,] "0.0923048" "0.89121717" "0.01647803"
## [6,] "0.7630236" "0.12583647" "0.11113994"
## attr(,"class")
## [1] "acomp"
# additive log-ratio transformation (D-1)
alr(X)
##              v1         v2
## [1,]  1.4610701 -0.5864516
## [2,] -1.5847950 -0.6957354
## [3,]  0.9326933 -0.2704738
## [4,]  0.1707877  0.3942636
## [5,]  1.7230683  3.9905603
## [6,]  1.9264988  0.1241931
## attr(,"class")
## [1] "rmult"
# back-transformation
alrInv(alr(X))
##      [,1]        [,2]         [,3]        
## [1,] "0.7347311" "0.09482023" "0.17044870"
## [2,] "0.1203206" "0.29272074" "0.58695870"
## [3,] "0.5904114" "0.17726618" "0.23232245"
## [4,] "0.3232672" "0.40421835" "0.27251443"
## [5,] "0.0923048" "0.89121717" "0.01647803"
## [6,] "0.7630236" "0.12583647" "0.11113994"
## attr(,"class")
## [1] "acomp"



Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.