Let the matrix \(\mathbf X\) contain compositional data with \(i=1,2,\ldots,m\) rows, representing observations, and \(j=1,2,\ldots,n\) columns, representing measured variables. The variables are nonnegative, and each row of the data matrix sums to a constant \(c\), usually \(1\) (proportions), \(100\) (percent), or \(10^6\) (parts per million, ppm):

\[\sum_{j=1}^n x_{ij} = c \qquad x_{ij} \ge 0\]
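This closure operation can be sketched in R; the raw counts and the closure constant below are made-up illustration values:

```r
# Close a raw data matrix so that every row sums to the constant c.
X_raw <- matrix(c(2, 3, 5,
                  1, 1, 2), nrow = 2, byrow = TRUE)  # made-up raw measurements
c_const <- 100                                       # closure constant (percent)
X <- c_const * sweep(X_raw, 1, rowSums(X_raw), "/")  # divide each row by its sum
rowSums(X)                                           # every row now sums to 100
```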

In EMMA, similar to PCA, the data matrix \(\mathbf X\) is decomposed into linear combinations of eigenvectors, denoted as principal components, and synthetic variables, denoted as scores (Dietze et al. 2012). This model, the so-called mixing model, is determined by the matrix of factor loadings, \(\mathbf V\) (principal components), and the matrix of factor scores, \(\mathbf M\).

\[\mathbf X_{m \times n} = \mathbf M_{m \times n} \mathbf V_{n \times n}^T\]
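As a minimal sketch (not the EMMA implementation itself), this decomposition can be reproduced in base R with `prcomp()`. The small data matrix is made up, and centring is switched off so that the identity \(\mathbf X = \mathbf M \mathbf V^T\) holds exactly:

```r
# Sketch of X = M V^T with base R; X holds made-up values.
set.seed(1)
X <- matrix(runif(15), nrow = 5, ncol = 3)   # m = 5 observations, n = 3 variables
pca <- prcomp(X, center = FALSE)             # PCA via singular value decomposition
V <- pca$rotation                            # n x n eigenvector (loadings) matrix
M <- pca$x                                   # m x n score matrix, M = X V
max(abs(X - M %*% t(V)))                     # numerically zero: X is recovered exactly
```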

One of the benefits of PCA is that in many cases just a few eigenvectors \(q\) (with \(q < n\)) explain most of the variability in the data set. Projecting the original \(n\)-dimensional feature space onto a lower \(q\)-dimensional feature space, e.g. by applying the Guttman rule (or Kaiser criterion) and considering confidence bounds of the eigenvalues (cf. Larsen & Warne, 2010), may separate the signal portion from the noise portion. Thus, the equation above becomes

\[\mathbf X^*_{m \times n} = \mathbf M^*_{m \times q} \mathbf V_{n \times q}^{*T}\text{.}\]

Hereby, some portion of the explained variance is lost, and \(\mathbf X\) is no longer reproduced exactly by the mixing model. Therefore, the mixing model is split into a mixture matrix \(\mathbf X^*\) and an absolute error matrix \(\mathbf E\), which accounts for the noise portion, i.e. non-systematic contributions (Weltje 1997).

\[\mathbf X = \mathbf X^*+\mathbf E = \mathbf M^* \mathbf V^{*T}+\mathbf E\]
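The truncation and the resulting error matrix can be sketched in R. The data matrix is made up; here \(q\) is selected with the plain Guttman-Kaiser rule (eigenvalues of the correlation matrix greater than their mean of \(1\)), without the eigenvalue confidence bounds discussed above:

```r
# Sketch: rank-q mixing model X* = M* V*^T and error matrix E = X - X*.
set.seed(7)
X <- matrix(runif(100), nrow = 20, ncol = 5)  # made-up data, m = 20, n = 5
X <- X / rowSums(X)                           # close the rows to c = 1

ev <- eigen(cor(X))$values                 # eigenvalues of the correlation matrix
q  <- sum(ev > 1)                          # Guttman-Kaiser rule: eigenvalues > 1

pca <- prcomp(X, center = FALSE)           # centring off, so X = M V^T exactly
Vq  <- pca$rotation[, 1:q, drop = FALSE]   # n x q truncated loadings V*
Mq  <- pca$x[, 1:q, drop = FALSE]          # m x q truncated scores M*

X_star <- Mq %*% t(Vq)                     # rank-q part of the mixing model
E      <- X - X_star                       # absolute error (noise) matrix
sum(E^2) / sum(X^2)                        # share of sum of squares lost by truncation
```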

In EMMA, in addition to the dimensionality reduction through PCA, factor analysis is applied to simplify the structure of the eigenvector matrix, \(\mathbf V\), by an orthogonal rotation of the axes in the feature space. This rotation removes the ordering of the eigenvectors and redistributes the loadings more evenly, a condition often used to decipher natural processes (Dietze et al. 2012).
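A common choice for such an orthogonal rotation is varimax, available in base R as `stats::varimax()`; whether a given EMMA implementation uses exactly this rotation is an assumption here, and the data below are made up:

```r
# Sketch: orthogonal varimax rotation of the truncated loadings V*.
# varimax() is one common orthogonal rotation; q = 2 is chosen for illustration.
set.seed(3)
X <- matrix(runif(100), nrow = 25, ncol = 4)
pca <- prcomp(X, center = FALSE)
q   <- 2
Vq  <- pca$rotation[, 1:q]

rot   <- varimax(Vq)              # orthogonal rotation (stats::varimax)
V_rot <- unclass(rot$loadings)    # rotated, more evenly distributed loadings
R     <- rot$rotmat               # the orthogonal rotation matrix

max(abs(V_rot - Vq %*% R))        # rotated loadings equal V* times R
max(abs(crossprod(R) - diag(q)))  # R is orthogonal (R^T R = I)
```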


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.