The concept of end-members is closely related to the concept of compositional data. Compositional data are a special type of nonnegative data that carry only relative information: the parts sum to a constant, and the relevant information is contained only in the ratios between the parts (van den Boogaart and Tolosana-Delgado 2013). For example, if soil samples are analyzed with respect to soil texture, the individual grain-size fractions per sample, such as clay, silt, sand and gravel, sum up to 1 or 100%.
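The closure property described above can be illustrated with a minimal sketch. The function name `closure` and the masses below are hypothetical and chosen only for illustration; the point is that rescaling the parts to a constant sum leaves the ratios between the parts unchanged.

```python
def closure(parts, total=1.0):
    """Rescale nonnegative parts so that they sum to `total`."""
    if any(p < 0 for p in parts):
        raise ValueError("compositional parts must be nonnegative")
    s = sum(parts)
    if s <= 0:
        raise ValueError("at least one part must be positive")
    return [total * p / s for p in parts]


# Hypothetical masses (in grams) of four grain-size fractions in one sample
masses = {"clay": 12.0, "silt": 30.0, "sand": 48.0, "gravel": 30.0}

# Close the raw masses to percentages (constant sum of 100)
composition = dict(zip(masses, closure(list(masses.values()), total=100.0)))

# The ratio between two parts is the same before and after closure
ratio_raw = masses["sand"] / masses["clay"]
ratio_closed = composition["sand"] / composition["clay"]

print(composition)
print(ratio_raw, ratio_closed)
```

Because only the ratios carry information, the absolute sample mass (here 120 g) does not matter once the data are closed.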
Compositional variation of a particular sample may be attributed to a mixture of a fixed number of compositions, denoted as end-members (Weltje 1997). Linear mixing models, such as end-member modeling analysis (EMMA), are applied to analyze the proportional contributions of (theoretical) end-members to compositional data (Weltje 1997). EMMA is a statistical approach that decomposes mixed constituents with respect to the contributions of their end-members (Dietze et al. 2012). EMMA is based on principal component analysis (PCA) and factor analysis (FA). Recall that in PCA the principal components (PCs) are given by the eigenvectors of the correlation matrix of the observed data, and the variances are derived from the associated eigenvalues. The first PC accounts for the highest amount of variance in a data set. All succeeding PCs capture the remaining variance in hierarchical order. The PCs are orthogonal to one another, and thus by definition uncorrelated. By contrast, factor analysis specifies a model relating the observed variables to a smaller set of underlying latent variables, denoted as factors. Rotating the factor subspace may enhance the interpretability of the factors.
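The PCA recap above can be sketched in a few lines. This is only an illustration under stated assumptions, not the EMMA algorithm: the data set `observations` is hypothetical, the helper names are invented for this example, and the eigenvectors are obtained by power iteration with deflation, a simple technique that is adequate for a small, well-conditioned correlation matrix.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (samples x variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[i] * zr[j] for zr in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(a, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(a)
    v = [float(j + 1) for j in range(p)]  # arbitrary asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v' A v gives the eigenvalue for a unit vector v
    eigenvalue = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v


def pca(rows):
    """Eigenvalues and eigenvectors of the correlation matrix, largest first."""
    a = correlation_matrix(rows)
    p = len(a)
    pairs = []
    for _ in range(p):
        lam, v = leading_eigenpair(a)
        pairs.append((lam, v))
        # Deflation: remove the component that has just been extracted
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs


# Hypothetical measurements: six samples, three observed variables
observations = [
    [2.0, 1.0, 9.0],
    [4.0, 3.0, 7.5],
    [6.0, 4.0, 8.0],
    [8.0, 7.0, 4.0],
    [10.0, 8.0, 5.0],
    [12.0, 12.0, 1.0],
]

components = pca(observations)
eigenvalues = [lam for lam, _ in components]
explained = [lam / sum(eigenvalues) for lam in eigenvalues]
print(eigenvalues)
print(explained)
```

The eigenvalues sum to the number of variables because each standardized variable contributes unit variance, and `explained` gives the share of total variance captured by each PC in decreasing order.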
Sediment deposits are important geoarchives because they contain valuable information about the sediment’s source area, transport processes and transport pathways. The properties of such an archive are the result of interrelated physical and biological processes. By measuring physical (e.g. grain-size distribution), geochemical (e.g. mineral composition) and biological (e.g. pollen content) properties, we obtain a mixed signal that integrates different sources, transport processes and transport pathways. The application of EMMA is a viable strategy for unmixing sedimentary bulk data because it decomposes the mixed constituents with respect to the contributions of their end-members and thus provides a process-based explanation of the data structure (Dietze et al. 2012).
The figure below illustrates the principal setting of EMMA for unmixing grain-size data. Site-specific process regimes (fluvial, aeolian, glacial, etc.) tend to sort sediments and create characteristic grain-size distributions. Interrelated physical and biological processes during and after deposition obscure and mix the process signatures preserved in the sedimentary archive. EMMA may decompose such a mixture into a combination of end-member loadings and end-member scores. End-member loadings relate to a characteristic grain-size distribution and are thus a proxy for the associated process regime. Scores are the relative contributions of the loadings to a sample and thus relate to the predominance of a process during the formation of the sedimentary deposit. Note that an end-member does not necessarily correspond to an idealised unimodal grain-size distribution, but may itself be subject to mixing processes during sediment production and dispersal (see the concept of dynamic grain-size populations, Weltje and Prins 2003).
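The relationship between loadings, scores and the measured samples can be made concrete with a minimal sketch of the forward linear mixing model. All numbers and names below are hypothetical. EMMA addresses the inverse problem of estimating loadings and scores from the measured data; this example only shows how a given set of loadings and scores combines into mixed samples.

```python
def mix(scores, loadings):
    """Combine end-member loadings into samples, weighted by the scores.

    scores:   one row per sample, one column per end-member (rows sum to 1)
    loadings: one row per end-member, one column per grain-size class
    """
    n_classes = len(loadings[0])
    return [
        [sum(m * b[k] for m, b in zip(row, loadings)) for k in range(n_classes)]
        for row in scores
    ]


# Hypothetical loadings: two end-members over five grain-size classes,
# ordered from fine to coarse; each row sums to 1
loadings = [
    [0.50, 0.30, 0.15, 0.05, 0.00],  # end-member 1, fine-grained
    [0.00, 0.05, 0.15, 0.30, 0.50],  # end-member 2, coarse-grained
]

# Hypothetical scores: relative contribution of each end-member to three samples
scores = [
    [1.0, 0.0],  # pure end-member 1
    [0.5, 0.5],  # equal mixture
    [0.2, 0.8],  # dominated by end-member 2
]

samples = mix(scores, loadings)
for sample in samples:
    print([round(x, 3) for x in sample], round(sum(sample), 3))
```

Since both the loadings and the scores sum to 1 per row, every mixed sample is again a composition that sums to 1.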
Citation
The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.