Dietze el al. 2012 published an end-member modelling analysis (EMMA) algorithm for grain-size distribution to derive previously unknown, geoscientifically meaningful end-members. The algorithm is based on principal component analysis (PCA), factor rotation, variable data scaling and nonnegative least-square estimation. The proposed algorithm derives from an ensemble of different end-member models an optimal end-member model and the most robust signals within the grain-size distributions (Dietze el al. 2012).

Pseudocode of EMMA

Step 1 Transform raw grain-size distributions to constant sum, e.g. 1 or 100%.

Step 2 Minimize the effects of scale by applying a column-wise weight transformation. A weighted matrix \(\mathbf W\) is derived from the columns of the original matrix \(\mathbf X\) by scaling the columns based on percentiles, \(P\), with lower \((l)\) and upper \((100-l)\) boundaries as weights:

\[\mathbf W = \frac{\mathbf X - \mathbf h}{\mathbf g - \mathbf h}\text{,}\]

where vectors \(\mathbf h\) and \(\mathbf g\) are defined by \(\mathbf h_j=P_l(\mathbf x_j)\) and \(\mathbf g_j=P_{100-l}(\mathbf x_j)\) for columns \(j = 1,2,..., p\). A value of \(l=0\) reflects the minimum and maximum of each column, and for example a value of \(l=2.5\) gives percentiles between \(P_{2.5}/P_{97.5}\).

Step 3 Extract the eigenvector matrix \(\mathbf V\) and eigenvalue matrix \(\mathbf \Lambda\) from the minor product matrix \(\Gamma\) given by \(\Gamma=\mathbf W^T \mathbf W\).

Step 4 Apply factor rotation (e.g. VARIMAX) on the eigenspace of \(q\) end-members in order to simplify the structure of the end-members and thus, facilitate factor interpretation.

Step 5 Normalize the preliminary eigenvector loadings \((\mathbf V)\) to ensure the nonnegativity of the rotated eigenvectors and estimate the eigenvector scores \((\mathbf M)\) using linear nonnegative least squares as objective function.

Step 6 Reverse the initial weight transformation in order to rescale \(\mathbf V\) and \(\mathbf M\) to the original units of the initial data set. Normalize the rescaled matrices to fulfill the constant sum constraint. The rescaled and standardized matrices are denoted as end-member loadings \((\mathbf V^*)\) and end-member scores \((\mathbf M^*)\), respectively. Calculate the variance explained by each end-member as the proportion of total scores variance.

Step 7 Calculate the modeled data set \(\mathbf X^* = \mathbf M^*\mathbf V^{*T}\) and the respective error matrix \(\mathbf E = \mathbf X^* - \mathbf X\). Evaluate the goodness of fit calculating mean row- and column-wise linear coefficients of determinations \((r^2)\) between \(\mathbf X\) and \(\mathbf X^*\). The resulting matrix gives the explained proportion of variance of each sample and each variable, respectively (Dietze el al. 2012).

The algorithm developed by (Dietze el al. 2012) allows for testing the robustness of the end-members. End-members that appear most frequently in a series of model runs constitute the robust end-member model. It is important to realize that one single model configuration is just a scenario with a given likelihood. The likelihood of a model itself can only be reliably estimated by comparing large sets of other similarly likely model runs (Dietze el al. 2012). The true number of final end-members \(q\) in the system is in general unknown. However, the number of end-members affects the numerical stability of the EMMA algorithm. Therefore, the optimal values of \(q\) (number of end-members) and \(l\) (weight transformations) are found by iteration and averaging.


The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.