Dietze el al. 2012 published an end-member modelling analysis (EMMA) algorithm for grain-size distribution to derive previously unknown, geoscientifically meaningful end-members. The algorithm is based on principal component analysis (PCA), factor rotation, variable data scaling and nonnegative least-square estimation. The proposed algorithm derives from an ensemble of different end-member models an optimal end-member model and the most robust signals within the grain-size distributions (Dietze el al. 2012).

**Step 1** Transform raw grain-size distributions to
constant sum, e.g. 1 or 100%.

**Step 2** Minimize the effects of scale by applying a
column-wise weight transformation. A weighted matrix \(\mathbf W\) is derived from the columns of
the original matrix \(\mathbf X\) by
scaling the columns based on percentiles, \(P\), with lower \((l)\) and upper \((100-l)\) boundaries as weights:

\[\mathbf W = \frac{\mathbf X - \mathbf h}{\mathbf g - \mathbf h}\text{,}\]

where vectors \(\mathbf h\) and \(\mathbf g\) are defined by \(\mathbf h_j=P_l(\mathbf x_j)\) and \(\mathbf g_j=P_{100-l}(\mathbf x_j)\) for columns \(j = 1,2,..., p\). A value of \(l=0\) reflects the minimum and maximum of each column, and for example a value of \(l=2.5\) gives percentiles between \(P_{2.5}/P_{97.5}\).

**Step 3** Extract the eigenvector matrix \(\mathbf V\) and eigenvalue matrix \(\mathbf \Lambda\) from the minor product
matrix \(\Gamma\) given by \(\Gamma=\mathbf W^T \mathbf W\).

**Step 4** Apply factor rotation (e.g. *VARIMAX*)
on the eigenspace of \(q\) end-members
in order to simplify the structure of the end-members and thus,
facilitate factor interpretation.

**Step 5** Normalize the preliminary eigenvector
loadings \((\mathbf V)\) to ensure the
nonnegativity of the rotated eigenvectors and estimate the eigenvector
scores \((\mathbf M)\) using linear nonnegative least squares as objective
function.

**Step 6** Reverse the initial weight transformation in
order to rescale \(\mathbf V\) and
\(\mathbf M\) to the original units of
the initial data set. Normalize the rescaled matrices to fulfill the
constant sum constraint. The rescaled and standardized matrices are
denoted as *end-member loadings* \((\mathbf V^*)\) and *end-member
scores* \((\mathbf M^*)\),
respectively. Calculate the variance explained by each end-member as the
proportion of total scores variance.

**Step 7** Calculate the modeled data set \(\mathbf X^* = \mathbf M^*\mathbf V^{*T}\)
and the respective error matrix \(\mathbf E =
\mathbf X^* - \mathbf X\). Evaluate the goodness of fit
calculating mean row- and column-wise linear coefficients of
determinations \((r^2)\) between \(\mathbf X\) and \(\mathbf X^*\). The resulting matrix gives
the explained proportion of variance of each sample and each variable,
respectively (Dietze el al. 2012).

The algorithm developed by (Dietze el al. 2012) allows for testing the
*robustness* of the end-members. End-members that appear most
frequently in a series of model runs constitute the **robust
end-member model**. It is important to realize that one single
model configuration is just a scenario with a given likelihood. The
likelihood of a model itself can only be reliably estimated by comparing
large sets of other similarly likely model runs (Dietze el al. 2012). The true number of final
end-members \(q\) in the system is in
general unknown. However, the number of end-members affects the
numerical stability of the EMMA algorithm. Therefore, the optimal values
of \(q\) (number of end-members) and
\(l\) (weight transformations) are
found by iteration and averaging.

**Citation**

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via mail by soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follow: *Hartmann,
K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis
using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.*