The `EMMAgeo`

package provides a set of functions for end-member modelling analysis of grain-size data. The authors of the package suggest a protocol for applying the `EMMAgeo`

package for end-member modelling.

- Check the data

- Bracket weight transformation limits

- Bracket numbers of end-members

- Calculate and extract robust end-members

- Run EMMA with optimal parameters

- Evaluate model quality

- Estimate model uncertainty

Following their suggestions, we walk through the EMMA algorithm and perform concurrently the calculations in R using the machinery of the `EMMAgeo`

package. We perform an end-member analysis on the `df.soil.samples`

data set provided here. The data set consists of 45 rows, corresponding to soil samples and 99 columns, corresponding to a particular grain size class given on the *Krumbein phi \((\phi)\) scale*.

\[\phi = -\log_2 D/D_0\text{,} \] where \(D\) is the diameter of the particle or grain in millimeters, and \(D_0\) is a reference diameter, equal to \(1\) mm.

```
library(EMMAgeo)
# load data
load(url("https://userpage.fu-berlin.de/soga/300/30100_data_sets/data_emma.RData"))
phi <- colnames(df.soil.samples)
X <- as.matrix(df.soil.samples) # convert data.frame object to matrix object
str(X)
```

```
## num [1:45, 1:99] 3.04e-04 2.27e-04 7.75e-05 3.80e-04 3.79e-04 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:45] "1" "2" "3" "4" ...
## ..$ : chr [1:99] "0" "0.1" "0.2" "0.31" ...
```

Let us plot the data:

In order to start with EMMA using the `EMMAgeo`

package the data set has to fulfill certain criteria:

- The data has to be of numeric data type.
- The data has to be free of
`NA`

-values.

- The data contains no zero-only columns.
- The variables sum to constant value.
- The weight transformation limits \((l)\) must be below a critical value, to guarantee numerical stability.

We can use R to check our data set with respect to these constraints:

The `is.numeric()`

function returns `TRUE`

if the class is *double* or *integer*.

`is.numeric(X)`

`## [1] TRUE`

The `is.na()`

function indicates which elements are missing. If we apply the function on a vector or a matrix, the function checks each element of the the vector/matrix and returns `TRUE`

if the element is a missing value or `FALSE`

if not. As `TRUE`

evaluates to a numeric value of \(1\) and `FALSE`

to \(0\), we may combine the `sum()`

function and the `is.na()`

function to check if our matrix is free of `NA`

-values.

`sum(is.na(X))`

`## [1] 0`

The function call returns 0, thus we conclude that our data is free of `NA`

-values.

In order to check if the data contains any zero-only columns we make use of the `apply()`

function. We set `MARGIN = 2`

to indicate that we want to apply the `sum()`

function on the columns. Finally we sum the results for each particular column.

`sum(apply(X, MARGIN = 2, FUN = sum) == 0)`

`## [1] 0`

The function call returns 0, thus we conclude that our data does not contain any zero-only columns.

Similar to the approaches from above we use the `apply()`

function to check if the variables sum to constant value. However this time, we set `MARGIN = 1`

to indicate that we want to apply the `sum()`

function on the rows.

`apply(X, MARGIN = 1, FUN = sum)`

```
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
```

We examine the final constrained of our list, the value for the weight transformation limits \((l)\) in the next section.

Recall that for a stable EMMA we have to minimize the effects of scale. Therefore a column-wise weight transformation is applied by scaling the columns based on percentiles with lower \((l)\) and upper \((100âˆ’l)\) boundaries as weights. The EMMA algorithm becomes unstable if the weight transformation limits are too high. The `EMMAgeo`

package provides the `test.l()`

function, which performs the weight transformation of the data matrix with different weight limits to check if valid results are yielded. It returns the maximum value for which the transformation remains stable.

```
l.seq <- seq(from = 0, to = 0.5, by = 0.01)
l <- test.l(X, l = l.seq)
l
```

```
## $step
## [1] 28
##
## $l.max
## [1] 0.27
```

The function returns a list of objects: `step`

holds the position of last valid value and `l.max`

contains the last valid value of \(l\). We assign the value of `l.max`

to a new variable `l.max`

for further usage.

```
l.max <-l$l.max
l.max
```

`## [1] 0.27`

In general the number of end-member is unknown. The `EMMAgeo`

package provides the `test.parameters()`

function, which evaluates all possible combinations of number of end-members, \(q\), and weight transformation limits, \(l\). In addition the function provides a graphical output for several error metrics. Here we use `plot = "mRt"`

to indicate that we are interested in the *mean relative total error (mRt)*. Look up the documentations for further specifications and additional options (type `?test.parameters`

in your console).

```
seq.q <- seq(from = 2, to = 10, by = 1) # set sequence of end-members
seq.l <- seq(from = 0, to = l.max, by= 0.02) # sequence of weight transformations
params <- test.parameters(X,
q = seq.q,
l = seq.l,
rotation = "Varimax",
plot = "mRt", # "mRt" (mean relative total error),
# "mEm" (mean absolute row-wise error),
# "mEn" (mean absolute column-wise error),
# "mRm" (mean relative row-wise error),
# "mRn" (mean relative column-wise error),
# "ol" (number of overlapping end-members).
legend = 'bottomright',
cex = 0.6)
```