Following their suggestions, we walk through the EMMA algorithm and
perform concurrently the calculations in R using the machinery of the
EMMAgeo
package. We perform an end-member analysis on the
df.soil.samples
data set provided here. The data set consists of 45 rows,
corresponding to soil samples and 99 columns, corresponding to a
particular grain size class given on the Krumbein phi \((\phi)\) scale.
\[\phi = -\log_2 D/D_0\text{,} \] where \(D\) is the diameter of the particle or grain in millimeters, and \(D_0\) is a reference diameter, equal to \(1\) mm.
library(EMMAgeo)
# load data
load(url("http://userpage.fu-berlin.de/~soga/300/30100_data_sets/data_emma.RData"))
phi <- colnames(df.soil.samples)
X <- as.matrix(df.soil.samples) # convert data.frame object to matrix object
str(X)
## num [1:45, 1:99] 3.04e-04 2.27e-04 7.75e-05 3.80e-04 3.79e-04 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:45] "1" "2" "3" "4" ...
## ..$ : chr [1:99] "0" "0.1" "0.2" "0.31" ...
Let us plot the data:
In order to start with EMMA using the EMMAgeo
package
the data set has to fulfill certain criteria:
NA
-values.We can use R to check our data set with respect to these constraints:
The is.numeric()
function returns TRUE
if
the class is double or integer.
is.numeric(X)
## [1] TRUE
The is.na()
function indicates which elements are
missing. If we apply the function on a vector or a matrix, the function
checks each element of the vector/matrix and returns TRUE
if the element is a missing value or FALSE
if not. As
TRUE
evaluates to a numeric value of \(1\) and FALSE
to \(0\), we may combine the sum()
function and the is.na()
function to check if our matrix is
free of NA
-values.
sum(is.na(X))
## [1] 0
The function call returns 0, thus we conclude that our data is free
of NA
-values.
In order to check if the data contains any zero-only columns we make
use of the apply()
function. We set MARGIN = 2
to indicate that we want to apply the sum()
function on the
columns. Finally we sum the results for each particular column.
sum(apply(X, MARGIN = 2, FUN = sum) == 0)
## [1] 0
The function call returns 0, thus we conclude that our data does not contain any zero-only columns.
Similar to the approaches from above we use the apply()
function to check if the variables sum to constant value. However this
time, we set MARGIN = 1
to indicate that we want to apply
the sum()
function on the rows.
apply(X, MARGIN = 1, FUN = sum)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
We examine the final constrained of our list, the value for the weight transformation limits \((l)\) in the next section.
Recall that for a stable EMMA we have to minimize the effects of
scale. Therefore a column-wise weight transformation is applied by
scaling the columns based on percentiles with lower \((l)\) and upper \((100−l)\) boundaries as weights. The EMMA
algorithm becomes unstable if the weight transformation limits are too
high. The EMMAgeo
package provides the
test.l()
function, which performs the weight transformation
of the data matrix with different weight limits to check if valid
results are yielded. It returns the maximum value for which the
transformation remains stable.
l.seq <- seq(from = 0, to = 0.5, by = 0.01)
l <- test.l(X, l = l.seq)
l
## $step
## [1] 28
##
## $l.max
## [1] 0.27
The function returns a list of objects: step
holds the
position of last valid value and l.max
contains the last
valid value of \(l\). We assign the
value of l.max
to a new variable l.max
for
further usage.
l.max <-l$l.max
l.max
## [1] 0.27
In general the number of end-member is unknown. The
EMMAgeo
package provides the test.parameters()
function, which evaluates all possible combinations of number of
end-members, \(q\), and weight
transformation limits, \(l\). In
addition the function provides a graphical output for several error
metrics. Here we use plot = "mRt"
to indicate that we are
interested in the mean relative total error (mRt). Look up the
documentations for further specifications and additional options (type
?test.parameters
in your console).
seq.q <- seq(from = 2, to = 10, by = 1) # set sequence of end-members
seq.l <- seq(from = 0, to = l.max, by= 0.02) # sequence of weight transformations
params <- test.parameters(X,
q = seq.q,
l = seq.l,
rotation = "Varimax",
plot = "mRt", # "mRt" (mean relative total error),
# "mEm" (mean absolute row-wise error),
# "mEn" (mean absolute column-wise error),
# "mRm" (mean relative row-wise error),
# "mRn" (mean relative column-wise error),
# "ol" (number of overlapping end-members).
legend = 'bottomright',
cex = 0.6)