Hethke, M., Hartmann, K., Alberti, M., Kutzner, T. & Schwentner, M. (2023). Testing the success of palaeontological methods in the delimitation of clam shrimp (Crustacea, Branchiopoda) on extant species. Palaeontology, 66, e12634. doi:10.1111/pala.12634

Hethke et al. (2023) applied linear discriminant analysis (LDA) to evaluate whether carapace traits (size, shape, ornamentation) are sufficient to discriminate among clam shrimp species.

The hypotheses regarding size and shape (ornamentation is not covered here) are:

H1: Size and shape are informative for the discrimination of Ozestheria species.

H1a: Species are morphologically distinct (separate analyses of size and shape).

H1b: The combination of size and shape variables leads to higher accuracies in species discrimination than either set of variables analyzed separately.

All data and R Markdown files for this study are available in the Dryad Digital Repository.

Only a brief summary of selected parts of the analysis is presented here.

Method

A total of 481 specimens from ten Ozestheria (sub-)species were selected from a larger dataset collected in Australia (Schwentner et al., 2015). Each individual was photographed, and its carapace was outlined and measured in a vector graphics program. The nine linear measurements describing each carapace are shown in the figure below:

Carapace outline with nine linear measurements (a, b, c, Arr, Av, Ch, Cr, H, L) (Fig. 2, [Hethke et al. (2023)](https://doi.org/10.1111/pala.12634). Used under a [Creative Commons Attribution-NonCommercial License](http://creativecommons.org/licenses/by-nc/4.0/)).

The size variables were standardized by dividing each measurement by the arithmetic mean of all nine measurements. The resulting ratios were then log-transformed (natural logarithm), which maps these strictly positive values onto the whole real line. The shape variables were obtained from a Fourier shape analysis of the carapace outlines.
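The size standardization can be written out as a short Python sketch. The study itself worked in R. The sketch assumes that the arithmetic mean is taken separately over the nine measurements of each specimen. It also assumes that all measurements are strictly positive, so that the logarithm is defined. The function name `standardize_size` is hypothetical.

```python
import math

# The nine linear carapace measurements named in the figure caption.
MEASUREMENTS = ["a", "b", "c", "Arr", "Av", "Ch", "Cr", "H", "L"]


def standardize_size(specimen):
    """Return ln(measurement / mean of the specimen's nine measurements).

    specimen: dict mapping measurement name -> length (assumed layout).
    Assumption: the mean is computed per specimen, not over the whole dataset.
    """
    values = [specimen[name] for name in MEASUREMENTS]
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive measurements")
    mean = sum(values) / len(values)
    # Ratio to the specimen mean, then natural log: maps (0, inf) onto the real line.
    return {name: math.log(v / mean) for name, v in zip(MEASUREMENTS, values)}
```

Every measurement of a specimen is divided by the same specimen mean. As a result, multiplying all nine values by a constant (a uniformly larger carapace) leaves the standardized values unchanged.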

To test H1a, separate linear discriminant analyses (LDA) were performed on the size and shape datasets. For each of the 45 pairwise combinations of the ten species, a random sample of individuals with known species identity was drawn as the training set.
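The pairwise design can be sketched in Python. The study itself worked in R. With ten species there are C(10, 2) = 10 · 9 / 2 = 45 unordered pairs.

Several details are assumptions made for illustration:
- the species labels A to J;
- the (species, features) record layout;
- the helper name `split_pair`;
- the equal per-species training size `n_train`.

The source does not state how large the training sets were.

```python
import itertools
import random

# Hypothetical labels A-J standing in for the ten Ozestheria (sub-)species.
SPECIES = list("ABCDEFGHIJ")

# All unordered species pairs: 10 * 9 / 2 = 45 combinations.
PAIRS = list(itertools.combinations(SPECIES, 2))


def split_pair(specimens, pair, n_train, rng):
    """Randomly split the specimens of one species pair into training and test sets.

    specimens: list of (species, features) tuples (assumed layout).
    n_train: hypothetical number of training specimens drawn per species.
    rng: a random.Random instance, so that splits are reproducible.
    """
    train, test = [], []
    for species in pair:
        group = [s for s in specimens if s[0] == species]  # new list; input untouched
        rng.shuffle(group)
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test
```

Passing a seeded `random.Random` makes each split reproducible. Calling `split_pair` again with the same generator draws a fresh random split, as needed for the repeated sampling described below.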

The linear discriminant models were implemented in R using the lda() function from the MASS package. Species served as the grouping factor and the size and shape variables as the covariates. LDA finds the linear combination of covariates that best separates the species groups.
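The study fitted its models with MASS::lda() in R. As an illustration only, the sketch below implements the two-group case of Fisher's linear discriminant in plain Python.

- The discriminant direction is w = S_w^{-1}(m_A - m_B), where m_A and m_B are the group mean vectors and S_w is the pooled within-group covariance matrix.
- A specimen is assigned to group A if its projection onto w exceeds the midpoint between the two projected group means.

This corresponds to the LDA decision rule with equal prior probabilities. MASS::lda() by default uses the group proportions of the training set as priors, so results can differ for unbalanced samples. All function names are hypothetical.

```python
def mean_vec(rows):
    """Column means of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix (divisor: N minus number of groups)."""
    p = len(groups[0][0])
    s = [[0.0] * p for _ in range(p)]
    total = 0
    for rows in groups:
        m = mean_vec(rows)
        for r in rows:
            d = [x - mx for x, mx in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
        total += len(rows)
    dof = total - len(groups)
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_lda(rows_a, rows_b):
    """Fisher direction w = S_w^-1 (m_a - m_b) plus midpoint threshold (equal priors)."""
    ma, mb = mean_vec(rows_a), mean_vec(rows_b)
    diff = [x - y for x, y in zip(ma, mb)]
    w = solve(pooled_within_cov([rows_a, rows_b]), diff)
    proj_a = sum(wi * xi for wi, xi in zip(w, ma))
    proj_b = sum(wi * xi for wi, xi in zip(w, mb))
    return w, (proj_a + proj_b) / 2


def predict_lda(model, row, labels=("A", "B")):
    """Group A lies on the high side of the threshold by construction of w."""
    w, threshold = model
    score = sum(wi * xi for wi, xi in zip(w, row))
    return labels[0] if score > threshold else labels[1]
```

Because S_w is positive definite, group A's mean always projects above group B's mean. This is why a single threshold on the projected score is enough to separate the two groups.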

Next, the fitted models were used with the predict() function to classify individuals whose species identity was withheld from the model (test set). Model performance was assessed by repeating the sampling and modeling steps 300 times and calculating the mean proportion of correctly classified individuals (mean accuracy). This mean accuracy also served as a measure of morphological separation between the two species of each pair.
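The evaluation loop (repeated random train/test splitting, sometimes called Monte Carlo cross-validation) can be sketched in Python. The sketch takes generic `fit` and `predict` functions so that any classifier can be plugged in.

To keep the block self-contained, it uses a simple nearest-centroid classifier as a stand-in. This classifier is not LDA and is not the authors' model. The following are hypothetical:
- the data layout (a dict from species label to feature vectors);
- the per-species training size `n_train`;
- all function names.

```python
import random
import statistics


def repeated_holdout_accuracy(data, fit, predict, n_train, n_iter=300, seed=0):
    """Mean test accuracy over n_iter random train/test splits.

    data: dict mapping species label -> list of feature vectors (assumed layout).
    fit(train_dict) -> model; predict(model, x) -> label.
    n_train: hypothetical number of training specimens drawn per species.
    """
    if any(len(rows) <= n_train for rows in data.values()):
        raise ValueError("each species needs more than n_train specimens")
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_iter):
        train, test = {}, []
        for label, rows in data.items():
            shuffled = rows[:]
            rng.shuffle(shuffled)
            train[label] = shuffled[:n_train]
            test.extend((label, x) for x in shuffled[n_train:])
        model = fit(train)
        correct = sum(predict(model, x) == label for label, x in test)
        accuracies.append(correct / len(test))
    return statistics.mean(accuracies)


# Stand-in classifier (NOT LDA): assign each specimen to the nearest group mean.
def fit_centroids(train):
    return {label: [sum(c) / len(rows) for c in zip(*rows)]
            for label, rows in train.items()}


def predict_centroid(model, x):
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


if __name__ == "__main__":
    rng = random.Random(42)
    toy = {"A": [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)],
           "B": [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(30)]}
    print(repeated_holdout_accuracy(toy, fit_centroids, predict_centroid, n_train=20))
```

Averaging over many random splits reduces the dependence of the accuracy estimate on any single lucky or unlucky split. The default of 300 iterations mirrors the number reported in the text.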

To test H1b, the linear measurements (size) and Fourier coefficients (shape) were combined and analyzed with the same procedure. Species C was excluded from this analysis because it differed strongly in morphology from the remaining species.
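Combining the two datasets amounts to concatenating each specimen's size and shape vectors and dropping the excluded species. A minimal Python sketch follows. Specimen IDs, the (species, features) layout and the function name are hypothetical. The check that both datasets agree on each specimen's species is an added safeguard, not a step described in the source.

```python
def combine_features(size, shape, exclude=("C",)):
    """Concatenate size and shape vectors per specimen, dropping excluded species.

    size, shape: dicts mapping a (hypothetical) specimen ID -> (species, features).
    exclude: species left out of the combined analysis (species C in the study).
    """
    combined = {}
    for sid, (species, size_vec) in size.items():
        if species in exclude:
            continue
        shape_species, shape_vec = shape[sid]
        if shape_species != species:
            raise ValueError(f"species mismatch for specimen {sid}")
        combined[sid] = (species, list(size_vec) + list(shape_vec))
    return combined
```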

Results

Across all species combinations, the overall mean test accuracy of the LDA is around 93 % for both size and shape. Mean accuracies of individual species pairs range from below 80 % to 100 %. Two pairs of sister species are among the pairs with the lowest classification accuracy based on size. For most species pairs, accuracy is higher for shape than for size, but for some pairs the opposite is true.

The overall mean accuracy of the combined size and shape analysis is slightly higher than that of the separate analyses. The results also show that each of the genetically differentiated species is morphologically distinct. Nevertheless, once the strongly distinct species C is excluded, the remaining nine species lie close together in morphospace:

LDA biplots of (A) the shape covariates for all ten species and (B) the combined size and shape covariates for nine species, excluding species C (Fig. 9, Hethke et al. (2023). Used under a Creative Commons Attribution-NonCommercial License).

The strong overlap between these species indicates highly similar morphologies and calls for further evaluation of the individual size and shape relationships. Lower mean accuracy can reflect a close relationship between the species of a pair (such as the sister species in this study). For some species, however, a sample-size bias can produce the same effect, highlighting the influence of sampling on model performance.

Further aspects of the paper that are not covered here in detail concern sexual dimorphism among the studied Ozestheria species and carapace ornamentation as a distinguishing feature. Interested readers are referred to the original paper and the R Markdown files.

Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by email at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.