The vast majority of statistical methods are designed for
unbounded rational (\(\mathbb Q\)) or real (\(\mathbb R\)) feature scales, so any value between \(-\infty\) and \(+\infty\) is possible in principle. We also
expect algebraic operations to be meaningful and, ideally,
populations to be symmetric or, better still, normally distributed.
However, nearly all variables are bounded on at least one side,
typically restricted to the positive
branch of \(\mathbb R\) or \(\mathbb Q\). On such scales, zeros are mostly
meaningless and nearly always represent missing values. From an
algebraic point of view, only multiplication and division are guaranteed to stay within the
feature scale. Additive operations can leave it, so statistical
parameters built on them, such as the arithmetic mean or the variance, can be inadequate or misleading.
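The algebraic point can be illustrated with a minimal sketch. The data values below are hypothetical and chosen only for illustration; they are not taken from the course material. The sketch compares an additive summary (mean minus one standard deviation) with its multiplicative counterpart computed on the log scale.

```python
import numpy as np

# Hypothetical, strictly positive measurements (e.g. concentrations)
x = np.array([0.1, 0.2, 0.5, 1.0, 5.0, 20.0])

# Additive summary: arithmetic mean minus one standard deviation
arith_mean = x.mean()
arith_sd = x.std(ddof=1)
lower_additive = arith_mean - arith_sd

# Multiplicative summary: computed on the log scale, then back-transformed
log_x = np.log(x)
geo_mean = np.exp(log_x.mean())
geo_sd = np.exp(log_x.std(ddof=1))  # multiplicative standard deviation (a factor)
lower_multiplicative = geo_mean / geo_sd

print(f"additive lower bound:       {lower_additive:.3f}")
print(f"multiplicative lower bound: {lower_multiplicative:.3f}")
```

For these values the additive lower bound is negative and therefore lies outside the positive feature scale, whereas the multiplicative bound is positive by construction.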
A closer look often reveals an additional upper limit of the variable
space, mostly imposed by the physical frame of our planet's gravity field,
limited resources, ecological constraints or other factors.
Examples are distances, heights, temperatures, counts of individuals, areas
and brightness values.
Furthermore, relative amounts
or compositions lead to spurious and sometimes absurd
correlations or artificial patterns.
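A small simulation can make the spurious correlation visible. This is a sketch under stated assumptions: the three parts are drawn independently from a uniform distribution (an arbitrary choice for illustration), and `amounts` and `shares` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)

# Three independent, strictly positive "absolute amounts" (hypothetical data)
amounts = rng.uniform(1.0, 10.0, size=(5000, 3))

# Closure: express each row as shares of its total, so every row sums to 1
shares = amounts / amounts.sum(axis=1, keepdims=True)

# Correlation matrices with variables in columns
corr_amounts = np.corrcoef(amounts, rowvar=False)
corr_shares = np.corrcoef(shares, rowvar=False)

print("parts 1 and 2, absolute amounts:", round(corr_amounts[0, 1], 3))
print("parts 1 and 2, shares:          ", round(corr_shares[0, 1], 3))
```

The absolute amounts are uncorrelated apart from sampling noise. After closure, the shares are clearly negatively correlated, although nothing links the underlying parts: the constant-sum constraint alone produces the pattern.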
Therefore, we have to “open” our bounded scales in order
to obtain results that are mathematically and statistically correct and
scientifically meaningful.
Many transformations have been proposed to achieve normality
or symmetry. Some of them only change the shape of a
distribution, whereas others resolve algebraic constraints.
Here, we present linear and non-linear transformations as well as
transformations for doubly constrained feature scales of compositional or
physical nature.
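What "opening" a scale can look like is sketched below. The logarithm for one-sided positive scales and a logit-type map for doubly bounded scales are common choices and are used here as assumptions; they are not necessarily the exact transformations developed in the following sections. The function names `open_positive`, `open_bounded` and `close_bounded` are hypothetical.

```python
import numpy as np

def open_positive(x):
    """Map a strictly positive scale (0, inf) onto the real line via the logarithm."""
    return np.log(np.asarray(x, dtype=float))

def open_bounded(x, lower, upper):
    """Map a doubly bounded scale (lower, upper) onto the real line (logit-type)."""
    x = np.asarray(x, dtype=float)
    return np.log((x - lower) / (upper - x))

def close_bounded(z, lower, upper):
    """Inverse of open_bounded: map real values back into (lower, upper)."""
    z = np.asarray(z, dtype=float)
    return lower + (upper - lower) / (1.0 + np.exp(-z))

# Hypothetical percentages, bounded between 0 and 100
pct = np.array([1.0, 25.0, 50.0, 75.0, 99.0])
z = open_bounded(pct, 0.0, 100.0)        # unbounded values, symmetric around 0
back = close_bounded(z, 0.0, 100.0)      # recovers the original percentages

print(z)
print(back)
```

On the opened scale the midpoint of the interval maps to zero and values approaching either limit move towards \(-\infty\) or \(+\infty\), so statistics computed there and transformed back cannot leave the original bounds.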
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.