3060005_linear_Transformations-py

Linear feature scaling¶

Linear feature scaling comprises three different operations on variable spaces: translation (shifting), scaling (compression/stretching) and rotation (orientation change). They are generally used in machine learning for adjusting distributions or metric units concerning central tendency (translation) and/or variance (scaling).

Linear feature scaling does not change the main properties of the original distribution, e.g.:

multi-modal distributions remain multi-modal
skewed distributions won't become symmetric
non-normal frequency distributions stay non-normal
singly or doubly bounded scales merely get new boundaries
singular covariance matrix stays singular

Thus, methodological or algebraic limitations are preserved, even though signs or scales change!
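The shape-preservation claim can be checked numerically. The following is a minimal sketch, not part of the original notebook: it draws a right-skewed sample, applies an arbitrary linear transformation, and compares the skewness before and after. The helper `skewness`, the seed and the values of `s` and `c` are assumptions chosen purely for illustration.

```python
import random
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sigma) ** 3 for v in values])


random.seed(0)  # fixed seed, only for reproducibility
x = [random.expovariate(1.0) for _ in range(10_000)]  # right-skewed sample

s, c = 2.5, -7.0  # arbitrary illustrative scale factor and shift
y = [s * v + c for v in x]

skew_x = skewness(x)
skew_y = skewness(y)
print(f"skewness before: {skew_x:.4f}, after: {skew_y:.4f}")
```

Both values agree up to floating-point rounding. With a negative `s` the sign of the skewness would flip (mirroring), but its magnitude would stay the same.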

Translation¶

On a 1-D feature scale, translation means shifting the origin by a constant $c$: $$x\to x'=x+c,\quad c\in\mathbb R$$

For $c>0$, $x$ shifts towards $+\infty$ (right shift); for $c<0$, it shifts towards $-\infty$ (left shift).

A simple example is the conversion of temperature units between Kelvin, $x\in\mathbb R_+$, and Celsius, $y\in]-273.15,+\infty[_\mathbb R$: $$ x[K] \to y[°C]=x-273.15$$ or the back transformation: $$ y[°C] \to x[K]=y+273.15$$

Note that translation only changes the origin; it does not remove scale limits. Thus, translation cannot solve the algebraic limitations of bounded scales. Translation is an additive transformation!
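As a small illustration of the temperature example, the sketch below converts a few made-up Kelvin values to Celsius and back. The function names and sample values are hypothetical. It also shows that the lower bound of the scale is shifted rather than removed.

```python
OFFSET = 273.15  # shift between the Kelvin and Celsius origins


def kelvin_to_celsius(x_kelvin):
    """Translation x -> x + c with c = -273.15."""
    return x_kelvin - OFFSET


def celsius_to_kelvin(y_celsius):
    """Back transformation with c = +273.15."""
    return y_celsius + OFFSET


temps_k = [0.0, 273.15, 300.0]  # made-up sample values in Kelvin
temps_c = [kelvin_to_celsius(t) for t in temps_k]
round_trip = [celsius_to_kelvin(t) for t in temps_c]

# The bound of the scale moves from 0 K to -273.15 °C but still exists.
print(min(temps_k), "K ->", min(temps_c), "°C")
```

Differences between values are unaffected by the shift, which is why translation alone cannot change variance or shape.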

Scaling means expanding or compressing data spaces by a scalar $s\in\mathbb R_+$: $$x\to x'=s\cdot x$$ For $s>1$ feature scales are expanded (stretching) and for $0<s<1$ they are compressed. A simple example is the transformation of irradiance [W/m^2] to an 8-bit brightness value $[0,255]_{\mathbb N_{0}}$:

$$E_e[W/m^2]\to BV=\left\lfloor\frac{255}{\max(E_e)}E_{e}\right\rfloor$$

Here, a real variable $x\in\mathbb R_+$ is transformed to integers $x'\in [0,255]_{\mathbb N_{0}}$ and thus becomes increasingly limited with respect to algebraic and methodological approaches.
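The brightness mapping can be sketched as follows. The function name `to_brightness` and the irradiance values are made up for illustration. The scaling itself is linear, while the floor step is what discards information and produces integers.

```python
import math


def to_brightness(irradiance):
    """Map non-negative irradiance values [W/m^2] to integers in 0..255."""
    e_max = max(irradiance)
    if e_max <= 0:
        raise ValueError("need at least one positive irradiance value")
    # Multiply before dividing so that the maximum maps exactly to 255.
    return [math.floor(255 * e / e_max) for e in irradiance]


irradiance = [0.0, 120.5, 480.0, 1000.0]  # made-up sample values
brightness = to_brightness(irradiance)
print(brightness)  # [0, 30, 122, 255]
```

Note that 120.5 W/m^2 and, say, 121.0 W/m^2 would both end up as 30: the integer scale can no longer distinguish them.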

Further simple examples are currency exchange rates, kilometers to miles, grams to kilograms, etc.

Like translation, scaling (stretching/compression) cannot solve the algebraic limitations of bounded scales! Stretching/compression is a multiplicative transformation!

On a 1-D feature scale, only a 180° rotation is possible, using $s=-1$ (mirroring at the origin).

Linear transformation is the combination of translation, stretching and rotation. It can be simply expressed by: $$ x\to y= s\cdot x+c$$
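A short sketch of how $y=s\cdot x+c$ acts on location and spread: the mean is transformed like any single value, while the standard deviation is only multiplied by $|s|$ and ignores $c$. The data and the parameter values below are arbitrary assumptions, and the population standard deviation is used.

```python
import statistics


def linear_transform(values, s, c):
    """Apply y = s * x + c to every value."""
    return [s * v + c for v in values]


x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up sample
s, c = -3.0, 10.0  # negative s: stretching combined with mirroring
y = linear_transform(x, s, c)

mean_x, sd_x = statistics.fmean(x), statistics.pstdev(x)
mean_y, sd_y = statistics.fmean(y), statistics.pstdev(y)

print(f"mean: {mean_x:.3f} -> {mean_y:.3f} (expected {s * mean_x + c:.3f})")
print(f"sd:   {sd_x:.3f} -> {sd_y:.3f} (expected {abs(s) * sd_x:.3f})")
```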

A huge number of linear transformations are used for different purposes. Here, we present just the most common ones.

Standardization (z-scoring)¶

Standardization (or z-transformation) is possibly the best-known linear transformation, with $s=\frac{1}{\sigma}$ and $c=-\frac{\mu}{\sigma}$:

$$x\to z=\frac{x-\mu}{\sigma}=\frac{1}{\sigma}\cdot x-\frac{\mu}{\sigma}$$

Due to its properties, mean $\bar x_z=0$ and variance/standard deviation $s^2_z=s_z=1$, the z-transformation is used for comparing variables independently of their mean and variance. However, standardization preserves the shape properties of the original distribution.

Note: If x is non-normal, z(x) will be non-normal as well!
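A minimal z-scoring sketch using only the standard library. It assumes the population standard deviation for $\sigma$, since the text does not say whether the sample or population estimate is meant. The data are made up and deliberately contain one outlier.

```python
import statistics


def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population estimate
    if sigma == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return [(v - mu) / sigma for v in values]


x = [1.0, 2.0, 2.0, 3.0, 12.0]  # made-up, clearly non-normal sample
z = z_score(x)

print([round(v, 3) for v in z])
print(statistics.fmean(z), statistics.pstdev(z))
```

The outlier remains an outlier after standardization and the order of the values is unchanged, which illustrates the note above.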

Special Example of standardization: Stable Isotopes

Isotope ratios play an important role in paleoclimatology, sedimentology and biology, among other related disciplines. Let's have a look at the stable isotope ratio abbreviated as $\delta^{18}O$:
$$\delta^{18}O=\frac{\left ( \frac{^{18}O}{^{16}O}\right )_{sample}-\left ( \frac{^{18}O}{^{16}O}\right )_{standard}}{\left ( \frac{^{18}O}{^{16}O}\right )_{standard}}\cdot 1000 [^0/_{00}]$$ Thus, we have a measured variable $x=\left ( \frac{^{18}O}{^{16}O}\right )_{sample}$ transformed by a translation constant of $$c=-1000$$ and a scaling factor:

$$s=\frac{1000}{\left ( \frac{^{18}O}{^{16}O}\right )_{standard}} [^0/_{00}]$$
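To make the decomposition concrete, the sketch below computes $\delta^{18}O$ once from its definition and once in the form $s\cdot x+c$, with the scaling factor $1000$ divided by the standard ratio and $c=-1000$. The standard ratio `R_STANDARD` is an assumed, VSMOW-like value and the sample ratio is made up. Neither is taken from the text.

```python
R_STANDARD = 2005.2e-6  # assumption: 18O/16O ratio of the reference standard


def delta_18o(r_sample, r_standard=R_STANDARD):
    """Delta value in per mil, computed from the definition."""
    return (r_sample - r_standard) / r_standard * 1000


def delta_18o_linear(r_sample, r_standard=R_STANDARD):
    """The same quantity written as a linear transformation s * x + c."""
    s = 1000 / r_standard  # scaling factor
    c = -1000  # translation constant
    return s * r_sample + c


r_sample = 1995.0e-6  # made-up sample ratio, slightly below the standard
print(delta_18o(r_sample), delta_18o_linear(r_sample))
print(delta_18o(0.0))  # lower bound of the delta scale
```

A sample ratio equal to the standard gives 0, a smaller positive ratio gives a negative value, and a ratio of 0 gives the lower bound of -1000.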

From an algebraic and thus statistical point of view, the definition of $\delta^{18}O$ comprises a very dangerous transformation:

A very small naturally positive ratio is shifted, so that negative ratios become possible!

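$$x=\left ( \frac{^{18}O}{^{16}O}\right )_{sample}\in[0,1]_{\mathbb R_+} \to x'=\delta^{18}O\in\left[-1000,\ 1000\cdot\frac{1-\left ( \frac{^{18}O}{^{16}O}\right )_{standard}}{\left ( \frac{^{18}O}{^{16}O}\right )_{standard}}\right]_\mathbb R$$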

Any ratio is algebraically centred at 1 (the neutral element of multiplication): $\frac{a}{b}=1 \iff a=b$, but here the center is shifted to 0, the neutral element of an additive space, while the variable remains meaningfully multiplicative!
This transformation (standardization) suggests a variable space related to the field $(\mathbb R,+,\cdot)$, but it definitely is not!

However, these problems stay buried as long as the real-world data lie relatively close to the center compared to the width of the possible data range. Thus, in most cases they will appear quasi-normal. But we should keep these limitations in mind and avoid distance-based classification algorithms!

Further feature scaling procedures¶

Min-Max feature scaling¶

The min-max normalization is a special case of the weight transformation after Klovan & Imbrie, 1971: $$w_i=\frac{x_i-a}{b-a}$$ The min-max normalization is given by $a=\min(x) \land b=\max(x)$. It transforms $x\in[\min(x),\max(x)]_\mathbb R \to w\in [0,1]_\mathbb R$.
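A minimal sketch of the weight transformation with min-max normalization as its special case. The function names and data are hypothetical, and a constant feature is rejected because $b-a$ would be zero.

```python
def weight_transform(values, a, b):
    """Weight transformation w_i = (x_i - a) / (b - a)."""
    if a == b:
        raise ValueError("a and b must differ")
    return [(v - a) / (b - a) for v in values]


def min_max(values):
    """Special case with a = min(x) and b = max(x)."""
    return weight_transform(values, min(values), max(values))


x = [3.0, 7.0, 5.0, 11.0]  # made-up sample
w = min_max(x)
print(w)  # [0.0, 0.5, 0.25, 1.0]
```

Because $a$ and $b$ are taken from the sample itself, a single extreme value determines how tightly all other values are compressed.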

A further variation of the weight transformation is provided by Miesch, 1981.

If centering the feature space at 0 is desired, the related mean normalization can be applied:

$$x'=\frac{x-\bar x}{\max(x)-\min(x)}$$

$$x\in[\min(x),\max(x)]_\mathbb R \to x'\in [-1,1]_\mathbb R$$

or the median normalization: $$x'=\frac{x-\operatorname{median}(x)}{\max(x)-\min(x)}$$ with a similar resulting feature scale.

Replacing the range with the interquartile range leads to the median-quartile normalization:

$$x'= \frac{x-Q_{50}}{Q_{75}-Q_{25}}$$
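The mean normalization and the median-quartile normalization can be compared on a small made-up sample with one outlier. This is a sketch under one assumption worth stating: quartile definitions differ between tools, and the `"inclusive"` method of `statistics.quantiles` is chosen here only for illustration.

```python
import statistics


def mean_normalize(values):
    """x' = (x - mean) / (max - min)."""
    mu = statistics.fmean(values)
    spread = max(values) - min(values)
    return [(v - mu) / spread for v in values]


def median_quartile_normalize(values):
    """x' = (x - Q50) / (Q75 - Q25), quartiles by the 'inclusive' method."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - q50) / (q75 - q25) for v in values]


x = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up sample with one outlier
by_mean = mean_normalize(x)
by_quartiles = median_quartile_normalize(x)

print([round(v, 3) for v in by_mean])
print(by_quartiles)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

With the range in the denominator, the outlier squeezes the four regular values into a narrow band. With the interquartile range they keep a spread of order 1, while the outlier stays far away. In both cases the transformation is linear, so the shape of the distribution is unchanged.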

Many more linear scaling approaches are recommended for different sample distributions and purposes, but none of them can solve challenges such as non-normality, missing Euclidean properties, non-linear measures and others.

In most cases, our real-world data have to be treated by non-linear transformations before appropriate statistical methods can be applied to obtain meaningful results.