30410_The_explanatory_factor

Consider $p$ manifest variables measured on $n$ observations, denoted by $\mathbf x_1,\mathbf x_2, ...,\mathbf x_p$, the variable means, denoted by $\mu_1, \mu_2, ..., \mu_p$, and the covariance matrix of $\mathbf X_{n \times p}$, denoted by $\mathbf \Sigma_{p \times p}$, with

$$ \begin{align} \mathbf X_{n \times p}= \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \dots & x_{np} \\ \end{bmatrix}\text{,}\qquad \mathbf M_{n \times p}= \begin{bmatrix} \mu_1 & \mu_2 & \dots & \mu_p\\ \mu_1 & \mu_2 & \dots & \mu_p\\ \vdots & \vdots & \ddots & \vdots\\ \mu_1 & \mu_2 & \dots & \mu_p\\ \end{bmatrix}\text{,}\qquad \mathbf \Sigma_{p \times p}= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1p} \\ \sigma_{12} & \sigma_{22} & \dots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1p} & \sigma_{2p} & \dots & \sigma_{pp} \\ \end{bmatrix}\text{}\qquad \end{align} \, . $$
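As a minimal sketch of these three objects, the snippet below builds a small random data matrix and derives $\mathbf M$ and $\mathbf \Sigma$ from it with NumPy. The sizes, the seed and the random data are illustrative assumptions. The means and covariances computed here are sample estimates standing in for $\mu_j$ and $\sigma_{jk}$.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible

n, p = 6, 3                      # assumed sizes: 6 observations, 3 manifest variables
X = rng.normal(size=(n, p))      # illustrative data matrix X (n x p)

mu = X.mean(axis=0)              # sample means of the p variables
M = np.tile(mu, (n, 1))          # M repeats the row of means n times (n x p)
Sigma = np.cov(X, rowvar=False)  # sample covariance matrix (p x p); columns are variables

print(M.shape, Sigma.shape)
```

Every row of `M` is identical, so `X - M` centers each column of `X`, and `Sigma` is symmetric, as in the matrix shown above.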

Further, consider $m < p$ factors, denoted by $F_1, F_2,...,F_m$. For observation $i$, the factor values are collected in the vector

$$ \begin{align} \mathbf f_{i}= \begin{bmatrix} f_{1i} \\ f_{2i} \\ \vdots \\ f_{mi} \\ \end{bmatrix} \end{align},\qquad i=1,...,n. $$

The basic idea behind factor analysis is that of regression, or conditional expectation: we express each of the manifest (observed) variables as a linear combination of latent variables (or factors). Thus, if we have $p$ manifest variables and $m$ factors, we may write

$$x_{ij} = \mu_j + \lambda_{j1}f_{1i}+ \lambda_{j2}f_{2i}+...+ \lambda_{jm}f_{mi}+ e_{ij}\qquad j = 1,2,...,p\text{,}$$

where the factors, $F$, are assumed to have zero means and unit standard deviations. The error term of the $j$-th variable is also assumed to have zero mean, with standard deviation $\sigma_j$.
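To make the equation concrete, here is a small sketch that evaluates it for a single variable $j$ and a single observation $i$. The function name `manifest_value` and all numbers are hypothetical and chosen only for illustration. The code writes $i$ for the observation index.

```python
import numpy as np

def manifest_value(mu_j, lambda_j, f_i, e_ij):
    """Return mu_j + sum_k lambda_jk * f_ki + e_ij for one variable j and one observation i."""
    return mu_j + float(np.dot(lambda_j, f_i)) + e_ij

mu_j = 2.0                        # assumed mean of variable j
lambda_j = np.array([0.5, -1.0])  # assumed loadings of variable j on m = 2 factors
f_i = np.array([1.0, 0.5])        # assumed factor values for observation i
e_ij = 0.1                        # assumed error for this variable and observation

x_ij = manifest_value(mu_j, lambda_j, f_i, e_ij)
print(x_ij)
```

Here the two factor contributions, $0.5 \cdot 1.0$ and $-1.0 \cdot 0.5$, cancel, so the observed value is the mean plus the error.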

Expressed in matrix form this equation becomes

$$ \begin{align} \begin{bmatrix} x_{i1} & x_{i2} & \dots & x_{ip} \\ \end{bmatrix}= \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \\ \end{bmatrix}^T+ \begin{bmatrix} \begin{bmatrix} \lambda_{11} & \lambda_{12} & \dots & \lambda_{1m} \\ \lambda_{21} & \lambda_{22} & \dots & \lambda_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{p1} & \lambda_{p2} & \dots & \lambda_{pm} \\ \end{bmatrix} \begin{bmatrix} f_{1i} \\ f_{2i} \\ \vdots \\ f_{mi} \\ \end{bmatrix}\end{bmatrix}^T+ \begin{bmatrix} e_{i1} \\ e_{i2} \\ \vdots \\ e_{ip} \\ \end{bmatrix} ^T \end{align} $$

for $i=1,...,n$, which can be expressed more compactly in matrix notation:

$$ \mathbf X = \mathbf M + \mathbf F^T \mathbf \Lambda^T+\mathbf E $$

Rewriting this equation gives us the exploratory factor model:

$$\mathbf X- \mathbf M = \mathbf F^T \mathbf \Lambda^T+ \mathbf E.$$
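The matrix form can be checked numerically. The sketch below simulates data under the model, with $\mathbf F$ stored as an $m \times n$ matrix, $\mathbf \Lambda$ as $p \times m$ and $\mathbf E$ as $n \times p$. It then confirms that one entry of $\mathbf X$ agrees with the element-wise equation. The normal distributions, the sizes and all parameter values are assumptions made for illustration. The text above only fixes the means and standard deviations of the factors and errors.

```python
import numpy as np

rng = np.random.default_rng(1)           # fixed seed so the sketch is reproducible

n, p, m = 500, 4, 2                      # assumed sizes, with m < p
mu = np.array([1.0, 2.0, 3.0, 4.0])      # assumed variable means
sigma = np.array([0.3, 0.5, 0.4, 0.2])   # assumed error standard deviations

Lambda = rng.normal(size=(p, m))         # loadings (p x m), arbitrary illustrative values
F = rng.normal(size=(m, n))              # factors (m x n): zero mean, unit standard deviation
E = rng.normal(size=(n, p)) * sigma      # errors (n x p): zero mean, column j scaled by sigma_j

M = np.tile(mu, (n, 1))                  # mean matrix (n x p)
X = M + F.T @ Lambda.T + E               # data generated by the matrix form of the model

# One entry of X must match the element-wise equation for variable j, observation i
i, j = 0, 2
x_ij = mu[j] + Lambda[j] @ F[:, i] + E[i, j]
print(np.isclose(X[i, j], x_ij))
```

Storing the factors as an $m \times n$ matrix is what makes the transpose in $\mathbf F^T \mathbf \Lambda^T$ produce an $n \times p$ result that lines up with $\mathbf X$, $\mathbf M$ and $\mathbf E$.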

Citation

The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us via e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.