Consider \(p\) manifest variables measured on \(n\) observations, denoted by \(\mathbf x_1,\mathbf x_2, ...,\mathbf x_p\), with variable means \(\mu_1, \mu_2, ..., \mu_p\) and the covariance matrix of \(\mathbf X_{n \times p}\), denoted by \(\mathbf{\Sigma}_{p \times p}\).

\[ \begin{align} \mathbf X_{n \times p}= \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1p} \\ x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots\\ x_{n1} & x_{n2} & \dots & x_{np} \\ \end{bmatrix}\text{,}\qquad \mathbf M_{n \times p}= \begin{bmatrix} \mu_1 & \mu_2 & \dots & \mu_p\\ \mu_1 & \mu_2 & \dots & \mu_p\\ \vdots & \vdots & \ddots & \vdots\\ \mu_1 & \mu_2 & \dots & \mu_p\\ \end{bmatrix}\text{,}\qquad \mathbf \Sigma_{p \times p}= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \dots & \sigma_{1p} \\ \sigma_{12} & \sigma_{22} & \dots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1p} & \sigma_{2p} & \dots & \sigma_{pp} \\ \end{bmatrix}\text{}\qquad \end{align} \]

Further consider \(m < p\) factors, denoted by \(F_1, F_2,...,F_m\). For the \(i\)-th observation, the factor values are collected in the vector

\[ \begin{align} \mathbf F_{m \times 1}= \begin{bmatrix} f_{1i} \\ f_{2i} \\ \vdots \\ f_{mi} \\ \end{bmatrix} \end{align},\qquad i=1,...,n. \]

The basic idea behind factor analysis is that of regression, or conditional expectation, which means that we express each of the manifest (observed) variables as a linear combination of latent variables (or factors). Thus, if we have \(p\) manifest variables and \(m\) factors we may write

\[x_{ij} = \mu_j + f_{1i}\lambda_{j1}+ f_{2i}\lambda_{j2}+...+ f_{mi}\lambda_{jm}+ e_{ij}\qquad i = 1,2,...,n\text{,}\quad j = 1,2,...,p\text{,}\]

where the factors, \(F\), are assumed to have zero means and unit standard deviations. The error term \(e_j\) is also assumed to have zero mean, with standard deviation \(\sigma_j\), and \(\lambda_{j1}, \lambda_{j2}, ..., \lambda_{jm}\) are the loadings of the \(j\)-th variable on the \(m\) factors.

Expressed in matrix form this equation becomes

\[ \begin{align} \begin{bmatrix} x_{i1} & x_{i2} & \dots & x_{ip} \end{bmatrix}= \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \\ \end{bmatrix}^T+ \left( \begin{bmatrix} \lambda_{11} & \lambda_{12} & \dots & \lambda_{1m} \\ \lambda_{21} & \lambda_{22} & \dots & \lambda_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{p1} & \lambda_{p2} & \dots & \lambda_{pm} \\ \end{bmatrix} \begin{bmatrix} f_{1i} \\ f_{2i} \\ \vdots \\ f_{mi} \\ \end{bmatrix} \right)^T+ \begin{bmatrix} e_{i1} \\ e_{i2} \\ \vdots \\ e_{ip} \\ \end{bmatrix}^T \end{align} \]

for \(i=1,...,n\), which can be expressed more compactly in matrix notation:

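\[ \mathbf X = \mathbf{M}+ \mathbf F^T \mathbf \Lambda^T+\mathbf E \]

Here \(\mathbf F\) is the \(m \times n\) matrix of factor values, \(\mathbf \Lambda\) is the \(p \times m\) matrix of loadings, and \(\mathbf E\) is the \(n \times p\) matrix of error terms. Subtracting \(\mathbf M\) from both sides gives us the exploratory factor model: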

\[\mathbf X- \mathbf M = \mathbf F^T \mathbf \Lambda^T+ \mathbf E.\]
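To make the roles of the matrices concrete, the following minimal sketch generates data from the factor model row by row. It is an illustration only, not part of the original material: the function name `simulate_factor_model`, the example means, loadings and error standard deviations are all hypothetical, and the normal distribution for factors and errors is an added assumption (the text above only fixes their means and standard deviations). It uses plain Python from the standard library.

```python
import random


def simulate_factor_model(n, mu, loadings, error_sd, seed=0):
    """Draw n observations from X = M + F^T Lambda^T + E.

    mu:        list of the p variable means (mu_1, ..., mu_p)
    loadings:  p x m nested list, loadings[j][k] is lambda_{jk}
    error_sd:  list of the p error standard deviations
    Returns (X, Ft): X is n x p, Ft is n x m (the transposed factor matrix).
    """
    rng = random.Random(seed)
    p, m = len(loadings), len(loadings[0])
    X, Ft = [], []
    for _ in range(n):
        # Factor values of one observation: zero mean, unit standard deviation.
        # Normality is an assumption made for this sketch.
        f = [rng.gauss(0.0, 1.0) for _ in range(m)]
        row = []
        for j in range(p):
            # Linear combination of the factors for variable j.
            common = sum(loadings[j][k] * f[k] for k in range(m))
            # Error term with zero mean and standard deviation error_sd[j].
            e = rng.gauss(0.0, error_sd[j])
            row.append(mu[j] + common + e)
        X.append(row)
        Ft.append(f)
    return X, Ft


# Hypothetical example values: p = 3 manifest variables, m = 2 factors.
mu = [10.0, 20.0, 30.0]
loadings = [[0.9, 0.0], [0.7, 0.3], [0.0, 0.8]]
error_sd = [0.4, 0.5, 0.6]

X, Ft = simulate_factor_model(500, mu, loadings, error_sd)
col_means = [sum(row[j] for row in X) / len(X) for j in range(len(mu))]
print(len(X), len(X[0]))                  # dimensions n and p
print([round(v, 2) for v in col_means])   # should lie close to mu
```

Because the factors and the errors both have zero mean, the column means of the simulated \(\mathbf X\) should be close to \(\mu_1, ..., \mu_p\), which is why \(\mathbf X - \mathbf M\) is the quantity modelled by \(\mathbf F^T \mathbf \Lambda^T + \mathbf E\).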


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us by e-mail at soga[at]zedat.fu-berlin.de.

Creative Commons License
You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.