In practice we do not know \(\mathbf \Sigma\), but we can estimate it by the sample covariance matrix \(\mathbf S\) (or by the sample correlation matrix \(\mathbf R\) if we want to work with standardized data). Therefore we seek estimates \(\hat {\mathbf\Lambda}\) and \(\hat {\mathbf\Psi}\) such that \(\hat{\mathbf\Lambda}\hat{\mathbf\Lambda}^T+\hat{\mathbf\Psi}\) is as close to \(\mathbf{S}\) as possible. In other words, we try to make the discrepancy between the sample covariance matrix \(\mathbf{S}\) and the model matrix \(\mathbf\Sigma = \mathbf\Lambda\mathbf\Lambda^T+\mathbf\Psi\) as small as possible.

Of the various parameter estimation methods we briefly discuss the maximum likelihood method, as it is the estimation method implemented in the R function factanal(). Maximum likelihood estimates are obtained by iteration, a process in which \(\hat{\mathbf\Lambda}\) and \(\hat{\mathbf\Psi}\) are systematically altered to make the likelihood function larger and larger. This approach assumes that the vector \(\mathbf x\) follows a multivariate normal distribution, that is:
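As a quick illustration of what factanal() returns, the sketch below fits a two-factor model by maximum likelihood to the built-in swiss data set (the choice of data set, number of factors, and rotation are our own and not part of the text):

```r
# Maximum likelihood factor analysis of the built-in swiss data
# (6 socio-economic variables for 47 Swiss provinces)
fa <- factanal(swiss, factors = 2, rotation = "none")

fa$loadings      # the estimate of Lambda (a p x m matrix of loadings)
fa$uniquenesses  # the diagonal elements of the estimate of Psi
fa$method        # "mle": estimation by maximum likelihood
```

The uniquenesses are reported on the correlation scale, since factanal() works with the sample correlation matrix \(\mathbf R\) internally.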

\[f(\mathbf x) = \vert 2 \pi\mathbf\Sigma\vert^{-1/2}e^{-\frac{1}{2}(\mathbf{x}-\mathbf\mu)^T\mathbf\Sigma^{-1}(\mathbf{x}-\mathbf\mu)}\] The likelihood for a sample of \(n\) units is then

\[\mathcal L(\mathbf{x,\mu,\Sigma}) = \vert 2 \pi\mathbf\Sigma\vert^{-n/2}e^{-\frac{1}{2}\sum_{j=1}^n(\mathbf{x_j}-\mathbf\mu)^T\mathbf\Sigma^{-1}(\mathbf{x_j}-\mathbf\mu)}\]

and the log-likelihood

\[\ln \mathcal L(\mathbf{x,\mu,\Sigma}) = -\frac{n}{2}\ln \vert 2 \pi\mathbf\Sigma\vert -\frac{1}{2}\sum_{j=1}^n(\mathbf{x_j}-\mathbf\mu)^T\mathbf\Sigma^{-1}(\mathbf{x_j}-\mathbf\mu)\]
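The log-likelihood above can be coded directly in base R. The following is a minimal sketch (the function name loglik is our own), using the fact that \(\vert 2\pi\mathbf\Sigma\vert = (2\pi)^p\vert\mathbf\Sigma\vert\):

```r
# Log-likelihood of a multivariate normal sample
# X: n x p data matrix, mu: mean vector, Sigma: p x p covariance matrix
loglik <- function(X, mu, Sigma) {
  n <- nrow(X)
  Sinv <- solve(Sigma)
  # sum of the quadratic forms (x_j - mu)' Sigma^{-1} (x_j - mu)
  quad <- sum(apply(X, 1, function(x) t(x - mu) %*% Sinv %*% (x - mu)))
  -(n / 2) * log(det(2 * pi * Sigma)) - 0.5 * quad
}

# Sanity check in one dimension: must match sum(dnorm(..., log = TRUE))
x <- c(-1, 0, 2.5)
loglik(matrix(x), mu = 0, Sigma = matrix(4))
sum(dnorm(x, mean = 0, sd = 2, log = TRUE))
```

In one dimension with variance 4 (standard deviation 2) the two results agree, which confirms the formula.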

If the factor model holds then \(\mathbf \Sigma = \mathbf \Lambda \mathbf \Lambda^T+ \mathbf \Psi\) and the log-likelihood becomes:

\[\ln \mathcal L(\mathbf{x,\Lambda,\Psi}) = -\frac{n}{2}\ln \vert 2 \pi(\mathbf{\Lambda\Lambda}^T+\mathbf{\Psi}) \vert -\frac{n}{2} \text{tr}\left[(\mathbf{\Lambda\Lambda}^T+\mathbf\Psi)^{-1}\mathbf{S}\right]\] where \(\mathbf\mu\) has been replaced by its maximum likelihood estimate, the sample mean \(\bar{\mathbf x}\). Note that in linear algebra \(\text{tr}\) stands for the trace of a square matrix and is defined to be the sum of the elements on the main diagonal (the diagonal from the upper left to the lower right).
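The step from the sum of quadratic forms to the trace term relies on the identity \(\sum_{j=1}^n(\mathbf{x_j}-\bar{\mathbf x})^T\mathbf\Sigma^{-1}(\mathbf{x_j}-\bar{\mathbf x}) = n\,\text{tr}(\mathbf\Sigma^{-1}\mathbf S)\), where \(\mathbf S\) is computed with divisor \(n\). A small numerical check in R (the data are simulated and the particular \(\mathbf\Sigma\) is arbitrary; note that cov() uses divisor \(n-1\), so we rescale):

```r
set.seed(1)
X <- matrix(rnorm(50 * 3), nrow = 50)  # n = 50 observations, p = 3 variables
n <- nrow(X)
xbar <- colMeans(X)
S <- cov(X) * (n - 1) / n              # sample covariance with divisor n
Sigma <- diag(3) + 0.5                 # an arbitrary positive definite Sigma
Sinv <- solve(Sigma)

# left-hand side: sum of quadratic forms
quad <- sum(apply(X, 1, function(x) t(x - xbar) %*% Sinv %*% (x - xbar)))
# right-hand side: n * trace(Sigma^{-1} S)
c(quad, n * sum(diag(Sinv %*% S)))     # the two values agree
```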

Finally, the log-likelihood function must be maximized with respect to \(\mathbf\Lambda\) and \(\mathbf\Psi\). If one or more elements of the estimate of \(\mathbf\Psi\) become negative, known as a Heywood case, the elements of the estimate of \(\mathbf\Psi\) are constrained to be non-negative. A major benefit of fitting the factor model by maximum likelihood is that we may evaluate the number of factors to be included in the model by applying a likelihood ratio test, whose statistic follows a \(\chi^2\) distribution with \(\frac{1}{2}[(p-m)^2-(p+m)]\) degrees of freedom.
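For example, with \(p = 6\) observed variables and \(m = 2\) factors the test has \(\frac{1}{2}[(6-2)^2-(6+2)] = 4\) degrees of freedom, which matches what factanal() reports (the swiss data set is our own choice of example):

```r
p <- 6
m <- 2
df <- ((p - m)^2 - (p + m)) / 2        # degrees of freedom of the chi^2 test
df                                     # 4

fa <- factanal(swiss, factors = m)
fa$dof        # degrees of freedom used by factanal's likelihood ratio test
fa$STATISTIC  # the test statistic
fa$PVAL       # its p-value; small values suggest m factors are not sufficient
```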


Citation

The E-Learning project SOGA-R was developed at the Department of Earth Sciences by Kai Hartmann, Joachim Krois and Annette Rudolph. You can reach us via e-mail at soga[at]zedat.fu-berlin.de.

You may use this project freely under the Creative Commons Attribution-ShareAlike 4.0 International License.

Please cite as follows: Hartmann, K., Krois, J., Rudolph, A. (2023): Statistics and Geodata Analysis using R (SOGA-R). Department of Earth Sciences, Freie Universitaet Berlin.