In the R software factor analysis is implemented by the factanal() function of the build-in stats package. The function performs maximum-likelihood factor analysis on a covariance matrix or data matrix. The number of factors to be fitted is specified by the argument factors. Further, the factor scores may be calculated either by using Thompson’s estimator or Bartlett’s weighted least-squares scores method. The particular method is specified by an additional argument scores = "regression" or scores = "Bartlett". Moreover, by the additional argument rotation the transformation of the factors may be specified by either rotation = "varimax" for orthogonal rotation, rotation = "Bartlett" for oblique rotation or rotation = "none".


The data set

Let us get our hands dirty and apply a factor analysis on a the food-texture data set. We already discussed the data set in the section on principal component analysis, so you are probably familiar with the data set.

This open source data set is available here and describes texture measurements of a pastry-type food.

food <- read.csv("https://userpage.fu-berlin.de/soga/300/30100_data_sets/food-texture.csv",
                 row.names = "X")
str(food)
## 'data.frame':    50 obs. of  5 variables:
##  $ Oil     : num  16.5 17.7 16.2 16.7 16.3 19.1 18.4 17.5 15.7 16.4 ...
##  $ Density : int  2955 2660 2870 2920 2975 2790 2750 2770 2955 2945 ...
##  $ Crispy  : int  10 14 12 10 11 13 13 10 11 11 ...
##  $ Fracture: int  23 9 17 31 26 16 17 26 23 24 ...
##  $ Hardness: int  97 139 143 95 143 189 114 63 123 132 ...

The data set consists of 50 rows (observations) and 5 columns (features/variables). The features are:


The factanal() function call

In addition to the data set the factanal() function requires an estimate of the number of factors (factanal(data, factors = n)). This is a tricky aspect of factor analysis. If we have a hypothesis about the latent variables we may start with an informed guess. If we do not have any clue about the number of factors and the number of variables in the data set is not too large, one may simply try out several values for initializing the model. Another, more sophisticated approach is to use principal components analysis to get a good initial estimate of the number of factors.

In this simple example we just make a guess and set the number of factor to be \(2\). Further, we keep the defaults for the scores (score = "none") and the rotation (rotation = "varimax").

food.fa <- factanal(food, factors = 2)

Interpretation of the results

Before we interpret the results of the factor analysis recall the basic idea behind it. Factor analysis creates linear combinations of factors to abstract the variable’s underlying communality. To the extent that the variables have an underlying communality, fewer factors capture most of the variance in the data set. This allows us to aggregate a large number of observable variables in a model to represent an underlying concept, making it easier to understand the data. The variability in our data, \(\mathbf X\), is given by \(\Sigma\), and its estimate \(\hat \Sigma\) is composed of the variability explained by the factors explained be a linear combination of the factors (communality) and of the variability, which can not be explained by a linear combination of the factors (uniqueness).

\[\hat \Sigma = \underbrace{\hat\Lambda\hat\Lambda^T}_{\text{communality}} + \underbrace{\hat\Psi}_{\text{uniqueness}}\]

food.fa
## 
## Call:
## factanal(x = food, factors = 2)
## 
## Uniquenesses:
##      Oil  Density   Crispy Fracture Hardness 
##    0.334    0.156    0.042    0.256    0.407 
## 
## Loadings:
##          Factor1 Factor2
## Oil      -0.816         
## Density   0.919         
## Crispy   -0.745   0.635 
## Fracture  0.645  -0.573 
## Hardness          0.764 
## 
##                Factor1 Factor2
## SS loadings      2.490   1.316
## Proportion Var   0.498   0.263
## Cumulative Var   0.498   0.761
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 0.27 on 1 degree of freedom.
## The p-value is 0.603
food.fa$uniquenesses
##       Oil   Density    Crispy  Fracture  Hardness 
## 0.3338599 0.1555255 0.0422238 0.2560235 0.4069459
apply(food.fa$loadings^2,1,sum) # communality
##       Oil   Density    Crispy  Fracture  Hardness 
## 0.6661398 0.8444745 0.9577762 0.7439766 0.5930539
1 - apply(food.fa$loadings^2,1,sum) # uniqueness
##       Oil   Density    Crispy  Fracture  Hardness 
## 0.3338602 0.1555255 0.0422238 0.2560234 0.4069461

The residual matrix

Recall the factor analysis model: \(\hat \Sigma = \hat\Lambda\hat\Lambda^T +\hat\Psi\).

Using our factor model food.fa we may calculate \(\hat \Sigma\) and compare it to the observed correlation matrix, \(S\), by simple matrix algebra. The %*% operator performs matrix multiplication. The t() function transposes a matrix. The diag() function takes a vector of \(k\) numbers and creates a \(k \times k\) matrix with the numbers on the diagonal and \(0\)s everywhere else.

Lambda <- food.fa$loadings
Psi <- diag(food.fa$uniquenesses)
S <- food.fa$correlation
Sigma <- Lambda %*% t(Lambda) + Psi

We now subtract the fitted correlation matrix, (Sigma) from our observed correlation matrix (S). We also round the result to 6 digits.

round(S - Sigma, 6)
##                Oil   Density    Crispy  Fracture  Hardness
## Oil       0.000000  0.000001 -0.002613 -0.018220 -0.000776
## Density   0.000001  0.000000 -0.001081 -0.007539 -0.000320
## Crispy   -0.002613 -0.001081  0.000000  0.000000  0.000005
## Fracture -0.018220 -0.007539  0.000000  0.000000  0.000033
## Hardness -0.000776 -0.000320  0.000005  0.000033  0.000000

The resulting matrix is called the residual matrix. Numbers close to \(0\) indicate that our factor model is a good representation of the underlying concept.


Interpretation of the factors

The purpose of a rotation is to produce factors with a mix of high and low loadings and few moderate-sized loadings. The idea is to give meaning to the factors, which helps interpret them. From a mathematical viewpoint, there is no difference between a rotated and unrotated matrix. The fitted model is the same, the uniquenesses are the same, and the proportion of variance explained is the same.

Let us fit three factor models, one with no rotation, one with varimax rotation, and one with promax rotation, and make a scatter plot of the first and second loadings.

food.fa.none <- factanal(food, factors = 2, rotation = "none")
food.fa.varimax <- factanal(food, factors = 2, rotation = "varimax")
food.fa.promax <- factanal(food, factors = 2, rotation = "promax")

par(mfrow = c(1,3))
plot(food.fa.none$loadings[,1], 
     food.fa.none$loadings[,2],
     xlab = "Factor 1", 
     ylab = "Factor 2", 
     ylim = c(-1,1),
     xlim = c(-1,1),
     main = "No rotation")
abline(h = 0, v = 0)

plot(food.fa.varimax$loadings[,1], 
     food.fa.varimax$loadings[,2],
     xlab = "Factor 1", 
     ylab = "Factor 2", 
     ylim = c(-1,1),
     xlim = c(-1,1),
     main = "Varimax rotation")

text(food.fa.varimax$loadings[,1]-0.08, 
     food.fa.varimax$loadings[,2]+0.08,
      colnames(food),
      col="blue")
abline(h = 0, v = 0)

plot(food.fa.promax$loadings[,1], 
     food.fa.promax$loadings[,2],
     xlab = "Factor 1", 
     ylab = "Factor 2",
     ylim = c(-1,1),
     xlim = c(-1,1),
     main = "Promax rotation")
abline(h = 0, v = 0)

Now comes the tricky aspect in factor analysis: Interpreting the factors themselves. If two variables both have large loadings for the same factor, then we know they have something in common. As a researcher we have to understand the data and its meaning in order to give a name to that common ground. Taking a look on the figures above is appears that factor 1 accounts for pastry, which is dense and can be bend a lot before it breaks. Whereas factor 2 accounts for pastry that crispy and hard to break apart. So if we need to names these factors we would probably call them soft pastry (factor 1) and hard pastry (factor 2).