Factor Analysis

Factor Analysis (FA) is a linear-Gaussian latent variable model that is closely related to probabilistic PCA. In contrast to probabilistic PCA, the covariance of the conditional distribution of the observed variable given the latent variable is diagonal rather than isotropic^[1].
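
For reference, the model described above can be written out as follows. This is the standard textbook formulation rather than anything specific to this package; $\mathbf{z}$ denotes the $p$-dimensional latent variable, $\mathbf{x}$ the $d$-dimensional observation, $\mathbf{W}$ the $d \times p$ loadings matrix, and $\mathbf{\Psi}$ the diagonal noise covariance. The symbol $\psi_i$ for the diagonal entries is introduced here for illustration only.

```latex
\begin{aligned}
\mathbf{z} &\sim \mathcal{N}(\mathbf{0}, \mathbf{I}_p), \\
\mathbf{x} \mid \mathbf{z} &\sim \mathcal{N}(\mathbf{W}\mathbf{z} + \boldsymbol{\mu}, \mathbf{\Psi}),
  \qquad \mathbf{\Psi} = \operatorname{diag}(\psi_1, \dots, \psi_d), \\
\mathbf{x} &\sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{W}\mathbf{W}^T + \mathbf{\Psi}).
\end{aligned}
```

Probabilistic PCA corresponds to the special case $\mathbf{\Psi} = \sigma^2 \mathbf{I}$, where every observed dimension shares one noise variance.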

This package defines a FactorAnalysis type to represent a factor analysis model, and provides a set of methods to access the properties.

MultivariateStats.FactorAnalysis — Type

This type contains factor analysis model parameters.

The package provides a set of methods to access the properties of the factor analysis model. Let $M$ be an instance of FactorAnalysis, $d$ be the dimension of observations, and $p$ be the output dimension (i.e., the dimension of the principal subspace).

StatsAPI.fit — Method

fit(FactorAnalysis, X; ...)

Perform factor analysis over the data given in a matrix X. Each column of X is an observation. This method returns an instance of FactorAnalysis.

Keyword arguments:

Let (d, n) = size(X) be respectively the input dimension and the number of observations:

- method: The choice of methods:
  - :em: use the EM (expectation-maximization) version of factor analysis
  - :cm: use the CM (conditional maximization) version of factor analysis (default)
- maxoutdim: Maximum output dimension (default d-1)
- mean: The mean vector, which can be one of:
  - 0: the input data has already been centered
  - nothing: this function will compute the mean (default)
  - a pre-computed mean vector
- tol: Convergence tolerance (default 1.0e-6)
- maxiter: Maximum number of iterations (default 1000)
- η: Lower bound on the variance (default 1.0e-6)

Notes: This function calls facm or faem internally, depending on the choice of method.
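
As an illustration of the keyword arguments, here is a minimal sketch that fits a model with each method. The synthetic data, the seed, and the variable names (`W0`, `M_cm`, `M_em`) are assumptions made for this example; only `fit`, `FactorAnalysis`, `size`, and the keywords documented above come from the package.

```julia
using MultivariateStats
using Random

Random.seed!(42)

# Synthetic data (assumption): 5-dimensional observations driven by 2 factors,
# stored one observation per column, as fit expects.
d, n = 5, 200
W0 = randn(d, 2)
X = W0 * randn(2, n) .+ 0.1 .* randn(d, n)

# Default method (:cm), keeping at most 2 latent dimensions.
M_cm = fit(FactorAnalysis, X; maxoutdim=2)

# EM method with a looser tolerance and a smaller iteration budget.
M_em = fit(FactorAnalysis, X; method=:em, maxoutdim=2, tol=1.0e-4, maxiter=500)

println(size(M_cm))  # (input dimension, output dimension)
println(size(M_em))
```

Since maxoutdim is described as a maximum, the sketch does not assume the fitted output dimension equals 2 exactly.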

Base.size — Method

size(M::FactorAnalysis)

Returns a tuple with the input dimension $d$, i.e., the dimension of the observation space, and the output dimension $p$, i.e., the dimension of the principal subspace.

Statistics.mean — Method

mean(M::FactorAnalysis)

Get the mean vector (of length $d$).

Statistics.var — Method

var(M::FactorAnalysis)

Returns the variance of the model M.

Statistics.cov — Method

cov(M::FactorAnalysis)

Returns the covariance of the model M.

MultivariateStats.projection — Method

projection(M::FactorAnalysis)

Recovers principal components from the weight matrix of the model M.

MultivariateStats.loadings — Method

loadings(M::FactorAnalysis)

Returns the factor loadings matrix of the model M.
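
The accessor methods listed above can be exercised together, as in the following sketch. The data and variable names are assumptions for illustration. `loadings` is called with its module prefix only so that the example does not depend on whether the name is exported, and `Statistics` is loaded so that `mean` and `cov` are in scope.

```julia
using MultivariateStats
using Statistics   # brings mean and cov into scope
using Random

Random.seed!(1)

# Synthetic data with two underlying factors (assumption for illustration).
d, n = 6, 300
X = randn(d, 2) * randn(2, n) .+ 0.1 .* randn(d, n)

M = fit(FactorAnalysis, X; maxoutdim=2)

d_in, p = size(M)                   # input and output dimensions
μ = mean(M)                         # mean vector of length d
W = MultivariateStats.loadings(M)   # factor loadings matrix
Σ = cov(M)                          # covariance of the model
U = projection(M)                   # components recovered from the weight matrix

println((d_in, p))
println(size(W), " ", size(Σ), " ", size(U))
```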

Given a factor analysis model $M$, one can use it to transform observations into latent variables, as

$$\mathbf{z} = \mathbf{W}^T \mathbf{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

or use it to reconstruct (approximately) the observations from latent variables, as

$$\tilde{\mathbf{x}} = \mathbf{\Sigma} \mathbf{W} (\mathbf{W}^T \mathbf{W})^{-1} \mathbf{z} + \boldsymbol{\mu}$$

Here, $\mathbf{W}$ is the $d \times p$ factor loadings (weight) matrix, $\mathbf{\Psi}$ is the diagonal noise covariance, and $\mathbf{\Sigma} = \mathbf{\Psi} + \mathbf{W} \mathbf{W}^T$ is the $d \times d$ covariance matrix.
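
The two formulas can be checked numerically with plain linear algebra. The sketch below does not call the package. It uses hypothetical random values for $\mathbf{W}$, $\mathbf{\Psi}$, and $\boldsymbol{\mu}$, and takes $\mathbf{\Sigma}$ to be the $d \times d$ matrix $\mathbf{\Psi} + \mathbf{W}\mathbf{W}^T$ so that the dimensions in both formulas agree.

```julia
using LinearAlgebra
using Random

Random.seed!(0)

d, p = 4, 2
W = randn(d, p)                  # hypothetical loadings, d x p
Ψ = Diagonal(rand(d) .+ 0.1)     # hypothetical diagonal noise covariance
μ = randn(d)                     # hypothetical mean vector
Σ = Ψ + W * W'                   # covariance, d x d

x = randn(d)                     # one observation

# Transform: z = Wᵀ Σ⁻¹ (x - μ)
z = W' * (Σ \ (x - μ))

# Reconstruct: x_rec = Σ W (Wᵀ W)⁻¹ z + μ
x_rec = Σ * W * ((W' * W) \ z) + μ

# Transforming the reconstruction gives back the same latent vector,
# because Wᵀ Σ⁻¹ Σ W (Wᵀ W)⁻¹ is the identity.
z_again = W' * (Σ \ (x_rec - μ))
println(isapprox(z_again, z; atol=1e-8))
```

The reconstruction itself is only approximate: `x_rec` generally differs from `x`, while the latent vector survives the round trip.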

The package provides methods to do so:

StatsAPI.predict — Method

predict(M::FactorAnalysis, x)

Transform observations x into latent variables. Here, x can be either a vector of length $d$ or a matrix where each column is an observation.

MultivariateStats.reconstruct — Method

reconstruct(M::FactorAnalysis, z)

Approximately reconstruct observations from the latent variables given in z. Here, z can be either a vector of length $p$ or a matrix where each column gives the latent variables for an observation.
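
A short usage sketch of both methods follows. The synthetic data and variable names (`Z`, `Xr`, `z1`) are assumptions for illustration.

```julia
using MultivariateStats
using Random

Random.seed!(7)

# Synthetic data with two underlying factors (assumption for illustration).
d, n = 6, 300
X = randn(d, 2) * randn(2, n) .+ 0.1 .* randn(d, n)

M = fit(FactorAnalysis, X; maxoutdim=2)
p = size(M)[2]

Z = predict(M, X)           # latent variables, one column per observation
Xr = reconstruct(M, Z)      # approximate reconstruction of X

z1 = predict(M, X[:, 1])    # a single observation given as a vector

println(size(Z), " ", size(Xr), " ", length(z1))
```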

Auxiliary functions:

MultivariateStats.faem — Function

faem(S, mean, n; ...)

Performs factor analysis using an expectation-maximization algorithm for a given sample covariance matrix S^[2].

Parameters

- S: The sample covariance matrix.
- mean: The mean vector of original samples, which can be a vector of length $d$, or an empty vector indicating a zero mean.
- n: The number of observations.

Returns the resultant FactorAnalysis model.

Note: This function accepts three keyword arguments: maxoutdim, tol, and maxiter.

MultivariateStats.facm — Function

facm(S, mean, n; ...)

Performs factor analysis using a fast conditional maximization algorithm for a given sample covariance matrix S^[3].

Parameters

- S: The sample covariance matrix.
- mean: The mean vector of original samples, which can be a vector of length $d$, or an empty vector indicating a zero mean.
- n: The number of observations.

Returns the resultant FactorAnalysis model.

Note: This function accepts four keyword arguments: maxoutdim, tol, maxiter, and η.

References

[1] Bishop, C. M. Pattern Recognition and Machine Learning, 2006.
[2] Rubin, Donald B., and Dorothy T. Thayer. EM algorithms for ML factor analysis. Psychometrika 47.1, 69-76, 1982.
[3] Zhao, J-H., Philip LH Yu, and Qibao Jiang. ML estimation for factor analysis: EM or non-EM? Statistics and Computing 18.2, 109-123, 2008.