Factor Analysis
Factor Analysis (FA) is a linear-Gaussian latent variable model that is closely related to probabilistic PCA. In contrast to probabilistic PCA, the covariance of the conditional distribution of the observed variable given the latent variable is diagonal rather than isotropic [1].
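The distinction can be written out explicitly. The following is the standard linear-Gaussian formulation as given in [1]; the noise term $\boldsymbol{\varepsilon}$ and the scalar $\sigma^2$ are notation introduced here for illustration, with $\mathbf{W}$ the $d \times p$ loadings matrix, $\boldsymbol{\mu}$ the mean, and $\mathbf{\Psi}$ the noise covariance.

$$
\begin{aligned}
\mathbf{x} &= \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\varepsilon},
\qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Psi}) \\
\text{factor analysis:}\quad \mathbf{\Psi} &= \operatorname{diag}(\psi_1, \dots, \psi_d) \\
\text{probabilistic PCA:}\quad \mathbf{\Psi} &= \sigma^2 \mathbf{I}
\end{aligned}
$$

In both cases the conditional distribution is $\mathbf{x} \mid \mathbf{z} \sim \mathcal{N}(\mathbf{W}\mathbf{z} + \boldsymbol{\mu}, \mathbf{\Psi})$, so the two models differ only in whether each observed coordinate has its own noise variance.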
This package defines a FactorAnalysis type to represent a factor analysis model, and provides a set of methods to access its properties.
MultivariateStats.FactorAnalysis — Type
This type contains factor analysis model parameters.
The package provides a set of methods to access the properties of the factor analysis model. Let $M$ be an instance of FactorAnalysis, $d$ be the dimension of observations, and $p$ be the output dimension (i.e., the dimension of the principal subspace).
StatsAPI.fit — Method
fit(FactorAnalysis, X; ...)
Perform factor analysis over the data given in a matrix X. Each column of X is an observation. This method returns an instance of FactorAnalysis.
Keyword arguments:
Let (d, n) = size(X) be, respectively, the input dimension and the number of observations:
- method: The choice of method:
  - :em: use the EM version of factor analysis
  - :cm: use the CM version of factor analysis (default)
- maxoutdim: Maximum output dimension (default d-1)
- mean: The mean vector, which can be one of:
  - 0: the input data has already been centered
  - nothing: this function will compute the mean (default)
  - a pre-computed mean vector
- tol: Convergence tolerance (default 1.0e-6)
- maxiter: Maximum number of iterations (default 1000)
- η: Variance lower bound (default 1.0e-6)
Notes: This function calls facm or faem internally, depending on the choice of method.
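A minimal sketch of calling fit with the keyword arguments listed above. The synthetic data, the seed, and the names W_true, Xc, and M0 are illustrative assumptions, and the snippet has not been checked against a specific package version. Passing method=:em would select the EM variant instead of the default.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(1)                      # fixed seed so the synthetic data is reproducible
d, p, n = 6, 2, 1000                 # observed dimension, latent dimension, sample count

# Synthetic data (illustrative): two latent factors, additive noise, nonzero mean.
# Each column of X is one observation, as fit expects.
W_true = randn(d, p)
X = W_true * randn(p, n) .+ 0.5 .* randn(d, n) .+ collect(1.0:d)

# Default settings: CM algorithm, mean estimated from X.
M = fit(FactorAnalysis, X; maxoutdim=p)

# The same kind of fit on pre-centered data, telling fit not to estimate the mean.
Xc = X .- mean(X; dims=2)
M0 = fit(FactorAnalysis, Xc; maxoutdim=p, mean=0)

println(size(M))                     # (input dimension, output dimension)
```

maxoutdim is an upper bound, so the fitted output dimension should be read from size(M) rather than assumed.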
Base.size — Method
size(M::FactorAnalysis)
Returns a tuple with the input dimension $d$, i.e., the dimension of the observation space, and the output dimension $p$, i.e., the dimension of the principal subspace.
Statistics.mean — Method
mean(M::FactorAnalysis)
Get the mean vector (of length $d$).
Statistics.var — Method
var(M::FactorAnalysis)
Returns the variance of the model M.
Statistics.cov — Method
cov(M::FactorAnalysis)
Returns the covariance of the model M.
MultivariateStats.projection — Method
projection(M::FactorAnalysis)
Recovers principal components from the weight matrix of the model M.
MultivariateStats.loadings — Method
loadings(M::FactorAnalysis)
Returns the factor loadings matrix of the model M.
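The accessors above can be exercised together on a fitted model. This is a sketch on synthetic data: the data generation and all variable names are assumptions made for illustration, and only the documented accessor functions are called.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(2)
d, p, n = 6, 2, 1000

# Illustrative data with a two-factor structure; columns are observations.
X = randn(d, p) * randn(p, n) .+ 0.5 .* randn(d, n)
M = fit(FactorAnalysis, X; maxoutdim=p)

d_in, p_out = size(M)      # input and output dimensions
mu = mean(M)               # mean vector of length d
v = var(M)                 # variance of the model
C = cov(M)                 # covariance of the model
P = projection(M)          # principal components recovered from the weights
L = loadings(M)            # factor loadings matrix

println((d_in, p_out), " ", size(C), " ", size(L))
```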
Given a factor analysis model $M$, one can use it to transform observations into latent variables, as
$$\mathbf{z} = \mathbf{W}^T \mathbf{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$
or use it to reconstruct (approximately) the observations from latent variables, as
$$\tilde{\mathbf{x}} = \mathbf{\Sigma} \mathbf{W} (\mathbf{W}^T \mathbf{W})^{-1} \mathbf{z} + \boldsymbol{\mu}$$
Here, $\mathbf{W}$ is the $d \times p$ factor loadings (weight) matrix, $\mathbf{\Psi}$ is the diagonal noise covariance, and $\mathbf{\Sigma} = \mathbf{\Psi} + \mathbf{W} \mathbf{W}^T$ is the $d \times d$ covariance matrix.
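The two formulas can be checked numerically without the package. The sketch below uses only the standard library and hand-picked values for W, Ψ, and μ, all of which are illustrative. It assumes W is d × p, so that the covariance is formed as Ψ + W Wᵀ and is d × d.

```julia
using LinearAlgebra

d, p = 4, 2
W = [1.0 0.0; 0.5 1.0; 0.0 2.0; 1.0 1.0]   # d × p loadings (illustrative values)
Psi = Diagonal([0.5, 0.2, 0.3, 0.4])       # diagonal noise covariance
mu = [1.0, 2.0, 3.0, 4.0]                  # mean vector
Sigma = Psi + W * W'                       # d × d covariance

x = [2.0, 1.0, 4.0, 3.0]                   # one observation

# Transform: z = Wᵀ Σ⁻¹ (x - μ), using a linear solve instead of an explicit inverse.
z = W' * (Sigma \ (x - mu))

# Reconstruct: x̃ = Σ W (Wᵀ W)⁻¹ z + μ
x_rec = Sigma * W * ((W' * W) \ z) + mu

# Transforming the reconstruction gives back the same latent vector,
# because Wᵀ Σ⁻¹ Σ W (Wᵀ W)⁻¹ z = z.
z_again = W' * (Sigma \ (x_rec - mu))
println(z ≈ z_again)
```

The reconstruction x_rec is generally not equal to x, which is why the text calls it approximate; what is preserved is the latent representation.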
The package provides methods to do so:
StatsAPI.predict — Method
predict(M::FactorAnalysis, x)
Transform observations x into latent variables. Here, x can be either a vector of length d or a matrix where each column is an observation.
MultivariateStats.reconstruct — Method
reconstruct(M::FactorAnalysis, z)
Approximately reconstruct observations from the latent variable given in z. Here, z can be either a vector of length $p$ or a matrix where each column gives the latent variables for an observation.
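A short sketch of the round trip through predict and reconstruct, for both the matrix and the single-vector forms. The synthetic data and variable names are assumptions for illustration, and the snippet has not been checked against a specific package version.

```julia
using MultivariateStats, Random

Random.seed!(3)
d, p, n = 6, 2, 1000

# Illustrative data with a two-factor structure; columns are observations.
X = randn(d, p) * randn(p, n) .+ 0.5 .* randn(d, n)
M = fit(FactorAnalysis, X; maxoutdim=p)

Z = predict(M, X)             # one latent column per observation
z1 = predict(M, X[:, 1])      # a single observation passed as a vector

Xr = reconstruct(M, Z)        # approximate reconstruction, same shape as X
x1r = reconstruct(M, z1)      # reconstruction of the single observation

println(size(Z), " ", size(Xr))
```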
Auxiliary functions:
MultivariateStats.faem — Function
faem(S, mean, n; ...)
Performs factor analysis using an expectation-maximization algorithm for a given sample covariance matrix S [2].
Parameters
- S: The sample covariance matrix.
- mean: The mean vector of original samples, which can be a vector of length $d$, or an empty vector indicating a zero mean.
- n: The number of observations.
Returns the resultant FactorAnalysis model.
Note: This function accepts three keyword arguments: maxoutdim, tol, and maxiter.
MultivariateStats.facm — Function
facm(S, mean, n; ...)
Performs factor analysis using a fast conditional maximization algorithm for a given sample covariance matrix S [3].
Parameters
- S: The sample covariance matrix.
- mean: The mean vector of original samples, which can be a vector of length $d$, or an empty vector indicating a zero mean.
- n: The number of observations.
Returns the resultant FactorAnalysis model.
Note: This function accepts four keyword arguments: maxoutdim, tol, maxiter, and η.
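When only a covariance matrix is available, the auxiliary functions can be called directly. The sketch below calls facm with the three positional arguments described above. The synthetic data and variable names are illustrative, the module-qualified call is used on the assumption that the function may not be exported, and the snippet has not been checked against a specific package version. faem is documented with the same positional arguments.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(4)
d, p, n = 6, 2, 1000

# Illustrative data with a two-factor structure; columns are observations.
X = randn(d, p) * randn(p, n) .+ 0.5 .* randn(d, n)

S = cov(X; dims=2)            # d × d sample covariance across observations
mv = vec(mean(X; dims=2))     # mean vector of length d

# Fit from the covariance matrix with the conditional maximization algorithm.
M = MultivariateStats.facm(S, mv, n; maxoutdim=p)

println(size(M))
```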
References
- [1] Bishop, C. M. Pattern Recognition and Machine Learning, 2006.
- [2] Rubin, D. B., and Thayer, D. T. EM algorithms for ML factor analysis. Psychometrika 47(1), 69-76, 1982.
- [3] Zhao, J.-H., Yu, P. L. H., and Jiang, Q. ML estimation for factor analysis: EM or non-EM? Statistics and Computing 18(2), 109-123, 2008.