Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) is a statistical technique for identifying correlations between two sets of variables. Given two vector variables $X$ and $Y$, it finds two projections, one for each, that map them into a common space in which their correlation is maximized.
The package defines a CCA type to represent a CCA model, and provides a set of methods to access its properties.
MultivariateStats.CCA — Type
Canonical Correlation Analysis Model
Let M be an instance of CCA, dx the dimension of X, dy the dimension of Y, and p the output dimension (i.e., the dimension of the common space).
StatsAPI.fit — Method
fit(CCA, X, Y; ...)
Perform CCA over the data given in matrices X and Y. Each column of X and Y is an observation.
X and Y should have the same number of columns (denoted by n below).
This method returns an instance of CCA.
Keyword arguments:
- method: The choice of methods:
  - :cov: based on covariance matrices
  - :svd: based on SVD of the input data (default)
- outdim: The output dimension, i.e., the dimension of the common space (default: min(dx, dy, n))
- mean: The mean vector, which can be one of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean (default)
  - a pre-computed mean vector
Notes: This function calls ccacov or ccasvd internally, depending on the choice of method.
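As a minimal sketch of the call described above, the following fits a CCA model to synthetic data. The data, the mixing matrix A, and the noise level are illustrative assumptions; only fit, its keyword arguments, and size come from this page, and behaviour may differ between package versions.

```julia
using Random
using MultivariateStats

Random.seed!(1)

# Synthetic data (an assumption for illustration): each column is an observation.
n = 200
X = randn(5, n)                      # dx = 5
A = randn(3, 5)                      # hypothetical mixing matrix
Y = A * X .+ 0.5 .* randn(3, n)      # dy = 3, linearly related to X plus noise

# Default settings: SVD-based method, mean computed from the data.
M = fit(CCA, X, Y)

# Covariance-based method with a two-dimensional common space.
M2 = fit(CCA, X, Y; method=:cov, outdim=2)

println(size(M))     # (dx, dy, p)
println(size(M2))
```

Passing mean=0 instead would tell fit that X and Y have already been centralized.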
Base.size — Method
size(M::CCA)
Return a tuple with the dimension of X, the dimension of Y, and the output dimension.
Statistics.mean — Method
mean(M::CCA, c::Symbol)
Get the mean vector for the component c of the model M. The component parameter can be :x or :y.
MultivariateStats.projection — Method
projection(M::CCA, c::Symbol)
Get the projection matrix for the component c of the model M. The component parameter can be :x or :y.
Statistics.cor — Method
cor(M::CCA)
The correlations of the projected components (a vector of length p).
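The accessors above can be exercised together as in the following sketch. The synthetic data is an assumption made for illustration; the function names are the ones listed on this page.

```julia
using Random
using Statistics          # mean and cor are extended for CCA models
using MultivariateStats

Random.seed!(2)

# Synthetic data (illustrative assumption): columns are observations.
X = randn(4, 150)
Y = randn(2, 4) * X .+ 0.3 .* randn(2, 150)

M = fit(CCA, X, Y)
dx, dy, p = size(M)       # dimensions of X, Y, and the common space

μx = mean(M, :x)          # mean vector of X, length dx
Py = projection(M, :y)    # projection matrix for Y, dy rows and p columns
ρ  = cor(M)               # correlations of the projected components, length p

println((dx, dy, p))
println(ρ)
```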
StatsAPI.predict — Method
predict(M::CCA, Z::AbstractVecOrMat{<:Real}, c::Symbol)
Given a CCA model, one can transform observations from either space into the common space, as
\[\begin{aligned} \mathbf{z}_x &= \mathbf{P}_x^T (\mathbf{x} - \boldsymbol{\mu}_x) \\ \mathbf{z}_y &= \mathbf{P}_y^T (\mathbf{y} - \boldsymbol{\mu}_y) \end{aligned}\]
Here, $\mathbf{P}_x$ and $\mathbf{P}_y$ are projection matrices for $X$ and $Y$; $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ are mean vectors.
The parameter Z can be either a vector of length dx (for c = :x) or dy (for c = :y), or a matrix in which each column is an observation. The component parameter c can be :x or :y.
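To connect the formula with the API, the sketch below calls predict and then recomputes the same projection by hand from projection and mean. The synthetic data and the variable names are assumptions made for illustration.

```julia
using Random
using Statistics
using MultivariateStats

Random.seed!(3)

# Synthetic data (illustrative assumption): columns are observations.
X = randn(4, 300)
Y = randn(3, 4) * X .+ 0.5 .* randn(3, 300)

M = fit(CCA, X, Y; outdim=2)

# Project whole data sets into the common space (one column per observation).
Zx = predict(M, X, :x)
Zy = predict(M, Y, :y)

# The same transformation written out as P_x' * (x - μ_x).
Zx_manual = projection(M, :x)' * (X .- mean(M, :x))

# A single observation is passed as a vector of length dx.
zx1 = predict(M, X[:, 1], :x)

println(size(Zx))
println(maximum(abs.(Zx .- Zx_manual)))   # expected to be close to zero
```

For the data the model was fitted on, the correlation between matching rows of Zx and Zy is expected to agree with the entries of cor(M) up to rounding.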
Auxiliary functions:
MultivariateStats.ccacov — Function
ccacov(Cxx, Cyy, Cxy, xmean, ymean, p)
Compute CCA based on analysis of the given covariance matrices, using generalized eigenvalue decomposition, and return a CCA model.
Parameters:
- Cxx: The covariance matrix of X.
- Cyy: The covariance matrix of Y.
- Cxy: The covariance matrix between X and Y.
- xmean: The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector indicating a zero mean.
- ymean: The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector indicating a zero mean.
- p: The output dimension, i.e., the dimension of the common space.
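A possible way to prepare the inputs listed above is sketched here, using cov and mean from the Statistics standard library. The synthetic data is an assumption for illustration, and the function is called with its module prefix so the sketch does not depend on whether it is exported in a given version.

```julia
using Random
using Statistics
using MultivariateStats

Random.seed!(4)

# Synthetic data (illustrative assumption): columns are observations.
X = randn(4, 250)
Y = randn(3, 4) * X .+ 0.5 .* randn(3, 250)

# Means and covariances; dims=2 because each column is an observation.
xmean = vec(mean(X; dims=2))
ymean = vec(mean(Y; dims=2))
Cxx = cov(X; dims=2)
Cyy = cov(Y; dims=2)
Cxy = cov(X, Y; dims=2)       # cross-covariance, dx rows and dy columns

# Request a two-dimensional common space.
M = MultivariateStats.ccacov(Cxx, Cyy, Cxy, xmean, ymean, 2)

println(size(M))
println(cor(M))
```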
MultivariateStats.ccasvd — Function
ccasvd(Zx, Zy, xmean, ymean, p)
Compute CCA based on singular value decomposition of centralized sample matrices Zx and Zy, and return a CCA model [1].
Parameters:
- Zx: The centralized sample matrix for X.
- Zy: The centralized sample matrix for Y.
- xmean: The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector indicating a zero mean.
- ymean: The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector indicating a zero mean.
- p: The output dimension, i.e., the dimension of the common space.
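The following sketch centralizes two synthetic sample matrices by hand and passes them to ccasvd, then fits the same data through fit with method=:svd for comparison. The data and variable names are assumptions made for illustration.

```julia
using Random
using Statistics
using MultivariateStats

Random.seed!(5)

# Synthetic data (illustrative assumption): columns are observations.
X = randn(4, 250)
Y = randn(3, 4) * X .+ 0.5 .* randn(3, 250)

# Centralize each data set by subtracting its mean vector from every column.
xmean = vec(mean(X; dims=2))
ymean = vec(mean(Y; dims=2))
Zx = X .- xmean
Zy = Y .- ymean

M = MultivariateStats.ccasvd(Zx, Zy, xmean, ymean, 2)

# For comparison: the high-level entry point with the same settings.
M_fit = fit(CCA, X, Y; method=:svd, outdim=2)

println(cor(M))
println(cor(M_fit))
```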
References
- [1] David Weenink, Canonical Correlation Analysis, Institute of Phonetic Sciences, Univ. of Amsterdam, Proceedings 25, 81-99, 2003.