Canonical Correlation Analysis
Canonical Correlation Analysis (CCA) is a statistical technique for identifying correlations between two sets of variables. Given two vector variables $X$ and $Y$, it finds two projections, one for each, that map them into a common space in which the projected variables are maximally correlated.
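As a sketch of what "maximally correlated" means here, the first pair of projection directions can be written as the maximizer below. The symbols $\mathbf{a}$ and $\mathbf{b}$ are introduced only for illustration; $\mathbf{C}_{xx}$, $\mathbf{C}_{yy}$ and $\mathbf{C}_{xy}$ denote the covariance matrix of $X$, the covariance matrix of $Y$, and the covariance matrix between $X$ and $Y$.
\[(\mathbf{a}_1, \mathbf{b}_1) = \arg\max_{\mathbf{a}, \mathbf{b}} \frac{\mathbf{a}^T \mathbf{C}_{xy} \mathbf{b}}{\sqrt{\mathbf{a}^T \mathbf{C}_{xx} \mathbf{a}} \, \sqrt{\mathbf{b}^T \mathbf{C}_{yy} \mathbf{b}}}\]
Each subsequent pair maximizes the same quotient subject to its projections being uncorrelated with those of the earlier pairs. The maximized values are the canonical correlations.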
The package defines a CCA type to represent a CCA model and provides a set of methods to access its properties.
MultivariateStats.CCA — Type
Canonical Correlation Analysis Model
Let M be an instance of CCA, dx the dimension of X, dy the dimension of Y, and p the output dimension (i.e., the dimension of the common space).
StatsAPI.fit — Method
fit(CCA, X, Y; ...)
Perform CCA over the data given in matrices X and Y. Each column of X and Y is an observation.
X and Y should have the same number of columns (denoted by n below).
This method returns an instance of CCA.
Keyword arguments:
- method: The choice of method:
  - :cov: based on covariance matrices
  - :svd: based on SVD of the input data (default)
- outdim: The output dimension, i.e., the dimension of the common space (default: min(dx, dy, n))
- mean: The mean vector, which can be either of:
  - 0: the input data has already been centralized
  - nothing: this function will compute the mean (default)
  - a pre-computed mean vector
Notes: This function calls ccacov or ccasvd internally, depending on the choice of method.
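A minimal sketch of calling fit with both methods is shown below. The synthetic data, the mixing matrix A, and the chosen sizes are assumptions made purely for illustration; only fit, CCA, and the method and outdim keywords come from the description above.

```julia
using MultivariateStats
using Random

Random.seed!(42)                  # reproducible synthetic data

dx, dy, n = 5, 4, 200
X = randn(dx, n)                  # each column is one observation of X
A = randn(dy, dx)                 # hypothetical linear link between X and Y
Y = A * X + 0.5 * randn(dy, n)    # Y shares structure with X, plus noise

# Default method (:svd), keeping a 3-dimensional common space
M_svd = fit(CCA, X, Y; outdim=3)

# Same data, using the covariance-based method instead
M_cov = fit(CCA, X, Y; method=:cov, outdim=3)

println(size(M_svd))              # tuple (dx, dy, p)
println(size(M_cov))
```

Both calls should describe the same model up to numerical differences; which method is preferable depends on the data, and the SVD route is the default.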
Base.size — Method
size(M::CCA)
Return a tuple (dx, dy, p) with the dimension of X, the dimension of Y, and the output dimension.
Statistics.mean — Method
mean(M::CCA, c::Symbol)
Get the mean vector for the component c of the model M. The component parameter can be :x or :y.
MultivariateStats.projection — Method
projection(M::CCA, c::Symbol)
Get the projection matrix for the component c of the model M. The component parameter can be :x or :y.
Statistics.cor — Method
cor(M::CCA)
The correlations of the projected components (a vector of length p).
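The accessors above can be exercised together on a fitted model. The following is a sketch on made-up data (the matrices and sizes are assumptions for illustration); the shapes noted in the comments follow from the definitions of dx, dy and p.

```julia
using MultivariateStats
using Statistics                  # provides the mean and cor generics
using Random

Random.seed!(1)
X = randn(4, 100)                 # dx = 4, n = 100
Y = randn(3, 4) * X + randn(3, 100)   # dy = 3, hypothetical dependence on X

M = fit(CCA, X, Y; outdim=2)

dx, dy, p = size(M)               # dimensions of X, Y and the common space
mux = mean(M, :x)                 # mean vector of X, length dx
muy = mean(M, :y)                 # mean vector of Y, length dy
Px  = projection(M, :x)           # projection matrix for X, dx × p
Py  = projection(M, :y)           # projection matrix for Y, dy × p
rho = cor(M)                      # correlations of the projected components

println((dx, dy, p))
println(size(Px), " ", size(Py), " ", length(rho))
```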
StatsAPI.predict — Method
predict(M::CCA, Z::AbstractVecOrMat{<:Real}, c::Symbol)
Given a CCA model, one can transform observations from either space into the common space, as
\[\mathbf{z}_x = \mathbf{P}_x^T (\mathbf{x} - \boldsymbol{\mu}_x) \\ \mathbf{z}_y = \mathbf{P}_y^T (\mathbf{y} - \boldsymbol{\mu}_y)\]
Here, $\mathbf{P}_x$ and $\mathbf{P}_y$ are projection matrices for $X$ and $Y$; $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ are mean vectors.
The parameter Z can be either a vector (of length dx when c is :x, or dy when c is :y) or a matrix where each column is an observation. The component parameter c can be :x or :y.
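To make the transformation concrete, the sketch below compares predict with the formula written out by hand using projection and mean. The data are synthetic and the variable names are illustrative assumptions.

```julia
using MultivariateStats
using Statistics
using Random

Random.seed!(7)
X = randn(4, 150)
Y = randn(3, 4) * X + randn(3, 150)   # hypothetical dependence on X

M = fit(CCA, X, Y; outdim=2)

Zx  = predict(M, X, :x)           # matrix input: one projected column per observation
Zy  = predict(M, Y, :y)
zx1 = predict(M, X[:, 1], :x)     # vector input: a single observation

# The same transform spelled out: P_x' * (x - mu_x), applied column-wise
Zx_manual = projection(M, :x)' * (X .- mean(M, :x))
println(isapprox(Zx, Zx_manual))

# On the training data, the correlation of the first pair of projected
# components is expected to be close to the first entry of cor(M)
r1 = cor(Zx[1, :], Zy[1, :])
println((r1, cor(M)[1]))
```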
Auxiliary functions:
MultivariateStats.ccacov — Function
ccacov(Cxx, Cyy, Cxy, xmean, ymean, p)
Compute CCA based on analysis of the given covariance matrices, using generalized eigenvalue decomposition, and return a CCA model.
Parameters:
- Cxx: The covariance matrix of X.
- Cyy: The covariance matrix of Y.
- Cxy: The covariance matrix between X and Y.
- xmean: The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector indicating a zero mean.
- ymean: The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector indicating a zero mean.
- p: The output dimension, i.e., the dimension of the common space.
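As a sketch of how ccacov might be called directly, the example below builds the three covariance matrices by hand from centered data. The 1/(n - 1) normalization and the synthetic data are assumptions for illustration. The function is called with its module prefix so the example does not depend on whether it is exported.

```julia
using MultivariateStats
using Statistics
using Random

Random.seed!(3)
X = randn(4, 300)
Y = randn(3, 4) * X + randn(3, 300)   # hypothetical dependence on X
n = size(X, 2)

# Center the data and form sample covariance matrices
xmean = vec(mean(X, dims=2))
ymean = vec(mean(Y, dims=2))
Zx = X .- xmean
Zy = Y .- ymean
Cxx = (Zx * Zx') / (n - 1)
Cyy = (Zy * Zy') / (n - 1)
Cxy = (Zx * Zy') / (n - 1)

M = MultivariateStats.ccacov(Cxx, Cyy, Cxy, xmean, ymean, 2)

# For comparison: the high-level entry point with method=:cov
M_fit = fit(CCA, X, Y; method=:cov, outdim=2)
println(isapprox(cor(M), cor(M_fit)))
```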
MultivariateStats.ccasvd — Function
ccasvd(Zx, Zy, xmean, ymean, p)
Compute CCA based on singular value decomposition of centralized sample matrices Zx and Zy, and return a CCA model [1].
Parameters:
- Zx: The centralized sample matrix for X.
- Zy: The centralized sample matrix for Y.
- xmean: The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector indicating a zero mean.
- ymean: The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector indicating a zero mean.
- p: The output dimension, i.e., the dimension of the common space.
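A corresponding sketch for ccasvd passes the centered sample matrices rather than covariance matrices. As before, the data and variable names are illustrative assumptions, and the function is called with its module prefix so the example does not depend on whether it is exported.

```julia
using MultivariateStats
using Statistics
using Random

Random.seed!(5)
X = randn(4, 300)
Y = randn(3, 4) * X + randn(3, 300)   # hypothetical dependence on X

# Center each variable; the means are kept so the model can center new data
xmean = vec(mean(X, dims=2))
ymean = vec(mean(Y, dims=2))
Zx = X .- xmean
Zy = Y .- ymean

M = MultivariateStats.ccasvd(Zx, Zy, xmean, ymean, 2)

# For comparison: the high-level entry point with the default method (:svd)
M_fit = fit(CCA, X, Y; outdim=2)
println(isapprox(cor(M), cor(M_fit)))
```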
References
- [1] David Weenink, Canonical Correlation Analysis, Institute of Phonetic Sciences, Univ. of Amsterdam, Proceedings 25, 81-99, 2003.