Canonical Correlation Analysis

Canonical Correlation Analysis (CCA) is a statistical technique for identifying correlations between two sets of variables. Given two vector variables $X$ and $Y$, it finds two projections, one for each, that map them into a common space in which they are maximally correlated.

The package defines a CCA type to represent a CCA model, and provides a set of methods to access its properties.

Let M be an instance of CCA, dx be the dimension of X, dy the dimension of Y, and p the output dimension (i.e., the dimension of the common space).

StatsAPI.fit - Method
fit(CCA, X, Y; ...)

Perform CCA over the data given in matrices X and Y. Each column of X and Y is an observation.

X and Y should have the same number of columns (denoted by n below).

This method returns an instance of CCA.

Keyword arguments:

  • method: The algorithm to use:
    • :cov: based on covariance matrices
    • :svd: based on SVD of the input data (default)
  • outdim: The output dimension, i.e., the dimension of the common space (default: min(dx, dy, n))
  • mean: The mean vector, which can be either of:
    • 0: the input data has already been centered
    • nothing: this function will compute the mean (default)
    • a pre-computed mean vector

Note: This function calls ccacov or ccasvd internally, depending on the choice of method.
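
The following is a minimal usage sketch, assuming a version of MultivariateStats that provides the interface described on this page. The synthetic data and the variable names (X, Y, M, M2) are illustrative only.

```julia
using MultivariateStats, Random

Random.seed!(1)

# Synthetic data: each column is one observation (n = 100).
n = 100
X = randn(5, n)                          # dx = 5
Y = randn(3, 5) * X + 0.5 * randn(3, n)  # dy = 3, linearly related to X plus noise

# Defaults: SVD-based method, outdim = min(dx, dy, n), mean computed from the data.
M = fit(CCA, X, Y)

# Covariance-based method with a smaller common space.
M2 = fit(CCA, X, Y; method=:cov, outdim=2)

size(M)   # (dx, dy, p)
size(M2)
```

With these inputs, size(M) should report dx = 5, dy = 3 and p = 3, while M2 is restricted to p = 2.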

Base.size - Method
size(M::CCA)

Return a tuple (dx, dy, p) with the dimension of X, the dimension of Y, and the output dimension.

Statistics.mean - Method
mean(M::CCA, c::Symbol)

Get the mean vector for the component c of the model M. The component parameter can be :x or :y.

MultivariateStats.projection - Method
projection(M::CCA, c::Symbol)

Get the projection matrix for the component c of the model M. The component parameter can be :x or :y.

Statistics.cor - Method
cor(M::CCA)

The correlations of the projected components (a vector of length p).
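
The accessors above can be exercised together as in the sketch below. It assumes the interface described on this page; the data and the names mu_x, Px, rho and so on are illustrative.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(1)

# Synthetic data: each column is one observation.
n = 200
X = randn(4, n)
Y = randn(3, 4) * X + 0.5 * randn(3, n)

M = fit(CCA, X, Y)
dx, dy, p = size(M)

mu_x = mean(M, :x)        # mean vector of X, length dx
mu_y = mean(M, :y)        # mean vector of Y, length dy
Px   = projection(M, :x)  # projection matrix for X, dx × p
Py   = projection(M, :y)  # projection matrix for Y, dy × p
rho  = cor(M)             # correlations of the projected components, length p
```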

StatsAPI.predict - Method
predict(M::CCA, Z::AbstractVecOrMat{<:Real}, c::Symbol)

Given a CCA model, one can transform observations from either space into the common space:

\[\begin{aligned} \mathbf{z}_x &= \mathbf{P}_x^T (\mathbf{x} - \boldsymbol{\mu}_x) \\ \mathbf{z}_y &= \mathbf{P}_y^T (\mathbf{y} - \boldsymbol{\mu}_y) \end{aligned}\]

Here, $\mathbf{P}_x$ and $\mathbf{P}_y$ are projection matrices for $X$ and $Y$; $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ are mean vectors.

The parameter Z can be either a vector of length dx (when c is :x) or dy (when c is :y), or a matrix in which each column is an observation. The component parameter c can be :x or :y.
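
As an illustration, the sketch below projects both data sets into the common space and compares the result for X against the formula above, evaluated by hand from projection and mean. It assumes the interface described on this page; the data and variable names are illustrative.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(1)

# Synthetic data: each column is one observation.
n = 200
X = randn(4, n)
Y = randn(3, 4) * X + 0.5 * randn(3, n)
M = fit(CCA, X, Y)

Zx = predict(M, X, :x)         # p × n, all X observations in the common space
Zy = predict(M, Y, :y)         # p × n, all Y observations in the common space
zx = predict(M, X[:, 1], :x)   # a single observation, vector of length p

# The same transform written out: z_x = P_x' * (x - mu_x)
Zx_manual = projection(M, :x)' * (X .- mean(M, :x))
```

Under these assumptions, Zx and Zx_manual should agree up to floating-point error.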


Auxiliary functions:

MultivariateStats.ccacov - Function
ccacov(Cxx, Cyy, Cxy, xmean, ymean, p)

Compute CCA based on analysis of the given covariance matrices, using generalized eigenvalue decomposition, and return a CCA model.

Parameters:

  • Cxx: The covariance matrix of X.
  • Cyy: The covariance matrix of Y.
  • Cxy: The covariance matrix between X and Y.
  • xmean: The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector indicating a zero mean.
  • ymean: The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector indicating a zero mean.

  • p: The output dimension, i.e., the dimension of the common space.

MultivariateStats.ccasvd - Function
ccasvd(Zx, Zy, xmean, ymean, p)

Compute CCA based on singular value decomposition of the centered sample matrices Zx and Zy, and return a CCA model [1].

Parameters:

  • Zx: The centered sample matrix for X.
  • Zy: The centered sample matrix for Y.
  • xmean: The mean vector of the original samples of X, which can be a vector of length dx, or an empty vector indicating a zero mean.
  • ymean: The mean vector of the original samples of Y, which can be a vector of length dy, or an empty vector indicating a zero mean.

  • p: The output dimension, i.e., the dimension of the common space.
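
The two auxiliary functions can also be called directly, as in the sketch below. It assumes the signatures listed above and that covariances are normalized by n - 1, which is a choice made here for illustration; the data and variable names are likewise illustrative.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(1)

# Synthetic data: each column is one observation.
n = 200
X = randn(4, n)
Y = randn(3, 4) * X + 0.5 * randn(3, n)
p = 3

# Center the samples.
xmean = vec(mean(X, dims=2))
ymean = vec(mean(Y, dims=2))
Zx = X .- xmean
Zy = Y .- ymean

# Sample covariance matrices (normalization by n - 1 is an assumption of this sketch).
Cxx = (Zx * Zx') / (n - 1)
Cyy = (Zy * Zy') / (n - 1)
Cxy = (Zx * Zy') / (n - 1)

Mcov = MultivariateStats.ccacov(Cxx, Cyy, Cxy, xmean, ymean, p)
Msvd = MultivariateStats.ccasvd(Zx, Zy, xmean, ymean, p)

cor(Mcov), cor(Msvd)
```

On well-conditioned data such as this, the two models would be expected to report closely matching correlations, since both solve the same problem by different numerical routes.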

References

  • [1] David Weenink, Canonical Correlation Analysis, Institute of Phonetic Sciences, Univ. of Amsterdam, Proceedings 25, 81-99, 2003.