Canonical Correlation Analysis

Canonical Correlation Analysis (CCA) is a statistical technique for identifying correlations between two sets of variables. Given two vector variables $X$ and $Y$, it finds two linear projections, one for each variable, that map them into a common space in which corresponding components are maximally correlated.
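The idea can be illustrated without the package. The following is a minimal from-scratch sketch, assuming both covariance matrices are positive definite and using only Julia's standard libraries. canonical_correlations is a hypothetical helper, not part of MultivariateStats, and it is not how the package's own ccacov or ccasvd routines are written. It whitens each block with a Cholesky factor, after which the singular values of the whitened cross-covariance are the canonical correlations.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical helper (not part of MultivariateStats): canonical correlations of
# X (dx×n) and Y (dy×n), where each column is an observation.
# Assumes the covariance matrices of X and Y are positive definite.
function canonical_correlations(X::AbstractMatrix, Y::AbstractMatrix)
    n = size(X, 2)
    Xc = X .- mean(X; dims=2)              # center each variable (row)
    Yc = Y .- mean(Y; dims=2)
    Cxx = Symmetric(Xc * Xc' / (n - 1))
    Cyy = Symmetric(Yc * Yc' / (n - 1))
    Cxy = Xc * Yc' / (n - 1)
    # Whiten both blocks: with Cxx = Lx*Lx' and Cyy = Ly*Ly', the whitened
    # cross-covariance is Lx⁻¹ * Cxy * Ly⁻ᵀ; its singular values are the
    # canonical correlations.
    Lx = cholesky(Cxx).L
    Ly = cholesky(Cyy).L
    K = (Lx \ Cxy) / Ly'
    return svdvals(K)                      # min(dx, dy) values, decreasing
end

Random.seed!(0)
n = 500
z = randn(1, n)                                      # shared latent signal
X = vcat(z .+ 0.5 .* randn(1, n), randn(2, n))       # dx = 3; row 1 carries z
Y = vcat(randn(1, n), -z .+ 0.5 .* randn(1, n))      # dy = 2; row 2 carries -z
ρ = canonical_correlations(X, Y)
println(ρ)
```

The first row of X and the second row of Y share the latent signal z with opposite signs, which CCA absorbs by flipping one projection. The first canonical correlation should therefore land near the population value 0.8 and the second near zero; the exact values depend on the random draw.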

The package defines a CCA type to represent a CCA model and provides a set of methods to access its properties.

MultivariateStats.CCA — Type

Canonical Correlation Analysis Model

Let M be an instance of CCA, dx the dimension of X, dy the dimension of Y, and p the output dimension (i.e. the dimension of the common space).

StatsAPI.fit — Method

fit(CCA, X, Y; ...)

Perform CCA over the data given in matrices X and Y. Each column of X and Y is an observation.

X and Y should have the same number of columns (denoted by n below).

This method returns an instance of CCA.

Keyword arguments:

method: The algorithm to use, one of:
- :cov: based on covariance matrices
- :svd: based on SVD of the input data (default)
outdim: The output dimension, i.e. the dimension of the common space (default: min(dx, dy, n))
mean: The mean vector, which can be one of:
- 0: the input data has already been centered
- nothing: this function will compute the mean (default)
- a pre-computed mean vector

Notes: This function calls ccacov or ccasvd internally, depending on the choice of method.
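A minimal usage sketch, assuming MultivariateStats is installed at a version matching this page. The data are synthetic and only illustrate the calling convention (variables in rows, observations in columns). It fits the same data with both methods; since both solve the same problem, their leading correlations should agree up to rounding.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(1)
n = 200
X = randn(4, n)                                   # dx = 4 variables, n observations
Y = vcat(X[1:2, :] .+ 0.1 .* randn(2, n),         # dy = 3; two rows driven by X
         randn(1, n))

M_svd = fit(CCA, X, Y)                            # defaults: method = :svd, outdim = min(dx, dy, n) = 3
M_cov = fit(CCA, X, Y; method=:cov, outdim=2)     # covariance-based, 2-D common space

println(cor(M_svd))
println(cor(M_cov))
```

Comparing the two fits is a cheap sanity check: :cov works from the dx×dx and dy×dy covariance matrices, while :svd works from the SVD of the input data.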

Base.size — Method

size(M::CCA)

Return a tuple (dx, dy, p): the dimension of X, the dimension of Y, and the output dimension.
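A short sketch, assuming MultivariateStats is installed, showing how the tuple unpacks for a model with a requested output dimension:

```julia
using MultivariateStats, Random

Random.seed!(2)
X = randn(5, 100)                      # dx = 5
Y = randn(3, 100)                      # dy = 3
M = fit(CCA, X, Y; outdim=2)           # p = 2
dx, dy, p = size(M)                    # (5, 3, 2)
```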

Statistics.mean — Method

mean(M::CCA, c::Symbol)

Get the mean vector for the component c of the model M. The component parameter can be :x or :y.
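A sketch, assuming MultivariateStats is installed and that no mean is passed to fit, so the default (estimate the mean from the data) applies. The stored means should then equal the row-wise sample means of X and Y.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(3)
X = randn(3, 50) .+ [1.0, 2.0, 3.0]    # shift each variable by a known offset
Y = randn(2, 50)
M = fit(CCA, X, Y)                     # mean not given, so it is estimated from the data

μx = mean(M, :x)                       # length-3 vector (dx)
μy = mean(M, :y)                       # length-2 vector (dy)
println(μx ≈ vec(mean(X; dims=2)))     # stored mean vs. sample mean of each row of X
```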

MultivariateStats.projection — Method

projection(M::CCA, c::Symbol)

Get the projection matrix for the component c of the model M. The component parameter can be :x or :y.
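A sketch, assuming MultivariateStats is installed, showing the shapes of the two projection matrices: each has one row per input variable and one column per output dimension, so column i holds the weights that form the i-th canonical component.

```julia
using MultivariateStats, Random

Random.seed!(4)
X = randn(4, 120)                      # dx = 4
Y = randn(3, 120)                      # dy = 3
M = fit(CCA, X, Y; outdim=2)           # p = 2

Px = projection(M, :x)                 # dx × p = 4 × 2
Py = projection(M, :y)                 # dy × p = 3 × 2
```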

Statistics.cor — Method

cor(M::CCA)

Return the correlations between corresponding pairs of projected components of X and Y (a vector of length p).
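A sketch of what the returned vector means, assuming MultivariateStats is installed and using synthetic data. Each entry of cor(M) should match the sample correlation between the corresponding rows of the projected X and Y, obtained here with predict.

```julia
using MultivariateStats, Statistics, Random

Random.seed!(5)
n = 300
X = randn(3, n)
Y = vcat(X[1:1, :] .+ 0.3 .* randn(1, n), randn(2, n))
M = fit(CCA, X, Y)

ρ = cor(M)                             # length p, largest correlation first
Zx = predict(M, X, :x)                 # p × n projected X
Zy = predict(M, Y, :y)                 # p × n projected Y
# Sample correlation of the i-th pair of projected components.
empirical = [cor(Zx[i, :], Zy[i, :]) for i in 1:length(ρ)]
println(maximum(abs.(empirical .- ρ)))
```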

StatsAPI.predict — Method

predict(M::CCA, Z::AbstractVecOrMat{<:Real}, c::Symbol)

Given a CCA model, observations from either space can be transformed into the common space as

\[\begin{aligned} \mathbf{z}_x &= \mathbf{P}_x^T (\mathbf{x} - \boldsymbol{\mu}_x) \\ \mathbf{z}_y &= \mathbf{P}_y^T (\mathbf{y} - \boldsymbol{\mu}_y) \end{aligned}\]

Here, $\mathbf{P}_x$ and $\mathbf{P}_y$ are projection matrices for $X$ and $Y$; $\boldsymbol{\mu}_x$ and $\boldsymbol{\mu}_y$ are mean vectors.

The argument Z can be either a vector (of length dx when c is :x, or dy when c is :y) or a matrix in which each column is an observation. The component parameter c can be :x or :y.
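A sketch, assuming MultivariateStats is installed, that checks predict against the formula above written out with projection and mean, and shows the single-vector form:

```julia
using MultivariateStats, Statistics, Random

Random.seed!(6)
X = randn(4, 80)                       # dx = 4
Y = randn(2, 80)                       # dy = 2, so p = min(4, 2, 80) = 2
M = fit(CCA, X, Y)

# Matrix input: one column per observation; the result is p × n.
Zx = predict(M, X, :x)
# The same transform written out as z_x = P_xᵀ (x - μ_x), applied to every column.
Zx_manual = projection(M, :x)' * (X .- mean(M, :x))

# Vector input: a single observation of Y (length dy) maps to a length-p vector.
zy = predict(M, Y[:, 1], :y)
```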

Auxiliary functions:

MultivariateStats.ccacov — Function