Independent Component Analysis
Independent Component Analysis (ICA) is a computational technique for separating a multivariate signal into additive subcomponents, with the assumption that the subcomponents are non-Gaussian and independent from each other.
There are multiple algorithms for ICA. Currently, this package implements the Fast ICA algorithm.
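To make the setup concrete, here is a minimal, hedged sketch of the mixing model that ICA inverts. The source signals, sampling grid, and variable names are illustrative only: $n$ observations of $m$ mixed signals are generated as $X = A S$, where the rows of $S$ are independent, non-Gaussian sources.

```julia
using Random

Random.seed!(1)
n = 1000
t = range(0, 10; length = n)
S = vcat(sin.(2π .* 0.3 .* t)',          # source 1: sinusoid
         sign.(cos.(2π .* 0.7 .* t))')   # source 2: square wave
A = randn(2, 2)                           # unknown mixing matrix
X = A * S                                 # observed mixtures, size (2, n)
```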
FastICA
This package implements the FastICA algorithm[1]. The package uses the `ICA` type to define a FastICA model:
MultivariateStats.ICA — Type

This type contains the ICA model parameters: the mean vector and the component matrix $W$.

Note: Each column of the component matrix $W$ corresponds to an independent component.
Several methods are provided to work with `ICA`. Let $M$ be an instance of `ICA`:
StatsAPI.fit — Method

fit(ICA, X, k; ...)

Perform ICA over the data set given in `X`.

Parameters:
- `X`: The data matrix, of size $(m, n)$. Each row corresponds to a mixed signal, while each column corresponds to an observation (e.g. all signal values at a particular time step).
- `k`: The number of independent components to recover.

Keyword Arguments:
- `alg`: The choice of algorithm (default: `fastica`)
- `fun`: The approximate neg-entropy functor (default: `Tanh`)
- `do_whiten`: Whether to perform pre-whitening (default: `true`)
- `maxiter`: Maximum number of iterations (default: `100`)
- `tol`: Tolerable change of $W$ at convergence (default: `1.0e-6`)
- `mean`: The mean vector, which can be either of:
  - `0`: the input data has already been centralized
  - `nothing`: this function will compute the mean (default)
  - a pre-computed mean vector
- `winit`: Initial guess of $W$, which should be either of:
  - an empty matrix: the function will perform random initialization (default)
  - a matrix of size $(k, k)$ (when `do_whiten`)
  - a matrix of size $(m, k)$ (when `!do_whiten`)

Returns the resultant ICA model, an instance of type `ICA`.

Note: If `do_whiten` is `true`, the returned `W` satisfies $\mathbf{W}^T \mathbf{C} \mathbf{W} = \mathbf{I}$; otherwise $W$ is orthonormal, i.e. $\mathbf{W}^T \mathbf{W} = \mathbf{I}$.
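A minimal usage sketch follows, continuing the mixing example above. The field name `W` used in the covariance check is an assumption based on the type description; the check simply illustrates the whitening note.

```julia
using MultivariateStats, Statistics, LinearAlgebra

# X is an (m, n) matrix of mixed signals (see the mixing sketch above).
# Recover k = 2 independent components.
M = fit(ICA, X, 2; maxiter = 200)

# With do_whiten = true (the default), W should satisfy W' * C * W ≈ I,
# where C is the covariance of the observations (field name W is an assumption).
C = cov(X; dims = 2)
maximum(abs, M.W' * C * M.W - I)   # should be close to zero
```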
Base.size — Method

size(M::ICA)

Returns a tuple with the input dimension, i.e. the number of observed mixtures, and the output dimension, i.e. the number of independent components.
Statistics.mean — Method

mean(M::ICA)

Returns the mean vector.
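A brief sketch, continuing with the fitted model `M` from above:

```julia
using Statistics

m, k = size(M)   # (number of observed mixtures, number of independent components)
μ = mean(M)      # length-m mean vector used to center the data
```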
StatsAPI.predict — Method

predict(M::ICA, x)

Transform `x` to the output space to extract independent components, as $\mathbf{W}^T (\mathbf{x} - \boldsymbol{\mu})$, given the model `M`.
The package also exports functions of the core algorithms. Sometimes, it can be more efficient to directly invoke them instead of going through the `fit` interface.
MultivariateStats.fastica! — Function

fastica!(W, X, fun, maxiter, tol, verbose)

Invoke the FastICA algorithm[1].

Parameters:
- `W`: The initial un-mixing matrix, of size $(m, k)$. The function updates this matrix in place.
- `X`: The data matrix, of size $(m, n)$. This matrix is input only, and won't be modified.
- `fun`: The approximate neg-entropy functor, of type `ICAGDeriv`.
- `maxiter`: Maximum number of iterations.
- `tol`: Tolerable change of `W` at convergence.

Returns the updated `W`.

Note: The number of components is inferred from `W` as `size(W, 2)`.
The FastICA method requires the first derivative of a functor $g$ to approximate negative entropy. The package implements the following interface for defining derivative value estimation:
MultivariateStats.ICAGDeriv — Type

The abstract type for all `g` (derivative) functions.

Let `g` be an instance of such a type. Then, given `U = w'x`, the call `update!(g, U, E)` updates `U` and `E` in place, such that `g(w'x) --> U` and `E{g'(w'x)} --> E`.
MultivariateStats.Tanh — Type

Derivative for $(1/a_1)\log\cosh a_1 u$
MultivariateStats.Gaus — Type

Derivative for $-e^{\frac{-u^2}{2}}$
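As a hedged sketch, either functor can be passed to `fit` via the `fun` keyword. The constructor arguments shown (a scale $a_1$ for `Tanh`, none for `Gaus`) are assumptions inferred from the formulas above, not confirmed signatures.

```julia
using MultivariateStats

# Select the approximate neg-entropy derivative used by FastICA.
# Constructor arguments below are assumptions (see the note in the text).
M_tanh = fit(ICA, X, 2; fun = MultivariateStats.Tanh(1.0))
M_gaus = fit(ICA, X, 2; fun = MultivariateStats.Gaus())
```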
References
1. Aapo Hyvarinen and Erkki Oja, Independent Component Analysis: Algorithms and Applications. Neural Networks 13(4-5), 2000.