Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) is a statistical method that finds a linear combination of features separating observations from two classes.

  • Note: Please refer to MulticlassLDA for methods that can discriminate between multiple classes.

Suppose the samples in the positive and negative classes have means $\boldsymbol{\mu}_p$ and $\boldsymbol{\mu}_n$ and covariances $\mathbf{C}_p$ and $\mathbf{C}_n$, respectively. Based on Fisher's linear discriminant criterion, the optimal projection direction can be expressed as:

\[\mathbf{w} = \alpha \cdot (\mathbf{C}_p + \mathbf{C}_n)^{-1} (\boldsymbol{\mu}_p - \boldsymbol{\mu}_n)\]

Here $\alpha$ is an arbitrary non-negative coefficient.
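As a concrete illustration, here is a minimal sketch of this formula on hypothetical synthetic data, using only Statistics (the package's fit function, described below, does this for you):

using Statistics

Xp = randn(4, 100) .+ 1.0    # hypothetical positive-class samples, one per column
Xn = randn(4, 120) .- 1.0    # hypothetical negative-class samples

μp, μn = vec(mean(Xp; dims=2)), vec(mean(Xn; dims=2))
Cp, Cn = cov(Xp; dims=2), cov(Xn; dims=2)

w = (Cp + Cn) \ (μp - μn)    # Fisher's direction, taking α = 1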

Example

Let's compare the 2D projections of the Iris dataset produced by LDA and PCA. Principal Component Analysis (PCA) identifies the directions that explain the most variance in the data.

using MultivariateStats, RDatasets, Plots

iris = dataset("datasets", "iris")

X = Matrix(iris[1:2:end,1:4])'       # feature matrix, one observation per column
X_labels = Vector(iris[1:2:end,5])   # species label of each observation

pca = fit(PCA, X; maxoutdim=2)   # fit PCA, keeping two components
Ypca = predict(pca, X)           # project the observations

In contrast, Linear Discriminant Analysis attempts to identify the attributes that account for the most variance between classes. Therefore, with LDA, you must use known class labels.

lda = fit(MulticlassLDA, X, X_labels; outdim=2)   # supervised: uses the labels
Ylda = predict(lda, X)                            # project onto discriminant axes

p = plot(layout=(1,2), size=(800,300))

for s in ["setosa", "versicolor", "virginica"]
    points = Ypca[:, X_labels .== s]
    scatter!(p[1], points[1,:], points[2,:], label=s, legend=:bottomleft)
    points = Ylda[:, X_labels .== s]
    scatter!(p[2], points[1,:], points[2,:], label=s, legend=:bottomleft)
end
plot!(p[1], title="PCA")
plot!(p[2], title="LDA")

Two-class Linear Discriminant Analysis

This package uses the LinearDiscriminant type to capture a linear discriminant functional:

MultivariateStats.LinearDiscriminant (Type)

A linear discriminant functional can be written as

\[ f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b\]

Here, $\mathbf{w}$ is the coefficient vector, and $b$ is the bias constant.

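For instance, assuming the LinearDiscriminant(w, b) constructor (hypothetical values; in practice f is usually obtained from fit or ldacov below):

using MultivariateStats

f = LinearDiscriminant([1.0, -2.0, 0.5], 0.25)   # w and b chosen arbitrarily
evaluate(f, [0.5, 0.0, 1.0])                     # 1.0*0.5 - 2.0*0.0 + 0.5*1.0 + 0.25 = 1.25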

This type comes with several methods. Let f be an instance of LinearDiscriminant:

StatsAPI.fit (Method)
fit(LinearDiscriminant, Xp, Xn; covestimator = SimpleCovariance())

Performs LDA given both positive and negative samples. The function accepts the following parameters:

Parameters

  • Xp: The sample matrix of the positive class.
  • Xn: The sample matrix of the negative class.

Keyword arguments:

  • covestimator: Custom covariance estimator (default: SimpleCovariance()). The covariance matrix will be calculated as cov(covestimator, #=data=#; dims=2, mean=zeros(#=...=#)). Custom covariance estimators, available in other packages, may result in more robust discriminants for data with more features than observations.
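A short usage sketch on synthetic two-class data (observations in columns; data and sizes are illustrative):

using MultivariateStats

Xp = randn(5, 200) .+ 2.0   # positive-class samples
Xn = randn(5, 250) .- 2.0   # negative-class samples

f = fit(LinearDiscriminant, Xp, Xn)

evaluate(f, Xp[:, 1])       # discriminant value for a single sample
predict(f, Xp[:, 1])        # true iff the discriminant value is positive
evaluate(f, Xn) .> 0        # classify every column of a matrix at once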
MultivariateStats.evaluate (Method)
evaluate(f, x::AbstractVector)

Evaluate the linear discriminant value, i.e. $\mathbf{w}^T \mathbf{x} + b$; the function returns a real value.

MultivariateStats.evaluate (Method)
evaluate(f, X::AbstractMatrix)

Evaluate the linear discriminant value, i.e. $\mathbf{w}^T \mathbf{x} + b$, for each sample in the columns of X. The function returns a vector of length size(X, 2).

StatsAPI.predict (Method)
predict(f, x::AbstractVector)

Make prediction for the vector x. It returns true iff evaluate(f, x) is positive.

StatsAPI.coef (Method)
coef(f::LinearDiscriminant)

Return the coefficients of the linear discriminant model.

StatsAPI.dof (Method)
dof(f::LinearDiscriminant)

Return the number of degrees of freedom in the linear discriminant model.

StatsAPI.weights (Method)
weights(f::LinearDiscriminant)

Return the linear discriminant model coefficient vector.

Base.length (Method)
length(f::LinearDiscriminant)

Get the length of the coefficient vector.

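Continuing with the f fitted in the sketch above, the accessors do what their docstrings state:

coef(f)      # coefficients of the discriminant model
dof(f)       # number of degrees of freedom
weights(f)   # the coefficient vector w
length(f)    # length of the coefficient vector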

Additional functionality:

MultivariateStats.ldacov (Function)
ldacov(C, μp, μn)

Performs LDA given a covariance matrix C and both mean vectors μp and μn. Returns a linear discriminant functional of type LinearDiscriminant.

Parameters

  • C: The pooled covariance matrix (i.e. $(\mathbf{C}_p + \mathbf{C}_n)/2$)
  • μp: The mean vector of the positive class.
  • μn: The mean vector of the negative class.
ldacov(Cp, Cn, μp, μn)

Performs LDA given covariances and mean vectors. Returns a linear discriminant functional of type LinearDiscriminant.

Parameters

  • Cp: The covariance matrix of the positive class.
  • Cn: The covariance matrix of the negative class.
  • μp: The mean vector of the positive class.
  • μn: The mean vector of the negative class.

Note: The coefficient vector is scaled such that $\mathbf{w}'\boldsymbol{μ}_p + b = 1$ and $\mathbf{w}'\boldsymbol{μ}_n + b = -1$.

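For example, reusing Xp and Xn from the earlier sketch, the discriminant can be built from the class statistics directly, and the scaling convention above can be checked:

using MultivariateStats, Statistics

μp, μn = vec(mean(Xp; dims=2)), vec(mean(Xn; dims=2))
Cp, Cn = cov(Xp; dims=2), cov(Xn; dims=2)

f = ldacov(Cp, Cn, μp, μn)
evaluate(f, μp)   # ≈ 1 by the scaling convention
evaluate(f, μn)   # ≈ -1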

Multi-class Linear Discriminant Analysis

Multi-class LDA is a generalization of standard two-class LDA that can handle an arbitrary number of classes.

Overview

Multi-class LDA is based on the analysis of two scatter matrices: the within-class scatter matrix and the between-class scatter matrix.

Given a set of samples $\mathbf{x}_1, \ldots, \mathbf{x}_n$, and their class labels $y_1, \ldots, y_n$:

  • The within-class scatter matrix is defined as:

\[\mathbf{S}_w = \sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu}_{y_i}) (\mathbf{x}_i - \boldsymbol{\mu}_{y_i})^T\]

Here, $\boldsymbol{\mu}_k$ is the sample mean of the $k$-th class.

  • The between-class scatter matrix is defined as:

\[\mathbf{S}_b = \sum_{k=1}^m n_k (\boldsymbol{\mu}_k - \boldsymbol{\mu}) (\boldsymbol{\mu}_k - \boldsymbol{\mu})^T\]

Here, $m$ is the number of classes, $\boldsymbol{\mu}$ is the overall sample mean, and $n_k$ is the number of samples in the $k$-th class.

Then, multi-class LDA can be formulated as an optimization problem to find a set of linear combinations (with coefficients $\mathbf{w}$) that maximizes the ratio of the between-class scattering to the within-class scattering, as

\[\hat{\mathbf{w}} = \mathop{\mathrm{argmax}}_{\mathbf{w}} \frac{\mathbf{w}^T \mathbf{S}_b \mathbf{w}}{\mathbf{w}^T \mathbf{S}_w \mathbf{w}}\]

The solution is given by the following generalized eigenvalue problem:

\[\mathbf{S}_b \mathbf{w} = \lambda \mathbf{S}_w \mathbf{w}\]

Generally, at most $m - 1$ generalized eigenvectors are useful to discriminate between $m$ classes.
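A minimal sketch of this formulation on hypothetical data (fit(MulticlassLDA, ...), described below, performs the equivalent computation internally):

using LinearAlgebra, Statistics

X = randn(4, 90)               # observations in columns
y = repeat(1:3, inner=30)      # three classes with 30 samples each
μ = vec(mean(X; dims=2))       # overall sample mean

Sw, Sb = zeros(4, 4), zeros(4, 4)
for k in 1:3
    Xk = X[:, y .== k]
    μk = vec(mean(Xk; dims=2))
    Sw .+= (Xk .- μk) * (Xk .- μk)'             # within-class scatter
    Sb .+= size(Xk, 2) * (μk - μ) * (μk - μ)'   # between-class scatter
end

λ, W = eigen(Symmetric(Sb), Symmetric(Sw))   # generalized eigenvalue problem
W = W[:, end:-1:end-1]                       # keep the top m - 1 = 2 directions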

When the dimensionality is high, it may not be feasible to construct the scatter matrices explicitly. In such cases, see SubspaceLDA below.

Normalization by number of observations

An alternative definition of the within- and between-class scatter matrices normalizes for the number of observations in each group:

\[\mathbf{S}_w^* = n \sum_{k=1}^m \frac{1}{n_k} \sum_{i \mid y_i=k} (\mathbf{x}_i - \boldsymbol{\mu}_{k}) (\mathbf{x}_i - \boldsymbol{\mu}_{k})^T\]

\[\mathbf{S}_b^* = n \sum_{k=1}^m (\boldsymbol{\mu}_k - \boldsymbol{\mu}^*) (\boldsymbol{\mu}_k - \boldsymbol{\mu}^*)^T\]

where

\[\boldsymbol{\mu}^* = \frac{1}{m} \sum_{k=1}^m \boldsymbol{\mu}_k.\]

This definition can sometimes be more useful when looking for directions which discriminate among clusters containing widely varying numbers of observations.
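Continuing the sketch above, the normalized scatter matrices can be computed as:

n  = size(X, 2)
μs = [vec(mean(X[:, y .== k]; dims=2)) for k in 1:3]   # class means
μ0 = mean(μs)                                          # μ*, the mean of the class means

Sw_star = n * sum((X[:, y .== k] .- μs[k]) * (X[:, y .== k] .- μs[k])' / count(y .== k) for k in 1:3)
Sb_star = n * sum((μs[k] - μ0) * (μs[k] - μ0)' for k in 1:3)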

Multi-class LDA

The package defines a MulticlassLDA type to represent a multi-class LDA model:

MultivariateStats.MulticlassLDA (Type)

A multi-class linear discriminant model type has the following fields:

  • proj is the projection matrix
  • pmeans is the projected means of all classes
  • stats is an instance of the MulticlassLDAStats type, which captures all statistics computed to train the model.

Several methods are provided to access properties of the LDA model. Let M be an instance of MulticlassLDA:

StatsAPI.fit (Method)
fit(MulticlassLDA, X, y; ...)

Perform multi-class LDA over a given data set X with corresponding labels y, where nc is the number of classes.

This function returns the resultant multi-class LDA model as an instance of MulticlassLDA.

Parameters

  • X: the matrix of input samples, of size (d, n). Each column in X is an observation.
  • y: the vector of class labels, of length n.

Keyword arguments

  • method: The choice of methods:
    • :gevd: based on generalized eigenvalue decomposition (default).
    • :whiten: first derive a whitening transform from Sw and then solve the problem based on the eigenvalue decomposition of the whitened Sb.
  • outdim: The output dimension, i.e. the dimension of the transformed space (default: min(d, nc-1)).
  • regcoef: The regularization coefficient (default: 1.0e-6). A positive value regcoef * eigmax(Sw) is added to the diagonal of Sw to improve numerical stability.
  • covestimator_between: Custom covariance estimator for between-class covariance (default: SimpleCovariance()). The covariance matrix will be calculated as cov(covestimator_between, #=data=#; dims=2, mean=zeros(#=...=#)). Custom covariance estimators, available in other packages, may result in more robust discriminants for data with more features than observations.
  • covestimator_within: Custom covariance estimator for within-class covariance (default: SimpleCovariance()). The covariance matrix will be calculated as cov(covestimator_within, #=data=#; dims=2, mean=zeros(nc)). Custom covariance estimators, available in other packages, may result in more robust discriminants for data with more features than observations.

Notes:

The resultant projection matrix $P$ satisfies:

\[\mathbf{P}^T (\mathbf{S}_w + \kappa \mathbf{I}) \mathbf{P} = \mathbf{I}\]

Here, $\kappa$ equals regcoef * eigmax(Sw). The columns of $\mathbf{P}$ are arranged in descending order of the corresponding generalized eigenvalues.

Note that MulticlassLDA does not currently support the normalized version using $\mathbf{S}_w^*$ and $\mathbf{S}_b^*$ (see SubspaceLDA).

StatsAPI.predict (Method)
predict(M::MulticlassLDA, x)

Transform input sample(s) in x to the output space of MC-LDA model M. Here, x can be either a sample vector or a matrix comprised of samples in columns.

Statistics.mean (Method)
mean(M::MulticlassLDA)

Get the overall sample mean vector (of length $d$).

Base.size (Method)
size(M::MulticlassLDA)

Get the input (i.e the dimension of the observation space) and output (i.e the dimension of the transformed features) dimensions of the model M.

MultivariateStats.classweights (Method)
classweights(M)

Get the weights of individual classes (a vector of length $m$). If the samples are not weighted, the weight equals the number of samples of each class.

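A quick end-to-end sketch on synthetic data (values are illustrative):

using MultivariateStats, Statistics

X = randn(4, 90)                        # observations in columns
y = repeat(["a", "b", "c"], inner=30)   # class labels

M = fit(MulticlassLDA, X, y; outdim=2)
Z = predict(M, X)   # 2×90 matrix of projected samples
size(M)             # (4, 2): input and output dimensions
mean(M)             # overall sample mean, a vector of length 4
classweights(M)     # samples per class: [30, 30, 30]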

Subspace Linear Discriminant Analysis

The package also defines a SubspaceLDA type to represent a multi-class LDA model for high-dimensional spaces. MulticlassLDA, because it stores the scatter matrices, is not well-suited for high-dimensional data. For example, if you are performing LDA on images, and each image has $10^6$ pixels, then the scatter matrices would contain $10^{12}$ elements, far too many to store directly. SubspaceLDA calculates the projection direction without the intermediary of the scatter matrices, by focusing on the subspace that lies within the span of the within-class scatter. This also serves to regularize the computation.

SubspaceLDA supports all the same methods as MulticlassLDA, with the exception of the functions that return a scatter matrix. The overall projection is represented as a factorization $P*L$, where $P'*x$ projects data points to the subspace spanned by the within-class scatter, and $L$ is the LDA projection in the subspace. The projection directions $w$ (the columns of projection(M)) satisfy the equation

\[ \mathbf{P}^T \mathbf{S}_b \mathbf{w} = \lambda \mathbf{P}^T \mathbf{S}_w \mathbf{w}.\]

When $P$ is of full rank (e.g., if there are more data points than dimensions), then this equation guarantees that $\mathbf{S}_b \mathbf{w} = \lambda \mathbf{S}_w \mathbf{w}$ will also hold.

The package defines a SubspaceLDA type to represent a multi-class LDA model, as:

MultivariateStats.SubspaceLDA (Type)

The subspace LDA model type has the following fields:

  • projw: the projection matrix onto the subspace spanned by the within-class scatter (the factor $P$ above)
  • projLDA: the LDA projection within that subspace (the factor $L$ above)
  • λ: the projection eigenvalues
  • cmeans: the class centroids
  • cweights: the class weights

Several methods are provided to access properties of the LDA model. Let M be an instance of SubspaceLDA:

StatsAPI.fit (Method)
fit(SubspaceLDA, X, y; normalize=true)

Fit a subspace projection of an LDA model over a given data set X with corresponding labels y, using the equivalent of $\mathbf{S}_w^*$ and $\mathbf{S}_b^*$.

Note: Subspace LDA also supports the normalized version of LDA via the normalize keyword.

StatsAPI.predict (Method)
predict(M::SubspaceLDA, x)

Transform input sample(s) in x to the output space of LDA model M. Here, x can be either a sample vector or a matrix comprised of samples in columns.

Base.size (Method)
size(M)

Get the input (i.e the dimension of the observation space) and output (i.e the dimension of the subspace projection) dimensions of the model M.

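A usage sketch for the high-dimensional case (hypothetical sizes):

using MultivariateStats

X = randn(10_000, 60)   # many more features than observations
y = repeat(1:3, inner=20)

M = fit(SubspaceLDA, X, y; normalize=true)
Z = predict(M, X)       # 2×60 matrix of projected samples (m - 1 = 2)
size(M)                 # (10000, 2)
projection(M)           # the overall projection P*L discussed above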