Basics

The package implements a variety of clustering algorithms:

Most of the clustering functions in the package have a similar interface, making it easy to switch between different clustering algorithms.

Inputs

A clustering algorithm, depending on its nature, may accept an input matrix in either of the following forms:

  • Data matrix $X$ of size $d \times n$, the $i$-th column of $X$ (X[:, i]) is a data point (data sample) in $d$-dimensional space.
  • Distance matrix $D$ of size $n \times n$, where $D_{ij}$ is the distance between the $i$-th and $j$-th points, or the cost of assigning them to the same cluster.

Common Options

Many clustering algorithms are iterative procedures. The functions share the basic options for controlling the iterations:

  • maxiter::Integer: maximum number of iterations.
  • tol::Real: minimal allowed change of the objective during convergence. The algorithm is considered to be converged when the change of objective value between consecutive iterations drops below tol.
  • display::Symbol: the level of information to be displayed. It may take one of the following values:
    • :none: nothing is shown
    • :final: only shows a brief summary when the algorithm ends
    • :iter: shows the progress at each iteration

Results

A clustering function would return an object (typically, an instance of some ClusteringResult subtype) that contains both the resulting clustering (e.g. assignments of points to the clusters) and the information about the clustering algorithm (e.g. the number of iterations and whether it converged).

The following generic methods are supported by any subtype of ClusteringResult:

StatsBase.countsMethod
counts(R::ClusteringResult) -> Vector{Int}

Get the vector of cluster sizes.

counts(R)[k] is the number of points assigned to the $k$-th cluster.

source
Clustering.wcountsMethod
wcounts(R::ClusteringResult) -> Vector{Float64}
wcounts(R::FuzzyCMeansResult) -> Vector{Float64}

Get the weighted cluster sizes as the sum of weights of points assigned to each cluster.

For non-weighted clusterings assumes the weight of every data point is 1.0, so the result is equivalent to convert(Vector{Float64}, counts(R)).

source
Clustering.assignmentsMethod
assignments(R::ClusteringResult) -> Vector{Int}

Get the vector of cluster indices for each point.

assignments(R)[i] is the index of the cluster to which the $i$-th point is assigned.

source