Fuzzy C-means

Fuzzy C-means is a clustering method that gives each point a membership weight for every cluster, instead of the "hard" assignment to a single cluster produced by methods such as K-means.

From a mathematical standpoint, fuzzy C-means solves the following optimization problem:

\[\arg\min_{\mathcal{C}} \ \sum_{i=1}^n \sum_{j=1}^C w_{ij}^\mu \| \mathbf{x}_i - \mathbf{c}_j \|^2, \ \text{where}\ w_{ij} = \left(\sum_{k=1}^{C} \left(\frac{\left\|\mathbf{x}_i - \mathbf{c}_j \right\|}{\left\|\mathbf{x}_i - \mathbf{c}_k \right\|}\right)^{\frac{2}{\mu-1}}\right)^{-1}\]

Here, $n$ is the number of data points, $C$ is the number of clusters, $\mathbf{c}_j$ is the center of the $j$-th cluster, $w_{ij}$ is the membership weight of the $i$-th point in the $j$-th cluster, and $\mu > 1$ is a user-defined fuzziness parameter.
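To make the formula concrete, the following is a minimal sketch that evaluates the membership weights and the objective for fixed centers, using plain Julia and Euclidean distances. The helper names `membership_weights` and `objective` are hypothetical and are not part of Clustering.jl; the sketch also assumes that no data point coincides exactly with a center.

```julia
using LinearAlgebra

# Hypothetical helper: membership weights w[i, j] for points X (d×n)
# and centers (d×C), following the formula above.
function membership_weights(X::AbstractMatrix, centers::AbstractMatrix, μ::Real)
    n, C = size(X, 2), size(centers, 2)
    # D[i, j] = ‖x_i - c_j‖ (Euclidean distance)
    D = [norm(X[:, i] - centers[:, j]) for i in 1:n, j in 1:C]
    p = 2 / (μ - 1)
    # assumes D[i, k] > 0 for all i, k (no point sits exactly on a center)
    return [1 / sum((D[i, j] / D[i, k])^p for k in 1:C) for i in 1:n, j in 1:C]
end

# Hypothetical helper: objective value Σ_i Σ_j w_ij^μ ‖x_i - c_j‖²
function objective(X, centers, W, μ)
    n, C = size(W)
    return sum(W[i, j]^μ * norm(X[:, i] - centers[:, j])^2 for i in 1:n, j in 1:C)
end

X = [0.0 1.0 10.0 11.0]    # four 1-dimensional points (d = 1, n = 4)
centers = [0.5 10.5]       # two centers (C = 2)
W = membership_weights(X, centers, 2)
J = objective(X, centers, W, 2)
```

Each row of `W` sums to 1, and a point close to one center gets a weight near 1 for that cluster. As $\mu$ approaches 1 the weights approach a hard assignment; larger $\mu$ makes them more uniform.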

fuzzy_cmeans(data::AbstractMatrix, C::Integer, fuzziness::Real;
             [dist_metric::SemiMetric], [...]) -> FuzzyCMeansResult

Perform Fuzzy C-means clustering over the given data.

Arguments:

  • data::AbstractMatrix: $d×n$ data matrix. Each column represents one $d$-dimensional data point.
  • C::Integer: the number of fuzzy clusters, $2 ≤ C < n$.
  • fuzziness::Real: cluster fuzziness ($μ$ in the mathematical formulation), $μ > 1$.

Optional keyword arguments:

  • dist_metric::SemiMetric (defaults to Euclidean): the SemiMetric object that defines the distance between the data points
  • maxiter, tol, display, rng: see common options
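As an illustration of the keyword arguments listed above, the sketch below swaps the default Euclidean distance for the city-block distance and fixes the random seed. It assumes the Distances.jl package is installed (it provides `Cityblock`) and that the installed Clustering.jl version accepts the `rng` keyword; the specific values of `maxiter` and `tol` are arbitrary choices for the example.

```julia
using Clustering, Distances, Random

# 200 random 5-dimensional points, generated reproducibly
X = rand(MersenneTwister(1), 5, 200)

# 3 fuzzy clusters, fuzziness 2, city-block distance instead of Euclidean
R = fuzzy_cmeans(X, 3, 2;
                 dist_metric=Cityblock(),   # any SemiMetric from Distances.jl
                 maxiter=100, tol=1e-4,     # arbitrary example values
                 display=:none,             # suppress progress output
                 rng=MersenneTwister(42))   # seeded generator for initialization
```

Passing a seeded `rng` should make repeated runs start from the same initialization, which helps when comparing distance metrics or fuzziness values.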

The output of the fuzzy_cmeans function (a FuzzyCMeansResult), with the following fields:


  • centers::Matrix{T}: the $d×C$ matrix with columns being the centers of resulting fuzzy clusters
  • weights::Matrix{Float64}: the $n×C$ matrix of assignment weights; $\mathrm{weights}_{ij}$ is the weight (probability) of assigning the $i$-th point to the $j$-th cluster
  • iterations::Int: the number of executed algorithm iterations
  • converged::Bool: whether the procedure converged

wcounts(R::ClusteringResult) -> Vector{Float64}
wcounts(R::FuzzyCMeansResult) -> Vector{Float64}

Get the weighted cluster sizes as the sum of weights of points assigned to each cluster.

For non-weighted clusterings, the weight of every data point is assumed to be 1.0, so the result is equivalent to convert(Vector{Float64}, counts(R)).
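For a fuzzy clustering, the weighted size of a cluster can be read as the sum of one column of the weight matrix. The following sketch illustrates that idea on a small hand-written weight matrix (hypothetical values, not produced by fuzzy_cmeans) and compares it with the sizes obtained from hard labels; it shows the computation only and is not the library's implementation.

```julia
# hypothetical 4×2 weight matrix: rows are points, columns are clusters
W = [0.9 0.1;
     0.8 0.2;
     0.3 0.7;
     0.5 0.5]

# weighted cluster sizes: sum of the weights in each column
sizes = vec(sum(W, dims=1))

# for comparison, hard labels: index of the largest weight in each row
# (argmax returns the first index when weights are tied)
labels = [argmax(W[i, :]) for i in 1:size(W, 1)]
hard_sizes = [count(==(j), labels) for j in 1:size(W, 2)]
```

Because every row of the weight matrix sums to 1, the weighted sizes add up to the number of points, just as the hard counts do.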



using Clustering

# make a random dataset with 1000 points
# each point is a 5-dimensional vector
X = rand(5, 1000)

# perform Fuzzy C-means over X, grouping the points into 3 clusters
# with a fuzziness factor of 2; set the maximum number of iterations to 200
# and display to :iter, so progress is shown at each iteration
R = fuzzy_cmeans(X, 3, 2, maxiter=200, display=:iter)

# get the centers (i.e. weighted mean vectors)
# M is a 5x3 matrix
# M[:, k] is the center of the k-th cluster
M = R.centers

# get the point memberships over all the clusters
# memberships is a 1000x3 matrix
memberships = R.weights
1000×3 Matrix{Float64}:
 0.333511  0.331566  0.334923
 0.331964  0.334103  0.333933
 0.333935  0.333171  0.332895
 0.333608  0.332892  0.3335
 0.331643  0.334653  0.333704
 0.334142  0.329488  0.33637
 0.335529  0.330078  0.334393
 0.333967  0.334117  0.331916
 0.334097  0.331411  0.334492
 0.333072  0.333008  0.33392
 0.331458  0.337622  0.33092
 0.331155  0.337462  0.331383
 0.333808  0.33039   0.335802
 0.333463  0.333101  0.333436
 0.334747  0.33208   0.333173
 0.331322  0.337339  0.331339
 0.333858  0.3323    0.333842
 0.33344   0.333727  0.332833
 0.332742  0.33511   0.332148