Fuzzy C-means

Fuzzy C-means is a clustering method that provides cluster membership weights instead of "hard" classification (e.g. K-means).

From a mathematical standpoint, fuzzy C-means solves the following optimization problem:

\[\arg\min_\mathcal{C} \ \sum_{i=1}^n \sum_{j=1}^C w_{ij}^\mu \| \mathbf{x}_i - \mathbf{c}_j \|^2, \ \text{where}\ w_{ij} = \left(\sum_{k=1}^{C} \left(\frac{\left\|\mathbf{x}_i - \mathbf{c}_j \right\|}{\left\|\mathbf{x}_i - \mathbf{c}_k \right\|}\right)^{\frac{2}{\mu-1}}\right)^{-1}\]

Here, $\mathbf{c}_j$ is the center of the $j$-th cluster, $w_{ij}$ is the membership weight of the $i$-th point in the $j$-th cluster, and $\mu > 1$ is a user-defined fuzziness parameter.

Clustering.fuzzy_cmeansFunction
fuzzy_cmeans(data::AbstractMatrix, C::Integer, fuzziness::Real;
             [dist_metric::SemiMetric], [...]) -> FuzzyCMeansResult

Perform Fuzzy C-means clustering over the given data.

Arguments

  • data::AbstractMatrix: $d×n$ data matrix. Each column represents one $d$-dimensional data point.
  • C::Integer: the number of fuzzy clusters, $2 ≤ C < n$.
  • fuzziness::Real: clusters fuzziness ($μ$ in the mathematical formulation), $μ > 1$.

Optional keyword arguments:

  • dist_metric::SemiMetric (defaults to Euclidean): the SemiMetric object that defines the distance between the data points
  • maxiter, tol, display, rng: see common options
source
Clustering.FuzzyCMeansResultType
FuzzyCMeansResult{T<:AbstractFloat}

The output of fuzzy_cmeans function.

Fields

  • centers::Matrix{T}: the $d×C$ matrix with columns being the centers of resulting fuzzy clusters
  • weights::Matrix{Float64}: the $n×C$ matrix of assignment weights ($\mathrm{weights}_{ij}$ is the weight (probability) of assigning $i$-th point to the $j$-th cluster)
  • iterations::Int: the number of executed algorithm iterations
  • converged::Bool: whether the procedure converged
source
Clustering.wcountsFunction
wcounts(R::ClusteringResult) -> Vector{Float64}
wcounts(R::FuzzyCMeansResult) -> Vector{Float64}

Get the weighted cluster sizes as the sum of weights of points assigned to each cluster.

For non-weighted clusterings assumes the weight of every data point is 1.0, so the result is equivalent to convert(Vector{Float64}, counts(R)).

source

Examples

using Clustering

# make a random dataset with 1000 points
# each point is a 5-dimensional vector
X = rand(5, 1000)

# performs Fuzzy C-means over X, trying to group them into 3 clusters
# with a fuzziness factor of 2. Set maximum number of iterations to 200
# set display to :iter, so it shows progressive info at each iteration
R = fuzzy_cmeans(X, 3, 2, maxiter=200, display=:iter)

# get the centers (i.e. weighted mean vectors)
# M is a 5x3 matrix
# M[:, k] is the center of the k-th cluster
M = R.centers

# get the point memberships over all the clusters
# memberships is a 20x3 matrix
memberships = R.weights
1000×3 Matrix{Float64}:
 0.331547  0.3372    0.331253
 0.334966  0.331843  0.333191
 0.332288  0.333973  0.333739
 0.335377  0.3303    0.334323
 0.331942  0.333481  0.334577
 0.330755  0.336604  0.332641
 0.336519  0.329899  0.333582
 0.330629  0.33727   0.332101
 0.33612   0.329477  0.334403
 0.33436   0.333144  0.332497
 ⋮                   
 0.332021  0.333548  0.334431
 0.334237  0.330794  0.334968
 0.336422  0.329061  0.334517
 0.334682  0.33261   0.332707
 0.334882  0.331571  0.333548
 0.332936  0.33261   0.334454
 0.332097  0.334498  0.333406
 0.328874  0.339566  0.33156
 0.335696  0.33333   0.330973