Fuzzy C-means
Fuzzy C-means is a clustering method that assigns each point a membership weight for every cluster, rather than a single "hard" cluster label as K-means does.
From a mathematical standpoint, fuzzy C-means solves the following optimization problem:
\[\arg\min_\mathcal{C} \ \sum_{i=1}^n \sum_{j=1}^C w_{ij}^\mu \| \mathbf{x}_i - \mathbf{c}_j \|^2, \ \text{where}\ w_{ij} = \left(\sum_{k=1}^{C} \left(\frac{\left\|\mathbf{x}_i - \mathbf{c}_j \right\|}{\left\|\mathbf{x}_i - \mathbf{c}_k \right\|}\right)^{\frac{2}{\mu-1}}\right)^{-1}\]
Here, $\mathbf{c}_j$ is the center of the $j$-th cluster, $w_{ij}$ is the membership weight of the $i$-th point in the $j$-th cluster, and $\mu > 1$ is a user-defined fuzziness parameter.
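To make the weight formula concrete, here is a minimal sketch that evaluates $w_{ij}$ for fixed centers. The function name `membership_weights` and the toy data are hypothetical and not part of Clustering.jl; the sketch uses the Euclidean norm and assumes no point coincides exactly with a center (that case would produce a 0/0 division).

```julia
using LinearAlgebra

# Hypothetical helper (not part of Clustering.jl): membership weights for fixed centers.
# X is d×n (points as columns), centers is d×C, μ > 1 is the fuzziness parameter.
function membership_weights(X::AbstractMatrix, centers::AbstractMatrix, μ::Real)
    n, C = size(X, 2), size(centers, 2)
    # D[i, j] = ‖x_i - c_j‖ (Euclidean distance)
    D = [norm(X[:, i] - centers[:, j]) for i in 1:n, j in 1:C]
    p = 2 / (μ - 1)
    W = zeros(n, C)
    for i in 1:n, j in 1:C
        # w_ij = 1 / Σ_k (‖x_i - c_j‖ / ‖x_i - c_k‖)^(2/(μ-1))
        W[i, j] = 1 / sum((D[i, j] / D[i, k])^p for k in 1:C)
    end
    return W
end

X = [0.0 1.0 4.0 5.0]    # four 1-dimensional points (toy data)
centers = [0.5 4.5]      # two cluster centers (toy data)
W = membership_weights(X, centers, 2)
```

By construction each row of `W` sums to 1, and a point close to one center receives most of its weight from that cluster.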
Clustering.fuzzy_cmeans — Function
fuzzy_cmeans(data::AbstractMatrix, C::Integer, fuzziness::Real;
             [dist_metric::SemiMetric], [...]) -> FuzzyCMeansResult
Perform Fuzzy C-means clustering over the given data.
Arguments
data::AbstractMatrix: $d×n$ data matrix. Each column represents one $d$-dimensional data point.
C::Integer: the number of fuzzy clusters, $2 ≤ C < n$.
fuzziness::Real: cluster fuzziness ($μ$ in the mathematical formulation), $μ > 1$.
Optional keyword arguments:
dist_metric::SemiMetric (defaults to Euclidean): the SemiMetric object that defines the distance between the data points
maxiter, tol, display, rng: see common options
Clustering.FuzzyCMeansResult — Type
FuzzyCMeansResult{T<:AbstractFloat}
The output of the fuzzy_cmeans function.
Fields
centers::Matrix{T}: the $d×C$ matrix with columns being the centers of resulting fuzzy clusters
weights::Matrix{Float64}: the $n×C$ matrix of assignment weights ($\mathrm{weights}_{ij}$ is the weight (probability) of assigning $i$-th point to the $j$-th cluster)
iterations::Int: the number of executed algorithm iterations
converged::Bool: whether the procedure converged
Clustering.wcounts — Function
wcounts(R::ClusteringResult) -> Vector{Float64}
wcounts(R::FuzzyCMeansResult) -> Vector{Float64}
Get the weighted cluster sizes as the sum of weights of points assigned to each cluster.
For non-weighted clusterings, the weight of every data point is assumed to be 1.0, so the result is equivalent to convert(Vector{Float64}, counts(R)).
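As an illustration of the description above, the following sketch compares wcounts with the column sums of the weights field. It assumes that, for a FuzzyCMeansResult, the weighted size of a cluster is exactly the sum of the corresponding column of weights; this reading follows from the description but is not spelled out there, so treat it as an assumption.

```julia
using Clustering

# random toy data: 200 points, each a 5-dimensional vector
X = rand(5, 200)
R = fuzzy_cmeans(X, 3, 2)

# weighted cluster sizes, one entry per cluster
sizes = wcounts(R)

# assumption: equivalent to summing each column of the n×C weights matrix
column_sums = vec(sum(R.weights, dims=1))
println(sizes ≈ column_sums)
```

Because the membership weights of each point sum to 1, the weighted sizes should add up to roughly the number of points.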
Examples
using Clustering
# make a random dataset with 1000 points
# each point is a 5-dimensional vector
X = rand(5, 1000)
# perform Fuzzy C-means over X, grouping the points into 3 clusters
# with a fuzziness factor of 2; set the maximum number of iterations to 200
# and set display to :iter, so it shows progress info at each iteration
R = fuzzy_cmeans(X, 3, 2, maxiter=200, display=:iter)
# get the centers (i.e. weighted mean vectors)
# M is a 5x3 matrix
# M[:, k] is the center of the k-th cluster
M = R.centers
# get the point memberships over all the clusters
# memberships is a 1000x3 matrix
memberships = R.weights
1000×3 Matrix{Float64}:
0.333511 0.331566 0.334923
0.331964 0.334103 0.333933
0.333935 0.333171 0.332895
0.333608 0.332892 0.3335
0.331643 0.334653 0.333704
0.334142 0.329488 0.33637
0.335529 0.330078 0.334393
0.333967 0.334117 0.331916
0.334097 0.331411 0.334492
0.333072 0.333008 0.33392
⋮
0.331458 0.337622 0.33092
0.331155 0.337462 0.331383
0.333808 0.33039 0.335802
0.333463 0.333101 0.333436
0.334747 0.33208 0.333173
0.331322 0.337339 0.331339
0.333858 0.3323 0.333842
0.33344 0.333727 0.332833
0.332742 0.33511 0.332148
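If a single label per point is needed, one common approach is to pick the cluster with the largest membership weight in each row. The sketch below uses a small made-up weights matrix in place of R.weights; the variable names are illustrative only and this is not a Clustering.jl function.

```julia
# made-up 3×3 membership matrix (rows are points, columns are clusters)
W = [0.7 0.2 0.1;
     0.1 0.3 0.6;
     0.4 0.5 0.1]

# for each point, the index of the cluster with the largest weight
hard = [argmax(W[i, :]) for i in axes(W, 1)]
# hard == [1, 3, 2]
```

On uniformly random data such as the example above, the weights are all close to 1/3, so hard labels derived this way carry little information.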