K-medoids

K-medoids is a clustering algorithm that works by finding $k$ data points (called medoids) such that the total distance between each data point and the closest medoid is minimal.

Clustering.kmedoidsFunction
kmedoids(dist::AbstractMatrix, k::Integer; ...) -> KmedoidsResult

Perform K-medoids clustering of $n$ points into k clusters, given the dist matrix ($n×n$, dist[i, j] is the distance between the j-th and i-th points).

Arguments

  • init (defaults to :kmpp): how medoids should be initialized, could be one of the following:
    • a Symbol indicating the name of a seeding algorithm (see Seeding for a list of supported methods).
    • an integer vector of length k that provides the indices of points to use as initial medoids.
  • maxiter, tol, display: see common options

Note

The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).

source
Clustering.kmedoids!Function
kmedoids!(dist::AbstractMatrix, medoids::Vector{Int};
          [kwargs...]) -> KmedoidsResult

Update the current cluster medoids using the dist matrix.

The medoids field of the returned KmedoidsResult points to the same array as medoids argument.

See kmedoids for the description of optional kwargs.

source
Clustering.KmedoidsResultType
KmedoidsResult{T} <: ClusteringResult

The output of kmedoids function.

Fields

  • medoids::Vector{Int}: the indices of $k$ medoids
  • assignments::Vector{Int}: the indices of clusters the points are assigned to, so that medoids[assignments[i]] is the index of the medoid for the $i$-th point
  • costs::Vector{T}: assignment costs, i.e. costs[i] is the cost of assigning $i$-th point to its medoid
  • counts::Vector{Int}: cluster sizes
  • totalcost::Float64: total assignment cost (the sum of costs)
  • iterations::Int: the number of executed algorithm iterations
  • converged::Bool: whether the procedure converged
source

References

  1. Teitz, M.B. and Bart, P. (1968). Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research, 16(5), 955–961. doi:10.1287/opre.16.5.955
  2. Schubert, E. and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS Algorithms. SISAP, 171-187. doi:10.1007/978-3-030-32047-8_16