K-medoids

K-medoids is a clustering algorithm that works by finding $k$ data points (called medoids) such that the total distance between each data point and the closest medoid is minimal.

Clustering.kmedoids — Function

kmedoids(dist::AbstractMatrix, k::Integer; ...) -> KmedoidsResult

Perform K-medoids clustering of $n$ points into k clusters, given the dist matrix ($n×n$, dist[i, j] is the distance between the j-th and i-th points).

Arguments

init (defaults to :kmpp): how medoids should be initialized, could be one of the following:
- a Symbol indicating the name of a seeding algorithm (see Seeding for a list of supported methods).
- an integer vector of length k that provides the indices of points to use as initial medoids.
maxiter, tol, display: see common options

Note

The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).

source

Clustering.kmedoids! — Function

kmedoids!(dist::AbstractMatrix, medoids::Vector{Int};
          [kwargs...]) -> KmedoidsResult

Update the current cluster medoids using the dist matrix.

The medoids field of the returned KmedoidsResult points to the same array as medoids argument.

See kmedoids for the description of optional kwargs.

source

Clustering.KmedoidsResult — Type

KmedoidsResult{T} <: ClusteringResult

The output of kmedoids function.

Fields

medoids::Vector{Int}: the indices of $k$ medoids
assignments::Vector{Int}: the indices of clusters the points are assigned to, so that medoids[assignments[i]] is the index of the medoid for the $i$-th point
costs::Vector{T}: assignment costs, i.e. costs[i] is the cost of assigning $i$-th point to its medoid
counts::Vector{Int}: cluster sizes
totalcost::Float64: total assignment cost (the sum of costs)
iterations::Int: the number of executed algorithm iterations
converged::Bool: whether the procedure converged

source

References

Teitz, M.B. and Bart, P. (1968). Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research, 16(5), 955–961. doi:10.1287/opre.16.5.955
Schubert, E. and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS Algorithms. SISAP, 171-187. doi:10.1007/978-3-030-32047-8_16