K-medoids
K-medoids is a clustering algorithm that works by finding $k$ data points (called medoids) such that the total distance between each data point and the closest medoid is minimal.
Clustering.kmedoids — Functionkmedoids(dist::AbstractMatrix, k::Integer; ...) -> KmedoidsResultPerform K-medoids clustering of $n$ points into k clusters, given the dist matrix ($n×n$, dist[i, j] is the distance between the j-th and i-th points).
Arguments
init(defaults to:kmpp): how medoids should be initialized, could be one of the following:- a
Symbolindicating the name of a seeding algorithm (see Seeding for a list of supported methods). - an integer vector of length
kthat provides the indices of points to use as initial medoids.
- a
maxiter,tol,display: see common options
Note
The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).
Clustering.kmedoids! — Functionkmedoids!(dist::AbstractMatrix, medoids::Vector{Int};
[kwargs...]) -> KmedoidsResultUpdate the current cluster medoids using the dist matrix.
The medoids field of the returned KmedoidsResult points to the same array as medoids argument.
See kmedoids for the description of optional kwargs.
Clustering.KmedoidsResult — TypeKmedoidsResult{T} <: ClusteringResultThe output of kmedoids function.
Fields
medoids::Vector{Int}: the indices of $k$ medoidsassignments::Vector{Int}: the indices of clusters the points are assigned to, so thatmedoids[assignments[i]]is the index of the medoid for the $i$-th pointcosts::Vector{T}: assignment costs, i.e.costs[i]is the cost of assigning $i$-th point to its medoidcounts::Vector{Int}: cluster sizestotalcost::Float64: total assignment cost (the sum ofcosts)iterations::Int: the number of executed algorithm iterationsconverged::Bool: whether the procedure converged
References
- Teitz, M.B. and Bart, P. (1968). Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research, 16(5), 955–961. doi:10.1287/opre.16.5.955
- Schubert, E. and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS Algorithms. SISAP, 171-187. doi:10.1007/978-3-030-32047-8_16