K-medoids
K-medoids is a clustering algorithm that works by finding $k$ data points (called medoids) such that the total distance between each data point and the closest medoid is minimal.
Clustering.kmedoids
— Functionkmedoids(dist::AbstractMatrix, k::Integer; ...) -> KmedoidsResult
Perform K-medoids clustering of $n$ points into k
clusters, given the dist
matrix ($n×n$, dist[i, j]
is the distance between the j
-th and i
-th points).
Arguments
init
(defaults to:kmpp
): how medoids should be initialized, could be one of the following:- a
Symbol
indicating the name of a seeding algorithm (see Seeding for a list of supported methods). - an integer vector of length
k
that provides the indices of points to use as initial medoids.
- a
maxiter
,tol
,display
: see common options
Note
The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).
Clustering.kmedoids!
— Functionkmedoids!(dist::AbstractMatrix, medoids::Vector{Int};
[kwargs...]) -> KmedoidsResult
Update the current cluster medoids
using the dist
matrix.
The medoids
field of the returned KmedoidsResult
points to the same array as medoids
argument.
See kmedoids
for the description of optional kwargs
.
Clustering.KmedoidsResult
— TypeKmedoidsResult{T} <: ClusteringResult
The output of kmedoids
function.
Fields
medoids::Vector{Int}
: the indices of $k$ medoidsassignments::Vector{Int}
: the indices of clusters the points are assigned to, so thatmedoids[assignments[i]]
is the index of the medoid for the $i$-th pointcosts::Vector{T}
: assignment costs, i.e.costs[i]
is the cost of assigning $i$-th point to its medoidcounts::Vector{Int}
: cluster sizestotalcost::Float64
: total assignment cost (the sum ofcosts
)iterations::Int
: the number of executed algorithm iterationsconverged::Bool
: whether the procedure converged
References
- Teitz, M.B. and Bart, P. (1968). Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research, 16(5), 955–961. doi:10.1287/opre.16.5.955
- Schubert, E. and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS Algorithms. SISAP, 171-187. doi:10.1007/978-3-030-32047-8_16