K-medoids

# K-medoids

K-medoids is a clustering algorithm that works by finding \$k\$ data points (called medoids) such that the total distance between each data point and the closest medoid is minimal.

``kmedoids(dist::AbstractMatrix, k::Integer; ...) -> KmedoidsResult``

Perform K-medoids clustering of \$n\$ points into `k` clusters, given the `dist` matrix (\$n×n\$, `dist[i, j]` is the distance between the `j`-th and `i`-th points).

Arguments

• `init` (defaults to `:kmpp`): how medoids should be initialized, could be one of the following:
• a `Symbol` indicating the name of a seeding algorithm (see Seeding for a list of supported methods).
• an integer vector of length `k` that provides the indices of points to use as initial medoids.
• `maxiter`, `tol`, `display`: see common options

Note

The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).

source
``````kmedoids!(dist::AbstractMatrix, medoids::Vector{Int};
[kwargs...]) -> KmedoidsResult``````

Update the current cluster `medoids` using the `dist` matrix.

The `medoids` field of the returned `KmedoidsResult` points to the same array as `medoids` argument.

See `kmedoids` for the description of optional `kwargs`.

source
``KmedoidsResult{T} <: ClusteringResult``

The output of `kmedoids` function.

Fields

• `medoids::Vector{Int}`: the indices of \$k\$ medoids
• `assignments::Vector{Int}`: the indices of clusters the points are assigned to, so that `medoids[assignments[i]]` is the index of the medoid for the \$i\$-th point
• `costs::Vector{T}`: assignment costs, i.e. `costs[i]` is the cost of assigning \$i\$-th point to its medoid
• `counts::Vector{Int}`: cluster sizes
• `totalcost::Float64`: total assignment cost (the sum of `costs`)
• `iterations::Int`: the number of executed algorithm iterations
• `converged::Bool`: whether the procedure converged
source

## References

1. Teitz, M.B. and Bart, P. (1968). Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research, 16(5), 955–961. doi:10.1287/opre.16.5.955
2. Schubert, E. and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS Algorithms. SISAP, 171-187. doi:10.1007/978-3-030-32047-8_16