Partitioning Around Medoids (PAM) Algorithm

From GM-RKB
Jump to navigation Jump to search

A Partitioning Around Medoids (PAM) Algorithm is a distance-based k-Medoids clustering algorithm that aims to partition a dataset into clusters by minimizing the sum of the dissimilarities between objects labeled to be in a cluster and a point designated as the center of that cluster, known as a medoid.

  • Context:
    • It can be seen as a robust alternative to k-Means clustering, which minimizes the sum of squared distances to the cluster centers, making PAM more resistant to noise and outliers.
    • It can (typically) use a greedy algorithm for the initial selection of medoids and subsequent iterative improvement of the clustering by minimizing the total dissimilarity.
    • It can (often) involve two key phases: the "BUILD" phase for initial medoid selection and the "SWAP" phase for iteratively improving the cluster quality.
    • It can be computationally expensive for large datasets due to its initial runtime complexity of \(O(k(n-k)^2)\), although optimized versions such as FastPAM and FasterPAM significantly reduce this complexity.
  • Example(s):
    • ...
  • Counter-Example(s):
  • See: Greedy Algorithm, k-Medoids, Clustering Algorithm, Outlier.


References

2024

  • (Wikipedia, 2024) ⇒ https://en.wikipedia.org/wiki/k-medoids#Partitioning_Around_Medoids Retrieved:2024-3-22.
    • PAM[1] uses a greedy search which may not find the optimum solution, but it is faster than exhaustive search. It works as follows:
      1. (BUILD) Initialize: greedily select of the data points as the medoids to minimize the cost
      2. Associate each data point to the closest medoid.
      3. (SWAP) While the cost of the configuration decreases:
        1. For each medoid , and for each non-medoid data point :
          1. Consider the swap of and , and compute the cost change
          2. If the cost change is the current best, remember this m and o combination
        2. Perform the best swap of [math]\displaystyle{ m_{\text{best}} }[/math] and [math]\displaystyle{ o_{\text{best}} }[/math] , if it decreases the cost function. Otherwise, the algorithm terminates.
    • The runtime complexity of the original PAM algorithm per iteration of (3) is [math]\displaystyle{ O(k (n-k)^2) }[/math] , by only computing the change in cost. A naive implementation recomputing the entire cost function every time will be in [math]\displaystyle{ O(n^2k^2) }[/math] . This runtime can be further reduced to [math]\displaystyle{ O(n^2) }[/math] , by splitting the cost change into three parts such that computations can be shared or avoided (FastPAM). The runtime can further be reduced by eagerly performing swaps (FasterPAM), at which point a random initialization becomes a viable alternative to BUILD.
  1. Cite error: Invalid <ref> tag; no text was provided for refs named :2