# Divisive Hierarchical Clustering Algorithm

A Divisive Hierarchical Clustering Algorithm is a Hierarchical Clustering Algorithm in which *all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy*.

**AKA:** Top-Down Hierarchical Clustering Algorithm.

**Example(s):** DIANA (DIvisive ANAlysis Clustering) Algorithm.

**Counter-Example(s):** Agglomerative Hierarchical Clustering Algorithm.

**See:** Unsupervised Machine Learning Algorithm, Clustering Algorithm.
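The top-down procedure in the definition above can be sketched with a recursive 2-means split heuristic (sometimes called bisecting k-means). The following is a minimal, illustrative numpy sketch, not a reference implementation; the function names and the choice of 2-means as the split criterion are assumptions:

```python
import numpy as np

def two_means(X, n_iter=20, seed=0):
    """Minimal Lloyd's algorithm with k=2, used here as the split heuristic."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center, then recompute the centers
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def divisive_cluster(X, n_clusters):
    """Top-down clustering: all observations start in one cluster, and the
    largest remaining cluster is split recursively until n_clusters remain."""
    clusters = [np.arange(len(X))]             # one cluster holding everything
    while len(clusters) < n_clusters:
        clusters.sort(key=len)
        idx = clusters.pop()                   # split the largest cluster
        labels = two_means(X[idx])
        left, right = idx[labels == 0], idx[labels == 1]
        if len(left) == 0 or len(right) == 0:  # degenerate split; stop early
            clusters.append(idx)
            break
        clusters.extend([left, right])
    return clusters
```

Exhaustive splitting is exponential in the cluster size, so a cheap heuristic split such as the 2-means step above is the usual compromise.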

## References

### 2019a

- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Hierarchical_clustering Retrieved:2019-5-19.
    - QUOTE: In data mining and statistics, **hierarchical clustering** (also called **hierarchical cluster analysis** or **HCA**) is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types: ^{[1]}
        - **Agglomerative**: This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
        - **Divisive**: This is a "top-down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
    - In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
    - The standard algorithm for **hierarchical agglomerative clustering** (HAC) has a time complexity of [math] \mathcal{O}(n^3) [/math] and requires [math] \mathcal{O}(n^2) [/math] memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity [math] \mathcal{O}(n^2) [/math]) are known: SLINK ^{[2]} for single-linkage and CLINK ^{[3]} for complete-linkage clustering. With a heap, the runtime of the general case can be reduced to [math] \mathcal{O}(n^2 \log n) [/math] at the cost of further increasing the memory requirements. In many programming languages, the memory overheads of this approach are too large to make it practically usable.
    - Except for the special case of single-linkage, none of the algorithms (except exhaustive search in [math] \mathcal{O}(2^n) [/math]) can be guaranteed to find the optimum solution.
    - Divisive clustering with an exhaustive search is [math] \mathcal{O}(2^n) [/math], but it is common to use faster heuristics to choose splits, such as k-means.
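For contrast with the divisive approach, the bottom-up agglomeration described above can be sketched naively. This illustrative numpy sketch recomputes inter-cluster distances each round (roughly cubic cost overall, unlike the optimal SLINK algorithm), and the function name is an assumption:

```python
import numpy as np

def single_linkage_merges(X):
    """Naive bottom-up single-linkage agglomeration: every observation starts
    in its own cluster, and the two closest clusters are merged repeatedly."""
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None], axis=-1)  # pairwise distances
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((list(clusters[a]), list(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

The sequence of merges (with their distances) is exactly the information a dendrogram displays.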


- ↑ Rokach, Lior, and Oded Maimon. "Clustering methods." Data mining and knowledge discovery handbook. Springer US, 2005. 321-352.
- ↑ R. Sibson (1973). "SLINK: an optimally efficient algorithm for the single-link cluster method" (PDF). The Computer Journal. British Computer Society. 16 (1): 30–34. doi:10.1093/comjnl/16.1.30.
- ↑ D. Defays (1977). "An efficient algorithm for a complete-link method". The Computer Journal. British Computer Society. 20 (4): 364–366. doi:10.1093/comjnl/20.4.364.

### 2019b

- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Hierarchical_clustering#Divisive_clustering Retrieved:2019-5-19.
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/Hierarchical_clustering#Divisive_clustering Retrieved:2019-5-19.
    - QUOTE: The basic principle of divisive clustering was published as the DIANA (DIvisive ANAlysis Clustering) algorithm. ^{[1]} Initially, all data is in the same cluster, and the largest cluster is split until every object is separate. Because there exist [math] O(2^n) [/math] ways of splitting each cluster, heuristics are needed. DIANA chooses the object with the maximum average dissimilarity and then moves to this cluster all objects that are more similar to the new cluster than to the remainder.
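The DIANA split step quoted above can be sketched directly from that description (an illustrative sketch only; `diana_split` is a hypothetical name, and only one split of one cluster, given a precomputed dissimilarity matrix, is shown):

```python
import numpy as np

def diana_split(D):
    """One DIANA-style split of a cluster, given its dissimilarity matrix D.
    The splinter group is seeded with the object of maximum average
    dissimilarity; objects closer (on average) to the splinter group than
    to the remainder are then moved over."""
    n = len(D)
    remainder = set(range(n))
    seed = max(remainder, key=lambda i: D[i].sum() / (n - 1))
    splinter = {seed}
    remainder.remove(seed)
    moved = True
    while moved and len(remainder) > 1:
        moved = False
        for i in list(remainder):
            d_splinter = np.mean([D[i][j] for j in splinter])
            d_remainder = np.mean([D[i][j] for j in remainder if j != i])
            if d_splinter < d_remainder:
                splinter.add(i)
                remainder.remove(i)
                moved = True
    return sorted(splinter), sorted(remainder)
```

Applying this split recursively to the largest remaining cluster, until every object is separate, yields the full DIANA hierarchy.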


- ↑ Kaufman, L., & Roussew, P. J. (1990). Finding Groups in Data - An Introduction to Cluster Analysis. A Wiley-Science Publication John Wiley & Sons.

### 2016

- (Saket J & Pandya, 2016) ⇒ Swarndeep Saket J, and Dr Sharnil Pandya. (2016). “An Overview of Partitioning Algorithms in Clustering Techniques.” In: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 5(6). ISSN: 2278-1323
- QUOTE: Hierarchical clustering method seeks to build a tree-based hierarchical taxonomy from a set of unlabeled data. This grouping process is represented in the form of dendrogram. It can be analyzed with the help of statistical method. There are two types of hierarchical clustering methods. They are 1) Agglomerative Hierarchical Clustering and 2) Divisive Clustering [7]. In the agglomerative approach which is also known as ‘bottom up approach’, Hierarchical algorithms always result into what is called ‘nested set of partitions’. They are called hierarchical because of their structure they represent about the dataset. Divisive and Agglomerative strategies are two important strategies of hierarchical clustering. In case of divisive approach, popularly known as ‘top down approach’, ‘all data points are considered as a single cluster and splitted into a number of clusters based on certain criteria [8]. Examples for such algorithm are BRICH (Balance Iterative Reducing and Clustering using Hierarchies), CURE (Cluster Using Representatives). The most important weakness of hierarchical clustering technique is that it does not scale properly because of time complexity. In addition to this, it is difficult to alter ones the process of analysis has already started.

### 2008

- (Manning et al., 2008) ⇒ Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schutze. (2008). "Hierarchical agglomerative clustering". In: "Introduction to Information Retrieval".
- QUOTE: Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering or HAC. Top-down clustering requires a method for splitting a cluster. It proceeds by splitting clusters recursively until individual documents are reached. See Section 17.6. HAC is more frequently used in IR than top-down clustering and is the main subject of this chapter.