Density-based Clustering Algorithm

A [[Density-based Clustering Algorithm]] is a [[clustering algorithm]] that defines clusters as areas of higher density than the remainder of the data set (a minimal usage sketch follows the list below).
* <B>Example(s):</B>
** [[DBSCAN Algorithm]].
** [[OPTICS Algorithm]].
** …
* <B>Counter-Example(s):</B>
** a [[Hierarchical Clustering Algorithm]].
** a [[Partitional Clustering Algorithm]].
* <B><U>See:</U></B> [[Density Estimation]], [[Subspace Clustering Algorithm]].
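
The following is a minimal, illustrative sketch of how such an algorithm is typically invoked; it is not part of the cited sources, and it assumes scikit-learn is available (the dataset, parameter values, and variable names are chosen purely for illustration):
<pre>
# Illustrative only: assumes scikit-learn is installed; eps / min_samples values are arbitrary.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: dense, non-convex shapes that density-based
# methods can separate even though they are not linearly separable.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
print("clusters found:", len(set(labels) - {-1}))  # the label -1 marks noise points
</pre>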
 
----
== References ==
* http://scholar.google.com/scholar?q=Density-based+clustering


=== 2013 ===
* http://en.wikipedia.org/wiki/Cluster_analysis#Density-based_clustering
** In density-based clustering,<ref>{{Cite journal
  | url = http://wires.wiley.com/WileyCDA/WiresArticle/wisId-WIDM30.html
  | doi = 10.1002/widm.30
}}</ref> clusters are defined as areas of higher density than the remainder of the data set. Objects in these sparse areas - that are required to separate clusters - are usually considered to be noise and border points. <P> The most popular<ref>[http://academic.research.microsoft.com/CSDirectory/paper_category_7.htm Microsoft academic search: most cited data mining articles]: DBSCAN is on rank 24, when accessed on: 4/18/2010</ref> density based clustering method is [[DBSCAN]].<ref>{{Cite conference | author = [[Martin Ester]], [[Hans-Peter Kriegel]], Jörg Sander, Xiaowei Xu | title = A density-based algorithm for discovering clusters in large spatial databases with noise | pages = 226–231 | editors = Evangelos Simoudis, [[Jiawei Han]], Usama M. Fayyad | booktitle = Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) | publisher = [[AAAI Press]] | year = 1996 | isbn = 1-57735-004-9 | url = http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.1980}}</ref> In contrast to many newer methods, it features a well-defined cluster model called "density-reachability". Similar to linkage based clustering, it is based on connecting points within certain distance thresholds. However, it only connects points that satisfy a density criterion, in the original variant defined as a minimum number of other objects within this radius. A cluster consists of all density-connected objects (which can form a cluster of an arbitrary shape, in contrast to many other methods) plus all objects that are within these objects' range. Another interesting property of DBSCAN is that its complexity is fairly low - it requires a linear number of range queries on the database - and that it will discover essentially the same results (it is [[deterministic algorithm|deterministic]] for core and noise points, but not for border points) in each run, therefore there is no need to run it multiple times. [[OPTICS algorithm|OPTICS]]<ref>{{Cite conference
  | author = Mihael Ankerst, Markus M. Breunig, [[Hans-Peter Kriegel]], Jörg Sander
  | title = OPTICS: Ordering Points To Identify the Clustering Structure
  | year = 1999
  | pages = 49–60
  | booktitle = ACM SIGMOD International Conference on Management of Data
  | publisher = [[ACM Press]]
  | url = http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.6542
}}</ref> is a generalization of DBSCAN that removes the need to choose an appropriate value for the range parameter <math>\varepsilon</math>, and produces a hierarchical result related to that of [[hierarchical clustering|linkage clustering]]. DeLi-Clu,<ref>{{cite doi | 10.1007/11731139_16}}</ref> Density-Link-Clustering combines ideas from [[single-linkage clustering]] and OPTICS, eliminating the <math>\varepsilon</math> parameter entirely and offering performance improvements over OPTICS by using an [[R-tree]] index. <P> The key drawback of [[DBSCAN]] and [[OPTICS]] is that they expect some kind of density drop to detect cluster borders. Moreover they can not detect intrinsic cluster structures which are prevalent in the majority of real life data. A variation of DBSCAN, [[EnDBSCAN algorithm|EnDBSCAN]]<ref>{{Cite conference
  | author = S Roy, D K Bhattacharyya
  | title = An Approach to find Embedded Clusters Using Density Based Techniques
}}</ref> efficiently detects such kinds of structures. On data sets with, for example, overlapping Gaussian distributions - a common use case in artificial data - the cluster borders produced by these algorithms will often look arbitrary, because the cluster density decreases continuously. On a data set consisting of mixtures of Gaussians, these algorithms are nearly always outperformed by methods such as EM clustering, that are able to precisely model this kind of data.
<references/>
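
The density-reachability construction described in the quoted passage can be illustrated with a short, brute-force sketch. This is not taken from any of the cited works; the function name, the eps / min_samples parameters, and the use of a full pairwise-distance matrix in place of indexed range queries are illustrative assumptions:
<pre>
# Illustrative DBSCAN-style sketch (assumes NumPy); an O(n^2) pairwise-distance
# matrix stands in for the range queries mentioned in the quoted text.
import numpy as np

def dbscan_sketch(X, eps=0.5, min_samples=5):
    """Return one label per row of X: cluster ids 0, 1, ... or -1 for noise."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]  # "range queries"
    is_core = [len(nb) >= min_samples for nb in neighbors]           # density criterion

    labels = np.full(n, -1)            # -1 = noise until a cluster claims the point
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue                   # already clustered, or not a core point
        labels[i] = cluster_id         # grow a new cluster from this core point
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:        # unclaimed point joins the current cluster
                labels[j] = cluster_id
                if is_core[j]:         # only core points extend the cluster further
                    frontier.extend(neighbors[j])
        cluster_id += 1
    return labels
</pre>
As the quoted passage notes, the outcome is deterministic for core and noise points, while a border point may end up in whichever neighbouring cluster reaches it first; production implementations of the two algorithms discussed above are available as sklearn.cluster.DBSCAN and sklearn.cluster.OPTICS.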


=== 2011 ===
* ([[Sander, 2011]]) ⇒ Joerg Sander. ([[2011]]). “Density-Based Clustering.” In: ([[Sammut & Webb, 2011]]) p.270.


=== 1996 ===
* ([[Ester et al., 1996]]) ⇒ [[Martin Ester]], [[Hans-Peter Kriegel]], Jorg Sander, and [[Xiaowei Xu]]. ([[1996]]). “[http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise].” In: Proceedings of the International Conference on Knowledge Discovery in Databases and Data Mining ([[KDD 1996]]).
** QUOTE: Jain (1988) explores a density based approach to identify clusters in k-dimensional point sets.


=== 1988 ===
* ([[Jain, 1988]]) ⇒ [[Anil K. Jain]]. (1988). “Algorithms for Clustering Data.” Prentice Hall. ISBN:013022278X


----
__NOTOC__
[[Category:Concept]]
