Text-Document Clustering Algorithm: Difference between revisions

Latest revision as of 21:14, 9 May 2024

A Text-Document Clustering Algorithm is a domain-specific clustering algorithm that can be implemented by a text-document clustering system to solve the text-document clustering task.

Context:
- It can (often) make use of a Document Vectorizer.
- It can range from being a Knowledge-based Text Clustering Algorithm (making use of a knowledge base) to being a Knowledge-Free Text Clustering Algorithm.
- It can range from being a Heuristic Text Clustering Algorithm to being a Data-Driven Text Clustering Algorithm.
Example(s):
- A Webpage Clustering Algorithm can be used to cluster webpages into different categories, such as news articles, blog posts, and product reviews.
- A Text Embedding-based Clustering Algorithm can be used to cluster text documents based on the similarity of their word embeddings.
- A Topic Modeling Algorithm can be used to cluster text documents based on the latent topics that they contain.
- …
Counter-Example(s):
- a Word Sense Clustering Algorithm.
- a Word Clustering Algorithm.
See: Information Retrieval Algorithm, Text Classification Task.
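
As an illustration of the Context items above, the following is a minimal, knowledge-free sketch in which a document vectorizer feeds a clustering step. It assumes whitespace tokenization, smoothed TF-IDF weights, and spherical k-means (k-means under cosine similarity) with hand-picked seed documents. The function names, the toy documents, and the seed choice are hypothetical and are not taken from any of the works cited below.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each document to a unit-length sparse TF-IDF vector (dict: term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF keeps every weight positive, even for terms found in all documents.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(a, b):
    """Sparse dot product; equals cosine similarity when both vectors are unit length."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def centroid(members):
    """Length-normalized sum of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}

def kmeans_cosine(vectors, seeds, n_iter=10):
    """Spherical k-means: assign each vector to its most similar centroid, then update centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(n_iter):
        labels = [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[k] = centroid(members)
    return labels

# Hypothetical toy corpus: two finance snippets and two sports snippets.
docs = [
    "stock market shares fall",
    "market shares and stock prices rise",
    "football team wins match",
    "team loses football match",
]
labels = kmeans_cosine(tfidf_vectors(docs), seeds=[0, 2])
```

With documents 0 and 2 as seeds, the two finance snippets share a cluster and the two sports snippets share the other. A knowledge-based variant would typically change only the vectorizer, for example by mapping terms to concepts from a knowledge base before weighting.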

References

2009

(Hu et al., 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557066

2008

(Li et al., 2008) ⇒ Yanjun Li, Soon M. Chung, and John D. Holt. (2008). “Text Document Clustering Based on Frequent Word Meaning Sequences.” In: Data & Knowledge Engineering 64(1). doi:10.1016/j.datak.2007.08.001

2007

(Recupero, 2007) ⇒ Diego R. Recupero. (2007). “A New Unsupervised Method for Document Clustering by using WordNet Lexical and Conceptual Relations.” In: Information Retrieval (2007) 10:563–579.

2006

(Yoo et al., 2006) ⇒ Illhoi Yoo, Xiaohua Hu, and Il-Yeol Song. (2006). “Integration of Semantic-based Bipartite Graph Representation and Mutual Refinement Strategy for Biomedical Literature Clustering.” In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2006).

2005

(Ferragina & Gulli, 2005) ⇒ Paolo Ferragina, and Antonio Gulli. (2005). “A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering.” In: Proceedings of International World Wide Web Conference (WWW 2005).
(Surdeanu et al., 2005) ⇒ Mihai Surdeanu, Jordi Turmo, and Alicia Ageno. (2005). “A Hybrid Unsupervised Approach for Document Clustering.” In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD-2005).
(Zhong & Ghosh, 2005) ⇒ S. Zhong, and Joydeep Ghosh. (2005). “Generative Model-based Document Clustering: A comparative study.” In: Journal of Knowledge and Information Systems, 8(3).

2004

(Sedding and Kazakov, 2004) ⇒ Julian Sedding and Dimitar Kazakov. (2004). “Wordnet-based Text Document Clustering.” In: COLING-2004 Workshop on Robust Methods in Analysis of Natural Language Data (ROMAND).

2003

(Xu et al., 2003) ⇒ Wei Xu, Xin Liu, and Yihong Gong. (2003). “Document Clustering Based on Non-Negative Matrix Factorization.” In: Proceedings of the 26th ACM SIGIR Conference (SIGIR 2003). doi:10.1145/860435.860485
(Fung et al., 2003) ⇒ Benjamin C. M. Fung, Ke Wang, and Martin Ester. (2003). “Hierarchical Document Clustering using Frequent Itemsets.” In: Proceedings of the SIAM International Conference on Data Mining 2003 (SDM 2003).
(Hotho et al., 2003) ⇒ Andreas Hotho, Steffen Staab, and Gerd Stumme. (2003). “Wordnet Improves Text Document Clustering.” In: Proceedings of the Semantic Web Workshop (at SIGIR 2003).

2002

(Zhao & Karypis, 2002) ⇒ Ying Zhao, and George Karypis. (2002). “Evaluation of Hierarchical Clustering Algorithms for Document Datasets.” In: Conference on Information and Knowledge Management (CIKM 2002). doi:10.1145/584792.584877
(Beil et al., 2002) ⇒ Florian Beil, Martin Ester, and Xiaowei Xu. (2002). “Frequent Term-based Text Clustering.” In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). doi:10.1145/775047.775110

2001

(Hotho et al., 2001) ⇒ Andreas Hotho, Alexander Maedche, and Steffen Staab. (2001). “Ontology-based Text Clustering.” In: Proceedings of the IJCAI-2001 Workshop on Text Learning: Beyond Supervision.
(Zhao & Karypis, 2001) ⇒ Ying Zhao, and George Karypis. (2001). “Criterion Functions for Document Clustering: Experiments and Analysis.” Technical Report TR #01-40, Department of Computer Science, University of Minnesota, Minneapolis, MN.

2000

(Steinbach et al., 2000) ⇒ Michael Steinbach, George Karypis, and Vipin Kumar. (2000). “A Comparison of Document Clustering Techniques.” In: Proceedings of Workshop at KDD-2000 on Text Mining.
- We use two metrics for evaluating cluster quality: entropy, which provides a measure of “goodness” for un-nested clusters or for the clusters at one level of a hierarchical clustering, and the F-measure, which measures the effectiveness of a hierarchical clustering. (The F measure was recently extended to document hierarchies in [5].)
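
The two metrics named in this quote can be sketched as follows. The sketch assumes the commonly used definitions: entropy is the size-weighted average of each cluster's class entropy, and the F-measure is the class-size-weighted best F1 score that any candidate cluster achieves for each class. The exact formulation should be checked against the cited paper; the function names and the toy labels here are hypothetical.

```python
import math
from collections import Counter

def clustering_entropy(labels_true, labels_pred):
    """Size-weighted average of per-cluster class entropy, in bits (lower is better)."""
    n = len(labels_true)
    total = 0.0
    for cluster in set(labels_pred):
        classes = [c for c, p in zip(labels_true, labels_pred) if p == cluster]
        size = len(classes)
        entropy = -sum((k / size) * math.log2(k / size)
                       for k in Counter(classes).values())
        total += (size / n) * entropy
    return total

def f_measure(labels_true, clusters):
    """Class-size-weighted best F1 over candidate clusters (higher is better).

    clusters is a list of sets of document indices. For a hierarchical
    clustering, pass the member set of every node in the tree.
    """
    n = len(labels_true)
    score = 0.0
    for cls, n_cls in Counter(labels_true).items():
        best = 0.0
        for members in clusters:
            n_both = sum(1 for i in members if labels_true[i] == cls)
            if n_both:
                precision = n_both / len(members)
                recall = n_both / n_cls
                best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_cls / n) * best
    return score

# Hypothetical gold classes for four documents.
labels_true = ["a", "a", "b", "b"]

pure_entropy = clustering_entropy(labels_true, [0, 0, 1, 1])    # clusters match classes
mixed_entropy = clustering_entropy(labels_true, [0, 1, 0, 1])   # each cluster is half "a", half "b"
tree_f = f_measure(labels_true, [{0, 1, 2, 3}, {0, 1}, {2, 3}])  # root plus two pure children
mixed_f = f_measure(labels_true, [{0, 2}, {1, 3}])
```

In this toy case the pure clustering has entropy 0 and the evenly mixed one has entropy 1 bit. The three-node hierarchy reaches an F-measure of 1 because each class is matched exactly by one node, while the mixed flat clustering reaches 0.5.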

1999

(Larsen & Aone, 1999) ⇒ Bjornar Larsen, and Chinatsu Aone. (1999). “Fast and Effective Text Mining Using Linear-time Document Clustering.” In: Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-1999). doi:10.1145/312129.312186

1997

(Schütze & Silverstein, 1997) ⇒ Hinrich Schütze, and Craig Silverstein. (1997). “Projections for Efficient Document Clustering.” In: ACM SIGIR Forum.
(Zamir et al., 1997) ⇒ O. Zamir, Oren Etzioni, O. Madani, and R. Karp. (1997). “Fast and Intuitive Clustering of Web Documents.” In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining.

1992

(Cutting et al., 1992) ⇒ Douglass R. Cutting, David R. Karger, Jan O. Pedersen, and John W. Tukey. (1992). “Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections.” In: Proceedings of the 15th ACM SIGIR Conference (SIGIR 1992).

