2010 ProbabilisticTopicModelsForLearTermOnts


Subject Headings: Probabilistic Topic Model; Terminological Ontology; Corpus-based Ontology Population.

Notes

Quotes

Author Keywords

Knowledge acquisition, Ontology learning, Ontology, Probabilistic topic models.

Abstract

Probabilistic topic models were originally developed and utilized for document modeling and topic extraction in Information Retrieval. In this paper, we describe a new approach for automatic learning of terminological ontologies from text corpus based on such models. In our approach, topic models are used as efficient dimension reduction techniques, which are able to capture semantic relationships between word-topic and topic-document interpreted in terms of probability distributions. We propose two algorithms for learning terminological ontologies using the principle of topic relationship and exploiting information theory with the probabilistic topic models learned. Experiments with different model parameters were conducted and learned ontology statements were evaluated by the domain experts. We have also compared the results of our method with two existing concept hierarchy learning methods on the same data set. The study shows that our method outperforms other methods in terms of recall and precision measures. The precision level of the learned ontology is sufficient for it to be deployed for the purpose of browsing, navigation, and information search and retrieval in digital libraries.
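The abstract does not spell out the two algorithms, so the following is only an illustrative sketch of the general idea: once a topic model has produced word distributions for topics, an information-theoretic divergence can be used to judge which topics are closely related. The toy distributions, the topic names, and the choice of Kullback-Leibler and Jensen-Shannon divergence are assumptions made for this example, not details taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in bits for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded variant of KL divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def closest_topic(name, topics):
    """Return the other topic whose word distribution diverges least from `name`."""
    others = (t for t in topics if t != name)
    return min(others, key=lambda t: js_divergence(topics[name], topics[t]))

# Hypothetical topic-word distributions over a 4-word vocabulary (assumed toy data).
topics = {
    "ontology":     [0.40, 0.40, 0.10, 0.10],
    "semantic_web": [0.35, 0.45, 0.10, 0.10],
    "sampling":     [0.05, 0.05, 0.50, 0.40],
}

nearest = closest_topic("ontology", topics)
```

In this toy setting, `nearest` names the topic most similar to "ontology". KL divergence is asymmetric, so comparing `kl_divergence(p, q)` with `kl_divergence(q, p)` is one conceivable way to orient a broader/narrower relation, but whether the paper does this is not stated in the abstract.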


References

  • [1] T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web," Scientific Am., vol. 284, no. 5, pp. 34-43, 2001.
  • [2] P. Cimiano, Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer-Verlag New York, Inc., 2006.
  • [3] S. Ponzetto and M. Strube, "Deriving a Large Scale Taxonomy from Wikipedia," Proceedings of 22nd Nat'l Conference Artificial Intelligence (AAAI '07), pp. 1440-1447, July 2007.
  • [4] F. M. Suchanek, G. Kasneci, and G. Weikum, "Yago: A Core of Semantic Knowledge," Proceedings of 16th Int'l Conference World Wide Web (WWW '07), pp. 697-706, 2007.
  • [5] H. Cunningham, "Information Extraction, Automatic," Encyclopedia of Language and Linguistics, second ed., Elsevier Science, 2005.
  • [6] P. Cimiano and J. Völker, "Text2onto," Proceedings of Int'l Conference Natural Language to Information Systems (NLDB), pp. 227-238, 2005.
  • [7] A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff, "Semantic Annotation, Indexing, and Retrieval," J. Web Semantics, vol. 2, no. 1, pp. 49-79, 2004.
  • [8] M. Fleischman and E. H. Hovy, "Fine Grained Classification of Named Entities," Proceedings of Int'l Conference Computational Linguistics (COLING '02), 2002.
  • [9] F. M. Suchanek, G. Ifrim, and G. Weikum, "Combining Linguistic and Statistical Analysis to Extract Relations from Web Documents," Proceedings of ACM SIGKDD, pp. 712-717, 2006.
  • [10] M. Paşca, "Finding Instance Names and Alternative Glosses on the Web: WordNet Reloaded," Proceedings of Int'l Conference Computational Linguistics and Intelligent Text Processing (CICLing '05), 2005.
  • [11] T. Hofmann, "Probabilistic Latent Semantic Analysis," Proceedings of Uncertainty in Artificial Intelligence (UAI), pp. 289-296, 1999.
  • [12] D. M. Blei, A.Y. Ng, and M.I. Jordan, "Latent Dirichlet Allocation," J. Machine Learning Research, vol. 3, pp. 993-1022, 2003.
  • [13] T.L. Griffiths and Mark Steyvers, "Finding Scientific Topics," Proceedings of Nat'l Academy of Sciences USA, vol. 101, suppl. 1, pp. 5228-5235, Apr. 2004.
  • [14] Mark Steyvers and T. Griffiths, "Probabilistic Topic Models," Latent Semantic Analysis: A Road to Meaning, Thomas K. Landauer, D. McNamara, S. Dennis, and W. Kintsch, eds., Lawrence Erlbaum, 2005.
  • [15] R. Navigli and P. Velardi, "Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites," Computational Linguistics, vol. 30, no. 2, pp. 151-179, 2004.
  • [16] M. Sanderson and W.B. Croft, "Deriving Concept Hierarchies from Text," Proceedings of ACM SIGIR, pp. 206-213, 1999.
  • [17] J. Diederich and W.-T. Balke, "The Semantic Growbag Algorithm: Automatically Deriving Categorization Systems," Proceedings of European Conference Digital Libraries (ECDL), pp. 1-13, 2007.
  • [18] Chris Biemann, "Ontology Learning from Text: A Survey of Methods," LDV Forum, vol. 20, no. 2, pp. 75-93, 2005.
  • [19] M. A. Hearst, "Automatic Acquisition of Hyponyms from Large Text Corpora," Proceedings of Int'l Conference Computational Linguistics (COLING), pp. 539-545, 1992.
  • [20] Z. Harris, Mathematical Structures of Language. Wiley, 1968.
  • [21] S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman, "Indexing by Latent Semantic Analysis," J. Am. Soc. Information Science, vol. 41, no. 6, pp. 391-407, 1990.
  • [22] M.W. Berry, S.T. Dumais, and G.W. O'Brien, "Using Linear Algebra for Intelligent Information Retrieval," SIAM Rev., vol. 37, pp. 573-595, 1995.
  • [23] J. Bilmes, "A Gentle Tutorial on the EM Algorithm and Its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," Technical Report ICSI-TR-97-021, Univ. of Berkeley, 1997.
  • [24] G. D'Agostini, "Bayesian Inference in Processing Experimental Data: Principles and Basic Applications," Reports on Progress in Physics, vol. 66, no. 9, pp. 1383-1419, 2003.
  • [25] C. Andrieu, N. de Freitas, A. Doucet, and M.I. Jordan, "An Introduction to MCMC for Machine Learning," Machine Learning, vol. 50, nos. 1/2, pp. 5-43, 2003.
  • [26] T. Griffiths, "Gibbs Sampling in the Generative Model of Latent Dirichlet Allocation," technical report, Stanford Univ., 2002.
  • [27] D.J. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge Univ. Press, 2003.
  • [28] E. Zavitsanos, G. Paliouras, G.A. Vouros, and S. Petridis, "Discovering Subsumption Hierarchies of Ontology Concepts from Text Corpora," Proceedings of IEEE/WIC/ACM Int'l Conference Web Intelligence (WI '07), pp. 402-408, 2007.
  • [29] L. Itti and P. Baldi, "Bayesian Surprise Attracts Human Attention," Advances in Neural Information Processing Systems, vol. 19, pp. 547-554, MIT Press, 2006.
  • [30] C. D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval. Cambridge Univ. Press, 2008.


Record: 2010 ProbabilisticTopicModelsForLearTermOnts
Author: Wang Wei, Payam Barnaghi, Andrzej Bargiela
Title: Probabilistic Topic Models for Learning Terminological Ontologies
URL: http://baggins.nottingham.edu.my/~wangwei/publications/ieee-tkde.pdf
DOI: 10.1109/TKDE.2009.122