Difference between revisions of "2017 LightweightMultilingualEntityEx"

From GM-RKB
m (Text replacement - " �n" to " fin")
m (Text replacement - "i�c" to "ific")
Line 36:
TAC KBP 2013 data and on a standard monolingual data
set, AIDA. We demonstrate that our system is lightweight in
terms of speed and memory footprint. Specific contributions
of this work include:
* with very few features that are easy to extend to multiple languages, we can achieve competitive performance on mention detection,
* with meta-linguistic context, specifically click data from search logs, we can provide competitive performance for multilingual candidate entity retrieval from documents, and
* through efficient methods for entity disambiguation, we can get further improvements in NEL accuracy
Line 52:
trained on human-labeled data and using lexical, syntactic
and semantic features which may become quite complex and
language specific [50]. With their joint NER/NEL semi-Conditional
Random Field (CRF) system including Brown
clusters, WordNet clusters and dictionaries, Luo et al. [32]
Line 99:
consist of word shape and capitalization features, token prefixes
and suffixes (up to length 4), numbers and punctuation.
Finally, we experiment with language-specific part-of-speech
(POS) tags; POS tagging adds minimal preprocessing and
is available for over 40 languages.
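
The surface features named in this passage can be sketched in a few lines. This is a minimal illustration, not the authors' feature templates: the function names `word_shape` and `token_features`, the collapsed X/x/d shape alphabet, and the feature keys are all assumptions. Only the feature categories and the affix length limit of 4 come from the text.

```python
import string


def word_shape(token):
    """Collapsed word shape (an assumed encoding): uppercase -> X,
    lowercase -> x, digit -> d, any other character kept as-is.
    Runs of the same class are collapsed to a single symbol."""
    shape = []
    for ch in token:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


def token_features(token, max_affix_len=4):
    """Language-independent surface features for a single token."""
    feats = {
        "shape": word_shape(token),
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "is_punct": bool(token) and all(ch in string.punctuation for ch in token),
    }
    # Prefixes and suffixes up to max_affix_len characters,
    # never longer than the token itself.
    for n in range(1, min(max_affix_len, len(token)) + 1):
        feats[f"prefix_{n}"] = token[:n]
        feats[f"suffix_{n}"] = token[-n:]
    return feats


if __name__ == "__main__":
    for tok in ["Paris", "U.S.", "2017", ","]:
        print(tok, token_features(tok))
```

Feature dictionaries of this kind are the usual input to a CRF tagger; language-specific POS tags, where available, would simply be added as one more key.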
Line 379:
contain less lexical information for disambiguation, the
document context may be used to disambiguate entity mentions.
Specifically, we cluster the entity embeddings (see
Section 3.2) of mentions and candidate entities' CFs. We
use exemplar clustering [18], which lets us choose certain
Line 387:
set for candidate entities' entity embeddings and zeros for
mentions' entity embeddings.
In this work we specifically use the affinity propagation flavor
of exemplar clustering as implemented in scikit-learn [40].
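
A rough sketch of how per-point preferences can be passed to scikit-learn's `AffinityPropagation` follows. It is an illustration under stated assumptions, not the authors' configuration: the excerpt does not say which similarity measure or which preference value is used for candidate entities, so cosine similarity, the value `0.5`, the toy 2-D vectors, and the helper name `cluster_mentions_and_candidates` are all hypothetical. Only the zero preference for mentions and the use of scikit-learn come from the text.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation


def cluster_mentions_and_candidates(embeddings, is_candidate,
                                    candidate_preference=0.5):
    """Cluster mention and candidate-entity vectors with affinity propagation.

    embeddings: one non-zero row vector per point.
    is_candidate: boolean flag per row, True for candidate entities.
    candidate_preference: assumed value; the source does not state it.
    Returns (cluster label per point, indices of the chosen exemplars).
    """
    X = np.asarray(embeddings, dtype=float)
    # Pairwise cosine similarity (assumed measure); rows must be non-zero.
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    similarity = unit @ unit.T
    # Higher preference makes a point more likely to become an exemplar:
    # candidates get the assumed constant, mentions get zero.
    preference = np.where(np.asarray(is_candidate, dtype=bool),
                          candidate_preference, 0.0)
    model = AffinityPropagation(affinity="precomputed",
                                preference=preference,
                                random_state=0)
    labels = model.fit_predict(similarity)
    return labels, model.cluster_centers_indices_


if __name__ == "__main__":
    # Toy data: rows 0 and 3 are candidate entities, the rest are mentions.
    vectors = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0],
               [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
    flags = [True, False, False, True, False, False]
    labels, exemplars = cluster_mentions_and_candidates(vectors, flags)
    print("labels:", labels)
    print("exemplar indices:", exemplars)
```

With a precomputed similarity matrix, scikit-learn replaces the diagonal with the supplied preferences, which is what biases exemplar selection toward candidate entities in this sketch.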
Line 817:
6. This comparison is indirect; we could not run their system
and they did not report hardware specifications for their
experiments.

Revision as of 19:56, 15 January 2020

Subject Headings: Text Analytics.

Notes

Cited By

Quotes

Abstract

Text analytics systems often rely heavily on detecting and linking entity mentions in documents to knowledge bases for downstream applications such as sentiment analysis, question answering and recommender systems. A major challenge for this task is to be able to accurately detect entities in new languages with limited labeled resources. In this paper we present an accurate and lightweight [1], multilingual named entity recognition (NER) and linking (NEL) system. The contributions of this paper are three-fold: 1) Lightweight named entity recognition with competitive accuracy; 2) Candidate entity retrieval that uses search click-log data and entity embeddings to achieve high precision with a low memory footprint; and 3) efficient entity disambiguation. Our system achieves state-of-the-art performance on TAC KBP 2013 multilingual data and on English AIDA CoNLL data.


References

  • 1. R. Al-Rfou, V. Kulkarni, B. Perozzi, and S. Skiena. Polyglot-NER: Massive Multilingual Named Entity Recognition. In Proc. ICDM, 2015. doi:10.1137/1.9781611974010.66
  • 2. A. Alhelbawy and R. Gaizauskas. Collective Named Entity Disambiguation Using Graph Ranking and Clique Partitioning Approaches. In Proc. COLING, 2014.
  • 3. S. Austin, R. Schwartz, and P. Placeway. The Forward-backward Search Algorithm. In Proc. ICASSP, 1991. doi:10.1109/ICASSP.1991.150435
  • 4. Roi Blanco, Giuseppe Ottaviano, Edgar Meij, Fast and Space-Efficient Entity Linking for Queries, Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, February 02-06, 2015, Shanghai, China doi:10.1145/2684822.2685317
  • 5. R. Bunescu and M. Pasca. Using Encyclopedic Knowledge for Named Entity Disambiguation. In Proc. EACL, 2006.
  • 6. Diego Ceccarelli, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Salvatore Trani, Learning Relatedness Measures for Entity Linking, Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, October 27-November 01, 2013, San Francisco, California, USA doi:10.1145/2505515.2505711
  • 7. W. Che, M. Wang, C. D. Manning, and T. Liu. Named Entity Recognition with Bilingual Constraints. In Proc. HLT-NAACL, 2013.
  • 8. X. Cheng and D. Roth. Relational Inference for Wikification. In Proc. EMNLP, 2013.
  • 9. A. Chisholm and B. Hachey. Entity Disambiguation with Web Links. Trans. of the ACL, 3:145--156, 2015.
  • 10. S. Cucerzan. Large-scale Named Entity Disambiguation based on Wikipedia Data. In Proc. EMNLP, 2007.
  • 11. Bhavana Dalvi, Einat Minkov, Partha P. Talukdar, William W. Cohen, Automatic Gloss Finding for a Knowledge Base Using Ontological Constraints, Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, February 02-06, 2015, Shanghai, China doi:10.1145/2684822.2685288
  • 12. Nemanja Djuric, Hao Wu, Vladan Radosavljevic, Mihajlo Grbovic, Narayan Bhamidipati, Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content, Proceedings of the 24th International Conference on World Wide Web, May 18-22, 2015, Florence, Italy doi:10.1145/2736277.2741643
  • 13. G. Durrett and D. Klein. A Joint Model for Entity Analysis: Coreference, Typing, and Linking. Trans. of the ACL, 2:477--490, 2014.
  • 14. Peter Elias, Efficient Storage and Retrieval by Content and Address of Static Files, Journal of the ACM (JACM), v.21 n.2, p.246-260, April 1974 doi:10.1145/321812.321820
  • 15. A. Fahrni, B. Heinzerling, T. Göckel, and M. Strube. HITS' Monolingual and Cross-lingual Entity Linking System at TAC 2013. In Proc. TAC, 2013.
  • 16. N. Fernandez Garcia, J. Arias Fisteus, and L. Sanchez Fernandez. Comparative Evaluation of Link-based Approaches for Candidate Ranking in Link-to-wikipedia Systems. Journal of Artificial Intelligence Research, 49:733--773, 2014.
  • 17. Jenny Rose Finkel, Trond Grenager, Christopher Manning, Incorporating Non-local Information Into Information Extraction Systems by Gibbs Sampling, Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, p.363-370, June 25-30, 2005, Ann Arbor, Michigan doi:10.3115/1219840.1219885
  • 18. B. J. Frey and D. Dueck. Clustering by Passing Messages Between Data Points. Science, 315(5814):972--976, 2007. doi:10.1126/science.1136800
  • 19. Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, Thomas Hofmann, Probabilistic Bag-Of-Hyperlinks Model for Entity Linking, Proceedings of the 25th International Conference on World Wide Web, April 11-15, 2016, Montréal, Québec, Canada doi:10.1145/2872427.2882988
  • 20. Zhaochen Guo, Denilson Barbosa, Robust Entity Linking via Random Walks, Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, November 03-07, 2014, Shanghai, China doi:10.1145/2661829.2661887
  • 21. B. Hachey, W. Radford, and J. R. Curran. Graph-based Named Entity Linking with Wikipedia. In Proc. WISE, 2011. doi:10.1007/978-3-642-24434-6_16
  • 22. D. Hakkani-Tür et al. Probabilistic Enrichment of Knowledge Graph Entities for Relation Detection in Conversational Understanding. In Proc. INTERSPEECH, 2014.
  • 23. Xianpei Han, Le Sun, Jun Zhao, Collective Entity Linking in Web Text: A Graph-based Method, Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24-28, 2011, Beijing, China doi:10.1145/2009916.2010019
  • 24. Z. He et al. Learning Entity Representation for Entity Disambiguation. In Proc. ACL, 2013.
  • 25. J. Hoffart et al. Robust Disambiguation of Named Entities in Text. In Proc. EMNLP, 2011.
  • 26. H. Ji, J. Nothman, and B. Hachey. Overview of TAC-KBP2014 Entity Discovery and Linking Tasks. In Proc. TAC, 2014.
  • 27. Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, Soumen Chakrabarti, Collective Annotation of Wikipedia Entities in Web Text, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, June 28-July 01, 2009, Paris, France doi:10.1145/1557019.1557073
  • 28. J. Lafferty, A. McCallum, and F. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proc. ICML, 2001.
  • 29. G. Lample et al. Neural Architectures for Named Entity Recognition. arXiv preprint arXiv:1603.01360, 2016.
  • 30. Q. Le and T. Mikolov. Distributed Representations of Sentences and Documents. In Proc. ICML, 2014.
  • 31. X. Ling, S. Singh, and D. Weld. Design Challenges for Entity Linking. Trans. of the ACL, 3:315--328, 2015.
  • 32. G. Luo, X. Huang, C.-Y. Lin, and Z. Nie. Joint Named Entity Recognition and Disambiguation. In Proc. EMNLP, 2015.
  • 33. X. Ma and E. Hovy. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354, 2016.
  • 34. E. Meij, K. Balog, and D. Odijk. Entity Linking and Retrieval Tutorial. http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/, 2014.
  • 35. Y. Merhav et al. Basis Technology at TAC 2013 Entity Linking. In Proc. TAC, 2013.
  • 36. T. Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. In Proc. NIPS, 2013.
  • 37. N. Okazaki. CRFsuite: A Fast Implementation of Conditional Random Fields (CRFs). http://www.chokkan.org/software/crfsuite/, 2007.
  • 38. N. Okazaki and J. Nocedal. libLBFGS: A Library of Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS). URL http://www.chokkan.org/software/liblbfgs, 2010.
  • 39. A. Passos, V. Kumar, and A. McCallum. Lexicon Infused Phrase Embeddings for Named Entity Resolution. arXiv preprint arXiv:1404.5367, 2014.
  • 40. F. Pedregosa et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12:2825--2830, 2011.
  • 41. D. Rao, P. McNamee, and M. Dredze. Entity Linking: Finding Extracted Entities in a Knowledge Base. In Multi-source, Multilingual Information Extraction and Summarization, pages 93--115. Springer, 2013.
  • 42. L. Ratinov and D. Roth. Design Challenges and Misconceptions in Named Entity Recognition. In Proc. CoNLL, 2009. doi:10.3115/1596374.1596399
  • 43. (Roth et al., 2014) ⇒ D. Roth, H. Ji, M.-W. Chang, and T. Cassidy. Wikification and Beyond: The Challenges of Entity and Concept Grounding. In Proc. ACL, 2014.
  • 44. Wei Shen, Jianyong Wang, Ping Luo, Min Wang, Linking Named Entities in Tweets with Knowledge Base via User Interest Modeling, Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 11-14, 2013, Chicago, Illinois, USA doi:10.1145/2487575.2487686
  • 45. M. Shirakawa et al. Entity Disambiguation based on a Probabilistic Taxonomy. Technical Report MSR-TR-2011-125, Microsoft Research, 2011.
  • 46. Avirup Sil, Alexander Yates, Re-ranking for Joint Named-entity Recognition and Linking, Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, October 27-November 01, 2013, San Francisco, California, USA doi:10.1145/2505515.2505601
  • 47. M. Speriosu, N. Sudan, S. Upadhyay, and J. Baldridge. Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph. In Proc. EMNLP, 2011.
  • 48. J. Suzuki and H. Isozaki. Semi-supervised Sequential Labeling and Segmentation Using Giga-word Scale Unlabeled Data. In Proc. ACL-HLT, 2008.
  • 49. Partha Pratim Talukdar, Koby Crammer, New Regularized Algorithms for Transductive Learning, Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II, September 07-11, 2009, Bled, Slovenia doi:10.1007/978-3-642-04174-7_29
  • 50. Erik F. Tjong Kim Sang, Fien De Meulder, Introduction to the CoNLL-2003 Shared Task: Language-independent Named Entity Recognition, Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, p.142-147, May 31, 2003, Edmonton, Canada doi:10.3115/1119176.1119195
  • 51. M. Yu, S. Wang, C. Zhu, and T. Zhao. Semi-supervised Learning for Word Sense Disambiguation Using Parallel Corpora. In Proc. FSKD, 2011.
  • 52. Y. Zhou et al. Resolving Surface Forms to Wikipedia Topics. In Proc. COLING, 2010.
  • 53. Erik F. Tjong Kim Sang, Introduction to the CoNLL-2002 Shared Task: Language-independent Named Entity Recognition, Proceedings of the 6th Conference on Natural Language Learning, p.1-4, August 31, 2002 doi:10.3115/1118853.1118877
  • 54. E. F. Tjong Kim Sang. Introduction to the CoNLL-2002 Shared Task: Language-independent Named Entity Recognition. In Proc. CoNLL, 2002.


2017 LightweightMultilingualEntityEx
Author: Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent, Kapil Thadani
Title: Lightweight Multilingual Entity Extraction and Linking
DOI: 10.1145/3018661.3018724
Year: 2017
  1. By lightweight, we mean easily extensible to additional languages, with a low memory footprint, and fast.
Author: Aasish Pappu, Roi Blanco, Yashar Mehdad, Amanda Stent and Kapil Thadani
doi: 10.1145/3018661.3018724
title: Lightweight Multilingual Entity Extraction and Linking
year: 2017