- (Xu et al., 2009) ⇒ Gu Xu, Shuang-Hong Yang, and Hang Li. (2009). “Named Entity Mining from Click-through Data Using Weakly Supervised Latent Dirichlet Allocation.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557165
- Categories and Subject Descriptors: H.2.8 Database Management: Data Mining — Log Mining; H.3.3 Information Storage and Retrieval: Information Search and Retrieval — Query formulation
- General Terms: Algorithms, Experimentation.
This paper addresses Named Entity Mining (NEM), in which we mine knowledge about named entities such as movies, games, and books from a huge amount of data. NEM is potentially useful in many applications, including web search, online advertisement, and recommender systems. There are three challenges for the task: finding a suitable data source, coping with the ambiguities of named entity classes, and incorporating necessary human supervision into the mining process. This paper proposes conducting NEM by using click-through data collected at a web search engine, employing a topic model that generates the click-through data, and learning the topic model by weak supervision from humans. Specifically, it characterizes each named entity by its associated queries and URLs in the click-through data. It uses the topic model to resolve ambiguities of named entity classes by representing the classes as topics. It employs a method, referred to as Weakly Supervised Latent Dirichlet Allocation (WS-LDA), to accurately learn the topic model with partially labeled named entities. Experiments on large-scale click-through data containing over 1.5 billion query-URL pairs show that the proposed approach can conduct very accurate NEM and significantly outperforms the baseline.
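The idea of characterizing each named entity by its associated queries and URLs can be illustrated with a minimal sketch. It is not the authors' implementation and does not include WS-LDA itself; it only shows one plausible way to turn click-through pairs into per-entity bags of features that a topic model could consume. The log entries, the `example.com` hosts, the seed entity list, the `#` placeholder, and the function name `build_entity_documents` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical toy click-through log: (query, clicked URL) pairs.
CLICK_LOG = [
    ("harry potter trailer", "http://movies.example.com/harry-potter"),
    ("harry potter book review", "http://books.example.com/harry-potter"),
    ("halo 3 cheats", "http://games.example.com/halo-3"),
    ("halo 3 trailer", "http://movies.example.com/halo-3"),
]

# Hypothetical seed list of entity names to look for in queries.
ENTITIES = ["harry potter", "halo 3"]


def build_entity_documents(click_log, entities):
    """Collect, per entity, a bag of query contexts and clicked hosts."""
    docs = defaultdict(Counter)
    for query, url in click_log:
        for entity in entities:
            if entity in query:
                # Replace the entity with a placeholder so only its context remains.
                context = " ".join(query.replace(entity, "#").split())
                docs[entity]["ctx:" + context] += 1
                docs[entity]["host:" + urlparse(url).netloc] += 1
    return docs


if __name__ == "__main__":
    for entity, features in build_entity_documents(CLICK_LOG, ENTITIES).items():
        print(entity, dict(features))
```

In this toy log, "harry potter" collects both movie-like and book-like features, which is the kind of class ambiguity the abstract says the topic model is meant to resolve by treating classes as topics.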
|Author||Gu Xu, Shuang-Hong Yang, and Hang Li|
|title||Named Entity Mining from Click-through Data Using Weakly Supervised Latent Dirichlet Allocation|
|journal||Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009)|
|doi||10.1145/1557019.1557165|
|year||2009|