2003 ANoteOnTheUnifOfIEandDM


Subject Headings: Position Paper, Information Extraction Algorithm, Data Mining Algorithm, Conditional Random Field.


Cited By




Although information extraction and data mining appear together in many applications, their interface in most current systems would better be described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually not aware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from, or its inherent uncertainties. The result is that the accuracy of both suffers, and significant mining of complex text sources is beyond reach.

This position paper proposes the use of unified, relational, undirected graphical models for information extraction and data mining, in which extraction decisions and data-mining decisions are made in the same probabilistic “currency,” with a common inference procedure. Each component can thus make up for the weaknesses of the other, improving the performance of both. For example, data mining run on a partially filled database can find patterns that provide “top-down” accuracy-improving constraints to information extraction. Information extraction can provide a much richer set of “bottom-up” hypotheses to data mining if the mining is set up to handle additional uncertainty information from extraction.

We outline an approach and describe several models, but provide no experimental results.
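As a toy illustration of the “top-down” constraint idea above, the following sketch contrasts a serial pipeline, where each mention is labeled independently, with joint inference in a tiny undirected model, where a pairwise factor rewards two mentions for sharing a label. This is not the paper's model: the label set, the local scores, the `AGREEMENT_BONUS` weight, and all function names are hypothetical, and inference is brute-force enumeration rather than anything that would scale.

```python
from itertools import product

LABELS = ("PERSON", "ORG")

# Hypothetical local extraction scores (log-potentials) for two mentions
# of the same string. Mention 0 is confidently PERSON; mention 1 is
# ambiguous and leans slightly toward ORG.
local_scores = [
    {"PERSON": 2.0, "ORG": 0.0},
    {"PERSON": 0.4, "ORG": 0.6},
]

# Hypothetical weight standing in for a mined regularity:
# "mentions of the same string usually share a label".
AGREEMENT_BONUS = 1.0


def pipeline_labels(scores):
    """Serial juxtaposition: label each mention independently."""
    return tuple(max(LABELS, key=lambda y: s[y]) for s in scores)


def joint_score(labels, scores, bonus):
    """Unnormalized log-score of a full labeling of exactly two mentions."""
    total = sum(s[y] for s, y in zip(scores, labels))
    if labels[0] == labels[1]:
        total += bonus  # pairwise factor linking the two decisions
    return total


def joint_labels(scores, bonus):
    """Joint MAP labeling by enumerating every assignment."""
    return max(
        product(LABELS, repeat=len(scores)),
        key=lambda ys: joint_score(ys, scores, bonus),
    )


if __name__ == "__main__":
    print("pipeline:", pipeline_labels(local_scores))
    print("joint:   ", joint_labels(local_scores, AGREEMENT_BONUS))
```

With these made-up numbers, the pipeline labels the second mention ORG, while the joint model lets the confident first mention pull the ambiguous one to PERSON. Setting the bonus to zero removes the coupling and recovers the pipeline answer.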




Key: 2003 ANoteOnTheUnifOfIEandDM
Author: David Jensen
Title: A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models
URL: http://www.cs.umass.edu/~mccallum/papers/iedatamining-ijcaiws03.pdf