2007 LIPTUS

From GM-RKB

Subject Headings: Document Entity Linking

Notes

  • Describes heuristics used to link emails to the corresponding customer.

Cited By

~7 http://scholar.google.com/scholar?cites=18425155918898670485

Quotes

Abstract

  • Growing competition has made today's banks understand the value of knowing their customers better. In this paper, we describe a tool, LIPTUS, that associates the customer interactions (emails and transcribed phone calls) with customer and account profiles stored in an existing data warehouse. The associations discovered by LIPTUS enable analytics spanning the customer and account profiles on one hand and the metadata associated with or derived from the interaction (using text mining techniques) on the other. We illustrate the value derived from this consolidated analysis through specific customer intelligence applications. LIPTUS is today being extensively used in a large bank in India. A highlight of this paper is a discussion of the technical challenges encountered while building LIPTUS and deploying it on real-life customer data.

1. Introduction

  • In this paper, we describe a tool, LIPTUS (LInking and Processing Tool for Unstructured and Structured information) that addresses these issues. LIPTUS automatically associates the customer interactions (emails and transcribed phone calls) with customer and account profiles stored in an existing database.
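Once ids have been extracted from an interaction, the association step amounts to looking them up in the warehouse. A minimal sketch, assuming two hypothetical in-memory tables (`ACCOUNTS`, mapping account ids to their owning customer, and `CUSTOMERS`, mapping customer ids to profiles) in place of the bank's actual data warehouse; the function name `link_interaction` is likewise illustrative, not part of LIPTUS.

```python
from __future__ import annotations

# Hypothetical stand-ins for warehouse tables: account id -> owning customer id,
# and customer id -> customer profile. A real deployment would query the warehouse.
ACCOUNTS = {
    "10012345678": "12345678",       # savings account
    "4111111111111111": "87654321",  # credit card
}
CUSTOMERS = {
    "12345678": {"name": "Customer A", "segment": "retail"},
    "87654321": {"name": "Customer B", "segment": "premium"},
}


def link_interaction(extracted_ids: list[str],
                     accounts: dict[str, str],
                     customers: dict[str, dict]) -> set[str]:
    """Return the customer ids an interaction can be linked to.

    An extracted id is kept only if it exists in the warehouse, which also
    discards spurious numeric sequences picked up during extraction.
    """
    linked = set()
    for id_ in extracted_ids:
        if id_ in customers:         # the id is itself a customer id
            linked.add(id_)
        elif id_ in accounts:        # an account id: follow it to its owner
            linked.add(accounts[id_])
        # otherwise there is no match in the warehouse and the id is ignored
    return linked
```

For example, `link_interaction(["4111111111111111", "99999"], ACCOUNTS, CUSTOMERS)` returns `{"87654321"}`: the card number resolves to its owner, and the unknown sequence is dropped. Checking candidates against the warehouse is one simple way to filter out some incorrectly extracted sequences.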

3.1 Cleaning the Customer Interaction Text

  • The customer interactions contain a significant amount of irrelevant and redundant text (including irrelevant advertisements, disclaimers, canned greetings, text of earlier messages repeated as history, etc.). This useless additional text makes analysis of the interaction content not only slower but also less effective, since it tends to obscure the actual information contained in the interaction. In this section, we describe the cleaning steps that try to identify and remove the irrelevant and redundant text present in the interactions.
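The kinds of noise listed above can be approximated with line-level rules on plain-text emails. A minimal sketch, assuming hypothetical patterns for reply/forward headers, disclaimers, quoted lines, and canned greetings; the excerpt does not give the actual cleaning rules LIPTUS uses, and a real deployment would tune such patterns to its own mail traffic.

```python
import re

# Hypothetical reply/forward headers: everything after them is earlier history.
HISTORY_MARKERS = [
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}$", re.IGNORECASE),
    re.compile(r"^On .+ wrote:$"),
]
# Hypothetical start of a legal disclaimer, assumed to run to the end of the message.
DISCLAIMER_START = re.compile(r"^(DISCLAIMER|This e-?mail .*confidential)", re.IGNORECASE)
# Hypothetical canned greetings and sign-offs that carry no content.
GREETING = re.compile(
    r"^(dear\s+(sir|madam|sir/madam)[,.]?|hi[,.]?|hello[,.]?"
    r"|regards[,.]?|thanks( and regards)?[,.]?)$",
    re.IGNORECASE,
)


def clean_interaction(text: str) -> str:
    """Drop quoted history, disclaimers, greetings, and blank lines."""
    kept = []
    for raw in text.splitlines():
        line = raw.strip()
        if any(p.match(line) for p in HISTORY_MARKERS):
            break      # rest of the message repeats earlier mails
        if DISCLAIMER_START.match(line):
            break      # rest of the message is boilerplate
        if line.startswith(">"):
            continue   # quoted line from an earlier message
        if GREETING.match(line):
            continue   # canned greeting or sign-off
        if line:
            kept.append(line)
    return "\n".join(kept)
```

Stopping at the first history or disclaimer marker is a deliberate simplification: it assumes new content comes first, which holds for top-posted replies but not for interleaved ones.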

3.2 The Linking Procedure

  • This task is far more difficult than merely looking for numeric sequences in the text and then disambiguating these sequences based on the number of digits, prefix sequences and other patterns. This is because of a variety of reasons, some of which are listed below:
  • LIPTUS uses annotators based on the Unstructured Information Management Architecture (UIMA) [6] to identify the customer and account ids. At its simplest, an annotator tokenizes the text and applies pattern-based rules to the resulting token sequence to identify the interesting tokens (customer and account ids in our case). These rules combine the hints mentioned above (size of the numeric sequence, identifying prefixes) and also take the presence of hyphens and whitespace into account. Moreover, they use hints from the surrounding text to determine the type of an extracted id (for instance, a credit card number may be surrounded by words such as “visa”, “mastercard”, and “expiry”). The annotator also uses the category the interaction is associated with (“credit card inquiry”, “cheque status inquiry”, “premium payment”) to narrow the candidates down to a small set of alternatives; a cheque status inquiry, for instance, can only relate to a savings or current account.
  • We again emphasize that this extraction process is essentially a best-effort solution, and there is a possibility of an incorrect sequence being extracted as a customer or account id, as well as of a valid customer or account id not being extracted. On the interactions we considered, however, we found that these simple heuristics performed well enough.
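The annotator rules described above (digit count, prefixes, hyphens and whitespace, surrounding words, and the interaction category) can be sketched with plain regular expressions. This is a minimal illustration, not a UIMA annotator: the id lengths, the credit-card leading digits, the context word lists, and the category-to-type mapping are all hypothetical, since the excerpt does not give the bank's actual id formats.

```python
from __future__ import annotations

import re

# Hypothetical id formats: (id type, number of digits, allowed leading digits or None).
ID_FORMATS = [
    ("credit_card", 16, ("4", "5")),
    ("savings_account", 11, None),
    ("current_account", 11, None),
    ("customer_id", 8, None),
]
# Hypothetical context words that hint at each id type.
CONTEXT_WORDS = {
    "credit_card": {"visa", "mastercard", "expiry", "card"},
    "savings_account": {"savings", "account", "a/c"},
    "current_account": {"current", "account", "a/c"},
    "customer_id": {"customer", "cust"},
}
# Hypothetical category -> allowed id types; unlisted categories are unrestricted.
CATEGORY_TYPES = {
    "credit card inquiry": {"credit_card", "customer_id"},
    "cheque status inquiry": {"savings_account", "current_account"},
}
# A digit run that may be broken by single hyphens or spaces,
# e.g. "4111-1111-1111-1111" or "4111 1111 1111 1111".
NUMBER = re.compile(r"\d(?:\d|[- ]\d)*")
WORD = re.compile(r"[a-z/]+")


def annotate(text: str, category: str | None = None,
             window: int = 40) -> list[tuple[str, str]]:
    """Return (id type, id with separators removed) pairs found in the text."""
    allowed = CATEGORY_TYPES.get(category)  # None: no category-based restriction
    found = []
    for match in NUMBER.finditer(text):
        digits = re.sub(r"[- ]", "", match.group())  # drop hyphens and spaces
        # Length and prefix rules, narrowed by the interaction's category.
        candidates = [
            id_type
            for id_type, length, prefixes in ID_FORMATS
            if len(digits) == length
            and (prefixes is None or digits.startswith(prefixes))
            and (allowed is None or id_type in allowed)
        ]
        if not candidates:
            continue  # not a plausible customer or account id
        # Words within `window` characters of the number hint at its type.
        start, end = max(0, match.start() - window), match.end() + window
        context = set(WORD.findall(text[start:end].lower()))
        # Most supporting context words wins; ties go to the earlier ID_FORMATS entry.
        best = max(candidates, key=lambda t: len(CONTEXT_WORDS[t] & context))
        found.append((best, digits))
    return found
```

For instance, `annotate("Please update my current account 20012345678")` yields `[("current_account", "20012345678")]`: the 11-digit length admits both account types, and the word "current" breaks the tie. Like the original heuristics, this is best-effort; it can still accept a wrong sequence or miss an id written in an unexpected format.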

References

  • 1. Borthwick, A., Sterling, J., Agichtein, E., and Grishman, R. Exploiting diverse knowledge sources via maximum entropy in named entity recognition. In Workshop on Very Large Corpora (1998).
  • 2. Chakaravarthy, V. T., Gupta, H., Roy, P., and Mohania, M. Efficiently linking text documents with relevant structured information. In VLDB (2006).
  • 3. Chawla, N., Japkowicz, N., and Kotcz, A. Editorial: Special issue on learning from imbalanced data sets. In SIGKDD Explorations (2004).
  • 4. Chen, H., Hu, J., and Sproat, R. W. Integrating geometrical and linguistic analysis for email signature block parsing. ACM Transactions on Information Systems (TOIS), 17(4):343-366 (October 1999). doi:10.1145/326440.326442
  • 5. Cohen, W. W., and Sarawagi, S. Exploiting dictionaries in named entity extraction: Combining semi-Markov extraction process and data integration methods. In SIGKDD (2004).
  • 6. Götz, T., and Suhre, O. Design and implementation of the UIMA common analysis system. IBM Systems Journal, 43(3):476-489 (July 2004).
  • 7. Hamamoto, Y., Uchimura, S., and Tomita, S. A bootstrap technique for nearest neighbor classifier design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(1):73-79 (January 1997). doi:10.1109/34.566814
  • 8. Hu, M., and Liu, B. Mining and summarizing customer reviews. In SIGKDD (2004).
  • 9. IBM. Made in IBM Labs: IBM helps HDFC Bank extract information insight to enhance customer care. http://www.ibm.com/press/us/en/pressrelease/20729.wss
  • 10. Joshi, S., Ramakrishnan, G., Balakrishnan, S., and Srinivasan, A. Aggregating contextual patterns for information extraction. In IJCAI 2007 Workshop on Text Mining and Link Analysis (2007).
  • 11. Manning, C. D., and Schütze, H. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA (1999).
  • 12. Mladenić, D., and Grobelnik, M. Feature selection for unbalanced class distribution and naive Bayes. In ICML (1999).
  • 13. Roy, P., Mohania, M., Bamba, B., and Raman, S. Associating relevant unstructured content with structured database query results. In ACM CIKM (2005).
  • 14. Sebastiani, F. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1):1-47 (March 2002). doi:10.1145/505282.505283
  • 15. Turmo, J., Ageno, A., and Català, N. Adaptive information extraction. ACM Computing Surveys (CSUR), 38(2), Article 4 (2006). doi:10.1145/1132956.1132957
  • 16. Yang, Y., and Pedersen, J. A comparative study on feature selection in text categorization. In ICML (1997).
  • 17. Yi, J., and Niblack, W. Sentiment mining in WebFountain. In ICDE (2005).


2007 LIPTUS
Author: Prasan Roy, Manish A. Bhide, Ajay Gupta, Rahul Gupta, Mukesh K. Mohania, Zenita Ichhaporia
Title: LIPTUS: associating structured and unstructured information in a banking environment
URL: http://dx.doi.org/10.1145/1247480.1247587
DOI: 10.1145/1247480.1247587