Learning Source Descriptions (LSD) System
A Learning Source Descriptions (LSD) System is a Schema Matching System that implements several machine learning algorithms to semi-automatically create semantic maps.
- Context:
- It was first proposed by Doan et al. (2001).
- …
- Example(s):
- the system proposed by Doan et al. (2001).
- …
- Counter-Example(s):
- See: Machine Learning System, Semantic Map, XML, Data Integration System, Data Sharing System.
References
2002
- (Kurgan et al., 2002) ⇒ Lukasz Kurgan, Waldemar Swiercz, and Krzysztof J. Cios (2002, June). "Semantic Mapping of XML Tags Using Inductive Machine Learning". In ICMLA (pp. 99-109).
- QUOTE: Several mapping systems work with XML documents. The TransScm system (Milo & Zohar, 1998) matches schema based on the structure and names of the SGML tags extracted from DTD files by using concept of labeled graphs. The LSD system (Doan et al., 2001) uses multistrategy learning by utilizing several machine learning (ML) algorithms based on the user-specified mappings to discover matching patterns. Based on these patterns, the mappings between leaf nodes in the DTD trees for two XML documents are generated. We will compare results of semantic mapping generated by our system with the results generated by the LSD system.
2001
- (Doan et al., 2001) ⇒ AnHai Doan, Pedro Domingos, and Alon Y. Halevy. (2001). “Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach.” In: ACM SIGMOD Record, 30(2). doi:10.1145/376284.375731
- QUOTE: A key bottleneck in building data-integration systems is the acquisition of semantic mappings. Today these mappings are provided manually by the builders of the system, resulting in a laborious and error-prone process. The emergence of XML as a standard syntax for sharing data among sources further fuels data sharing applications, and hence underscores the need to develop methods for acquiring semantic mappings. Clearly, while the task of finding semantic mappings cannot be fully automated, the development of tools for assisting the process is crucial to truly achieve large-scale data integration.
In this paper we describe the LSD (Learning Source Descriptions) system that uses and extends machine learning techniques to semi-automatically create semantic mappings. Throughout the discussion, we shall assume that the sources present their data in XML, and that the mediated and source schemes are represented with DTDs. Then, the schema-matching problem is to find correspondences among the elements of the mediated schema and the source DTDs. The key idea underlying our approach is that after a small set of data sources have been manually mapped to the mediated schema, LSD should be able to glean significant information from these mappings to successfully propose mappings for subsequent data sources.
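The key idea in the quote above (learn from a few manually mapped sources, then propose mappings for a new source) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tag names, the sample values, the mediated-schema elements, and the token-overlap scoring are hypothetical and are not the learners used in LSD.

```python
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase alphanumeric tokens of a tag name or a data value."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(mapped_sources):
    """Build one token profile per mediated-schema element.

    mapped_sources: (source_tag, sample_values, mediated_element) triples
    taken from sources that a person has already mapped by hand.
    """
    profile = defaultdict(Counter)
    for tag, values, element in mapped_sources:
        profile[element].update(tokens(tag))
        for value in values:
            profile[element].update(tokens(value))
    return profile


def propose(profile, tag, values):
    """Rank mediated elements for an unmapped tag by normalized token overlap."""
    seen = Counter(tokens(tag))
    for value in values:
        seen.update(tokens(value))
    scores = {}
    for element, counts in profile.items():
        total = sum(counts.values())
        overlap = sum(min(n, counts[t]) for t, n in seen.items())
        scores[element] = overlap / total if total else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical manual mappings from one real-estate source.
manual = [
    ("listed-price", ["$250,000", "$310,000"], "PRICE"),
    ("contact-phone", ["(206) 555 0142"], "AGENT-PHONE"),
    ("house-addr", ["12 Pine St, Seattle"], "ADDRESS"),
]
profile = train(manual)

# Proposed mapping for a tag from a new, unmapped source.
ranking = propose(profile, "price", ["$199,000"])
print(ranking[0][0])
```

The proposals are only candidates: in a semi-automatic setting a person would still confirm or correct each one.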
2000
- (Doan et al., 2000) ⇒ AnHai Doan, Pedro Domingos, and Alon Y. Halevy. (2000, May). "Learning Source Descriptions for Data Integration". In WebDB (Informal Proceedings) (pp. 81-86).
- QUOTE: In general, there are many different types of information that a learner can exploit, such as names, formats, word frequencies, positions, and characteristics of value distribution. Clearly, no single learner will be able to exploit effectively all such types of information. Hence, our work takes a multi-strategy learning approach. We apply a set of learners, each of which learns well certain kinds of patterns, and then the predictions of the learners are combined using a meta-learner. In addition to providing accuracy superior to any single learner, this technique has the advantage of being extensible when new learners are developed. We describe the LSD (Learning Source Descriptions) system that we built for testing this approach, and our initial experimental results. The results show that with the current set of three learners, we already obtain predictive accuracy of 62-75% prediction in a fairly complex domain of real-estate data sources. Our work currently focuses on finding one-to-one mappings for the leaf elements of source schemas.
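The multi-strategy approach described in the quote above (several base learners whose predictions are combined) can be sketched as follows. This is a simplified stand-in, not the meta-learner from the paper: the two base learners, their keyword and format rules, the label set, and the accuracy-based weighting are all assumptions chosen to keep the example short.

```python
import re

LABELS = ["PRICE", "PHONE"]  # hypothetical mediated-schema elements


def name_learner(tag, value):
    """Base learner that looks only at the tag name (hypothetical keywords)."""
    tag = tag.lower()
    return {
        "PRICE": 1.0 if "price" in tag or "cost" in tag else 0.0,
        "PHONE": 1.0 if "phone" in tag or "tel" in tag else 0.0,
    }


def format_learner(tag, value):
    """Base learner that looks only at the format of the data value."""
    value = value.strip()
    return {
        "PRICE": 1.0 if value.startswith("$") else 0.0,
        "PHONE": 1.0 if re.fullmatch(r"[\d\s()\-]{7,}", value) else 0.0,
    }


def fit_weights(learners, labeled):
    """Weight each learner by its accuracy on manually labeled examples."""
    weights = []
    for learner in learners:
        correct = 0
        for tag, value, label in labeled:
            scores = learner(tag, value)
            best = max(scores, key=scores.get)
            # An all-zero score vector counts as an abstention, not a hit.
            if best == label and scores[label] > 0:
                correct += 1
        weights.append(correct / len(labeled))
    return weights


def combine(learners, weights, tag, value):
    """Return the label with the highest weighted sum of learner scores."""
    total = {label: 0.0 for label in LABELS}
    for learner, weight in zip(learners, weights):
        for label, score in learner(tag, value).items():
            total[label] += weight * score
    return max(total, key=total.get)


learners = [name_learner, format_learner]
labeled = [
    ("price", "$100,000", "PRICE"),
    ("amount", "$5,000", "PRICE"),
    ("phone", "206 555 0100", "PHONE"),
    ("contact", "(206) 555-0100", "PHONE"),
]
weights = fit_weights(learners, labeled)

# The two learners disagree here; the better-performing one wins.
print(combine(learners, weights, "listing-cost", "555 0199"))
```

Because the combiner only sees each learner's scores, a new base learner can be added to the `learners` list without changing the others, which mirrors the extensibility argument in the quote.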
1998
- (Milo & Zohar, 1998) ⇒ Tova Milo, and Sagit Zohar (1998, August). "Using Schema Matching to Simplify Heterogeneous Data Translation". In VLDB (Vol. 98, pp. 24-27).