Database Schema Object Matching Task

From GM-RKB

A Database Schema Object Matching Task is a Record Matching Task that requires matching a Database Schema Object to the corresponding objects in a Schema Set.



References

2009

  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Data_mapping
    • Data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks including:
      • Data transformation or data mediation between a data source and a destination
      • Identification of data relationships as part of data lineage analysis
      • Discovery of hidden sensitive data, such as the last four digits of a social security number hidden in another user ID, as part of a data masking or de-identification project
      • Consolidation of multiple databases into a single database and identifying redundant columns of data for consolidation or elimination
    • For example, a company that would like to transmit and receive purchases and invoices with other companies might use data mapping to create data maps from a company's data to standardized ANSI ASC X12 messages for items such as purchase orders and invoices.
  • (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Schema_Matching
    • The terms schema matching and mapping are often used interchangeably. For this article, we differentiate the two as follows: Schema matching is the process of identifying that two objects are semantically related (scope of this article), while mapping refers to the transformations between the objects. For example, in the two schemas DB1.Student (Name, SSN, Level, Major, Marks) and DB2.Grad-Student (Name, ID, Major, Grades), possible matches would be: DB1.Student ≈ DB2.Grad-Student; DB1.SSN = DB2.ID, etc., and possible transformations or mappings would be: DB1.Marks to DB2.Grades (100-90 A; 90-80 B; ...).
    • Automating these two approaches has been one of the fundamental tasks of data integration. In general it is not possible to determine fully automatically the different correspondences between two schemas, primarily because of the differing and often not explicated or documented semantics of the two schemas.
  • http://www.cs.ubc.ca/~rap/teaching/534a/readings/VLDBJ-Dec2001.pdf
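
The distinction between matching and mapping in the DB1.Student / DB2.Grad-Student example quoted above can be illustrated with a minimal Python sketch. Everything here is hypothetical: the names `FIELD_MATCHES`, `marks_to_grade`, and `map_student` are invented for illustration, the Name and Major correspondences are assumed to fall under the example's "etc.", the sample row is made up, and the handling of the shared boundary value 90 is an assumption because the quoted bands overlap.

```python
from typing import Any, Callable, Dict

# Matching: which DB1.Student fields correspond to which DB2.Grad-Student fields.
# SSN -> ID and Marks -> Grades come from the quoted example; Name and Major
# are assumptions. Level has no counterpart in DB2 and is therefore dropped.
FIELD_MATCHES: Dict[str, str] = {
    "Name": "Name",
    "SSN": "ID",
    "Major": "Major",
    "Marks": "Grades",
}


def marks_to_grade(marks: float) -> str:
    """Mapping (value transformation) for Marks -> Grades.

    Assumption: 90 counts as an A and 80 counts as a B.
    """
    if not 0 <= marks <= 100:
        raise ValueError("marks must be between 0 and 100")
    if marks >= 90:
        return "A"
    if marks >= 80:
        return "B"
    # The quoted example does not list the bands below 80, so none are invented.
    raise ValueError("grade bands below 80 are not given in the quoted example")


# Per-field transformations; fields without an entry are copied unchanged.
FIELD_TRANSFORMS: Dict[str, Callable[[Any], Any]] = {"Marks": marks_to_grade}


def map_student(record: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the matches and mappings to one DB1.Student row."""
    out: Dict[str, Any] = {}
    for src, dst in FIELD_MATCHES.items():
        transform = FIELD_TRANSFORMS.get(src)
        out[dst] = transform(record[src]) if transform else record[src]
    return out


# Made-up sample row, for illustration only.
row = {"Name": "Ada", "SSN": "123-45-6789", "Level": "Grad", "Major": "CS", "Marks": 93}
print(map_student(row))
```

In this sketch, `FIELD_MATCHES` is the output of schema matching, while `FIELD_TRANSFORMS` holds the mappings; producing the former automatically is the harder problem that the quoted passages describe.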

2008

  • (Wick et al., 2008) ⇒ Michael Wick, Khashayar Rohanimanesh, Karl Schultz, and Andrew McCallum. (2008). “A Unified Approach for Schema Matching, Coreference, and Canonicalization.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008).
    • As the amount of electronically available information continues to grow, automatic knowledge discovery is becoming increasingly important. Unfortunately, electronic information is typically spread across multiple heterogeneous resources (databases with different schemas, or web documents with different structures) making it necessary to consolidate the data into a single repository or representation before data mining can be successfully applied. However, data integration is a challenging problem. Even the task of merging two databases with similar schemas about the same real-world entities is non-trivial. An automatic system must be able to perform coreference (to identify duplicate records), canonicalization (to pick the best string representation of the duplicate record), and schema matching (to align the fields across schemas).

2005

  • (Madhavan et al., 2005) ⇒ Jayant Madhavan, Philip A. Bernstein, AnHai Doan, and Alon Halevy. (2005). “Corpus-based Schema Matching.” In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005).
    • Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate.
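
As a rough illustration of the task defined at the top of this page (matching one schema object against a schema set), the following Python sketch scores candidates by name similarity alone. It is not the method of any of the cited works; the function names, the 0.8 threshold, and the use of `difflib.SequenceMatcher` from the Python standard library are assumptions chosen for brevity.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_object(
    obj: str,
    schema_set: Dict[str, List[str]],
    threshold: float = 0.8,  # assumed cutoff, not taken from any source
) -> Optional[Tuple[str, str, float]]:
    """Return (schema, element, score) for the best-scoring candidate,
    or None when no candidate reaches the threshold."""
    best: Optional[Tuple[str, str, float]] = None
    for schema, elements in schema_set.items():
        for element in elements:
            score = name_similarity(obj, element)
            if best is None or score > best[2]:
                best = (schema, element, score)
    if best is None or best[2] < threshold:
        return None
    return best


# Schema set built from the DB2.Grad-Student example quoted earlier on this page.
schema_set = {"DB2.Grad-Student": ["Name", "ID", "Major", "Grades"]}

for field in ["Name", "SSN", "Level", "Major", "Marks"]:
    print(field, "->", match_object(field, schema_set))
```

Identically named fields such as Name are matched, but SSN and ID share no characters, so a purely name-based matcher cannot recover that correspondence. This is one small instance of why the quoted sources describe the problem as difficult to automate.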