Canonicalization Task
A Canonicalization Task is a Task that requires the creation of a Canonical Item for some Thing.
- AKA: Canonicalize, Canonicalization, Merging.
- Context:
- Input: Dataset.
- Output: Canonical Item.
- …
- Example(s):
- See: Canonical Form.
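As a rough illustration of the input/output contract above (a dataset of items in, one canonical item out), the following minimal sketch picks a canonical surface form from a list of mentions of the same thing. The function name `canonical_item` and the selection heuristic (most frequent mention, ties broken by length and then alphabetically) are assumptions made for illustration, not a method taken from the references on this page.

```python
from collections import Counter
from typing import Iterable


def canonical_item(mentions: Iterable[str]) -> str:
    """Pick one representative mention for a set of mentions of the same thing.

    Hypothetical heuristic: prefer the most frequent mention; break ties by
    preferring the longer string, then the alphabetically later one.
    """
    counts = Counter(mentions)
    if not counts:
        raise ValueError("cannot canonicalize an empty dataset")
    return max(counts, key=lambda m: (counts[m], len(m), m))


# Illustrative dataset: several surface forms that refer to the same company.
dataset = ["IBM", "I.B.M.", "International Business Machines", "IBM"]
print(canonical_item(dataset))
```

Other selection criteria (for example, the most complete record, or a form synthesized from several records) fit the same contract; only the scoring key would change.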
References
2009
- (Wikipedia, 2009) ⇒ http://en.wikipedia.org/wiki/Canonicalization
- In computer science, canonicalization (abbreviated c14n, where 14 is the number of letters between the C and the N; also sometimes called standardization or normalization) is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.
- In web search and search engine optimization (SEO), URL canonicalization deals with web content that has more than one possible URL. Having multiple URLs for the same web content can cause problems for search engines, specifically in determining which URL should be shown in search results.
- Example:
- http://www.wikipedia.com
- http://wikipedia.com
- http://www.wikipedia.com/
- http://www.wikipedia.com/?source=asdf
- All of these URLs point to the homepage of Wikipedia, but a search engine will only consider one of them to be the canonical form of the URL.
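The URL example above can be sketched in code. This is a minimal sketch under stated assumptions, not how any particular search engine canonicalizes URLs: the function name `canonicalize_url` is hypothetical, the `www.` form is arbitrarily chosen as canonical, an empty path is treated as `/`, and the query string and fragment are assumed to carry no content-relevant information and are dropped. Ports and user info are also ignored.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize_url(url: str) -> str:
    """Map equivalent URL spellings to one canonical string (illustrative rules)."""
    parts = urlsplit(url)
    # Host names are case-insensitive; this sketch discards any port or user info.
    host = (parts.hostname or "").lower()
    # Assumption: the "www." form is the canonical host.
    if not host.startswith("www."):
        host = "www." + host
    # Assumption: an empty path and "/" name the same resource.
    path = parts.path or "/"
    # Assumption: query and fragment are tracking-only, so they are dropped.
    return urlunsplit((parts.scheme.lower(), host, path, "", ""))


urls = [
    "http://www.wikipedia.com",
    "http://wikipedia.com",
    "http://www.wikipedia.com/",
    "http://www.wikipedia.com/?source=asdf",
]
canonical_forms = {canonicalize_url(u) for u in urls}
print(canonical_forms)
```

Under these rules all four URLs collapse to a single canonical string, which is what makes equivalence comparison and deduplication possible. Real sites often need query parameters preserved, so the drop-everything rule here would have to be narrowed in practice.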
2008
- (Wick et al., 2008) ⇒ Michael Wick, Khashayar Rohanimanesh, Karl Schultz, and Andrew McCallum. (2008). “A Unified Approach for Schema Matching, Coreference, and Canonicalization.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008).
1998
- (Yergeau, 1998) ⇒ F. Yergeau. (1998). “UTF-8, a Transformation Format of ISO 10646.” RFC 2279. http://www.ietf.org/rfc/rfc2279.txt