2014 EmpiricalGlitchExplanations

From GM-RKB


(Dasu et al., 2014) ⇒ Tamraparni Dasu, Ji Meng Loh, and Divesh Srivastava. (2014). “Empirical Glitch Explanations.” In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2014). ISBN:978-1-4503-2956-9 doi:10.1145/2623330.2623716

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Crossover subsampling; data mining; data quality; glitch explanations; quantitative data cleaning

Abstract

Data glitches are unusual observations that do not conform to data quality expectations, be they logical, semantic or statistical. By applying data integrity constraints, potentially large sections of data could be flagged as being noncompliant. Ignoring or repairing significant sections of the data could fundamentally bias the results and conclusions drawn from analyses. In the context of Big Data where large numbers and volumes of feeds from disparate sources are integrated, it is likely that significant portions of seemingly noncompliant data are actually legitimate usable data.

In this paper, we introduce the notion of Empirical Glitch Explanations - concise, multi-dimensional descriptions of subsets of potentially dirty data - and propose a scalable method for empirically generating such explanatory characterizations. The explanations could serve two valuable functions: (1) Provide a way of identifying legitimate data and releasing it back into the pool of clean data. In doing so, we reduce cleaning-related statistical distortion of the data; (2) Used to refine existing data quality constraints and generate and formalize domain knowledge.

We conduct experiments using real and simulated data to demonstrate the scalability of our method and the robustness of explanations. In addition, we use two real world examples to demonstrate the utility of the explanations where we reclaim over 99% of the suspicious data, keeping data repair related statistical distortion close to 0.
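The idea of a concise, multi-dimensional description of flagged data can be illustrated with a small sketch. This is not the paper's method (the abstract does not specify the algorithm, and the crossover subsampling named in the keywords is not reproduced here). It is a simple frequency-and-lift heuristic, and the function name `candidate_explanations`, its thresholds, and the toy `feed`/`unit` attributes are all hypothetical. It looks for attribute-value conjunctions that cover most of the flagged records while being comparatively rare in the data overall.

```python
from collections import Counter
from itertools import combinations


def candidate_explanations(records, flagged, max_dims=2,
                           min_support=0.5, min_lift=2.0):
    """Illustrative heuristic only (an assumption, not the paper's algorithm).

    records: list of dicts mapping attribute name -> categorical value
    flagged: list of bools, True where a constraint marked the record
    Returns (pattern, support, lift) tuples, where pattern is a tuple of
    (attribute, value) pairs, support is the share of flagged records
    matching the pattern, and lift is support divided by the pattern's
    share in the whole data set.
    """
    flagged_rows = [r for r, f in zip(records, flagged) if f]
    if not flagged_rows:
        return []

    def count_patterns(rows):
        # Count every conjunction of up to max_dims attribute-value pairs.
        counts = Counter()
        for row in rows:
            items = sorted(row.items())
            for k in range(1, max_dims + 1):
                for combo in combinations(items, k):
                    counts[combo] += 1
        return counts

    all_counts = count_patterns(records)
    flagged_counts = count_patterns(flagged_rows)

    results = []
    for pattern, n_hit in flagged_counts.items():
        support = n_hit / len(flagged_rows)
        overall = all_counts[pattern] / len(records)
        lift = support / overall
        if support >= min_support and lift >= min_lift:
            results.append((pattern, support, lift))

    # Highest support first; shorter (more concise) patterns break ties.
    results.sort(key=lambda t: (-t[1], len(t[0])))
    return results


# Toy data (hypothetical): every flagged record comes from feed "B"
# and reports values in seconds instead of milliseconds.
records = (
    [{"feed": "A", "unit": "ms"}] * 6
    + [{"feed": "B", "unit": "s"}] * 3
    + [{"feed": "B", "unit": "ms"}]
)
flagged = [False] * 6 + [True] * 3 + [False]

explanations = candidate_explanations(records, flagged)
for pattern, support, lift in explanations:
    print(pattern, round(support, 2), round(lift, 2))
```

In this toy case a reviewer might conclude that the flagged records are legitimate data from a feed reporting in different units, which is the kind of judgment the abstract describes as releasing data back into the clean pool or refining a constraint.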

References


	Author	Title	DOI	Year
2014 EmpiricalGlitchExplanations	Tamraparni Dasu, Ji Meng Loh, Divesh Srivastava	Empirical Glitch Explanations	10.1145/2623330.2623716	2014
