1996 TheKDDProcess

Subject Headings: Data Mining Activity, KDD Process.

Notes

As we march into the age of digital information, the problem of data overload looms ominously ahead. Our ability to analyze and understand massive datasets lags far behind our ability to gather and store the data. A new generation of computational techniques and tools is required to support the extraction of useful knowledge from the rapidly growing volumes of data. These techniques and tools are the subject of the emerging field of knowledge discovery in databases (KDD) and data mining.

Finding useful patterns in data is known by different names in different communities, including data mining, knowledge extraction, information discovery, information harvesting, data archeology, and data pattern processing. The term “data mining” is used most by statisticians, database researchers, and, more recently, by the MIS and business communities. Here we use the term “KDD” to refer to the overall process of discovering useful knowledge from data. Data mining is a particular step in this process: the application of specific algorithms for extracting patterns (models) from data. The additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, ensure that useful knowledge is derived from the data. Blind application of data mining methods can be a dangerous activity leading to discovery of meaningless patterns.

KDD has evolved, and continues to evolve, from the intersection of research in such fields as databases, machine learning, pattern recognition, statistics, artificial intelligence and reasoning with uncertainty, knowledge acquisition for expert systems, data visualization, machine discovery [7], scientific discovery, information retrieval, and high-performance computing. KDD software systems incorporate theories, algorithms, and methods from all of these fields.
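
The distinction drawn above between the overall KDD process and its data mining step can be illustrated with a minimal Python sketch. This is not the authors' method: the toy records, the function names (`select`, `clean`, `prepare`, `mine`, `interpret`, `kdd_process`), the pair-counting "mining" step, and the support threshold are all assumptions chosen only to show how selection, cleaning, and preparation surround the mining step, and how interpretation filters its output.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: one record per shopping basket; None marks a missing item list.
RAW_RECORDS = [
    {"store": "A", "items": ["bread", "milk", "eggs"]},
    {"store": "A", "items": ["bread", "milk"]},
    {"store": "B", "items": ["beer", "chips"]},
    {"store": "A", "items": None},
    {"store": "A", "items": ["Milk ", "bread", "jam"]},
]

def select(records, store):
    """Data selection: keep only the records relevant to the question."""
    return [r for r in records if r["store"] == store]

def clean(records):
    """Data cleaning: drop records whose item list is missing or empty."""
    return [r for r in records if r["items"]]

def prepare(records):
    """Data preparation: normalize item names and reduce each record to a set."""
    return [frozenset(i.strip().lower() for i in r["items"]) for r in records]

def mine(baskets):
    """Data mining step: count how often each pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def interpret(pair_counts, n_baskets, min_support):
    """Interpretation: keep only pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }

def kdd_process(records, store, min_support):
    """Chain the steps; mining is only one stage of the whole process."""
    baskets = prepare(clean(select(records, store)))
    return interpret(mine(baskets), len(baskets), min_support)

patterns = kdd_process(RAW_RECORDS, store="A", min_support=0.6)
print(patterns)  # {('bread', 'milk'): 1.0}
```

In this sketch, skipping `prepare` would treat "Milk " and "milk" as different items, and skipping `interpret` would report every co-occurring pair, which is a small-scale version of the meaningless patterns that blind application of mining methods can produce.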

1. Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
2. John F. Elder, IV, Daryl Pregibon, A statistical perspective on knowledge discovery in databases, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
3. Usama M. Fayyad, Ramasamy Uthurusamy, Eds., Proceedings of KDD-95: The First International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, CA, 1995
4. Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, From data mining to knowledge discovery: an overview, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
5. Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, Ramasamy Uthurusamy, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
6. David Heckerman, Bayesian networks for knowledge discovery, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
7. Pat Langley, Herbert A. Simon, Applications of machine learning and rule induction, Communications of the ACM, v.38 n.11, p.54-64, Nov. 1995, doi:10.1145/219717.219768

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
1996 TheKDDProcess	Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth			The KDD Process for Extracting Useful Knowledge from Volumes of Data			http://www.mccombs.utexas.edu/faculty/Maytal.Saar-Tsechansky/Teaching/Documents/fayyad.pdf	10.1145/240455.240464