2001 PrinciplesOfDataMining

Subject Headings: Data Mining Textbook

Notes

Cited By

Quotes

Preface

The science of extracting useful information from large data sets or databases is known as data mining. It is a new discipline, lying at the intersection of statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas. All of these are concerned with certain aspects of data analysis, so they have much in common, but each also has its own distinct flavor, emphasizing particular problems and types of solution. …

This text has a different bias. We have attempted to provide a foundational view of data mining. Rather than discuss specific data mining applications at length (such as, say, collaborative filtering, credit scoring, and fraud detection), we have instead focused on the underlying theory and algorithms that provide the “glue” for such applications. This is not to say that we do not pay attention to the applications. Data mining is fundamentally an applied discipline, and with this in mind we make frequent references to case studies and specific applications where the basic theory can (or has been) applied.

In our view a mastery of data mining requires an understanding of both statistical and computational issues. This requirement to master two different areas of expertise presents quite a challenge for student and teacher alike. For the typical computer scientist, the statistics literature is relatively impenetrable: a litany of jargon, implicit assumptions, asymptotic arguments, and lack of details on how the theoretical and mathematical concepts are actually realized in the form of a data analysis algorithm. The situation is effectively reversed for statisticians: the computer science literature on machine learning and data mining is replete with discussions of algorithms, pseudocode, computational efficiency, and so forth, often with little reference to an underlying model or inference procedure. An important point is that both approaches are nonetheless essential when dealing with large data sets. An understanding of both the “mathematical modeling” view, and the “computational algorithm” view are essential to properly grasp the complexities of data mining. …


References


Author: Padhraic Smyth, Heikki Mannila, David J. Hand
Title: Principles of Data Mining
Year: 2001
URL: http://books.google.com/books?id=SdZ-bhVhZGYC