2013 DataScienceforBusinessWhatYouNe
- (Provost & Fawcett, 2013) ⇒ Foster Provost and Tom Fawcett. (2013). “Data Science for Business: What You Need to Know About Data Mining and Data-analytic Thinking.” O'Reilly Media. ISBN:1449374298
Subject Headings: Business Data Mining Task, Business Task, Data Mining Task.
Notes
Cited By
Quotes
Book Overview
Data Science for Business is a new book by Foster Provost and Tom Fawcett intended for those who need to understand data science/data mining, and those who want to develop their skill at data-analytic thinking. Data Science for Business is not a book about algorithms. Instead it presents a set of fundamental principles for extracting useful knowledge from data. These fundamental principles are the foundation for many algorithms and techniques for data mining, but also underlie the processes and methods for approaching business problems data-analytically, evaluating particular data science solutions, and evaluating general data science plans.
- Design
The book builds up the reader's understanding of data science by discussing the fundamental principles in the context of business examples, and then shows specifically how the principles can provide understanding of many of the most common methods and techniques used in data science. After reading the book, the reader should be able to (i) discuss data science intelligently with data scientists and with other stakeholders, (ii) better understand proposals for data science projects and data science investments, and (iii) participate integrally in data science projects.
As one example, a fundamental principle of data science is that solutions for extracting useful knowledge from data must carefully consider the problem from the business perspective. This may sound obvious at first, but the notion underlies many choices that must be made in the process of data analytics, including problem formulation, method choice, solution evaluation, and general strategy formulation. Another fundamental principle is that some data items can give us information about other data items. This principle manifests itself throughout data science: in the basic notion of finding “correlations” among variables, in the specific design of many particular data mining procedures, and more generally as the basis for all predictive analytics.
- Audience
Data Science for Business is intended for business people who will be managing or working with data scientists, for developers who will be implementing data science solutions, as well as for aspiring data scientists. By its very nature the material is somewhat technical: the goal is to really understand data science, not to give a high-level overview. However, the book does not presume a sophisticated mathematical background, relegating the few technical details to optional "starred" sections.
Table of Contents
Chapter 1 Introduction: Data-Analytic Thinking
The Ubiquity of Data Opportunities
Example: Hurricane Frances
Example: Predicting Customer Churn
Data Science, Engineering, and Data-Driven Decision Making
Data Processing and “Big Data”
From Big Data 1.0 to Big Data 2.0
Data and Data Science Capability as a Strategic Asset
Data-Analytic Thinking
This Book
Data Mining and Data Science, Revisited
Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist
Summary
Chapter 2 Business Problems and Data Science Solutions
From Business Problems to Data Mining Tasks
Supervised Versus Unsupervised Methods
Data Mining and Its Results
The Data Mining Process
Implications for Managing the Data Science Team
Other Analytics Techniques and Technologies
Summary
Chapter 3 Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
Models, Induction, and Prediction
Supervised Segmentation
Visualizing Segmentations
Trees as Sets of Rules
Probability Estimation
Example: Addressing the Churn Problem with Tree Induction
Summary
Chapter 4 Fitting a Model to Data
Classification via Mathematical Functions
Regression via Mathematical Functions
Class Probability Estimation and Logistic “Regression”
Example: Logistic Regression versus Tree Induction
Nonlinear Functions, Support Vector Machines, and Neural Networks
Summary
Chapter 5 Overfitting and Its Avoidance
Generalization
Overfitting
Overfitting Examined
Example: Overfitting Linear Functions
* Example: Why Is Overfitting Bad?
From Holdout Evaluation to Cross-Validation
The Churn Dataset Revisited
Learning Curves
Overfitting Avoidance and Complexity Control
Summary
Chapter 6 Similarity, Neighbors, and Clusters
Similarity and Distance
Nearest-Neighbor Reasoning
Some Important Technical Details Relating to Similarities and Neighbors
Clustering
Stepping Back: Solving a Business Problem Versus Data Exploration
Summary
Chapter 7 Decision Analytic Thinking I: What Is a Good Model?
Evaluating Classifiers
Generalizing Beyond Classification
A Key Analytical Framework: Expected Value
Evaluation, Baseline Performance, and Implications for Investments in Data
Summary
Chapter 8 Visualizing Model Performance
Ranking Instead of Classifying
Profit Curves
ROC Graphs and Curves
The Area Under the ROC Curve (AUC)
Cumulative Response and Lift Curves
Example: Performance Analytics for Churn Modeling
Summary
Chapter 9 Evidence and Probabilities
Example: Targeting Online Consumers With Advertisements
Combining Evidence Probabilistically
Applying Bayes’ Rule to Data Science
A Model of Evidence “Lift”
Example: Evidence Lifts from Facebook "Likes"
Summary
Chapter 10 Representing and Mining Text
Why Text Is Important
Why Text Is Difficult
Representation
Example: Jazz Musicians
* The Relationship of IDF to Entropy
Beyond Bag of Words
Example: Mining News Stories to Predict Stock Price Movement
Summary
Chapter 11 Decision Analytic Thinking II: Toward Analytical Engineering
Targeting the Best Prospects for a Charity Mailing
Our Churn Example Revisited with Even More Sophistication
Chapter 12 Other Data Science Tasks and Techniques
Co-occurrences and Associations: Finding Items That Go Together
Profiling: Finding Typical Behavior
Link Prediction and Social Recommendation
Data Reduction, Latent Information, and Movie Recommendation
Bias, Variance, and Ensemble Methods
Data-Driven Causal Explanation and a Viral Marketing Example
Summary
Chapter 13 Data Science and Business Strategy
Thinking Data-Analytically, Redux
Achieving Competitive Advantage with Data Science
Sustaining Competitive Advantage with Data Science
Attracting and Nurturing Data Scientists and Their Teams
Examine Data Science Case Studies
Be Ready to Accept Creative Ideas from Any Source
Be Ready to Evaluate Proposals for Data Science Projects
A Firm’s Data Science Maturity
Chapter 14 Conclusion
The Fundamental Concepts of Data Science
What Data Can’t Do: Humans in the Loop, Revisited
Privacy, Ethics, and Mining Data About Individuals
Is There More to Data Science?
Final Example: From Crowd-Sourcing to Cloud-Sourcing
Final Words
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2013 DataScienceforBusinessWhatYouNe | Foster Provost, Tom Fawcett | | | Data Science for Business: What You Need to Know About Data Mining and Data-analytic Thinking | | | | | | 2013 |