2013 ADataDrivenMethodforInGameDecis

Subject Headings:

Notes

Professional sports is a roughly $500 billion industry that is increasingly data-driven. In this paper, we show how machine learning can be applied to generate a model that could lead to better on-field decisions by managers of professional baseball teams. Specifically, we show how to use regularized linear regression to learn pitcher-specific predictive models that can be used to help decide when a starting pitcher should be replaced. A key step in the process is our method of converting categorical variables (e.g., the venue in which a game is played) into continuous variables suitable for the regression. Another key step is dealing with situations in which there is insufficient data to compute measures such as the effectiveness of a pitcher against specific batters.

For each season, we trained on the first 80% of the games and tested on the rest. The results suggest that using our model could have led to better decisions than those made by major league managers. Applying our model would have led to a different decision 48% of the time. For those games in which a manager left in a pitcher that our model would have removed, the pitcher ended up performing poorly 60% of the time.
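
The pipeline the abstract outlines (categorical-to-continuous conversion, handling of sparse data, a chronological 80/20 split, regularized linear regression) can be illustrated with a minimal Python sketch. The abstract does not say how the authors perform the conversion or handle sparse data, so the smoothed mean-target encoding below is a common stand-in and not the paper's method. All function names, the `prior_weight` and `alpha` parameters, and the synthetic data in the demo are assumptions made for illustration.

```python
import numpy as np


def chronological_split(n_games, train_fraction=0.8):
    """Return index arrays for the first train_fraction of games and the rest."""
    cut = int(n_games * train_fraction)
    return np.arange(cut), np.arange(cut, n_games)


def smoothed_target_encode(categories, targets, prior_weight=5.0):
    """Map each category to a shrunken mean of the target (assumed technique).

    Categories with few observations are pulled toward the global mean,
    which is one simple way to cope with insufficient data.
    """
    targets = np.asarray(targets, dtype=float)
    global_mean = targets.mean()
    encoding = {}
    for cat in set(categories):
        mask = np.array([c == cat for c in categories])
        n = mask.sum()
        encoding[cat] = (targets[mask].sum() + prior_weight * global_mean) / (n + prior_weight)
    return encoding, global_mean


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge (L2-regularized) regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


if __name__ == "__main__":
    # Synthetic, made-up data for one hypothetical pitcher: 50 games in date order.
    rng = np.random.default_rng(0)
    venues = list(rng.choice(["park_a", "park_b", "park_c"], size=50))
    pitch_count = rng.integers(60, 110, size=50).astype(float)
    runs_next_inning = rng.poisson(0.5, size=50).astype(float)

    train, test = chronological_split(50)

    # Fit the encoding on training games only, so test outcomes do not leak in.
    enc, fallback = smoothed_target_encode(
        [venues[i] for i in train], runs_next_inning[train]
    )
    venue_feature = np.array([enc.get(v, fallback) for v in venues])

    X = np.column_stack([venue_feature, pitch_count])
    w, b = fit_ridge(X[train], runs_next_inning[train], alpha=10.0)
    predictions = X[test] @ w + b
    print(predictions.shape)
```

Splitting by game order rather than at random keeps every test game later than every training game, which matches the "first 80%" protocol described above. A venue unseen in training falls back to the global training mean.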

	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2013 ADataDrivenMethodforInGameDecis	Ganeshapillai Gartheeban, John Guttag			A Data-driven Method for in-game Decision Making in MLB: When to Pull a Starting Pitcher				10.1145/2487575.2487660		2013