- (Lafferty & Wasserman, 2009) ⇒ John Lafferty, Larry Wasserman. (2009). “Statistical Machine Learning - Course: 10-702.” Spring 2009, Carnegie Mellon University.
- It assumes knowledge of: Central Limit Theorem, Maximum Likelihood, Delta Method, Fisher Information, Bayesian Inference, Posterior Distribution, Bias, Variance, Mean Squared Error, Determinant, Eigenvalue, Eigenvector.
- It covers: Parametric Methods, Nonparametric Methods, Data Sparsity, Kernel Methods, and others.
- It does not cover: Instance-based Learning Algorithms (that make use of a Distance Function).
- This course builds on the material presented in Machine Learning (10-701) and Intermediate Statistics (36-705), introducing new learning methods and going more deeply into statistical and computational aspects. Topics include convexity, the bootstrap, directed graphs and conditional independence, undirected graphical models, causal inference, nonparametric curve estimation, smoothing using wavelets and orthogonal functions, classification, consistency, approximate inference algorithms, kernel methods, and stochastic simulation.
- Statistical Machine Learning is a second graduate level course in machine learning, assuming students have taken Machine Learning (10-701) and Intermediate Statistics (36-705). The term "statistical" in the title reflects the emphasis on statistical analysis and methodology, which is the predominant approach in modern machine learning.
- The course combines methodology with theoretical foundations and computational aspects. It treats both the "art" of designing good learning algorithms and the "science" of analyzing an algorithm's statistical properties and performance guarantees. Theorems are presented together with practical aspects of methodology and intuition to help students develop tools for selecting appropriate methods and approaches to problems in their own research.
- The course includes topics in statistical theory that are now becoming important for researchers in machine learning, including consistency, minimax estimation, and concentration of measure. It also presents topics in computation including elements of convex optimization, variational methods, randomized projection algorithms, and techniques for handling large data sets.
- Topics will be chosen from the following basic outline, which is subject to change.
- Statistical theory: Maximum likelihood, Bayes, minimax, Parametric versus Nonparametric Methods, Bayesian versus Non-Bayesian Approaches, classification, regression, density estimation.
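As an illustrative sketch of the first topic (not course material): for i.i.d. Gaussian data, the maximum likelihood estimates of the mean and variance have simple closed forms, which a quick simulation can confirm. The data here are synthetic.

```python
import numpy as np

# Sketch: for i.i.d. N(mu, sigma^2) data, the MLEs are the sample mean and
# the *biased* sample variance (dividing by n, not n - 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=10_000)  # true mu = 2, sigma^2 = 9

mu_hat = x.mean()                        # MLE of the mean
sigma2_hat = ((x - mu_hat) ** 2).mean()  # MLE of the variance
```

With 10,000 samples both estimates land close to the true values (2 and 9).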
- Convexity and optimization: Convexity, conjugate functions, unconstrained and constrained optimization, KKT conditions.
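A minimal sketch of a KKT-style constrained problem, using `scipy.optimize.minimize` (an assumption of tooling, not something the course prescribes): minimizing ‖x‖² subject to x₁ + x₂ = 1. The KKT conditions give the closed-form solution x* = (0.5, 0.5), which the solver recovers.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: minimize ||x||^2 subject to x1 + x2 = 1. Stationarity of the
# Lagrangian plus the active equality constraint (the KKT conditions)
# yield x* = (0.5, 0.5).
res = minimize(
    fun=lambda x: x @ x,
    x0=np.array([0.0, 0.0]),
    method="SLSQP",  # an SQP solver that handles equality constraints
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
)
x_star = res.x
```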
- Parametric methods: Linear Regression, Model Selection, Generalized Linear Models, Mixture Models, Classification (linear, logistic, support vector machines), Graphical Models, Structured Prediction, Hidden Markov Models.
- Sparsity: High Dimensional Data and Sparsity, Basis Pursuit and the Lasso Revisited, Sparsistency, Consistency, Persistency, Greedy Algorithms for Sparse Linear Regression, Sparsity in Nonparametric Regression, Sparsity in Graphical Models, Compressed Sensing.
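A hedged sketch of the lasso's sparsity-recovery behavior, using scikit-learn's `Lasso` (a tooling choice for illustration; the synthetic design and coefficients below are made up): with an ℓ₁ penalty, the fitted coefficient vector is sparse and its support tends to match the truly nonzero coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sketch: only 3 of 50 true coefficients are nonzero; the l1 penalty
# drives most fitted coefficients exactly to zero.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[[0, 10, 20]] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.normal(size=n)

model = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(model.coef_)  # indices of selected variables
```

Note the characteristic shrinkage: the fitted nonzero coefficients are pulled slightly toward zero relative to the truth.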
- Nonparametric methods: Nonparametric Regression and Density Estimation, Nonparametric Classification, Boosting, Clustering and Dimension Reduction, PCA, Manifold Methods, Principal Curves, Spectral Methods, The Bootstrap and Subsampling, Nonparametric Bayes.
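One of the listed nonparametric tools, the bootstrap, can be sketched in a few lines (synthetic data; not course code): resample the data with replacement many times to estimate the sampling variability of a statistic, here the median.

```python
import numpy as np

# Sketch: bootstrap standard error of the sample median.
rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=500)

B = 2000  # number of bootstrap resamples
boot_medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(B)
])
se_median = boot_medians.std(ddof=1)  # bootstrap standard error
```

For Exponential(1) data with n = 500, the asymptotic standard error of the median is about 0.045, and the bootstrap estimate comes out close to that.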
- Advanced theory: Concentration of Measure, Covering numbers, Learning theory, Risk Minimization, Tsybakov noise, minimax rates for classification and regression, surrogate loss functions, boosting, sparsistency, Minimax theory.
- Kernel methods: Mercer kernels, reproducing kernel Hilbert spaces, relationship to nonparametric statistics, kernel classification, kernel PCA, kernel tests of independence.
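A sketch of kernel PCA using scikit-learn's `KernelPCA` (the dataset and `gamma` value are illustrative assumptions): with an RBF (Mercer) kernel, PCA is effectively performed in a reproducing kernel Hilbert space, which can separate two concentric circles that linear PCA cannot.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Sketch: two concentric circles are not linearly separable in the input
# space, but become (nearly) linearly separable in RBF kernel-PCA coordinates.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
Z = kpca.fit_transform(X)  # embedding in the top 2 kernel principal components
```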
- Computation: The EM Algorithm, Simulation, Variational Methods, Regularization Path Algorithms, Graph Algorithms.
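The first computation topic can be sketched via scikit-learn's `GaussianMixture`, which is fitted with the EM algorithm (the two-component setup below is an illustrative assumption): EM alternates E-steps, computing posterior responsibilities, with M-steps, re-estimating parameters from those responsibilities.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch: a two-component 1-D Gaussian mixture with well-separated means,
# fitted by EM (E-step: responsibilities; M-step: weighted updates).
rng = np.random.default_rng(0)
x = np.concatenate([
    rng.normal(-3.0, 1.0, size=500),
    rng.normal(3.0, 1.0, size=500),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
means = np.sort(gmm.means_.ravel())  # estimated component means, sorted
```

With well-separated components, EM reliably recovers means near −3 and 3.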
- Other learning methods: Functional Data, Semi-Supervised Learning, Reinforcement Learning, Minimum Description Length, Online Learning, The PAC Model, Active Learning.
We will assume that you are familiar with the following concepts: 1. convergence in probability 2. central limit theorem 3. maximum likelihood 4. delta method 5. Fisher information 6. Bayesian inference 7. posterior distribution 8. bias, variance and mean squared error 9. determinants, eigenvalues, eigenvectors.
- Course homepage: http://www.cs.cmu.edu/~10702/ (2009).