2014 MachineLearningAnAlgorithmicPer

From GM-RKB
  • (Marsland, 2014) ⇒ Stephen Marsland. (2014). “Machine Learning: An Algorithmic Perspective, Second Edition.” Chapman & Hall/CRC. ISBN:1466583282, 9781466583283

Subject Headings: ML Textbook; Accuracy Metrics; Activation Function; AdaBoost; Approximate Inference; Associative Memory; Auto-Associative Network; Back-Propagation of Error; Basis Expansion; Batch Training; Baum–Welch; Forward–Backward Algorithm; Bayesian Networks; Bias Input; Boundary Conditions; Competitive Learning; Computing the Posterior; Confusion Matrix; Conjugate Gradients; Constrained Optimisation Problem; Continuous Hopfield Network; Covariance Functions; Cubic Spline; Curse of Dimensionality; Data Compression; Data Preparation; Deep Belief Networks (DBN); Directed Belief Network; Distance Measures; Energy Function; Entropy in Information Theory; Error Function; Evaluating Fitness; Exclusive Or (XOR) Function; Exhaustive Search; Expectation-Maximisation (EM) Algorithm; Fitting the Spline to the Data; Forward Algorithm; Gaussian Random Numbers; Genetic Algorithms; Gibbs Sampling; Gini Impurity; Hidden Layers; ID3; Information Criteria; Initialising the Weights; KD-Tree; Kalman Filter; Laplace Approximation; Learning Rate; Learning the Parameters; Levenberg–Marquardt Algorithm; Logic Functions; Markov Chains; Markov Decision Processes; Markov Property; McCulloch and Pitts Neuronal Model; McCulloch and Pitts Neurons; Measurement Precision; Metropolis–Hastings Algorithm; Stochastic Gradient Descent; Minimising Risk; Multi-Class Classification; Multi-Dimensional Scaling (MDS); Multi-layer Perceptron; Nearest Neighbour Smoothing; Network Dimensionality; Non-Linearly Separable Problem; Overfitting; PCA; Particle Filter; Perceptron Convergence Theorem; Perceptron Learning Algorithm; Pima Indian Dataset; Random Numbers; Receiver Operator Characteristic (ROC) Curve; Regression Problem; Regression in Trees; Restricted Boltzmann Machine; Reward Function; SOM Algorithm; SVM Regression; Sequential Training; Simulated Annealing; Slack Variables; Smoothing Splines; String Representation; Stumping; Supervised Learning; Testing Dataset; The Four Peaks Problem; The Knapsack Problem; Time-Series Prediction; Training Data; Training Dataset; Training Neural Networks; Unbalanced Datasets; Validation Dataset; Viterbi Algorithm; Weight Update Rule; When to Stop Learning; k-Means Neural Network.

Notes

Cited By

Quotes

Abstract

A proven, hands-on approach for students without a strong statistical foundation. Since the best-selling first edition was published, there have been several prominent developments in the field of machine learning, including increasing work on the statistical interpretations of machine learning algorithms. Unfortunately, computer science students without a strong statistical background often find it hard to get started in this area. Remedying this deficiency, Machine Learning: An Algorithmic Perspective, Second Edition helps students understand the algorithms of machine learning. It puts them on a path toward mastering the relevant mathematics and statistics as well as the necessary programming and experimentation.

New to the Second Edition:
  • Two new chapters on deep belief networks and Gaussian processes.
  • Reorganization of the chapters to make a more natural flow of content.
  • Revision of the support vector machine material, including a simple implementation for experiments.
  • New material on random forests, the perceptron convergence theorem, accuracy methods, and conjugate gradient optimization for the multi-layer perceptron.
  • Additional discussions of the Kalman and particle filters.
  • Improved code, including better use of naming conventions in Python.

Suitable for both an introductory one-semester course and more advanced courses, the text strongly encourages students to practice with the code. Each chapter includes detailed examples along with further reading and problems. All of the code used to create the examples is available on the author's website.
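
To give a flavour of the book's algorithmic, code-first approach, the following is a minimal NumPy sketch of the perceptron learning rule covered in Chapter 3 (Sections 3.3.3–3.3.5). It is an illustrative reconstruction, not the author's published implementation; the function name, learning rate, iteration count, and the OR logic-function example are assumptions made here for demonstration.

 import numpy as np

 def train_perceptron(inputs, targets, eta=0.25, n_iterations=20):
     """Batch-train a single-layer perceptron with a threshold activation.

     Illustrative sketch only; not the book's code.
     """
     n_data, n_in = inputs.shape
     # Append the bias input as an extra column of -1s.
     X = np.concatenate((inputs, -np.ones((n_data, 1))), axis=1)
     # Small random initial weights, one column per output.
     weights = np.random.rand(n_in + 1, targets.shape[1]) * 0.1 - 0.05
     for _ in range(n_iterations):
         activations = np.where(X @ weights > 0, 1, 0)   # forward pass (threshold)
         weights -= eta * X.T @ (activations - targets)  # weight update rule
     return weights

 # Hypothetical usage: learning the (linearly separable) OR logic function.
 X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
 y = np.array([[0], [1], [1], [1]])
 w = train_perceptron(X, y)

Sequential (one-example-at-a-time) updating, as assumed by the perceptron convergence theorem discussed in Section 3.4.1, would instead apply the weight update inside a loop over individual training examples.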

Features

Table of Contents

 CHAPTER 1 Introduction 1
 1.1 IF DATA HAD MASS, THE EARTH WOULD BE A BLACK HOLE 1
 1.2 LEARNING 4
 1.2.1 Machine Learning 4
 1.3 TYPES OF MACHINE LEARNING 5
 1.4 SUPERVISED LEARNING 6
 1.4.1 Regression 6
 1.4.2 Classification 8
 1.5 THE MACHINE LEARNING PROCESS 10
 1.6 A NOTE ON PROGRAMMING 11
 1.7 A ROADMAP TO THE BOOK 12
 FURTHER READING 13
 CHAPTER 2 Preliminaries 15
 2.1 SOME TERMINOLOGY 15
 2.1.1 Weight Space 16
 2.1.2 The Curse of Dimensionality 17
 2.2 KNOWING WHAT YOU KNOW: TESTING MACHINE LEARNING ALGORITHMS 19
 2.2.1 Overfitting 19
 2.2.2 Training, Testing, and Validation Sets 20
 2.2.3 The Confusion Matrix 21
 2.2.4 Accuracy Metrics 22
 2.2.5 The Receiver Operator Characteristic (ROC) Curve 24
 2.2.6 Unbalanced Datasets 25
 2.2.7 Measurement Precision 25
 2.3 TURNING DATA INTO PROBABILITIES 27
 2.3.1 Minimising Risk 30
 2.4 SOME BASIC STATISTICS 32
 2.4.1 Averages 32
 2.4.2 Variance and Covariance 32
 2.4.3 The Gaussian 34
 2.5 THE BIAS-VARIANCE TRADEOFF 35
 FURTHER READING 36
 PRACTICE QUESTIONS 37
 CHAPTER 3 Neurons, Neural Networks, and Linear Discriminants 39
 3.1 THE BRAIN AND THE NEURON 39
 3.1.1 Hebb’s Rule 40
 3.1.2 McCulloch and Pitts Neurons 40
 3.1.3 Limitations of the McCulloch and Pitts Neuronal Model 42
 3.2 NEURAL NETWORKS 43
 3.3 THE PERCEPTRON 43
 3.3.1 The Learning Rate 46
 3.3.2 The Bias Input 46
 3.3.3 The Perceptron Learning Algorithm 47
 3.3.4 An Example of Perceptron Learning: Logic Functions 48
 3.3.5 Implementation 49
 3.4 LINEAR SEPARABILITY 55
 3.4.1 The Perceptron Convergence Theorem 57
 3.4.2 The Exclusive Or (XOR) Function 58
 3.4.3 A Useful Insight 59
 3.4.4 Another Example: The Pima Indian Dataset 61
 3.4.5 Preprocessing: Data Preparation 63
 3.5 LINEAR REGRESSION 64
 3.5.1 Linear Regression Examples 66
 FURTHER READING 67
 PRACTICE QUESTIONS 68
 CHAPTER 4 The Multi-layer Perceptron 71
 4.1 GOING FORWARDS 73
 4.1.1 Biases 73
 4.2 GOING BACKWARDS: BACK-PROPAGATION OF ERROR 74
 4.2.1 The Multi-layer Perceptron Algorithm 77
 4.2.2 Initialising the Weights 80
 4.2.3 Different Output Activation Functions 81
 4.2.4 Sequential and Batch Training 82
 4.2.5 Local Minima 82
 4.2.6 Picking Up Momentum 84
 4.2.7 Minibatches and Stochastic Gradient Descent 85
 4.2.8 Other Improvements 85
 4.3 THE MULTI-LAYER PERCEPTRON IN PRACTICE 85
 4.3.1 Amount of Training Data 86
 4.3.2 Number of Hidden Layers 86
 4.3.3 When to Stop Learning 88
 4.4 EXAMPLES OF USING THE MLP 89
 4.4.1 A Regression Problem 89
 4.4.2 Classification with the MLP 92
 4.4.3 A Classification Example: The Iris Dataset 93
 4.4.4 Time-Series Prediction 95
 4.4.5 Data Compression: The Auto-Associative Network 97
 4.5 A RECIPE FOR USING THE MLP 100
 4.6 DERIVING BACK-PROPAGATION 101
 4.6.1 The Network Output and the Error 101
 4.6.2 The Error of the Network 102
 4.6.3 Requirements of an Activation Function 103
 4.6.4 Back-Propagation of Error 104
 4.6.5 The Output Activation Functions 107
 4.6.6 An Alternative Error Function 108
 FURTHER READING 108
 PRACTICE QUESTIONS 109
 CHAPTER 5 Radial Basis Functions and Splines 111
 5.1 RECEPTIVE FIELDS 111
 5.2 THE RADIAL BASIS FUNCTION (RBF) NETWORK 114
 5.2.1 Training the RBF Network 117
 5.3 INTERPOLATION AND BASIS FUNCTIONS 119
 5.3.1 Bases and Basis Expansion 122
 5.3.2 The Cubic Spline 123
 5.3.3 Fitting the Spline to the Data 123
 5.3.4 Smoothing Splines 124
 5.3.5 Higher Dimensions 125
 5.3.6 Beyond the Bounds 127
 FURTHER READING 127
 PRACTICE QUESTIONS 128
 CHAPTER 6 Dimensionality Reduction 129
 6.1 LINEAR DISCRIMINANT ANALYSIS (LDA) 130
 6.2 PRINCIPAL COMPONENTS ANALYSIS (PCA) 133
 6.2.1 Relation with the Multi-layer Perceptron 137
 6.2.2 Kernel PCA 138
 6.3 FACTOR ANALYSIS 141
 6.4 INDEPENDENT COMPONENTS ANALYSIS (ICA) 142
 6.5 LOCALLY LINEAR EMBEDDING 144
 6.6 ISOMAP 147
 6.6.1 Multi-Dimensional Scaling (MDS) 147
 FURTHER READING 150
 PRACTICE QUESTIONS 151
 CHAPTER 7 Probabilistic Learning 153
 7.1 GAUSSIAN MIXTURE MODELS 153
 7.1.1 The Expectation-Maximisation (EM) Algorithm 154
 7.1.2 Information Criteria 158
 7.2 NEAREST NEIGHBOUR METHODS 158
 7.2.1 Nearest Neighbour Smoothing 160
 7.2.2 Efficient Distance Computations: the KD-Tree 160
 7.2.3 Distance Measures 165
 FURTHER READING 167
 PRACTICE QUESTIONS 168
 CHAPTER 8 Support Vector Machines 169
 8.1 OPTIMAL SEPARATION 170
 8.1.1 The Margin and Support Vectors 170
 8.1.2 A Constrained Optimisation Problem 172
 8.1.3 Slack Variables for Non-Linearly Separable Problems 175
 8.2 KERNELS 176
 8.2.1 Choosing Kernels 178
 8.2.2 Example: XOR 179
 8.3 THE SUPPORT VECTOR MACHINE ALGORITHM 179
 8.3.1 Implementation 180
 8.3.2 Examples 183
 8.4 EXTENSIONS TO THE SVM 184
 8.4.1 Multi-Class Classification 184
 8.4.2 SVM Regression 186
 8.4.3 Other Advances 187
 FURTHER READING 187
 PRACTICE QUESTIONS 188
 CHAPTER 9 Optimisation and Search 189
 9.1 GOING DOWNHILL 190
 9.1.1 Taylor Expansion 193
 9.2 LEAST-SQUARES OPTIMISATION 194
 9.2.1 The Levenberg–Marquardt Algorithm 194
 9.3 CONJUGATE GRADIENTS 198
 9.3.1 Conjugate Gradients Example 201
 9.3.2 Conjugate Gradients and the MLP 201
 9.4 SEARCH: THREE BASIC APPROACHES 204
 9.4.1 Exhaustive Search 204
 9.4.2 Greedy Search 205
 9.4.3 Hill Climbing 205
 9.5 EXPLOITATION AND EXPLORATION 206
 9.6 SIMULATED ANNEALING 207
 9.6.1 Comparison 208
 FURTHER READING 209
 PRACTICE QUESTIONS 209
 CHAPTER 10 Evolutionary Learning 211
 10.1 THE GENETIC ALGORITHM (GA) 212
 10.1.1 String Representation 213
 10.1.2 Evaluating Fitness 213
 10.1.3 Population 214
 10.1.4 Generating Offspring: Parent Selection 214
 10.2 GENERATING OFFSPRING: GENETIC OPERATORS 216
 10.2.1 Crossover 216
 10.2.2 Mutation 217
 10.2.3 Elitism, Tournaments, and Niching 218
 10.3 USING GENETIC ALGORITHMS 220
 10.3.1 Map Colouring 220
 10.3.2 Punctuated Equilibrium 221
 10.3.3 Example: The Knapsack Problem 222
 10.3.4 Example: The Four Peaks Problem 222
 10.3.5 Limitations of the GA 224
 10.3.6 Training Neural Networks with Genetic Algorithms 225
 10.4 GENETIC PROGRAMMING 225
 10.5 COMBINING SAMPLING WITH EVOLUTIONARY LEARNING 227
 FURTHER READING 228
 PRACTICE QUESTIONS 229
 CHAPTER 11 Reinforcement Learning 231
 11.1 OVERVIEW 232
 11.2 EXAMPLE: GETTING LOST 233
 11.2.1 State and Action Spaces 235
 11.2.2 Carrots and Sticks: The Reward Function 236
 11.2.3 Discounting 237
 11.2.4 Action Selection 237
 11.2.5 Policy 238
 11.3 MARKOV DECISION PROCESSES 238
 11.3.1 The Markov Property 238
 11.3.2 Probabilities in Markov Decision Processes 239
 11.4 VALUES 240
 11.5 BACK ON HOLIDAY: USING REINFORCEMENT LEARNING 244
 11.6 THE DIFFERENCE BETWEEN SARSA AND Q-LEARNING 245
 11.7 USES OF REINFORCEMENT LEARNING 246
 FURTHER READING 247
 PRACTICE QUESTIONS 247
 CHAPTER 12 Learning with Trees 249
 12.1 USING DECISION TREES 249
 12.2 CONSTRUCTING DECISION TREES 250
 12.2.1 Quick Aside: Entropy in Information Theory 251
 12.2.2 ID3 251
 12.2.3 Implementing Trees and Graphs in Python 255
 12.2.4 Implementation of the Decision Tree 255
 12.2.5 Dealing with Continuous Variables 257
 12.2.6 Computational Complexity 258
 12.3 CLASSIFICATION AND REGRESSION TREES (CART) 260
 12.3.1 Gini Impurity 260
 12.3.2 Regression in Trees 261
 12.4 CLASSIFICATION EXAMPLE 261
 FURTHER READING 263
 PRACTICE QUESTIONS 264
 CHAPTER 13 Decision by Committee: Ensemble Learning 267
 13.1 BOOSTING 268
 13.1.1 AdaBoost 269
 13.1.2 Stumping 273
 13.2 BAGGING 273
 13.2.1 Subagging 274
 13.3 RANDOM FORESTS 275
 13.3.1 Comparison with Boosting 277
 13.4 DIFFERENT WAYS TO COMBINE CLASSIFIERS 277
 FURTHER READING 279
 PRACTICE QUESTIONS 280
 CHAPTER 14 Unsupervised Learning 281
 14.1 THE K-MEANS ALGORITHM 282
 14.1.1 Dealing with Noise 285
 14.1.2 The k-Means Neural Network 285
 14.1.3 Normalisation 287
 14.1.4 A Better Weight Update Rule 288
 14.1.5 Example: The Iris Dataset Again 289
 14.1.6 Using Competitive Learning for Clustering 290
 14.2 VECTOR QUANTISATION 291
 14.3 THE SELF-ORGANISING FEATURE MAP 291
 14.3.1 The SOM Algorithm 294
 14.3.2 Neighbourhood Connections 295
 14.3.3 Self-Organisation 297
 14.3.4 Network Dimensionality and Boundary Conditions 298
 14.3.5 Examples of Using the SOM 300
 FURTHER READING 300
 PRACTICE QUESTIONS 303
 CHAPTER 15 Markov Chain Monte Carlo (MCMC) Methods 305
 15.1 SAMPLING 305
 15.1.1 Random Numbers 305
 15.1.2 Gaussian Random Numbers 306
 15.2 MONTE CARLO OR BUST 308
 15.3 THE PROPOSAL DISTRIBUTION 310
 15.4 MARKOV CHAIN MONTE CARLO 313
 15.4.1 Markov Chains 313
 15.4.2 The Metropolis–Hastings Algorithm 315
 15.4.3 Simulated Annealing (Again) 316
 15.4.4 Gibbs Sampling 318
 FURTHER READING 319
 PRACTICE QUESTIONS 320
 CHAPTER 16 Graphical Models 321
 16.1 BAYESIAN NETWORKS 322
 16.1.1 Example: Exam Fear 323
 16.1.2 Approximate Inference 327
 16.1.3 Making Bayesian Networks 329
 16.2 MARKOV RANDOM FIELDS 330
 16.3 HIDDEN MARKOV MODELS (HMMS) 333
 16.3.1 The Forward Algorithm 335
 16.3.2 The Viterbi Algorithm 337
 16.3.3 The Baum–Welch or Forward–Backward Algorithm 339
 16.4 TRACKING METHODS 343
 16.4.1 The Kalman Filter 343
 16.4.2 The Particle Filter 350
 FURTHER READING 355
 PRACTICE QUESTIONS 356
 CHAPTER 17 Symmetric Weights and Deep Belief Networks 359
 17.1 ENERGETIC LEARNING: THE HOPFIELD NETWORK 360
 17.1.1 Associative Memory 360
 17.1.2 Making an Associative Memory 361
 17.1.3 An Energy Function 365
 17.1.4 Capacity of the Hopfield Network 367
 17.1.5 The Continuous Hopfield Network 368
 17.2 STOCHASTIC NEURONS — THE BOLTZMANN MACHINE 369
 17.2.1 The Restricted Boltzmann Machine 371
 17.2.2 Deriving the CD Algorithm 375
 17.2.3 Supervised Learning 380
 17.2.4 The RBM as a Directed Belief Network 381
 17.3 DEEP LEARNING 385
 17.3.1 Deep Belief Networks (DBN) 388
 FURTHER READING 393
 PRACTICE QUESTIONS 393
 CHAPTER 18 Gaussian Processes 395
 18.1 GAUSSIAN PROCESS REGRESSION 397
 18.1.1 Adding Noise 398
 18.1.2 Implementation 402
 18.1.3 Learning the Parameters 403
 18.1.4 Implementation 404
 18.1.5 Choosing a (set of) Covariance Functions 406
 18.2 GAUSSIAN PROCESS CLASSIFICATION 407
 18.2.1 The Laplace Approximation 408
 18.2.2 Computing the Posterior 408
 18.2.3 Implementation 410
 FURTHER READING 412
 PRACTICE QUESTIONS 413
 APPENDIX A Python 415
 A.1 INSTALLING PYTHON AND OTHER PACKAGES 415
 A.2 GETTING STARTED 415
 A.2.1 Python for MATLAB® and R users 418
 A.3 CODE BASICS 419
 A.3.1 Writing and Importing Code 419
 A.3.2 Control Flow 420
 A.3.3 Functions 420
 A.3.4 The doc String 421
 A.3.5 map and lambda 421
 A.3.6 Exceptions 422
 A.3.7 Classes 422
 A.4 USING NUMPY AND MATPLOTLIB 423
 A.4.1 Arrays 423
 A.4.2 Random Numbers 427
 A.4.3 Linear Algebra 427
 A.4.4 Plotting 427
 A.4.5 One Thing to Be Aware of 429
 FURTHER READING 430
 PRACTICE QUESTIONS 430

References



 Author: Stephen Marsland
 Title: Machine Learning: An Algorithmic Perspective, Second Edition
 Year: 2014