2000 PatternClassification


Subject Headings: Pattern Classification.

Notes

Cited By

Quotes

Book Overview

The first edition, published in 1973, has become a classic reference in the field. Now with the second edition, readers will find information on key new topics such as neural networks and statistical pattern recognition, the theory of machine learning, and the theory of invariances. Also included are worked examples, comparisons between different methods, extensive graphics, expanded exercises and computer project topics.

Table of Contents

 Chapter 1: Introduction
   * 1.1 Machine perception
   * 1.2 An example
         o 1.2.1 Related fields
   * 1.3 Pattern recognition systems
         o 1.3.1 Sensing
         o 1.3.2 Segmentation and grouping
         o 1.3.3 Feature extraction
         o 1.3.4 Classification
         o 1.3.5 Post processing
   * 1.4 The design cycle
         o 1.4.1 Data collection
         o 1.4.2 Feature choice
         o 1.4.3 Model choice
         o 1.4.4 Training
         o 1.4.5 Evaluation
         o 1.4.6 Computational complexity
   * 1.5 Learning and adaptation
         o 1.5.1 Supervised learning
         o 1.5.2 Unsupervised learning
         o 1.5.3 Reinforcement learning
   * 1.6 Conclusion
   * Summary by chapters
   * Bibliographical and historical remarks
   * Bibliography
 Chapter 2: Bayesian decision theory
   * 2.1 Introduction
   * 2.2 Bayesian decision theory–Continuous features
         o 2.2.1 Two-category classification
   * 2.3 Minimum-error-rate classification
         o 2.3.1 Minimax criterion
         o 2.3.2 Neyman-Pearson criterion
   * 2.4 Classifiers, discriminant functions, and decision surfaces
         o 2.4.1 The multicategory case
         o 2.4.2 The two-category case
   * 2.5 The normal density
         o 2.5.1 Univariate density
         o 2.5.2 Multivariate density
   * 2.6 Discriminant function for the normal density
         o 2.6.1 Case 1: Σi = σ²I
         o 2.6.2 Case 2: Σi = Σ
         o 2.6.3 Case 3: Σi = arbitrary
   * 2.7 Error probabilities and integrals
   * 2.8 Error bounds for normal densities
         o 2.8.1 Chernoff bound
         o 2.8.2 Bhattacharyya bound
         o 2.8.3 Signal detection theory and operating characteristics
   * 2.9 Bayes decision theory — discrete features
         o 2.9.1 Independent binary features
   * 2.10 Missing and noisy features
         o 2.10.1 Missing features
         o 2.10.2 Noisy features
   * 2.11 Bayesian belief networks
   * 2.12 Compound Bayesian decision theory and context
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 3: Maximum-likelihood and Bayesian parameter estimation
   * 3.1 Introduction
   * 3.2 Maximum-likelihood estimation
         o 3.2.1 The general principle
         o 3.2.2 The Gaussian case: Unknown μ
         o 3.2.3 The Gaussian case: Unknown μ and Σ
         o 3.2.4 Bias
   * 3.3 Bayesian estimation
         o 3.3.1 The class-conditional densities
         o 3.3.2 The parameter distribution
   * 3.4 Bayesian parameter estimation: Gaussian case
         o 3.4.1 The univariate case: p(μ|D)
         o 3.4.2 The univariate case: p(x|D)
         o 3.4.3 The multivariate case
   * 3.5 Bayesian parameter estimation: General theory
         o 3.5.1 When do maximum-likelihood and Bayes methods differ?
         o 3.5.2 Noninformative priors and invariance
         o 3.5.3 Gibbs algorithm
   * 3.6 Sufficient statistics
         o 3.6.1 Sufficient statistics and the exponential family
   * 3.7 Problems of dimensionality
         o 3.7.1 Accuracy, dimension, and training sample size
         o 3.7.2 Computational complexity
         o 3.7.3 Overfitting
   * 3.8 Component analysis and discriminants
         o 3.8.1 Principal component analysis (PCA)
         o 3.8.2 Fisher linear discriminant
         o 3.8.3 Multiple discriminant analysis
   * 3.9 Expectation-maximization (EM)
   * 3.10 Hidden Markov models
         o 3.10.1 First-order Markov models
         o 3.10.2 First-order hidden Markov models
         o 3.10.3 Hidden Markov model computation
         o 3.10.4 Evaluation
         o 3.10.5 Decoding
         o 3.10.6 Learning
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 4: Nonparametric techniques
   * 4.1 Introduction
   * 4.2 Density estimation
   * 4.3 Parzen windows
         o 4.3.1 Convergence of the mean
         o 4.3.2 Convergence of the variance
         o 4.3.3 Illustrations
         o 4.3.4 Classification problem
         o 4.3.5 Probabilistic neural networks (PNNs)
         o 4.3.6 Choosing the window function
   * 4.4 kn-nearest-neighbor estimation
         o 4.4.1 kn-nearest-neighbor and Parzen-window estimation
         o 4.4.2 Estimation of a priori probabilities
   * 4.5 The nearest-neighbor rule
         o 4.5.1 Convergence of the nearest neighbor
         o 4.5.2 Error rate for the nearest-neighbor rule
         o 4.5.3 Error bounds
         o 4.5.4 The k-nearest-neighbor rule
         o 4.5.5 Computational complexity of the k-nearest-neighbor rule
   * 4.6 Metrics and nearest-neighbor classification
         o 4.6.1 Properties of metrics
         o 4.6.2 Tangent distance
   * 4.7 Fuzzy classification
   * 4.8 Reduced Coulomb energy networks
   * 4.9 Approximations by series expansions
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 5: Linear discriminant functions
   * 5.1 Introduction
   * 5.2 Linear discriminant functions and decision surfaces
         o 5.2.1 The two-category case
         o 5.2.2 The multicategory case
   * 5.3 Generalized linear discriminant functions
   * 5.4 The two-category linearly separable case
         o 5.4.1 Geometry and terminology
         o 5.4.2 Gradient descent procedures
   * 5.5 Minimizing the perceptron criterion function
         o 5.5.1 The perceptron criterion function
         o 5.5.2 Convergence proof for single-sample correction
         o 5.5.3 Some direct generalizations
   * 5.6 Relaxation procedures
         o 5.6.1 The descent algorithm
         o 5.6.2 Convergence proof
   * 5.7 Nonseparable behavior
   * 5.8 Minimum squared-error procedures
         o 5.8.1 Minimum squared-error and the pseudoinverse
         o 5.8.2 Relation to Fisher's linear discriminant
         o 5.8.3 Asymptotic approximation to an optimal discriminant
         o 5.8.4 The Widrow-Hoff or LMS procedure
         o 5.8.5 Stochastic approximation methods
   * 5.9 The Ho-Kashyap procedures
         o 5.9.1 The descent procedure
         o 5.9.2 Convergence proof
         o 5.9.3 Nonseparable behavior
         o 5.9.4 Some related procedures
   * 5.10 Linear programming algorithms
         o 5.10.1 Linear programming
         o 5.10.2 The linearly separable case
         o 5.10.3 Minimizing the perceptron criterion function
   * 5.11 Support vector machines
         o 5.11.1 SVM training
   * 5.12 Multicategory generalizations
         o 5.12.1 Kesler's construction
         o 5.12.2 Convergence of the fixed-increment rule
         o 5.12.3 Generalizations for MSE procedures
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 6: Multilayer neural networks
   * 6.1 Introduction
   * 6.2 Feedforward operation and classification
         o 6.2.1 General feedforward operation
         o 6.2.2 Expressive power of multilayer networks
   * 6.3 Backpropagation algorithm
         o 6.3.1 Network learning
         o 6.3.2 Training protocols
         o 6.3.3 Learning curves
   * 6.4 Error surfaces
         o 6.4.1 Some small networks
         o 6.4.2 The exclusive-OR (XOR)
         o 6.4.3 Larger networks
         o 6.4.4 How important are multiple minima?
   * 6.5 Backpropagation as feature mapping
         o 6.5.1 Representations at the hidden layer — weights
   * 6.6 Backpropagation, Bayes theory and probability
         o 6.6.1 Bayes discriminants and neural networks
         o 6.6.2 Outputs as probabilities
   * 6.7 Related statistical techniques
   * 6.8 Practical techniques for improving backpropagation
         o 6.8.1 Activation function
         o 6.8.2 Parameters for the sigmoid
         o 6.8.3 Scaling input
         o 6.8.4 Target values
         o 6.8.5 Training with noise
         o 6.8.6 Manufacturing data
         o 6.8.7 Number of hidden units
         o 6.8.8 Initializing weights
         o 6.8.9 Learning rates
         o 6.8.10 Momentum
         o 6.8.11 Weight decay
         o 6.8.12 Hints
         o 6.8.13 On-line, stochastic or batch training?
         o 6.8.14 Stopped training
         o 6.8.15 Number of hidden layers
         o 6.8.16 Criterion function
   * 6.9 Second-order methods
         o 6.9.1 Hessian matrix
         o 6.9.2 Newton's method
         o 6.9.3 Quickprop
         o 6.9.4 Conjugate gradient descent
   * 6.10 Additional networks and training methods
         o 6.10.1 Radial basis function networks (RBFs)
         o 6.10.2 Special bases
         o 6.10.3 Matched filters
         o 6.10.4 Convolutional networks
         o 6.10.5 Recurrent networks
         o 6.10.6 Cascade-correlation
   * 6.11 Regularization, complexity adjustment and pruning
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 7: Stochastic methods
   * 7.1 Introduction
   * 7.2 Stochastic search
         o 7.2.1 Simulated annealing
         o 7.2.2 The Boltzmann factor
         o 7.2.3 Deterministic simulated annealing
   * 7.3 Boltzmann learning
         o 7.3.1 Stochastic Boltzmann learning of visible states
         o 7.3.2 Missing features and category constraints
         o 7.3.3 Deterministic Boltzmann learning
         o 7.3.4 Initialization and setting parameters
   * 7.4 Boltzmann networks and graphical models
         o 7.4.1 Other graphical models
   * 7.5 Evolutionary methods
         o 7.5.1 Genetic algorithms
         o 7.5.2 Further heuristics
         o 7.5.3 Why do they work?
   * 7.6 Genetic programming
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 8: Nonmetric methods
   * 8.1 Introduction
   * 8.2 Decision trees
   * 8.3 CART
         o 8.3.1 Number of splits
         o 8.3.2 Query selection and node impurity
         o 8.3.3 When to stop splitting
         o 8.3.4 Pruning
         o 8.3.5 Assignment of leaf node labels
         o 8.3.6 Computational complexity
         o 8.3.7 Feature choice
         o 8.3.8 Multivariate decision trees
         o 8.3.9 Priors and costs
         o 8.3.10 Missing attributes
   * 8.4 Other tree methods
         o 8.4.1 ID3
         o 8.4.2 C4.5
         o 8.4.3 Which tree classifier is best?
   * 8.5 Recognition with strings
         o 8.5.1 String matching
         o 8.5.2 Edit distance
         o 8.5.3 Computational complexity
         o 8.5.4 String matching with errors
         o 8.5.5 String matching with the "don't care" symbol
   * 8.6 Grammatical methods
         o 8.6.1 Grammars
         o 8.6.2 Types of string grammars
         o 8.6.3 Recognition using grammars
   * 8.7 Grammatical inference
   * 8.8 Rule-based methods
         o 8.8.1 Learning rules
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 9: Algorithm-independent machine learning
   * 9.1 Introduction
   * 9.2 Lack of inherent superiority of any classifier
         o 9.2.1 No free lunch theorem
         o 9.2.2 Ugly duckling theorem
         o 9.2.3 Minimum description length (MDL)
         o 9.2.4 Minimum description length principle
         o 9.2.5 Overfitting avoidance and Occam's razor
   * 9.3 Bias and variance
         o 9.3.1 Bias and variance for regression
         o 9.3.2 Bias and variance for classification
   * 9.4 Resampling for estimating statistics
         o 9.4.1 Jackknife
         o 9.4.2 Bootstrap
   * 9.5 Resampling for classifier design
         o 9.5.1 Bagging
         o 9.5.2 Boosting
         o 9.5.3 Learning with queries
         o 9.5.4 Arcing, learning with queries, bias and variance
   * 9.6 Estimating and comparing classifiers
         o 9.6.1 Parametric models
         o 9.6.2 Cross-validation
         o 9.6.3 Jackknife and bootstrap estimation of classification accuracy
         o 9.6.4 Maximum-likelihood model comparison
         o 9.6.5 Bayesian model comparison
         o 9.6.6 The problem-average error rate
         o 9.6.7 Predicting final performance from learning curves
         o 9.6.8 The capacity of a separating plane
   * 9.7 Combining classifiers
         o 9.7.1 Component classifiers with discriminant functions
         o 9.7.2 Component classifiers without discriminant functions
   * Summary
   * Bibliographical and historical remarks
   * Problems
   * Computer exercises
   * Bibliography
 Chapter 10: Unsupervised learning and clustering
   * 10.1 Introduction
   * 10.2 Mixture densities and identifiability
   * 10.3 Maximum-likelihood estimates
   * 10.4 Application to normal densities
         o 10.4.1 Case 1: Unknown mean vectors
         o 10.4.2 Case 2: All parameters unknown
         o 10.4.3 k-means clustering
         o 10.4.4 Fuzzy k-means clustering
   * 10.5 Unsupervised Bayesian learning
         o 10.5.1 The Bayes classifier
         o 10.5.2 Learning the parameter vector
         o 10.5.3 Decision-directed approximation
   * 10.6 Data description and clustering
         o 10.6.1 Similarity measures
   * 10.7 Criterion functions for clustering
         o 10.7.1 The sum-of-squared-error criterion
         o 10.7.2 Related minimum variance criterion
         o 10.7.3 Scatter criteria
   * 10.8 Iterative optimization
   * 10.9 Hierarchical clustering
         o 10.9.1 Definitions
         o 10.9.2 Agglomerative hierarchical clustering
         o 10.9.3 Stepwise-optimal hierarchical clustering
         o 10.9.4 Hierarchical clustering and induced metrics
   * 10.10 The problem of validity
   * 10.11 On-line clustering
         o 10.11.1 Unknown number of clusters
         o 10.11.2 Adaptive resonance
         o 10.11.3 Learning with a critic
   * 10.12 Graph-theoretic methods
   * 10.13 Component analysis
         o 10.13.1 Principal component analysis (PCA)
         o 10.13.2 Nonlinear component analysis (NLCA)
         o 10.13.3 Independent component analysis (ICA)
   * 10.14 Low-dimensional representations and multidimensional scaling (MDS)
         o 10.14.1 Self-organizing feature maps
         o 10.14.2 Clustering and dimensionality reduction

Richard O. Duda, Peter E. Hart, and David G. Stork. (2000). "Pattern Classification" (2nd edition).