1997 MachineLearning

Quotes

• 1. Introduction
• 2. Concept Learning and the General-to-Specific Ordering
• 3. Decision Tree Learning
• 4. Artificial Neural Networks
• 5. Evaluating Hypotheses
• 6. Bayesian Learning
• 7. Computational Learning Theory
• 8. Instance-based Learning
• 9. Genetic Algorithms
• 10. Learning Sets of Rules
• 11. Analytical Learning
• 12. Combining Inductive and Analytical Learning
• 13. Reinforcement Learning

1.2.2 Choosing the Target Function.

The next design choice is to determine exactly what type of knowledge will be learned and how this will be used by the performance program. … Let us call this target function $\displaystyle{ V }$ and again use the notation $\displaystyle{ V : B \to \mathbb{R} }$ to denote that $\displaystyle{ V }$ maps any legal board state from the set $\displaystyle{ B }$ to some real value. We intend for this target function $\displaystyle{ V }$ to assign higher scores to better board states … Thus, we have reduced the learning task in this case to the problem of discovering an operational description of the ideal target function $\displaystyle{ V }$. It may be very difficult in general to learn such an operational form of $\displaystyle{ V }$ perfectly. In fact we often expect learning algorithms to acquire only some approximation to the target function, and for this reason the process of learning the target function is often called function approximation. In the current discussion we will use the symbol $\displaystyle{ \hat{V} }$ to refer to the function that is actually learned by our program, to distinguish it from the ideal target function $\displaystyle{ V }$.
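As a rough illustration of the idea of a learned approximation to a target function that scores board states, the sketch below shows how a performance program might use such a function to pick a move. Everything specific here is an assumption made for the example and is not taken from the quoted passage: the feature-tuple board representation, the linear form of `v_hat`, the helper `choose_best`, and the weight values.

```python
from typing import Sequence

# Hypothetical board representation: a tuple of numeric features
# (here: own piece count, opponent piece count). The quoted text
# does not specify any representation.
Board = Sequence[float]


def v_hat(board: Board, weights: Sequence[float]) -> float:
    """Approximation of the target function V : B -> R, assumed linear here."""
    # weights[0] is a bias term; the remaining weights pair with the features.
    return weights[0] + sum(w * x for w, x in zip(weights[1:], board))


def choose_best(successors: Sequence[Board], weights: Sequence[float]) -> Board:
    """Performance program: pick the successor state with the highest score."""
    return max(successors, key=lambda b: v_hat(b, weights))


# Illustrative weights only; in practice they would be learned from training data.
weights = [0.0, 1.0, -1.0]
successors = [(3, 2), (3, 1), (2, 2)]
best = choose_best(successors, weights)
```

With these made-up weights, a state with more own pieces and fewer opponent pieces receives a higher score, which matches the intent that better board states get higher values.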

8.2.3

Much of the literature on nearest-neighbor methods and weighted local regression uses a terminology that has arisen from the field of statistical pattern recognition....

• Regression means approximating a real-valued target function.
• Residual is the error $\displaystyle{ \hat{f}(x) - f(x) }$ in approximating the target function.
• Kernel function is the function of distance that is used to determine the weight of each training example. In other words, the kernel function is the function $\displaystyle{ K }$ such that $\displaystyle{ w_i = K(d(x_i, x_q)) }$.
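The three terms above can be tied together in a small sketch: a kernel turns each distance into a weight, the weights produce a real-valued estimate (regression), and the residual is the difference between that estimate and the true value. The Gaussian kernel, the Euclidean distance, the weighted-average estimator, and the toy data are all assumptions chosen for illustration; the quoted text only requires that the weight be a function of distance.

```python
import math


def euclidean(a, b):
    """Distance d(x_i, x_q) between two points given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def gaussian_kernel(d, sigma=1.0):
    """One possible kernel K: the weight decays smoothly with distance."""
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


def kernel_regression(train, x_q, kernel=gaussian_kernel):
    """Estimate f(x_q) as a kernel-weighted average of training targets."""
    # w_i = K(d(x_i, x_q)) for every training example (x_i, y_i)
    weights = [kernel(euclidean(x_i, x_q)) for x_i, _ in train]
    return sum(w * y for w, (_, y) in zip(weights, train)) / sum(weights)


# Toy training set sampled from f(x) = x^2 (illustrative data only).
train = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0)]
x_q = (1.0,)
f_hat = kernel_regression(train, x_q)
residual = f_hat - 1.0  # residual = f_hat(x_q) - f(x_q), with f(1) = 1
```

Swapping `gaussian_kernel` for another decreasing function of distance changes how quickly distant examples lose influence, without changing the rest of the estimator.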

8.6 Remarks on Lazy and Eager Learning

In this chapter we considered three lazy learning methods: the k-Nearest Neighbor algorithm, locally weighted regression, and case-based reasoning. We call these methods lazy because they defer the decision of how to generalize beyond the training data until each new query instance is encountered. We also discussed one eager learning method: the method for learning radial basis function networks. We call this method eager because it generalizes beyond the training data before observing the new query, committing at training time to the network structure and weights that define its approximation to the target function. In this same sense, every other algorithm discussed elsewhere in this book (e.g., Backpropagation, C4.5) is an eager learning algorithm.

• Lazy methods may consider the query instance $\displaystyle{ x_q }$ when deciding how to generalize beyond the training data $\displaystyle{ D }$.
• Eager methods cannot. By the time they observe the query instance $\displaystyle{ x_q }$ they have already chosen their (global) approximation to the target function.

The key point in the above paragraph is that a lazy learner has the option of (implicitly) representing the target function by a combination of many local approximations, whereas an eager learner must commit at training time to a single global approximation. The distinction between eager and lazy learning is thus related to the distinction between global and local approximations to the target function.
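The contrast between a single global approximation and many local ones can be sketched in a few lines. The two classes below are simplified stand-ins chosen for brevity, not the methods named in the text: `EagerLinear` fits one least-squares line at training time, and `LazyKernelAverage` stores the data and forms a kernel-weighted average around each query. The class names, the Gaussian weighting, the `sigma` value, and the toy data are all assumptions.

```python
import math


class EagerLinear:
    """Eager: commits to one global line y = a + b*x at training time."""

    def __init__(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.b = sxy / sxx          # slope fixed before any query is seen
        self.a = my - self.b * mx   # intercept fixed before any query is seen

    def predict(self, x_q):
        return self.a + self.b * x_q


class LazyKernelAverage:
    """Lazy: stores the training data and generalizes only when queried."""

    def __init__(self, xs, ys, sigma=0.5):
        self.xs, self.ys, self.sigma = list(xs), list(ys), sigma

    def predict(self, x_q):
        # The weights depend on x_q, so each query gets its own local fit.
        ws = [math.exp(-((x - x_q) ** 2) / (2 * self.sigma ** 2)) for x in self.xs]
        return sum(w * y for w, y in zip(ws, self.ys)) / sum(ws)


# Toy data from f(x) = x^2, which no single line fits well.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]
eager = EagerLinear(xs, ys)
lazy = LazyKernelAverage(xs, ys)
```

On this particular data set the global line is flat, so the local estimates land closer to the true values at `x_q = 0.0` and `x_q = 2.0`. That outcome is specific to this example and does not imply that lazy methods are more accurate in general.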

References

• Tom M. Mitchell (1997). Machine Learning. http://www.cs.cmu.edu/~tom/mlbook.html