Subject Headings: Machine Learning Research, Machine Learning Algorithm
- A scientific field is best defined by the central question it studies. The field of Machine Learning seeks to answer the question “How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?”
- Whereas Computer Science has focused primarily on how to manually program computers, Machine Learning focuses on the question of how to get computers to program themselves (from experience plus some initial structure).
- Whereas Statistics has focused primarily on what conclusions can be inferred from data, Machine Learning incorporates additional questions about what computational architectures and algorithms can be used to most effectively capture, store, index, retrieve and merge these data, how multiple learning subtasks can be orchestrated in a larger system, and questions of computational tractability.
- A Computer Scientist asks: “How can we build machines that solve problems, and which problems are inherently tractable/intractable?”
- A Statistician asks: “What can be inferred from data plus a set of modeling assumptions, with what reliability?”
- For example, economics is interested in questions such as how distributed collections of self-interested individuals may form a system (market) that learns prices leading to Pareto-optimal allocations for the greatest common good. And control theory, especially adaptive control theory, is interested in questions such as how a servo-control system can improve its control strategy through experience. Interestingly, the mathematical models for adaptation in these other fields are somewhat different from those commonly used in machine learning, suggesting significant potential for cross-fertilization of models and theories.
Place of Machine Learning within Computer Science
- Machine learning methods are already the best methods available for developing particular types of software, in applications where:
- The application is too complex for people to manually design the algorithm. For example, software for sensor-based perception tasks, such as speech recognition and computer vision, falls into this category. All of us can easily label which photographs contain a picture of our mother, but none of us can write down an algorithm to perform this task. Here machine learning is the software development method of choice simply because it is relatively easy to collect labeled training data, and relatively ineffective to try writing down a successful algorithm.
- The application requires that the software customize to its operational environment after it is fielded. One example of this is speech recognition systems that customize to the user who purchases the software. Machine learning here provides the mechanism for adaptation. Software applications that customize to users are growing rapidly - e.g., bookstores that customize to your purchasing preferences, or email readers that customize to your particular definition of spam. This machine learning niche within the software world is growing rapidly.
Some Current Research Questions
- Here is a sample of current research questions:
- Can unlabeled data be helpful for supervised learning? Supervised learning involves estimating some function $f : X \to Y$ given a set of labeled training examples $\{\langle x_i, y_i \rangle\}$. We could dramatically reduce the cost of supervised learning if we could make use of unlabeled data as well (e.g., images that are unlabeled). Are there situations where unlabeled data can be guaranteed to improve the expected learning accuracy? Interestingly, the answer is yes, for several special cases of learning problems that satisfy additional assumptions. These include practical problems such as learning to classify web pages or spam. Exploration of new algorithms and new subclasses of problems where unlabeled data is provably useful is an active area of current research.
- How can we transfer what is learned for one task to improve learning in other related tasks? Note the above formulation of supervised learning involves learning a single function f. In many practical problems we might like to learn a family of related functions (e.g., a diagnosis function for patients in New York hospitals, and one for patients in Tokyo hospitals). Although we expect the diagnosis function to be somewhat different in the two cases, we also expect some commonalities. Methods such as hierarchical Bayesian approaches provide one way to tackle this problem, by assuming the learning parameters of the NY function and the Tokyo function share similar prior probabilities, but allowing the data from each hospital to override these priors as appropriate. The situation becomes more subtle when the transfer between functions is more complex; for example, a robot learning both a next-state function and a function to choose control actions should be able to learn better by taking advantage of the logical relationship between these two types of learned information.
- What is the relationship between different learning algorithms, and which should be used when? Many different learning algorithms have been proposed and evaluated experimentally in different application domains. One theme of research is to develop a theoretical understanding of the relationships among these algorithms, and of when it is appropriate to use each. For example, two algorithms for supervised learning, Logistic Regression and the Naive Bayes classifier, behave differently on many data sets, but can be proved to be equivalent when applied to certain types of data sets (i.e., when the modeling assumptions of Naive Bayes are satisfied, and as the number of training examples approaches infinity). This understanding suggests, for example, that Naive Bayes should be preferred if data is sparse but one is confident of the modeling assumptions. More generally, the theoretical characterization of learning algorithms, their convergence properties, and their relative strengths and weaknesses remains a major research topic.
- For learners that actively collect their own training data, what is the best strategy? Imagine a mobile robot charged with the task of learning to find its master’s slippers anywhere in the house, and imagine that it is allowed to practice during the day, by viewing the slippers from different viewpoints of its choice, and moving the slippers to different locations with different lighting conditions. What is the most efficient training strategy for actively collecting new data as its learning proceeds? A second example of this problem involves drug testing where one wishes to learn the drug effectiveness while minimizing the exposure of patients to possible unknown side effects. This is part of a broader research thrust into learning systems that take more active control over the learning setting, rather than passively using data collected by others.
- To what degree can we have both data privacy and the benefits of data mining? There are many beneficial uses of machine learning, such as training a medical diagnosis system on data from all hospitals in the world, which are not being pursued largely because of privacy considerations. Although at first it might seem that we must choose between privacy and the benefits of data mining, in fact we might be able to have both in some cases. For example, rather than forcing hospitals to sacrifice privacy and pass around their patient records to a central data repository, we might instead pass around a learning algorithm to the hospitals, allowing each to run it under certain restrictions, then pass it along to the next hospital. This is an active research area, building both on past statistical work on data disclosure and on more recent cryptographic approaches.
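The idea of passing a learning algorithm from hospital to hospital, rather than pooling patient records, can be illustrated with a minimal sketch. It assumes a deliberately simple learner (one-dimensional least squares) whose state is a handful of running sums; the class `LinearFitState`, the function `train_by_visiting`, and the toy data are all hypothetical names introduced here for illustration. Note that shipping only aggregate sums does not by itself guarantee privacy; the "certain restrictions" mentioned above would still require disclosure-control or cryptographic safeguards.

```python
from dataclasses import dataclass


@dataclass
class LinearFitState:
    """Running sums sufficient to fit y ~ slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, records):
        """Fold one site's local (x, y) records into the aggregate state."""
        for x, y in records:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def solve(self):
        """Return (slope, intercept) of the least-squares line."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def train_by_visiting(sites):
    """Carry the learner state from site to site; raw records never leave."""
    state = LinearFitState()
    for local_records in sites:
        state.update(local_records)  # conceptually runs inside each hospital
    return state.solve()


# Hypothetical toy data standing in for two hospitals' records.
hospital_a = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
hospital_b = [(4.0, 8.1), (5.0, 9.8)]

visiting_fit = train_by_visiting([hospital_a, hospital_b])
pooled_fit = train_by_visiting([hospital_a + hospital_b])  # central repository
```

In this sketch the fit obtained by visiting each site in turn matches the fit on the pooled records, because the running sums are all that least squares needs. Learners without such compact sufficient statistics would need a different scheme.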
Longer Term Research Questions
- Below are some additional research topics which I feel hold the potential to significantly change the face of machine learning over the coming decade.
- Can we build never-ending learners? The vast majority of machine learning work to date involves running programs on particular data sets, then putting the learner aside and using the result. In contrast, learning in humans and other animals is an ongoing process in which the agent learns many different capabilities, often in a sequenced curriculum, and uses these different learned facts and capabilities in a highly synergistic fashion. Why not build machine learners that learn in this same cumulative way, becoming increasingly competent rather than halting at some plateau? For example, a robot in the same office building for months or years should learn a variety of capabilities, starting with simpler tasks (e.g., how to recognize objects in that dark end of the hallway), to more complex problems that build on previous learning (e.g., where to look first to find the missing recycling container). Similarly, a program to learn to read the web might learn a graded set of capabilities beginning with simpler abilities such as learning to recognize names of people and places, and extending to extracting complex relational information spread across multiple sentences and web pages. A key research issue here is self-supervised learning and constructing an appropriate graded curriculum.
- Can machine learning theories and algorithms help explain human learning? Recently, theories and algorithms from machine learning have been found relevant to understanding aspects of human and animal learning. For example, reinforcement learning algorithms and theories predict surprisingly well the neural activity of dopaminergic neurons in animals during reward-based learning. And machine learning algorithms for discovering sparse representations of naturally occurring images predict surprisingly well the types of visual features found in the early visual cortex of animals. However, theories of animal learning involve factors that have not yet been considered in machine learning, such as the role of motivation, fear, urgency, forgetting, and learning over multiple time scales. There is a rich opportunity for cross-fertilization here, an opportunity to develop a general theory of learning processes covering animals as well as machines, and potential implications for improved strategies for teaching students.
- Can we design programming languages containing machine learning primitives? Can a new generation of computer programming languages directly support writing programs that learn? In many current machine learning applications, standard machine learning algorithms are integrated with hand-coded software into a final application program. Why not design a new computer programming language that supports writing programs in which some subroutines are hand-coded while others are specified as “to be learned”? Such a programming language could allow the programmer to declare the inputs and outputs of each “to be learned” subroutine, then select a learning algorithm from the primitives provided by the programming language. Interesting new research issues arise here, such as designing programming language constructs for declaring what training experience should be given to each “to be learned” subroutine, when, and with what safeguards against arbitrary changes to program behavior.
- Will computer perception merge with machine learning? Given the increasing use of machine learning for state-of-the-art computer vision, computer speech recognition, and other forms of computer perception, can we develop a general theory of perception grounded in learning processes? One intriguing opportunity here is the incorporation of multiple sensory modalities (e.g., vision, sound, touch) to provide a setting in which self-supervised learning could be applied to predict one sensory experience from the others. Already researchers in developmental psychology and education have observed that learning can be more effective when people are provided multiple input modalities, and work on co-training methods from machine learning suggests the same.
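The notion of programs that mix hand-coded subroutines with ones declared “to be learned” can be sketched in an existing language. The following is only an illustration under stated assumptions, not a proposal from the source: the decorator `to_be_learned`, the toy learner `MajorityByKey`, and the spam example are hypothetical names invented here. The decorated function's type annotations stand in for the declared inputs and outputs, and its body supplies the behavior used before any training.

```python
from collections import Counter, defaultdict


class MajorityByKey:
    """Toy learner: remember the most common output seen for each input."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, examples):
        for x, y in examples:
            self.counts[x][y] += 1

    def predict(self, x, default):
        if x not in self.counts:
            return default  # no training experience for this input yet
        return self.counts[x].most_common(1)[0][0]


def to_be_learned(learner_factory):
    """Mark a subroutine as learned; its body only supplies a fallback."""
    def wrap(fallback):
        learner = learner_factory()

        def routine(x):
            return learner.predict(x, default=fallback(x))

        routine.train = learner.fit  # explicit hook for training experience
        return routine
    return wrap


@to_be_learned(MajorityByKey)
def is_spam(sender: str) -> bool:
    return False  # hand-coded default used before any training


def triage(sender: str) -> str:
    """Hand-coded subroutine that calls a learned one."""
    return "junk" if is_spam(sender) else "inbox"


before = triage("promo@example.com")
is_spam.train([("promo@example.com", True),
               ("promo@example.com", True),
               ("promo@example.com", False)])
after = triage("promo@example.com")
unseen = triage("friend@example.com")
```

Here training is triggered only through the explicit `train` hook, which is one crude way to limit when program behavior can change. A real language design would need much richer constructs for the safeguards discussed above.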
Where to Learn More
- To find out more about Machine Learning, see the top conferences and journals in the field, including: