2008 CuriousMachinesActiveLearningwi

(Settles, 2008) ⇒ Burr Settles. (2008). “Curious Machines: Active Learning with Structured Instances.” Ph.D. Thesis, University of Wisconsin at Madison. ISBN: 978-1-109-04741-7.

Subject Headings: Active Learning, Named Entity Recognition

Notes

Cited By

Quotes

Abstract

Supervised machine learning is a branch of artificial intelligence concerned with automatically inducing predictive models from labeled data. Such learning approaches are useful for many interesting real-world applications, but particularly shine for tasks involving the automatic organization, extraction, and retrieval of information from large collections of data (e.g., text, images, and other digital media).

In traditional supervised learning, one uses “labeled” training data to induce a model. However, labeled instances for real-world applications are often difficult, expensive, or time-consuming to obtain. Consider a complex task such as extracting key person and organization names from text documents. While gathering large amounts of unlabeled documents for these tasks is often relatively easy (e.g., from the World Wide Web), labeling these texts usually requires experienced human annotators with specific domain knowledge and training. There are implicit costs associated with obtaining these labels from domain experts, such as limited time and financial resources. This is especially true for applications that involve learning from instances with complex structures, which can require labels at varying levels of granularity.

Active learning addresses this inherent bottleneck by allowing the learner to selectively choose which parts of the available data are labeled for training. The goal is to maximize the accuracy of the learner through such “queries,” while minimizing the work required of human annotators. In this thesis, I explore several important questions regarding active learning for these and similar tasks involving structured instances. What query strategies are available for these learning algorithms, and how do they compare? How might a learner pose queries at different levels of granularity, as with multiple-instance learning? Are there relationships between certain properties of a query and its difficulty for the annotator? If so, can these relationships be learned and exploited during active learning? The answers to these questions illustrate the utility and promise of active learning algorithms in complex real-world learning systems.

…

Thesis Statement

This thesis aims to explore various key aspects of active learning for tasks that involve structured instances. The chapters that follow (i) describe machine learning approaches to various structured learning tasks, (ii) present the active learning scenarios and algorithms I have developed for these learning methods, and (iii) discuss how these approaches can mitigate the amount of work required to acquire labeled data in practice. Specifically, I focus on the following hypotheses:

i. Strategies that take into account how “representative” or “relevant” query instances are can produce more accurate systems with fewer labeled instances than strategies that do not.
ii. When querying instances with complex structures (e.g., labels on individual words in a sentence), strategies that consider the structured instance as a whole can perform better than strategies that aggregate individual label information.
iii. For some structured instances, labels can be acquired at multiple levels of granularity (e.g., documents and paragraphs). By selectively querying at these various granularities, particularly when one is easier to label than another, we can reduce annotation effort even further.
iv. Not all instances have equal annotation cost. To truly minimize the cost of acquiring labeled data, an active learning system should not only consider how informative each query is to the learner, but also take into account how expensive it will be for an annotator to label.
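Hypotheses (i) and (iv) both describe adjustments to a basic query score. The following is a minimal sketch, not the thesis's implementation, under several assumptions: each instance is a feature vector with non-negative entries (so cosine similarity falls in [0, 1]), a base informativeness score per instance has already been computed (e.g., by uncertainty sampling), and a predicted annotation cost per instance is available. The density term multiplies the base score by the instance's average similarity to the pool, raised to a hypothetical exponent `beta`. Dividing by predicted cost is one simple cost-aware heuristic in the spirit of hypothesis (iv). All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def density_weighted(base_score: float, x: Vector, pool: List[Vector],
                     beta: float = 1.0) -> float:
    """Hypothesis (i): scale informativeness by how representative x is,
    measured as its average cosine similarity to every pool instance.
    Assumes non-negative features so the average similarity is >= 0."""
    avg_sim = sum(cosine(x, u) for u in pool) / len(pool)
    return base_score * avg_sim ** beta


def select_query(pool: List[Vector], base_scores: Sequence[float],
                 costs: Sequence[float], beta: float = 1.0) -> int:
    """Hypothesis (iv): divide the density-weighted score by the predicted
    annotation cost and return the index of the best value per unit cost."""
    def value(i: int) -> float:
        return density_weighted(base_scores[i], pool[i], pool, beta) / costs[i]
    return max(range(len(pool)), key=value)
```

In a toy pool of two similar instances and one outlier, the outlier can have the highest base score, yet the density weighting favors a member of the cluster. Raising one instance's predicted cost then shifts the choice to a cheaper alternative. Setting `beta = 0` removes the density term entirely and recovers the base strategy.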

…

Active Learning

There are three general scenarios in which active learning is possible: (i) query instances may be synthesized by the learner de novo, (ii) instances are provided in a stream and the learner chooses to query or discard each one sequentially, or (iii) there exists a large pool U of unlabeled data which the learner may examine and select queries from. For many real-world tasks, synthesizing queries de novo can lead to instances that are unnatural or difficult for humans to interpret. For example, Baum and Lang (1992) found that a model learning to recognize handwritten characters generated query images that were not real characters at all, but artificial combinations of existing letters and digits. Therefore, the stream-based and pool-based scenarios are often more realistic. In this thesis, I focus on the pool-based setting, since large repositories of unlabeled texts, images, and the like are usually available for these sorts of problems.
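The pool-based scenario can be made concrete with a short query loop. The loop fits a model on the labeled set, scores every instance in the unlabeled pool U, sends the most informative instance to the oracle, and repeats. This minimal sketch assumes a toy one-dimensional threshold model with a logistic posterior. It uses least-confident uncertainty sampling as the query strategy, which is only one of the strategies the thesis compares. The model, the simulated oracle, and all function names are hypothetical and serve only to keep the example self-contained.

```python
import math
from typing import Callable, List, Sequence, Tuple

Labeled = List[Tuple[float, int]]


def fit_threshold(labeled: Labeled) -> float:
    """Toy model: place the boundary halfway between the largest negative
    and the smallest positive example seen so far (needs one of each)."""
    max_neg = max(x for x, y in labeled if y == 0)
    min_pos = min(x for x, y in labeled if y == 1)
    return (max_neg + min_pos) / 2.0


def prob_positive(x: float, theta: float, scale: float = 1.0) -> float:
    """Logistic posterior P(y = 1 | x) centred on the current threshold."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / scale))


def least_confident(x: float, theta: float) -> float:
    """Uncertainty score: 1 - P(most likely label | x)."""
    p = prob_positive(x, theta)
    return 1.0 - max(p, 1.0 - p)


def pool_based_active_learning(
    pool: Sequence[float],
    oracle: Callable[[float], int],
    seed: Sequence[float],
    budget: int,
) -> Tuple[float, Labeled]:
    """Label the seed, then spend `budget` queries on the pool U."""
    labeled: Labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(budget):
        if not unlabeled:
            break
        theta = fit_threshold(labeled)
        # Query the unlabeled instance the current model is least sure about.
        query = max(unlabeled, key=lambda x: least_confident(x, theta))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # stands in for the annotator
    return fit_threshold(labeled), labeled
```

For example, suppose the pool contains the integers 0 to 100, a simulated oracle labels x ≥ 37 as positive, and the seeds are 0 and 100. Each query then lands next to the current boundary, and the boundary converges to 36.5 within ten queries. The posterior here is monotone in the distance to the threshold, so uncertainty sampling reduces to binary search in this toy case.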

…

Biomedical Named Entity Recognition

Named entity recognition (NER) is a subtask of information extraction, focused on finding mentions of various entities that belong to semantic classes of interest. In the biomedical domain, entities of interest are usually references to genes, proteins, cell types, and the like.
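NER is commonly cast as sequence labeling: each token receives a tag, and mentions are read off the tag sequence. One widely used encoding is BIO (begin/inside/outside), in the line of Ramshaw and Marcus (1995). The following minimal sketch recovers mentions from BIO tags. It assumes well-formed `B-TYPE`, `I-TYPE`, and `O` tags. The example sentence and its tags were written by hand for illustration, using entity type names in the style of the JNLPBA shared task (Kim et al., 2004), and do not come from any corpus. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_mentions(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (entity_type, mention_text) pairs."""
    mentions: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the mention that is currently open, if any.
        if current_type is not None:
            mentions.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            current_type, current_tokens = None, []
        elif tag.startswith("B-") or tag[2:] != current_type:
            # A B- tag, or an I- tag that does not continue the open
            # mention, starts a new mention.
            flush()
            current_type, current_tokens = tag[2:], [token]
        else:
            current_tokens.append(token)  # I- tag continuing the open mention
    flush()
    return mentions
```

Applied to "IL-2 gene expression requires NF-kappa B activation in T cells" with tags B-DNA I-DNA O O B-protein I-protein O O B-cell_type I-cell_type, the function returns the mentions ("DNA", "IL-2 gene"), ("protein", "NF-kappa B"), and ("cell_type", "T cells").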

…

References

N. Abe and H. Mamitsuka. Query learning strategies using boosting and bagging. In: Proceedings of the International Conference on Machine Learning (ICML), pages 1–9. Morgan Kaufmann, 1998.
A. Abi-Haidar, J. Kaur, A. Maguitman, P. Radivojac, A. Retchsteiner, K. Verspoor, Z. Wang, and L.M. Rocha. Uncovering protein-protein interactions in the bibliome. In: Proceedings of the BioCreative2 Workshop, pages 247–255, 2007.
S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems (NIPS), volume 15, pages 561–568. MIT Press, 2003.
D. Angluin. Queries revisited. In: Proceedings of the International Conference on Algorithmic Learning Theory, pages 12–31. Springer-Verlag, 2001.
D. Angluin. Queries and concept learning. Machine Learning, 2:319–342, 1988.
M.F. Balcan, A. Beygelzimer, and J. Langford. Agnostic active learning. In: Proceedings of the International Conference on Machine Learning (ICML), pages 65–72. ACM Press, 2006.
M.F. Balcan, S. Hanneke, and J. Wortman. The true sample complexity of active learning. In: Proceedings of the Conference on Learning Theory (COLT), pages 45–56. Springer, 2008.
J. Baldridge and M. Osborne. Active learning and the total cost of annotation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9–16. ACL Press, 2004.
A. Bateman. Editorial. Nucleic Acids Research, 36(Database issue):D1, 2008.
E.B. Baum and K. Lang. Query learning can work poorly when a human oracle is used. In: Proceedings of the IEEE International Joint Conference on Neural Networks, 1992.
J. Baxter, A. Tridgell, and L. Weaver. Reinforcement learning and chess. In J. Furnkranz and M. Kubat, editors, Machines that Learn to Play Games, pages 91–116. Nova Science Publishers, 2001.
A.L. Berger, V.J. Della Pietra, and S.A. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996.
S. Bethard, Z. Lu, J.H. Martin, and L. Hunter. Semantic role labeling for protein transport predicates. BMC Bioinformatics, 9:277, 2008.
F.R. Blattner, G. Plunkett, C.A. Bloch, N.T. Perna, V. Burland, M. Riley, J. Collado-Vides, J.D. Glasner, C.K. Rode, G.F. Mayhew, J. Gregor, N.W. Davis, H.A. Kirkpatrick, M.A. Goeden, D.J. Rose, B. Mau, and Y. Shao. The complete genome sequence of Escherichia coli K-12. Science, 277:1453–1474, 1997.
A. Blum and Tom M. Mitchell. Combining Labeled and Unlabeled Data with Co-training. In: Proceedings of the Conference on Learning Theory (COLT), pages 92–100. Morgan Kaufmann, 1998.
C. Bonwell and J. Eison. Active Learning: Creating Excitement in the Classroom. Jossey-Bass, 1991.
Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
K. Brinker. Incorporating diversity in active learning with support vector machines. In: Proceedings of the International Conference on Machine Learning (ICML), pages 59–66. AAAI Press, 2003.
T. Brow, B. Settles, and M. Craven. Classifying biomedical articles by making localized decisions. In: Proceedings of the Text Retrieval Conference (TREC), 2006.
C. Buciluǎ, Rich Caruana, and A. Niculescu-Mizil. Model compression. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pages 535–541. ACM Press, 2006.
A. Cakmak and G. Ozsoyoglu. Annotating genes using textual patterns. In: Proceedings of the Pacific Symposium on Biocomputing (PSB), volume 12, pages 221–232. World Scientific Press, 2007.
V.R. Carvalho and W. Cohen. Learning to extract signature and reply lines from email. In: Proceedings of the Conference on Email and Anti-Spam (CEAS), 2004.
N. Cesa-Bianchi, C. Gentile, A. Tironi, and L. Zaniboni. Worst-case analysis of selective sampling for linear-threshold algorithms. In Advances in Neural Information Processing Systems (NIPS), volume 17, pages 233–240. MIT Press, 2005.
J.T. Chang, Hinrich Schütze, and R.B. Altman. GAPSCORE: finding gene and protein names one word at a time. Bioinformatics, 20(2):216–225, 2004.
O. Chapelle, P. Haffner, and Vladimir N. Vapnik. Support vector machines for histogram-based image classification. IEEE Transactions on Neural Networks, 10(5):1055–1064, 1999.
D. Cohn. Neural network exploration using optimal experiment design. In Advances in Neural Information Processing Systems (NIPS), volume 6, pages 679–686. Morgan Kaufmann, 1994.
D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
D. Cohn, Zoubin Ghahramani, and M.I. Jordan. Active learning with statistical models. Journal of Artificial Intelligence Research, 4:129–145, 1996.
C. Cortes and Vladimir N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
M. Craven and J. Shavlik. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems (NIPS), volume 8, pages 24–30. MIT Press, 1996.
M. Craven, D. DiPasquo, Dayne Freitag, Andrew McCallum, Tom M. Mitchell, K. Nigam, and S. Slattery. Learning to extract symbolic knowledge from the world wide web. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 509–516. AAAI Press, 1998.
Aron Culotta and Andrew McCallum. Reducing labeling effort for structured prediction tasks. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 746–751. AAAI Press, 2005.
I. Dagan and S. Engelson. Committee-based sampling for training probabilistic classifiers. In: Proceedings of the International Conference on Machine Learning (ICML), pages 150–157. Morgan Kaufmann, 1995.
S. Dasgupta. Analysis of a greedy active learning strategy. In Advances in Neural Information Processing Systems (NIPS), volume 16, pages 337–344. MIT Press, 2004.
S. Dasgupta, A. Kalai, and C. Monteleoni. Analysis of perceptron-based active learning. In: Proceedings of the Conference on Learning Theory (COLT), pages 249–263. Springer, 2005.
S. Dasgupta, D. Hsu, and C. Monteleoni. A general agnostic active learning algorithm. In Advances in Neural Information Processing Systems (NIPS), volume 20, pages 353–360. MIT Press, 2008.
V.R. de Sa. Learning classification with unlabeled data. In Advances in Neural Information Processing Systems (NIPS), volume 6, pages 112–119. MIT Press, 1994.
Arthur P. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39:1–38, 1977.
Thomas G. Dietterich, R. Lathrop, and T. Lozano-Perez. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 89:31–71, 1997.
G. Druck, G. Mann, and Andrew McCallum. Learning from labeled features using generalized expectation criteria. In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 595–602. ACM Press, 2008.
R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, 2001.
R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.
J.T. Eppig, C.J. Bult, J.A. Kadin, J.E. Richardson, J.A. Blake, and the members of the Mouse Genome Database Group. The Mouse Genome Database (MGD): from genes to mice – a community resource for mouse biology. Nucleic Acids Research, 33:D471–D475, 2005. http://www.informatics.jax.org.
V. Fedorov. Theory of Optimal Experiments. Academic Press, 1972.
A. Figueroa and G. Neumann. Identifying protein-protein interactions in biomedical publications. In: Proceedings of the BioCreative2 Workshop, pages 217–225, 2007.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
Yoav Freund, H.S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28:133–168, 1997.
C.M. Friedrich, T. Revillion, M. Hofmann, and J. Fluck. Biomedical and chemical named entity recognition with conditional random fields: The advantage of dictionary features. In: Proceedings of the International Symposium on Semantic Mining in Biomedicine (SMBM), pages 85–89, 2006.
A. Fujii, T. Tokunaga, K. Inui, and H. Tanaka. Selective sampling for example-based word sense disambiguation. Computational Linguistics, 24(4):573–597, 1998.
S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4:1–58, 1992.
R. Gilad-Bachrach, A. Navot, and N. Tishby. Query by committee made real. In Advances in Neural Information Processing Systems (NIPS), volume 18, pages 443–450. MIT Press, 2006.
The GO Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Research, 32:D258–D261, 2004. http://www.geneontology.org.
T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, and E. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, 1999.
G. Gonzalez, L. Tari, A. Gitter, R. Leaman, S. Nikkila, R. Wendt, A. Zeigler, and C. Baral. Integrating knowledge extracted from biomedical literature: Normalization and evidence statements for interactions. In: Proceedings of the BioCreative2 Workshop, pages 227–235, 2007.
R. Greiner, A. Grove, and Dan Roth. Learning cost-sensitive active classifiers. Artificial Intelligence, 139:137–174, 2002.
Y. Guo and R. Greiner. Optimistic active learning using mutual information. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 823–829. AAAI Press, 2007.
Y. Guo and D. Schuurmans. Discriminative batch mode active learning. In Advances in Neural Information Processing Systems (NIPS), volume 20, pages 593–600. MIT Press, 2008.
S. Hanneke. A bound on the label complexity of agnostic active learning. In: Proceedings of the International Conference on Machine Learning (ICML), pages 353–360. ACM Press, 2007.
A. Hauptmann, W. Lin, R. Yan, J. Yang, and M.Y. Chen. Extreme video retrieval: joint maximization of human and computer performance. In: Proceedings of the ACM Workshop on Multimedia Image Retrieval, pages 385–394. ACM Press, 2006.
D. Haussler. Learning conjunctive concepts in structural domains. Machine Learning, 4(1):7–40, 1994.
W. Hersh, A.M. Cohen, P. Roberts, and H.K. Rekapalli. TREC 2006 genomics track overview. In: Proceedings of the Text Retrieval Conference (TREC), 2007.
W.R. Hersh, C. Buckley, T.J. Leone, and D.H. Hickam. OHSUMED: An interactive retrieval evaluation and new large test collection for research. In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 192–201. ACM Press, 1994.
S.C.H. Hoi, R. Jin, and M.R. Lyu. Large-scale text categorization by batch mode active learning. In: Proceedings of the International Conference on the World Wide Web, pages 633–642. ACM Press, 2006a.
S.C.H. Hoi, R. Jin, J. Zhu, and M.R. Lyu. Batch mode active learning and its application to medical image classification. In: Proceedings of the International Conference on Machine Learning (ICML), pages 417–424. ACM Press, 2006b.
D. Hoiem, A. Efros, and M. Hebert. Putting objects in perspective. In: Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), pages 2137–2144. IEEE Press, 2006.
A. Huang, S. Ding, H. Wang, and X. Zhu. Mining physical protein-protein interactions from literature. In: Proceedings of the BioCreative2 Workshop, 2007.
J. Hull. A database for handwriting recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550–554, 1994.
R. Hwa. Sample selection for statistical parsing. Computational Linguistics, 30(3):73–77, 2004.
R. Kabiljo, D. Stoycheva, and A.J. Shepard. ProSpecTome: A new tagged corpus for protein named entity recognition. In: Proceedings of the ISMB BioLINK, pages 24–27. Oxford University Press, 2007.
A. Kapoor, Eric Horvitz, and Sugato Basu. Selective supervision: Guiding supervised learning with decision-theoretic active learning. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 877–882. AAAI Press, 2007.
J. Kim, T. Ohta, Y. Tateisi, and Jun'ichi Tsujii. GENIA corpus — a semantically annotated corpus for bio-textmining. Bioinformatics, 19(suppl. 1):i180–i182, 2003.
J. Kim, T. Ohta, Yoshimasa Tsuruoka, Y. Tateisi, and N. Collier. Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA), pages 70–75, 2004.
S. Kim, Y. Song, K. Kim, J.W. Cha, and G.G. Lee. MMR-based active machine learning for bio named entity recognition. In: Proceedings of Human Language Technology and the North American Association for Computational Linguistics (HLT-NAACL), pages 69–72. ACL Press, 2006.
R.D. King, K.E. Whelan, F.M. Jones, P.G. Reiser, C.H. Bryant, S.H. Muggleton, D.B. Kell, and S.G. Oliver. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature, 427(6971):247–252, 2004.
V. Krishnamurthy. Algorithms for optimal scheduling and management of hidden Markov model sensors. IEEE Transactions on Signal Processing, 50(6):1382–1397, 2002.
S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22:79–86, 1951.
John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the International Conference on Machine Learning (ICML), pages 282–289. Morgan Kaufmann, 2001.
K. Lam, J.L.Y. Koh, B. Veeravalli, and V. Brusic. Incremental maintenance of biological databases using association rule mining. In S. Istrail, P. Pevzner, and M. Waterman, editors, Pattern Recognition in Bioinformatics, pages 140–150. Springer, 2006.
K. Lang. Newsweeder: Learning to filter netnews. In: Proceedings of the International Conference on Machine Learning (ICML), pages 331–339. Morgan Kaufmann, 1995.
K. Lari and S.J. Young. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language, 4:35–56, 1990.
L. Lee. Measures of distributional similarity. In: Proceedings of the Association for Computational Linguistics (ACL), pages 25–32. ACL Press, 1999.
D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the International Conference on Machine Learning (ICML), pages 148–156. Morgan Kaufmann, 1994.
D. Lewis and W. Gale. A sequential algorithm for training text classifiers. In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3–12. ACM/Springer, 1994.
D. Lewis and M. Ringuette. A comparison of two learning algorithms for text categorization. In: Proceedings of the Symposium on Document Analysis and Information Retrieval, pages 81–93, 1994.
T. Li, M. Ogihara, and Q. Li. A comparative study on content-based music genre classification. In: Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, pages 282–289. ACM Press, 2003.
Y. Li, H. Lin, and Z. Yang. Two approaches for biomedical text classification. In: Proceedings of the International Conference on Bioinformatics and Biomedical Engineering (ICBBE), pages 310–313. IEEE Press, 2007.
M. Light, X.Y. Qiu, and P. Srinivasan. The language of bioscience: Facts, speculations, and statements in between. In: Proceedings of the ISMB BioLINK, pages 17–24. ACM Press, 2004.
M. Lindenbaum, S. Markovitch, and D. Rusakov. Selective sampling for nearest neighbor classifiers. Machine Learning, 54(2):125–152, 2004.
D.C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45:503–528, 1989.
Y. Liu. Active learning with support vector machine applied to gene expression data for cancer classification. Journal of Chemical Information and Computer Sciences, 44:1936–1941, 2004.
R. Lomasky, C.E. Brodley, M. Aernecke, D. Walt, and M. Friedl. Active class selection. In: Proceedings of the European Conference on Machine Learning (ECML), pages 640–647. Springer, 2007.
D. MacKay. Information-based objective functions for active data selection. Neural Computation, 4(4):590–604, 1992.
A. Madkour, K. Darwish, H. Hassan, A. Hassan, and O. Emam. BioNoculars: Extracting protein-protein interactions from biomedical text. In BioNLP 2007: Biological, translational, and clinical language processing, pages 89–96. ACM Press, 2007.
O. Mangasarian, W.N. Street, and W. Wolberg. Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4):570–577, 1995.
G. Mann and Andrew McCallum. Simple, robust, scalable semi-supervised learning via expectation regularization. In: Proceedings of the International Conference on Machine Learning (ICML), pages 593–600. ACM Press, 2007a.
G. Mann and Andrew McCallum. Efficient computation of entropy gradient for semi-supervised conditional random fields. In: Proceedings of the North American Association for Computational Linguistics (NAACL), pages 109–112. ACL Press, 2007b.
Christopher D. Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
D. Margineantu. Active cost-sensitive learning. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 1622–1623. AAAI Press, 2005.
O. Maron and T. Lozano-Perez. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems (NIPS), volume 10, pages 570–576. MIT Press, 1998.
Andrew McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In: Proceedings of the AAAI Workshop on Learning for Text Categorization, pages 41–48, 1998a.
Andrew McCallum and K. Nigam. Employing EM in pool-based active learning for text classification. In: Proceedings of the International Conference on Machine Learning (ICML), pages 359–367. Morgan Kaufmann, 1998b.
Prem Melville and Raymond Mooney. Diverse ensembles for active learning. In: Proceedings of the International Conference on Machine Learning (ICML), pages 584–591. Morgan Kaufmann, 2004.
Prem Melville, M. Saar-Tsechansky, F. Provost, and Raymond Mooney. Active feature-value acquisition for classifier induction. In: Proceedings of the IEEE Conference on Data Mining (ICDM), pages 483–486. IEEE Press, 2004.
L. Mihalkova and Raymond Mooney. Using active relocation to aid reinforcement learning. In: Proceedings of the Florida Artificial Intelligence Research Society (FLAIRS), pages 580–585. AAAI Press, 2006.
S. Mika and B. Rost. Protein names precisely peeled off free text. Bioinformatics, 20(suppl. 1):i241–i247, 2004.
Tom M. Mitchell. Generalization as search. Artificial Intelligence, 18:203–226, 1982.
Tom M. Mitchell. Machine Learning. McGraw-Hill, 1997.
R. Moskovitch, N. Nissim, D. Stopel, C. Feher, R. Englert, and Y. Elovici. Improving the detection of unknown computer worms activity using active learning. In: Proceedings of the German Conference on AI, pages 489–493. Springer, 2007.
I. Muslea, S. Minton, and C.A. Knoblock. Selective sampling with redundant views. In: Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 621–626. AAAI Press, 2000.
H.T. Nguyen and A. Smeulders. Active learning using pre-clustering. In: Proceedings of the International Conference on Machine Learning (ICML), pages 79–86. ACM Press, 2004.
J. Nocedal and S.J. Wright. Numerical Optimization. Springer, 1999.
M. Nyffenegger, J.C. Chappelier, and E. Gaussier. Revisiting Fisher kernels for document similarities. In: Proceedings of the European Conference on Machine Learning (ECML), pages 727–734. Springer, 2006.
G. Paass and J. Kindermann. Bayesian query construction for neural network models. In Advances in Neural Information Processing Systems (NIPS), volume 7, pages 443–450. MIT Press, 1995.
F. Peng and Andrew McCallum. Accurate information extraction from research papers using conditional random fields. In: Proceedings of Human Language Technology and the North American Association for Computational Linguistics (HLT-NAACL), pages 329–336. ACL Press, 2004.
F. Provost, Tom Fawcett, and Ron Kohavi. Building the case against accuracy estimation for comparing induction algorithms. In: Proceedings of the International Conference on Machine Learning (ICML), pages 445–453. Morgan Kaufmann, 1998.
Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
H. Raghavan, O. Madani, and R. Jones. Active learning with feedback on both features and instances. Journal of Machine Learning Research, 7:1655–1686, 2006.
R. Rahmani and S.A. Goldman. MISSL: Multiple-instance semi-supervised learning. In: Proceedings of the International Conference on Machine Learning (ICML), pages 705–712. ACM Press, 2006.
L.A. Ramshaw and M.P. Marcus. Text chunking using transformation-based learning. In: Proceedings of the ACL Workshop on Very Large Corpora, 1995.
S. Ray and M. Craven. Supervised versus multiple instance learning: An empirical comparison. In: Proceedings of the International Conference on Machine Learning (ICML), pages 697–704. ACM Press, 2005.
N. Roy and Andrew McCallum. Toward optimal active learning through sampling estimation of error reduction. In: Proceedings of the International Conference on Machine Learning (ICML), pages 441–448. Morgan Kaufmann, 2001.
E.F.T.K. Sang and F. DeMeulder. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In: Proceedings of the Conference on Natural Language Learning (CoNLL), pages 142–147, 2003.
T. Scheffer, C. Decomain, and S. Wrobel. Active hidden Markov models for information extraction. In: Proceedings of the International Conference on Advances in Intelligent Data Analysis (CAIDA), pages 309–318. Springer-Verlag, 2001.
A.I. Schein and L.H. Ungar. Active learning for logistic regression: An evaluation. Machine Learning, 68(3):235–265, 2007.
M. Schena, D. Shalon, R. Davis, and P.O. Brown. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270:467–470, 1995.
M.J. Schervish. Theory of Statistics. Springer, 1995.
R. Schwartz and Y.-L. Chow. The N-best algorithm: an efficient and exact procedure for finding the N most likely sentence hypotheses. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 81–83. IEEE Press, 1990.
B. Settles. ABNER: An open source tool for automatically tagging genes, proteins, and other entity names in text. Bioinformatics, 21(14):3191–3192, 2005.
B. Settles. Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA), pages 104–107, 2004.
B. Settles and M. Craven. An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1069–1078. ACL Press, 2008.
B. Settles and M. Craven. Exploiting zone information, syntactic features, and informative terms in gene ontology annotation from biomedical documents. In: Proceedings of the Text Retrieval Conference (TREC), 2005.
B. Settles, M. Craven, and L. Friedland. Active learning with real annotation costs. In: Proceedings of the NIPS Workshop on Cost-Sensitive Learning, pages 1–10, 2008a.
B. Settles, M. Craven, and S. Ray. Multiple-instance active learning. In Advances in Neural Information Processing Systems (NIPS), volume 20, pages 1289–1296. MIT Press, 2008b.
H.S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In: Proceedings of the ACM Workshop on Computational Learning Theory, pages 287–294, 1992.
F. Sha and Fernando Pereira. Shallow parsing with conditional random fields. In: Proceedings of Human Language Technology and the North American Association for Computational Linguistics (HLT-NAACL), pages 213–220. ACL Press, 2003.
C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.
L. Smith, L.K. Tanabe, R.J. Ando, C.J. Kuo, I.F. Chung, C.N. Hsu, Y.S. Lin, R. Klinger, C.M. Friedrich, K. Ganchev, M. Torii, H. Liu, B. Haddow, C.A. Struble, R.J. Povinelli, A. Vlachos, W.A. Baumgartner Jr, L. Hunter, B. Carpenter, R.T. Tsai, H.J. Dai, F. Liu, Y. Chen, C. Sun, S. Katrenko, P. Adriaans, Christian Blaschke, R. Torres, M. Neves, P. Nakov, A. Divoli, M.M. López, J. Mata, and W.J. Wilbur. Overview of BioCreative II gene mention recognition. Genome Biology, 9(Suppl 2):S2, 2008.
A.J. Smola and Bernhard Schölkopf. A tutorial on support vector regression. Technical Report NC2-TR-1998-030, NeuroCOLT2 Technical Report Series, 1998.
R. Snow, B. O’Connor, Daniel Jurafsky, and A. Ng. Cheap and fast — but is it good? In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 254–263. ACM Press, 2008.
E.S. Soteriades and M.E. Falagas. Comparison of amount of biomedical research originating from the European Union and the United States. British Medical Journal, 331:192–194, 2005.
C. Sutton and Andrew McCallum. An introduction to conditional random fields for relational learning. In Lise Getoor and Ben Taskar, editors, Introduction to Statistical Relational Learning. MIT Press, 2006.
R. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

Q. Tao, S.D. Scott, and N.V. Vinodchandran. SVM-based generalized multiple-instance learning via approximate box counting. In: Proceedings of the International Conference on Machine Learning (ICML), pages 779–806. Morgan Kaufmann, 2004.
L. Tari, G. Gonzalez, R. Leaman, S. Nikkila, R. Wendt, and C. Baral. ASU at TREC 2006 genomics track. In: Proceedings of the Text Retrieval Conference (TREC), 2007.
C.A. Thompson, M.E. Califf, and R.J. Mooney. Active learning for natural language parsing and information extraction. In: Proceedings of the International Conference on Machine Learning (ICML), pages 406–414. Morgan Kaufmann, 1999.
S. Tong and E. Chang. Support vector machine active learning for image retrieval. In: Proceedings of the ACM International Conference on Multimedia, pages 107–118. ACM Press, 2001.
S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In: Proceedings of the International Conference on Machine Learning (ICML), pages 999–1006. Morgan Kaufmann, 2000.
L. Torrey and J. Shavlik. Transfer learning. In E. Soria, J. Martin, R. Magdalena, M. Martinez, and A. Serrano, editors, Handbook of Research on Machine Learning Applications. To appear, 2009.
G. Tur, D. Hakkani-Tür, and R.E. Schapire. Combining active and semi-supervised learning for spoken language understanding. Speech Communication, 45(2):171–186, 2005.
C. Urmson, J. Anhalt, D. Bagnell, C. Baker, R. Bittner, M.N. Clark, J. Dolan, D. Duggins, T. Galatali, C. Geyer, M. Gittleman, S. Harbaugh, M. Hebert, T.M. Howard, S. Kolski, A. Kelly, M. Likhachev, M. McNaughton, N. Miller, K. Peterson, B. Pilnick, R. Rajkumar, P. Rybski, B. Salesky, Y.W. Seo, S. Singh, J. Snider, A. Stentz, W. Whittaker, Z. Wolkowicki, J. Ziglar, H. Bae, T. Brown, D. Demitrish, B. Litkouhi, J. Nickolaou, V. Sadekar, W. Zhang, J. Struble, M. Taylor, M. Darms, and D. Ferguson. Autonomous driving in urban environments: Boss and the urban challenge. Journal of Field Robotics, 25(8):425–466, 2008.

L.G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
V.N. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16:264–280, 1971.
S. Vijayanarasimhan and K. Grauman. Multi-level active prediction of useful image annotations for recognition. In: Advances in Neural Information Processing Systems (NIPS), volume 21. MIT Press, 2009.
A. Vlachos. Evaluating and combining biomedical named entity recognition systems. In: BioNLP 2007: Biological, translational, and clinical language processing, pages 199–206, 2007.
L. von Ahn and L. Dabbish. General techniques for designing games with a purpose. Communications of the ACM, 51(8):58–67, 2008.

D.G. Wang, J.B. Fan, C.J. Siao, A. Berno, P. Young, R. Sapolsky, G. Ghandour, N. Perkins, E. Winchester, J. Spencer, L. Kruglyak, L. Stein, L. Hsie, T. Topaloglou, E. Hubbell, E. Robinson, M. Mittmann, M.S. Morris, N. Shen, D. Kilburn, J. Rioux, C. Nusbaum, S. Rozen, T.J. Hudson, R. Lipshutz, M. Chee, and E.S. Lander. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science, 280(5366):1077–1082, 1998.
Z. Xu, R. Akella, and Y. Zhang. Incorporating diversity and density in active learning for relevance feedback. In: Proceedings of the European Conference on IR Research (ECIR), pages 246–257. Springer-Verlag, 2007.
R. Yan, J. Yang, and A. Hauptmann. Automatically labeling video data using multi-class active learning. In: Proceedings of the International Conference on Computer Vision, pages 516–523. IEEE Press, 2003.
Z. Yang, H. Lin, Y. Li, B. Liu, and Y. Lu. TREC 2005 genomics track experiments at DUTAI. In: Proceedings of the Text Retrieval Conference (TREC), 2006.
D. Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In: Proceedings of the Association for Computational Linguistics (ACL), pages 189–196. ACL Press, 1995.

A. Yeh, A. Morgan, M.E. Colosimo, and L. Hirschman. Biocreative task 1a: gene mention finding evaluation. BMC Bioinformatics, 6(Suppl 1):S2, 2005.
H. Yu. SVM selective sampling for ranking with application to data retrieval. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD), pages 354–363. ACM Press, 2005.
L. Yu, S.T. Ahmed, G. Gonzalez, B. Logsdon, M. Nakamura, S. Nikkila, K. Shah, L. Tari, R. Wendt, A. Ziegler, and C. Baral. Genomic information retrieval through selective extraction and tagging by the ASU-BoiAI group. In: Proceedings of the Text Retrieval Conference (TREC), 2006.
C. Zhang and T. Chen. An active learning framework for content based information retrieval. IEEE Transactions on Multimedia, 4(2):260–268, 2002.
T. Zhang and F.J. Oles. A probability analysis on the value of unlabeled data for classification problems. In: Proceedings of the International Conference on Machine Learning (ICML), pages 1191–1198. Morgan Kaufmann, 2000.
Z. Zheng and B. Padmanabhan. On active learning for data acquisition. In: Proceedings of the IEEE Conference on Data Mining (ICDM), pages 562–569. IEEE Press, 2002.

G. Zhou and J. Su. Exploring deep knowledge resources in biomedical name recognition. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA), pages 96–99, 2004.
G.D. Zhou, J. Zhang, J. Su, D. Shen, and C.L. Tan. Recognizing names in biomedical texts: A machine learning approach. Bioinformatics, 20(7):1178–1190, 2004a.

Z.H. Zhou, K.J. Chen, and Y. Jiang. Exploiting unlabeled data in content-based image retrieval. In: Proceedings of the European Conference on Machine Learning (ECML), pages 425–435. Springer, 2004b.
X. Zhu. Semi-Supervised Learning with Graphs. PhD thesis, Carnegie Mellon University, 2005a.
X. Zhu. Semi-supervised learning literature survey. Computer Sciences Technical Report 1530, University of Wisconsin–Madison, 2005b.
X. Zhu, J.D. Lafferty, and Z. Ghahramani. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the ICML Workshop on the Continuum from Labeled to Unlabeled Data, pages 58–65, 2003.


	Author	Title	Year
2008 CuriousMachinesActiveLearningwi	Burr Settles	Curious Machines: Active Learning with Structured Instances	2008