Transfer Learning Algorithm
A [[Transfer Learning Algorithm]] is a [[learning algorithm]] that trains on one [[learning dataset]] prior to being applied to another [[learning dataset]] (a minimal code sketch of this pretrain-then-adapt pattern follows the context list below).
* <B>Context</B>:
** It can be implemented by a [[Transfer Learning System]] (to solve a [[domain adaptable learning task]]).
** It can range from being a [[Transductive Transfer Learning Algorithm]] to being an [[Inductive Transfer Learning Algorithm]].
** It can range from being an [[Unsupervised Domain Adaptable Learning Algorithm]] to being a [[Supervised Domain Adaptable Learning Algorithm]].
* <B>See:</B> [[Semi-Supervised Learning Algorithm]].
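The following is a minimal sketch of the pretrain-then-adapt pattern referenced in the definition above: a small from-scratch [[logistic regression]] learner is first trained on a source [[learning dataset]], and its weights are then used to warm-start training on a second, smaller [[learning dataset]]. The array names (<code>source_X</code>, <code>target_X</code>, etc.) and the random data are purely illustrative assumptions, not part of any cited method.
<syntaxhighlight lang="python">
import numpy as np

def train_logistic(X, y, w=None, lr=0.1, epochs=200):
    """Batch gradient descent for binary logistic regression.
    If w is given, training continues from those weights (the transfer step)."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))        # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)        # gradient step on the log-loss
    return w

# Hypothetical data: a large source task and a small target task.
rng = np.random.default_rng(0)
source_X, source_y = rng.normal(size=(1000, 5)), rng.integers(0, 2, 1000)
target_X, target_y = rng.normal(size=(20, 5)), rng.integers(0, 2, 20)

w_source = train_logistic(source_X, source_y)                      # train on the first dataset
w_target = train_logistic(target_X, target_y, w=w_source.copy())   # then adapt to the second
</syntaxhighlight>
Starting the second training run from <code>w_source</code> rather than from zeros is the entire "transfer" in this toy setting; everything else is ordinary supervised training.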
----
----
== References ==
=== 2018 ===
* ([[2018_UniversalLanguageModelFineTunin|Howard & Ruder, 2018]]) ⇒ [[Jeremy Howard]], and [[Sebastian Ruder]]. ([[2018]]). “[http://www.aclweb.org/anthology/P18-1031.pdf Universal Language Model Fine-tuning for Text Classification].” In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics ([[ACL-2018]]).
** QUOTE: ... [[Inductive transfer learning]] has had a large impact on [[computer vision (CV)]]. ... While [[Deep Learning model]]s have achieved [[state-of-the-art]] on many [[NLP task]]s, these [[model]]s are [[trained from scratch]], requiring [[large text dataset|large dataset]]s, and days to [[converge]]. [[Research in NLP]] focused mostly on [[transductive transfer]] ([[Blitzer et al., 2007]]). For [[inductive transfer]], [[model fine-tuning|fine-tuning]] [[pretrained word embedding]]s ([[Mikolov et al., 2013]]), a simple [[transfer technique]] that only targets a [[model’s first layer]], has had a large impact in [[applied NLP|practice]] and is used in most [[State-of-the-Art NLP Algorithm|state-of-the-art]] [[Deep NLP model|model]]s. ...
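As a concrete, deliberately simplified illustration of the embedding-fine-tuning baseline that the quote contrasts with full [[model fine-tuning]]: the sketch below updates only the first (embedding) layer of a tiny mean-pooling text classifier, while the downstream classifier weights stay fixed. The vocabulary, the "pretrained" matrix <code>E</code>, and the two toy documents are assumptions made up for the example; in practice <code>E</code> would be loaded from word2vec- or GloVe-style vectors.
<syntaxhighlight lang="python">
import numpy as np

# Stand-in for pretrained word embeddings (normally loaded from disk).
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "awful": 4}
E = np.random.default_rng(0).normal(scale=0.1, size=(len(vocab), 8))

w = np.random.default_rng(1).normal(size=8)   # classifier weights, kept frozen here

def fine_tune_embeddings(docs, labels, E, w, lr=0.5, epochs=50):
    """Update ONLY the embedding (first) layer with logistic-loss gradients."""
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            ids = [vocab[t] for t in tokens]
            x = E[ids].mean(axis=0)              # mean-pooled document vector
            p = 1.0 / (1.0 + np.exp(-w @ x))     # sigmoid prediction
            grad_x = (p - y) * w                 # dLoss/dx for the logistic loss
            # Scatter the gradient back onto the embedding rows (first layer only).
            np.subtract.at(E, ids, lr * grad_x / len(ids))
    return E

docs = [["the", "movie", "was", "great"], ["the", "movie", "was", "awful"]]
E = fine_tune_embeddings(docs, [1, 0], E, w)
</syntaxhighlight>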
=== 2010 ===
* ([[Pan & Yang, 2010]]) ⇒ Sinno Jialin Pan, and [[Qiang Yang]]. ([[2010]]). “A Survey on Transfer Learning." In: IEEE Trans. on Knowl. and Data Eng., 22(10). [http://dx.doi.org/10.1109/TKDE.2009.191 doi:10.1109/TKDE.2009.191]
=== 2009 ===
* ([[2009_ExtractingDiscriminativeConcept|Chen et al., 2009]]) ⇒ Bo Chen, [[Wai Lam]], Ivor Tsang, and Tak-Lam Wong. ([[2009]]). “Extracting Discriminative Concepts for Domain Adaptation in Text Mining." In: Proceedings of [[ACM SIGKDD]] Conference ([[KDD-2009]]). [http://dx.doi.org/10.1145/1557019.1557045 doi:10.1145/1557019.1557045]
** QUOTE: … Several domain adaptation methods have been proposed to learn a reasonable representation so as to make the distributions between the source domain and the target domain closer [3, 12, 13, 11].
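A toy illustration of that distribution-matching idea (not the method of the cited paper): measure how far apart the source-domain and target-domain feature distributions are with a first-moment, linear-MMD-style statistic, apply the simplest possible "adaptation" (re-centering each domain), and observe the statistic drop to near zero. All names and data below are illustrative assumptions.
<syntaxhighlight lang="python">
import numpy as np

def mean_discrepancy(A, B):
    """Squared distance between the domain means: the first-moment part of
    maximum mean discrepancy, a common measure of domain mismatch."""
    return float(np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2))

rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, size=(500, 10))   # hypothetical source-domain features
target = rng.normal(loc=1.5, size=(300, 10))   # hypothetical target-domain features (shifted)

print("before adaptation:", mean_discrepancy(source, target))

# Deliberately simple "representation learning": re-center each domain so the means coincide.
source_adapted = source - source.mean(axis=0)
target_adapted = target - target.mean(axis=0)

print("after adaptation: ", mean_discrepancy(source_adapted, target_adapted))  # ~0
</syntaxhighlight>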
=== 2008 ===
* ([[Pan et al., 2008]]) ⇒ S. J. Pan, J. T. Kwok, and Q. Yang. ([[2008]]). “Transfer Learning via Dimensionality Reduction." In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence.
=== 2007 ===
* ([[Daumé III, 2007]]) ⇒ [[Hal Daumé III]]. ([[2007]]). “[http://acl.ldc.upenn.edu/P/P07/P07-1033.pdf Frustratingly Easy Domain Adaptation]." In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics ([[ACL 2007]]).
* ([[Raina et al., 2007]]) ⇒ R. Raina, A. Battle, H. Lee, B. Packer, and [[A. Y. Ng]]. ([[2007]]). “Self-Taught Learning: Transfer learning from [[unlabeled data]]." In: Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007).
* ([[Satpal & Sarawagi, 2007]]) ⇒ S. Satpal, and [[Sunita Sarawagi]]. ([[2007]]). “Domain Adaptation of Conditional Probability Models via Feature Subsetting." In: Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases.
=== 2006 ===
* ([[Blitzer et al., 2006]]) ⇒ [[J. Blitzer]], R. McDonald, and [[Fernando Pereira]]. ([[2006]]). “[http://acl.ldc.upenn.edu/W/W06/W06-1615.pdf Domain Adaptation with Structural Correspondence Learning]." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing ([[EMNLP 2006]]).
* ([[Daumé III & Marcu, 2006]]) ⇒ [[Hal Daumé III]], and [[Daniel Marcu]]. ([[2006]]). “[https://www.aaai.org/Papers/JAIR/Vol26/JAIR-2603.pdf Domain Adaptation for Statistical Classifiers]." In: Journal of Artificial Intelligence Research, 26 (JAIR 26).
** QUOTE: The most basic assumption used in statistical learning theory is that [[training data]] and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the "<i>in-domain</i>" test data is drawn from a distribution that is related, but not identical, to the "<i>out-of-domain</i>" distribution of the [[training data]]. [[We]] consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. [[We]] introduce a statistical formulation of [[this problem]] in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. [[We]] present efficient inference algorithms for this special case based on the technique of [[conditional expectation maximization]]. [[Our experimental result]]s show that [[our approach]] leads to improved performance on three real world tasks on four different data sets from the natural language processing domain.
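The sketch below only illustrates the data regime described in this quote (plentiful labeled out-of-domain data, scarce labeled in-domain data) using the simplest pooling baseline: a single maximum-entropy (logistic-regression) classifier trained on the union of both labeled sets, with the scarce in-domain examples up-weighted. It is not the mixture-model / conditional-EM method of the paper, and the 10x weight is an arbitrary assumption.
<syntaxhighlight lang="python">
import numpy as np

def train_weighted_maxent(X, y, sample_weight, lr=0.1, epochs=300):
    """Binary maximum-entropy (logistic regression) classifier trained with
    per-example weights via batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (sample_weight * (p - y)) / sample_weight.sum()
    return w

rng = np.random.default_rng(0)
out_X, out_y = rng.normal(size=(2000, 6)), rng.integers(0, 2, 2000)  # plentiful out-of-domain labels
in_X, in_y = rng.normal(size=(30, 6)), rng.integers(0, 2, 30)        # scarce in-domain labels

X = np.vstack([out_X, in_X])
y = np.concatenate([out_y, in_y])
weights = np.concatenate([np.ones(len(out_y)),          # out-of-domain examples count once
                          np.full(len(in_y), 10.0)])    # in-domain examples count ten times

w = train_weighted_maxent(X, y, weights)
</syntaxhighlight>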
----
__NOTOC__
[[Category:Concept]]