Transfer Learning Algorithm
A [[Transfer Learning Algorithm]] is a [[learning algorithm]] that is first trained on one [[learning dataset]] (the [[source domain]]) and is then applied to another, related [[learning dataset]] (the [[target domain]]).
* <B>AKA:</B> [[Domain Adaptation Method]], [[Cross-Domain Learning Algorithm]], [[Knowledge Transfer Algorithm]].
* <B>Context:</B>
** [[Task Input]]: [[Source Domain Data]], [[Target Domain Data]]
*** [[Optional Input]]: [[Domain Knowledge]], [[Transfer Constraint]]s
** [[Task Output]]: [[Adapted Model]], [[Transferred Knowledge]]
** [[Task Performance Measure]]: [[Transfer Efficiency]], [[Domain Adaptation Accuracy]], [[Knowledge Retention Rate]]
** ...
** It can enable [[Knowledge Transfer]] through [[feature alignment]] between [[source domain]] and [[target domain]].
** It can facilitate [[Model Adaptation]] by managing [[distribution shift]]s between [[training data]] and [[test data]].
** It can support [[Efficient Learning]] by leveraging [[pre-existing knowledge]] from [[related task]]s (a [[model fine-tuning|fine-tuning]] sketch follows this context list).
** It can manage [[Domain Gap]] using [[adaptation strategy|adaptation strategies]] and [[domain alignment]].
** It can optimize [[Resource Usage]] by reducing required [[target domain data]].
** ...
** It can often utilize [[Feature Representation]] for [[cross-domain learning]].
** It can often implement [[Progressive Adaptation]] through [[iterative refinement]].
** It can often employ [[Distribution Matching]] to reduce [[domain discrepancy]].
** ...
** It can range from being a [[Transductive Transfer Learning Algorithm]] to being an [[Inductive Transfer Learning Algorithm]], depending on its [[transfer type]].
** It can range from being an [[Unsupervised Domain Adaptable Learning Algorithm]] to being a [[Supervised Domain Adaptable Learning Algorithm]], based on [[target data label]] availability.
** It can range from being a [[Zero-Shot Transfer Algorithm]] to being a [[Few-Shot Transfer Algorithm]], depending on its [[target data requirement]]s.
** It can range from being a [[Single-Task Transfer Algorithm]] to being a [[Multi-Task Transfer Algorithm]], based on its [[task scope]].
** It can range from being a [[Shallow Transfer Algorithm]] to being a [[Deep Transfer Algorithm]], depending on its [[network depth]] and [[layer adaptation]].
** It can range from being a [[Source-Free Transfer Algorithm]] to being a [[Source-Dependent Transfer Algorithm]], based on its [[source data requirement]]s.
** It can range from being a [[Static Transfer Algorithm]] to being an [[Adaptive Transfer Algorithm]], depending on its [[adaptation dynamics]].
** It can range from being a [[Homogeneous Domain Transfer Algorithm]] to being a [[Heterogeneous Domain Transfer Algorithm]], based on its [[feature space compatibility]].
** ...
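In practice, the most common realization of the capabilities above is [[model fine-tuning]] of a [[pre-trained model]]. The following is a minimal, illustrative [[PyTorch]] sketch of feature-extraction transfer: the ImageNet-pretrained backbone stands in for [[Source Domain Data|source-domain knowledge]], and `num_target_classes` and the dummy batch are assumptions made only for this example.

```python
import torch
import torch.nn as nn
import torchvision

num_target_classes = 10  # illustrative assumption about the target task

# Source-domain knowledge: a ResNet-18 backbone pre-trained on ImageNet.
model = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.DEFAULT
)

# Freeze the transferred feature layers; unfreezing some of them instead
# would move this sketch from feature-extraction transfer toward full
# fine-tuning.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with a fresh, trainable layer for the
# target task.
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Only the new head is optimized on (typically scarce) target-domain data.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a dummy target-domain batch.
inputs = torch.randn(8, 3, 224, 224)                 # stand-in images
labels = torch.randint(0, num_target_classes, (8,))  # stand-in labels
optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()
```

Because only the head is trained, the sketch needs far less [[target domain data]] than training from scratch, which is the [[Resource Usage]] benefit noted above.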
* <B>Examples:</B>
** [[Model Distillation Method]].
** [[Learning Approach]] implementations, such as:
*** [[Deep Transfer Method]]s, such as:
**** [[Deep Learning Model Fine-Tuning Algorithm]] for [[model adaptation]].
**** [[Feature Extraction Transfer]] for [[representation learning]].
**** [[Layer-wise Transfer]] for [[selective knowledge transfer]].
**** [[Progressive Neural Network]] for [[knowledge expansion]].
*** [[Domain Adaptation Technique]]s, such as:
**** [[Adversarial Domain Adaptation]] for [[distribution alignment]].
**** [[Structural Correspondence Learning]] for [[feature mapping]].
**** [[Maximum Mean Discrepancy]] for [[distribution matching]] (a sketch follows this example list).
**** [[Optimal Transport Adaptation]] for [[probability alignment]].
*** [[Sim2Real Transfer Algorithm]]s, such as:
**** [[Domain Randomization Transfer]] for [[simulation robustness]].
**** [[Cycle-Consistent Adaptation]] for [[visual domain bridging]].
**** [[System Identification Transfer]] for [[dynamics matching]].
**** [[Progressive Sim2Real Transfer]] for [[gradual reality adaptation]].
** [[Application-Specific Transfer]]s, such as:
*** [[NLP Transfer Learning Algorithm]]s, such as:
**** [[BERT Fine-Tuning Algorithm]] for [[language understanding]].
**** [[Cross-Lingual Transfer]] for [[multilingual adaptation]].
**** [[Domain-Specific Language Model Transfer]] for [[specialized text]].
*** [[Computer Vision Transfer]]s, such as:
**** [[ImageNet Pre-Training Transfer]] for [[visual recognition]].
**** [[Style Transfer Algorithm]] for [[image adaptation]].
**** [[Cross-Domain Object Detection]] for [[vision task]]s.
*** [[Robotics Transfer Learning]] methods, such as:
**** [[Policy Transfer Algorithm]] for [[control adaptation]].
**** [[Skill Transfer Method]] for [[task generalization]].
**** [[Multi-Robot Transfer]] for [[platform adaptation]].
** [[Transfer Strategy]] types, such as:
*** [[Sequential Transfer Learning]] methods, such as:
**** [[Curriculum Transfer]] for [[progressive learning]].
**** [[Lifelong Learning Transfer]] for [[continuous adaptation]].
*** [[Multi-Task Transfer Learning]] approaches, such as:
**** [[Shared Parameter Transfer]] for [[common feature learning]].
**** [[Task-Specific Adaptation]] for [[specialized transfer]].
*** [[Cross-Modal Transfer Learning]] techniques, such as:
**** [[Vision-Language Transfer]] for [[multimodal learning]].
**** [[Audio-Visual Transfer]] for [[sensory adaptation]].
** ...
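As a concrete illustration of the [[Maximum Mean Discrepancy]] example above, the following NumPy sketch computes a biased, RBF-kernel estimate of the squared MMD between source and target feature samples. The bandwidth `sigma` and the synthetic features are assumptions made only for this example; real [[Distribution Matching]] methods minimize such a term as a training loss to align the two domains.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    """Pairwise RBF kernel: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between two samples."""
    return (
        rbf_kernel(source, source, sigma).mean()
        + rbf_kernel(target, target, sigma).mean()
        - 2.0 * rbf_kernel(source, target, sigma).mean()
    )

# Synthetic features: the target domain is shifted relative to the source.
rng = np.random.default_rng(0)
source_feats = rng.normal(0.0, 1.0, size=(200, 16))
target_feats = rng.normal(0.5, 1.0, size=(200, 16))

print(mmd2(source_feats, target_feats))   # noticeably above zero: domain gap
print(mmd2(source_feats, source_feats))   # 0.0: identical samples, no gap
```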
* <B>Counter-Examples:</B>
** [[Single Domain Learning Algorithm]], which operates within one [[domain]] without [[transfer mechanism]]s.
** [[Scratch Training Algorithm]], which learns without leveraging [[pre-existing knowledge]].
** [[Independent Learning Method]], which doesn't utilize [[cross-domain knowledge]].
** [[Fixed Model Algorithm]], which lacks [[adaptation capability|adaptation capabilities]].
* <B>See:</B> [[Semi-Supervised Learning Algorithm]], [[Adversarial Domain Adaptation]], [[Multi-Task Learning]], [[Meta-Learning Algorithm]], [[Continual Learning Method]], [[Domain Generalization]].
----
----
== References ==
=== 2019 ===
* ([[Li et al., 2019]]) ⇒ [[Xiang Li]], [[Wei Zhang]], [[Qian Ding]], and [[Jian-Qiao Sun]]. ([[2019]]). “Multi-layer Domain Adaptation Method for Rolling Bearing Fault Diagnosis.” In: Signal Processing, 157.
** QUOTE: ... In the past years, [[data-driven approach]]es such as [[deep learning]] have been widely applied on [[machinery signal processing]] to develop intelligent [[fault diagnosis system]]s. In [[real-world application]]s, [[domain shift problem]] usually occurs where the [[distribution of the labeled training data]], denoted as source domain, is different from that of the [[unlabeled testing data]], known as [[target domain]]. That results in serious diagnosis performance degradation. [[This paper]] proposes a novel [[domain adaptation method]] for rolling bearing fault diagnosis based on deep learning techniques. ...
* https://towardsdatascience.com/transfer-learning-in-nlp-f5035cc3f62f
** QUOTE: ... Now we define [[taxonomy]] as per [[Pan and Yang (2010)]]. [[Pan and Yang (2010)|They]] segregate [[transfer learning]] mainly into [[transductive transfer learning|transductive]] and [[inductive transfer learning|inductive]]. It is further divided into [[domain adaption]], [[cross-lingual learning]], [[multi-task learning]] and [[sequential transfer learning]]. ...
=== 2018 ===
* ([[2018_UniversalLanguageModelFineTunin|Howard & Ruder, 2018]]) ⇒ [[Jeremy Howard]], and [[Sebastian Ruder]]. ([[2018]]). “[http://www.aclweb.org/anthology/P18-1031.pdf Universal Language Model Fine-tuning for Text Classification].” In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics ([[ACL-2018]]).
** QUOTE: ... [[Inductive transfer learning]] has had a large impact on [[computer vision (CV)]]. ... While [[Deep Learning model]]s have achieved [[state-of-the-art]] on many [[NLP task]]s, these [[model]]s are [[trained from scratch]], requiring [[large text dataset|large dataset]]s, and days to [[converge]]. [[Research in NLP]] focused mostly on [[transductive transfer]] ([[Blitzer et al., 2007]]). For [[inductive transfer]], [[model fine-tuning|fine-tuning]] [[pretrained word embedding]]s ([[Mikolov et al., 2013]]), a simple [[transfer technique]] that only targets a [[model’s first layer]], has had a large impact in [[applied NLP|practice]] and is used in most [[State-of-the-Art NLP Algorithm|state-of-the-art]] [[Deep NLP model|model]]s. ...
=== 2010 ===
* ([[Pan & Tang, 2010]]) ⇒ Sinno Jialin Pan, and [[Qiang Yang]]. ([[2010]]). “A Survey on Transfer Learning.” In: IEEE Trans. on Knowl. and Data Eng., 22(10). [http://dx.doi.org/10.1109/TKDE.2009.191 doi:10.1109/TKDE.2009.191] | |||
=== 2009 ===
* ([[2009_ExtractingDiscriminativeConcept|Chen et al., 2009]]) ⇒ Bo Chen, [[Wai Lam]], Ivor Tsang, and Tak-Lam Wong. ([[2009]]). “Extracting Discriminative Concepts for Domain Adaptation in Text Mining.” In: Proceedings of [[ACM SIGKDD]] Conference ([[KDD-2009]]). [http://dx.doi.org/10.1145/1557019.1557045 doi:10.1145/1557019.1557045]
** QUOTE: … Several domain adaptation methods have been proposed to learn a reasonable representation so as to make the distributions between the source domain and the target domain closer [3, 12, 13, 11].
=== 2008 ===
* ([[Pan et al., 2008]]) ⇒ S. J. Pan, J. T. Kwok, and Q. Yang. ([[2008]]). “Transfer Learning via Dimensionality Reduction.” In: Proceedings of the 23rd AAAI Conference on Artificial Intelligence.
=== 2007 ===
* ([[Daumé III, 2007]]) ⇒ [[Hal Daumé III]]. ([[2007]]). “[http://acl.ldc.upenn.edu/P/P07/P07-1033.pdf Frustratingly Easy Domain Adaptation].” In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics ([[ACL 2007]]). (Its [[feature augmentation]] mapping is sketched after the 2007 entries below.)
* ([[Raina et al., 2007]]) ⇒ R. Raina, A. Battle, H. Lee, B. Packer, and [[A. Y. Ng]]. ([[2007]]). “Self-Taught Learning: Transfer learning from [[unlabeled data]].” In: Proceedings of the 24th Annual International Conference on Machine Learning (ICML 2007).
* ([[Satpal & Sarawagi, 2007]]) ⇒ S. Satpal and [[Sunita Sarawagi]]. ([[2007]]). “Domain Adaptation of Conditional Probability Models via Feature Subsetting.” In: Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases.
=== 2006 ===
* ([[Blitzer et al., 2006]]) ⇒ [[J. Blitzer]], R. McDonald, and [[Fernando Pereira]]. ([[2006]]). “[http://acl.ldc.upenn.edu/W/W06/W06-1615.pdf Domain Adaptation with Structural Correspondence Learning].” In: Proceedings of the Conference on Empirical Methods in Natural Language Processing ([[EMNLP 2006]]).
* ([[Daumé III & Marcu, 2006]]) ⇒ [[Hal Daumé, III]], and [[Daniel Marcu]]. ([[2006]]). “[https://www.aaai.org/Papers/JAIR/Vol26/JAIR-2603.pdf Domain Adaptation for Statistical Classifiers].” In: Journal of Artificial Intelligence Research, 26 (JAIR 26).
** QUOTE: The most basic assumption used in statistical learning theory is that [[training data]] and test data are drawn from the same underlying distribution. Unfortunately, in many applications, the “<i>in-domain</i>” test data is drawn from a distribution that is related, but not identical, to the “<i>out-of-domain</i>” distribution of the [[training data]]. [[We]] consider the common case in which labeled out-of-domain data is plentiful, but labeled in-domain data is scarce. [[We]] introduce a statistical formulation of [[this problem]] in terms of a simple mixture model and present an instantiation of this framework to maximum entropy classifiers and their linear chain counterparts. [[We]] present efficient inference algorithms for this special case based on the technique of [[conditional expectation maximization]]. [[Our experimental result]]s show that [[our approach]] leads to improved performance on three real world tasks on four different data sets from the natural language processing domain.
----
__NOTOC__
[[Category:Concept]]
[[Category:Machine Learning]]
[[Category:Robotics]]
[[Category:Transfer Learning]]
[[Category:Quality Silver]]