2021 PTuningV2PromptTuningCanBeCompa

From GM-RKB

Subject Headings: LLM Prompt Tuning.

Notes

  • It was followed by this paper: (Liu, Ji et al., 2022).
  • It provides insights into Natural Language Understanding (NLU) and Language Model Tuning, focusing on Prompt Tuning, a method that tunes continuous prompts while keeping the Language Model frozen, reducing storage and memory usage.
  • It addresses the limitations of previous Prompt Tuning methods, which underperformed for normal-sized Pretrained Models and struggled with hard sequence labeling tasks, such as Extractive Question Answering and Named Entity Recognition (NER).
  • It introduces P-Tuning v2, an optimized form of Deep Prompt Tuning adapted for NLU, demonstrating effectiveness across various model scales and NLU tasks.
  • It highlights that P-Tuning v2 matches the performance of Fine-Tuning while requiring significantly fewer tuned parameters, offering an efficient alternative in terms of parameter tuning and resource utilization.
  • It emphasizes the universal effectiveness of P-Tuning v2 across different model scales and NLU tasks, improving upon previous prompt tuning methods, particularly for smaller models and hard sequence labeling tasks.
  • It details critical aspects of optimization and implementation, including the use of continuous prompts in every layer of the Pretrained Model, varying prompt lengths for different tasks, and applying a classification head for sequence labeling tasks (see the first sketch following this list).
  • It presents experimental results showing that P-Tuning v2 matches or surpasses Fine-Tuning performance across different models (from 330M to 10B parameters) and on various NLU tasks, including challenging sequence tagging tasks.
  • It posits P-Tuning v2 as a strong alternative to Fine-Tuning and a baseline for future research in NLU and Language Model Tuning.
  • In the context of sequence tagging tasks, it explores:
    • Named Entity Recognition (NER): Using datasets such as CoNLL03, OntoNotes 5.0, and CoNLL04 with their standard train-development-test splits, the model labels entities within a text in IOB2 format.
    • Extractive Question Answering: Using SQuAD versions 1.1 and 2.0, the task involves classifying tokens in a context given a question to extract the answer, with labels such as ‘start’ or ‘end’ assigned to each token (see the second sketch following this list).
    • Semantic Role Labeling (SRL): Evaluated on the CoNLL05 and CoNLL12 datasets, this involves assigning semantic roles to words or phrases in a sentence, with the target verb token appended to the end of each sentence so that the verb whose arguments are to be labeled can be recognized.
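
The following is a minimal PyTorch sketch, not the authors' released implementation, of the deep prompt tuning idea described above: trainable key/value prefixes (the continuous prompts) are prepended to the attention of every layer of a frozen backbone, and only the prefix parameters plus a small per-token classification head are trained. Module names such as PrefixEncoder, the toy backbone, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of deep prompt tuning (P-Tuning v2 style); illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrefixEncoder(nn.Module):
    """Produces one trainable (key, value) prefix per layer: the continuous prompts."""

    def __init__(self, num_layers, num_heads, head_dim, prefix_len):
        super().__init__()
        self.num_layers, self.num_heads = num_layers, num_heads
        self.head_dim, self.prefix_len = head_dim, prefix_len
        self.embedding = nn.Embedding(prefix_len, num_heads * head_dim)
        # Project each prefix position to a key and a value for every layer.
        self.proj = nn.Linear(num_heads * head_dim,
                              num_layers * 2 * num_heads * head_dim)

    def forward(self, batch_size):
        idx = torch.arange(self.prefix_len, device=self.proj.weight.device)
        prefix = self.proj(self.embedding(idx))                    # (P, L*2*H*D)
        prefix = prefix.view(self.prefix_len, self.num_layers, 2,
                             self.num_heads, self.head_dim)
        prefix = prefix.permute(1, 2, 3, 0, 4)                     # (L, 2, H, P, D)
        return [(layer[0].unsqueeze(0).expand(batch_size, -1, -1, -1),
                 layer[1].unsqueeze(0).expand(batch_size, -1, -1, -1))
                for layer in prefix]


class SelfAttentionLayer(nn.Module):
    """Stand-in for one frozen backbone layer; prefix K/V are concatenated in."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.q, self.k, self.v, self.o = (nn.Linear(dim, dim) for _ in range(4))

    def forward(self, x, prefix_kv):
        b, t, _ = x.shape
        split = lambda z: z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        pk, pv = prefix_kv
        k = torch.cat([pk, k], dim=2)          # prepend prompt keys
        v = torch.cat([pv, v], dim=2)          # prepend prompt values
        out = F.scaled_dot_product_attention(q, k, v)
        return self.o(out.transpose(1, 2).reshape(b, t, -1))


# Toy setup: freeze the backbone, train only the prefixes and a token-level
# classification head (e.g., IOB2 NER tags), mirroring the sequence-labeling use.
dim, heads, layers, prefix_len, num_tags = 64, 4, 2, 8, 9
backbone = nn.ModuleList(SelfAttentionLayer(dim, heads) for _ in range(layers))
for p in backbone.parameters():
    p.requires_grad = False                    # the language model stays frozen

prefix_encoder = PrefixEncoder(layers, heads, dim // heads, prefix_len)
head = nn.Linear(dim, num_tags)                # per-token classification head

x = torch.randn(2, 10, dim)                    # stand-in for token embeddings
for layer, kv in zip(backbone, prefix_encoder(batch_size=2)):
    x = layer(x, kv)
logits = head(x)                               # (batch, seq_len, num_tags)
print(logits.shape)                            # torch.Size([2, 10, 9])
```

In the actual method the frozen backbone is a full Pretrained Model rather than the toy attention stack used here, and the prompt length is tuned per task as noted above.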
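A second small sketch, also illustrative rather than taken from the paper, shows the extractive question answering formulation mentioned above for SQuAD: each context token receives a score for a hypothetical label set {O, start, end}, and the answer span is read off from the predicted start and end positions.

```python
# Illustrative decoding of an answer span from per-token label scores; the
# three-way label set [O, start, end] and the greedy decoding are assumptions
# for this sketch, not the paper's exact recipe.
import torch


def extract_span(token_logits: torch.Tensor, tokens: list) -> str:
    """token_logits: (seq_len, 3) scores over the labels [O, start, end]."""
    start = int(token_logits[:, 1].argmax())             # most likely start token
    end = start + int(token_logits[start:, 2].argmax())  # end must not precede start
    return " ".join(tokens[start:end + 1])


tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
token_logits = torch.randn(len(tokens), 3)   # stand-in for model outputs
print(extract_span(token_logits, tokens))
```

In practice the per-token scores would come from a classification head on top of the same frozen backbone with deep prompts shown in the first sketch, with the question and context concatenated as input.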

Cited By

Quotes

Abstract

Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pretrained models. We also find that existing methods of prompt tuning cannot handle hard sequence labeling tasks, indicating a lack of universality. We present a novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks. It matches the performance of finetuning while having only 0.1%-3% tuned parameters. Our method P-Tuning v2 is an implementation of Deep Prompt Tuning (Li and Liang, 2021; Qin and Eisner, 2021) optimized and adapted for NLU. Given the universality and simplicity of P-Tuning v2, we believe it can serve as an alternative to finetuning and a strong baseline for future research. Our code and data are released at this https URL.

5 Conclusions

We present P-tuning v2, a prompt tuning method. Despite its relatively limited technical novelty, it contributes to a novel finding that prompt tuning can be comparable to fine-tuning universally across scales (from 330M to 10B parameters) and tasks. With high accuracy and parameter efficiency, P-Tuning v2 can be a potential alternative for fine-tuning and a strong baseline for future work.


References

  • Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  • Xavier Carreras and Lluís Màrquez. 2004. Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004, pages 89–97, Boston, Massachusetts, USA. Association for Computational Linguistics.
  • Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pages 152–164, Ann Arbor, Michigan. Association for Computational Linguistics.
  • Xiang Chen, Xin Xie, Ningyu Zhang, Jiahuan Yan, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, and Huajun Chen. 2021. Adaprompt: Adaptive prompt-based finetuning for relation extraction. arXiv preprint arXiv:2104.07650.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2021. All nlp tasks are generation tasks: A general pretraining framework. arXiv preprint arXiv:2103.10360.
  • Tianyu Gao, Adam Fisch, and Danqi Chen. 2020. Making pre-trained language models better few-shot learners. arXiv preprint arXiv:2012.15723.
  • Yuxian Gu, Xu Han, Zhiyuan Liu, and Minlie Huang. 2021. Ppt: Pre-trained prompt tuning for few-shot learning. arXiv preprint arXiv:2109.04332.
  • Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2020. Deberta: Decoding-enhanced bert with disentangled attention.

  • Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, et al. 2021. What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers. arXiv preprint arXiv:2109.04650.
  • Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691.
  • Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190.
  • Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2021. Gpt understands, too. arXiv preprint arXiv:2103.10385.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv e-prints.
  • Sewon Min, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2021. Noisy channel language model prompting for few-shot text classification. arXiv preprint arXiv:2108.04106.
  • Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. 2012. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Joint Conference on EMNLP and CoNLL - Shared Task, pages 1–40, Jeju Island, Korea. Association for Computational Linguistics.
  • Guanghui Qin and Jason Eisner. 2021. Learning how to ask: Querying lms with mixtures of soft prompts. arXiv preprint arXiv:2104.06599.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. Squad: 100,000+ questions for machine comprehension of text.
  • Erik F Sang and Fien De Meulder. 2003. Introduction to the conll-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050.
  • Timo Schick and Hinrich Schütze. 2020. It’s not just size that matters: Small language models are also few-shot learners. arXiv preprint arXiv:2009.07118.
  • Taylor Shin, Yasaman Razeghi, Robert L Logan IV, Eric Wallace, and Sameer Singh. 2020. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:2010.15980.
  • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In NeurIPS 2019, pages 3261–3275.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv e-prints.
  • Hongru Wang, Mingyu Cui, Zimo Zhou, Gabriel Pui Cheong Fung, and Kam-Fai Wong. 2021a. Topicrefine: Joint topic prediction and dialogue response generation for multi-turn end-to-end dialogue system. arXiv preprint arXiv:2109.05187.
  • Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, and Yang Liu. 2021b. Language models are good translators. arXiv preprint arXiv:2106.13627.
  • Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.
  • Lu Xu, Zhanming Jie, Wei Lu, and Lidong Bing. 2021. Better feature integration for named entity recognition. arXiv preprint arXiv:2104.05316.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
  • Yanan Zheng, Jing Zhou, Yujie Qian, Ming Ding, Jian Li, Ruslan Salakhutdinov, Jie Tang, Sebastian Ruder, and Zhilin Yang. 2021. Fewnlu: Benchmarking state-of-the-art methods for few-shot natural language understanding. arXiv preprint arXiv:2109.12742.
  • Zexuan Zhong, Dan Friedman, and Danqi Chen. 2021. Factual probing is [mask]: Learning vs. learning to recall. arXiv preprint arXiv:2104.05240.


  • Author(s): Jie Tang, Zhilin Yang, Xiao Liu, Kaixuan Ji, Yicheng Fu, Zhengxiao Du, Weng Lam Tam
  • Title: P-tuning V2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
  • DOI: 10.48550/arXiv.2110.07602
  • Year: 2021