2021 PTuningV2PromptTuningCanBeCompa

From GM-RKB

Subject Headings: LLM Prompt Tuning.

Notes

  • It was followed by this paper: (Liu, Ji et al., 2022).
  • It provides insights into Natural Language Understanding (NLU) and Language Model Tuning, focusing on Prompt Tuning, a method that tunes continuous prompts while keeping the Language Model frozen, reducing storage and memory usage.
  • It addresses the limitations of previous Prompt Tuning methods, which underperformed for normal-sized Pretrained Models and struggled with hard sequence labeling tasks, such as Extractive Question Answering and Named Entity Recognition (NER).
  • It introduces P-Tuning v2, an optimized form of Deep Prompt Tuning adapted for NLU, demonstrating effectiveness across various model scales and NLU tasks.
  • It highlights that P-Tuning v2 matches the performance of Fine-Tuning while requiring significantly fewer tuned parameters, offering an efficient alternative in terms of parameter tuning and resource utilization.
  • It emphasizes the universal effectiveness of P-Tuning v2 across different model scales and NLU tasks, improving upon previous prompt tuning methods, particularly for smaller models and hard sequence labeling tasks.
  • It details critical aspects of optimization and implementation, including the use of continuous prompts in every layer of the Pretrained Model, varying prompt lengths for different tasks, and applying a classification head for sequence labeling tasks (see the first sketch following this list).
  • It presents experimental results showing that P-Tuning v2 matches or surpasses Fine-Tuning performance across different models (from 330M to 10B parameters) and on various NLU tasks, including challenging sequence tagging tasks.
  • It posits P-Tuning v2 as a strong alternative to Fine-Tuning and a baseline for future research in NLU and Language Model Tuning.
  • In the context of sequence tagging tasks, it explores:
    • Named Entity Recognition (NER): Using datasets such as CoNLL03, OntoNotes 5.0, and CoNLL04 with their standard train-development-test splits, the model labels entities within a text in IOB2 format.
    • Extractive Question Answering: Using SQuAD versions 1.1 and 2.0, the task involves classifying tokens in a context given a question to extract the answer, with labels such as ‘start’ or ‘end’ assigned to each token (see the second sketch following this list).
    • Semantic Role Labeling (SRL): Evaluated on the CoNLL05 and CoNLL12 datasets, this involves assigning semantic roles to words or phrases in a sentence, with the target verb token appended to the end of each sentence so that the verb whose arguments are to be labeled can be recognized.
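
The following is a minimal PyTorch sketch, not the authors' released implementation, of the deep prompt tuning idea described above: trainable key/value prefixes (the continuous prompts) are prepended to the attention of every layer of a frozen backbone, and only the prefix parameters plus a small per-token classification head are trained. Module names such as PrefixEncoder, the toy backbone, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of deep prompt tuning (P-Tuning v2 style); illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PrefixEncoder(nn.Module):
    """Produces one trainable (key, value) prefix per layer: the continuous prompts."""

    def __init__(self, num_layers, num_heads, head_dim, prefix_len):
        super().__init__()
        self.num_layers, self.num_heads = num_layers, num_heads
        self.head_dim, self.prefix_len = head_dim, prefix_len
        self.embedding = nn.Embedding(prefix_len, num_heads * head_dim)
        # Project each prefix position to a key and a value for every layer.
        self.proj = nn.Linear(num_heads * head_dim,
                              num_layers * 2 * num_heads * head_dim)

    def forward(self, batch_size):
        idx = torch.arange(self.prefix_len, device=self.proj.weight.device)
        prefix = self.proj(self.embedding(idx))                    # (P, L*2*H*D)
        prefix = prefix.view(self.prefix_len, self.num_layers, 2,
                             self.num_heads, self.head_dim)
        prefix = prefix.permute(1, 2, 3, 0, 4)                     # (L, 2, H, P, D)
        return [(layer[0].unsqueeze(0).expand(batch_size, -1, -1, -1),
                 layer[1].unsqueeze(0).expand(batch_size, -1, -1, -1))
                for layer in prefix]


class SelfAttentionLayer(nn.Module):
    """Stand-in for one frozen backbone layer; prefix K/V are concatenated in."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.q, self.k, self.v, self.o = (nn.Linear(dim, dim) for _ in range(4))

    def forward(self, x, prefix_kv):
        b, t, _ = x.shape
        split = lambda z: z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        pk, pv = prefix_kv
        k = torch.cat([pk, k], dim=2)          # prepend prompt keys
        v = torch.cat([pv, v], dim=2)          # prepend prompt values
        out = F.scaled_dot_product_attention(q, k, v)
        return self.o(out.transpose(1, 2).reshape(b, t, -1))


# Toy setup: freeze the backbone, train only the prefixes and a token-level
# classification head (e.g., IOB2 NER tags), mirroring the sequence-labeling use.
dim, heads, layers, prefix_len, num_tags = 64, 4, 2, 8, 9
backbone = nn.ModuleList(SelfAttentionLayer(dim, heads) for _ in range(layers))
for p in backbone.parameters():
    p.requires_grad = False                    # the language model stays frozen

prefix_encoder = PrefixEncoder(layers, heads, dim // heads, prefix_len)
head = nn.Linear(dim, num_tags)                # per-token classification head

x = torch.randn(2, 10, dim)                    # stand-in for token embeddings
for layer, kv in zip(backbone, prefix_encoder(batch_size=2)):
    x = layer(x, kv)
logits = head(x)                               # (batch, seq_len, num_tags)
print(logits.shape)                            # torch.Size([2, 10, 9])
```

In the actual method the frozen backbone is a full Pretrained Model rather than the toy attention stack used here, and the prompt length is tuned per task as noted above.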
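A second small sketch, also illustrative rather than taken from the paper, shows the extractive question answering formulation mentioned above for SQuAD: each context token receives a score for a hypothetical label set {O, start, end}, and the answer span is read off from the predicted start and end positions.

```python
# Illustrative decoding of an answer span from per-token label scores; the
# three-way label set [O, start, end] and the greedy decoding are assumptions
# for this sketch, not the paper's exact recipe.
import torch


def extract_span(token_logits: torch.Tensor, tokens: list) -> str:
    """token_logits: (seq_len, 3) scores over the labels [O, start, end]."""
    start = int(token_logits[:, 1].argmax())             # most likely start token
    end = start + int(token_logits[start:, 2].argmax())  # end must not precede start
    return " ".join(tokens[start:end + 1])


tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
token_logits = torch.randn(len(tokens), 3)   # stand-in for model outputs
print(extract_span(token_logits, tokens))
```

In practice the per-token scores would come from a classification head on top of the same frozen backbone with deep prompts shown in the first sketch, with the question and context concatenated as input.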

Cited By

Quotes

Abstract

Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pretrained models. We also find that existing methods of prompt tuning cannot handle hard sequence labeling tasks, indicating a lack of universality. We present a novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks. It matches the performance of finetuning while having only 0.1%-3% tuned parameters. Our method P-Tuning v2 is an implementation of Deep Prompt Tuning (Li and Liang, 2021; Qin and Eisner, 2021) optimized and adapted for NLU. Given the universality and simplicity of P-Tuning v2, we believe it can serve as an alternative to finetuning and a strong baseline for future research. Our code and data are released at this https URL.

5 Conclusions

We present P-tuning v2, a prompt tuning method. Despite its relatively limited technical novelty, it contributes to a novel finding that prompt tuning can be comparable to fine-tuning universally across scales (from 330M to 10B parameters) and tasks. With high accuracy and parameter efficiency, P-Tuning v2 can be a potential alternative for fine-tuning and a strong baseline for future work.


References

  • Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
  • Xavier Carreras and Lluís Màrquez. 2004. Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004, pages 89–97, Boston, Massachusetts, USA. Association for Computational Linguistics.
  • Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pages 152–164, Ann Arbor, Michigan. Association for Computational Linguistics.
  • Xiang Chen, Xin Xie, Ningyu Zhang, Jiahuan Yan, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, and Huajun Chen. 2021. Adaprompt: Adaptive prompt-based finetuning for relation extraction. arXiv preprint arXiv:2104.07650.
  • Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2021. All nlp tasks are generation tasks: A general pretraining framework. arXiv preprint arXiv:2103.10360.
  • Tianyu Gao, Adam Fisch, and Danqi Chen. 2020. Making pre-trained language models better few-shot learners. arXiv preprint arXiv:2012.15723.
  • Yuxian Gu, Xu Han, Zhiyuan Liu, and Minlie Huang. 2021. Ppt: Pre-trained prompt tuning for few-shot learning. arXiv preprint arXiv:2109.04332.
  • Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2020. Deberta: Decoding-enhanced bert with disentangled attention.

  • Boseop Kim, HyoungSeok Kim, Sang-Woo Lee, Gichang Lee, Donghyun Kwak, Dong Hyeon Jeon, Sunghyun Park, Sungju Kim, Seonhoon Kim, Dongpil Seo, et al. 2021. What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers. arXiv preprint arXiv:2109.04650.
  • Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691.
  • Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation. arXiv preprint arXiv:2101.00190.
  • Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2021. Gpt understands, too. arXiv preprint arXiv:2103.10385.
  • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv e-prints.
  • Sewon Min, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2021. Noisy channel language model prompting for few-shot text classification. arXiv preprint arXiv:2108.04106.
  • Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. 2012. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes. In Joint Conference on EMNLP and CoNLL - Shared Task, pages 1–40, Jeju Island, Korea. Association for Computational Linguistics.
  • Guanghui Qin and Jason Eisner. 2021. Learning how to ask: Querying lms with mixtures of soft prompts. arXiv preprint arXiv:2104.06599.
  • Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  • Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2019. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.
  • Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. Squad: 100,000+ questions for machine comprehension of text.
  • Erik F Sang and Fien De Meulder. 2003. Introduction to the conll-2003 shared task: Language-independent named entity recognition. arXiv preprint cs/0306050.
  • Timo Schick and Hinrich Schütze. 2020. It’s not just size that matters: Small language models are also few-shot learners. arXiv preprint arXiv:2009.07118.
  • Taylor Shin, Yasaman Razeghi, Robert L Logan IV, Eric Wallace, and Sameer Singh. 2020. Autoprompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:2010.15980.
  • Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In NeurIPS 2019, pages 3261–3275.
  • Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv e-prints.
  • Hongru Wang, Mingyu Cui, Zimo Zhou, Gabriel Pui Cheong Fung, and Kam-Fai Wong. 2021a. Topicrefine: Joint topic prediction and dialogue response generation for multi-turn end-to-end dialogue system. arXiv preprint arXiv:2109.05187.
  • Shuo Wang, Zhaopeng Tu, Zhixing Tan, Wenxuan Wang, Maosong Sun, and Yang Liu. 2021b. Language models are good translators. arXiv preprint arXiv:2106.13627.
  • Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.
  • Lu Xu, Zhanming Jie, Wei Lu, and Lidong Bing. 2021. Better feature integration for named entity recognition. arXiv preprint arXiv:2104.05316.
  • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.
  • Yanan Zheng, Jing Zhou, Yujie Qian, Ming Ding, Jian Li, Ruslan Salakhutdinov, Jie Tang, Sebastian Ruder, and Zhilin Yang. 2021. Fewnlu: Benchmarking state-of-the-art methods for few-shot natural language understanding. arXiv preprint arXiv:2109.12742.
  • Zexuan Zhong, Dan Friedman, and Danqi Chen. 2021. Factual probing is [mask]: Learning vs. learning to recall. arXiv preprint arXiv:2104.05240.


  • Author(s): Jie Tang, Zhilin Yang, Xiao Liu, Kaixuan Ji, Yicheng Fu, Zhengxiao Du, Weng Lam Tam
  • Title: P-tuning V2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
  • DOI: 10.48550/arXiv.2110.07602
  • Year: 2021