Low-Rank Adaptation (LoRA) LLM Fine-Tuning Algorithm


A Low-Rank Adaptation (LoRA) LLM Fine-Tuning Algorithm is an adapter-based LLM fine-tuning algorithm that represents the update to an LLM weight matrix as the product of two low-rank matrices.

  • Context:
    • It can (typically) leverage low-rank matrix factorization, which significantly reduces the number of trainable parameters.
    • It can (often) introduce two new small matrices (\(W_A\) and \(W_B\)) during training, keep the original weights frozen, and compute the final output as the original output plus the input multiplied by the low-rank product \(W_B W_A\), scaled by a constant factor (see the code sketch after this list).
    • It can (typically) efficiently adapt large models to new tasks without retraining the entire model.
    • It can have a pattern of: \(h = W_0 x + \frac{\alpha}{r} W_B W_A x\), where \(W_0\) is the frozen pretrained weight matrix, \(W_A \in \mathbb{R}^{r \times k}\) and \(W_B \in \mathbb{R}^{d \times r}\) are the trainable low-rank matrices, \(r \ll \min(d, k)\) is the rank, and \(\alpha\) is a scaling constant.
    • It can be applied not only to language models but also to image models for style or character specialization.
    • ...
  • Example(s):
    • Fine-tuning a GPT-3 model for a specific domain using LoRA, which requires significantly fewer computational resources than full fine-tuning.
    • Adapting a Stable Diffusion model to generate images in a specific style using LoRA.
  • Counter-Example(s):
    • Full fine-tuning, where all parameters of a model are adjusted.
    • Prefix tuning and Adapter modules, which are alternative parameter-efficient fine-tuning methods.
  • See: LLM Fine-Tuning, Fine-Tuning (Deep Learning), Parameter-Efficient Fine-Tuning.
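
The adaptation pattern above can be made concrete with a minimal sketch. The following PyTorch-style layer is an illustrative implementation, not the reference code of the LoRA paper or of any particular library; the class name, initialization scale, and default rank and alpha values are assumptions chosen for readability:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer augmented with a trainable low-rank update."""

        def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base_linear
            self.base.weight.requires_grad_(False)          # original weights stay fixed
            if self.base.bias is not None:
                self.base.bias.requires_grad_(False)
            in_features = base_linear.in_features
            out_features = base_linear.out_features
            # W_A projects the input down to rank r; W_B projects back up.
            self.W_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
            self.W_B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at step 0
            self.scaling = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Original output plus the scaled low-rank adaptation W_B W_A x.
            return self.base(x) + self.scaling * (x @ self.W_A.T @ self.W_B.T)

    # Example: wrap a 768x768 projection so only ~12K LoRA parameters are trainable.
    layer = LoRALinear(nn.Linear(768, 768), rank=8, alpha=16.0)

Because the base weights are frozen, only \(W_A\) and \(W_B\) receive gradients during fine-tuning, which is the source of the parameter savings described above.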


References

2023

  • (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)#Low-rank_adaption Retrieved:2023-10-4.
    • Low-rank adaption (LoRA) is an adapter-based technique for efficiently finetuning models. The basic idea is to design a low-rank matrix that is then added to the original matrix. An "adapter" in this context is a collection of low-rank matrices, which when added to a base model, produces a finetuned model. It allows for performance that approaches full-model fine-tuning with less space requirement. A language model with billions of parameters may be LoRA fine-tuned with only several millions of parameters. LoRA-based fine-tuning has become popular in the Stable Diffusion community. Support for LoRA is being integrated into the Diffusers library from Hugging Face. Support for LoRA and similar techniques is also available for a wide range of other models through Hugging Face's Parameter-Efficient Fine-Tuning (PEFT) package.
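
As the passage notes, LoRA support is available through Hugging Face's PEFT package. A minimal usage sketch follows; the model name, rank, alpha, and target module names are illustrative assumptions (which modules to target depends on the architecture being adapted):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Load a pretrained base model (model name is a placeholder).
    base_model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Configure LoRA: rank, scaling factor, dropout, and which modules to adapt.
    config = LoraConfig(
        r=8,                        # rank of the low-rank update
        lora_alpha=16,              # scaling factor
        lora_dropout=0.05,
        target_modules=["c_attn"],  # GPT-2's fused attention projection (assumed)
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base_model, config)
    model.print_trainable_parameters()  # only the LoRA matrices are trainable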

2023

  • (GBard, 2023)
    • LoRA Fine-Tuning, or Low-Rank Adaptation of Large Language Models, is a technique for fine-tuning large language models (LLMs) in a more efficient and parameter-efficient way than traditional fine-tuning.

      Traditional fine-tuning involves optimizing all of the parameters in the LLM. This can be computationally expensive and time-consuming, especially for very large LLMs like GPT-3 and LaMDA.

      LoRA Fine-Tuning works by approximating the LLM's weight matrix with two smaller matrices. These two matrices are then fine-tuned instead of the full weight matrix. This results in a significant reduction in the number of parameters that need to be optimized, which can lead to faster training and lower computational costs.

      Another advantage of LoRA Fine-Tuning is that it can be used to target specific modules in the LLM architecture. This is useful for tasks where it is only necessary to adapt certain parts of the model, such as the attention layer or the decoder.

      LoRA Fine-Tuning has been shown to achieve similar or better performance than traditional fine-tuning on a variety of tasks, including text summarization, question answering, and code generation.

    • Here is a high-level overview of the LoRA Fine-Tuning process:
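
To make the parameter reduction described in this entry concrete, here is a rough back-of-the-envelope calculation (the matrix dimensions and rank are illustrative assumptions, not figures from the quoted text):

    # Hypothetical 4096 x 4096 weight matrix adapted with rank r = 8.
    d, k, r = 4096, 4096, 8

    full_params = d * k            # parameters updated by full fine-tuning: 16,777,216
    lora_params = r * (d + k)      # parameters in W_A (r x k) plus W_B (d x r): 65,536

    print(full_params / lora_params)   # 256.0 -> roughly 256x fewer trainable parameters per matrix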
