Low-Rank Adaptation (LoRA) LLM Fine-Tuning Algorithm


A Low-Rank Adaptation (LoRA) LLM Fine-Tuning Algorithm is an adapter-based LLM fine-tuning algorithm that represents the update to an LLM weight matrix as the product of two low-rank matrices.

  • Context:
    • It can (typically) leverage low-rank matrix factorization, which significantly reduces the number of trainable parameters.
    • It can (often) introduce two new small matrices (\(W_A\) and \(W_B\)) during training, keep the original weights frozen, and compute the final output as the original output plus the input multiplied by the low-rank product \(W_B W_A\), scaled by a constant factor (see the code sketch after this list).
    • It can (typically) efficiently adapt large models to new tasks without retraining the entire model.
    • It can have a pattern of: \(h = W_0 x + \frac{\alpha}{r} W_B W_A x\), where \(W_0\) is the frozen pretrained weight matrix, \(W_A \in \mathbb{R}^{r \times k}\) and \(W_B \in \mathbb{R}^{d \times r}\) are the trainable low-rank matrices, \(r \ll \min(d, k)\) is the rank, and \(\alpha\) is a scaling constant.
    • It can be applied not only to language models but also to image models for style or character specialization.
    • ...
  • Example(s):
    • Fine-tuning a GPT-3 model for a specific domain using LoRA, which requires significantly fewer computational resources than full fine-tuning.
    • Adapting a Stable Diffusion model to generate images in a specific style using LoRA.
  • Counter-Example(s):
    • Full fine-tuning, where all parameters of a model are adjusted.
    • Prefix tuning and Adapter modules, which are alternative parameter-efficient fine-tuning methods.
  • See: LLM Fine-Tuning, Fine-Tuning (Deep Learning), Parameter-Efficient Fine-Tuning.
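
The adaptation pattern above can be made concrete with a minimal sketch. The following PyTorch-style layer is an illustrative implementation, not the reference code of the LoRA paper or of any particular library; the class name, initialization scale, and default rank and alpha values are assumptions chosen for readability:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """A frozen linear layer augmented with a trainable low-rank update."""

        def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base_linear
            self.base.weight.requires_grad_(False)          # original weights stay fixed
            if self.base.bias is not None:
                self.base.bias.requires_grad_(False)
            in_features = base_linear.in_features
            out_features = base_linear.out_features
            # W_A projects the input down to rank r; W_B projects back up.
            self.W_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
            self.W_B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at step 0
            self.scaling = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Original output plus the scaled low-rank adaptation W_B W_A x.
            return self.base(x) + self.scaling * (x @ self.W_A.T @ self.W_B.T)

    # Example: wrap a 768x768 projection so only ~12K LoRA parameters are trainable.
    layer = LoRALinear(nn.Linear(768, 768), rank=8, alpha=16.0)

Because the base weights are frozen, only \(W_A\) and \(W_B\) receive gradients during fine-tuning, which is the source of the parameter savings described above.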


References

2023

  • (Wikipedia, 2023) ⇒ https://en.wikipedia.org/wiki/Fine-tuning_(deep_learning)#Low-rank_adaption Retrieved:2023-10-4.
    • Low-rank adaption (LoRA) is an adapter-based technique for efficiently finetuning models. The basic idea is to design a low-rank matrix that is then added to the original matrix. An "adapter" in this context is a collection of low-rank matrices, which when added to a base model, produces a finetuned model. It allows for performance that approaches full-model fine-tuning with less space requirement. A language model with billions of parameters may be LoRA fine-tuned with only several millions of parameters. LoRA-based fine-tuning has become popular in the Stable Diffusion community. Support for LoRA is being integrated into the Diffusers library from Hugging Face. Support for LoRA and similar techniques is also available for a wide range of other models through Hugging Face's Parameter-Efficient Fine-Tuning (PEFT) package.
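
As the passage notes, LoRA support is available through Hugging Face's PEFT package. A minimal usage sketch follows; the model name, rank, alpha, and target module names are illustrative assumptions (which modules to target depends on the architecture being adapted):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Load a pretrained base model (model name is a placeholder).
    base_model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Configure LoRA: rank, scaling factor, dropout, and which modules to adapt.
    config = LoraConfig(
        r=8,                        # rank of the low-rank update
        lora_alpha=16,              # scaling factor
        lora_dropout=0.05,
        target_modules=["c_attn"],  # GPT-2's fused attention projection (assumed)
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base_model, config)
    model.print_trainable_parameters()  # only the LoRA matrices are trainable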

2023

  • (GBard, 2023)
    • LoRA Fine-Tuning, or Low-Rank Adaptation of Large Language Models, is a technique for fine-tuning large language models (LLMs) in a more efficient and parameter-efficient way than traditional fine-tuning.

      Traditional fine-tuning involves optimizing all of the parameters in the LLM. This can be computationally expensive and time-consuming, especially for very large LLMs like GPT-3 and LaMDA.

      LoRA Fine-Tuning works by approximating the LLM's weight matrix with two smaller matrices. These two matrices are then fine-tuned instead of the full weight matrix. This results in a significant reduction in the number of parameters that need to be optimized, which can lead to faster training and lower computational costs.

      Another advantage of LoRA Fine-Tuning is that it can be used to target specific modules in the LLM architecture. This is useful for tasks where it is only necessary to adapt certain parts of the model, such as the attention layer or the decoder.

      LoRA Fine-Tuning has been shown to achieve similar or better performance than traditional fine-tuning on a variety of tasks, including text summarization, question answering, and code generation.

    • Here is a high-level overview of the LoRA Fine-Tuning process:
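
To make the parameter reduction described in this entry concrete, here is a rough back-of-the-envelope calculation (the matrix dimensions and rank are illustrative assumptions, not figures from the quoted text):

    # Hypothetical 4096 x 4096 weight matrix adapted with rank r = 8.
    d, k, r = 4096, 4096, 8

    full_params = d * k            # parameters updated by full fine-tuning: 16,777,216
    lora_params = r * (d + k)      # parameters in W_A (r x k) plus W_B (d x r): 65,536

    print(full_params / lora_params)   # 256.0 -> roughly 256x fewer trainable parameters per matrix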
