LLM Scaling Law

An LLM Scaling Law is a scaling law that can apply to a Large Language Model.
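
Such a law is commonly written as a power law relating pre-training Cross-Entropy Loss to the number of model parameters and the number of training tokens. The form below is a minimal sketch following the convention used in Chinchilla-style analyses; the symbols are the usual ones rather than a single canonical definition.

```latex
% L = pre-training cross-entropy loss
% N = number of model parameters, D = number of training tokens
% E = irreducible loss; A, B, \alpha, \beta = empirically fitted constants
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Minimizing this loss under a fixed compute budget C ≈ 6·N·D gives compute-optimal sizes N_opt ∝ C^(β/(α+β)) and D_opt ∝ C^(α/(α+β)); when the fitted exponents are roughly equal, model size and token count both grow as ~C^0.5, which underlies the compute-optimal guidance in the Context items below.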

  • Context:
    • It can (often) involve relationships between Model Size, Training Dataset Size, Training Cost, and Post-Training Performance.
    • It can measure performance with respect to Cross-Entropy Loss.
    • It can include the finding that model size and the number of training tokens should be scaled roughly equally for compute-optimal training (see the sketch after this list).
    • It can suggest that increasing model size shows diminishing returns and performance saturation, especially beyond 100 billion parameters.
    • It can suggest that increasing dataset size also shows diminishing benefits.
    • It can suggest that optimal configurations balance model width, depth, batch size, and memory bandwidth depending on the target hardware.
    • It can include guidelines for determining the optimal model size for a given compute budget.
    • It can explore the interplay between model size, training dataset size, and compute budget when training large language models, in order to find the most efficient balance.
    • It can suggest that more research is required to further understand the complex relationships between model scale, data scale, and model quality across different tasks.
    • ...
  • Example(s):
  • Counter-Example(s):
  • See: Large Language Model, Model Performance, Cross-Entropy Loss, Neural Network, Chinchilla LLM.
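
The compute-optimal guideline above (model size and training tokens scaled roughly equally) can be turned into a back-of-the-envelope calculation. The sketch below is a minimal illustration, assuming the common approximations of about 6·N·D training FLOPs and about 20 training tokens per parameter reported in the Chinchilla analysis; the function name and default values are illustrative assumptions, not part of any library.

```python
import math

def compute_optimal_size(compute_flops: float,
                         tokens_per_param: float = 20.0,
                         flops_per_param_token: float = 6.0):
    """Back-of-the-envelope compute-optimal (params, tokens) split.

    Assumes compute ~ flops_per_param_token * N * D training FLOPs and
    D ~ tokens_per_param * N, as in Chinchilla-style analyses.
    The defaults are rough reported heuristics, not exact constants.
    """
    # compute = flops_per_param_token * N * D with D = tokens_per_param * N,
    # so N = sqrt(compute / (flops_per_param_token * tokens_per_param)).
    n_params = math.sqrt(compute_flops / (flops_per_param_token * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # A 1e23-FLOP budget suggests roughly 3e10 parameters and 6e11 tokens
    # under these assumptions, i.e. params and tokens both grow as sqrt(compute).
    n, d = compute_optimal_size(1.0e23)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

Because both quantities scale as the square root of the compute budget under these assumptions, doubling compute increases the suggested model size and token count by only about 41% each, which is consistent with the diminishing-returns observations in the Context items.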


References

2022