BloombergGPT LLM


A BloombergGPT LLM is a domain-specific LLM for the finance domain.



References

2023

  • "BloombergGPT: How We Built a 50 Billion Parameter Financial Language Model." Toronto Machine Learning Series (TMLS), 2023-06-13
    • QUOTE: We will present BloombergGPT, a 50 billion parameter language model, purpose-built for finance and trained on a uniquely balanced mix of standard general-purpose datasets and a diverse array of financial documents from the Bloomberg archives. Building a large language model (LLM) is a costly and time-intensive endeavor. To reduce risk, we adhered closely to model designs and training strategies from recent successful models, such as OPT and BLOOM. Nevertheless, we faced numerous challenges during the training process, including loss spikes, unexpected parameter drifts, and performance plateaus.

      In this talk, we will discuss these hurdles and our responses, which included a complete training restart after weeks of effort. Our persistence paid off: BloombergGPT ultimately outperformed existing models on financial tasks by significant margins, while maintaining competitive performance on general LLM benchmarks. We will also provide several examples illustrating how BloombergGPT stands apart from general-purpose models.

      Our goal is to provide valuable insights into the specific challenges encountered when building LLMs and to offer guidance for those debating whether to embark on their own LLM journey, as well as for those who are already determined to do so.

    • NOTES:
      • Building a large language model requires making decisions about model code/architecture, datasets, and compute infrastructure. The Bloomberg team aimed to mitigate risk by closely following the design of an existing successful model (BigScience's BLOOM) while focusing the additional data on the finance domain.
      • They used a mix of public datasets, such as the C4 dataset and a Wikipedia snapshot, together with private Bloomberg financial data spanning 15 years and amounting to over 400 billion tokens. The combined dataset was roughly 200x the size of English Wikipedia (see the data-mixing sketch after these notes).
      • Training very large models can be unstable. The Bloomberg team faced issues such as a flattening loss curve and exploding gradients, requiring debugging measures like lowering the learning rate; they hypothesized that issues with layer normalization contributed (see the spike-recovery sketch after these notes).
      • After these adjustments, they trained the 50 billion parameter model for 42 days before instability returned; the model nevertheless performed well on evaluations. A key takeaway was to start small and ramp model size up gradually so that issues surface earlier.
      • The model achieved state-of-the-art financial domain performance by training on a mix of general and domain-specific data, suggesting potential for domain-specific large language models.
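    • SKETCH (data mixing): The mixing of general-purpose and financial data described in the notes above can be illustrated with a short, hypothetical Python helper. This is not Bloomberg's pipeline; it assumes the two corpora are available as iterables of documents and interleaves them by a fixed sampling weight, which is an illustrative stand-in for the paper's actual mixing ratios.

```python
import random

def mixed_stream(general_docs, finance_docs, finance_weight=0.5, seed=0):
    # Interleave two corpora: draw each document from the financial corpus
    # with probability `finance_weight`, otherwise from the general corpus.
    # Stops as soon as the chosen corpus is exhausted.
    rng = random.Random(seed)
    general_it, finance_it = iter(general_docs), iter(finance_docs)
    while True:
        source = finance_it if rng.random() < finance_weight else general_it
        try:
            yield next(source)
        except StopIteration:
            return

# Example: a 50/50 mix of two toy corpora.
for doc in mixed_stream(["c4 doc 1", "wiki doc 1"],
                        ["bloomberg doc 1", "bloomberg doc 2"]):
    print(doc)
```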
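    • SKETCH (spike recovery): The instability handling described in the notes above (watching for loss spikes, lowering the learning rate, restarting from a known-good state) can likewise be sketched. The loop below assumes PyTorch-style `model`, `optimizer`, and `loss_fn` objects; the spike threshold, rollback policy, and checkpoint interval are illustrative choices, not the team's actual settings.

```python
import copy

def train_with_spike_recovery(model, optimizer, batches, loss_fn,
                              spike_factor=2.0, lr_decay=0.5, window=100):
    recent = []                                     # recent loss values
    checkpoint = copy.deepcopy(model.state_dict())  # last known-good weights
    for step, (inputs, targets) in enumerate(batches):
        loss = loss_fn(model(inputs), targets)
        avg = sum(recent) / len(recent) if recent else None

        if avg is not None and loss.item() > spike_factor * avg:
            # Loss spike: restore the last good weights, reduce the learning
            # rate, and skip this batch rather than applying its gradient.
            model.load_state_dict(checkpoint)
            for group in optimizer.param_groups:
                group["lr"] *= lr_decay
            recent.clear()
            continue

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        recent.append(loss.item())
        if len(recent) > window:
            recent.pop(0)
        if step % window == 0:
            # Periodically refresh the known-good checkpoint.
            checkpoint = copy.deepcopy(model.state_dict())
    return model
```

      A production run would also persist the optimizer state and write checkpoints to disk; rolling back only the model weights, as here, keeps the sketch short but leaves stale momentum in the optimizer.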
