2023 Emerging Architectures for LLM Applications

Subject Headings: AI Agent Architecture.

Notes

Cited By

Quotes

Abstract

Large language models are a powerful new primitive for building software. But since they are so new - and behave so differently from normal computing resources - it’s not always obvious how to use them.

In this post, we’re sharing a reference architecture for the emerging LLM app stack. It shows the most common systems, tools, and design patterns we’ve seen used by AI startups and sophisticated tech companies. This stack is still very early and may change substantially as the underlying technology advances, but we hope it will be a useful reference for developers working with LLMs now.

The stack

Here’s our current view of the LLM app stack:

And here’s a list of links to each project for quick reference:

Data Pipelines: Databricks, Airflow, Unstructured
Embedding Model: OpenAI, Cohere, Hugging Face
Vector Database: Pinecone, Weaviate, ChromaDB, pgvector
Playground: OpenAI, nat.dev, Humanloop
Orchestration: Langchain, LlamaIndex, ChatGPT
APIs / Plugins: Serp, Wolfram, Zapier
LLM Cache: Redis, SQLite, GPTCache
Logging / LLMops: Weights & Biases, MLflow, PromptLayer, Helicone
Validation: Guardrails, Rebuff, Microsoft Guidance, LMQL
App Hosting: Vercel, Steamship, Streamlit, Modal
LLM APIs (proprietary): OpenAI, Anthropic
LLM APIs (open): Hugging Face, Replicate
Cloud Providers: AWS, GCP, Azure, CoreWeave
Opinionated Clouds: Databricks, Anyscale, Mosaic, Modal, RunPod

There are many different ways to build with LLMs, including training models from scratch, fine-tuning open-source models, or using hosted APIs. The stack we’re showing here is based on in-context learning, which is the design pattern we’ve seen the majority of developers start with (and is only possible now with foundation models).
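
To make the pattern concrete: under in-context learning, the model is "programmed" entirely through its prompt rather than through weight updates. The sketch below shows what such a prompt might look like for the legal-document example discussed later; the template text, the few-shot example, and the build_prompt helper are all illustrative assumptions, not a prescribed format.

# Hypothetical in-context-learning prompt template (illustrative only).
# Instructions, a few-shot example, and retrieved context are packed into
# the prompt itself; the model's weights are never updated.
PROMPT_TEMPLATE = """You are a legal assistant. Answer using only the context below.

Context:
{retrieved_chunks}

Example Q: What is the notice period in the lease?
Example A: The lease requires 60 days' written notice (Section 4.2).

Q: {user_question}
A:"""

def build_prompt(retrieved_chunks, user_question):
    """Fill the template with retrieved document chunks and the user's query."""
    return PROMPT_TEMPLATE.format(
        retrieved_chunks="\n---\n".join(retrieved_chunks),
        user_question=user_question,
    )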

...

At a very high level, the workflow can be divided into three stages (a minimal code sketch follows the list):

  • Data preprocessing / embedding: This stage involves storing private data (legal documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, and then stored in a specialized database called a vector database.
  • Prompt construction / retrieval: When a user submits a query (a legal question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs, called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
  • Prompt execution / inference: Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference; this includes both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.
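
As referenced above, here is a minimal, self-contained sketch of the three stages. Everything in it is an assumption for illustration: the hash-based embed() stands in for a real embedding model, the in-memory index list stands in for a vector database, and call_llm() is a placeholder for a proprietary or open-model API.

import hashlib
import math
import re

def embed(text, dim=64):
    # Toy stand-in for an embedding model: hash tokens into a fixed-size,
    # L2-normalized vector. A real app would call a hosted embedding API.
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    # Vectors are already normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Stage 1: data preprocessing / embedding -- chunk documents and index them.
chunks = [
    "The lease term is 24 months, beginning January 1.",
    "Either party may terminate with 60 days' written notice.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in vector database

# Stage 2: prompt construction / retrieval -- pull the most relevant chunk
# into a prompt alongside the user's question.
question = "How much notice is required to terminate?"
qvec = embed(question)
best_chunk = max(index, key=lambda pair: cosine(qvec, pair[1]))[0]
prompt = f"Context: {best_chunk}\n\nQuestion: {question}\nAnswer:"

# Stage 3: prompt execution / inference -- submit the compiled prompt to an LLM.
def call_llm(prompt):
    # Placeholder for a real model call (proprietary API or self-hosted model).
    return "(model response)"

print(call_llm(prompt))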

...

References


  • (Bornstein & Radovanovic, 2023) ⇒ Matt Bornstein, and Rajko Radovanovic. (2023). "Emerging Architectures for LLM Applications."