2023 Emerging Architectures for LLM Applications

Subject Headings: AI Agent Architecture.

Notes

Cited By

Quotes

Abstract

Large language models are a powerful new primitive for building software. But since they are so new - and behave so differently from normal computing resources - it’s not always obvious how to use them.

In this post, we’re sharing a reference architecture for the emerging LLM app stack. It shows the most common systems, tools, and design patterns we’ve seen used by AI startups and sophisticated tech companies. This stack is still very early and may change substantially as the underlying technology advances, but we hope it will be a useful reference for developers working with LLMs now.

The stack

Here’s our current view of the LLM app stack:

And here’s a list of links to each project for quick reference:

Data Pipelines: Databricks, Airflow, Unstructured
Embedding Model: OpenAI, Cohere, Hugging Face
Vector Database: Pinecone, Weaviate, ChromaDB, pgvector
Playground: OpenAI, nat.dev, Humanloop
Orchestration: Langchain, LlamaIndex, ChatGPT
APIs / Plugins: Serp, Wolfram, Zapier
LLM Cache: Redis, SQLite, GPTCache
Logging / LLMops: Weights & Biases, MLflow, PromptLayer, Helicone
Validation: Guardrails, Rebuff, Microsoft Guidance, LMQL
App Hosting: Vercel, Steamship, Streamlit, Modal
LLM APIs (proprietary): OpenAI, Anthropic
LLM APIs (open): Hugging Face, Replicate
Cloud Providers: AWS, GCP, Azure, CoreWeave
Opinionated Clouds: Databricks, Anyscale, Mosaic, Modal, RunPod

There are many different ways to build with LLMs, including training models from scratch, fine-tuning open-source models, or using hosted APIs. The stack we’re showing here is based on in-context learning, which is the design pattern we’ve seen the majority of developers start with (and is only possible now with foundation models).
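
To make the pattern concrete: under in-context learning, the model is "programmed" entirely through its prompt rather than through weight updates. The sketch below shows what such a prompt might look like for the legal-document example discussed later; the template text, the few-shot example, and the build_prompt helper are all illustrative assumptions, not a prescribed format.

# Hypothetical in-context-learning prompt template (illustrative only).
# Instructions, a few-shot example, and retrieved context are packed into
# the prompt itself; the model's weights are never updated.
PROMPT_TEMPLATE = """You are a legal assistant. Answer using only the context below.

Context:
{retrieved_chunks}

Example Q: What is the notice period in the lease?
Example A: The lease requires 60 days' written notice (Section 4.2).

Q: {user_question}
A:"""

def build_prompt(retrieved_chunks, user_question):
    """Fill the template with retrieved document chunks and the user's query."""
    return PROMPT_TEMPLATE.format(
        retrieved_chunks="\n---\n".join(retrieved_chunks),
        user_question=user_question,
    )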

...

At a very high level, the workflow can be divided into three stages (a minimal code sketch follows the list):

  • Data preprocessing / embedding: This stage involves storing private data (legal documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, and then stored in a specialized database called a vector database.
  • Prompt construction / retrieval: When a user submits a query (a legal question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs, called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
  • Prompt execution / inference: Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference; this includes both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.
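
As referenced above, here is a minimal, self-contained sketch of the three stages. Everything in it is an assumption for illustration: the hash-based embed() stands in for a real embedding model, the in-memory index list stands in for a vector database, and call_llm() is a placeholder for a proprietary or open-model API.

import hashlib
import math
import re

def embed(text, dim=64):
    # Toy stand-in for an embedding model: hash tokens into a fixed-size,
    # L2-normalized vector. A real app would call a hosted embedding API.
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    # Vectors are already normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Stage 1: data preprocessing / embedding -- chunk documents and index them.
chunks = [
    "The lease term is 24 months, beginning January 1.",
    "Either party may terminate with 60 days' written notice.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # stand-in vector database

# Stage 2: prompt construction / retrieval -- pull the most relevant chunk
# into a prompt alongside the user's question.
question = "How much notice is required to terminate?"
qvec = embed(question)
best_chunk = max(index, key=lambda pair: cosine(qvec, pair[1]))[0]
prompt = f"Context: {best_chunk}\n\nQuestion: {question}\nAnswer:"

# Stage 3: prompt execution / inference -- submit the compiled prompt to an LLM.
def call_llm(prompt):
    # Placeholder for a real model call (proprietary API or self-hosted model).
    return "(model response)"

print(call_llm(prompt))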

...

References


  • (Bornstein & Radovanovic, 2023) ⇒ Matt Bornstein, and Rajko Radovanovic. (2023). "Emerging Architectures for LLM Applications."