Context Length
The maximum number of tokens a model can handle in a single forward pass.
Full Definition
Context length (often used interchangeably with context window) is the hard upper bound on tokens — input plus output combined — that a model can process at once. Exceeding this limit typically results in an error or in silent truncation of the oldest tokens, losing information. Context length is constrained by the quadratic scaling of attention computation with sequence length, though techniques like FlashAttention and sliding window attention have reduced these costs significantly, and position encodings like RoPE (Rotary Position Embedding) have made longer contexts practical. Extending context length has been a key frontier in model development: Claude 3's 200k-token context window, for example, can hold approximately 500 pages of text, enabling whole-document analysis.
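A minimal sketch of how an application might keep a chat history inside a fixed token budget by dropping the oldest turns first, which is the truncation behavior described above. The numbers and the whitespace-based token count are illustrative assumptions; real systems use the model's own tokenizer and actual limits.

```python
# Hypothetical budget: total context (input + output) and room reserved
# for the model's reply. Real limits come from the model's spec sheet.
CONTEXT_LIMIT = 35
RESERVED_FOR_OUTPUT = 10

def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace word."""
    return len(text.split())

def trim_history(turns: list[str]) -> list[str]:
    """Drop the oldest turns until the history fits the input budget."""
    budget = CONTEXT_LIMIT - RESERVED_FOR_OUTPUT
    while turns and sum(count_tokens(t) for t in turns) > budget:
        turns = turns[1:]  # oldest information is lost first
    return turns

history = ["turn one " * 5, "turn two " * 5, "turn three " * 5]
trimmed = trim_history(history)
print(len(trimmed))  # the earliest turn no longer fits and is dropped
```

Production code would count tokens with the provider's tokenizer (e.g. a BPE tokenizer) rather than whitespace splitting, since the two can disagree substantially.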
Examples
GPT-4 Turbo's 128k context length allows developers to include an entire codebase as context for a refactoring task.
A retrieval-augmented generation system that chunks documents into 512-token segments because the embedding model has a 512-token context limit.
Apply this in your prompts
PromptITIN automatically applies concepts like Context Length to build better prompts for you.