Context Length
The maximum number of tokens a model can handle in a single forward pass.
Full Definition
Context length (often used interchangeably with context window) is the hard upper bound on tokens — input plus output combined — that a model can process at once. Exceeding this limit typically results in an error or in silent truncation of the oldest tokens, losing information. Context length is constrained by the quadratic scaling of attention computation with sequence length, though techniques like FlashAttention and sliding window attention have reduced these costs significantly, and position encodings like RoPE (Rotary Position Embedding) have made longer contexts practical. Extending context length has been a key frontier in model development: Claude 3's 200k-token context window, for example, can hold approximately 500 pages of text, enabling whole-document analysis.
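A minimal sketch of how an application might keep a chat history inside a fixed token budget by dropping the oldest turns first, which is the truncation behavior described above. The numbers and the whitespace-based token count are illustrative assumptions; real systems use the model's own tokenizer and actual limits.

```python
# Hypothetical budget: total context (input + output) and room reserved
# for the model's reply. Real limits come from the model's spec sheet.
CONTEXT_LIMIT = 35
RESERVED_FOR_OUTPUT = 10

def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace word."""
    return len(text.split())

def trim_history(turns: list[str]) -> list[str]:
    """Drop the oldest turns until the history fits the input budget."""
    budget = CONTEXT_LIMIT - RESERVED_FOR_OUTPUT
    while turns and sum(count_tokens(t) for t in turns) > budget:
        turns = turns[1:]  # oldest information is lost first
    return turns

history = ["turn one " * 5, "turn two " * 5, "turn three " * 5]
trimmed = trim_history(history)
print(len(trimmed))  # the earliest turn no longer fits and is dropped
```

Production code would count tokens with the provider's tokenizer (e.g. a BPE tokenizer) rather than whitespace splitting, since the two can disagree substantially.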
Examples
GPT-4 Turbo's 128k context length allows developers to include an entire codebase as context for a refactoring task.
A retrieval-augmented generation system that chunks documents into 512-token segments because the embedding model has a 512-token context limit.
Apply this in your prompts
PromptITIN automatically applies concepts like Context Length to build better prompts for you.