Prompt Compression
Reducing prompt length while preserving the information needed for accurate responses.
Full Definition
Prompt compression techniques shrink long prompts — particularly retrieved documents or conversation histories — to fit within a model's context window or to reduce token costs. Methods include extractive compression (removing irrelevant sentences), abstractive compression (summarising), selective retrieval (only including the most relevant chunks), and learned compression (using a smaller model to encode long context into a compact representation). As context windows grow, compression matters less for fitting text in, but remains critical for cost control and for mitigating the 'lost in the middle' phenomenon where models underweight centrally positioned content.
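The extractive approach above can be sketched in a few lines. This is a minimal, hypothetical illustration (not a production method): it scores each sentence by simple word overlap with the query and keeps only the top-scoring sentences, in their original order. Real systems typically use embeddings or a trained relevance model instead of word overlap.

```python
import re

def compress_extractive(text: str, query: str, max_sentences: int = 3) -> str:
    """Extractive compression: keep only sentences relevant to the query."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    query_words = set(query.lower().split())
    # Score each sentence by how many query words it contains.
    scored = [(len(query_words & set(s.lower().split())), i, s)
              for i, s in enumerate(sentences)]
    # Take the top-scoring sentences, then restore document order.
    top = sorted(sorted(scored, key=lambda t: -t[0])[:max_sentences],
                 key=lambda t: t[1])
    return " ".join(s for _, _, s in top)
```

Dropping low-scoring sentences preserves exact wording (unlike abstractive summarisation), which matters when the downstream model needs to quote or cite the source.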
Examples
Using a summarisation model to condense a 50-page PDF to 2,000 tokens before passing it to a question-answering model.
Removing boilerplate and repeated content from a conversation history before appending it to a new prompt.
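The second example can be sketched as a small helper. This is an assumed, minimal implementation: it drops exact-duplicate messages and a hypothetical set of boilerplate phrases before the history is reused; real pipelines may additionally summarise older turns.

```python
# Assumed boilerplate phrases for illustration; in practice this set
# would be tailored to the assistant's actual canned responses.
BOILERPLATE = {"How can I help you today?", "Is there anything else?"}

def trim_history(messages: list[str]) -> list[str]:
    """Remove boilerplate and exact-duplicate messages from a history."""
    seen = set()
    kept = []
    for msg in messages:
        if msg in BOILERPLATE or msg in seen:
            continue  # skip canned phrases and repeats
        seen.add(msg)
        kept.append(msg)
    return kept
```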
Apply this in your prompts
PromptITIN automatically uses techniques like Prompt Compression to build better prompts for you.
Related Terms
Context Window: the maximum number of tokens a model can process in a single input-output interaction.
RAG (Retrieval-Augmented Generation): augmenting model responses by retrieving relevant documents from an external knowledge base.
Token: the basic unit of text a language model processes, roughly corresponding to a word or part of a word.