Streaming
Sending model output tokens to the client incrementally as they are generated rather than all at once.
Full Definition
Streaming delivers model outputs token by token (or in small chunks) over a persistent HTTP connection, typically using Server-Sent Events (SSE) or WebSockets, rather than waiting for the entire response to be generated before sending anything. This dramatically improves perceived latency: users see the first words in under a second instead of waiting 10–30 seconds for a long response to complete. Every major LLM API supports streaming (usually opt-in via a stream parameter), and it is the standard delivery mode in consumer chat products. Implementing streaming on the client side requires parsing partial JSON, managing UI state for incremental text rendering, and recovering gracefully from interrupted streams.
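A minimal sketch of the client-side parsing step, assuming an OpenAI-style SSE wire format: each event arrives as a `data: <json>` line whose payload carries the next text chunk at `choices[0].delta.content`, with `data: [DONE]` as the terminator. This is a common convention, not a universal one; other APIs use different event shapes.

```python
import json

def parse_sse_stream(lines):
    """Accumulate text deltas from OpenAI-style SSE lines into a full response."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # stream finished
        event = json.loads(payload)
        chunk = event["choices"][0]["delta"].get("content", "")
        if chunk:
            text.append(chunk)
            # A real client would update the UI here, chunk by chunk.
    return "".join(text)

# Simulated wire format for a three-chunk response.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo, "}}]}',
    'data: {"choices":[{"delta":{"content":"world"}}]}',
    'data: [DONE]',
]
print(parse_sse_stream(sample))  # prints "Hello, world"
```

In production the `lines` iterable would come from the HTTP response body as it arrives; buffering incomplete lines and handling a dropped connection mid-stream are the parts that make real implementations more involved than this sketch.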
Examples
ChatGPT's typing-cursor effect, where text appears word by word in real time, is implemented via streaming from OpenAI's API.
A code generation tool showing the generated function appearing line by line while the model is still generating later lines.
Apply this in your prompts
PromptITIN automatically uses techniques like Streaming to build better prompts for you.