
Streaming

Sending model output tokens to the client incrementally as they are generated rather than all at once.

Full Definition

Streaming delivers model outputs token-by-token (or in small chunks) over a persistent HTTP connection, typically using Server-Sent Events (SSE) or WebSockets, rather than sending the entire response only after generation completes. This dramatically improves perceived latency: users see the first words in under a second instead of waiting tens of seconds for a long response to finish. Streaming is now the default delivery mode for most major LLM APIs and consumer chat products. Implementing streaming on the client side requires handling partial JSON, managing UI state for incremental text rendering, and gracefully recovering from interrupted streams.
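To make the SSE framing concrete, here is a minimal sketch of parsing a streamed buffer into JSON chunks. The `data:`-line framing and `[DONE]` sentinel follow the common OpenAI-style convention; the `delta` field name is an illustrative assumption, not any specific API's schema.

```python
import json

def parse_sse(raw: str):
    """Parse a buffer of Server-Sent Events into JSON payloads.

    Assumes OpenAI-style `data: {...}` lines with a `data: [DONE]`
    sentinel (a common convention, not part of the SSE spec itself).
    """
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # server signalled end of stream
        events.append(json.loads(payload))
    return events

# Example: two chunks, each carrying a token delta (hypothetical schema).
raw = (
    'data: {"delta": "Hello"}\n\n'
    'data: {"delta": ", world"}\n\n'
    "data: [DONE]\n\n"
)
chunks = parse_sse(raw)
text = "".join(c["delta"] for c in chunks)
print(text)  # Hello, world
```

In a real client the buffer arrives incrementally over the open connection and the UI is updated as each chunk is parsed, which is what produces the word-by-word effect described above.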

Examples

1. ChatGPT's typing-cursor effect, where words appear in real time, is implemented via streaming from OpenAI's API.

2. A code generation tool showing the generated function appearing line by line while the model is still generating later lines.
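The second example depends on handling partial JSON, since network chunk boundaries need not align with JSON object boundaries. A minimal buffering sketch follows; the `ChunkAssembler` name, the newline-delimited framing, and the `delta` field are all illustrative assumptions:

```python
import json

class ChunkAssembler:
    """Buffer raw network chunks that may split JSON objects mid-string,
    emitting complete objects as they become parseable.

    A hypothetical helper for illustration; real clients often rely on
    an SDK that performs this buffering internally.
    """

    def __init__(self):
        self._buf = ""

    def feed(self, chunk: str):
        """Append a chunk; return any newline-delimited JSON objects
        completed by it, keeping the unfinished remainder buffered."""
        self._buf += chunk
        objects = []
        while "\n" in self._buf:
            line, self._buf = self._buf.split("\n", 1)
            if line.strip():
                objects.append(json.loads(line))
        return objects

assembler = ChunkAssembler()
out = []
# The second JSON object arrives split across two network reads.
for chunk in ['{"delta": "def add(a, b):"}\n{"del',
              'ta": "\\n    return a + b"}\n']:
    out.extend(assembler.feed(chunk))
code = "".join(o["delta"] for o in out)
```

Accumulating the deltas in order reconstructs the full function, which is how a UI can render completed lines while later ones are still being generated.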


Related Terms

Inference

The process of running a trained model to generate predictions or responses from input data.

Latency

The time delay between sending a request to a model and receiving its first token.

API

An Application Programming Interface that lets developers call AI model capabilities from their own software.