Pretraining
The initial phase of training a model on massive text data to learn general language representations.
Full Definition
Pretraining is the first and most compute-intensive phase of building a large language model. The model is trained on trillions of tokens from diverse internet sources, books, and code using a self-supervised objective — typically next-token prediction (causal language modelling) for GPT-style models or masked language modelling for BERT-style models. No human-labelled data is required; the training signal comes from predicting the natural continuations of text. Pretraining encodes broad world knowledge, language syntax, and reasoning patterns into the model's weights. It requires thousands of GPUs running for weeks and constitutes the majority of the total training cost for frontier models.
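The next-token prediction objective can be sketched in a few lines. This is a simplified illustration, not a real training loop: the toy vocabulary, the hand-built uniform "model", and the tokenisation are all invented for the example. It shows the core idea of causal language modelling, namely that the inputs are the sequence shifted by one position relative to the targets, and the loss is the average negative log-probability the model assigns to each true next token.

```python
import math

# Toy sketch of the causal language-modelling (next-token prediction)
# objective. The vocabulary and "model" below are hypothetical.

vocab = ["the", "cat", "sat", "on", "mat"]
tok = {w: i for i, w in enumerate(vocab)}

# Self-supervised signal: the model reads tokens[:-1] and must
# predict tokens[1:] -- no human labels are needed.
tokens = [tok[w] for w in ["the", "cat", "sat", "on", "the", "mat"]]
inputs, targets = tokens[:-1], tokens[1:]

def model_probs(prev_token):
    """Stand-in for a neural network: returns a probability
    distribution over the vocabulary given the previous token.
    Uniform here, i.e. an untrained model."""
    return [1.0 / len(vocab)] * len(vocab)

# Cross-entropy loss: average negative log-probability of the true
# next token. Pretraining minimises this over trillions of tokens.
loss = -sum(math.log(model_probs(x)[y])
            for x, y in zip(inputs, targets)) / len(targets)
print(f"loss = {loss:.4f}")  # uniform model gives log(vocab_size) ~ 1.6094
```

For an untrained (uniform) model the loss equals log of the vocabulary size; training drives it down as the model learns which continuations are likely.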
Examples
GPT-4 reportedly being pretrained on approximately 13 trillion tokens drawn from internet text, books, and code repositories.
Llama 3 pretraining on over 15 trillion tokens, including a higher proportion of code and multilingual data than its predecessors.