
Unsupervised Learning

Training a model on raw data without human-provided labels, discovering structure autonomously.

Full Definition

Unsupervised learning trains models on raw, unlabelled data, letting the model discover patterns, clusters, or representations without explicit guidance. In language modelling, the dominant unsupervised approach is self-supervised learning: the model derives its own labels from the data itself (predict the next token; reconstruct a masked token) and learns by minimising prediction error. This enables training on effectively unlimited text scraped from the internet. The representations learned through unsupervised pretraining encode rich semantic and syntactic information that transfers to downstream supervised tasks. Masked-token reconstruction (BERT), causal language modelling (GPT), and contrastive learning (CLIP) are all self-supervised techniques.
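The key idea above, that the model manufactures its own labels from raw text, can be sketched in a few lines. The tokens and helper names below are illustrative, not from any particular library:

```python
# Self-supervised label creation: the (input, target) pairs come from the
# raw text itself, with no human annotation. Helper names are illustrative.

def next_token_pairs(tokens):
    """Causal LM objective: predict each token from the prefix before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_token_pair(tokens, mask_index):
    """Masked LM objective: reconstruct a hidden token from its context."""
    masked = tokens[:mask_index] + ["[MASK]"] + tokens[mask_index + 1:]
    return (masked, tokens[mask_index])

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(next_token_pairs(tokens)[0])    # (['the'], 'cat')
print(masked_token_pair(tokens, 2))   # (['the', 'cat', '[MASK]', 'on', 'the', 'mat'], 'sat')
```

Either way, minimising prediction error on these self-generated pairs is what lets training scale to web-sized corpora.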

Examples

1

Word2Vec learning that 'king' − 'man' + 'woman' ≈ 'queen' purely from co-occurrence statistics in unlabelled text.
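The analogy arithmetic can be illustrated with toy vectors. These hand-made 3-dimensional vectors are assumptions for demonstration; real Word2Vec embeddings are learned from co-occurrence statistics and typically have hundreds of dimensions:

```python
# Toy illustration of vector-analogy arithmetic. The vectors and their
# "royalty/male/female" axes are invented for the example, not learned.
import math

vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# king - man + woman
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

# Nearest vocabulary vector to the result:
nearest = max(vectors, key=lambda word: cosine(vectors[word], target))
print(nearest)  # queen
```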

2

GPT-3's pretraining objective: predict the next token in a sequence, learning from 300 billion tokens with no human labelling.
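The same objective can be shown at miniature scale. GPT-3 uses a neural network over hundreds of billions of tokens; the sketch below substitutes simple bigram counts over a toy corpus, but the principle is the same: learn next-token statistics from raw text alone.

```python
# Miniature causal language modelling: count which token follows which,
# with no human labels, then predict the most likely continuation.
# The corpus here is a toy stand-in for web-scale text.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation seen during training."""
    return bigrams[token].most_common(1)[0][0]

print(predict_next("the"))  # cat  ("the cat" occurs twice, "the mat" once)
```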


Related Terms

Pretraining

The initial phase of training a model on massive text data to learn general language patterns.


Supervised Learning

Training a model on input-output pairs where the correct output is provided as a training signal.


Embedding

A dense numerical vector that represents a token, sentence, or document in a continuous vector space.
