QLoRA
A memory-efficient fine-tuning method combining quantisation with LoRA adapters.
Full Definition
QLoRA (Quantised LoRA), introduced by Tim Dettmers et al. in 2023, fine-tunes large models by quantising the frozen base model's weights to 4-bit precision while training small LoRA adapters in 16-bit floating point. This combination cuts VRAM requirements so dramatically that a 65B-parameter model can be fine-tuned on a single 48GB GPU, a task that previously required a multi-GPU cluster. QLoRA introduces several innovations: NF4 (NormalFloat 4-bit) quantisation, double quantisation (quantising the quantisation constants themselves) for additional memory savings, and paged optimisers to absorb memory spikes. It democratised fine-tuning of very large open-weight models for individual researchers and startups.
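To make the NF4 idea concrete, here is a simplified, self-contained sketch of NormalFloat-style quantisation: a 16-level codebook derived from quantiles of a standard normal distribution, combined with block-wise absmax scaling. This is an illustration of the principle only; the exact NF4 codebook in the QLoRA paper is constructed differently (and includes an exact zero), and real implementations pack two 4-bit codes per byte.

```python
from statistics import NormalDist
import random

def nf4_style_levels():
    # Simplified NormalFloat-style codebook: 16 quantiles of N(0, 1),
    # rescaled into [-1, 1]. (Illustrative only; the actual NF4 codebook
    # from the QLoRA paper is built differently and includes an exact 0.)
    nd = NormalDist()
    qs = [nd.inv_cdf((i + 0.5) / 16) for i in range(16)]
    m = max(abs(q) for q in qs)
    return [q / m for q in qs]

def quantise_block(weights):
    # Block-wise absmax quantisation: scale the block so its largest
    # magnitude maps to 1, then snap each value to the nearest code level.
    levels = nf4_style_levels()
    absmax = max(abs(w) for w in weights) or 1.0
    codes = [min(range(16), key=lambda i: abs(levels[i] - w / absmax))
             for w in weights]
    return codes, absmax

def dequantise_block(codes, absmax):
    # Look up each 4-bit code and rescale by the block's absmax constant.
    levels = nf4_style_levels()
    return [levels[c] * absmax for c in codes]

random.seed(0)
block = [random.gauss(0, 0.02) for _ in range(64)]  # one 64-weight block
codes, scale = quantise_block(block)
recon = dequantise_block(codes, scale)
err = max(abs(a - b) for a, b in zip(block, recon))
print(f"max abs reconstruction error: {err:.5f}")
```

Double quantisation applies the same trick one level up: the per-block `absmax` constants are themselves quantised, shaving off a further fraction of a bit per parameter.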
Examples
Fine-tuning Llama 2 70B on a single 48GB A6000 workstation GPU using QLoRA with NF4 quantisation, achieving quality close to a full 16-bit fine-tune.
A solo developer fine-tuning Mistral 7B with QLoRA on a single laptop GPU, completing a run over a dataset of 1,000 examples in about two hours.
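In practice, setups like the examples above are typically built with the Hugging Face stack (transformers, peft, bitsandbytes). The sketch below shows the standard wiring of a QLoRA run under stated assumptions: the model id, rank, and target modules are illustrative choices, not values prescribed by the paper, and loading a real model requires a GPU and (for gated models) access approval.

```python
# Sketch of a QLoRA setup with Hugging Face transformers + peft + bitsandbytes.
# The model id and hyperparameters below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantise frozen base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat 4-bit quantisation
    bnb_4bit_use_double_quant=True,         # quantise the quantisation constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # 16-bit compute in forward/backward
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # any causal LM checkpoint works here
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections, a common choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # trainable 16-bit adapters, 4-bit base
model.print_trainable_parameters()
```

Pairing this with a paged optimiser such as `bitsandbytes.optim.PagedAdamW8bit` reproduces the third ingredient the definition mentions, letting occasional activation-memory spikes spill to CPU RAM instead of crashing the run.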
Related Terms
LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning method that updates only small low-rank matrices…
Parameter-Efficient Fine-Tuning: A family of methods that fine-tune large models by updating only a small fraction…
Fine-Tuning: Continuing training of a pretrained model on a smaller, task-specific dataset to…