Backpropagation
A network has billions of weights. When a prediction is wrong, how do you know which ones to blame? Backpropagation answers that, and it's the engine that makes deep learning trainable.
It works in two passes. In the forward pass, input flows through the layers to produce a prediction and a loss. In the backward pass, you walk that error backward through the network, layer by layer, computing how much each weight contributed to it.
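To make the two passes concrete, here is a minimal sketch on a toy network with just two weights. The architecture (one tanh hidden unit, a squared-error loss), the input values, and the function names `forward` and `backward` are all assumptions chosen for illustration, not anything a real framework prescribes.

```python
import math

def forward(x, target, w1, w2):
    """Forward pass: input -> hidden -> prediction -> loss."""
    h = math.tanh(w1 * x)        # hidden activation
    y = w2 * h                   # prediction
    loss = (y - target) ** 2     # squared error
    return h, y, loss

def backward(x, target, w1, w2, h, y):
    """Backward pass: walk the error from the loss back to each weight."""
    dloss_dy = 2.0 * (y - target)              # loss -> prediction
    dloss_dw2 = dloss_dy * h                   # prediction -> w2
    dloss_dh = dloss_dy * w2                   # prediction -> hidden
    dloss_dw1 = dloss_dh * (1.0 - h * h) * x   # hidden -> w1, using tanh' = 1 - tanh^2
    return dloss_dw1, dloss_dw2

# Illustrative values, not taken from any real model
x, target = 1.5, 0.5
w1, w2 = 0.8, -0.3

h, y, loss = forward(x, target, w1, w2)
g1, g2 = backward(x, target, w1, w2, h, y)
print(f"loss={loss:.4f}  dL/dw1={g1:.4f}  dL/dw2={g2:.4f}")
```

Note that the backward pass needs `h` and `y` from the forward pass. That is why, in practice, the forward pass keeps its intermediate values around instead of discarding them.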
The output is a gradient: for every weight, a number that says which direction to nudge it, and how hard, to lower the loss. Backpropagation is really the chain rule from calculus applied at scale. Each layer's gradient is built from the one already computed for the layer after it, so intermediate results are reused and the work stays efficient even across hundreds of layers.
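The reuse is easier to see in code. The sketch below assumes a deliberately simple network: a chain of scalar tanh layers, so each layer has exactly one weight. The names `forward_chain`, `backward_chain`, and `chain_loss` are hypothetical. A single running value, `upstream`, carries the gradient from layer to layer, and a finite-difference estimate serves as a sanity check.

```python
import math

def forward_chain(ws, x):
    """Return every activation so the backward pass can reuse them."""
    acts = [x]
    for w in ws:
        acts.append(math.tanh(w * acts[-1]))
    return acts

def backward_chain(ws, acts, target):
    """One sweep from output to input; `upstream` is updated once per layer."""
    grads = [0.0] * len(ws)
    upstream = 2.0 * (acts[-1] - target)        # dL/d(output) for squared error
    for k in reversed(range(len(ws))):
        local = 1.0 - acts[k + 1] ** 2          # tanh'(z) = 1 - tanh(z)^2
        grads[k] = upstream * local * acts[k]   # dL/dw_k
        upstream = upstream * local * ws[k]     # dL/d(acts[k]), passed to the layer below
    return grads

def chain_loss(ws, x, target):
    return (forward_chain(ws, x)[-1] - target) ** 2

# Illustrative weights and data
ws = [0.9, -1.1, 0.7, 1.3, -0.8]
x, target = 0.6, 0.2
grads = backward_chain(ws, forward_chain(ws, x), target)

# Compare with a central finite-difference estimate, one weight at a time
eps = 1e-6
numeric = []
for k in range(len(ws)):
    up = ws[:k] + [ws[k] + eps] + ws[k + 1:]
    down = ws[:k] + [ws[k] - eps] + ws[k + 1:]
    numeric.append((chain_loss(up, x, target) - chain_loss(down, x, target)) / (2 * eps))
    print(f"w[{k}]: backprop={grads[k]:+.6f}  numeric={numeric[k]:+.6f}")
```

The finite-difference check needs two extra forward passes per weight, which is hopeless at billions of weights. The backward sweep gets every gradient in a single pass because each layer only multiplies the `upstream` value it is handed.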
This is one half of how a model learns. Backpropagation finds the direction to move; gradient descent takes the step. Run the pair millions of times and the network slowly gets better at predicting.
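The pairing can be sketched as a loop. To keep it short, the example below assumes a one-layer linear model, `y = w*x + b`, in place of a deep network. Its gradient is written out by hand, and it matches what backpropagation would compute for a model this small. The toy data, the learning rate, and the helper `loss_and_grads` are all illustrative choices.

```python
# Toy data drawn from y = 2x + 1 (an assumption for illustration)
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

def loss_and_grads(w, b):
    """Mean squared error and its gradient for the model y = w*x + b."""
    n = len(data)
    loss = gw = gb = 0.0
    for x, t in data:
        err = (w * x + b) - t
        loss += err * err / n
        gw += 2.0 * err * x / n   # dL/dw
        gb += 2.0 * err / n       # dL/db
    return loss, gw, gb

w, b = 0.0, 0.0
learning_rate = 0.1               # step size, chosen by hand for this toy problem
initial_loss, _, _ = loss_and_grads(w, b)

for step in range(200):
    loss, gw, gb = loss_and_grads(w, b)   # find the direction
    w -= learning_rate * gw               # take the step, against the gradient
    b -= learning_rate * gb

print(f"w={w:.3f}  b={b:.3f}  loss={loss:.6f}")
```

The minus sign matters: the gradient points toward higher loss, so each step moves the weights the opposite way. Real training differs mainly in scale, typically estimating the gradient from a small batch of examples at each step.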
Backpropagation also solves what's called credit assignment across structure: spreading the blame for one error across billions of parameters at once. The version of that problem across time, in reinforcement learning, is much harder.