Scaling Laws
Scaling laws are the reason the field bet so heavily on size. In a 2020 paper, researchers showed that as you grow a model's parameters, training data, and compute, its loss falls along a smooth, predictable curve, roughly a power law.
That predictability is the surprising part. You can train small models, measure the trend, and forecast how a much larger one will do before you spend the money to build it. Capability stopped being a guess and became something you could plan for.
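As a rough illustration of that forecasting step, the sketch below fits a straight line in log-log space to a handful of small-model results and extrapolates to a much larger model. Everything in it is an assumption made for the example: the data points are synthetic, the constants `true_a` and `true_alpha` are invented, the helper names `fit_power_law` and `predict_loss` are hypothetical, and the simple form `loss = a * size**(-alpha)` leaves out terms (such as an irreducible loss floor) that real fits often include.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-alpha) by least squares in log-log space."""
    # A power law is a straight line after taking logs:
    # log(loss) = log(a) - alpha * log(size)
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(size, a, alpha):
    """Evaluate the fitted power law at a given model size."""
    return a * size ** (-alpha)


# Synthetic "small model" measurements (invented numbers, not real results).
true_a, true_alpha = 10.0, 0.08
small_sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])  # parameter counts
small_losses = true_a * small_sizes ** (-true_alpha)

# Measure the trend on the small models...
a, alpha = fit_power_law(small_sizes, small_losses)

# ...then forecast a model 100x larger than any we "trained".
forecast = predict_loss(1e10, a, alpha)
print(f"fitted alpha = {alpha:.3f}, forecast loss at 1e10 params = {forecast:.3f}")
```

With real measurements the points would be noisy, so the fit would carry uncertainty and the forecast should be read as an estimate rather than a guarantee.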
For years the move was simple: add more compute, more data, and more parameters. It kept working, and it's a big reason models leapt ahead so fast.
The story has shifted since. Pretraining scaling still works, but the bigger gains lately come from test-time compute: giving a model more room to think during inference, through reasoning and tool use, instead of only making the model bigger. There's more than one axis to scale, and the interesting work now is choosing which one to push.