Pareto Frontier

You rarely optimize one thing. You want a model that's smart and cheap, helpful and safe, fast and accurate. Those goals pull against each other, and the Pareto frontier is the honest way to picture the trade-off.

Plot every model you're choosing between on two axes, say quality against cost. Some are beaten outright: another model is at least as good on both axes and strictly better on one, for example cheaper and better. Those are dominated, and you can drop them. What remains, the models that nothing dominates, is the frontier. Each of them wins somewhere, so choosing between them means deciding what you care about.
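The filtering step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Model` type, the function names, and all the numbers are made up for the example. It assumes the usual convention that one model dominates another when it is at least as good on both axes and strictly better on at least one, with higher quality and lower cost counting as better.

```python
from typing import NamedTuple


class Model(NamedTuple):
    # Hypothetical record type for illustration.
    name: str
    quality: float  # higher is better
    cost: float     # lower is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.quality >= b.quality and a.cost <= b.cost
    strictly_better = a.quality > b.quality or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep the models that no other model dominates, sorted by cost."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.cost)


# Invented numbers, for illustration only.
candidates = [
    Model("small", quality=60.0, cost=1.0),
    Model("medium", quality=75.0, cost=4.0),
    Model("bloated", quality=70.0, cost=6.0),  # worse and pricier than "medium"
    Model("large", quality=90.0, cost=10.0),
]

frontier = pareto_frontier(candidates)
print([m.name for m in frontier])  # ['small', 'medium', 'large']
```

The pairwise check is quadratic in the number of models, which is fine for a shortlist. For large candidate sets, sorting by cost and sweeping once while tracking the best quality seen so far gives the same result faster.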

This reframes the question "which model is best." There's no single best, only the best for a given constraint. On a tight latency budget you take the fast end of the frontier. With room to spend, you take the smart end. A model sitting off the frontier is the easy call to cut, because something already beats it outright.
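Picking "the best for a given constraint" could look like the sketch below. It assumes the frontier has already been computed and that the constraint is a latency ceiling. The tuple layout, the `best_within_budget` helper, and the figures are all hypothetical.

```python
from typing import Optional

# Hypothetical frontier as (name, quality, latency_ms) tuples.
# Invented numbers, for illustration only.
FrontierPoint = tuple[str, float, float]

frontier_points: list[FrontierPoint] = [
    ("fast", 62.0, 120.0),
    ("balanced", 78.0, 450.0),
    ("smart", 91.0, 1500.0),
]


def best_within_budget(
    points: list[FrontierPoint], max_latency_ms: float
) -> Optional[FrontierPoint]:
    """Return the highest-quality point that fits the latency budget, or None."""
    affordable = [p for p in points if p[2] <= max_latency_ms]
    if not affordable:
        return None  # nothing on the frontier meets the constraint
    return max(affordable, key=lambda p: p[1])


print(best_within_budget(frontier_points, max_latency_ms=200.0))   # the fast end
print(best_within_budget(frontier_points, max_latency_ms=2000.0))  # the smart end
```

A tight budget lands on the fast end and a loose one on the smart end. The frontier itself does not change, only the constraint applied to it.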

Real progress is the frontier moving outward. When distillation or quantization makes a model cheaper at the same quality, or a better design makes it smarter at the same cost, the new model dominates last year's and takes its place on the frontier. It's also why the alignment tax stings: training for safety can knock a model off the capability frontier, and much of the work is clawing back to it.