Model Specs

A model spec is a document that writes down how a model should behave. How it handles ambiguity, when it pushes back, when it keeps working versus stops to ask. OpenAI publishes its Model Spec, and Anthropic publishes Claude's constitution.

Why bother writing it down? Because "good behavior" is a matter of taste, and taste differs. Two models can both be excellent and still feel different, the same way two great athletes have different styles. A spec pins down the intended style so a whole team can train toward the same target.

Then comes the hard part: the conversion problem. Gradient descent doesn't read English. A flawless written guide won't make a model good any more than a manual makes you an expert. The words have to become something training understands: a distribution to match, a score to climb, or a preference between two outputs.

That conversion is the daily work of model behavior teams. A spec line like "keep working on long tasks instead of stopping early" becomes scenarios, a judge that grades them, and an evaluation that tracks whether the behavior is improving. The spec is the target; everything else turns it into training signal.