Temperature

Temperature is the dial you reach for most often when using a model. It controls how random the output is.

At each step the model produces a probability for every possible next token. Temperature reshapes those probabilities before one is picked. Low temperature sharpens the distribution toward the single most likely token, so output gets focused and repeatable. High temperature flattens it, giving less likely tokens a real chance, so output gets more varied and surprising.
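One common way to implement this reshaping is to divide the model's raw scores (logits) by the temperature before turning them into probabilities with a softmax. The sketch below assumes that approach. The function name `softmax_with_temperature` and the logit values are hypothetical, chosen only for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by temperature."""
    if temperature <= 0:
        # Zero is usually handled separately as "always take the top token".
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens (not taken from a real model).
tokens = ["mat", "floor", "couch", "table"]
logits = [2.0, 1.3, 0.6, 0.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(p, 3) for tok, p in zip(tokens, probs)})
```

Running it should show the top token taking almost all of the probability at 0.2 and the four tokens moving closer together at 2.0.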

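Figure (interactive demo): next-token probabilities for the prompt "The cat sat on the ___" at temperature 0.8. Candidates shown: mat 52%, floor 25%, couch 13%, then table, bed, and chair (percentages not shown). Caption: "Balanced: a mix of likely and diverse tokens."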

Set it to zero and the model becomes nearly deterministic, almost always taking the top choice. That's what you want for extraction, classification, or anything where you need the same answer twice. Turn it up and you get range, which helps for brainstorming or creative writing, at the cost of more mistakes and drift.
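The difference between zero and higher settings can be sketched with a small sampler. This is a minimal illustration, assuming temperature 0 is treated as a plain argmax (dividing by zero is not possible, so samplers typically special-case it). The helper `sample_token` and the logits are hypothetical.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Return the index of the chosen token."""
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    if temperature == 0:
        # Greedy choice: always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized probabilities
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits for illustration only.
tokens = ["mat", "floor", "couch", "table"]
logits = [2.0, 1.3, 0.6, 0.0]
rng = random.Random(0)  # fixed seed so runs are repeatable

greedy = {tokens[sample_token(logits, 0, rng)] for _ in range(20)}
varied = {tokens[sample_token(logits, 1.5, rng)] for _ in range(200)}
print(greedy)  # {'mat'}
print(varied)
```

At zero every call returns the same token, which is the repeatability you want for extraction or classification. At 1.5 the set of tokens seen across calls grows, which is the range you want for brainstorming.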

There's no single correct value. It's a knob you tune to the task. A good default sits in the middle, and you move it based on whether you're optimizing for reliability or for variety.

Temperature is one of a few inference-time sampling settings. You'll also meet top-p and top-k, which limit the pool of candidate tokens the model can pick from. Whether that trimming happens before or after temperature does its reshaping depends on the implementation.
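To make the pool-limiting idea concrete, here is a rough sketch of both filters acting on a list of probabilities. It shows only the trimming step, not where it sits relative to temperature. The function names and the example probabilities are illustrative assumptions, not any particular library's API.

```python
def _renormalize(probs, keep):
    """Zero out tokens outside `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

# Illustrative distribution over four candidate tokens.
probs = [0.52, 0.25, 0.13, 0.10]
print(top_k_filter(probs, 2))    # only the two most likely tokens survive
print(top_p_filter(probs, 0.8))  # the three most likely tokens survive
```

Top-k always keeps a fixed number of candidates, while top-p keeps more candidates when the distribution is flat and fewer when one token dominates.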