Chain of Thought

Chain of thought is a model working through a problem step by step before giving its final answer, the way you'd work out a math problem on paper instead of blurting out the result.

It started as a prompting trick: adding "let's think step by step" to a prompt made accuracy jump on multi-step problems such as arithmetic word problems, because the model used its own intermediate tokens as scratch space. Each step it writes becomes context for the next, so the reasoning compounds instead of having to happen in a single leap.
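One common way to use the trigger phrase is a two-stage prompt. First, append the phrase so the model writes out its reasoning. Then feed that reasoning back in as context and ask for the answer that follows from it. The Python sketch below only builds the prompts and parses a reply; no model is called. The function names, the `Q:`/`A:` layout, the answer suffix, and the hand-written reasoning string are hypothetical stand-ins, not any particular library's API.

```python
import re
from typing import Optional

# Illustrative wording; the exact trigger and suffix are assumptions.
COT_TRIGGER = "Let's think step by step."
ANSWER_SUFFIX = "Therefore, the final answer is"


def build_reasoning_prompt(question: str) -> str:
    """Stage 1: ask the model to write out its reasoning before answering."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def build_answer_prompt(question: str, reasoning: str) -> str:
    """Stage 2: the model's own reasoning becomes context for the answer."""
    return f"{build_reasoning_prompt(question)}\n{reasoning}\n{ANSWER_SUFFIX}"


def extract_number(completion: str) -> Optional[str]:
    """Return the first number in the final-answer completion, if any."""
    match = re.search(r"-?\d+(?:\.\d+)?", completion)
    return match.group(0) if match else None


question = "A pen costs 3 dollars. What do 4 pens and a 5-dollar notebook cost?"
# Hypothetical stand-in for what a model might generate after the trigger.
reasoning = "4 pens cost 4 * 3 = 12 dollars. Adding the notebook gives 12 + 5 = 17 dollars."
prompt = build_answer_prompt(question, reasoning)
print(extract_number(" 17 dollars."))  # prints 17
```

Splitting the call in two keeps the reasoning and the final answer separate, so the answer is easy to parse, while the second prompt still conditions on every intermediate step.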

Newer reasoning models bake this in. Through reinforcement learning, they're trained to produce long internal chains of thought before answering, and to check their own work along the way. You don't have to prompt for it; the behavior lives in the model.

This is the clearest example of test-time compute: spending more computation during inference to get a better answer, rather than training a bigger model. Give the model more room to think, and problems it would get wrong with a direct answer often become solvable.

The trade-off is cost and latency. All that thinking is tokens, which take time to generate and cost money to produce. Reasoning shines on math, code, and logic, but it's overkill for simple lookups, where a direct answer is faster and cheaper.
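The trade-off comes down to simple arithmetic, because reasoning tokens are generated like any other output. The back-of-the-envelope Python sketch below compares a direct answer with a reasoned one. It assumes reasoning tokens are billed at the output-token rate, as some providers do, and that generation runs at a constant speed. The prices, token counts, and tokens-per-second figure are made-up round numbers, not any provider's actual figures.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical dollars per million tokens; real prices vary by provider and model.
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int,
                 reasoning_tokens: int, answer_tokens: int) -> float:
    """Dollar cost of one request, assuming reasoning is billed as output."""
    output_tokens = reasoning_tokens + answer_tokens
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def generation_seconds(reasoning_tokens: int, answer_tokens: int,
                       tokens_per_second: float) -> float:
    """Rough decode time, assuming a constant generation speed."""
    return (reasoning_tokens + answer_tokens) / tokens_per_second


pricing = Pricing(input_per_million=1.0, output_per_million=4.0)

# The same 200-token question, answered directly vs. after 5,000 reasoning tokens.
direct_cost = request_cost(pricing, 200, reasoning_tokens=0, answer_tokens=50)
reasoned_cost = request_cost(pricing, 200, reasoning_tokens=5_000, answer_tokens=50)
direct_time = generation_seconds(0, 50, tokens_per_second=50.0)
reasoned_time = generation_seconds(5_000, 50, tokens_per_second=50.0)

print(f"direct:   ${direct_cost:.4f}, {direct_time:.0f} s")    # direct:   $0.0004, 1 s
print(f"reasoned: ${reasoned_cost:.4f}, {reasoned_time:.0f} s")  # reasoned: $0.0204, 101 s
```

Under these assumed numbers, the reasoned answer costs about 50 times as much and takes about 100 times as long. That gap is worth paying on a hard proof and wasted on a lookup.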