Rewards

A reward is a number that says how good one attempt was. Reinforcement learning has a single job: make the choices that earn more of it. Everything a model learns about what you want, it learns through this one channel, which is why reward design is most of the work.

Rewards come from a few places. The cleanest are verifiable: run the tests, check the answer against a known solution, confirm the code compiles. These cost nothing and can't be argued with. When the goal is fuzzier, you fall back on a reward model or an LLM judge to score the output instead.

Two shapes matter. A sparse reward arrives once, at the end, when the task either passed or didn't. A dense reward gives feedback along the way. Sparse rewards are honest but hard to learn from, because one score for a long attempt barely hints at which step mattered. That's the credit assignment problem, and building a denser signal is how you fight it.

Adding those in-between signals is reward shaping, and it cuts both ways. A well-placed hint speeds up learning. A careless one teaches the wrong lesson, like rewarding a coding agent for opening the right file until it learns to open files instead of fixing bugs.

Underneath all of it sits one hazard: a model optimizes the reward you wrote, never the intent behind it. Any gap between the two is something a strong optimizer will find. Minding that gap is the whole subject of reward hacking, and it's why a plain, hard-to-fake reward beats a clever one almost every time.