Credit Assignment

Credit assignment is one of the deepest problems in learning of any kind. The reward shows up at the end, but many actions led there. Which ones deserve the credit?

Lose a 40-move chess game. Which move was the blunder? An agent works for an hour, taking hundreds of steps to read files, search, and edit, and the result disappoints. Which step caused it? A single score for the whole episode gives the optimizer almost nothing to point at.

The reward signal in reinforcement learning is often sparse (one judgment for a long sequence) and delayed (it arrives long after the action that mattered). With nothing finer to go on, the optimizer spreads the same update across every step, reinforcing or penalizing good and bad actions alike, and that smearing is a major source of RL's strange side effects.
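A minimal sketch of the problem, assuming a plain REINFORCE-style update in which each step is weighted by its discounted return-to-go, G_t = r_t + gamma * G_{t+1}. The function name and the five-step episode are hypothetical. With one score at the end, every step gets the same weight when gamma = 1, and with gamma < 1 the weight depends only on how far the step is from the end, never on which step actually mattered.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * G_{t+1} for every step t (a REINFORCE-style weight)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step accumulates all the reward that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 5-step episode judged only at the end: -1 for a disappointing result.
sparse = [0.0, 0.0, 0.0, 0.0, -1.0]

print(returns_to_go(sparse))             # [-1.0, -1.0, -1.0, -1.0, -1.0]
print(returns_to_go(sparse, gamma=0.5))  # [-0.0625, -0.125, -0.25, -0.5, -1.0]
```

Discounting makes the delay visible: the earliest steps receive the weakest signal even if one of them contained the blunder.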

Most recent progress in post-training is, at heart, credit assignment engineering: getting feedback closer to the moment that earned it. A judge that grades one message, not a whole session, narrows the blame. A hint placed at the exact step that went wrong is narrower still.
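To make the narrowing concrete, here is a toy sketch under stated assumptions: a hypothetical session of 3 messages with 4 steps each, a single mistake at step 6, and credit modeled as a score spread uniformly over whatever span the feedback covers. The helper names `spread` and `blame_share` are illustrative, not from any library. The script measures how much of the total blame lands on the step that actually went wrong.

```python
from typing import List


def spread(score: float, span: range, num_steps: int) -> List[float]:
    """Assign `score` uniformly to every step in `span`; all other steps get 0."""
    weights = [0.0] * num_steps
    for t in span:
        weights[t] = score
    return weights


def blame_share(weights: List[float], bad_step: int) -> float:
    """Fraction of total |credit| that lands on the step that actually went wrong."""
    total = sum(abs(w) for w in weights)
    return abs(weights[bad_step]) / total if total else 0.0


# Hypothetical session: 3 messages x 4 steps each; step 6 (in message 1) is the mistake.
STEPS_PER_MESSAGE = 4
NUM_STEPS = 3 * STEPS_PER_MESSAGE
BAD_STEP = 6
bad_message = BAD_STEP // STEPS_PER_MESSAGE

# Three feedback granularities: whole session, one message, one step.
session_judge = spread(-1.0, range(NUM_STEPS), NUM_STEPS)
message_judge = spread(
    -1.0,
    range(bad_message * STEPS_PER_MESSAGE, (bad_message + 1) * STEPS_PER_MESSAGE),
    NUM_STEPS,
)
step_hint = spread(-1.0, range(BAD_STEP, BAD_STEP + 1), NUM_STEPS)

for name, w in [("session", session_judge), ("message", message_judge), ("step", step_hint)]:
    print(f"{name:8s} blame on bad step: {blame_share(w, BAD_STEP):.3f}")
# session  blame on bad step: 0.083
# message  blame on bad step: 0.250
# step     blame on bad step: 1.000
```

Moving from a session-level score to a message-level judge to a step-level hint raises the share from 1/12 to 1/4 to 1. Practical RL setups usually weight steps with advantage estimates rather than a uniform spread, so treat the numbers as an illustration of granularity only.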

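Worth noting: backpropagation already solves credit assignment across a network's structure: within one forward pass, the chain rule tells each weight how much it contributed to the loss. The hard, open version is credit assignment across time, over many sequential actions whose outcome is judged only later, which is what this page is about.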