Pretraining

Pretraining is the first and by far the most data- and compute-intensive phase of training a language model. You take an enormous amount of text and train the network on one task: predict the next token. Do that across enough of the internet and the model absorbs grammar, facts, writing styles, and a surprising amount of reasoning ability.
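To make the objective concrete, here is a minimal Python sketch. It assumes the text has already been split into tokens (real models use subword tokenizers rather than whole words), and the probability table stands in for a model's output; it is invented for illustration. Every prefix of a sequence becomes a context, the token that follows it is the target, and the loss for one prediction is the negative log-probability the model assigned to the true next token.

```python
import math

def next_token_pairs(tokens):
    """Split a token sequence into (context, target) training examples.

    Every prefix is a context; the token right after it is the target.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def cross_entropy(probs, target):
    """Loss for one prediction: negative log-probability of the true next token.

    `probs` maps each candidate token to the model's predicted probability.
    """
    return -math.log(probs[target])

tokens = ["the", "cat", "sat", "on", "the", "mat"]
for context, target in next_token_pairs(tokens):
    print(context, "->", target)

# Hypothetical model output for the context ["the", "cat"] (made-up numbers).
probs = {"sat": 0.7, "ran": 0.2, "mat": 0.1}
print(round(cross_entropy(probs, "sat"), 3))  # → 0.357
```

A single six-token sentence already yields five training examples, which is why a large corpus supplies so much signal. Training pushes this loss down across all of them: a confident, correct prediction costs little, while a confident, wrong one costs a lot.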

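The training loop itself is conceptually simple; most of the hard work goes into the data. Building a good corpus takes far more than scraping pages: duplicates, spam, boilerplate, and low-quality text all have to be filtered out. Feed the model low-quality text and you get low-quality predictions back. Garbage in, garbage out.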
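As an illustration of what "cleaning" can mean, here is a small Python sketch of two common steps: exact deduplication by hashing normalized text, and a heuristic quality filter. The thresholds (`min_words`, `min_alpha_ratio`) and the heuristics themselves are hypothetical choices for this example. Production pipelines use many more signals, such as near-duplicate detection and learned quality classifiers.

```python
import hashlib

def normalize(text):
    # Lowercase and collapse whitespace so trivially different copies match.
    return " ".join(text.lower().split())

def looks_low_quality(text, min_words=5, min_alpha_ratio=0.6):
    # Hypothetical heuristics: drop very short documents and documents
    # that are mostly digits or symbols rather than letters.
    words = text.split()
    if len(words) < min_words:
        return True
    non_space = [c for c in text if not c.isspace()]
    alpha = sum(c.isalpha() for c in non_space)
    return alpha / len(non_space) < min_alpha_ratio

def clean_corpus(docs):
    seen = set()  # hashes of normalized documents already encountered
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        if looks_low_quality(doc):
            continue
        kept.append(doc)
    return kept

docs = [
    "The cat sat on the mat and purred quietly.",
    "the cat  sat on the MAT and purred quietly.",  # duplicate once normalized
    "Click here!",                                  # too short
    "$$$ 123 456 789 ### @@@ !!!",                  # mostly symbols
    "Rivers carry sediment from mountains down to the sea.",
]
print(clean_corpus(docs))
```

Only the first and last documents survive. Hashing the normalized text keeps deduplication to one set lookup per document, which matters when the corpus holds billions of documents.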

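Mechanically, pretraining compresses all that text into the model's parameters. It's a giant loop: show the model a batch of text, measure how badly it predicted each next token, nudge the weights a little so it does better, then move on to the next batch. One full pass through the dataset is called an epoch. Because pretraining corpora are so large, language models often complete only about one epoch, repeating some high-quality sources a few times, rather than cycling through the data many times.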
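The loop can be sketched at toy scale in plain Python. This is a hypothetical bigram model, not a neural network: its only parameters are a table of logits, `weights[prev][next]`, trained with minibatch gradient descent on next-token cross-entropy. The learning rate, batch size, and corpus are arbitrary illustrative choices. Because the corpus is tiny, the sketch runs many epochs, unlike real pretraining.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(token_ids, vocab_size, epochs=3, batch_size=4, lr=0.5, seed=0):
    """Toy pretraining loop: a bigram model learns to predict the next token."""
    rng = random.Random(seed)
    weights = [[0.0] * vocab_size for _ in range(vocab_size)]
    pairs = list(zip(token_ids[:-1], token_ids[1:]))  # (current, next) examples
    losses = []  # average loss of each batch, recorded before its update
    for _ in range(epochs):  # one epoch = one full pass over `pairs`
        rng.shuffle(pairs)
        for start in range(0, len(pairs), batch_size):
            batch = pairs[start:start + batch_size]
            grads = [[0.0] * vocab_size for _ in range(vocab_size)]
            batch_loss = 0.0
            for prev, nxt in batch:
                probs = softmax(weights[prev])
                batch_loss -= math.log(probs[nxt])
                # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(nxt).
                for j in range(vocab_size):
                    grads[prev][j] += probs[j] - (1.0 if j == nxt else 0.0)
            # One optimizer step: nudge every weight slightly against its gradient.
            for i in range(vocab_size):
                for j in range(vocab_size):
                    weights[i][j] -= lr * grads[i][j] / len(batch)
            losses.append(batch_loss / len(batch))
    return weights, losses

text = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(text))  # ['cat', 'mat', 'on', 'ran', 'sat', 'the']
ids = [vocab.index(w) for w in text]
weights, losses = train_bigram(ids, len(vocab), epochs=20)
print(f"steps: {len(losses)}, first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
row = weights[vocab.index("the")]
print("most likely after 'the':", vocab[row.index(max(row))])
```

The eight training pairs form two batches per epoch, so 20 epochs give 40 weight updates. The first batch's loss equals log(6), because an untrained model spreads its probability evenly over six tokens. Real pretraining follows the same shape (batch, loss, gradient, small update), with a transformer in place of the lookup table and an optimizer such as Adam in place of plain gradient descent.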

The result is a base model. It's knowledgeable but raw, good at continuing text rather than following instructions or holding a conversation. Turning it into a helpful assistant is the job of the later phases: fine-tuning and reinforcement learning.