
Machine Learning

You could write a spam filter by hand. It wouldn't last a week.

Write the rule: if the subject says "FREE MONEY," mark it spam. A spammer sends "F-R-E-E money," your rule misses it, you patch it, and they adapt again. Hand-written rules rot faster than you can type them.
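Here is a minimal Python sketch of such a rule and one patch, showing how it breaks. The function names and subject lines are hypothetical and invented for illustration.

```python
import re

def is_spam_by_rule(subject: str) -> bool:
    """Hand-written rule: flag the email if the subject contains 'FREE MONEY'."""
    return "FREE MONEY" in subject.upper()

def is_spam_patched(subject: str) -> bool:
    """Patched rule: drop every non-letter first, so 'F-R-E-E' collapses to 'FREE'."""
    letters_only = re.sub(r"[^A-Z]", "", subject.upper())
    return "FREEMONEY" in letters_only

print(is_spam_by_rule("FREE MONEY inside!"))     # True
print(is_spam_by_rule("F-R-E-E money inside!"))  # False: the original rule misses it
print(is_spam_patched("F-R-E-E money inside!"))  # True: the patch catches it
print(is_spam_patched("FR33 money inside!"))     # False: the spammer adapted again
```

Each patch only handles the trick someone already anticipated.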

So flip the problem. Collect thousands of emails, each labeled spam or not, and let an optimizer tune the model's numbers (its weights) until its predictions match the labels. You never wrote the rule; the system inferred it from examples. That's machine learning in the sense Tom Mitchell defined in 1997: a program learns if its performance at a task, as measured by some metric, improves with experience. The r2d3 project's "A Visual Introduction to Machine Learning" animates that fitting process.
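Here is a minimal sketch of that recipe. It assumes a bag-of-words logistic regression trained by plain full-batch gradient descent. That is one common choice; Mitchell's definition prescribes no particular model. The six emails and their labels are invented toy data, and every name is hypothetical. Nobody writes a rule here. The loop nudges one weight per word until the predicted probabilities move toward the labels.

```python
import math

# Toy labeled data, invented for illustration: 1 = spam, 0 = not spam.
TRAIN = [
    ("free money now", 1),
    ("free prize inside", 1),
    ("win money fast", 1),
    ("meeting notes attached", 0),
    ("meeting agenda today", 0),
    ("project notes update", 0),
]

def tokenize(text: str) -> list[str]:
    return text.lower().split()

# One learnable weight per distinct word seen in training.
VOCAB = sorted({w for text, _ in TRAIN for w in tokenize(text)})

def features(text: str) -> list[float]:
    """Bag-of-words vector: 1.0 where the vocabulary word appears in the text."""
    words = set(tokenize(text))
    return [1.0 if w in words else 0.0 for w in VOCAB]

def spam_probability(weights: list[float], bias: float, text: str) -> float:
    """Logistic regression: squash a weighted sum of word features into (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features(text)))
    return 1.0 / (1.0 + math.exp(-z))

def train(epochs: int = 200, lr: float = 0.5) -> tuple[list[float], float]:
    """Full-batch gradient descent on the average log loss."""
    weights, bias = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(VOCAB), 0.0
        for text, label in TRAIN:
            # For sigmoid + log loss, d(loss)/dz is simply (prediction - label).
            error = spam_probability(weights, bias, text) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features(text))]
            grad_b += error
        n = len(TRAIN)
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

weights, bias = train()
for subject in ["free money", "meeting notes", "F-R-E-E money"]:
    p = spam_probability(weights, bias, subject)
    print(subject, "->", "spam" if p > 0.5 else "not spam")
```

On this toy set, "free money" and "meeting notes" land on opposite sides of 0.5. "F-R-E-E money" is flagged too, but only because the word "money" picked up a spam weight from these six examples. A real filter needs far more data and richer features.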

The same recipe scales. Show a model a large slice of the internet, have it predict the next word over and over, and you get the pretrained core of GPT, Claude, and Gemini.
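A much smaller model can illustrate the training objective: a bigram counter that estimates which word tends to follow which. This counting technique is a stand-in. It is not how GPT, Claude, or Gemini are built, since they use large neural networks. It only shows what "predict the next word" means as a probability distribution. The corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def next_word_distribution(counts: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn follow-counts into a probability distribution over the next word."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

corpus = "the cat sat on the mat the cat ran"
model = train_bigram(corpus)
print(next_word_distribution(model, "the"))  # 'cat' gets 2/3, 'mat' gets 1/3
```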

Not all AI learns this way. The 1966 chatbot ELIZA matched patterns and swapped in canned replies from a hand-written script; it learned nothing from data. So the terms nest: machine learning is the subset of AI that learns its rules from data, and deep learning is the subset of machine learning built on many-layer neural networks.

One myth to kill. Ask a chatbot the same question twice and the answers differ, so machine learning looks random. It isn't. A trained model with frozen weights is a deterministic function; your spam filter returns the same label for the same email every time. A language model is no different. It outputs a probability distribution over the next word, and replies vary only because the system samples from that distribution, with temperature controlling how sharp or flat it is. The learning lives in the data; the dice live in the sampler.
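Here is a minimal sketch of that split. It assumes a made-up set of next-word scores (logits) standing in for a frozen model. The stand-in `model` function always returns the same scores for the same prompt. Randomness enters only in `sample`, through `random.Random.choices`, and temperature rescales the scores before the softmax. All names and numbers here are hypothetical.

```python
import math
import random

def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Convert raw scores to probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = {tok: s / temperature for tok, s in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores a frozen model might assign to the next word.
LOGITS = {"Paris": 4.0, "Lyon": 2.0, "Nice": 1.0}

def model(prompt: str) -> dict[str, float]:
    """Stand-in for a trained network: same prompt in, same scores out."""
    return LOGITS

def greedy(prompt: str) -> str:
    """Deterministic decoding: always pick the most probable word."""
    probs = softmax_with_temperature(model(prompt), 1.0)
    return max(probs, key=probs.get)

def sample(prompt: str, temperature: float, rng: random.Random) -> str:
    """Stochastic decoding: draw a word in proportion to its probability."""
    probs = softmax_with_temperature(model(prompt), temperature)
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random()  # unseeded, so sampled output can differ between runs
prompt = "The capital of France is"
print([greedy(prompt) for _ in range(5)])           # always 'Paris'
print([sample(prompt, 1.0, rng) for _ in range(5)])  # varies from run to run
```

Greedy decoding returns the same word every time. Seeding the sampler, as with `random.Random(0)`, makes even sampled replies repeat, which shows the variation comes from the dice and not from the model.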