Deep Learning

Your code already stacks meaning. Raw bytes become integers, integers become structs, and structs become the objects you program against. Each layer speaks in richer terms than the one below, and you stop thinking in bytes long before you reach business logic.

A neural network can build that same ladder on its own. Deep learning is machine learning with neural networks stacked many layers deep. The "deep" is literal: a shallow network has one or two hidden layers, while a deep one runs dozens to hundreds. GPT-3 stacks 96 transformer layers.

Depth is what lets a network learn a hierarchy of features with nobody labeling them. The lowest layers catch raw detail; each layer up trades detail for abstraction, until the top layers work in concepts. Because the model invents these features instead of borrowing hand-written ones, the field also calls this representation learning. It's why one model can read a question in English and answer in Japanese, having picked up the shape of both from text alone.

The catch: depth alone isn't the magic. Layers pay off only with non-linear activations between them (without those, any stack of linear layers collapses into a single linear layer), plus the data and compute to train them. Very deep networks were also hard to train until residual connections (ResNet, 2015) gave gradients a shortcut path through the stack, easing the vanishing-gradient and degradation problems. Capacity you can't train is dead weight.
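
A minimal sketch of that collapse, in plain Python with no libraries. The weights w1 and w2, the input x, and the helper names are made up for illustration, and ReLU stands in for whichever activation a real network would use.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    """Apply matrix m to vector v."""
    return [sum(p * q for p, q in zip(row, v)) for row in m]


def relu(v):
    """Element-wise ReLU: negative entries become zero."""
    return [max(0.0, p) for p in v]


# Hypothetical weights for a two-layer toy network (2 inputs -> 2 hidden -> 1 output).
w1 = [[1.0, -1.0],
      [-1.0, 1.0]]
w2 = [[1.0, 1.0]]
x = [3.0, 1.0]

# Two linear layers applied one after the other...
stacked = matvec(w2, matvec(w1, x))

# ...give the same result as ONE linear layer whose weights are the product w2 * w1.
collapsed = matvec(matmul(w2, w1), x)

# A non-linearity between the layers breaks that equivalence.
with_relu = matvec(w2, relu(matvec(w1, x)))

print(stacked)    # [0.0]
print(collapsed)  # [0.0]
print(with_relu)  # [2.0]
```

With these particular weights the purely linear stack outputs 0 for every input, because the product of w2 and w1 is the zero matrix. The ReLU version computes |x[0] - x[1]|, a function no single linear layer can represent.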

So deep learning lets a network write its own abstraction layers, from raw input up to meaning.

Read on with parameters and scaling laws, or see LeCun, Bengio, and Hinton, "Deep learning," Nature (2015).