Deep learning

    Topic area: Machine Learning
    Difficulty: Introductory

    Deep learning is a subfield of Machine Learning that uses artificial neural networks with many layers — and millions to billions of parameters — to learn hierarchical representations directly from raw data. It underpins most of the recent breakthroughs in computer vision, natural language processing, speech recognition, and scientific discovery.

    Overview

    Classical machine learning relied on hand-engineered features: a practitioner would design pixel statistics, n-gram counts, or acoustic descriptors, and a relatively shallow model would map these features to outputs. Deep learning removes this bottleneck. A deep neural network learns its own features layer by layer, with each successive layer composing simpler patterns from the layer below into more abstract concepts.

    The qualifier "deep" refers to the depth of the computation graph rather than any particular biological fidelity. Modern systems routinely stack tens to hundreds of layers and rely on three coupled ingredients that became simultaneously available in the early 2010s: large labelled datasets, massively parallel hardware (GPUs and later TPUs), and stable optimisation techniques. Together they made it practical to train networks whose representational capacity dwarfs anything previously feasible.

    Deep learning is often credited with shifting AI from rule-based and feature-engineered systems toward a paradigm of end-to-end learning, in which a single differentiable model is trained jointly to map raw inputs to task outputs.

    Key Concepts

    • Hierarchical representation learning — successive layers transform the input into representations of increasing abstraction; the network discovers features rather than receiving them.
    • Distributed representations — concepts are encoded as patterns of activation across many units, allowing combinatorial generalisation that one-hot or symbolic schemes cannot match.
    • Differentiable computation — every operation is (almost everywhere) differentiable, so gradients flow through the entire model and parameters are tuned by gradient-based optimisation (a minimal sketch follows this list).
    • End-to-end training — the entire pipeline, from raw input to final prediction, is optimised against a single loss, which removes the need for hand-tuned intermediate stages.
    • Inductive biases via architecture — convolution encodes translation equivariance, recurrence encodes temporal locality, attention encodes pairwise interaction; the choice of architecture injects assumptions appropriate for the data.
    • Scale — empirical scaling laws show that loss decreases predictably as a power of model size, dataset size, and compute, motivating ever-larger models.
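
    To make the "differentiable computation" point concrete, here is a minimal sketch using PyTorch's autograd (an illustrative choice of framework, not prescribed by this article): a tiny composed function whose gradient with respect to its parameters is computed automatically by backpropagation.

        # Minimal sketch of differentiable computation via autograd (assumes PyTorch).
        import torch

        w = torch.tensor([0.5, -1.0], requires_grad=True)  # parameters
        x = torch.tensor([2.0, 3.0])                       # raw input
        y = torch.tanh(w @ x)                              # a differentiable layer
        loss = (y - 1.0) ** 2                              # scalar loss
        loss.backward()                                    # backpropagation
        print(w.grad)                                      # d(loss)/dw, computed automatically

    The same mechanism scales unchanged to networks with billions of parameters: every layer is differentiable, so one backward pass yields gradients for all of them.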

    History

    Deep learning has roots that long predate its modern dominance. The perceptron (Rosenblatt 1958) and the early multilayer models of the 1960s established the basic neuron abstraction, but were limited by the lack of an effective training procedure for hidden layers. The reinvention and popularisation of backpropagation by Rumelhart, Hinton, and Williams in 1986 made multi-layer training feasible, and Yann LeCun's LeNet (1989, refined through the 1990s) demonstrated end-to-end learning of handwritten digits with a convolutional network.

    Through the 1990s and early 2000s neural networks were largely overshadowed by support vector machines, kernel methods, and probabilistic graphical models. Renewed interest came from work on deep belief networks and unsupervised pre-training (Hinton, Salakhutdinov, Bengio, around 2006), which showed that depth was tractable if initialisation was handled carefully.

    The decisive turning point was AlexNet (Krizhevsky, Sutskever, Hinton, 2012), which won the ImageNet challenge by a wide margin and demonstrated the practical force of GPU-trained convolutional networks with Dropout and cross-entropy objectives. The years that followed saw rapid architectural progress: VGG and GoogLeNet (2014), ResNet (He et al. 2015) and its residual connections, sequence-to-sequence models with attention, and the Transformer (Vaswani et al. 2017). The Transformer in turn enabled large language models (BERT 2018, GPT-2 2019, GPT-3 2020) and modern multimodal systems.

    Key Approaches

    A typical deep model is a parameterised function $ f_\theta : \mathcal{X} \to \mathcal{Y} $ trained by minimising an empirical risk:

    $ \mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \ell\bigl(f_\theta(x_i),\, y_i\bigr) + \lambda\, R(\theta) $

    where $ \ell $ is a per-example loss (e.g. cross-entropy for classification, squared error for regression) and $ R $ is an optional regulariser. Gradients $ \nabla_\theta \mathcal{L} $ are computed by backpropagation and parameters are updated with stochastic gradient descent or adaptive methods such as Adam:

    $ \theta_{t+1} = \theta_t - \eta\, \widehat{\nabla}_\theta \mathcal{L}(\theta_t) $
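
    A NumPy sketch of the two formulas above, under illustrative assumptions: squared-error loss, an L2 regulariser $ R(\theta) = \lVert\theta\rVert^2 $, and plain full-batch gradient descent (the stochastic variant would draw a minibatch per step). All constants are arbitrary.

        # Empirical risk minimisation: L(theta) = mean squared error + lambda * ||theta||^2.
        import numpy as np

        rng = np.random.default_rng(0)
        N, d = 100, 5
        X = rng.normal(size=(N, d))        # inputs x_i
        y = X @ rng.normal(size=d)         # targets y_i (noiseless, for simplicity)
        theta = np.zeros(d)                # parameters
        eta, lam = 0.1, 1e-3               # learning rate, regularisation weight

        for step in range(200):
            residual = X @ theta - y                          # f_theta(x_i) - y_i
            loss = (residual ** 2).mean() + lam * theta @ theta
            grad = 2 * X.T @ residual / N + 2 * lam * theta   # nabla_theta L
            theta -= eta * grad                               # gradient-descent update (full batch)
        print(loss)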

    The dominant architectural families are:

    • Convolutional networks — translation-equivariant feature extractors for grid-structured data; foundational in vision.
    • Recurrent networks (LSTM, GRU) — state-carrying models for sequences, central to early speech and language work.
    • Transformers — built around the attention mechanism, where outputs are computed as $ \operatorname{Attention}(Q,K,V)=\operatorname{softmax}(QK^\top/\sqrt{d_k})V $; now the default for language and increasingly for vision and audio (a sketch follows this list).
    • Graph neural networks — generalise convolution to arbitrary graphs of nodes and edges, used for molecules, citation networks, and social graphs.
    • Autoencoders and variational autoencoders — encoder–decoder pairs trained to compress and reconstruct, useful for representation learning and generation.
    • Generative adversarial networks — a generator and discriminator trained in a minimax game to produce realistic samples.
    • Diffusion models — generative models that learn to invert a gradual noising process, dominant in modern image and video synthesis.
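
    The attention formula in the Transformer bullet above translates directly into code. A NumPy sketch with illustrative shapes (n queries, m keys/values, dimension d_k):

        # Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
        import numpy as np

        def attention(Q, K, V):
            d_k = K.shape[-1]
            scores = Q @ K.T / np.sqrt(d_k)                 # (n, m) pairwise similarities
            scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
            return weights @ V                              # weighted sum of values

        rng = np.random.default_rng(0)
        Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
        print(attention(Q, K, V).shape)                     # (4, 8)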

    Effective training depends on auxiliary techniques: careful initialisation (Xavier, He), normalisation (batch, layer, group), regularisation (Dropout, weight decay, data augmentation), and learning-rate schedules (warm-up, cosine decay). Increasingly, self-supervised objectives are used to learn general-purpose representations from unlabelled data, which are then adapted to downstream tasks via fine-tuning or transfer learning.
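
    As an example of one schedule named above, here is a sketch of linear warm-up followed by cosine decay; the constants (peak rate, warm-up steps, total steps) are illustrative.

        # Learning-rate schedule: linear warm-up, then cosine decay to zero.
        import math

        def lr_at(step, max_lr=1e-3, warmup=1000, total=100_000):
            if step < warmup:
                return max_lr * step / warmup                 # linear warm-up
            progress = (step - warmup) / (total - warmup)     # in [0, 1]
            return 0.5 * max_lr * (1 + math.cos(math.pi * progress))  # cosine decay

        print(lr_at(500), lr_at(1000), lr_at(100_000))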

    A loose taxonomy of training regimes:

    Regime                    | Signal                                 | Typical use
    Supervised                | labelled $ (x, y) $ pairs              | image classification, machine translation
    Self-supervised           | pretext task derived from $ x $ alone  | pretraining language and vision models
    Unsupervised / generative | likelihood of $ x $                    | autoencoders, diffusion, GANs
    Reinforcement             | scalar reward from an environment      | game playing, robotics, RLHF for alignment
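
    A toy illustration of the self-supervised row: a pretext task derived from $ x $ alone, here masking one coordinate of each input and fitting a linear model to predict it from the remaining coordinates. The data and setup are invented purely for illustration.

        # Self-supervised pretext task: mask coordinate 0 and predict it from the rest.
        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 4))
        X[:, 0] = X[:, 1] + X[:, 2]             # hidden structure to recover
        inputs, target = X[:, 1:], X[:, 0]      # "label" comes from x itself, no annotation
        w, *_ = np.linalg.lstsq(inputs, target, rcond=None)
        print(w)                                # recovers ~[1, 1, 0] without any human labels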

    Connections

    Deep learning sits at the intersection of several long-standing fields. As a form of Machine Learning, it inherits the bias–variance trade-off, generalisation theory, and concerns about overfitting. It is built on top of neural networks and depends critically on Backpropagation for credit assignment and on Gradient Descent (in particular Stochastic Gradient Descent) for optimisation. Classification heads typically combine a softmax output with a cross-entropy loss, while other losses are chosen to match the task structure.
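
    The softmax-plus-cross-entropy pairing mentioned above is usually computed in the numerically stable log-sum-exp form; a NumPy sketch:

        # Cross-entropy of a single example: -log softmax(logits)[label].
        import numpy as np

        def cross_entropy(logits, label):
            logits = logits - logits.max()                    # stability shift
            log_probs = logits - np.log(np.exp(logits).sum()) # log softmax
            return -log_probs[label]                          # -log p(correct class)

        print(cross_entropy(np.array([2.0, 0.5, -1.0]), label=0))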

    Architecturally, CNNs specialise the general framework to spatial data, RNNs to sequential data, and Transformers to general set- and sequence-structured data via attention. In language and search, word embeddings were an early demonstration that deep models could learn meaningful continuous representations of discrete symbols. Modern reinforcement learning and many areas of computational science now rely on deep models as drop-in function approximators.

    References

    • LeCun, Y., Bengio, Y. and Hinton, G. (2015). "Deep learning". Nature, 521, 436–444.
    • Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press.
    • Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). "Learning representations by back-propagating errors". Nature, 323, 533–536.
    • Krizhevsky, A., Sutskever, I. and Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks". NeurIPS.
    • He, K., Zhang, X., Ren, S. and Sun, J. (2016). "Deep Residual Learning for Image Recognition". CVPR.
    • Vaswani, A. et al. (2017). "Attention Is All You Need". NeurIPS.
    • Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks, 61, 85–117.