Neural Networks
Latest revision as of 07:08, 24 April 2026
| Article | |
|---|---|
| Topic area | Deep Learning |
| Difficulty | Introductory |
Neural networks (also called artificial neural networks, or ANNs) are computational models inspired by the structure of biological nervous systems. They consist of interconnected layers of simple processing units called neurons (or nodes) and form the basis of modern deep learning.
Biological inspiration
The biological neuron receives electrical signals through its dendrites, integrates them in the cell body, and, if the combined signal exceeds a threshold, fires an output signal along its axon to downstream neurons. Artificial neural networks abstract this process: each artificial neuron computes a weighted sum of its inputs, adds a bias term, and passes the result through a nonlinear activation function.
While the analogy to biology motivated early research, modern neural networks are best understood as flexible parameterised function approximators rather than faithful brain simulations.
The perceptron
The perceptron, introduced by Frank Rosenblatt in 1958, is the simplest neural network. It computes:
- $ y = \sigma\!\left(\sum_{i=1}^{n} w_i x_i + b\right) = \sigma(\mathbf{w}^\top \mathbf{x} + b) $
where $ \mathbf{x} $ is the input vector, $ \mathbf{w} $ are learnable weights, $ b $ is a bias, and $ \sigma $ is a step function that outputs 1 if the argument is positive and 0 otherwise. The perceptron can learn any linearly separable function but famously cannot represent the XOR function — a limitation that stalled neural-network research for over a decade.
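As a minimal illustration, the perceptron can be written in a few lines of NumPy. The weights below are chosen by hand to realise the linearly separable AND function; for XOR, no such choice of $ \mathbf{w} $ and $ b $ exists.

```python
import numpy as np

def perceptron(x, w, b):
    # Step activation: output 1 if the weighted sum plus bias is positive, else 0.
    return 1 if np.dot(w, x) + b > 0 else 0

# Hand-picked weights that realise logical AND (linearly separable).
w, b = np.array([1.0, 1.0]), -1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron(np.array(x), w, b) for x in inputs]
print(outputs)  # AND truth table: [0, 0, 0, 1]
```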
Feedforward networks
A feedforward neural network (also called a multilayer perceptron, or MLP) stacks multiple layers of neurons. Information flows in one direction — from the input layer through one or more hidden layers to the output layer.
For a network with one hidden layer, the computation is:
- $ \mathbf{h} = g(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1) $
- $ \mathbf{y} = f(\mathbf{W}_2 \mathbf{h} + \mathbf{b}_2) $
where $ g $ and $ f $ are activation functions, $ \mathbf{W}_1, \mathbf{W}_2 $ are weight matrices, and $ \mathbf{b}_1, \mathbf{b}_2 $ are bias vectors. The hidden layer enables the network to learn nonlinear relationships that a single perceptron cannot capture.
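A direct transcription of these two equations, with $ g = \tanh $ and $ f $ the identity (both purely illustrative choices, as are the layer sizes), might look like:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)  # hidden layer: h = g(W1 x + b1)
    y = W2 @ h + b2           # output layer: y = f(W2 h + b2), with f = identity
    return y

rng = np.random.default_rng(seed=0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)  # 2 inputs -> 4 hidden units
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 4 hidden units -> 1 output

y = forward(np.array([0.5, -1.0]), W1, b1, W2, b2)
print(y.shape)  # (1,)
```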
Networks with many hidden layers are called deep neural networks, and training them is the subject of deep learning.
Activation functions
The activation function introduces nonlinearity; without it, a multi-layer network would collapse to a single linear transformation. Common choices include:
| Function | Formula | Range | Notes |
|---|---|---|---|
| Sigmoid | $ \sigma(z) = \frac{1}{1+e^{-z}} $ | (0, 1) | Historically popular; suffers from vanishing gradients |
| Tanh | $ \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} $ | (−1, 1) | Zero-centred; still saturates for large inputs |
| ReLU | $ \max(0, z) $ | [0, ∞) | Default choice in modern networks; can cause "dead neurons" |
| Leaky ReLU | $ \max(\alpha z, z) $ for small $ \alpha > 0 $ | (−∞, ∞) | Addresses the dead-neuron problem |
| Softmax | $ \frac{e^{z_i}}{\sum_j e^{z_j}} $ | (0, 1) | Used in output layer for multi-class classification |
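The table's entries translate directly to NumPy; one possible vectorised sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # range (0, 1)

def relu(z):
    return np.maximum(0.0, z)         # range [0, inf)

def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)   # small negative slope alpha

def softmax(z):
    e = np.exp(z - np.max(z))         # shift by max for numerical stability
    return e / e.sum()                # entries are positive and sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(softmax(z).sum())
```

Subtracting the maximum before exponentiating in `softmax` leaves the result unchanged mathematically but avoids overflow for large inputs.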
Universal approximation theorem
The universal approximation theorem (Cybenko 1989, Hornik 1991) states that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of $ \mathbb{R}^n $ to arbitrary accuracy, provided the activation function satisfies mild conditions (e.g. is non-constant, bounded, and continuous).
This theorem guarantees the existence of a good approximation but says nothing about how to find it — in practice, training deep networks with many layers is far more effective than using a single wide layer.
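A small numerical illustration (not a proof): a single wide hidden layer with fixed random tanh units, whose output weights are fitted by linear least squares, can closely match a continuous target such as $ \sin $ on a compact interval. The hidden-unit count and weight scales below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Target: a continuous function on a compact interval.
x = np.linspace(-np.pi, np.pi, 100)
target = np.sin(x)

# One wide hidden layer with fixed random weights; only the output
# weights are fitted, via linear least squares.
n_hidden = 200
W = rng.normal(scale=2.0, size=n_hidden)
b = rng.uniform(-np.pi, np.pi, size=n_hidden)
H = np.tanh(np.outer(x, W) + b)           # hidden activations, shape (100, 200)

c, *_ = np.linalg.lstsq(H, target, rcond=None)
approx = H @ c
err = np.max(np.abs(approx - target))
print(err)
```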
Training overview
Training a neural network involves:
1. Defining a loss function — a measure of how far the network's predictions are from the true targets (see Loss Functions).
2. Forward pass — computing the output of the network for a given input by propagating values layer by layer.
3. Backward pass (backpropagation) — computing the gradient of the loss with respect to every weight by applying the chain rule in reverse through the network (see Backpropagation).
4. Parameter update — adjusting the weights using an optimisation algorithm such as Gradient Descent or one of its variants.
5. Iteration — repeating steps 2–4 over many passes (epochs) through the training data.
Successful training also requires attention to initialisation (e.g. Xavier or He schemes), regularisation (to prevent overfitting), and hyperparameter tuning (learning rate, batch size, network architecture).
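The training loop above can be sketched end to end with manual backpropagation for a tiny network. This is a toy illustration, not a production recipe: the XOR targets, mean-squared-error loss, network width, and learning rate are all arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5                                          # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for epoch in range(2000):
    # Forward pass and loss.
    h = np.tanh(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    losses.append(np.mean((y - t) ** 2))
    # Backward pass: chain rule applied layer by layer.
    dy = 2 * (y - t) / len(X) * y * (1 - y)       # grad w.r.t. output pre-activation
    dW2, db2 = h.T @ dy, dy.sum(axis=0)
    dh = (dy @ W2.T) * (1 - h ** 2)               # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    # Gradient-descent parameter update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(losses[0], losses[-1])  # initial vs final loss
```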
Common architectures
Beyond the basic feedforward network, several specialised architectures have been developed:
- Convolutional Neural Networks (CNNs) — designed for grid-structured data such as images, using local connectivity and weight sharing.
- Recurrent Neural Networks (RNNs) — designed for sequential data, with connections that form cycles to maintain hidden state.
- Transformers — attention-based architectures that have become dominant in natural language processing and increasingly in vision.
- Autoencoders — networks trained to reconstruct their input, used for dimensionality reduction and generative modelling.
- Generative adversarial networks (GANs) — pairs of networks (generator and discriminator) trained in competition to generate realistic data.
Applications
Neural networks are applied across a vast range of domains:
- Computer vision (image classification, object detection, segmentation)
- Natural language processing (translation, summarisation, question answering)
- Speech recognition and synthesis
- Game playing (AlphaGo, Atari agents)
- Scientific discovery (protein folding, drug design, weather prediction)
- Autonomous vehicles and robotics
See also
- Gradient Descent
- Backpropagation
- Loss Functions
- Convolutional Neural Networks
- Recurrent Neural Networks
- Overfitting and Regularization
References
- Rosenblatt, F. (1958). "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain". Psychological Review.
- Cybenko, G. (1989). "Approximation by Superpositions of a Sigmoidal Function". Mathematics of Control, Signals, and Systems.
- Hornik, K. (1991). "Approximation Capabilities of Multilayer Feedforward Networks". Neural Networks.
- LeCun, Y., Bengio, Y. and Hinton, G. (2015). "Deep learning". Nature, 521, 436–444.
- Goodfellow, I., Bengio, Y. and Courville, A. (2016). Deep Learning. MIT Press.