<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Neural_Networks</id>
	<title>Neural Networks - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Neural_Networks"/>
	<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Neural_Networks&amp;action=history"/>
	<updated>2026-04-24T11:53:22Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://marovi.ai/index.php?title=Neural_Networks&amp;diff=2140&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (8c92aeb)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Neural_Networks&amp;diff=2140&amp;oldid=prev"/>
		<updated>2026-04-24T07:08:59Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (8c92aeb)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:08, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l111&quot;&gt;Line 111:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 111:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Neural Networks]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Neural Networks]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2105:rev-2140 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Neural_Networks&amp;diff=2105&amp;oldid=prev</id>
		<title>DeployBot: Pass 2 force re-parse</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Neural_Networks&amp;diff=2105&amp;oldid=prev"/>
		<updated>2026-04-24T07:01:02Z</updated>

		<summary type="html">&lt;p&gt;Pass 2 force re-parse&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:01, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l112&quot;&gt;Line 112:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 112:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Neural Networks]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Neural Networks]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2068:rev-2105 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Neural_Networks&amp;diff=2068&amp;oldid=prev</id>
		<title>DeployBot: Force re-parse after Math source-mode rollout (v1.2.0)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Neural_Networks&amp;diff=2068&amp;oldid=prev"/>
		<updated>2026-04-24T06:58:25Z</updated>

		<summary type="html">&lt;p&gt;Force re-parse after Math source-mode rollout (v1.2.0)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 06:58, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l111&quot;&gt;Line 111:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 111:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Neural Networks]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Neural Networks]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-1989:rev-2068 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Neural_Networks&amp;diff=1989&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (775ba6e)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Neural_Networks&amp;diff=1989&amp;oldid=prev"/>
		<updated>2026-04-24T04:01:43Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (775ba6e)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{LanguageBar | page = Neural Networks}}&lt;br /&gt;
{{ArticleInfobox | topic_area = Deep Learning | difficulty = Introductory | prerequisites = }}&lt;br /&gt;
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Neural networks&amp;#039;&amp;#039;&amp;#039; (also called &amp;#039;&amp;#039;&amp;#039;artificial neural networks&amp;#039;&amp;#039;&amp;#039;, or ANNs) are computational models inspired by the structure of biological nervous systems. They consist of interconnected layers of simple processing units called &amp;#039;&amp;#039;&amp;#039;neurons&amp;#039;&amp;#039;&amp;#039; (or nodes) and form the basis of modern deep learning.&lt;br /&gt;
&lt;br /&gt;
== Biological inspiration ==&lt;br /&gt;
&lt;br /&gt;
The biological neuron receives electrical signals through its &amp;#039;&amp;#039;&amp;#039;dendrites&amp;#039;&amp;#039;&amp;#039;, integrates them in the &amp;#039;&amp;#039;&amp;#039;cell body&amp;#039;&amp;#039;&amp;#039;, and, if the combined signal exceeds a threshold, fires an output signal along its &amp;#039;&amp;#039;&amp;#039;axon&amp;#039;&amp;#039;&amp;#039; to downstream neurons. Artificial neural networks abstract this process: each artificial neuron computes a weighted sum of its inputs, adds a bias term, and passes the result through a nonlinear &amp;#039;&amp;#039;&amp;#039;activation function&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
While the analogy to biology motivated early research, modern neural networks are best understood as flexible parameterised function approximators rather than faithful brain simulations.&lt;br /&gt;
&lt;br /&gt;
== The perceptron ==&lt;br /&gt;
&lt;br /&gt;
The &amp;#039;&amp;#039;&amp;#039;perceptron&amp;#039;&amp;#039;&amp;#039;, introduced by Frank Rosenblatt in 1958, is the simplest neural network: a single artificial neuron acting as a binary classifier. It computes:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;y = \sigma\!\left(\sum_{i=1}^{n} w_i x_i + b\right) = \sigma(\mathbf{w}^\top \mathbf{x} + b)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\mathbf{x}&amp;lt;/math&amp;gt; is the input vector, &amp;lt;math&amp;gt;\mathbf{w}&amp;lt;/math&amp;gt; are learnable weights, &amp;lt;math&amp;gt;b&amp;lt;/math&amp;gt; is a bias, and &amp;lt;math&amp;gt;\sigma&amp;lt;/math&amp;gt; is a step function that outputs 1 if the argument is positive and 0 otherwise. The perceptron can learn any linearly separable function but famously cannot represent the XOR function — a limitation, highlighted by Minsky and Papert in 1969, that stalled neural-network research for over a decade.&lt;br /&gt;
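&lt;br /&gt;
The computation above is easy to sketch in NumPy. The code below is a minimal illustration; the AND-gate weights are hand-picked for this example and do not come from the text above:&lt;br /&gt;

```python
import numpy as np

def step(z):
    # Heaviside step: 1.0 if the argument is positive, else 0.0
    return float(z > 0)

def perceptron(x, w, b):
    # y = step(w . x + b)
    return step(np.dot(w, x) + b)

# An AND gate is linearly separable, so a perceptron can represent it
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(x, perceptron(np.array(x), w, b))
```

With these weights the unit outputs 1 only for the input (1, 1); by contrast, no choice of weights and bias reproduces XOR, because no single line separates its two classes.&lt;br /&gt;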
&lt;br /&gt;
== Feedforward networks ==&lt;br /&gt;
&lt;br /&gt;
A &amp;#039;&amp;#039;&amp;#039;feedforward neural network&amp;#039;&amp;#039;&amp;#039; (also called a &amp;#039;&amp;#039;&amp;#039;multilayer perceptron&amp;#039;&amp;#039;&amp;#039;, or MLP) stacks multiple layers of neurons. Information flows in one direction — from the &amp;#039;&amp;#039;&amp;#039;input layer&amp;#039;&amp;#039;&amp;#039; through one or more &amp;#039;&amp;#039;&amp;#039;hidden layers&amp;#039;&amp;#039;&amp;#039; to the &amp;#039;&amp;#039;&amp;#039;output layer&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
For a network with one hidden layer, the computation is:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{h} = g(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{y} = f(\mathbf{W}_2 \mathbf{h} + \mathbf{b}_2)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;g&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; are activation functions, &amp;lt;math&amp;gt;\mathbf{W}_1, \mathbf{W}_2&amp;lt;/math&amp;gt; are weight matrices, and &amp;lt;math&amp;gt;\mathbf{b}_1, \mathbf{b}_2&amp;lt;/math&amp;gt; are bias vectors. The hidden layer enables the network to learn nonlinear relationships that a single perceptron cannot capture.&lt;br /&gt;
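&lt;br /&gt;
The two equations translate line for line into NumPy. This is a sketch only; the layer sizes (3 inputs, 4 hidden units, 2 outputs) and the choice of tanh and identity activations are arbitrary illustrative assumptions:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: x in R^3 -> h in R^4 -> y in R^2
W1 = rng.normal(size=(4, 3))   # hidden-layer weight matrix
b1 = np.zeros(4)               # hidden-layer bias vector
W2 = rng.normal(size=(2, 4))   # output-layer weight matrix
b2 = np.zeros(2)               # output-layer bias vector

g = np.tanh                    # hidden activation
f = lambda z: z                # output activation (identity, e.g. regression)

x = rng.normal(size=3)
h = g(W1 @ x + b1)             # h = g(W1 x + b1)
y = f(W2 @ h + b2)             # y = f(W2 h + b2)
print(h.shape, y.shape)        # (4,) (2,)
```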
&lt;br /&gt;
Networks with many hidden layers are called &amp;#039;&amp;#039;&amp;#039;deep&amp;#039;&amp;#039;&amp;#039; neural networks, and training them is the subject of &amp;#039;&amp;#039;&amp;#039;deep learning&amp;#039;&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
== Activation functions ==&lt;br /&gt;
&lt;br /&gt;
The activation function introduces nonlinearity; without it, a multi-layer network would collapse to a single linear transformation. Common choices include:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Function !! Formula !! Range !! Notes&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Sigmoid&amp;#039;&amp;#039;&amp;#039; || &amp;lt;math&amp;gt;\sigma(z) = \frac{1}{1+e^{-z}}&amp;lt;/math&amp;gt; || (0, 1) || Historically popular; suffers from vanishing gradients&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Tanh&amp;#039;&amp;#039;&amp;#039; || &amp;lt;math&amp;gt;\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}&amp;lt;/math&amp;gt; || (−1, 1) || Zero-centred; still saturates for large inputs&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;ReLU&amp;#039;&amp;#039;&amp;#039; || &amp;lt;math&amp;gt;\max(0, z)&amp;lt;/math&amp;gt; || [0, ∞) || Default choice in modern networks; can cause &amp;quot;dead neurons&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Leaky ReLU&amp;#039;&amp;#039;&amp;#039; || &amp;lt;math&amp;gt;\max(\alpha z, z)&amp;lt;/math&amp;gt; for small &amp;lt;math&amp;gt;\alpha &amp;gt; 0&amp;lt;/math&amp;gt; || (−∞, ∞) || Addresses the dead-neuron problem&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Softmax&amp;#039;&amp;#039;&amp;#039; || &amp;lt;math&amp;gt;\frac{e^{z_i}}{\sum_j e^{z_j}}&amp;lt;/math&amp;gt; || (0, 1) || Used in the output layer for multi-class classification&lt;br /&gt;
|}&lt;br /&gt;
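&lt;br /&gt;
Each of these functions is a one-liner in NumPy; the sketch below is illustrative and not taken from any particular library:&lt;br /&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)

def softmax(z):
    # subtracting the max does not change the result but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))      # [0. 0. 3.]
print(softmax(z))   # three positive values summing to 1
```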
&lt;br /&gt;
== Universal approximation theorem ==&lt;br /&gt;
&lt;br /&gt;
The &amp;#039;&amp;#039;&amp;#039;universal approximation theorem&amp;#039;&amp;#039;&amp;#039; (Cybenko 1989, Hornik 1991) states that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of &amp;lt;math&amp;gt;\mathbb{R}^n&amp;lt;/math&amp;gt; to arbitrary accuracy, provided the activation function satisfies mild conditions (e.g. is non-constant, bounded, and continuous).&lt;br /&gt;
&lt;br /&gt;
This theorem guarantees the &amp;#039;&amp;#039;existence&amp;#039;&amp;#039; of a good approximation but says nothing about how to &amp;#039;&amp;#039;find&amp;#039;&amp;#039; it — in practice, training deep networks with many layers is far more effective than using a single wide layer.&lt;br /&gt;
&lt;br /&gt;
== Training overview ==&lt;br /&gt;
&lt;br /&gt;
Training a neural network involves:&lt;br /&gt;
&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Defining a loss function&amp;#039;&amp;#039;&amp;#039; — a measure of how far the network&amp;#039;s predictions are from the true targets (see [[Loss Functions]]).&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Forward pass&amp;#039;&amp;#039;&amp;#039; — computing the output of the network for a given input by propagating values layer by layer.&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Backward pass (backpropagation)&amp;#039;&amp;#039;&amp;#039; — computing the gradient of the loss with respect to every weight by applying the chain rule in reverse through the network (see [[Backpropagation]]).&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Parameter update&amp;#039;&amp;#039;&amp;#039; — adjusting the weights using an optimisation algorithm such as [[Gradient Descent]] or one of its variants.&lt;br /&gt;
# &amp;#039;&amp;#039;&amp;#039;Iteration&amp;#039;&amp;#039;&amp;#039; — repeating steps 2–4 over many passes (epochs) through the training data.&lt;br /&gt;
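&lt;br /&gt;
As a sketch, the five steps above fit in a short NumPy loop. Everything here (the toy sine-regression task, the 1-16-1 architecture, the learning rate) is an illustrative assumption, and the gradients are written out by hand rather than taken from a framework:&lt;br /&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn y = sin(x) on [-pi, pi]
X = rng.uniform(-np.pi, np.pi, size=(256, 1))
Y = np.sin(X)

# A 1-16-1 network with tanh hidden units and a linear output
W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)
lr = 0.05
losses = []

for epoch in range(500):                  # step 5: iterate over epochs
    H = np.tanh(X @ W1 + b1)              # step 2: forward pass (hidden)
    P = H @ W2 + b2                       #         forward pass (output)
    losses.append(np.mean((P - Y) ** 2))  # step 1: mean squared error loss
    dP = 2 * (P - Y) / len(X)             # step 3: backward pass
    dW2 = H.T @ dP;  db2 = dP.sum(axis=0)
    dH = (dP @ W2.T) * (1 - H ** 2)       # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dH;  db1 = dH.sum(axis=0)
    W1 -= lr * dW1;  b1 -= lr * db1       # step 4: gradient-descent update
    W2 -= lr * dW2;  b2 -= lr * db2

print(losses[0], losses[-1])              # initial vs final loss
```

The same structure underlies real training code; frameworks replace the hand-derived backward pass with automatic differentiation and plain gradient descent with adaptive optimisers.&lt;br /&gt;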
&lt;br /&gt;
Successful training also requires attention to &amp;#039;&amp;#039;&amp;#039;initialisation&amp;#039;&amp;#039;&amp;#039; (e.g. Xavier or He schemes), &amp;#039;&amp;#039;&amp;#039;regularisation&amp;#039;&amp;#039;&amp;#039; (to prevent [[Overfitting and Regularization|overfitting]]), and &amp;#039;&amp;#039;&amp;#039;hyperparameter tuning&amp;#039;&amp;#039;&amp;#039; (learning rate, batch size, network architecture).&lt;br /&gt;
&lt;br /&gt;
== Common architectures ==&lt;br /&gt;
&lt;br /&gt;
Beyond the basic feedforward network, several specialised architectures have been developed:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;[[Convolutional Neural Networks]]&amp;#039;&amp;#039;&amp;#039; (CNNs) — designed for grid-structured data such as images, using local connectivity and weight sharing.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;[[Recurrent Neural Networks]]&amp;#039;&amp;#039;&amp;#039; (RNNs) — designed for sequential data, with connections that form cycles to maintain hidden state.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Transformers&amp;#039;&amp;#039;&amp;#039; — attention-based architectures that have become dominant in natural language processing and increasingly in vision.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Autoencoders&amp;#039;&amp;#039;&amp;#039; — networks trained to reconstruct their input, used for dimensionality reduction and generative modelling.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Generative adversarial networks&amp;#039;&amp;#039;&amp;#039; (GANs) — pairs of networks (generator and discriminator) trained in competition to generate realistic data.&lt;br /&gt;
&lt;br /&gt;
== Applications ==&lt;br /&gt;
&lt;br /&gt;
Neural networks are applied across a vast range of domains:&lt;br /&gt;
&lt;br /&gt;
* Computer vision (image classification, object detection, segmentation)&lt;br /&gt;
* Natural language processing (translation, summarisation, question answering)&lt;br /&gt;
* Speech recognition and synthesis&lt;br /&gt;
* Game playing (AlphaGo, Atari agents)&lt;br /&gt;
* Scientific discovery (protein folding, drug design, weather prediction)&lt;br /&gt;
* Autonomous vehicles and robotics&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[Gradient Descent]]&lt;br /&gt;
* [[Backpropagation]]&lt;br /&gt;
* [[Loss Functions]]&lt;br /&gt;
* [[Convolutional Neural Networks]]&lt;br /&gt;
* [[Recurrent Neural Networks]]&lt;br /&gt;
* [[Overfitting and Regularization]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
* Rosenblatt, F. (1958). &amp;quot;The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain&amp;quot;. &amp;#039;&amp;#039;Psychological Review&amp;#039;&amp;#039;.&lt;br /&gt;
* Cybenko, G. (1989). &amp;quot;Approximation by Superpositions of a Sigmoidal Function&amp;quot;. &amp;#039;&amp;#039;Mathematics of Control, Signals, and Systems&amp;#039;&amp;#039;.&lt;br /&gt;
* Hornik, K. (1991). &amp;quot;Approximation Capabilities of Multilayer Feedforward Networks&amp;quot;. &amp;#039;&amp;#039;Neural Networks&amp;#039;&amp;#039;.&lt;br /&gt;
* LeCun, Y., Bengio, Y. and Hinton, G. (2015). &amp;quot;Deep learning&amp;quot;. &amp;#039;&amp;#039;Nature&amp;#039;&amp;#039;, 521, 436–444.&lt;br /&gt;
* Goodfellow, I., Bengio, Y. and Courville, A. (2016). &amp;#039;&amp;#039;Deep Learning&amp;#039;&amp;#039;. MIT Press.&lt;br /&gt;
&lt;br /&gt;
[[Category:Deep Learning]]&lt;br /&gt;
[[Category:Introductory]]&lt;br /&gt;
[[Category:Neural Networks]]&lt;/div&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
</feed>