Definition
A neural network is a computational model loosely inspired by the human brain, consisting of layers of interconnected artificial neurons (nodes). Each neuron receives inputs, multiplies them by learned weights, adds a bias, passes the result through an activation function, and sends its output to the next layer. Through training with backpropagation, neural networks learn to recognize patterns, make predictions, and generate outputs from data.
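As a sketch, the per-neuron computation looks like this in NumPy (the input, weight, and bias values below are arbitrary placeholders, not learned parameters):

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum of inputs plus bias, then ReLU."""
    z = np.dot(w, x) + b      # weighted sum plus bias
    return max(0.0, z)        # activation function: ReLU, f(z) = max(0, z)

# Arbitrary example values, for illustration only
x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
w = np.array([0.8, 0.1, -0.4])   # weights (learned during training)
b = 0.2                          # bias (learned during training)
print(neuron(x, w, b))           # this neuron's output, passed to the next layer
```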
Why it matters
Neural networks are the foundation of modern AI:
- Universal approximators — can approximate any continuous function to arbitrary accuracy given enough neurons (Hornik et al., 1989)
- Feature learning — automatically discover relevant patterns in data
- Scalability — performance improves with more data and compute
- Versatility — vision, language, speech, games, science, and more
- State-of-the-art — power today's leading AI systems, including LLMs
From image recognition to language generation, neural networks dominate AI.
How it works
┌────────────────────────────────────────────────────────────┐
│ NEURAL NETWORK │
├────────────────────────────────────────────────────────────┤
│ │
│ STRUCTURE OF A FEEDFORWARD NETWORK: │
│ ─────────────────────────────────── │
│ │
│ Input Layer Hidden Layers Output Layer │
│ │ │ │ │
│ ○ ─────┬────► ○ ────┬────► ○ ────┬────► ○ │
│ │ │ │ │ │ │ │ │
│ ○ ─────┼────► ○ ────┼────► ○ ────┼────► ○ │
│ │ │ │ │ │ │ │ │
│ ○ ─────┴────► ○ ────┴────► ○ ────┴────► (output) │
│ │
│ x₁,x₂,x₃ h₁,h₂,h₃ h₄,h₅,h₆ ŷ │
│ │
│ SINGLE NEURON: │
│ ────────────── │
│ │
│ ┌────────────────────────────────────────────────┐ │
│ │ Inputs Weights Sum + Bias Activation│ │
│ │ │ │
│ │ x₁ ──────► w₁ ──┐ │ │
│ │ │ │ │
│ │ x₂ ──────► w₂ ──┼──► Σ + b ──► f(·) ──► y │ │
│ │ │ │ │
│ │ x₃ ──────► w₃ ──┘ │ │
│ │ │ │
│ │ y = f(w₁x₁ + w₂x₂ + w₃x₃ + b) │ │
│ └────────────────────────────────────────────────┘ │
│ │
│ COMMON ACTIVATION FUNCTIONS: │
│ ──────────────────────────── │
│ │
│ ReLU: f(x) = max(0, x) ___/ │
│ Sigmoid: f(x) = 1/(1+e⁻ˣ) _/⁻⁻ │
│ Tanh: f(x) = (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) _/‾ │
│ Softmax: Probability distribution (for classification) │
│ │
│ NETWORK TYPES: │
│ ────────────── │
│ Feedforward (MLP): Data flows one direction │
│ Convolutional (CNN): Spatial patterns (images) │
│ Recurrent (RNN): Sequential data (text, time) │
│ Transformer: Attention-based (LLMs) │
│ │
└────────────────────────────────────────────────────────────┘
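To make the diagram concrete, here is a minimal NumPy sketch of a feedforward pass through the 3-3-3-1 network shown above, using the activation functions listed; the random weights are placeholders standing in for trained parameters:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))          # shift by max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)

# Layer sizes matching the diagram: 3 inputs -> 3 hidden -> 3 hidden -> 1 output
sizes = [3, 3, 3, 1]
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes, sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    """Each layer computes f(Wx + b); data flows in one direction."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)                          # hidden layers: ReLU
    return sigmoid(weights[-1] @ h + biases[-1])     # output layer: sigmoid

x = np.array([0.5, -1.2, 3.0])   # x₁, x₂, x₃
print(forward(x))                # ŷ, the network's prediction
```

Swapping the final sigmoid for softmax would yield a probability distribution over several classes, which is how such a network is typically used for multi-class classification.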
Network architecture comparison:
| Type | Strength | Use Cases |
|---|---|---|
| MLP | General-purpose mapping of fixed-size inputs | Tabular classification, regression |
| CNN | Spatial hierarchies | Images, video, audio |
| RNN/LSTM | Sequential patterns | Time series, early NLP |
| Transformer | Long-range dependencies | LLMs, modern NLP, vision |
Common questions
Q: How deep should a neural network be?
A: It depends on task complexity. Simple tasks need few layers; complex patterns (like language) need many. Modern LLMs have 32-100+ layers. Start simple and add depth if underfitting.
Q: What’s the difference between neurons and parameters?
A: Neurons are the computational units; parameters are the weights and biases connecting them. A network with 1000 neurons might have millions of parameters (each neuron connects to many others).
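A rough sketch of the counting, for a hypothetical fully connected network (the layer sizes are invented for illustration):

```python
def count_params(layer_sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each fully connected layer."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical network: 1024 inputs, one hidden layer of 990 neurons, 10 outputs
# -> 1,000 neurons in total, but over a million parameters
print(count_params([1024, 990, 10]))   # 1,024,660
```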
Q: Why do neural networks need activation functions?
A: Without nonlinear activations, any stack of layers collapses to a single linear transformation, no matter how many layers it has. Activation functions are what let networks learn complex, nonlinear patterns.
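A quick way to see this collapse, sketched with random NumPy matrices (biases are omitted for brevity; with them, the stack would still collapse to a single affine map):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1 = rng.normal(size=(5, 4))   # "layer 1" weights
W2 = rng.normal(size=(3, 5))   # "layer 2" weights

two_linear_layers = W2 @ (W1 @ x)   # stack two layers with no activation...
one_linear_layer = (W2 @ W1) @ x    # ...and it equals a single linear map

print(np.allclose(two_linear_layers, one_linear_layer))   # True
```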
Q: How do neural networks relate to “deep learning”?
A: Deep learning specifically refers to neural networks with many layers (deep architectures). A 2-layer network is a neural network but not “deep.” Modern transformer LLMs are very deep neural networks.
Related terms
- Deep Learning — neural networks with many layers
- Transformer Architecture — modern neural architecture
- Backpropagation — training algorithm
- LLM — language-focused deep neural networks
References
LeCun et al. (2015), “Deep Learning”, Nature. [40,000+ citations]
Goodfellow et al. (2016), “Deep Learning”, MIT Press. [20,000+ citations]
Hornik et al. (1989), “Multilayer feedforward networks are universal approximators”, Neural Networks. [25,000+ citations]
Rosenblatt (1958), “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain”, Psychological Review. [Foundational paper]