
Neural Network

A machine learning model composed of interconnected layers of artificial neurons that learn patterns from data.

Also known as: Artificial neural network, ANN, Neural net, Connectionist model

Definition

A neural network is a computational model loosely inspired by the human brain, consisting of layers of interconnected artificial neurons (nodes). Each neuron receives inputs, applies weights and a bias, passes the result through an activation function, and outputs to the next layer. Through training with backpropagation, neural networks learn to recognize patterns, make predictions, and generate outputs from data.
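
To make that computation concrete, here is a minimal sketch of one neuron in Python with NumPy; the input values, weights, and bias are made-up numbers for illustration:

  import numpy as np

  def relu(z):
      # ReLU activation: passes positive values, zeroes out negatives
      return np.maximum(0.0, z)

  x = np.array([0.5, -1.2, 3.0])   # inputs
  w = np.array([0.8, 0.1, -0.4])   # learned weights
  b = 0.2                          # learned bias

  # y = f(w1*x1 + w2*x2 + w3*x3 + b)
  y = relu(np.dot(w, x) + b)
  print(y)  # weighted sum is -0.72, so ReLU outputs 0.0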

Why it matters

Neural networks are the foundation of modern AI:

  • Universal approximators — can approximate any continuous function to arbitrary accuracy, given enough neurons
  • Feature learning — automatically discover relevant patterns in data
  • Scalability — performance improves with more data and compute
  • Versatility — vision, language, speech, games, science, and more
  • State-of-the-art — power today's leading AI systems, including LLMs

From image recognition to language generation, neural networks dominate AI.

How it works

┌────────────────────────────────────────────────────────────┐
│                       NEURAL NETWORK                       │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  STRUCTURE OF A FEEDFORWARD NETWORK:                       │
│  ───────────────────────────────────                       │
│                                                            │
│  Input Layer    Hidden Layers    Output Layer              │
│      │                │               │                    │
│      ○ ─────┬────► ○ ────┬────► ○ ────┐                    │
│      │      │      │     │      │     │                    │
│      ○ ─────┼────► ○ ────┼────► ○ ────┼────► ○  (output)   │
│      │      │      │     │      │     │                    │
│      ○ ─────┴────► ○ ────┴────► ○ ────┘                    │
│                                                            │
│     x₁,x₂,x₃      h₁,h₂,h₃      h₄,h₅,h₆     ŷ             │
│                                                            │
│  SINGLE NEURON:                                            │
│  ──────────────                                            │
│                                                            │
│  ┌─────────────────────────────────────────────────┐       │
│  │  Inputs       Weights    Sum + Bias   Activation│       │
│  │                                                 │       │
│  │    x₁ ──────► w₁ ──┐                            │       │
│  │                    │                            │       │
│  │    x₂ ──────► w₂ ──┼──► Σ + b ──► f(·) ──► y    │       │
│  │                    │                            │       │
│  │    x₃ ──────► w₃ ──┘                            │       │
│  │                                                 │       │
│  │  y = f(w₁x₁ + w₂x₂ + w₃x₃ + b)                  │       │
│  └─────────────────────────────────────────────────┘       │
│                                                            │
│  COMMON ACTIVATION FUNCTIONS:                              │
│  ────────────────────────────                              │
│                                                            │
│  ReLU:    f(x) = max(0, x)            ___/                 │
│  Sigmoid: f(x) = 1/(1+e⁻ˣ)            _/⁻⁻                 │
│  Tanh:    f(x) = (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ)    _/‾                  │
│  Softmax: Probability distribution  (for classification)   │
│                                                            │
│  NETWORK TYPES:                                            │
│  ──────────────                                            │
│  Feedforward (MLP):   Data flows one direction             │
│  Convolutional (CNN): Spatial patterns (images)            │
│  Recurrent (RNN):     Sequential data (text, time)         │
│  Transformer:         Attention-based (LLMs)               │
│                                                            │
└────────────────────────────────────────────────────────────┘
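
The feedforward structure in the diagram can be written out directly. This is a minimal sketch of a forward pass through the 3-3-3-1 network shown above, with randomly initialized weights standing in for values a real network would learn through backpropagation:

  import numpy as np

  rng = np.random.default_rng(0)

  def relu(z):
      return np.maximum(0.0, z)

  # Weights and biases for a 3 -> 3 -> 3 -> 1 network (random stand-ins)
  W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)   # input -> hidden layer 1
  W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)   # hidden 1 -> hidden 2
  W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden 2 -> output

  def forward(x):
      h1 = relu(W1 @ x + b1)    # first hidden layer (h1, h2, h3)
      h2 = relu(W2 @ h1 + b2)   # second hidden layer (h4, h5, h6)
      return W3 @ h2 + b3       # output y-hat (linear for regression)

  print(forward(np.array([1.0, 2.0, 3.0])))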

Network architecture comparison:

Type          Strength                   Use Cases
────          ────────                   ─────────
MLP           Simple tabular data        Classification, regression
CNN           Spatial hierarchies        Images, video, audio
RNN/LSTM      Sequential patterns        Time series, early NLP
Transformer   Long-range dependencies    LLMs, modern NLP, vision
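
As a rough sketch of what each row looks like in code (assuming PyTorch is available; the layer sizes are arbitrary placeholders, not recommendations):

  import torch.nn as nn

  # MLP: stacked fully connected layers, data flows one direction
  mlp = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

  # CNN: convolution learns local spatial filters (e.g. over RGB images)
  conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

  # RNN/LSTM: processes a sequence one step at a time
  lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

  # Transformer: self-attention relates all positions in one step
  block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)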

Common questions

Q: How deep should a neural network be?

A: It depends on task complexity. Simple tasks need few layers; complex patterns (like language) need many. Modern LLMs stack dozens of layers, often 32 to 100 or more. Start simple and add depth if the model underfits.

Q: What’s the difference between neurons and parameters?

A: Neurons are the computational units; parameters are the weights and biases connecting them. A network with 1000 neurons might have millions of parameters (each neuron connects to many others).
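
A quick count for a single fully connected layer makes the gap clear; the sizes here are hypothetical:

  def dense_layer_params(n_inputs, n_neurons):
      # one weight per input for every neuron, plus one bias per neuron
      return n_inputs * n_neurons + n_neurons

  # 1000 neurons reading a 1000-dimensional input:
  print(dense_layer_params(1000, 1000))  # 1001000 parameters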

Q: Why do neural networks need activation functions?

A: Without nonlinear activations, multiple layers would collapse to a single linear transformation (no matter how many layers). Activation functions enable networks to learn complex, nonlinear patterns.
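
A small NumPy check makes the collapse explicit: two stacked linear layers are exactly one linear layer with a combined weight matrix, so the extra layer adds no expressive power:

  import numpy as np

  rng = np.random.default_rng(1)
  W1 = rng.normal(size=(4, 3))   # layer 1: 3 inputs -> 4 units, no activation
  W2 = rng.normal(size=(2, 4))   # layer 2: 4 units -> 2 outputs
  x = rng.normal(size=3)

  two_layers = W2 @ (W1 @ x)     # pass through both layers
  one_layer = (W2 @ W1) @ x      # single layer with merged weights

  print(np.allclose(two_layers, one_layer))  # True: depth collapsed away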

Q: How do neural networks relate to “deep learning”?

A: Deep learning specifically refers to neural networks with many layers (deep architectures). A 2-layer network is a neural network but not “deep.” Modern transformer LLMs are very deep neural networks.


References

LeCun et al. (2015), “Deep Learning”, Nature. [40,000+ citations]

Goodfellow et al. (2016), “Deep Learning”, MIT Press. [20,000+ citations]

Hornik et al. (1989), “Multilayer feedforward networks are universal approximators”, Neural Networks. [25,000+ citations]

Rosenblatt (1958), “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain”, Psychological Review. [Foundational paper]