
Supervised Learning

A machine learning approach where models learn from labeled training data to predict outputs for new inputs.

Also known as: Supervised ML, Labeled learning, Predictive modeling, Inductive learning

Definition

Supervised learning is a machine learning paradigm where algorithms learn from labeled training data—examples that include both input features and the correct output (label). The model learns the mapping between inputs and outputs, then applies this learned relationship to predict labels for new, unseen data. It’s called “supervised” because the training process is guided by known correct answers, like a teacher supervising a student.
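
In code, the whole workflow reduces to "fit on labeled examples, then predict." A minimal sketch, assuming scikit-learn is available; the toy features and labels below are invented for illustration:

```python
# Minimal supervised learning workflow: fit on labeled data, predict on new data.
# Toy data is invented for illustration; any real task needs real features/labels.
from sklearn.linear_model import LogisticRegression

X_train = [[0.2, 1.5], [1.1, 0.3], [0.1, 1.9], [1.4, 0.2]]  # input features
y_train = [0, 1, 0, 1]                                       # known correct labels

model = LogisticRegression()
model.fit(X_train, y_train)           # learn the input -> output mapping

print(model.predict([[0.3, 1.6]]))    # predict a label for new, unseen input
```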

Why it matters

Supervised learning is the most common ML approach:

  • Clear training signal — known answers guide learning
  • Measurable accuracy — comparing predictions to labels enables validation
  • Practical applications — spam detection, medical diagnosis, credit scoring
  • Foundation for LLMs — next-token prediction trains with a supervised objective (labels come from the text itself)
  • Interpretable results — predictions map to defined classes or values

Most production ML systems use supervised learning.

How it works

┌────────────────────────────────────────────────────────────┐
│                   SUPERVISED LEARNING                      │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  TRAINING PHASE:                                           │
│  ───────────────                                           │
│                                                            │
│  Labeled Training Data:                                    │
│  ┌─────────────────────────────────────────────────┐      │
│  │ Input (Features)              │ Label (Target)  │      │
│  ├─────────────────────────────────────────────────┤      │
│  │ [Email text: "Win $1000..."]  │    SPAM        │      │
│  │ [Email text: "Meeting at 3"]  │    NOT SPAM    │      │
│  │ [Email text: "Click here!"]   │    SPAM        │      │
│  │ [Email text: "Project update"]│    NOT SPAM    │      │
│  └─────────────────────────────────────────────────┘      │
│                        │                                   │
│                        ▼                                   │
│  ┌─────────────────────────────────────────────────┐      │
│  │              LEARNING ALGORITHM                  │      │
│  │                                                  │      │
│  │  1. Make prediction: ŷ = f(x)                   │      │
│  │  2. Compare to label: Error = ŷ - y            │      │
│  │  3. Update model to reduce error                │      │
│  │  4. Repeat until error is minimized             │      │
│  └─────────────────────────────────────────────────┘      │
│                        │                                   │
│                        ▼                                   │
│                   TRAINED MODEL                            │
│                                                            │
│  PREDICTION PHASE:                                         │
│  ─────────────────                                         │
│                                                            │
│  New Email ──► Trained Model ──► Prediction: SPAM/NOT SPAM│
│                                                            │
│  TWO MAIN TYPES:                                           │
│  ───────────────                                           │
│                                                            │
│  CLASSIFICATION:              REGRESSION:                  │
│  Predict categories           Predict continuous values   │
│                                                            │
│  "Cat" or "Dog"?             "House price = $450,000"     │
│  "Spam" or "Not Spam"?       "Temperature = 23.5°C"       │
│  "Positive" or "Negative"?   "Sales = $12,500"            │
│                                                            │
│       ┌───┐ ┌───┐                    ↗                    │
│       │ A │ │ B │             ──────●────                 │
│       └───┘ └───┘                ↗                        │
│    Discrete classes        Continuous line                 │
│                                                            │
│  COMMON ALGORITHMS:                                        │
│  ──────────────────                                        │
│  • Logistic Regression    (classification)                │
│  • Decision Trees         (both)                          │
│  • Random Forests         (both)                          │
│  • Neural Networks        (both)                          │
│  • Support Vector Machines (classification)              │
│  • Linear Regression      (regression)                    │
│                                                            │
└────────────────────────────────────────────────────────────┘
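
The training loop in the diagram (predict, compare, update, repeat) can be sketched with plain NumPy gradient descent on a linear model; all values below are made-up toys:

```python
# Sketch of the training loop: predict, compare to label, update, repeat.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # input features
true_w = np.array([2.0, -1.0, 0.5])                # "ground truth" for the toy
y = X @ true_w + rng.normal(scale=0.1, size=100)   # labels

w = np.zeros(3)      # model parameters, initially untrained
lr = 0.1             # learning rate
for step in range(200):
    y_hat = X @ w                    # 1. make prediction: y_hat = f(x)
    error = y_hat - y                # 2. compare to label
    grad = X.T @ error / len(y)      #    gradient of (half) the mean squared error
    w -= lr * grad                   # 3. update model to reduce error
                                     # 4. repeat until error is minimized
print(w)  # approaches true_w
```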

Classification vs Regression:

Aspect          Classification        Regression
Output          Discrete categories   Continuous values
Example         Spam detection        Price prediction
Metrics         Accuracy, F1-score    MSE, R-squared
Loss function   Cross-entropy         Mean squared error
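
To make the loss-function row concrete, here is a small NumPy sketch computing both losses on made-up labels and predictions:

```python
# Illustrative loss computations for the table above (toy numbers only).
import numpy as np

# Classification: binary cross-entropy between true labels and predicted probabilities.
y_true = np.array([1, 0, 1])
p_pred = np.array([0.9, 0.2, 0.7])
cross_entropy = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# Regression: mean squared error between true and predicted values.
v_true = np.array([450_000, 310_000])
v_pred = np.array([440_000, 330_000])
mse = np.mean((v_true - v_pred) ** 2)

print(cross_entropy, mse)
```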

Common questions

Q: What makes data “labeled”?

A: Labeled data has both inputs and known correct outputs. For image classification: images (input) + what’s in them (label). For spam detection: emails (input) + spam/not-spam tags (label). Humans typically create labels, which is expensive and time-consuming.

Q: How is LLM training supervised learning?

A: LLM pretraining is self-supervised: the “label” for each token is simply the next token in the text. Given “The cat sat on the”, the model learns to predict “mat.” No human labeling needed—the text itself provides supervision.
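
A short sketch of how running text provides its own (input, label) pairs; the word-level tokens below are a simplification of the integer token IDs a real tokenizer produces:

```python
# Next-token prediction: each prefix of the text is an input,
# and the token that follows it is the label. No human labeling needed.
tokens = ["The", "cat", "sat", "on", "the", "mat"]

for i in range(1, len(tokens)):
    context = tokens[:i]   # input the model conditions on
    label = tokens[i]      # "correct answer" is simply the next token
    print(context, "->", label)
```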

Q: What if I don’t have labeled data?

A: You have several options: (1) Use unsupervised learning to find patterns, (2) Use semi-supervised learning with a small set of labels, (3) Generate labels yourself or via crowdsourcing, (4) Use transfer learning from pretrained models, (5) Apply active learning to label the most informative examples first.
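
As one concrete example, option (2) is often implemented via pseudo-labeling. A rough sketch, assuming scikit-learn and using invented toy data:

```python
# Pseudo-labeling sketch: train on the few labels you have, then let the
# model label unlabeled data it is confident about, and retrain.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_labeled = rng.normal(size=(20, 2))               # small labeled set
y_labeled = (X_labeled[:, 0] > 0).astype(int)      # toy labels
X_unlabeled = rng.normal(size=(100, 2))            # large unlabeled set

model = LogisticRegression().fit(X_labeled, y_labeled)

confidence = model.predict_proba(X_unlabeled).max(axis=1)
keep = confidence > 0.9                            # only confident pseudo-labels
X_aug = np.vstack([X_labeled, X_unlabeled[keep]])
y_aug = np.concatenate([y_labeled, model.predict(X_unlabeled[keep])])
model = LogisticRegression().fit(X_aug, y_aug)     # retrain on the expanded set
```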

Q: How much labeled data is enough?

A: It varies widely. Simple problems may need only hundreds of examples; complex deep learning models may need thousands to millions. A common rule of thumb: 10× more samples than features. With transfer learning or fine-tuning, far fewer labeled examples may suffice.
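
For example, applying the 10× rule of thumb (the feature count here is hypothetical):

```python
# Rough lower bound on labeled examples under the 10x rule of thumb.
n_features = 50
min_samples = 10 * n_features
print(min_samples)  # 500
```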

