Definition
Unsupervised learning is a machine learning paradigm in which algorithms discover hidden patterns, structures, and relationships in data without the guidance of labeled examples. Unlike supervised learning, where correct answers are provided during training, unsupervised methods must find meaningful organization in the data on their own: identifying natural clusters, reducing dimensionality, detecting anomalies, or learning useful representations.
Why it matters
Unsupervised learning unlocks value in unlabeled data:
- No labeling required — works with raw, unlabeled data (cheaper, abundant)
- Pattern discovery — finds structure humans might miss
- Data preprocessing — dimensionality reduction, feature learning
- Anomaly detection — identifies outliers without examples
- Foundation for embeddings — learns representations that power semantic search
Many modern AI breakthroughs, including text embeddings, rely on unsupervised or self-supervised learning.
How it works
┌────────────────────────────────────────────────────────────┐
│ UNSUPERVISED LEARNING │
├────────────────────────────────────────────────────────────┤
│ │
│ SUPERVISED VS UNSUPERVISED: │
│ ─────────────────────────── │
│ │
│ SUPERVISED: UNSUPERVISED: │
│ "Here's the data AND "Here's the data. │
│ the right answers" Find patterns yourself" │
│ │
│ Input → LABEL Input → ??? │
│ [Image] → "Cat" [Data points] → Groups? │
│ │
│ MAIN UNSUPERVISED TASKS: │
│ ──────────────────────── │
│ │
│ 1. CLUSTERING │
│ Group similar items together │
│ │
│ Before: After: │
│ ● ○ ● ┌───────┐ ┌───────┐ │
│ ○ ● ○ │ ● ● ● │ │ ○ ○ ○ │ │
│ ● ○ ● │ ● ● ● │ │ ○ ○ ○ │ │
│ └───────┘ └───────┘ │
│ Cluster A Cluster B │
│ │
│ 2. DIMENSIONALITY REDUCTION │
│ Compress data while preserving structure │
│ │
│ High-D space Low-D space │
│ (100 features) → (2-3 features) │
│ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ x₁,x₂,...x₁₀₀│ → │ x'₁, x'₂ │ │
│ └─────────────┘ └─────────────┘ │
│ PCA, t-SNE, UMAP, Autoencoders │
│ │
│ 3. ANOMALY DETECTION │
│ Find unusual patterns │
│ │
│ ●●●●●● │
│ ●●●●●●●●● │
│ ●●●●●● ○ ← Anomaly! │
│ │
│ 4. REPRESENTATION LEARNING │
│ Learn useful features automatically │
│ │
│ Raw Data → Encoder → Embedding → Useful representation│
│ │
│ COMMON ALGORITHMS: │
│ ────────────────── │
│ Clustering: K-Means, DBSCAN, Hierarchical │
│ Dim. Reduction: PCA, t-SNE, UMAP │
│ Density-based: Gaussian Mixture Models │
│ Neural: Autoencoders, VAEs │
│ │
└────────────────────────────────────────────────────────────┘
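To make the clustering task in the diagram concrete, here is a minimal sketch using scikit-learn's K-Means on synthetic, unlabeled points; the blob dataset, the choice of K=2, and the random seeds are assumptions made purely for illustration.

```python
# Minimal clustering sketch with scikit-learn's K-Means (the synthetic blobs,
# K=2, and random seeds are assumptions chosen for illustration).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabeled data: 300 points drawn from two blobs; the true labels are thrown away.
X, _ = make_blobs(n_samples=300, centers=2, cluster_std=1.0, random_state=42)

# K-Means has to discover the two groups on its own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)          # 0/1 cluster assignment per point

print("Cluster sizes:", np.bincount(cluster_ids))
print("Cluster centers:\n", kmeans.cluster_centers_)
```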
Unsupervised methods comparison:
| Method | Purpose | Output | Example Use |
|---|---|---|---|
| K-Means | Clustering | K groups | Customer segmentation |
| PCA | Dimensionality reduction | Lower-D data | Feature compression |
| Autoencoders | Representation learning | Embeddings | Image compression |
| DBSCAN | Density clustering | Variable groups | Anomaly detection |
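As a rough illustration of two rows from the table, the sketch below uses scikit-learn's PCA to compress 100 features down to 2 and DBSCAN to flag a far-away point as noise; the synthetic data and hyperparameters (eps, min_samples) are assumptions chosen only for this demo.

```python
# Illustrative sketch of two table rows: PCA for dimensionality reduction and
# DBSCAN for density-based outlier flagging (data and hyperparameters are
# assumptions made for this demo).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# PCA: compress 100 features down to 2 principal components.
X_high = rng.normal(size=(500, 100))
X_low = PCA(n_components=2).fit_transform(X_high)
print(X_low.shape)                            # (500, 2)

# DBSCAN: points in low-density regions get the noise label -1.
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),   # one dense cluster
               [[5.0, 5.0]]])                          # one far-away point
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print("Flagged as noise:", np.where(labels == -1)[0])  # the far point shows up here
```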
Common questions
Q: How do you evaluate unsupervised learning if there are no labels?
A: Several approaches: (1) Intrinsic metrics like silhouette score for clustering, (2) Reconstruction error for autoencoders, (3) Downstream task performance (use learned representations for a supervised task), (4) Human evaluation of discovered patterns, (5) Comparison with known ground truth if available.
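As a sketch of approach (1), the snippet below compares candidate cluster counts by silhouette score using scikit-learn; the synthetic data and the range of K values are illustrative assumptions.

```python
# Sketch of intrinsic evaluation: compare candidate K values by silhouette score
# (scikit-learn; the synthetic data and K range are assumptions).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    print(k, round(silhouette_score(X, labels), 3))
```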
Q: What’s self-supervised learning?
A: Self-supervised learning is a form of unsupervised learning where the algorithm creates its own labels from the data. LLM pretraining is self-supervised: predicting the next token uses the text itself as labels. It’s technically unsupervised (no human labels) but the training process resembles supervised learning.
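A toy sketch of the idea, assuming simple whitespace tokenization: the "labels" are manufactured by shifting the text one token, so the raw data supervises itself.

```python
# Toy illustration of self-supervision: the "labels" are just the next tokens,
# derived from the raw text itself (whitespace tokenization is a simplification).
text = "the cat sat on the mat"
tokens = text.split()

# Each training pair is (context so far, next token); no human labeling involved.
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(context, "->", target)
```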
Q: When should I use unsupervised vs supervised learning?
A: Use unsupervised when: (1) You have no labels, (2) You want to explore/understand data structure, (3) You need preprocessing (dimensionality reduction), (4) You want to find anomalies. Use supervised when you have labels and a specific prediction task.
Q: How does unsupervised learning relate to embeddings?
A: Many embedding methods use unsupervised or self-supervised learning. Word2Vec learns word embeddings without labels by predicting context words. Autoencoders learn compressed representations. These unsupervised embeddings then enable semantic search, clustering, and more.
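As an illustrative sketch (assuming gensim 4.x; the toy corpus and hyperparameters are invented for the example, not a recommended setup), Word2Vec learns embeddings purely from word co-occurrence in unlabeled sentences:

```python
# Sketch of unsupervised word embeddings with gensim's Word2Vec (assumes gensim 4.x;
# the toy corpus and hyperparameters are illustrative only).
from gensim.models import Word2Vec

sentences = [
    ["cats", "chase", "mice"],
    ["dogs", "chase", "cats"],
    ["mice", "eat", "cheese"],
    ["dogs", "eat", "bones"],
]

# Training objective: predict context words; the only "supervision" is the text itself.
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=50, seed=1)

print(model.wv["cats"].shape)                  # a 16-dimensional embedding
print(model.wv.most_similar("cats", topn=2))   # nearest neighbors in embedding space
```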
Related terms
- Machine Learning — the broader field
- Supervised Learning — learning with labels
- Embeddings — often learned unsupervised
- Clustering — grouping similar items