Definition
Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them. Values range from -1 (opposite directions) to 1 (identical direction), with 0 indicating orthogonal vectors (no similarity). It’s the most common metric for comparing text embeddings because it focuses on direction (meaning) rather than magnitude (length).
Why it matters
Cosine similarity is foundational to modern AI search and retrieval:
- Direction over magnitude — captures semantic orientation, not vector length
- Normalized comparison — works with embeddings of different magnitudes
- Efficiency — fast to compute, especially with optimized libraries
- Interpretability — easy to understand: 1 = same, 0 = unrelated, -1 = opposite
- Standard metric — default in most vector databases and embedding APIs
It enables meaningful comparison of text, images, and other embedded representations.
How it works
┌────────────────────────────────────────────────────────────┐
│ COSINE SIMILARITY │
├────────────────────────────────────────────────────────────┤
│ │
│ FORMULA: cos(θ) = (A · B) / (||A|| × ||B||) │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ B │ │
│ │ / │ │
│ │ / θ = angle │ │
│ │ / │ │
│ │ / │ │
│ │ ──────────────► A │ │
│ │ │ │
│ │ cos(0°) = 1.0 → Identical direction │ │
│ │ cos(45°) ≈ 0.71 → Similar direction │ │
│ │ cos(90°) = 0.0 → Orthogonal (unrelated) │ │
│ │ cos(180°) = -1.0 → Opposite direction │ │
│ │ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ EXAMPLE CALCULATION: │
│ Vector A = [0.8, 0.6] │
│ Vector B = [0.6, 0.8] │
│ │
│ A · B = (0.8 × 0.6) + (0.6 × 0.8) = 0.96 │
│ ||A|| = √(0.8² + 0.6²) = 1.0 │
│ ||B|| = √(0.6² + 0.8²) = 1.0 │
│ │
│ cos(θ) = 0.96 / (1.0 × 1.0) = 0.96 → Very similar │
│ │
└────────────────────────────────────────────────────────────┘
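As a quick check of the arithmetic above, here is a minimal NumPy sketch (the vector values come from the example; the function name is illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.array([0.8, 0.6])
B = np.array([0.6, 0.8])

print(cosine_similarity(A, B))  # ≈ 0.96 -> very similar
```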
Comparison with other metrics:
| Metric | Formula | When to use |
|---|---|---|
| Cosine similarity | (A · B) / (||A|| × ||B||) | Text similarity where magnitude should not matter |
| Euclidean distance | √Σ(aᵢ - bᵢ)² | When absolute distances and magnitudes matter |
| Dot Product | Σ(aᵢ × bᵢ) | Pre-normalized vectors (equals cosine, cheaper to compute) |
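To make the table concrete, a small sketch (NumPy only; values reuse the worked example above) computes all three metrics on the same pair of unit vectors:

```python
import numpy as np

a = np.array([0.8, 0.6])  # unit vector (||a|| = 1)
b = np.array([0.6, 0.8])  # unit vector (||b|| = 1)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # ≈ 0.96
euclidean = np.linalg.norm(a - b)                                # ≈ 0.28
dot = np.dot(a, b)                                               # ≈ 0.96, equals cosine here

print(cosine, euclidean, dot)
```

Because both vectors are already unit length, the dot product equals the cosine similarity, which is why pre-normalizing embeddings lets vector databases fall back to the cheaper dot product.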
Common questions
Q: Why use cosine similarity instead of Euclidean distance?
A: Cosine ignores vector magnitude, focusing only on direction. Two documents about “tax law” should be similar even if one is longer and has a larger embedding magnitude. Cosine captures this; Euclidean treats them as more different.
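A toy illustration of that point (purely synthetic 2-D vectors; real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

doc = np.array([0.8, 0.6])
longer_doc = 3 * doc  # same direction (same topic), three times the magnitude

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cos_sim(doc, longer_doc))          # 1.0 -> identical by cosine
print(np.linalg.norm(doc - longer_doc))  # 2.0 -> far apart by Euclidean distance
```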
Q: What does a cosine similarity of 0.8 mean?
A: The vectors point in nearly the same direction—they’re semantically similar. For text embeddings, 0.8+ typically indicates strong relevance. However, thresholds vary by model; calibrate with your data.
Q: Can cosine similarity be negative?
A: Yes, when vectors point in opposite directions. In practice, negative values are rare with text embeddings: unrelated texts usually end up roughly orthogonal, so a value near 0, rather than a negative one, is the typical signal for unrelated content.
Q: Is cosine similarity the same as cosine distance?
A: They’re complements: cosine distance = 1 - cosine similarity. Databases often report “distance” (lower = more similar) while APIs report “similarity” (higher = more similar). Check your tool’s convention.
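One concrete convention worth knowing: SciPy’s `scipy.spatial.distance.cosine` returns cosine distance, so similarity is recovered with `1 - distance` (a minimal sketch):

```python
import numpy as np
from scipy.spatial.distance import cosine  # note: this function returns cosine *distance*

a = np.array([0.8, 0.6])
b = np.array([0.6, 0.8])

distance = cosine(a, b)      # ≈ 0.04
similarity = 1.0 - distance  # ≈ 0.96

print(distance, similarity)
```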
Related terms
- Embeddings — vectors that cosine similarity compares
- Semantic Similarity — concept measured by cosine similarity
- Vector Database — uses cosine for nearest-neighbor search
- Semantic Search — retrieval powered by cosine comparisons