Definition
Embeddings are dense, continuous vector representations of discrete data (words, sentences, images, etc.). Unlike sparse representations such as one-hot encoding, which need one dimension per vocabulary item, embeddings compress information into fixed-size vectors of a few hundred to a few thousand dimensions, where similar items sit close together in the embedding space. This enables mathematical operations on semantic concepts.
Why it matters
Embeddings are foundational to modern AI systems:
- Semantic similarity — similar meanings map to nearby vectors, enabling similarity search
- Transfer learning — pre-trained embeddings capture general knowledge usable across tasks
- Dimensionality reduction — millions of possible words compress into hundreds of dimensions
- Mathematical operations — vector arithmetic reveals semantic relationships (king - man + woman ≈ queen)
Every RAG system, search engine, and recommendation system relies on embeddings to understand content.
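The vector-arithmetic bullet above can be illustrated with hand-crafted toy vectors (real embeddings are learned by a model and have hundreds of dimensions; the analogy only holds approximately in practice):

```python
import math

# Toy 3-D "embeddings": the dimensions loosely mean (royalty, male, female).
# These are hand-placed for illustration, not learned.
vectors = {
    "king":  [1.0, 1.0, 0.0],
    "man":   [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [1.0, 0.0, 1.0],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# king - man + woman, computed element-wise
result = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Nearest vocabulary word to the result, by cosine similarity
best = max(vectors, key=lambda w: cosine(result, vectors[w]))
print(best)  # queen
```

With these toy vectors the arithmetic is exact; with real learned embeddings the result vector only lands *near* "queen", which is why the relation is written with ≈.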
How it works
┌──────────────────────────────────────────────────────────┐
│                    EMBEDDING PROCESS                     │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Input Text ─────→ Tokenize ─────→ Model ─────→ Vector   │
│                                                          │
│  "tax law" → [123, 456] → Neural  → [0.12,               │
│                           Network    0.45,               │
│                                     -0.23,               │
│                                      ...]                │
│                                     (768-D)              │
│                                                          │
│  Semantic space:                                         │
│    "tax law" ●────────● "fiscal regulation"              │
│                 close                                    │
│    "weather" ●                                           │
│       far                                                │
└──────────────────────────────────────────────────────────┘
- Tokenization — input text is split into tokens
- Model encoding — neural network processes tokens
- Pooling — token representations are combined (mean, CLS token, etc.)
- Output vector — fixed-size dense vector (e.g., 384, 768, or 1536 dimensions)
The embedding model is trained so that semantically similar inputs produce vectors with high cosine similarity.
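The four steps above can be sketched in miniature. The vocabulary, token IDs, and 4-D token vectors below are made up for illustration; a real model uses a learned tokenizer and a neural encoder producing e.g. 384–1536 dimensions:

```python
# Hypothetical token vocabulary and per-token vectors (stand-ins for a
# learned tokenizer and neural network).
vocab = {"tax": 123, "law": 456}
token_vectors = {
    123: [0.2, 0.8, -0.4, 0.1],   # "tax"
    456: [0.0, 0.1,  0.0, 0.3],   # "law"
}

def embed(text):
    # 1. Tokenization: split text into token IDs.
    ids = [vocab[t] for t in text.lower().split()]
    # 2. Model encoding: one vector per token (here, a table lookup).
    vecs = [token_vectors[i] for i in ids]
    # 3. Pooling: mean over token vectors...
    # 4. ...yields one fixed-size output vector.
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

print([round(x, 2) for x in embed("tax law")])  # [0.1, 0.45, -0.2, 0.2]
```

Mean pooling is only one option; models such as BERT often take the CLS token's vector instead, as noted above.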
Common questions
Q: What embedding dimensions are common?
A: Typical sizes range from 384 (lightweight) to 1536 (OpenAI) to 4096 (large models). Higher dimensions can capture more nuance but require more storage and computation.
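The storage side of that trade-off is easy to quantify. A back-of-the-envelope calculation, assuming float32 vectors (4 bytes per dimension) and a hypothetical corpus of one million chunks:

```python
# Storage cost per vector and per corpus, assuming float32 storage.
BYTES_PER_FLOAT32 = 4
corpus_size = 1_000_000  # hypothetical: one million stored chunks

for dims in (384, 768, 1536, 4096):
    per_vector = dims * BYTES_PER_FLOAT32          # bytes for one vector
    total_gb = per_vector * corpus_size / 1e9      # GB for the whole corpus
    print(f"{dims:>4} dims: {per_vector:>6} B/vector, "
          f"{total_gb:.2f} GB per 1M vectors")
```

So moving from 384 to 1536 dimensions quadruples both storage and the work done per similarity comparison.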
Q: How do sentence embeddings differ from word embeddings?
A: Word embeddings (Word2Vec, GloVe) represent individual words. Sentence embeddings (from models like sentence-transformers) capture entire sentence meaning, handling context and word order.
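One way to see the difference: averaging word vectors (a simple bag-of-words sentence representation) is blind to word order, so order-dependent meanings collapse to the same vector. A toy sketch with made-up 2-D word vectors:

```python
# Made-up word vectors; real ones would come from Word2Vec, GloVe, etc.
word_vecs = {"man": [1.0, 0.0], "bites": [0.0, 1.0], "dog": [0.5, 0.5]}

def mean_pool(sentence):
    # Average the word vectors: the result ignores word order entirely.
    vecs = [word_vecs[w] for w in sentence.split()]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

# Identical vectors despite opposite meanings:
print(mean_pool("man bites dog") == mean_pool("dog bites man"))  # True
```

A contextual sentence encoder (e.g., Sentence-BERT) processes the whole sequence and can assign these two sentences different vectors.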
Q: What are bilingual/multilingual embeddings?
A: These models map multiple languages into a shared embedding space, so “legal advice” and “juridisch advies” produce similar vectors, enabling cross-lingual search.
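A toy illustration of cross-lingual search in a shared space, with hand-placed 2-D vectors standing in for a real multilingual model's output:

```python
import math

# Hand-placed vectors; a real multilingual model learns this alignment.
shared_space = {
    "legal advice":     [0.90, 0.10],  # English
    "juridisch advies": [0.88, 0.12],  # Dutch, near its English counterpart
    "weather":          [0.10, 0.90],  # unrelated topic, far away
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Rank all entries by similarity to the English query.
query = shared_space["legal advice"]
ranked = sorted(shared_space,
                key=lambda k: cosine(query, shared_space[k]), reverse=True)
print(ranked)  # ['legal advice', 'juridisch advies', 'weather']
```

The Dutch phrase ranks above the unrelated English one, which is exactly what makes cross-lingual retrieval work.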
Q: Do embeddings drift over time?
A: A deployed embedding model is static, so stored vectors do not drift on their own. However, if you switch or retrain the embedding model, all stored vectors must be regenerated, since different models produce incompatible vector spaces.
Related terms
- RAG — uses embeddings for retrieval
- Vector Database — stores and searches embeddings
- Semantic Similarity — measured via embedding distance
- LLM — uses embeddings internally
References
Mikolov et al. (2013), “Efficient Estimation of Word Representations in Vector Space”, arXiv. [40,000+ citations]
Reimers & Gurevych (2019), “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks”, EMNLP. [8,000+ citations]
Pennington et al. (2014), “GloVe: Global Vectors for Word Representation”, EMNLP. [35,000+ citations]
Muennighoff et al. (2022), “MTEB: Massive Text Embedding Benchmark”, arXiv. [700+ citations]