Definition
Reranking is a retrieval technique that applies a more powerful model to reorder an initial set of search results, improving the ranking of truly relevant documents. It typically follows a first-stage retrieval (like vector search) and uses cross-encoder models that consider query-document pairs together for more accurate relevance scoring.
Why it matters
Reranking bridges the gap between fast retrieval and accurate relevance:
- Quality improvement — pushes the most relevant results to the top
- Precision boost — cross-encoders understand context better than bi-encoders
- RAG enhancement — ensures the best documents enter the LLM context
- Cost-effective — applies expensive models only to top candidates, not entire corpus
- Latency balance — adds roughly 50-100 ms (with batched inference) for significantly better results
Reranking can improve ranking quality, as measured by metrics such as NDCG or MRR, by 10-30% with minimal latency impact.
How it works
┌────────────────────────────────────────────────────────────┐
│                    TWO-STAGE RETRIEVAL                     │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  STAGE 1: FAST RETRIEVAL (Bi-Encoder)                      │
│  ┌──────────────────────────────────────────────────────┐  │
│  │  Query ─────────────┐                                │  │
│  │                     ├───► Compare Embeddings         │  │
│  │  Doc Embeddings ────┘     (Approximate, Fast)        │  │
│  │                                                      │  │
│  │  Return: Top 100-500 candidates                      │  │
│  └──────────────────────────────────────────────────────┘  │
│                             │                              │
│                             ▼                              │
│  STAGE 2: RERANKING (Cross-Encoder)                        │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                                                      │  │
│  │  For each candidate:                                 │  │
│  │  ┌────────────────────────────────────────────────┐  │  │
│  │  │  [Query] [SEP] [Document] → Model → Score      │  │  │
│  │  └────────────────────────────────────────────────┘  │  │
│  │                                                      │  │
│  │  Considers full interaction (Accurate, Slower)       │  │
│  │                                                      │  │
│  │  Return: Reordered top 5-20                          │  │
│  └──────────────────────────────────────────────────────┘  │
│                             │                              │
│                             ▼                              │
│                    FINAL RANKED RESULTS                    │
└────────────────────────────────────────────────────────────┘
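A minimal sketch of this pipeline using the sentence-transformers library is shown below. The model names are common public checkpoints; the toy corpus, query, and k values are illustrative stand-ins. In production, stage 1 would typically query an ANN index (e.g., FAISS) rather than scoring the corpus exhaustively.

```python
# Minimal two-stage retrieval sketch with sentence-transformers.
# Corpus, query, and k values are toy stand-ins, not a production setup.
from sentence_transformers import SentenceTransformer, CrossEncoder, util

corpus = [
    "Reranking reorders retrieved documents with a stronger model.",
    "Bi-encoders embed queries and documents independently.",
    "Cross-encoders jointly encode each query-document pair.",
    "The capital of France is Paris.",
]
query = "How does reranking improve search results?"

# Stage 1: bi-encoder retrieval (fast, approximate relevance).
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embs = bi_encoder.encode(corpus, convert_to_tensor=True)
query_emb = bi_encoder.encode(query, convert_to_tensor=True)
sim = util.cos_sim(query_emb, doc_embs)[0]        # cosine similarity per doc
candidate_ids = sim.topk(k=3).indices.tolist()    # candidate set for stage 2
candidates = [corpus[i] for i in candidate_ids]

# Stage 2: cross-encoder reranking (accurate, runs only on the candidates).
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = cross_encoder.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, pair_scores), key=lambda p: p[1], reverse=True)

for doc, score in reranked:
    print(f"{score:.3f}  {doc}")
```

Keeping the stage-2 candidate set small is what makes the cross-encoder affordable: its cost grows with the number of pairs scored, not with corpus size.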
Key differences:
| Aspect | Bi-Encoder (Stage 1) | Cross-Encoder (Stage 2) |
|---|---|---|
| Speed | Fast (~1 ms per query over 1M docs) | Slow (~10 ms per query-document pair) |
| Accuracy | Good | Excellent |
| Interaction | None (separate encoding) | Full (joint encoding) |
| Scale | Entire corpus | Top candidates only |
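The scaling difference in the table comes from what can be precomputed. A bi-encoder embeds documents once, offline, so query-time scoring reduces to vector math (and ANN indexes make it sub-linear); a cross-encoder can precompute nothing, because it must see the query and document together. A rough numpy illustration of the bi-encoder side, using random stand-in embeddings:

```python
import numpy as np

# Bi-encoder scoring: doc embeddings (random stand-ins here) are computed
# once, offline. At query time, relevance over the whole corpus is a single
# matrix-vector product -- no model forward pass per document.
doc_embeddings = np.random.randn(100_000, 384).astype(np.float32)
query_embedding = np.random.randn(384).astype(np.float32)
scores = doc_embeddings @ query_embedding   # one cheap pass over 100k docs

# Cross-encoder scoring has no analogous shortcut: every (query, doc) pair
# needs its own full transformer forward pass, so it is reserved for the
# small candidate set produced by stage 1.
top_candidates = np.argsort(-scores)[:100]  # indices handed to the reranker
```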
Common questions
Q: Why not just use cross-encoders for everything?
A: Cross-encoders are too slow for large-scale retrieval. They must run a full forward pass for each query-document pair, making query cost O(n) in the corpus size n. At ~10 ms per pair, exhaustively scoring a 1M-document corpus would take nearly three hours per query, versus about a second for 100 candidates. Two-stage retrieval provides the best of both worlds.
Q: What models are used for reranking?
A: Popular rerankers include Cohere Rerank, BGE Reranker, and cross-encoder models fine-tuned on MS MARCO. These are specifically trained to score query-document relevance.
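A hosted reranker takes only a few lines to call. The sketch below assumes the Cohere Python SDK's rerank endpoint, with a placeholder API key and model name (check Cohere's docs for current identifiers); BGE rerankers can instead be loaded locally through sentence-transformers' CrossEncoder, as in the pipeline sketch above.

```python
# Hedged sketch: hosted reranking via Cohere's rerank endpoint. The API key
# and model name are placeholders, not guaranteed-current values.
import cohere

co = cohere.Client("YOUR_API_KEY")
response = co.rerank(
    model="rerank-english-v3.0",
    query="How does reranking improve search results?",
    documents=[
        "Reranking reorders retrieved documents with a stronger model.",
        "The capital of France is Paris.",
    ],
    top_n=1,
)
for result in response.results:
    print(result.index, result.relevance_score)
```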
Q: How many documents should be reranked?
A: Typically 50-200 candidates from the first stage are reranked. Too few and you might miss relevant documents; too many adds unnecessary latency.
Q: Does reranking replace vector search?
A: No, it complements it. Vector search provides fast candidate retrieval; reranking improves the ordering. Both stages are needed for optimal performance.
Related terms
- RAG — pipeline that benefits from reranking
- Hybrid Search — first-stage approach that combines methods
- Semantic Search — embedding-based retrieval
- Cross-Encoder — model type used for reranking
References
Nogueira & Cho (2019), “Passage Re-ranking with BERT”, arXiv. [1,500+ citations]
Karpukhin et al. (2020), “Dense Passage Retrieval for Open-Domain Question Answering”, EMNLP. [3,500+ citations]
Humeau et al. (2020), “Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring”, ICLR. [700+ citations]
Glass et al. (2022), “Re2G: Retrieve, Rerank, Generate”, NAACL. [100+ citations]