
Reranking

A second-stage retrieval technique that reorders initial search results to improve relevance using more sophisticated models.

Also known as: Cross-encoder reranking, Result reordering, Two-stage retrieval

Definition

Reranking is a retrieval technique that applies a more powerful model to reorder an initial set of search results, improving the ranking of truly relevant documents. It typically follows a first-stage retrieval (like vector search) and uses cross-encoder models that consider query-document pairs together for more accurate relevance scoring.
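
As a minimal sketch, the two-stage flow can be written in plain Python, with toy bag-of-words scorers standing in for real bi-encoder and cross-encoder models (the function names and scoring logic here are illustrative assumptions, not any library's API):

```python
# Toy two-stage retrieval. The scorers below are illustrative stand-ins:
# real systems use neural bi-encoders and cross-encoders.

def embed(text):
    """Stand-in bi-encoder: a bag-of-words count vector."""
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    """Similarity between two sparse vectors, as in stage-1 vector search."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def cross_score(query, doc):
    """Stand-in cross-encoder: looks at query and document together.
    A real reranker runs a transformer over the joint (query, doc) pair."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def two_stage_search(query, corpus, k1=100, k2=5):
    q_vec = embed(query)
    # Stage 1: cheap score for every document, keep top-k1 candidates.
    candidates = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)),
                        reverse=True)[:k1]
    # Stage 2: expensive score for candidates only, return reordered top-k2.
    return sorted(candidates, key=lambda d: cross_score(query, d),
                  reverse=True)[:k2]
```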

Why it matters

Reranking bridges the gap between fast retrieval and accurate relevance scoring:

  • Quality improvement — pushes the most relevant results to the top
  • Precision boost — cross-encoders understand context better than bi-encoders
  • RAG enhancement — ensures the best documents enter the LLM context
  • Cost-effective — applies expensive models only to top candidates, not entire corpus
  • Latency balance — adds ~50-100ms for significantly better results

Reranking can increase retrieval accuracy by 10-30% with minimal latency impact.

How it works

┌────────────────────────────────────────────────────────────┐
│                   TWO-STAGE RETRIEVAL                      │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  STAGE 1: FAST RETRIEVAL (Bi-Encoder)                      │
│  ┌────────────────────────────────────────────────────┐    │
│  │  Query ─────────────┐                              │    │
│  │                     ├───► Compare Embeddings       │    │
│  │  Doc Embeddings ────┘    (Approximate, Fast)       │    │
│  │                                                    │    │
│  │  Return: Top 100-500 candidates                    │    │
│  └────────────────────────────────────────────────────┘    │
│                          │                                 │
│                          ▼                                 │
│  STAGE 2: RERANKING (Cross-Encoder)                        │
│  ┌────────────────────────────────────────────────────┐    │
│  │                                                    │    │
│  │  For each candidate:                               │    │
│  │  ┌─────────────────────────────────────────────┐   │    │
│  │  │  [Query] [SEP] [Document] → Model → Score   │   │    │
│  │  └─────────────────────────────────────────────┘   │    │
│  │                                                    │    │
│  │  Considers full interaction (Accurate, Slower)     │    │
│  │                                                    │    │
│  │  Return: Reordered top 5-20                        │    │
│  └────────────────────────────────────────────────────┘    │
│                          │                                 │
│                          ▼                                 │
│                 FINAL RANKED RESULTS                       │
└────────────────────────────────────────────────────────────┘
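
Stage 2 feeds each query-document pair through the model as a single joint input. A sketch of building those inputs and applying the scores, where `score` is a hypothetical stand-in for a real reranker model call:

```python
def build_pair_inputs(query, candidates, sep="[SEP]"):
    """Join query and document into one sequence, as cross-encoders expect.
    Real models form this pair at the tokenizer level, not by string concat."""
    return [f"{query} {sep} {doc}" for doc in candidates]

def rerank(query, candidates, score, top_k=10):
    """Score every candidate jointly with the query; return the top_k.
    `score(query, doc)` is a hypothetical stand-in for a reranker model."""
    scored = [(score(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]
```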

Key differences:

Aspect        Bi-Encoder (Stage 1)        Cross-Encoder (Stage 2)
Speed         Fast (~1 ms per 1M docs)    Slow (~10 ms per doc)
Accuracy      Good                        Excellent
Interaction   None (separate encoding)    Full (joint encoding)
Scale         Entire corpus               Top candidates only

Common questions

Q: Why not just use cross-encoders for everything?

A: Cross-encoders are too slow for large-scale retrieval: they must run the model once for every query-document pair, so scoring a full corpus is O(n) per query, where n is the corpus size. Two-stage retrieval provides the best of both worlds, using a fast first stage to narrow the corpus down to a candidate set small enough for the cross-encoder to score.
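
The cost difference is easy to see with back-of-envelope arithmetic, using the illustrative timings from the comparison table (~1 ms per 1M docs for bi-encoder search, ~10 ms per doc for cross-encoding):

```python
# Back-of-envelope latency for one query over a 1M-document corpus,
# using the illustrative timings above (not measurements of any real model).
corpus_size = 1_000_000
candidates = 100

# Cross-encoding the entire corpus: O(n) in corpus size.
full_cross_ms = corpus_size * 10  # 10,000,000 ms, roughly 2.8 hours

# Two-stage: fast retrieval over the corpus, then rerank 100 candidates.
two_stage_ms = (corpus_size / 1_000_000) * 1 + candidates * 10  # ~1 second

speedup = full_cross_ms / two_stage_ms  # roughly four orders of magnitude
```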

Q: What models are used for reranking?

A: Popular rerankers include Cohere Rerank, BGE Reranker, and cross-encoder models fine-tuned on MS MARCO. These are specifically trained to score query-document relevance.

Q: How many documents should be reranked?

A: Typically 50-200 candidates from the first stage are reranked. Too few and you might miss relevant documents; too many adds unnecessary latency.
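
One way to choose the candidate count is to work backward from a latency budget. A hypothetical helper, where all timing numbers are illustrative assumptions rather than measurements:

```python
def max_rerank_depth(budget_ms, ms_per_doc=1.0, overhead_ms=20.0):
    """Largest candidate set that fits within a latency budget.
    Assumes batched reranking at ~1 ms/doc plus fixed overhead -- both
    illustrative numbers, not properties of any particular model."""
    return max(0, int((budget_ms - overhead_ms) / ms_per_doc))
```

Under these assumed timings, a 120 ms budget yields 100 candidates, squarely in the typical 50-200 range.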

Q: Does reranking replace vector search?

A: No, it complements it. Vector search provides fast candidate retrieval; reranking improves the ordering. Both stages are needed for optimal performance.

