Advanced Retrieval

Re-ranking

Improving Retrieval Quality with a Second Pass

Initial vector search is fast but can be imprecise: it relies on a single dot product between compressed embeddings. Re-ranking adds a second stage in which a cross-encoder model reads each (query, document) pair together and outputs a relevance score, so attention can model the full interaction between query and document tokens. Cross-encoders are usually more accurate than bi-encoders, but they are too slow to run over the entire corpus: document representations cannot be precomputed, so every pair needs its own forward pass at query time. The standard pattern is therefore to retrieve a broad candidate set (e.g., the top 20) with fast vector search, then re-rank that set with the cross-encoder and return the refined top-\(k\). This two-stage approach combines the speed of vector search with most of the precision of cross-attention.

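# Re-ranking with a cross-encoder from the sentence-transformers library
from sentence_transformers import CrossEncoder

# Example checkpoint (an assumption); any cross-encoder model can be swapped in
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, initial_results, top_k=5):
    # Score each (query, document) pair jointly with the cross-encoder
    scores = cross_encoder.predict([(query, doc) for doc in initial_results])
    # Sort candidates by their new scores, highest first
    reranked = sorted(zip(initial_results, scores), key=lambda x: x[1], reverse=True)
    # Return only the refined top-k (document, score) pairs
    return reranked[:top_k]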
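To make the two-stage control flow concrete without downloading a model, here is a minimal plain-Python sketch under stated assumptions: document embeddings are already computed, the first stage is a brute-force dot-product search, and the second-stage scorer is passed in as a callable. The names `retrieve_candidates`, `two_stage_search`, and `overlap_score` are hypothetical, and `overlap_score` is only a toy term-overlap stand-in, not a cross-encoder. In a real pipeline you would pass a small wrapper around the `cross_encoder.predict` call instead.

```python
import heapq
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Similarity used by the fast first stage (bi-encoder style)."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(query_vec: Vector, doc_vecs: Sequence[Vector], n: int) -> list[int]:
    """Stage 1: indices of the n documents with the highest dot product, best first."""
    return heapq.nlargest(n, range(len(doc_vecs)), key=lambda i: dot(query_vec, doc_vecs[i]))


def two_stage_search(
    query: str,
    query_vec: Vector,
    docs: Sequence[str],
    doc_vecs: Sequence[Vector],
    score_pair: Callable[[str, str], float],
    n_candidates: int = 20,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    # Stage 1: cheap vector search over the whole corpus
    candidates = retrieve_candidates(query_vec, doc_vecs, n_candidates)
    # Stage 2: expensive pairwise scoring, run only on the candidate set
    scored = [(docs[i], score_pair(query, docs[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical toy stand-in for a cross-encoder: share of query terms found in the doc."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & set(doc.lower().split())) / len(terms)


if __name__ == "__main__":
    docs = ["cats purr when content", "dogs bark at strangers", "cats and dogs can be friends"]
    doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy precomputed embeddings
    results = two_stage_search(
        "do cats and dogs get along", [0.9, 0.1], docs, doc_vecs,
        score_pair=overlap_score, n_candidates=2, top_k=1,
    )
    print(results)  # [('cats and dogs can be friends', 0.5)]
```

With `n_candidates=2`, the first stage keeps documents 0 and 2 (dot products 0.9 and about 0.7) and never scores document 1. The second stage then promotes "cats and dogs can be friends" above the document that ranked first by vector similarity, which is exactly the reordering re-ranking is meant to provide.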