Conversational RAG¶
RAG with Memory¶
Maintaining Context Across Turns¶
Standard RAG treats every query independently, but real conversations are full of pronouns (“it”), ellipsis (“what about the cost?”), and follow-up references that only make sense in the context of prior turns. Conversational RAG solves this by maintaining a chat history and reformulating each new query to be self-contained before sending it to the retriever. For example, if the user first asks “What is RAG?” and then asks “How does it handle hallucinations?”, the system rewrites the second query to “How does RAG handle hallucinations?” so the retriever can find the right documents.
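The reformulation step described above can be sketched as a prompt that asks an LLM to rewrite the latest question as a standalone one. The `rewrite_query` helper and the prompt wording below are illustrative assumptions, not a specific library API; `llm` is any callable mapping a prompt string to a completion string.

```python
# Sketch of query reformulation: ask an LLM to make the latest question
# self-contained given the prior turns. Prompt wording is illustrative.
REWRITE_PROMPT = (
    "Given the conversation below, rewrite the final question so it can be\n"
    "understood without the conversation.\n\n"
    "Conversation:\n{history}\n\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)

def rewrite_query(llm, history, question):
    # history is a list of (user, assistant) turn pairs
    transcript = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in history)
    return llm(REWRITE_PROMPT.format(history=transcript, question=question))

# Demo with a stub LLM standing in for a real model:
stub_llm = lambda prompt: "How does RAG handle hallucinations?"
rewritten = rewrite_query(
    stub_llm,
    [("What is RAG?", "RAG combines retrieval with generation.")],
    "How does it handle hallucinations?",
)
```

A real system would pass the rewritten query, not the original, to the retriever.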
Components¶
Conversation history – a buffer or summary of previous turns
Context compression – condensing long histories to fit token limits
Query reformulation – rewriting ambiguous queries using conversation context
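The first two components can be sketched together as a small buffer that evicts the oldest turns once the serialized history exceeds a budget. This `ConversationBuffer` class is a hypothetical minimal version; a production system might summarize evicted turns with an LLM rather than discard them.

```python
# Minimal conversation buffer with naive compression: when the rendered
# history exceeds `max_chars`, drop the oldest turns first.
class ConversationBuffer:
    def __init__(self, max_chars=2000):
        self.turns = []          # list of (user, assistant) pairs
        self.max_chars = max_chars

    def add(self, user, assistant):
        self.turns.append((user, assistant))
        # Evict oldest turns until the history fits the budget.
        while len(self.render()) > self.max_chars and len(self.turns) > 1:
            self.turns.pop(0)

    def render(self):
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)

buf = ConversationBuffer(max_chars=80)
buf.add("What is RAG?", "Retrieval-augmented generation.")
buf.add("How does it work?", "It retrieves documents, then generates.")
```

With an 80-character budget the first turn is evicted as soon as the second arrives, so only the most recent exchange survives.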
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
Implementation¶
Wiring Up Memory with LangChain¶
LangChain's ConversationalRetrievalChain integrates a ConversationBufferMemory that stores the full chat history and reformulates each new query against that history. The chain first uses the LLM to produce a standalone version of the latest question (resolving pronouns and references), then retrieves relevant documents, and finally generates a response that is aware of the entire conversation. The memory_key="chat_history" parameter tells the chain where to look for prior turns in the prompt template.
memory = ConversationBufferMemory(
memory_key="chat_history",
return_messages=True
)
conversation_chain = ConversationalRetrievalChain.from_llm(
llm=llm,
retriever=vectorstore.as_retriever(),
memory=memory
)
# Multi-turn conversation
conversation_chain("What is RAG?")
conversation_chain("How does it work?") # "it" refers to RAG
conversation_chain("What are the benefits?")
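The condense-retrieve-answer flow the chain performs internally can be sketched with plain-Python stand-ins. The `conversational_rag` function, `fake_llm`, and the retriever lambda below are hypothetical placeholders, not LangChain objects; the structure mirrors the three steps described above.

```python
# Sketch of the flow: condense the question, retrieve, then answer.
# `llm` maps prompt -> completion; `retrieve` maps query -> list of docs.
def conversational_rag(llm, retrieve, history, question):
    # Step 1: make the question standalone using prior turns.
    if history:
        transcript = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
        standalone = llm(f"Rewrite as standalone:\n{transcript}\nQ: {question}")
    else:
        standalone = question
    # Step 2: retrieve documents for the standalone question.
    docs = retrieve(standalone)
    # Step 3: answer using the retrieved context, then record the turn.
    answer = llm(f"Context: {' '.join(docs)}\nQuestion: {standalone}\nAnswer:")
    history.append((question, answer))
    return answer

# Demo with stubs in place of a real model and vector store:
def fake_llm(prompt):
    return "standalone-q" if prompt.startswith("Rewrite") else "final-answer"

history = []
out = conversational_rag(
    fake_llm, lambda q: ["doc about " + q], history, "What is RAG?"
)
```

Appending each turn to `history` is what lets the next call resolve references like "it" against the prior exchange.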