Query Rewriting for RAG
Bad retrieval often starts with a weak query. Here's how query rewriting improves RAG systems, which strategies work, and how to avoid turning a simple question into a worse one.
View all rag depths βDepth ladder for this topic:
A lot of RAG debugging starts in the wrong place.
Teams look at chunk size, embedding choice, vector databases, and rerankers while ignoring a simpler issue: the retrieval query itself may be poor. If the userβs phrasing is vague, incomplete, or conversational, the retriever may never see the clearest version of the information need.
That is where query rewriting helps.
What query rewriting is
Query rewriting means transforming the userβs original request into a better retrieval query before search runs.
Example:
User asks:
What changed with our remote policy after the office expansion?
A rewritten retrieval query might become:
remote work policy office expansion policy update relocation hybrid attendance requirements
The rewrite is not for the user. It is for the retriever.
Why it works
Users speak naturally. Retrieval systems perform better on clearer lexical or semantic signals.
Query rewriting can help when the original question contains:
- pronouns with missing context
- vague references to prior conversation
- overly broad wording
- conversational filler
- missing synonyms or domain terms
Common rewriting strategies
Standalone question rewriting
Turn a context-dependent turn into a self-contained query. This is especially useful in multi-turn chat.
Expansion
Add related terminology, synonyms, product names, or domain vocabulary likely to appear in the documents.
Decomposition
Split one complex question into several smaller retrieval queries. This is helpful when one question actually contains multiple intents.
Constraint extraction
Pull out the exact filters hidden in the userβs language, such as timeframe, geography, customer tier, or document type.
Where teams overdo it
Query rewriting is helpful, but it is not free.
Poor rewrites can:
- add noise
- dilute the original intent
- over-assume domain knowledge
- cause the retriever to miss rare exact phrases
This is why many strong systems use both the original query and the rewritten query, then combine results.
A good production pattern
One reliable approach is:
- Take the raw user question
- Generate a concise standalone rewrite
- Optionally generate one expanded keyword-style query
- Run retrieval on both original and rewritten versions
- Fuse and rerank results
This preserves the userβs wording while giving the search stack a cleaner angle of attack.
How to evaluate query rewriting
Do not judge the rewrite by how elegant it sounds.
Judge it by retrieval quality:
- did relevant documents move up?
- did recall improve?
- did irrelevant matches increase?
- did answer quality improve downstream?
The rewrite is infrastructure, not prose.
Bottom line
If your RAG system fails on real user language, query rewriting is one of the highest-leverage fixes to test.
Start with standalone-question rewriting and dual retrieval against both original and rewritten forms. Many retrieval problems are not embedding problems at all. They are translation problems between how people ask and how documents are written.
Simplify
β Parent Document Retrieval: Solving RAG's Context Window Problem
Go deeper
Query Understanding for RAG: What Happens Before Retrieval β
Related reads
Stay ahead of the AI curve
Weekly insights on AI β explained at the level that's right for you. No hype, no jargon, just what matters.
No spam. Unsubscribe anytime. We respect your inbox.