Parent Document Retrieval: Solving RAG's Context Window Problem

RAG systems face a fundamental tension: small chunks (100-200 tokens) match queries more precisely, but they lose the surrounding context the LLM needs to generate good answers. Large chunks (1000+ tokens) provide context but dilute the relevant information, making retrieval less accurate.

Parent document retrieval is the elegant solution: index small chunks for retrieval, but return their parent documents for generation.

The Problem in Detail

Consider a technical document about configuring a database:

Small chunk (200 tokens): “Set max_connections to 100 for databases with fewer than 50 concurrent users. For higher concurrency, use the formula: max_connections = num_users × 1.5 + 20.”

Large chunk (1000 tokens): The entire “Connection Configuration” section, including related settings, dependencies, and examples.

If a user asks “How should I set max_connections?”, the small chunk matches perfectly. But the LLM answering might need the surrounding context about connection pooling, timeout settings, and memory implications to give a complete answer.

How Parent Document Retrieval Works

Indexing:
  Document → Split into parents (sections/pages)
           → Split parents into children (small chunks)
           → Embed and index children
           → Store mapping: child_id → parent_id

Retrieval:
  Query → Search children → Get matching child_ids
        → Look up parent_ids → Return parent documents
        → Send parents + query to LLM

The key insight: you’re using small chunks as an index into larger documents. The small chunks are search targets; the parent documents are what the LLM actually reads.

Implementation

Basic Implementation with LangChain

from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Parent splitter: larger chunks (the context the LLM sees)
parent_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200
)

# Child splitter: smaller chunks (what gets embedded and searched)
child_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=50
)

# Store for parent documents
docstore = InMemoryStore()  # Use Redis/Postgres in production

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Index documents
retriever.add_documents(documents)

# Retrieve: searches children, returns parents
results = retriever.invoke("How do I configure max_connections?")

Custom Implementation

import uuid
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str
    text: str
    parent_id: str
    metadata: dict

def index_with_parents(documents, parent_size=2000, child_size=400):
    all_children = []
    parent_store = {}
    
    for doc in documents:
        # Create parent chunks
        parents = split_text(doc.text, parent_size, overlap=200)
        
        for parent in parents:
            parent_id = str(uuid.uuid4())
            parent_store[parent_id] = parent
            
            # Create child chunks from each parent
            children = split_text(parent.text, child_size, overlap=50)
            
            for child in children:
                all_children.append(Chunk(
                    id=str(uuid.uuid4()),
                    text=child.text,
                    parent_id=parent_id,
                    metadata={**doc.metadata, "parent_id": parent_id}
                ))
    
    # Index children in vector store
    embeddings = embed(c.text for c in all_children)
    vector_store.add(all_children, embeddings)
    
    return parent_store

def retrieve(query, parent_store, k=5, parent_k=3):
    # Search children
    child_results = vector_store.search(query, k=k)
    
    # Map to unique parents
    seen_parents = set()
    parents = []
    for child in child_results:
        pid = child.metadata["parent_id"]
        if pid not in seen_parents:
            seen_parents.add(pid)
            parents.append(parent_store[pid])
            if len(parents) >= parent_k:
                break
    
    return parents

Choosing Parent and Child Sizes

Use Case	Child Size	Parent Size	Reasoning
Technical docs	300-500 tokens	1500-2500 tokens	Sections are self-contained
Legal contracts	200-400 tokens	2000-3000 tokens	Clauses need surrounding context
Chat/email	200-300 tokens	Full message/thread	Messages are natural units
Code	100-300 tokens	Full function/class	Code needs complete units
Research papers	400-600 tokens	Full section	Sections are logical units

Rule of thumb: Child chunks should be small enough to precisely match queries. Parent chunks should be large enough to provide complete context for answering.

Advanced Patterns

Multi-Level Hierarchy

Instead of two levels, use three:

Document → Section → Paragraph → Sentence
                                    ↑ search here
              ↑ return this

Search on sentences (most precise matching), but return entire sections. This works well for long documents where sections are 2000-5000 tokens.

Deduplication

Multiple child chunks from the same parent might match a query. Without deduplication, you waste context window space returning the same parent multiple times:

def deduplicated_parent_retrieval(query, k_children=10, k_parents=3):
    children = vector_store.search(query, k=k_children)
    
    parent_scores = {}
    for child in children:
        pid = child.metadata["parent_id"]
        if pid not in parent_scores:
            parent_scores[pid] = child.score
        else:
            # Boost parents with multiple matching children
            parent_scores[pid] = max(parent_scores[pid], child.score)
    
    # Return top parents by best child score
    top_parents = sorted(parent_scores.items(), key=lambda x: x[1], reverse=True)
    return [parent_store[pid] for pid, score in top_parents[:k_parents]]

Dynamic Parent Sizing

Use natural document boundaries instead of fixed sizes:

def split_by_structure(document):
    """Use headings, paragraph breaks, or other structural markers as parent boundaries"""
    parents = []
    current_parent = []
    
    for element in parse_document(document):
        if element.is_heading and current_parent:
            parents.append(join(current_parent))
            current_parent = [element]
        else:
            current_parent.append(element)
    
    if current_parent:
        parents.append(join(current_parent))
    
    return parents

Natural boundaries produce more coherent parent chunks than arbitrary size-based splitting.

When to Use Parent Document Retrieval

Use it when:

Documents have clear hierarchical structure
Answers require context beyond the matching passage
Your chunks are too small for the LLM to work with
You’re getting precise but incomplete answers

Don’t use it when:

Documents are very short (the whole document fits in a chunk)
Each chunk is self-contained (FAQ entries, glossary definitions)
Context window is very limited (parent documents might be too large)
You need maximum retrieval precision without context (keyword search use cases)

Parent document retrieval adds minimal complexity to a RAG system but often produces significantly better answers. If your RAG system is returning relevant but incomplete information, this is the first optimization to try.