🟣 Technical 8 min read

Metadata Filtering in RAG: The Most Underrated Retrieval Technique

Semantic search alone isn't enough for production RAG. Metadata filtering — combining vector similarity with structured filters — dramatically improves retrieval precision.

View all rag depths →

You’ve built a RAG system. Semantic search works well for general questions. Then a user asks: “What was our Q4 2025 revenue?” and the system returns a paragraph about Q3 2024 revenue because the text is semantically similar. The problem isn’t your embedding model — it’s that semantic similarity alone can’t distinguish time periods, document types, or access permissions.

Metadata filtering solves this by combining vector search with structured attribute filters. It’s simple to implement and dramatically improves retrieval precision for production systems.

What metadata filtering is

Every chunk in your vector store has an embedding (for semantic search) and can also have structured metadata — key-value pairs that describe properties of the chunk:

{
  "embedding": [0.12, -0.34, ...],
  "text": "Revenue grew 15% year-over-year...",
  "metadata": {
    "source": "annual-report-2025.pdf",
    "section": "financial-results",
    "quarter": "Q4",
    "year": 2025,
    "department": "finance",
    "access_level": "internal",
    "date_created": "2026-01-15"
  }
}

A filtered query applies both:

  1. Metadata filter: year = 2025 AND quarter = "Q4"
  2. Vector search: find the most semantically similar chunks

The database first narrows to chunks matching the metadata criteria, then ranks by vector similarity within that subset. This is pre-filtering — the most common and generally most effective approach.

Why it matters

Precision

Without filtering: “What was Q4 2025 revenue?” might return chunks from any quarter of any year — the word “revenue” is semantically similar regardless of time period.

With filtering: Only chunks from Q4 2025 are considered. The semantic search finds the most relevant revenue discussion within that narrow set.

Performance

Searching 1,000 chunks (filtered subset) is faster than searching 1,000,000 chunks (full index). Metadata filtering reduces the search space before the expensive vector comparison happens.

Access control

In enterprise RAG, not every user should see every document. Metadata filtering enforces access control at the retrieval level:

filter: access_level IN user.permissions

This is not optional for enterprise deployments — it’s a security requirement.

Designing your metadata schema

The metadata schema determines what filters are available. Design it based on how users will query the system.

Essential metadata fields

Source document. Which document does this chunk come from? Users often want to know the source, and you may need to filter by document type.

Date. When was the content created or last updated? Time-based filtering is critical for any domain where information changes (which is most domains).

Document type. Is this a policy document, a meeting transcript, a technical spec, an email? Different document types serve different query intents.

Section/chapter. Where in the document does this chunk appear? Useful for navigating long documents.

Domain-specific fields

Legal: jurisdiction, case_type, court_level, date_filed Healthcare: specialty, evidence_level, guideline_version Finance: fiscal_year, quarter, report_type, entity Engineering: product, version, component, author

Fields to avoid

Free-text fields. Metadata should be categorical or numeric. Don’t put paragraphs of text in metadata — that’s what the vector embedding is for.

High-cardinality unique identifiers. A unique ID per chunk doesn’t help with filtering. Include IDs for tracking but don’t build filters around them.

Redundant fields. If date already captures time, you don’t need separate year, month, day fields unless you filter on those independently.

Implementation patterns

Pattern 1: Query analysis for automatic filtering

Parse the user’s query to extract filter criteria before searching:

def extract_filters(query: str) -> dict:
    """Use an LLM to extract structured filters from natural language."""
    response = llm.complete(f"""
    Extract metadata filters from this query. Return JSON.
    Available filters: year (int), quarter (Q1-Q4), 
    department (string), document_type (string).
    
    Query: {query}
    
    Return only filters explicitly mentioned. 
    If none, return empty object.
    """)
    return json.loads(response)

# "What was Q4 2025 revenue?" 
# → {"year": 2025, "quarter": "Q4"}

This automates filter extraction so users don’t need to specify filters manually.

Pattern 2: Fallback on empty results

Strict filtering can return zero results if the filter is too narrow. Implement a fallback:

results = search(query, filters=strict_filters, top_k=5)
if len(results) == 0:
    # Relax filters and retry
    results = search(query, filters=relaxed_filters, top_k=5)
if len(results) == 0:
    # No filters, pure semantic
    results = search(query, top_k=5)

Always tell the user when filters were relaxed: “I couldn’t find Q4 2025 data specifically, but here’s what I found for 2025 overall.”

Return results alongside available filter options, like an e-commerce site:

Query: "employee benefits"
Results: [5 chunks about benefits]
Available filters:
  - Year: [2024, 2025, 2026]
  - Department: [HR, Finance, Legal]
  - Type: [Policy, FAQ, Email]

Users can progressively narrow their search. This is especially valuable in exploratory scenarios.

Pattern 4: Time-aware defaults

For domains where recency matters, apply default time filters:

default_filters = {
    "date_created": {"$gte": one_year_ago}
}

Users who want older content can explicitly override. This prevents stale information from dominating results by default.

Vector database support

All major vector databases support metadata filtering:

DatabaseFilter syntaxPre/post filter
PineconeJSON filter objectsPre-filter
WeaviateGraphQL-style filtersPre-filter
QdrantJSON filter conditionsPre-filter
ChromaWhere clausesPre-filter
pgvectorSQL WHERE clausesPre-filter (via SQL)
MilvusBoolean expressionsPre-filter

Pre-filtering (filter first, then vector search) is standard and generally performs better than post-filtering (vector search first, then filter) for most selectivity levels.

Indexing considerations

Metadata filtering performance depends on indexing:

Create indexes on filterable fields. Without indexes, filtering requires scanning all documents. With indexes, it’s near-instant.

Be aware of filter selectivity. A filter that selects 90% of documents provides almost no benefit. A filter that selects 1% dramatically narrows the search space.

Compound filters multiply selectivity. year=2025 might match 20% of chunks. year=2025 AND department="engineering" might match 3%. Stack filters for precision.

Common mistakes

Not extracting filters from queries. If users say “2025 annual report” and you search semantically without filtering by year or document type, you’ll get noisy results.

Over-filtering. Applying too many filters returns zero results. Start with the most discriminating filter and add more only if results are too broad.

Inconsistent metadata. If some chunks have year: 2025 and others have year: "2025" (string vs integer), filters break silently. Validate metadata types during ingestion.

Ignoring metadata in evaluation. When evaluating RAG quality, measure whether the correct metadata filters were applied, not just whether the right chunks were retrieved. A system that retrieves the right answer from the wrong year is a ticking time bomb.

Metadata filtering isn’t glamorous. It doesn’t appear in research papers as often as novel embedding techniques or reranking architectures. But in production RAG systems, it’s often the difference between “works in demos” and “works in real life.”

Simplify

← Hybrid Search for RAG: Combining Dense and Sparse Retrieval

Go deeper

Multi-Index RAG: Searching Across Different Knowledge Bases →

Related reads

ragmetadatafilteringretrievalvector-searcharchitecture

Stay ahead of the AI curve

Weekly insights on AI — explained at the level that's right for you. No hype, no jargon, just what matters.

No spam. Unsubscribe anytime. We respect your inbox.