Six RAG strategies, explained simply (with code).
Six Ways to Retrieve Data for an LLM
Most RAG tutorials jump straight to vector embeddings. Half the time, that’s the wrong tool.
RAG stands for retrieval-augmented generation: retrieve information, add it to the prompt, and let the model answer using it as context. The retrieval part is where it gets interesting; there are far more options than most people realise.
As a side note, I use Postgres for everything. You don’t need complex database infrastructure. Postgres can handle all of these strategies, making it a pragmatic choice for most situations.
Here are six approaches I use on real client projects, roughly in order of complexity.
PS: If you want a full walkthrough, I made a video here.
1. Document loading
Almost everyone dismisses this one, because it’s simple and boring.
If you’re loading a step-by-step runbook, a checklist, or a recipe, you can’t just retrieve partial bits of information. You need the entire document or the answer won’t make sense. Partial retrieval of a setup guide produces partial answers.
Two ways to do it. The naive approach: read the file, stick it in the prompt.
with open(path, "r") as f:
    document = f.read()

prompt = f"Answer using this document:\n\n{document}\n\nQuestion: {question}"

The smarter approach: an index or lookup system that describes each document. Pass the index to the model, let it pick the right document first, then load it. Slower, but it handles a larger document set.
The downside is tokens. The other downside is you need to know roughly where the information lives. If you have hundreds of documents and no idea which one is relevant, this won’t work. But for a focused document set, surprisingly reliable.
Don’t dismiss the simple option.
2. Full text search
This has been around forever. Search by keyword. Built into Postgres with tsvector and tsquery, no extra infrastructure needed.
SELECT content, ts_rank(search_vector, query) AS rank
FROM document_chunks, plainto_tsquery('english', %s) query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT %s

If someone asks about “30-day returns” and your document says “30-day returns,” you’ll find it. Postgres also stems keywords, so a search for “running” matches “runner” and “runs” as well.
Where it breaks: meaning. “Comfortable shoes for long distance running” will only find product descriptions that contain the word comfortable. It’ll miss cushioned, supportive, anything like that. Use this when your users naturally reach for the same words your documents use.
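For completeness, the search_vector column in that query has to exist first. One way to set it up (a sketch; the table and column names are assumed from the query above) is a generated tsvector column with a GIN index, so it stays in sync with content automatically:

```python
# One-time setup, assuming the document_chunks table from the query above.
# The generated column recomputes whenever content changes; the GIN index
# keeps keyword searches fast.
setup_sql = """
ALTER TABLE document_chunks
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;

CREATE INDEX idx_chunks_search
ON document_chunks USING GIN (search_vector);
"""
```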
3. Vector search
This is what most people mean by RAG. Take a document, break it into chunks, turn each chunk into a vector, store it. When a user asks a question, embed the query into the same space and find the closest matches.
PS: As well as the clichéd chunk-documents approach, you can also turn database fields (product descriptions etc.) into vectors and search them too (no one talks about this!).
The power: it understands what you meant, not just what you said. “Comfortable shoes for long distance running” finds a product described as “plush cushioned sole designed for marathon training.” The description never used the word comfortable. Vector search found it anyway.
Pgvector adds this directly to Postgres. No separate database.
sql = """
SELECT content, 1 - (embedding <=> %s::vector) AS similarity
FROM document_chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""Where it breaks: exact filters. “Nike shoes under £100” is a structured query. The embedding of “under £100” does not reliably land near documents that contain £99. It might return £200 shoes because the description is semantically similar. Semantic similarity and numerical filtering are different problems.
4. Hybrid search
Combine keyword and vector search, merge the results. This is my default when I’m not sure which approach a dataset needs.
The merging uses Reciprocal Rank Fusion. Each document scores based on where it appeared in each ranked list.
def reciprocal_rank_fusion(keyword_results, vector_results, k=60):
    scores = {}
    for rank, result in enumerate(keyword_results):
        scores[result["id"]] = scores.get(result["id"], 0) + 1 / (k + rank + 1)
    for rank, result in enumerate(vector_results):
        scores[result["id"]] = scores.get(result["id"], 0) + 1 / (k + rank + 1)
    return sorted(scores, key=lambda x: scores[x], reverse=True)

“Nike running shoes, comfortable for long distance.” Keyword search finds anything with Nike, vector search finds anything semantically similar to comfortable and long distance. You get both.
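A tiny worked example (the IDs are made up) shows why fusion favours documents that appear in both lists:

```python
def reciprocal_rank_fusion(keyword_results, vector_results, k=60):
    # Same function as above, repeated so this snippet runs standalone.
    scores = {}
    for rank, result in enumerate(keyword_results):
        scores[result["id"]] = scores.get(result["id"], 0) + 1 / (k + rank + 1)
    for rank, result in enumerate(vector_results):
        scores[result["id"]] = scores.get(result["id"], 0) + 1 / (k + rank + 1)
    return sorted(scores, key=lambda x: scores[x], reverse=True)

keyword_hits = [{"id": "a"}, {"id": "b"}]  # "a" tops keyword search
vector_hits = [{"id": "c"}, {"id": "b"}]   # "c" tops vector search
print(reciprocal_rank_fusion(keyword_hits, vector_hits))
# "b" ranks first: two second-place scores (2/62) beat one first-place score (1/61)
```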
A solid and pragmatic choice for many business applications.
5. SQL RAG (Database RAG)
This one is relatively underrated and under-discussed, and it’s one of my favourites.
Plot twist: Most business data isn’t in documents. Customer records, orders, inventory, product listings. None of that is in a PDF. It’s in a database. SQL RAG turns a natural language question into a database query and just goes and gets exactly what you need. Tends to be very reliable.
Two approaches with different risk profiles.
Parameterised queries (safer). Pre-write the SQL. The model extracts parameters from the question and slots them in. The model never writes SQL.
sql = """
SELECT name, price, stock_quantity
FROM products
WHERE category = %(category)s
AND price < %(max_price)s
AND rating >= %(min_rating)s
"""Dynamic query generation (more powerful, riskier). The model writes the actual SQL. LLMs are surprisingly good at this. The queries get complex fast, joining tables, applying multiple filters, and they’re usually correct.
I really like this approach for internal analytics tools or database querying tools where the cost of a bad query is low. I’d be very hesitant to use it on a customer-facing product.
Start with parameterised queries via regular tool calls.
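A minimal sketch of the parameterised flow. The model’s only job is to return its extracted parameters as JSON; validate_params and the key names here are hypothetical, and the database driver handles the escaping:

```python
import json

ALLOWED_KEYS = {"category", "max_price", "min_rating"}

def validate_params(model_output):
    """Parse the model's JSON and reject anything outside the expected keys."""
    params = json.loads(model_output)
    unexpected = set(params) - ALLOWED_KEYS
    if unexpected:
        raise ValueError(f"unexpected parameters: {unexpected}")
    return params

# The SQL stays fixed; only validated values are slotted in:
# cursor.execute(sql, validate_params(model_output))
```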
6. Agentic RAG
Give the model access to all the retrieval tools above and let it decide which one to use.
tools = [
    {"name": "search_documents", "description": "Search docs and FAQs"},
    {"name": "query_products", "description": "Search products by price, category, rating"},
    {"name": "get_order_status", "description": "Look up orders for a customer"},
]

Where this shines is compound questions. “I want running shoes under £150, and what’s the return policy if they don’t fit?” That needs a product database query AND a document lookup. One retrieval strategy can’t answer it. The agent looks at the question, looks at the tools it has, figures out which to call, and synthesises the answer.
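The routing itself can be sketched as a dispatch table. The stub tools and pick_tools (standing in for the model’s tool choice) are hypothetical placeholders:

```python
def search_documents(question):
    """Stub for the document retrieval tool."""
    return ["Returns accepted within 30 days if unworn."]

def query_products(question):
    """Stub for the product database tool."""
    return [{"name": "Road Runner", "price": 120}]

TOOLS = {"search_documents": search_documents, "query_products": query_products}

def run_agent(question, pick_tools):
    """pick_tools stands in for the model: question -> list of tool names."""
    results = {}
    for name in pick_tools(question):
        results[name] = TOOLS[name](question)
    return results  # a real agent would synthesise these into one answer
```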
The downside is latency. The agent has to make decisions and sometimes makes bad ones. Picks the wrong search term, retries. It’s also less deterministic. But this is the kind of strategy tools like Claude Code use: search in one file, realise it’s wrong, correct, search somewhere else. Very powerful.
How to choose
A few well-defined or small documents, low query volume or cost sensitivity, users need full context: document loading.
Users search with specific keywords: full text search.
Semantic understanding matters: vector search.
Not sure which applies, or need both: hybrid search.
Data is in a database: SQL RAG, start with parameterised queries.
Compound questions or complex search requirements across multiple data sources: agentic RAG.
Most production systems combine at least two. Agentic RAG is really just a routing layer over the others.
All six strategies are implemented against the same database in the video. Run the same questions through each one and watch where they fail. Seeing the failure modes side by side is more useful than any explanation.
If you want a full walkthrough, I made a video here.
Thanks for reading.
Owain

