AI Engineering · 02 Feb 2026
Building RAG Apps That Don’t Hallucinate Every Answer

Omer Toqeer

The Problem With Naive RAG
“We have a vector DB, we’re using embeddings… why is the model still making things up?”
I’ve seen this pattern in multiple projects: the stack is technically “RAG”, but the retrieval step is weak and the prompts are vague. The result is hallucinations wrapped in confident language.
What Actually Matters
1. Chunking Around Semantics, Not Just Tokens
Bad:
- Split every 1,000 tokens, regardless of structure
Better:
- Split by headings, bullet lists, and logical sections
- Keep references (section numbers, page numbers) in each chunk
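A minimal sketch of that kind of heading-aware chunker (the heading regex, the `max_chars` cap, and repeating the heading in oversized chunks are my illustrative choices, not from the post):

```python
import re

def chunk_by_headings(text, max_chars=2000):
    """Split on markdown-style headings so each chunk starts at a section boundary."""
    # Split *before* each heading line, keeping the heading inside its chunk
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        heading = section.splitlines()[0]
        if len(section) <= max_chars:
            # Keep whole sections intact instead of cutting at a token count
            chunks.append(section)
        else:
            # Only oversized sections get capped, and each piece carries its heading
            for i in range(0, len(section), max_chars):
                part = section[i:i + max_chars]
                if not part.startswith(heading):
                    part = heading + "\n" + part
                chunks.append(part)
    return chunks
```

The point is that the split points follow document structure, and every chunk keeps the reference (its heading) it needs to stand alone.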
2. Retrieval With Constraints
Instead of “top 10 chunks by cosine similarity”, I:
- Limit by document type (policy vs FAQ vs changelog)
- Filter by date when recency matters
- Use a smaller top-k (3–5) and increase only when needed
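Those three constraints can be sketched as a filter-then-rank retrieval step. The chunk dict fields (`doc_type`, `date`, `embedding`) are assumptions about how the metadata is stored:

```python
from datetime import date
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve(query_vec, chunks, doc_type=None, min_date=None, top_k=3):
    """Filter by document type and recency first, then rank by similarity."""
    candidates = [
        c for c in chunks
        if (doc_type is None or c["doc_type"] == doc_type)
        and (min_date is None or c["date"] >= min_date)
    ]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vec, c["embedding"]),
        reverse=True,
    )
    # Small top_k by default; widen only when the answer genuinely needs more context
    return ranked[:top_k]
```

In a real system the filtering happens inside the vector DB query (most support metadata filters), but the logic is the same: constrain the candidate set before similarity does its ranking.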
3. Grounding in the Prompt
I tell the model explicitly:
- “Only answer from the provided context.”
- “If the answer is not in the context, say you don’t know.”
- “Always include the IDs of the chunks you used.”
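Wiring those three instructions into a prompt builder might look like this (the exact wording and the `[id] text` context format are my choices):

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that pins the model to the retrieved context."""
    # Prefix each chunk with its ID so the model can cite what it used
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer using ONLY the context below.\n"
        "If the answer is not in the context, say you don't know.\n"
        "Always include the IDs of the chunks you used, e.g. [policy-3].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Putting the IDs inline in the context is what makes the citation instruction enforceable: you can check the returned IDs against the chunks you actually retrieved.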
4. Feedback Loop
I log:
- The question
- Retrieved chunks (IDs and previews)
- The final answer
- Whether the user accepted it
This gives me a way to:
- Fix bad retrieval cases
- Improve chunking rules
- Add hard-coded routes for very common questions
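The logging side is the simplest part to sketch. One append-only JSONL record per question is enough to replay bad retrieval cases later (the field names and 80-character previews here are my assumptions):

```python
import json
import time

def log_interaction(path, question, chunks, answer, accepted):
    """Append one JSONL record per question so failures can be replayed later."""
    record = {
        "ts": time.time(),
        "question": question,
        # IDs plus short previews: enough to spot bad retrieval without storing full text
        "chunks": [{"id": c["id"], "preview": c["text"][:80]} for c in chunks],
        "answer": answer,
        "accepted": accepted,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Grepping this log for `"accepted": false` is a cheap way to find the questions where chunking rules need fixing or a hard-coded route would pay off.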
Summary
Good RAG is mostly good retrieval engineering + good logging. The LLM is the easy part.