AI Engineering · 02 Feb 2026
Building RAG Apps That Don’t Hallucinate Every Answer

Omer Toqeer

The Problem With Naive RAG
“We have a vector DB, we’re using embeddings… why is the model still making things up?”
I’ve seen this pattern in multiple projects: the stack is technically “RAG”, but the retrieval step is weak and the prompts are vague. The result is hallucinations wrapped in confident language.
What Actually Matters
1. Chunking Around Semantics, Not Just Tokens
Bad:
- Split every 1,000 tokens, regardless of structure
Better:
- Split by headings, bullet lists, and logical sections
- Keep references (section numbers, page numbers) in each chunk
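A minimal sketch of that kind of heading-aware chunker (the heading regex, the `max_chars` cap, and repeating the heading in oversized chunks are my illustrative choices, not from the post):

```python
import re

def chunk_by_headings(text, max_chars=2000):
    """Split on markdown-style headings so each chunk starts at a section boundary."""
    # Split *before* each heading line, keeping the heading inside its chunk
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        heading = section.splitlines()[0]
        if len(section) <= max_chars:
            # Keep whole sections intact instead of cutting at a token count
            chunks.append(section)
        else:
            # Only oversized sections get capped, and each piece carries its heading
            for i in range(0, len(section), max_chars):
                part = section[i:i + max_chars]
                if not part.startswith(heading):
                    part = heading + "\n" + part
                chunks.append(part)
    return chunks
```

The point is that the split points follow document structure, and every chunk keeps the reference (its heading) it needs to stand alone.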
2. Retrieval With Constraints
Instead of “top 10 chunks by cosine similarity”, I:
- Limit by document type (policy vs FAQ vs changelog)
- Filter by date when recency matters
- Use a smaller top-k (3–5) and increase only when needed
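Those three constraints can be sketched as a filter-then-rank retrieval step. The chunk dict fields (`doc_type`, `date`, `embedding`) are assumptions about how the metadata is stored:

```python
from datetime import date
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve(query_vec, chunks, doc_type=None, min_date=None, top_k=3):
    """Filter by document type and recency first, then rank by similarity."""
    candidates = [
        c for c in chunks
        if (doc_type is None or c["doc_type"] == doc_type)
        and (min_date is None or c["date"] >= min_date)
    ]
    ranked = sorted(
        candidates,
        key=lambda c: cosine(query_vec, c["embedding"]),
        reverse=True,
    )
    # Small top_k by default; widen only when the answer genuinely needs more context
    return ranked[:top_k]
```

In a real system the filtering happens inside the vector DB query (most support metadata filters), but the logic is the same: constrain the candidate set before similarity does its ranking.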
3. Grounding in the Prompt
I tell the model explicitly:
- “Only answer from the provided context.”
- “If the answer is not in the context, say you don’t know.”
- “Always include the IDs of the chunks you used.”
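Wiring those three instructions into a prompt builder might look like this (the exact wording and the `[id] text` context format are my choices):

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that pins the model to the retrieved context."""
    # Prefix each chunk with its ID so the model can cite what it used
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer using ONLY the context below.\n"
        "If the answer is not in the context, say you don't know.\n"
        "Always include the IDs of the chunks you used, e.g. [policy-3].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Putting the IDs inline in the context is what makes the citation instruction enforceable: you can check the returned IDs against the chunks you actually retrieved.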
4. Feedback Loop
I log:
- The question
- Retrieved chunks (IDs and previews)
- The final answer
- Whether the user accepted it
This gives me a way to:
- Fix bad retrieval cases
- Improve chunking rules
- Add hard-coded routes for very common questions
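The logging side is the simplest part to sketch. One append-only JSONL record per question is enough to replay bad retrieval cases later (the field names and 80-character previews here are my assumptions):

```python
import json
import time

def log_interaction(path, question, chunks, answer, accepted):
    """Append one JSONL record per question so failures can be replayed later."""
    record = {
        "ts": time.time(),
        "question": question,
        # IDs plus short previews: enough to spot bad retrieval without storing full text
        "chunks": [{"id": c["id"], "preview": c["text"][:80]} for c in chunks],
        "answer": answer,
        "accepted": accepted,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Grepping this log for `"accepted": false` is a cheap way to find the questions where chunking rules need fixing or a hard-coded route would pay off.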
Summary
Good RAG is mostly good retrieval engineering + good logging. The LLM is the easy part.