AI Engineering · 02 Feb 2026

Building RAG Apps That Don’t Hallucinate Every Answer

Omer Toqeer

Author

The Problem With Naive RAG

“We have a vector DB, we’re using embeddings… why is the model still making things up?”

I’ve seen this pattern in multiple projects: the stack is technically “RAG”, but the retrieval step is weak and the prompts are vague. The result is hallucinations wrapped in confident language.

What Actually Matters

1. Chunking Around Semantics, Not Just Tokens

Bad:

  • Split every 1,000 tokens, regardless of structure

Better:

  • Split by headings, bullet lists, and logical sections
  • Keep references (section numbers, page numbers) in each chunk
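As a sketch, heading-aware chunking for markdown-style docs might look like this (the function name, chunk format, and size limit are my own, not from a specific library):

```python
import re

def chunk_by_headings(text: str, max_chars: int = 4000) -> list[dict]:
    """Split markdown text on headings, keeping a section reference with each chunk."""
    chunks = []
    current_heading = "(intro)"
    current_lines: list[str] = []

    def flush():
        body = "\n".join(current_lines).strip()
        if body:
            chunks.append({"ref": current_heading, "text": body})

    for line in text.splitlines():
        if re.match(r"^#{1,6}\s", line):  # a markdown heading starts a new chunk
            flush()
            current_heading = line.lstrip("#").strip()
            current_lines = [line]
        else:
            current_lines.append(line)
    flush()

    # Fall back to a hard character split only for oversized sections,
    # so the section reference survives even when a chunk is cut in two.
    out = []
    for c in chunks:
        t = c["text"]
        for i in range(0, len(t), max_chars):
            out.append({"ref": c["ref"], "text": t[i:i + max_chars]})
    return out
```

The point is that the structural split happens first and the token/character limit is only a safety net, not the primary boundary.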

2. Retrieval With Constraints

Instead of “top 10 chunks by cosine similarity”, I:

  • Limit by document type (policy vs FAQ vs changelog)
  • Filter by date when recency matters
  • Use a smaller top-k (3–5) and increase only when needed
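A minimal sketch of constrained retrieval, assuming chunks carry `doc_type` and `updated` metadata (the `Chunk` shape and `retrieve` signature are illustrative):

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt

@dataclass
class Chunk:
    id: str
    doc_type: str       # e.g. "policy", "faq", "changelog"
    updated: date
    embedding: list[float]
    text: str

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_emb, chunks, *, doc_types=None, min_date=None, top_k=3):
    """Filter by metadata first, then rank only the survivors by similarity."""
    candidates = [
        c for c in chunks
        if (doc_types is None or c.doc_type in doc_types)
        and (min_date is None or c.updated >= min_date)
    ]
    candidates.sort(key=lambda c: cosine(query_emb, c.embedding), reverse=True)
    return candidates[:top_k]
```

Filtering before ranking matters: a highly similar but stale or off-type chunk never gets the chance to crowd out a correct one.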

3. Grounding in the Prompt

I tell the model explicitly:

  • “Only answer from the provided context.”
  • “If the answer is not in the context, say you don’t know.”
  • “Always include the IDs of the chunks you used.”
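Those three instructions can be baked into a prompt template; a sketch (the exact wording and chunk dict shape are my own):

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that keeps the model inside the retrieved context."""
    # Prefix each chunk with its ID so the model can cite it back.
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer using ONLY the context below.\n"
        "If the answer is not in the context, say you don't know.\n"
        "End your answer with the IDs of the chunks you used, "
        "e.g. (used: [c1], [c3]).\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Asking for chunk IDs in the answer is cheap and gives you a hook for the feedback loop below: you can check whether the cited chunks actually support the claim.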

4. Feedback Loop

I log:

  • The question
  • Retrieved chunks (IDs and previews)
  • The final answer
  • Whether the user accepted it
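The logging above can be sketched as a JSONL append (the file name, record shape, and 120-char preview length are assumptions, not a prescribed schema):

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("rag_feedback.jsonl")  # hypothetical log location

def log_interaction(question, retrieved, answer, accepted=None, path=LOG_PATH):
    """Append one JSONL record per question so bad retrievals can be replayed later."""
    record = {
        "ts": time.time(),
        "question": question,
        "chunks": [
            {"id": c["id"], "preview": c["text"][:120]} for c in retrieved
        ],
        "answer": answer,
        "accepted": accepted,  # None until the user gives feedback
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

One record per line means you can grep, sample, and replay failures without any database, which is usually enough at the start.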

This gives me a way to:

  • Fix bad retrieval cases
  • Improve chunking rules
  • Add hard-coded routes for very common questions

Summary

Good RAG is mostly good retrieval engineering + good logging. The LLM is the easy part.