AI-Native Systemsconcept · 3 मिनट · अपडेट 12 जून 2026

Retrieval-Augmented Generation (RAG)

लेखक HealthAtoms Editorial (AI-assisted draft)विशेषज्ञ समीक्षा लंबित

Ground an LLM's answer in your own documents: retrieve relevant passages first, then generate with them in context — citations included.

In one line

RAG answers questions from your knowledge base, not the model's memory: fetch the most relevant passages, hand them to the model with the question, and require the answer to cite them.

How it works

Offline: split documents into chunks, compute an embedding for each, store them in a vector index. At question time: embed the query, retrieve the nearest chunks (often re-ranked, often hybrid with keyword search), assemble them into the prompt, and generate. Done well, the answer cites which chunks support which sentence — and "I don't know" beats invention when retrieval comes back empty. Failure modes worth respecting: bad chunking, stale indexes, and retrieval misses that the model papers over fluently.

Where it shows up in digital health

Clinical-policy assistants grounded in a hospital's own protocols; coding/billing helpers grounded in payer rules; this platform's Vaidya, grounded in Kosha entries with visible citations and match scores — exactly the pattern on the Ask page. In health, ungrounded generation is a safety issue, not a style choice; RAG is the difference.

संदर्भ

Lewis et al. — Retrieval-Augmented Generation (2020)