Guardrails & safety
The layers around a model that keep it in scope, grounded and resistant to manipulation: engineering, not hope.
In one line
Guardrails are the defence in depth around an LLM (input checks, scoped capabilities, output validation and audit), so that one clever prompt can't turn your tutor into a prescriber or your agent into a data leak.
How it works
Guardrails are layered because no single layer holds. Scope rules in the system prompt state what the assistant is and is not. Input screening looks for prompt-injection patterns, including instructions hidden in retrieved documents (indirect prompt injection, the RAG-era twist). Capability scoping limits tools through least privilege, schema-validated arguments and human confirmation for consequential actions. Output validation applies schema checks, citation-presence checks and PII filters. Audit logs and feedback loops make failures findable. The OWASP Top 10 for LLM Applications is the checklist that keeps this honest.
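The layers above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a production filter: the pattern list, the tool registry (search_guidelines, send_email) and the AuditLog class are all hypothetical names invented for the example, and regex screening alone would not stop a determined attacker.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately tiny pattern list; real screening needs more than regexes.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]

# Hypothetical tool registry: expected argument types and whether a human must confirm.
TOOLS = {
    "search_guidelines": {"args": {"query": str}, "needs_confirmation": False},
    "send_email": {"args": {"to": str, "body": str}, "needs_confirmation": True},
}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class AuditLog:
    """Collects one event per guardrail decision so failures are findable later."""
    events: list = field(default_factory=list)

    def record(self, layer: str, verdict: str, detail: str = "") -> None:
        self.events.append({"layer": layer, "verdict": verdict, "detail": detail})


def screen_text(text: str, source: str, log: AuditLog) -> bool:
    """Input layer: the same check runs on user input and on retrieved documents."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            log.record("input", "blocked", f"{source}: {pattern.pattern}")
            return False
    log.record("input", "passed", source)
    return True


def authorize_tool_call(name: str, args: dict, confirmed: bool, log: AuditLog) -> bool:
    """Capability layer: refuse unknown tools, malformed arguments, unconfirmed actions."""
    spec = TOOLS.get(name)
    if spec is None:
        log.record("tools", "blocked", f"unknown tool {name}")
        return False
    schema = spec["args"]
    if set(args) != set(schema) or not all(isinstance(args[k], t) for k, t in schema.items()):
        log.record("tools", "blocked", f"bad arguments for {name}")
        return False
    if spec["needs_confirmation"] and not confirmed:
        log.record("tools", "blocked", f"{name} needs human confirmation")
        return False
    log.record("tools", "passed", name)
    return True


def redact_pii(answer: str, log: AuditLog) -> str:
    """Output layer: mask email addresses, one narrow example of a PII filter."""
    redacted, count = EMAIL_RE.subn("[redacted email]", answer)
    log.record("output", "redacted" if count else "passed", f"{count} email(s)")
    return redacted
```

Each function returns a verdict and writes to the same log, so a blocked retrieved document, a refused tool call and a redacted answer all leave a trace. Keeping the layers as separate functions is a design choice for the sketch: any one of them can be replaced (for example, by a classifier instead of regexes) without touching the others.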
Where it shows up in digital health
The medical-advice boundary is the canonical health guardrail: an education assistant explains concepts, refuses patient-specific recommendations, and says so clearly. This is Vaidya's contract in this platform's checklist. For anything public-facing, add jailbreak resistance, and enforce grounding (no citation, no claim): that is the difference between a knowledge tool and a liability.
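As a rough sketch of the two health-specific checks described above, the Python below refuses questions that look patient-specific and drops any sentence that does not cite a known source. Everything here is an assumption made for illustration: the cue list in PATIENT_SPECIFIC, the bracketed-number citation format, the refusal wording and the function names are hypothetical, not this platform's implementation. A keyword heuristic like this would miss many real phrasings.

```python
from __future__ import annotations

import re

# Hypothetical cues that a question asks for a patient-specific recommendation.
PATIENT_SPECIFIC = re.compile(
    r"\b(should i|can i|my (dose|dosage|medication|symptoms?|results?))\b",
    re.I,
)
# Assumed citation format: bracketed source numbers such as [1].
CITATION = re.compile(r"\[(\d+)\]")
REFUSAL = (
    "I can explain concepts, but I can't give advice about an individual's care. "
    "Please talk to a clinician about your situation."
)


def is_patient_specific(question: str) -> bool:
    """Medical-advice boundary: flag questions about the asker's own care."""
    return PATIENT_SPECIFIC.search(question) is not None


def enforce_grounding(answer: str, source_ids: set[int]) -> list[str]:
    """No citation, no claim: keep only sentences whose citations are all known sources."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    kept = []
    for sentence in sentences:
        cited = {int(n) for n in CITATION.findall(sentence)}
        if cited and cited <= source_ids:
            kept.append(sentence)
    return kept


def respond(question: str, draft_answer: str, source_ids: set[int]) -> str:
    """Refuse clearly when out of scope; otherwise return only the grounded sentences."""
    if is_patient_specific(question):
        return REFUSAL
    grounded = enforce_grounding(draft_answer, source_ids)
    return " ".join(grounded) if grounded else "I couldn't find a cited source for that."
```

For example, respond("What is a beta blocker?", "Beta blockers slow the heart rate [1]. They also cure colds.", {1}) keeps only the first sentence, because the second makes a claim with no citation.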