LLMOps & evals
The operations discipline for AI features: versioned prompts, automated evals, monitoring and cost control — because 'it seemed fine in the demo' is not a deployment strategy.
In one line
LLMOps is everything around the model call that makes an AI feature shippable: evaluation suites that catch regressions, versioning of prompts and models, runtime monitoring, and cost/latency budgets.
How it works
The centre of gravity is evals: a curated set of inputs with graded expectations, run automatically whenever the prompt, the retrieval setup, or the model version changes. Grading uses exact match where possible, an LLM scoring against a rubric where judgement is needed, and human review for the safety-critical slice.
Around the evals sit the runtime controls: tracing every request (prompt, retrieved context, output, feedback), canary rollouts for prompt changes, fallback models, token budgets, and drift monitoring, because yesterday's accuracy is not a property of tomorrow's deployment.
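A minimal sketch of that eval loop in Python, under stated assumptions. Every name here (`EvalCase`, `run_suite`, `pass_rate`, `gate`) is hypothetical, and the `generate` and `judge` callables are stand-ins: `generate` for the prompt, retrieval and model under test, `judge` for an LLM grader. Each case is graded by exact match if it has an expected answer, by the judge if it has a rubric, and is held back for human review if it is flagged safety-critical. The gate then blocks a change whose automated pass rate falls below the last accepted baseline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical callables: `Generate` stands in for the prompt + retrieval + model
# under test; `Judge` stands in for an LLM grader that returns True when the
# output satisfies the rubric.
Generate = Callable[[str], str]
Judge = Callable[[str, str], bool]


@dataclass
class EvalCase:
    case_id: str
    input_text: str
    expected: Optional[str] = None  # exact-match target, when one exists
    rubric: Optional[str] = None    # rubric for the LLM judge, when judgement is needed
    human_review: bool = False      # safety-critical slice: never auto-graded


@dataclass
class EvalResult:
    case_id: str
    output: str
    passed: Optional[bool]  # None means "queued for human review"


def grade(case: EvalCase, output: str, judge: Judge) -> EvalResult:
    """Pick the grading method per case: human review, then exact match, then LLM rubric."""
    if case.human_review:
        return EvalResult(case.case_id, output, None)
    if case.expected is not None:
        return EvalResult(case.case_id, output, output.strip() == case.expected.strip())
    if case.rubric is not None:
        return EvalResult(case.case_id, output, judge(case.rubric, output))
    raise ValueError(f"case {case.case_id!r} has no grading method")


def run_suite(cases: list[EvalCase], generate: Generate, judge: Judge) -> list[EvalResult]:
    """Run every case through the system under test and grade the output."""
    return [grade(c, generate(c.input_text), judge) for c in cases]


def pass_rate(results: list[EvalResult]) -> float:
    """Share of automatically graded cases that passed (human-review cases excluded)."""
    graded = [r for r in results if r.passed is not None]
    return sum(r.passed for r in graded) / len(graded) if graded else 0.0


def gate(results: list[EvalResult], baseline_rate: float, tolerance: float = 0.0) -> bool:
    """True if the change may ship: the pass rate has not regressed below the baseline."""
    return pass_rate(results) >= baseline_rate - tolerance


if __name__ == "__main__":
    demo_cases = [
        EvalCase("exact-1", "ok", expected="OK"),
        EvalCase("rubric-1", "hello there", rubric="Output must greet the user."),
        EvalCase("safety-1", "What dose should I take?", human_review=True),
    ]
    # Toy stand-ins: uppercase the input, and a "judge" that looks for a greeting.
    results = run_suite(demo_cases, generate=str.upper, judge=lambda rubric, out: "HELLO" in out)
    print(pass_rate(results), gate(results, baseline_rate=0.95))  # prints: 1.0 True
```

Keeping human-review cases out of the automated pass rate is a deliberate choice in this sketch: an unreviewed safety case counts as neither a pass nor a fail, so a separate sign-off step is still needed before the gate result can be trusted.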
Where it shows up in digital health
Anywhere an LLM output reaches a clinician or learner. For Vaidya, the eval set is defined before launch: grounding fidelity (does every claim trace to a cited Kosha entry?), refusal correctness (no patient-specific advice), and tone-by-fidelity. Cost follows the golden rule from our own cost docs: per-tier limits and monitoring go live before the feature does, never after the first bill.
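The grounding-fidelity check lends itself to an automated first pass. A minimal sketch, assuming a hypothetical answer format in which each sentence is one claim and carries `[kosha:<entry-id>]` markers before its closing punctuation; the marker syntax, `grounding_report` and the entry IDs are illustrative, not Vaidya's actual format. It flags sentences with no citation and citations to Kosha entries that were not in the retrieved context. It cannot tell whether a cited entry actually supports the claim; that judgement still belongs to the rubric or human grader.

```python
import re

# Hypothetical citation format: each claim carries one or more markers such as
# "[kosha:k-101]" placed before the sentence's closing punctuation.
CITATION = re.compile(r"\[kosha:([A-Za-z0-9_-]+)\]")


def split_claims(answer: str) -> list[str]:
    """Naively treat each sentence as one claim (split after '.', '!' or '?')."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def grounding_report(answer: str, retrieved_ids: set[str]) -> dict:
    """Flag claims with no citation and citations to entries that were never retrieved."""
    uncited: list[str] = []
    unknown: set[str] = set()
    for claim in split_claims(answer):
        cited = CITATION.findall(claim)
        if not cited:
            uncited.append(claim)
        unknown.update(entry for entry in cited if entry not in retrieved_ids)
    return {
        "uncited_claims": uncited,
        "unknown_citations": sorted(unknown),
        "passed": not uncited and not unknown,
    }


if __name__ == "__main__":
    answer = (
        "Drug X is taken with food [kosha:k-101]. "
        "Avoid it in condition Y [kosha:k-999]. "
        "It works for everyone."
    )
    print(grounding_report(answer, retrieved_ids={"k-101", "k-102"}))
```

On the demo answer it reports the final, uncited sentence and the citation to `k-999`, which was never retrieved, so the case fails. The naive sentence split would mis-handle abbreviations such as "e.g."; a production check would need a proper segmenter or a structured output format.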