WhiteRabbit & Rabbit-in-a-Hat (OMOP ETL design)
OHDSI tools that profile your source database and help you design the ETL that maps it to the OMOP Common Data Model — the planning stage of data harmonization.
In one line
Before you can harmonize data into OMOP, you have to understand what you actually have: WhiteRabbit scans and profiles a source database, and Rabbit-in-a-Hat turns that scan into a visual ETL design mapping each source field to the CDM. Licence: Apache 2.0 (fully open).
The problem it solves
The hardest part of an OMOP conversion isn't writing the ETL code — it's understanding the messy source first: what tables exist, what the fields really contain, what codes appear and how often. Guess, and the mapping is wrong. WhiteRabbit and Rabbit-in-a-Hat are the disciplined planning stage that makes the conversion correct.
How the two tools work together
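- WhiteRabbit connects to a source database (or delimited text files such as CSVs) and produces a scan report covering every table and every field, with value distributions and frequencies. The report holds aggregated counts rather than patient records, and values rarer than a configurable minimum cell count are left out, so it can typically be shared with the people designing the mapping.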
- Rabbit-in-a-Hat reads that report and gives you a drag-and-connect canvas to document how source.diagnosis_code becomes condition_occurrence.condition_concept_id, field by field, including the vocabulary mapping (often refined with Usagi for source-code → standard-concept matching).
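To make the scan report concrete, here is a minimal Python sketch of the kind of per-field summary it contains, computed over a small CSV. It is not WhiteRabbit's code or report format: the function name profile_table, the output keys and the sample data are hypothetical, and the min_cell_count threshold only imitates the idea behind WhiteRabbit's minimum cell count setting, which keeps rare values out of the report.

```python
import csv
import io
from collections import Counter


def profile_table(csv_text, min_cell_count=5):
    """Summarise each column of a CSV table: row count, share of empty cells,
    number of distinct values and value frequencies (most frequent first).

    Values seen fewer than `min_cell_count` times are left out of the
    frequency list, so rare (potentially identifying) values are not reported.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    rows = list(reader)
    n_rows = len(rows)
    report = {}
    for field in fields:
        # DictReader fills missing trailing cells with None; treat them as empty.
        values = [(row[field] or "").strip() for row in rows]
        counts = Counter(v for v in values if v)
        n_empty = n_rows - sum(counts.values())
        report[field] = {
            "n_rows": n_rows,
            "fraction_empty": n_empty / n_rows if n_rows else 0.0,
            "n_distinct": len(counts),
            "frequencies": [(v, c) for v, c in counts.most_common() if c >= min_cell_count],
        }
    return report


# Hypothetical sample: 10 rows, one diagnosis code far more common than the other.
sample_csv = "person_id,gender,diagnosis_code\n" + "".join(
    f"{i},{'F' if i % 2 else 'M'},{'E11.9' if i < 8 else 'I10'}\n" for i in range(10)
)
report = profile_table(sample_csv)
# 'I10' (2 rows) counts towards n_distinct but is suppressed from the frequency list.
print(report["diagnosis_code"])
```

Note that the unique person_id values never appear in the frequency list at all, which is the behaviour that makes this kind of summary shareable.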
The output is the ETL specification your engineers then implement — the design, not the code.
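As a hypothetical illustration of that hand-off, the sketch below records a few rows of a specification as plain data and then implements them for one source row. The source table and column names (diagnosis, patient_id, diagnosis_code, diagnosis_date), the FieldMapping class and the SOURCE_TO_CONCEPT dictionary are assumptions made for illustration. In practice the spec is the document produced from Rabbit-in-a-Hat, and the code-to-concept map comes from the OMOP vocabularies, often reviewed in Usagi. The concept IDs shown are illustrative and should be checked against your own vocabulary release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """One row of an ETL specification: where a CDM field comes from, and why."""
    source_table: str
    source_field: str
    target_table: str
    target_field: str
    logic: str  # human-readable rule; doubles as the audit record


# Hypothetical spec fragment for a source table named `diagnosis`.
SPEC = [
    FieldMapping("diagnosis", "patient_id", "condition_occurrence", "person_id",
                 "Copy as integer."),
    FieldMapping("diagnosis", "diagnosis_date", "condition_occurrence", "condition_start_date",
                 "Copy as-is; source dates are already ISO 8601."),
    FieldMapping("diagnosis", "diagnosis_code", "condition_occurrence", "condition_source_value",
                 "Copy the raw code (trimmed) for traceability."),
    FieldMapping("diagnosis", "diagnosis_code", "condition_occurrence", "condition_concept_id",
                 "Look up the standard concept in the reviewed map; 0 if unmapped."),
]

# Reviewed source-code -> standard-concept map (for example, exported after Usagi review).
# Concept IDs are illustrative; verify them against your own vocabulary release.
SOURCE_TO_CONCEPT = {"E11.9": 201826, "I10": 320128}


def to_condition_occurrence(row):
    """An engineer's implementation of SPEC for one `diagnosis` row."""
    code = row["diagnosis_code"].strip()
    return {
        "person_id": int(row["patient_id"]),
        "condition_start_date": row["diagnosis_date"],
        "condition_source_value": code,
        "condition_concept_id": SOURCE_TO_CONCEPT.get(code, 0),  # 0 = no matching concept
    }


record = to_condition_occurrence(
    {"patient_id": "42", "diagnosis_code": "E11.9", "diagnosis_date": "2023-05-01"}
)
# Every target field named in the spec should be produced by the implementation.
assert set(record) == {m.target_field for m in SPEC}
print(record)
```

Keeping the raw code in condition_source_value and using concept ID 0 for unmapped codes follows common OMOP practice: unmapped records stay visible and traceable instead of silently disappearing.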
Where it shows up in digital health
Most OMOP conversions (a hospital's EHR, a claims extract, a registry) start with a WhiteRabbit scan and a Rabbit-in-a-Hat design. Together they are the methodical counterpart to the "source → standard" scenario in the OMOP Data Harmonization lab: the lab shows the mapping in code, while these tools are how teams plan it at scale before writing a line of ETL code.
Common pitfalls
- Skipping the scan: mapping from memory or documentation (which is often out of date) instead of from what the data actually contains.
- Undocumented decisions: the ETL spec is also the audit record of how the data was transformed, and vague mappings leave later analysts unable to tell what a field really means.
- Ignoring frequencies: a code that appears twice and one that appears two million times deserve different attention.
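One way to act on frequencies is to rank source codes by how many records they cover and map from the top down. This Python sketch uses a hypothetical helper, codes_for_coverage, and invented counts; it returns the fewest most-frequent codes that together reach a target share of records.

```python
def codes_for_coverage(frequencies, target=0.95):
    """Return the fewest codes that together cover at least `target` of all
    records, taking the most frequent first, plus the share actually covered."""
    total = sum(frequencies.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    for code, count in sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= target:
            break
        chosen.append(code)
        covered += count
    return chosen, covered / total


# Invented counts, shaped like the value frequencies in a scan report.
diagnosis_counts = {"I10": 2_000_000, "E11.9": 600_000, "J45.909": 50_000, "LOCAL-0042": 2}
chosen, share = codes_for_coverage(diagnosis_counts)
print(chosen, f"{share:.1%}")
```

Taking the largest counts first is what makes the returned list as short as possible. The codes below the threshold still need either a mapping or an explicit, documented decision (for example concept ID 0) in the spec; they simply come later in the queue.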
Key takeaways
- WhiteRabbit profiles the source (tables, fields, value frequencies) as aggregated counts, without copying patient records into the report.
- Rabbit-in-a-Hat turns that into a documented field-by-field ETL design to OMOP.
- They're the planning front half of harmonization; engineers implement the spec.
- Skipping this stage is a common cause of a broken OMOP dataset.
Check your recall
What do WhiteRabbit and Rabbit-in-a-Hat do?
Why is the scan-first step important?