Midas Certification: Airbnb’s End-to-End Data Quality Framework
Airbnb’s Midas certification establishes a company‑wide, multi‑dimensional golden standard for data quality, covering accuracy, consistency, timeliness, cost, and completeness. It requires collaborative design, automated health checks, and four review stages, so that certified data is reliable, well‑documented, and ready for reporting, experimentation, and machine learning.
Airbnb’s data‑driven culture has led it to build a sophisticated data infrastructure, including open‑source projects such as Apache Airflow and Apache Superset. As the company grew, so did its data warehouse, and the resulting challenges led to the development of Midas, a “golden standard” data‑quality certification.
Definition of the golden standard – The rapid growth of Airbnb’s data warehouse made it difficult to enforce a unified data‑quality and reliability standard. Internal surveys in 2019 showed data scientists struggled to locate needed data and to assess its quality. This motivated the definition of a multi‑dimensional “golden standard” covering accuracy, consistency, usability, timeliness, cost‑effectiveness, completeness, and more.
End‑to‑end data quality – The standard applies to all data assets, not only warehouse tables. Metrics defined in Minerva serve as a single source of truth for downstream analysis, experiments, anomaly detection, and machine‑learning feature stores.
Midas commitment – Data that passes Midas certification meets the golden‑standard criteria and is clearly marked in internal tools.
Certification process – The Midas workflow consists of nine steps (illustrated in Figure 4) that evaluate data models, pipelines, tables, and metrics. The process is collaborative, requiring data engineers and data scientists to co‑design each model.
Multi‑party collaboration – Each Midas model is built by a data engineer and a data scientist, ensuring both technical feasibility and business relevance. The process also invites broader stakeholder review, reducing later rework.
Design documentation – The first step is to produce a design document that describes pipelines, tables, and metrics. A standardized template captures owners, auditors, metric definitions, pipeline diagrams, SLA targets, and data‑quality checks.
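The template fields listed above can be pictured as a small structured record. The following is a minimal sketch, not Airbnb's actual template: the class names `DesignDoc` and `MetricSpec`, every field name, and the `missing_sections` helper are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetricSpec:
    name: str
    definition: str  # plain-language business definition of the metric


@dataclass
class DesignDoc:
    """Hypothetical design-doc record; every field name here is an assumption."""
    model_name: str
    owners: List[str] = field(default_factory=list)
    auditors: List[str] = field(default_factory=list)
    metrics: List[MetricSpec] = field(default_factory=list)
    pipeline_diagram: str = ""   # link or path to the diagram
    sla_target: str = ""         # e.g. "lands daily by 08:00 UTC"
    quality_checks: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the template sections that are still empty."""
        sections = {
            "owners": self.owners,
            "auditors": self.auditors,
            "metrics": self.metrics,
            "pipeline_diagram": self.pipeline_diagram,
            "sla_target": self.sla_target,
            "quality_checks": self.quality_checks,
        }
        return [name for name, value in sections.items() if not value]


# Illustrative values only; none of these come from the source article.
doc = DesignDoc(
    model_name="bookings",
    owners=["data-engineer@example.com", "data-scientist@example.com"],
    metrics=[MetricSpec("nights_booked", "Total nights booked per day")],
    sla_target="lands daily by 08:00 UTC",
)
print(doc.missing_sections())  # ['auditors', 'pipeline_diagram', 'quality_checks']
```

A completeness check like `missing_sections` is one way a standardized template could be enforced before the document goes to review.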
Data verification – Once the design is complete and the pipeline is built, the data is validated through automated health checks and manual verification against historical data (including SQL queries and code snippets).
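The article does not say which automated health checks Midas runs. As a rough sketch of what such checks might look like, the example below applies three common ones (not-null, uniqueness, and row-count tolerance) to in-memory rows. The function names, the `booking_id` and `nights` columns, and the 10% tolerance are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def no_nulls(rows: List[Row], column: str) -> bool:
    """Completeness: every row has a non-null value in `column`."""
    return all(row.get(column) is not None for row in rows)


def is_unique(rows: List[Row], column: str) -> bool:
    """Consistency: no value in `column` appears more than once."""
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def row_count_near(rows: List[Row], expected: int, tolerance: float) -> bool:
    """Volume: row count is within `tolerance` (a fraction) of `expected`."""
    return abs(len(rows) - expected) <= tolerance * expected


def run_health_checks(rows: List[Row], expected_rows: int) -> Dict[str, bool]:
    """Run all checks and report pass/fail per (hypothetical) check name."""
    return {
        "booking_id_not_null": no_nulls(rows, "booking_id"),
        "booking_id_unique": is_unique(rows, "booking_id"),
        "nights_not_null": no_nulls(rows, "nights"),
        "row_count_near_expected": row_count_near(rows, expected_rows, 0.10),
    }


sample = [
    {"booking_id": 1, "nights": 3},
    {"booking_id": 2, "nights": None},  # null in nights
    {"booking_id": 2, "nights": 5},     # duplicate booking_id
]
results = run_health_checks(sample, expected_rows=3)
failed = [name for name, passed in results.items() if not passed]
print(failed)  # ['booking_id_unique', 'nights_not_null']
```

In a real pipeline these checks would typically run as queries against the warehouse table rather than over Python lists; the in-memory form is used here only to keep the sketch self-contained.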
Review stages – Four review phases are performed: design‑doc review, data‑verification review, code review of the pipeline, and metric review in Minerva.
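One way to picture the four reviews is as a gate that tracks which approvals are still outstanding. The sketch below assumes that all four approvals are required before a model counts as certified. The article does not spell out this rule, and the `Review` enum and helper functions are hypothetical names, not part of any Airbnb tool. Review ordering is deliberately not modeled.

```python
from enum import Enum
from typing import Dict, List


class Review(Enum):
    DESIGN_DOC = "design-doc review"
    DATA_VERIFICATION = "data-verification review"
    CODE = "pipeline code review"
    METRIC = "Minerva metric review"


def pending_reviews(approvals: Dict[Review, bool]) -> List[str]:
    """List the reviews that have not been approved yet."""
    return [r.value for r in Review if not approvals.get(r, False)]


def is_certifiable(approvals: Dict[Review, bool]) -> bool:
    """Assumed rule: certifiable only once all four reviews are approved."""
    return not pending_reviews(approvals)


approvals = {Review.DESIGN_DOC: True, Review.DATA_VERIFICATION: True}
print(is_certifiable(approvals))   # False
print(pending_reviews(approvals))  # ['pipeline code review', 'Minerva metric review']
```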
Bug fixing and change requests – Midas certification improves the management of data‑pipeline bugs and feature requests by clarifying ownership and providing a formal change‑request process.
Conclusion – Midas provides a comprehensive, company‑wide data‑quality standard that ensures certified data is accurate, reliable, cost‑effective, well‑documented, and supported. While certification requires significant upfront investment, it underpins critical downstream applications such as reporting, product analysis, experimentation, and machine‑learning at Airbnb.
Airbnb Technology Team