Airbnb Data Privacy and Security Engineering: Automated Data Protection Service Overview
Airbnb’s Data Protection Service unifies privacy and security metadata, offering APIs that automate annotation verification, export and IDL validation, data‑subject‑rights orchestration, and secret‑leak detection, while assigning ownership, minimizing manual effort, and ensuring global, consistent compliance across the platform.
Welcome to the third part of the "Airbnb Data Privacy and Security Engineering" series, which describes how Airbnb builds a powerful, automated, and extensible data security platform.
Key challenges addressed:
Accountability: Security and privacy compliance must be shared by platform teams, developers, product lifecycles, and suppliers, with service owners responsible for the data they control.
Minimal operational overhead: Most protection work should be automated to reduce manual effort.
Global consistency: Provide a single source of truth for privacy and security annotations across teams.
Data Protection Service (DPS) integrates all components of the Data Protection Platform (DPP). It exposes APIs for downstream services to query privacy and security metadata stored in Madoka, and it defines automated "jobs" such as creating JIRA tickets and generating GitHub pull requests.
Data protection annotation verification introduces three data‑type levels (critical, personal, public) and requires owners to tag each table column. These tags drive access‑control and retention policies.
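As a rough illustration of how column tags could drive policy, the sketch below maps each of the three levels to an access and retention policy. The `Policy` fields, the retention values, and the `policy_for` helper are assumptions made for this example; the source does not describe the real policy values or code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataLevel(Enum):
    CRITICAL = "critical"
    PERSONAL = "personal"
    PUBLIC = "public"


@dataclass(frozen=True)
class Policy:
    restricted_access: bool
    retention_days: Optional[int]  # None means no retention limit


# Hypothetical policy values, chosen only for illustration.
POLICIES = {
    DataLevel.CRITICAL: Policy(restricted_access=True, retention_days=30),
    DataLevel.PERSONAL: Policy(restricted_access=True, retention_days=365),
    DataLevel.PUBLIC: Policy(restricted_access=False, retention_days=None),
}


def policy_for(column_tags, column):
    """Look up the policy for a column; untagged columns are rejected."""
    if column not in column_tags:
        raise KeyError(f"column {column!r} has no data protection tag")
    return POLICIES[DataLevel(column_tags[column])]
```

Rejecting untagged columns, rather than defaulting them to public, mirrors the requirement that owners tag every column.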
Database export validation uses a CI check that queries DPS for each column’s privacy classification. The check fails if the annotation in the PR does not match the DPS classification; otherwise, it applies regex‑based suggestions. A daily task also validates stored Hive export files and notifies owners of mismatches.
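The CI check can be pictured roughly as follows. This is a minimal sketch, not Airbnb's implementation: the `check_columns` function, the plain-dictionary inputs, and the name patterns are all assumptions, and it further assumes that columns DPS has no classification for fall through to the regex suggestions.

```python
import re

# Hypothetical name patterns; the source does not list the real regexes.
SUGGESTION_PATTERNS = {
    "personal": re.compile(r"(email|phone|address)", re.IGNORECASE),
}


def check_columns(pr_annotations, dps_classifications):
    """Return (failures, suggestions) for the columns touched by a PR."""
    failures, suggestions = [], []
    for column, annotated in pr_annotations.items():
        expected = dps_classifications.get(column)
        if expected is not None and expected != annotated:
            # A mismatch with DPS is blocking: the check fails.
            failures.append(
                f"{column}: annotated {annotated!r}, DPS says {expected!r}"
            )
            continue
        # No mismatch: offer non-blocking suggestions based on the name.
        for level, pattern in SUGGESTION_PATTERNS.items():
            if annotated != level and pattern.search(column):
                suggestions.append(f"{column}: name suggests {level!r}")
    return failures, suggestions
```

A CI wrapper would exit non-zero whenever `failures` is non-empty and print `suggestions` as advisory comments.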
IDL validation captures traffic samples via Inspekt, scans them, and compares the resulting sensitivity classification with the annotations in Thrift IDL files. Discrepancies trigger JIRA tickets and automatically generated PRs that update the IDL annotations.
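The comparison step might look like the sketch below. It assumes each scanned traffic sample has already been reduced to a mapping of field name to detected level, and that the most sensitive level seen across samples wins; both the data shapes and the function names are hypothetical, and no Inspekt or Thrift API is used.

```python
# Sensitivity order, least to most sensitive (assumed from the three levels).
SENSITIVITY = {"public": 0, "personal": 1, "critical": 2}


def observed_levels(samples):
    """Keep the most sensitive level seen for each field across samples."""
    observed = {}
    for sample in samples:
        for field, level in sample.items():
            current = observed.get(field, "public")
            if SENSITIVITY[level] >= SENSITIVITY[current]:
                observed[field] = level
    return observed


def find_discrepancies(idl_annotations, samples):
    """Return {field: (annotated, observed)} where the two disagree."""
    discrepancies = {}
    for field, seen in observed_levels(samples).items():
        annotated = idl_annotations.get(field)  # None if the IDL has no tag
        if annotated != seen:
            discrepancies[field] = (annotated, seen)
    return discrepancies
```

Each entry in the result carries enough information to open a ticket and to propose the corrected annotation in a generated PR.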
Data Subject Rights (DSR) orchestration is handled by Obliviate, which coordinates deletion, access, and export requests. Requests are propagated via Kafka to downstream services, which execute the DSR actions. The Obliviate client provides a Thrift schema template for each service and abstracts Kafka handling, retry logic, and compliance notifications.
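To make the division of labor concrete, here is a toy sketch of a client that hides message delivery and retries from the service. An in-memory bus stands in for Kafka, and `InMemoryBus`, `DsrClient`, the topic name, and the result dictionary are all invented for illustration; this is not the Obliviate client's actual API.

```python
from collections import defaultdict


class InMemoryBus:
    """Stand-in for Kafka: delivers each message to every subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        return [handler(message) for handler in self._subscribers[topic]]


class DsrClient:
    """Hypothetical client: hides the bus and retries the service callback."""

    def __init__(self, bus, service_name, callback, max_attempts=3):
        self.service_name = service_name
        self.callback = callback
        self.max_attempts = max_attempts
        bus.subscribe("dsr-requests", self._handle)

    def _handle(self, request):
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.callback(request)
            except Exception:
                continue  # transient failure: try again
            return {"service": self.service_name, "status": "done",
                    "attempts": attempt}
        return {"service": self.service_name, "status": "failed",
                "attempts": self.max_attempts}
```

With this shape, a service only supplies a callback such as "delete everything for this user"; the returned status records are what a compliance notification would be built from.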
Automated integration of Obliviate reduces manual effort by having DPS generate a list of columns that contain personal data but are not yet integrated, create PRs that add the Obliviate client code and Thrift structures, and open JIRA tickets linking to those PRs.
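The first step, finding personal-data columns that are not yet integrated, amounts to a filtered set difference. The sketch below assumes columns are identified as `table.column` strings and that work is grouped into one PR per table; both choices, and the function names, are assumptions rather than details from the source.

```python
from collections import defaultdict


def unintegrated_personal_columns(classifications, integrated):
    """Columns tagged personal that have no Obliviate integration yet."""
    return sorted(
        column
        for column, level in classifications.items()
        if level == "personal" and column not in integrated
    )


def plan_integration_work(classifications, integrated):
    """Group the missing columns by table, one planned PR per table."""
    by_table = defaultdict(list)
    for column in unintegrated_personal_columns(classifications, integrated):
        table, _, name = column.partition(".")
        by_table[table].append(name)
    return [
        {
            "table": table,
            "columns": names,
            "pr_title": f"Add Obliviate integration for {table}",
        }
        for table, names in sorted(by_table.items())
    ]
```

Each planned item would then become a generated PR plus a JIRA ticket that links to it.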
Eliminating accidental secret leaks builds on previous work (Angmar and Inspekt) to detect business or infrastructure keys in code, logs, and data stores. DPS de‑duplicates findings, sends Slack/Datadog alerts, creates security‑issue tickets, and can trigger regression scans after a ticket is closed to verify that the secret has been removed.
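De-duplication and the post-closure regression scan can be sketched as below. The example assumes findings are dictionaries with `secret` and `location` keys and that two findings are duplicates when they contain the same secret value; keying on a SHA-256 fingerprint is one plausible choice, not something the source specifies.

```python
import hashlib


def fingerprint(secret_value):
    """Hash the secret so findings can be compared without storing it."""
    return hashlib.sha256(secret_value.encode("utf-8")).hexdigest()


def deduplicate(findings):
    """Merge findings that share the same secret, keeping every location."""
    merged = {}
    for finding in findings:
        key = fingerprint(finding["secret"])
        entry = merged.setdefault(key, {"fingerprint": key, "locations": []})
        if finding["location"] not in entry["locations"]:
            entry["locations"].append(finding["location"])
    return list(merged.values())


def secret_removed(fingerprint_value, current_findings):
    """Regression check: True if a fresh scan no longer reports the secret."""
    return all(
        fingerprint(finding["secret"]) != fingerprint_value
        for finding in current_findings
    )
```

One merged entry would map to one alert and one ticket; when the ticket closes, `secret_removed` is rerun against a new scan to confirm the fix.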
The article concludes that this series demonstrates how large‑scale, automated data protection can lower security and privacy risk across Airbnb’s ecosystem.
Airbnb Technology Team
Official account of the Airbnb Technology Team, sharing Airbnb's tech innovations and real-world implementations, building a world where home is everywhere through technology.