Design and Evolution of Airbnb's Log Data Storage and Query Platform
The article describes how Airbnb's data infrastructure team built a next‑generation log storage and query platform to improve data quality, timeliness, flexibility, and anomaly detection, outlining the system architecture, key requirements, five improvement areas, and the resulting benefits.
As Airbnb’s business grew rapidly, traditional batch‑processing and unstructured log handling could no longer meet application needs, prompting the data infrastructure team to develop a new log storage and query platform focused on data quality, real‑time availability, flexible querying, multidimensional analytics, and anomaly detection.
Background: Logs serve as the bridge between the product and the data warehouse, powering fraud detection, user acquisition, A/B testing, and product decisions. However, the previous system, with over 800 unstructured JSON log types, suffered from frequent logging errors, no monitoring, and poor reliability.
Platform Requirements: The new platform must ensure data timeliness (data lands within a predictable delay), completeness (no loss or duplication), and quality (valid, deserializable records).
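The completeness and quality requirements become measurable once every event can be identified. The following is a minimal sketch for one batch, assuming each client event carries a unique ID; the `event_id` field name and the `audit_batch` helper are hypothetical and not taken from Airbnb's system.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class BatchReport:
    lost: set        # IDs the clients emitted that never landed downstream
    duplicated: set  # IDs that landed more than once
    invalid: int     # raw lines that could not be deserialized into a usable record


def audit_batch(expected_ids, raw_lines, id_field="event_id"):
    """Compare emitted event IDs (hypothetical field) with the lines that landed."""
    seen = Counter()
    invalid = 0
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            invalid += 1  # quality violation: not deserializable
            continue
        if not isinstance(record, dict) or not isinstance(record.get(id_field), str):
            invalid += 1  # quality violation: no usable ID
            continue
        seen[record[id_field]] += 1
    return BatchReport(
        lost=set(expected_ids) - set(seen),
        duplicated={eid for eid, n in seen.items() if n > 1},
        invalid=invalid,
    )
```

For example, auditing emitted IDs `a`, `b`, `c` against landed lines `a`, `a`, `not json`, `b` reports `c` as lost, `a` as duplicated, and one invalid line.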
Architecture Overview: Client applications (web and mobile) generate logs that are sent to Kafka through proxies. Downstream Camus jobs consume the Kafka messages and write them to HDFS, where they back Hive tables that are queried with Hive and Presto for offline analysis, while derived databases feed data products back to front-end services. The stack is built on Ruby services and internal clusters.
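Camus writes Kafka data into time-based HDFS directories, which Hive tables can then expose as partitions. The toy sketch below illustrates that bucketing step only, assuming each message carries a UTC epoch-seconds timestamp; the `ds=.../hr=...` layout and both function names are hypothetical, not Airbnb's actual paths.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_for(event_ts: float) -> str:
    """Map a UTC epoch timestamp to a hypothetical Hive-style hourly partition."""
    t = datetime.fromtimestamp(event_ts, tz=timezone.utc)
    return f"ds={t:%Y-%m-%d}/hr={t:%H}"


def bucket_messages(messages):
    """Group (timestamp, payload) pairs consumed from Kafka by target partition."""
    buckets = defaultdict(list)
    for ts, payload in messages:
        buckets[partition_for(ts)].append(payload)
    return dict(buckets)
```

Partitioning by event hour lets downstream jobs treat each hour as a unit that is either complete or still arriving, which ties back to the timeliness requirement.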
Key Improvement Areas (five):
Module monitoring to guarantee correctness and reliability of each pipeline component.
End‑to‑end audit mechanisms for overall system reliability.
Enforced log format constraints to reduce invalid logs and improve data quality.
Anomaly detection modules for rapid identification of failures.
Real‑time stream processing to enable faster queries and aggregations.
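The article states that log formats are enforced but not how. One common approach is to declare a schema per event type and reject nonconforming records at ingestion. The sketch below assumes that approach; the `SCHEMAS` registry, the `search_event` type, and its fields are hypothetical.

```python
import json

# Hypothetical schema registry: event type -> {field name: required Python type}.
SCHEMAS = {
    "search_event": {"event_id": str, "user_id": int, "query": str},
}


def validate(raw_line: str, event_type: str) -> list[str]:
    """Return a list of violations; an empty list means the record conforms."""
    schema = SCHEMAS.get(event_type)
    if schema is None:
        return [f"unknown event type: {event_type}"]
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not an object"]
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        # Exact type match, so a JSON boolean is not accepted where an int is required.
        elif type(record[field]) is not expected:
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors
```

Rejecting unknown fields as well as missing ones keeps free-form JSON from drifting back in, which was the root problem with the previous 800-plus unstructured log types.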
Module Monitoring Details: Monitor process health and CPU/memory usage, compare each component's input and output data volumes, and learn the seasonal patterns in traffic so that alerts fire when volumes deviate from them.
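The two volume checks above can be sketched as follows. The article does not specify the detection method, so the thresholds, the hour-of-week median baseline, and the function names are assumptions for illustration.

```python
from statistics import median


def io_mismatch_alert(input_count: int, output_count: int, max_diff: float = 0.01) -> bool:
    """Alert when a component's output volume differs from its input volume by
    more than `max_diff` (relative), catching both loss and duplication."""
    if input_count == 0:
        return output_count > 0
    return abs(input_count - output_count) / input_count > max_diff


def seasonal_alert(current: float, same_slot_history: list[float], tolerance: float = 0.3) -> bool:
    """Alert when this hour's volume deviates from the median volume of the same
    hour-of-week in previous weeks by more than `tolerance` (relative)."""
    if not same_slot_history:
        return False  # no baseline yet
    baseline = median(same_slot_history)
    if baseline == 0:
        return current > 0
    return abs(current - baseline) / baseline > tolerance
```

Comparing against the same hour in earlier weeks, rather than the previous hour, keeps normal daily and weekly swings in traffic from triggering alerts.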
Conclusion: The revamped log platform provides system‑level detection and alerts, quantifies platform reliability, enforces log format standards, introduces real‑time stream processing, and offers an anomaly detection service for swift issue identification.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.