Recap of Baidu Waimai Tech Team’s “Code Talk” Session on Data Platform Architecture and Big Data Practices
The article summarizes Baidu Waimai’s recent “Code Talk” event. It highlights the speaker’s overview of how Lianjia’s big‑data platform evolved, its technical architecture, practical challenges such as data security and accuracy, and a lively Q&A covering Storm, high availability, and metric management.
The latest “Code Talk” session organized by Baidu Waimai’s technology team concluded successfully, with a full house and enthusiastic participants. The report includes photos of the venue before and after the audience filled the seats.
The speaker, Mr. Lv, introduced the development history of Lianjia’s big‑data platform, its mission, and its positioning, giving the audience insight into the stages that other data‑intensive enterprises also go through.
Platform Technical Architecture
1. Data Services: data visualization, ad‑hoc queries, data APIs.
2. Data Capabilities: computation engine, ETL scheduling system, management and control.
3. Data Management: metadata management platform and metric management (the speaker noted that metric definitions were inconsistent across teams).
Practical Reflections
1. Data security – linking data usage to specific individuals and to cost accounting.
2. Data accuracy – ensuring precision at each processing stage.
3. Unified platform with differentiated service levels – giving critical tasks priority.
4. Technology freshness vs. stability – preferring stable solutions.
5. Scaling with limited personnel – abstracting requirements rather than supporting every request directly.
6. Future direction of the data platform – focusing on improving what already exists.
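Point 3, a single shared platform that still serves critical tasks first, can be illustrated with a priority queue. The sketch below is a hypothetical scheduler, not the speaker’s actual ETL scheduling system: the class name, the task names, and the convention that a lower number means a more critical task are all assumptions for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                     # assumption: lower value = more critical
    seq: int                          # tie-breaker: keeps submission order among equals
    name: str = field(compare=False)  # not used for ordering


class PriorityScheduler:
    """Hypothetical scheduler: all tasks share one queue, critical ones run first."""

    def __init__(self) -> None:
        self._heap: list[Task] = []
        self._seq = 0

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, Task(priority, self._seq, name))
        self._seq += 1

    def run_all(self) -> list[str]:
        # Pop tasks in priority order; a real scheduler would execute them here
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order


sched = PriorityScheduler()
sched.submit("ad_hoc_report", priority=2)
sched.submit("core_revenue_etl", priority=0)
sched.submit("team_dashboard", priority=1)
sched.submit("exec_kpi_etl", priority=0)
print(sched.run_all())
# ['core_revenue_etl', 'exec_kpi_etl', 'team_dashboard', 'ad_hoc_report']
```

The sequence counter matters: without it, two tasks of equal priority would be compared by name, which has no business meaning.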
Q&A Highlights
Why did the team stop using Storm? The team originally chose Storm because of its PHP expertise (Storm can run non‑JVM components through its multi‑language protocol). It has since moved to Java, in which the team is now more proficient, and Storm’s community activity has declined.
How is high availability handled for big data? Development and operations are handled by the same team; the platform relies on Falcon for monitoring, along with resource isolation, queue segregation, and validation checks. Future plans include cloud backup and multi‑cluster support.
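The answer mentions validation without detail. One common form is to compare a job’s output row count against a recent baseline before publishing it. The sketch below assumes that rule; the function name and the 30% tolerance are hypothetical, not the speaker’s stated checks.

```python
from statistics import mean


def validate_row_count(today: int, history: list[int], tolerance: float = 0.3) -> bool:
    """Hypothetical check: pass if today's row count is within +/- tolerance
    of the average of recent runs."""
    if not history:
        # No baseline yet: only reject an empty output
        return today > 0
    baseline = mean(history)
    return abs(today - baseline) <= tolerance * baseline


recent = [1000, 1100, 900]
print(validate_row_count(950, recent))  # True: within 30% of the average (1000)
print(validate_row_count(500, recent))  # False: a sudden drop blocks publication
```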
Which APIs should be served from OLAP and Kylin? OLAP query capacity is limited, so external‑facing APIs should be served through Redis caching and HBase, while internal queries can use other solutions.
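Putting a cache in front of a row store for external APIs is usually done with the cache‑aside pattern: read the cache first, and on a miss read the store and fill the cache. The sketch below uses in‑memory stand‑ins rather than real Redis or HBase clients; `TTLCache`, `MetricApi`, and the row key are hypothetical.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis: key -> (value, expiry timestamp)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: evict lazily on read
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.ttl)


class MetricApi:
    """Cache-aside read path; the dict plays the role of the HBase table."""

    def __init__(self, cache: TTLCache, store: dict[str, str]):
        self.cache = cache
        self.store = store
        self.store_reads = 0  # how often the backing store was hit

    def get_metric(self, row_key: str) -> Optional[str]:
        cached = self.cache.get(row_key)
        if cached is not None:
            return cached
        self.store_reads += 1
        value = self.store.get(row_key)
        if value is not None:
            self.cache.set(row_key, value)
        return value


api = MetricApi(TTLCache(ttl_seconds=60), {"orders|city_a|day_1": "12345"})
api.get_metric("orders|city_a|day_1")  # miss: reads the store, fills the cache
api.get_metric("orders|city_a|day_1")  # hit: served from the cache
print(api.store_reads)  # 1
```

The TTL bounds how stale an external caller’s answer can be while keeping repeated reads off the store.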
How many dimensions should a metric have? The speaker recommended no more than 15.
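The talk gave no reason for the limit of 15. One plausible factor, assuming a pre‑computed cube engine such as Kylin builds the full cube, is that every subset of the dimensions becomes its own aggregate (cuboid), so the count doubles with each added dimension:

```python
from math import comb


def full_cube_cuboids(num_dimensions: int) -> int:
    # A cuboid aggregates over one subset of the dimensions, so sum the
    # number of subsets of every size k (this equals 2 ** num_dimensions)
    return sum(comb(num_dimensions, k) for k in range(num_dimensions + 1))


for n in (10, 15, 20):
    print(n, full_cube_cuboids(n))
# 10 1024
# 15 32768
# 20 1048576
```

Engines can prune this with aggregation groups, so the real cost depends on configuration.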
How can inaccurate data be traced through a long pipeline? Re‑run the process. Avoid relying solely on HDFS, which does not support in‑place updates, and consider GPDB (Greenplum) when only selected records need to change.
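Re‑running a long pipeline requires knowing which tables depend on the bad one. The sketch below assumes table lineage is available, for example from a metadata platform like the one in the architecture section. It collects every table downstream of a bad table and orders them so each parent reruns before its children. The table names and the `rerun_plan` helper are hypothetical.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical lineage: each table maps to the upstream tables it is built from
LINEAGE = {
    "ods_orders": set(),
    "dim_city": set(),
    "dwd_orders": {"ods_orders"},
    "dws_city_daily": {"dwd_orders", "dim_city"},
    "ads_gmv_report": {"dws_city_daily"},
    "ads_refund_report": {"dwd_orders"},
}


def rerun_plan(lineage: dict[str, set[str]], bad_table: str) -> list[str]:
    """Return bad_table plus everything downstream of it, in a valid rerun order."""
    # Invert the edges: upstream table -> tables that read from it
    consumers: dict[str, set[str]] = {t: set() for t in lineage}
    for table, upstreams in lineage.items():
        for up in upstreams:
            consumers[up].add(table)
    # Breadth-first walk collects every table affected by the bad data
    affected, queue = {bad_table}, deque([bad_table])
    while queue:
        for child in consumers[queue.popleft()]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    # Keep only edges inside the affected set, then order parents before children
    sub = {t: lineage[t] & affected for t in affected}
    return list(TopologicalSorter(sub).static_order())


print(rerun_plan(LINEAGE, "dwd_orders"))
```

Unaffected inputs such as `dim_city` are left out of the plan, which keeps the rerun as small as possible.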
How can metrics be kept consistent? Build a unified metric platform and use consistent modeling for externally reported figures.
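A unified metric platform boils down to one rule: each metric name has exactly one definition, and every report computes it through that definition. The registry below is a minimal sketch of that idea; `MetricRegistry`, the metric name, and the sample rows are hypothetical, not the speaker’s platform.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence

Row = Mapping[str, Any]


@dataclass(frozen=True)
class MetricDef:
    name: str
    description: str
    compute: Callable[[Sequence[Row]], float]


class MetricRegistry:
    """Hypothetical single source of truth for metric definitions."""

    def __init__(self) -> None:
        self._defs: dict[str, MetricDef] = {}

    def register(self, metric: MetricDef) -> None:
        # Refuse a second, possibly conflicting, definition of the same name
        if metric.name in self._defs:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._defs[metric.name] = metric

    def evaluate(self, name: str, rows: Sequence[Row]) -> float:
        return self._defs[name].compute(rows)


registry = MetricRegistry()
registry.register(MetricDef(
    name="completed_order_amount",
    description="Sum of amount over completed orders",
    compute=lambda rows: sum(r["amount"] for r in rows if r["completed"]),
))

orders = [{"amount": 30.0, "completed": True}, {"amount": 20.0, "completed": False}]
# Every team's report calls the same definition, so the numbers agree
print(registry.evaluate("completed_order_amount", orders))  # 30.0
```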
Does adding more technical solutions increase complexity? Yes. Once the generic problems are solved, the remaining requirements are more individual and call for a wider variety of solutions, which naturally raises complexity.
The session closed with thanks to Mr. Lv for his thorough presentation and to all participants for their enthusiasm. We look forward to the next “Code Talk”.
Baidu Waimai Technology Team
The Baidu Waimai Technology Team supports and drives the company's business growth. This account provides a platform for engineers to communicate, share, and learn. Follow us for team updates, top technical articles, and internal/external open courses.