Tagged articles
3 articles
Page 1 of 1
Baidu Geek Talk
Baidu Geek Talk
Jun 30, 2025 · Big Data

How Baidu’s Turing 3.0 Leverages Apache Iceberg to Boost Data Lake Performance

This article explains how Baidu’s next‑generation data platform Turing 3.0 integrates Apache Iceberg to solve the inefficiencies of the legacy MEG stack, detailing ecosystem components, migration strategies from Hive, table‑level optimizations, and future roadmap for high‑frequency, low‑latency analytics.

Apache IcebergHive MigrationTable Format
0 likes · 17 min read
How Baidu’s Turing 3.0 Leverages Apache Iceberg to Boost Data Lake Performance
Big Data Technology & Architecture
Big Data Technology & Architecture
Aug 12, 2019 · Big Data

Spark SQL Parameter Tuning and Performance Optimization (Spark 2.3.2)

This article explains how to troubleshoot and tune Spark SQL configuration parameters—covering exception‑related settings such as spark.sql.hive.convertMetastoreParquet, file‑ignore options, and partition verification, as well as performance‑focused tweaks like broadcast join thresholds, adaptive execution, and parquet schema merging—while providing a comprehensive parameter reference table.

Big DataHive MigrationParameter Tuning
0 likes · 23 min read
Spark SQL Parameter Tuning and Performance Optimization (Spark 2.3.2)
Youzan Coder
Youzan Coder
Jan 9, 2019 · Big Data

How Youzan Scaled 5,000 Daily SparkSQL Jobs: Migration Lessons from Hive

This article details Youzan's transition from Hive to SparkSQL, covering platform architecture, usability and performance enhancements, migration strategies, automated engine selection, and future plans that together reduced resource consumption by up to 67% while handling thousands of daily jobs.

AvailabilityBig DataData Platform
0 likes · 13 min read
How Youzan Scaled 5,000 Daily SparkSQL Jobs: Migration Lessons from Hive