How Baidu’s Turing 3.0 Leverages Apache Iceberg to Boost Data Lake Performance

This article explains how Baidu’s next‑generation data platform, Turing 3.0, integrates Apache Iceberg to address the inefficiencies of the legacy MEG stack. It covers the ecosystem components, strategies for migrating from Hive, table‑level optimizations, and the future roadmap for high‑frequency, low‑latency analytics.

Apache Iceberg · Hive Migration · Table Format

0 likes · 17 min read

Big Data Technology & Architecture

Aug 12, 2019 · Big Data

Spark SQL Parameter Tuning and Performance Optimization (Spark 2.3.2)

This article explains how to troubleshoot and tune Spark SQL configuration parameters. It covers exception‑related settings such as spark.sql.hive.convertMetastoreParquet, file‑ignore options, and partition verification, as well as performance‑focused tweaks like broadcast join thresholds, adaptive execution, and Parquet schema merging, and it includes a comprehensive parameter reference table.

Big Data · Hive Migration · Parameter Tuning

0 likes · 23 min read

Youzan Coder

Jan 9, 2019 · Big Data

How Youzan Scaled 5,000 Daily SparkSQL Jobs: Migration Lessons from Hive

This article details Youzan's transition from Hive to SparkSQL, covering platform architecture, usability and performance enhancements, migration strategies, automated engine selection, and future plans. Together, these changes reduced resource consumption by up to 67% while the platform handled thousands of daily jobs.

Availability · Big Data · Data Platform

0 likes · 13 min read
