Big Data Technology & Architecture
Author

Wang Zhiwu is a big data expert dedicated to sharing big data technology.

1.0k articles · 0 likes · 427 views · 0 comments
Recent Articles

Latest from Big Data Technology & Architecture

Feb 4, 2025 · Artificial Intelligence

How Large Language Models Are Transforming Data Development and Developer Roles

The article discusses how large language model tools such as Cursor, DeepSeek, and Doubao increasingly assist with code writing, SQL translation, job‑failure analysis, and documentation in data‑development workflows, while also reshaping job requirements and creating new opportunities for skilled developers.

AI · Data Development · SQL automation
0 likes · 5 min read
Feb 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Comprehensive Data Lineage Overview and Practices

This article presents a detailed overview of Douyin Group's Data Asset Management Platform, focusing on the evolution, architecture, modeling, metrics, and application scenarios of its large‑scale data lineage system, and outlines future directions for full‑coverage, fine‑grained lineage capabilities.

Big Data · Data Asset Management · Data Lineage
0 likes · 17 min read
Jan 15, 2025 · Big Data

From Operations to Data Engineering: A Student’s Real‑World Journey and Practical Guide

This article shares a data‑engineering student’s personal experience—from a misaligned operations role to mastering big‑data technologies, building a portfolio, crafting a targeted resume, and navigating multi‑stage interviews—offering concrete advice and a structured learning roadmap for aspiring data professionals.

Big Data · Data Engineering · Interview preparation
0 likes · 14 min read
Jan 13, 2025 · Big Data

How Apache Paimon Manages Snapshot Expiration: Synchronous vs Asynchronous Modes

This article explains Apache Paimon's snapshot expiration mechanism, comparing synchronous and asynchronous execution modes, their advantages and drawbacks, and how table properties control expiration to balance data consistency, performance, and back‑pressure in large‑scale data processing systems.

Apache Paimon · Data Consistency · Synchronous
0 likes · 6 min read
Jan 2, 2025 · Big Data

Apache Paimon: Core Capabilities, Table Types, LSM Tree, Buckets, Merge Engines, and Operational Details

This article provides a comprehensive overview of Apache Paimon, covering its real‑time lake ingestion, unified stream‑batch processing, table types (primary‑key and append‑only), LSM‑tree storage, bucket mechanisms, merge‑engine options, compaction strategies, concurrency control, consumption methods, tag management, data cleanup, and system tables for big‑data workloads.

Apache Paimon · Big Data · Flink
0 likes · 25 min read
Dec 31, 2024 · Big Data

Eliminating Shuffle in Spark Joins with Storage Partitioned Join (SPJ) for Iceberg Tables

This article explains how Storage Partitioned Join (SPJ), available in Spark 3.3 and later, avoids costly shuffle operations when joining partitioned V2 source tables such as Apache Iceberg, detailing the required conditions, configuration settings, practical code examples, and various join scenarios including mismatched partitions and data skew.

Bucketing · Data Skew · SQL
0 likes · 15 min read
Dec 26, 2024 · Fundamentals

Detailed Granularity Fact Tables (DWD): Types, Design Principles, and Comparison

The article explains the three detailed‑granularity fact table types (transaction, periodic snapshot, and accumulating snapshot), covering their purposes, design principles, and comparative usage, and offers a simplified interpretation to help data engineers choose the appropriate fact table for data warehouse modeling.

Big Data · DWD · Data Warehouse
0 likes · 5 min read
Dec 18, 2024 · Big Data

Key Trends of Flink 2.0: Compute‑Storage Separation, Unified Batch‑Stream, and Streaming Warehouse

The article reviews the major directions of Flink 2.0—including compute‑storage separation, a new Materialized Table for unified batch‑stream processing, and deeper integration with Paimon for streaming warehouses—while offering a cautious perspective on their practical impact and migration challenges.

Batch-Stream Integration · Big Data · Compute-Storage Separation
0 likes · 5 min read