Author

Past Memory Big Data

A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.

Latest from Past Memory Big Data

58 recent articles

Past Memory Big Data

Dec 27, 2022 · Operations

How Volcano Engine DataTester Handles Private Deployment: Architecture, Challenges, and Business‑Driven Solutions

This article details Volcano Engine DataTester's private deployment architecture, the version‑management, performance, and stability challenges encountered, and the business‑oriented solutions—including branch strategies, pipeline automation, ClickHouse model optimizations, and multi‑level caching—that enable reliable, efficient A/B testing in on‑premise environments.

A/B testing · Ansible · ClickHouse

0 likes · 13 min read

Past Memory Big Data

Nov 26, 2022 · Big Data

Is Apache Flink Truly Powerful Enough After Hundreds of Engineers and Multiple Double‑11 Deployments?

The interview with Alibaba researcher Wang Feng reviews Flink's eight‑year journey to a top Apache project, its massive scale at Double 11, the push toward unified stream‑batch computing, emerging storage challenges, and the roadmap for cloud‑native, real‑time data warehousing.

Apache Flink · Batch processing · CDC

0 likes · 16 min read

Past Memory Big Data

Nov 15, 2022 · Big Data

How Uber Accelerated Presto Queries with Alluxio Local Cache

Uber runs over 500,000 Presto queries a day across 20 clusters, covering more than 50 PB of data. By deploying Alluxio Local Cache on NVMe disks, the team raised cache‑hit rates from roughly 65% to over 90% while addressing real‑time partition updates, node churn, and cache‑size constraints.

Alluxio · Big Data · Consistent Hashing

0 likes · 15 min read

Past Memory Big Data

Oct 29, 2022 · Big Data

How to Adapt Hadoop for Domestic Big Data Requirements

The article analyzes Hadoop’s declining relevance, the dominance of CDH/HDP, security pressures from vulnerabilities, and outlines ten technical steps—including hardware adaptation, component selection, dependency resolution, compilation, Ambari integration, packaging, testing, and functional verification—required to create a domestic ARM‑based Hadoop distribution, which the authors have released as a free HDP 3.3.1 build.

ARM · Ambari · Big Data

0 likes · 15 min read

Past Memory Big Data

Oct 13, 2022 · Big Data

Step-by-Step Guide: Integrating Presto with Velox on macOS (Build, Configure, and Run)

This article discusses the CPU as a performance bottleneck in data analytics, introduces the Velox vectorized execution engine, and provides a detailed, zero‑to‑one tutorial: downloading the Presto source, syncing Velox, fixing build paths, compiling both the Java and C++ components, configuring CLion and IntelliJ, launching the servers, and executing SQL queries. It also notes stability concerns.

big-data · cpp · integration

0 likes · 19 min read

Past Memory Big Data

Oct 12, 2022 · Databases

SelectDB Beats ClickHouse to Top ClickBench Analytical Database Rankings

SelectDB, a new cloud‑native data warehouse built on Apache Doris, topped the ClickBench analytical database benchmark on c6a.4xlarge instances with 500 GB gp2 storage, outpacing ClickHouse by 35% in Hot Run and 25% in Cold Run while also delivering a write throughput of over 140 MB/s.

Benchmark · ClickBench · ClickHouse

0 likes · 6 min read

Past Memory Big Data

Oct 9, 2022 · Operations

How Cloud Music Scaled Data Governance: Practices, Metrics, and Lessons Learned

The article details Cloud Music’s data‑governance journey, covering early modeling standards, self‑service data tools, quality and metadata management, asset‑reuse improvements, and cost‑saving Spark optimizations, while sharing concrete metrics, processes, and the team’s systematic methodology.

Data Warehouse · Metadata · cloud music

0 likes · 18 min read

Past Memory Big Data

Sep 26, 2022 · Databases

How ClickHouse Gains Full Upsert and Delete Support with UniqueMergeTree

This article examines ClickHouse’s lack of native upsert and delete operations, explains how ByteHouse’s UniqueMergeTree engine introduces a Mark‑Delete + Insert approach to provide real‑time upsert/delete capabilities, and presents benchmark results showing dramatically faster queries with only modest write overhead.

ByteHouse · ClickHouse · Columnar Database

0 likes · 10 min read

Past Memory Big Data

Sep 13, 2022 · Databases

Velox: An Open‑Source Unified Execution Engine for Data Systems

Velox is Meta's open‑source unified execution engine that consolidates common data‑intensive components, integrates with engines like Presto, Spark, and TorchArrow, and delivers up to ten‑fold speedups on CPU‑bound queries while simplifying development and fostering a reusable, community‑driven ecosystem.

Data Management · Performance · Spark

0 likes · 9 min read

Past Memory Big Data

Sep 8, 2022 · Backend Development

Implementing Fixed-Length Queues and Batch Consumption in Redis with Lua

The article details how a gaming data‑reporting pipeline uses Redis lists, sets and Lua scripts to build a pseudo‑message queue that provides per‑game message grouping, fixed‑length queues, and batch consumption while meeting strict timeliness and throughput requirements.

Backend · Batch processing · Fixed-length queue

0 likes · 12 min read
