Tagged articles
3697 articles
政采云技术
Mar 9, 2023 · Fundamentals

Redesigning Data Warehouse Models: When and How to Use Dimensional Modeling

This article explains the concept of data models, why warehouse models need reconstruction, compares normative and dimensional modeling approaches, and provides a step‑by‑step guide—including information gathering, design, and implementation—to build efficient, maintainable data warehouse architectures.

Big Data · Data Warehouse · Database Design
0 likes · 12 min read
Architect's Tech Stack
Mar 9, 2023 · Big Data

Improving Data Warehouse Performance: From Clusters and Pre‑Computation to esProc SPL

The article analyzes the growing performance challenges of data warehouses, evaluates traditional solutions such as clustering, pre‑computation and optimization engines, and presents esProc SPL as a non‑SQL, low‑complexity alternative that delivers orders‑of‑magnitude speedups on modest hardware.

Big Data · Data Warehouse · Performance Optimization
0 likes · 16 min read
Architects Research Society
Mar 8, 2023 · Big Data

Understanding DataOps: Principles, Benefits, and Implementation

DataOps, rooted in agile and DevOps philosophies, uses automation and collaborative practices to streamline data processing, improve quality, and align analytics with business goals, offering continuous analytics, faster insights, and breaking data silos for better decision‑making across organizations.

Big Data · Continuous Analytics · DataOps
0 likes · 10 min read
Alimama Tech
Mar 8, 2023 · Artificial Intelligence

Secure Data Hub: Alibaba's Marketing Privacy Computing Platform

Alibaba’s Secure Data Hub (SDH) is a privacy‑preserving data clean‑room platform that uses secure multi‑party computation and privacy‑enhancing machine learning to let advertisers, ad platforms, and auditors jointly analyze marketing data via a simple SQL API while keeping raw data encrypted, column‑level protected, and confined to each party’s private domain.

Big Data · data clean room · sql
0 likes · 13 min read
DataFunTalk
Mar 8, 2023 · Artificial Intelligence

Applying AI Algorithms to Big Data Governance: Use Cases and Future Directions

This article presents Datacake's experience of integrating AI algorithms into big data governance, covering the bidirectional relationship between AI and big data, health‑score assessment of data tasks, intelligent Spark parameter tuning, SQL engine selection, and future application scenarios across the data lifecycle.

AI · Big Data · Spark
0 likes · 18 min read
政采云技术
Mar 7, 2023 · Databases

Data Warehouse Modeling: Concepts, Methods, and Implementation

This article explains what data models are, why model refactoring is necessary, compares normalized and dimensional data warehouse modeling approaches, and details a three‑step implementation process—including information research, model design, and model deployment—while highlighting best‑practice naming conventions and practical examples.

Big Data · Data Warehouse · Database Design
0 likes · 14 min read
Baidu Geek Talk
Mar 6, 2023 · Big Data

Accelerating Data Production and Consumption in Baidu's Performance Platform

Baidu's Performance Platform speeds data production and consumption by adopting a unified stream‑batch architecture with TM and Spark, leveraging the Turing warehouse, introducing tiered service grading, robust governance and compliance measures, and offering self‑service analytics, cutting latency from minutes or days to milliseconds while handling billions of daily records and boosting SLA adherence, data accuracy, and user satisfaction.

Big Data · Data engineering · Real-time Processing
0 likes · 12 min read
Architects Research Society
Mar 5, 2023 · Big Data

Best Open‑Source and Commercial ETL Tools: Detailed Comparison

This article introduces the concept of ETL, explains its importance for modern data‑driven applications, and provides a comprehensive comparison of the most popular open‑source and commercial ETL platforms—including their key features, supported data sources, and deployment options—helping readers choose the right tool for their data integration needs.

Big Data · Data Integration · Data Warehouse
0 likes · 19 min read
DataFunSummit
Mar 3, 2023 · Artificial Intelligence

Intelligent Risk Control System Architecture and Development Trends

This article introduces the architecture of intelligent risk control, detailing its four-layer structure, the underlying data, feature, model, and decision components, platform interactions, and future development trends, highlighting how AI and big data enhance risk management efficiency and accuracy.

Big Data · Decision Systems · Feature Engineering
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Mar 3, 2023 · Big Data

How Alibaba Cloud EMR Evolved from Open‑Source Compatibility to Enterprise‑Grade Performance

This article outlines Alibaba Cloud EMR's three‑stage evolution—compatibility, contribution, and beyond open source—detailing its early Hadoop adoption, Flink and Spark innovations, cloud‑native optimizations, and enterprise‑grade features such as Remote Shuffle Service, performance benchmarks, and integrated diagnostics.

Alibaba Cloud · Big Data · EMR
0 likes · 13 min read
DataFunSummit
Mar 2, 2023 · Big Data

Huya's Data Self‑Service Product: Challenges, Design, and Practice

The article presents Huya's data‑self‑service product, describing the problems of traditional data services, the principles of a good data service, the MVP implementation, architectural components, project outcomes, and future evolution, while also addressing common Q&A scenarios.

Big Data · Data Product · Data engineering
0 likes · 12 min read
Programmer DD
Mar 2, 2023 · Backend Development

Why DolphinScheduler Is the Next Powerhouse for Distributed Task Management

DolphinScheduler is an open‑source distributed task scheduling system that supports multiple task types, offers visual workflow orchestration and monitoring, and scales to thousands of servers, making it a robust solution for backend and big‑data processing scenarios.

Big Data · Distributed Scheduling · DolphinScheduler
0 likes · 4 min read
DataFunTalk
Mar 2, 2023 · Artificial Intelligence

DataFun Summit 2023 – Knowledge Graph Online Summit

DataFun Summit 2023’s Knowledge Graph Online Summit, held on March 18, brings together leading experts from academia and industry to present six forums covering unified knowledge representation, large‑scale graph construction, massive knowledge storage, KG‑based QA, KG‑AIGC integration, and best‑practice industry applications, with free live streaming registration via QR code.

AI · Big Data · DataFun
0 likes · 36 min read
DataFunSummit
Mar 1, 2023 · Big Data

Data Governance: Challenges, Framework, and Implementation Practices

This article explains the problems that data governance addresses, outlines a comprehensive governance framework—including system architecture, processes, and policies—and describes practical implementation steps such as integrated tooling, standardized modeling, metadata management, lake‑in and lake‑out governance, and organizational structures for sustainable data management.

Big Data · Governance Framework · metadata management
0 likes · 12 min read
DataFunTalk
Mar 1, 2023 · Databases

Evolution and Optimization of Tencent Music Content Library Data Platform: From Architecture 1.0 to 4.0

This article details the evolution of Tencent Music's content library data platform from version 1.0 to 4.0, describing business requirements, architectural redesigns—including migration from ClickHouse to Apache Doris, introduction of a semantic layer, and extensive write, query, and cost optimizations—while sharing practical lessons and future directions.

Apache Doris · Big Data · Data Warehouse
0 likes · 21 min read
macrozheng
Feb 28, 2023 · Big Data

How Tencent Music Scaled Its Content Data Platform with Apache Doris: From ClickHouse to 4.0 Architecture

This article details the evolution of Tencent Music's content data platform from version 1.0 to 4.0, describing the migration from ClickHouse to Apache Doris, the introduction of a semantic layer, optimization of data ingestion, query performance, and cost reduction strategies that dramatically improved data timeliness, operational efficiency, and storage costs.

Apache Doris · Big Data · Data Architecture
0 likes · 23 min read
DataFunTalk
Feb 27, 2023 · Big Data

Comprehensive Overview of Data Middle Platform Architecture and Its Core Frameworks

This article provides a detailed overview of data middle platform concepts, describing a decoupled six‑subsystem architecture—including storage, collection, processing, governance, security, and operation frameworks—while illustrating typical enterprise implementations, industry‑specific solutions, and best‑practice considerations for building scalable, secure, and value‑driven data platforms.

Big Data · Data Integration · Data Platform
0 likes · 25 min read
Programmer DD
Feb 27, 2023 · Big Data

Why Hadoop/Spark Feel Heavy and How SPL Offers a Lightweight Big Data Solution

With data volumes soaring, traditional Hadoop and Spark clusters become costly and cumbersome for small to medium workloads, prompting many to seek lighter alternatives; this article examines the technical, operational, and financial burdens of Hadoop/Spark and introduces the open‑source SPL engine as a fast, low‑cost, easy‑to‑use big‑data solution.

Big Data · Hadoop · Spark
0 likes · 16 min read

How NetEase Yanxuan Migrated from Lambda to Iceberg for Real‑Time Batch‑Stream Integration

This article details how NetEase Yanxuan transformed its data platform from a dual Lambda architecture to a unified batch‑stream solution built on Apache Iceberg, covering the original challenges, the evaluation of Iceberg versus Hudi and Delta Lake, implementation of stream‑batch pipelines, message ordering fixes, snapshot generation, and extensive table‑governance optimizations.

Apache Flink · Apache Spark · Batch-Stream Integration
0 likes · 14 min read
DataFunTalk
Feb 26, 2023 · Big Data

Design, Optimization, and Use Cases of Data Lineage in ByteDance's DataLeap Platform

This article presents an in‑depth overview of DataLeap's data lineage capabilities, covering the challenges, multi‑layer model design, implementation with Apache Atlas and JanusGraph, performance optimizations, diverse use cases across asset, development, governance and security domains, and future trends for lineage technology.

Apache Atlas · Big Data · Data Platform
0 likes · 19 min read
21CTO
Feb 25, 2023 · Big Data

Which IT Skills Earn Over $140K? 2023’s Top-Paying Tech Expertise Revealed

Based on Dice’s 2023 Tech Salary Report, the article lists the ten highest‑earning IT skill sets in the U.S., detailing average salaries—often exceeding $140,000—and explains why expertise in areas such as containers, Kubernetes, PaaS, Redis, Teradata, Kafka, Elasticsearch, and Go commands premium pay.

2023 · Big Data · Cloud Computing
0 likes · 10 min read
DataFunTalk
Feb 25, 2023 · Big Data

T3 Travel’s Modern Data Stack and Feature Platform: Architecture and Practices

This article details T3 Travel’s exploration of the Modern Data Stack, describing its four‑point overview, business scenarios, the initial MDS implementation using Apache Hudi and Kyuubi, and the design of a feature platform that integrates Metricflow, Feast, and other components to support data processing, analytics, and machine‑learning workflows.

Apache Hudi · Big Data · Data Lake
0 likes · 22 min read
DeWu Technology
Feb 24, 2023 · Big Data

Real-Time Data Architecture Evolution for a Complex Supply Chain

The article traces Dewu’s supply‑chain data platform from slow MySQL reporting through early CDC‑based wide tables to a Flink‑Kafka‑ClickHouse 1.0 design, then to a more scalable Flink‑Kafka‑Hologres 2.0 architecture that solves upsert and compute‑storage separation, while detailing key operational tricks, code‑generation tools, and future plans for lake‑house integration.

Big Data · ClickHouse · Flink
0 likes · 10 min read
StarRing Big Data Open Lab
Feb 24, 2023 · Big Data

What Makes MPP Databases the Powerhouse Behind Modern Data Analytics?

MPP (Massively Parallel Processing) databases, designed for large‑scale analytical workloads, use distributed, shared‑nothing architectures with multiple control and compute nodes, offering high scalability, diverse data‑sharding strategies, and strong SQL compatibility, as illustrated by vendors like Teradata, Vertica, Greenplum, and emerging open‑source solutions.

Big Data · Distributed computing · Greenplum
0 likes · 15 min read
Big Data Technology & Architecture
Feb 24, 2023 · Big Data

Common Flink Task Submission Issues and Solutions on YARN

This article compiles frequent Flink job submission problems on YARN—including WordCount jar errors, HBase dependency conflicts, MySQL timeout, checkpoint restoration failures, parallelism limits, and unexpected container termination—provides root‑cause analysis and step‑by‑step remediation instructions.

Big Data · Checkpoint · Flink
0 likes · 21 min read
DataFunTalk
Feb 21, 2023 · Databases

Building a Stream‑Batch Integrated Data Architecture with Apache Doris at SelectDB

This article details how SelectDB’s data technology architect designed and implemented a new stream‑batch unified data platform using Apache Doris, covering the shortcomings of the early CDH‑based architecture, the selection process, data modeling, ingestion pipelines, performance testing, operational optimizations, and future plans.

Apache Doris · Batch processing · Big Data
0 likes · 17 min read
ITPUB
Feb 20, 2023 · Databases

Why Teradata Is Leaving China and What It Means for the Domestic Data Warehouse Market

Teradata's withdrawal from China, driven by geopolitical tensions and the rise of mature domestic data‑warehouse solutions, prompts a detailed look at its MPP architecture, the three main Chinese warehouse designs, Gartner market positioning, and migration tools for alternatives like GBase 8a and GaussDB DWS.

Big Data · Data Warehouse · GBase
0 likes · 9 min read
DataFunSummit
Feb 20, 2023 · Product Management

Evaluating the Value of Data Products: Scenarios, Frameworks, and Improvement Methods

This article explains why data product value assessment is essential, outlines common usage scenarios and a DBA evaluation framework, describes quantitative methods such as usage, business, and data‑driven metrics, and offers practical ways to enhance data product value through metric optimization, high‑value direction selection, and resource allocation.

Big Data · Data Product · Metrics
0 likes · 13 min read
DataFunTalk
Feb 20, 2023 · Big Data

Understanding Data Lakes and Their Application at iQIYI: Concepts, Scenarios, and Iceberg Implementation

This article explains the definition of data lakes (public‑cloud and non‑public‑cloud), outlines their key characteristics, presents three typical business scenarios—real‑time event analysis, change‑data analysis, and stream‑batch integration—summarizes required product features, evaluates open‑source lake formats, and details iQIYI's adoption of Apache Iceberg across multiple services to achieve low‑latency, large‑scale, cost‑effective analytics.

Big Data · Data Lake · Iceberg
0 likes · 23 min read
Alibaba Cloud Big Data AI Platform
Feb 20, 2023 · Big Data

How Alibaba’s DataWorks Transforms Data Governance for Efficiency, Security, and Cost Savings

This article explores Alibaba's DataWorks platform and its comprehensive data governance practices, covering application efficiency, security controls, cost optimization, organizational structure, and cultural initiatives that together enable scalable, secure, and cost‑effective data management across the enterprise.

Big Data · DataWorks · cost optimization
0 likes · 31 min read
DataFunTalk
Feb 18, 2023 · Big Data

Xiaomi Data Governance Evolution: Cost Governance Practices for HDFS and HBase

The article outlines Xiaomi's data governance journey, focusing on storage‑service cost governance, describing the transition from simple cost‑centered governance to big‑data‑driven asset management, and detailing concrete HDFS and HBase practices that achieved significant resource and cost reductions.

Big Data · HBase · HDFS
0 likes · 15 min read
DataFunSummit
Feb 17, 2023 · Big Data

Data Governance Practices and Platform Construction with Alibaba DataWorks

Alibaba’s DataWorks team shares extensive experiences in building and operating a large‑scale data platform, covering data governance across stages—from data stability and quality to security, cost control, and organizational culture—illustrating how systematic practices and tools drive efficiency, reliability, and value for enterprises.

Big Data · Data Platform · cost optimization
0 likes · 55 min read
DataFunTalk
Feb 17, 2023 · Big Data

Tencent Alluxio (DOP) Deployment and Optimization in Financial Data Analytics

This article describes how Tencent's Alluxio-based Data Orchestration Platform (DOP) was applied to financial analytics, detailing the business background, challenges of large‑scale OLAP workloads, the Alluxio architecture and usage modes, performance results, and the series of optimizations and tuning performed to achieve significant speedups.

Alluxio · Big Data · Data Orchestration
0 likes · 15 min read
Tencent Advertising Technology
Feb 17, 2023 · Big Data

Cost Optimization and Mixed‑Resource Deployment in Tencent's Taiji Machine Learning Platform

The article details how Tencent's Taiji machine‑learning platform reduces training costs and improves efficiency for large‑scale advertising models by leveraging cloud‑native mixed‑resource strategies—including online idle, offline elastic, and compute‑resource sharing—while maintaining high service stability through advanced scheduling, fault‑tolerance, and resource‑prediction techniques.

Big Data · Machine Learning Platform · Tencent
0 likes · 16 min read
DataFunSummit
Feb 16, 2023 · Artificial Intelligence

Curated Collection of Articles on AI‑Powered Smart Medicine

This guide introduces the challenges in healthcare, explains how artificial intelligence is already reshaping the field, and provides a curated list of recent articles on smart medicine for readers to explore the emerging AI‑healthcare integration.

AI · Big Data · Healthcare
0 likes · 4 min read
DataFunSummit
Feb 16, 2023 · Big Data

JD Real-Time Data Product Practice: Overview, Low‑Code Platform, Stream‑Batch Integration, and Operations

This article summarizes JD's real‑time data product practice, covering product overview, low‑code real‑time platform construction, stream‑batch integrated architecture, and the three‑layer operational defense model, while highlighting challenges, evolution, user distribution, and future directions.

Big Data · Low‑code platform · Real-time Data
0 likes · 13 min read
Kuaishou Big Data
Feb 15, 2023 · Big Data

Kuaishou’s Data Application Factory: Boosting BI with Low‑Code & Unified Queries

This article details how Kuaishou’s Data Application Factory tackles the challenges of rapid BI delivery, data accuracy, and service stability by leveraging low‑code development, unified query services, standardized configurations, and service isolation to achieve efficient, high‑quality data products across multiple business lines.

BI · Big Data · Low‑code
0 likes · 16 min read
Alimama Tech
Feb 15, 2023 · Big Data

Dolphin: Alibaba's Hyper‑Converged Multi‑Modal Big Data Engine Overview

Dolphin, Alibaba’s hyper‑converged multi‑modal big‑data engine, unifies OLAP, AI, streaming, and batch workloads on a decoupled compute‑storage MPP foundation, offering a Dolphin SQL layer, advanced bitmap/GroupTable/AFile indexes, intelligent materialization, and one‑write‑multiple‑read storage that cuts costs by over 70% while delivering sub‑millisecond queries on trillion‑row datasets.

AI · Big Data · OLAP
0 likes · 14 min read
DataFunTalk
Feb 15, 2023 · Big Data

Alluxio Deployment at Ant Group: Stability Building, Performance Optimization, and Scale‑up for Large‑Scale Model Training

This article summarizes how Ant Group introduced Alluxio to address storage I/O, capacity, and latency challenges in large‑scale model training, detailing stability improvements through worker‑register follower and master migration, performance gains via follower‑only reads, and horizontal scaling using metadata sharding and multi‑cluster deployment.

Alluxio · Big Data · Performance Optimization
0 likes · 15 min read
ByteDance Data Platform
Feb 15, 2023 · Databases

How ByteHouse Powers Real‑Time Data Warehousing at Scale

ByteHouse, a cloud‑native data warehouse built on ClickHouse, delivers ultra‑fast real‑time and massive offline analytics with elastic scaling, addressing business needs in ByteDance and the financial sector through optimized architecture, ROI‑driven monitoring, and comprehensive operational tools.

Big Data · ByteHouse · ClickHouse
0 likes · 16 min read
Data Thinking Notes
Feb 14, 2023 · Big Data

How Cloud Music Turned 60k Tables into Valuable Data Assets

This article details Cloud Music's year‑long data assetization journey, covering the background, practical achievements, governance methods, and future roadmap for turning massive data warehouses into high‑value, well‑governed assets that drive cost reduction and business insight.

Big Data · Data Platform · Data Warehouse
0 likes · 10 min read
Alibaba Terminal Technology
Feb 14, 2023 · Artificial Intelligence

How ChatGPT Is Reshaping Front‑End Development and Data Engineering

This article reflects on the rapid rise of ChatGPT, reviews key AI concepts and high‑quality external resources, analyzes its current limitations, and explores how the technology is transforming front‑end development, big‑data workflows, and engineers' daily practices, offering practical advice for adapting to the AI‑driven future.

Big Data · productivity
0 likes · 18 min read
DataFunTalk
Feb 12, 2023 · Big Data

Optimizing Bilibili Presto Cluster Query Performance with Alluxio and Local Cache

This article presents a comprehensive technical overview of Bilibili's Presto cluster architecture, the challenges of query performance on Hadoop, and the systematic optimizations—including Alluxio integration, local cache mechanisms, multi‑active coordinators, label‑based scheduling, and real‑time penalties—that together improve availability, stability, and latency for large‑scale analytics workloads.

Alluxio · Big Data · Cache
0 likes · 23 min read
Sohu Tech Products
Feb 8, 2023 · Big Data

Design and Implementation of a General H5 User Behavior Tracking and Data Warehouse Model

This article presents a comprehensive H5 (HTML5) tracking solution that details the planning of event‑collection points, the full data‑warehouse modeling process—including schema design, retention calculations, and SQL implementations—and the automatic data‑capture mechanisms needed to improve user‑behavior analysis efficiency across the product lifecycle.

Big Data · Data Warehouse · H5 analytics
0 likes · 17 min read
Architects' Tech Alliance
Feb 8, 2023 · Artificial Intelligence

Computing‑in‑Memory (CiM) Technology: Concepts, History, Advantages, Classifications and Application Scenarios

This article provides a comprehensive overview of Computing‑in‑Memory technology, covering its definition, historical evolution, performance advantages over traditional von Neumann architectures, various technical classifications, storage‑media choices, market drivers, and its pivotal role in AI and big‑data workloads across edge, cloud and automotive domains.

AI acceleration · Big Data · computing-in-memory
0 likes · 17 min read
DataFunSummit
Feb 8, 2023 · Product Management

Content‑Driven Data Product Management: Challenges, Governance Frameworks, and Implementation Strategies

This article shares practical insights from a data product expert on the problems faced by content‑oriented data products, outlines a comprehensive governance methodology—including DAMA, Huawei, and Alibaba frameworks—and demonstrates how to operationalize these ideas through concrete examples such as event‑tracking and metric governance.

Big Data · Data Product Management · Methodology
0 likes · 16 min read
StarRing Big Data Open Lab
Feb 8, 2023 · Big Data

Why MapReduce and Spark Still Matter: A Deep Dive into Distributed Computing

Distributed computing splits massive tasks across multiple servers, and this article explains the classic MapReduce batch engine and the modern Spark framework, covering their architectures, strengths, limitations, and evolution, while highlighting key features like fault tolerance, in‑memory processing, and real‑time streaming capabilities.

Big Data · Distributed computing · MapReduce
0 likes · 12 min read
Alibaba Cloud Big Data AI Platform
Feb 8, 2023 · Big Data

How Alibaba Cloud EMR 2.0 Redefines Open‑Source Big Data Platforms

This article summarizes Alibaba Cloud senior product expert He Yuan's presentation on EMR 2.0, outlining the challenges of open‑source big data, the evolution of EMR, and the new features—including cloud‑native architecture, enhanced performance, diverse resource models, and expanded analysis scenarios—aimed at reducing cost and complexity.

Alibaba Cloud · Big Data · Data Lake
0 likes · 11 min read
Youzan Coder
Feb 7, 2023 · Big Data

Automated Offline Data Cost Optimization in Youzan's Data Platform

Youzan built an automated offline data cost‑optimization platform that gathers accurate metadata, mines unused or failing tables and tasks, and safely decommissions them through a backend‑frontend workflow with owner validation, notifications, rollback safeguards, and plans to extend lineage coverage and real‑time asset handling.

Big Data · Cost Reduction · Pipeline Automation
0 likes · 11 min read
Data Thinking Notes
Feb 6, 2023 · Big Data

How Tencent Tackles Data Governance Challenges with the WeData Platform

This article outlines Tencent's data governance challenges, its internal three‑stage practice, detailed case studies such as Tencent News and PCG cost governance, and introduces the WeData platform's architecture and tools for standardization, quality, security, and metadata management, concluding with a Q&A session.

Big Data · Data Platform · Tencent
0 likes · 17 min read
Big Data Technology & Architecture
Feb 6, 2023 · Big Data

Real-Time Data Warehouse Solutions with Hudi: Scenarios, Challenges, and Optimizations

This article presents an in‑depth overview of real‑time data‑warehouse scenarios, discusses challenges such as timeliness, update efficiency, and resource consumption, and details practical solutions using Apache Hudi, Flink, Presto, and related optimizations for ingestion, indexing, compaction, and query performance.

Big Data · Data Lake · Flink
0 likes · 17 min read
Big Data Technology & Architecture
Feb 4, 2023 · Big Data

Apache Linkis Graduates to Top-Level Project – Overview, Core Features, Roadmap, and Ecosystem

The article announces Apache Linkis’s graduation to an Apache top‑level project, explains its role as a computing middleware linking applications to engines like Spark, Hive, and Flink, details its core capabilities, roadmap, ecosystem integrations, and provides official resources for the community.

Apache · Big Data · Computing Middleware
0 likes · 8 min read
DataFunTalk
Feb 4, 2023 · Big Data

Design and Practice of Tencent Lighthouse Fusion Analysis Engine

This article presents the design and implementation of Tencent Lighthouse's Fusion Analysis Engine, covering its background, challenges, fusion architecture, kernel optimizations, acceleration techniques, practical outcomes, and future evolution directions for high‑performance data access.

Big Data · Fusion Engine · Lighthouse
0 likes · 12 min read
Kuaishou Big Data
Feb 3, 2023 · Big Data

Inside Kuaishou’s Company‑Wide Metric Platform: Architecture, Lessons & Best Practices

This article details Kuaishou’s three‑year evolution of its metric middle platform, covering the data infrastructure, key challenges of data inconsistency and low analysis efficiency, the enterprise‑level OneMetric solution, architectural design, development phases, practical lessons, system implementation, and real‑world applications.

Big Data · Data engineering · Kuaishou
0 likes · 23 min read
DataFunTalk
Feb 2, 2023 · Big Data

SeaTunnel: Design Goals, Current Status, Architecture, and Future Roadmap

This article provides a comprehensive overview of Apache SeaTunnel, covering its design objectives, current capabilities such as multi‑engine support and extensive connector ecosystem, detailed architecture including engine‑independent APIs and execution flows, and outlines the upcoming roadmap to expand connectors, launch a visual web UI, and introduce a dedicated SeaTunnel Engine.

Apache · Batch processing · Big Data
0 likes · 12 min read
DataFunTalk
Jan 31, 2023 · Big Data

Tencent's Data Governance Practices and Technical Implementation

This article presents Tencent's comprehensive data governance framework, covering its definition, objectives, challenges, methodology, organizational structure, metadata management, data asset lifecycle, security measures, and technical implementation details such as microservice architecture, data collection, lineage analysis, and storage solutions.

Big Data · Tencent · data governance
0 likes · 19 min read
DataFunTalk
Jan 31, 2023 · Big Data

SPI Refactoring Practice in Apache InLong Manager to Reduce Maintenance Cost and Enhance Extensibility

This article presents the SPI-based refactoring of Apache InLong Manager, describing the project's background, existing maintenance challenges, the concept of Java Service Provider Interface, the concrete implementation steps, code restructuring, and the resulting benefits such as higher code reuse, easier extension, and reduced DDL changes.

Apache InLong · Big Data · Code Refactoring
0 likes · 10 min read
Bilibili Tech
Jan 31, 2023 · Big Data

Design and Optimization of Real-Time Data Quality Control (DQC) Platform on Bilibili's Big Data System

Bilibili redesigned its real-time data-quality control platform by replacing per-rule Flink jobs with a unified, dynamically-configured architecture that classifies Kafka topics, aggregates via InfluxDB full-table and continuous queries, mitigates data inflation, adds a high-performance proxy, and implements robust monitoring and recovery to ensure scalable, reliable data quality for its big-data services.

Big Data · DQC · Flink
0 likes · 22 min read
DataFunTalk
Jan 30, 2023 · Big Data

Data Governance Strategies: Principles, Practices, and Real‑World Case Studies

The article explains why data governance is essential for high‑quality data in big‑data organizations, outlines narrow and broad governance scopes, presents strategic principles, and shares eight detailed case studies from leading Chinese tech companies illustrating practical implementation and lessons learned.

Big Data · data governance
0 likes · 7 min read
Data Thinking Notes
Jan 29, 2023 · Big Data

How to Turn Data Assets into Business Value: A Roadmap for Enterprises

Enterprises must shift their perception of data assets and embed data‑value into every digital process, establishing governance, unified asset catalogs, operational metrics, security controls, integration, services, and visualization to transform raw data into strategic business outcomes.

Big Data · Data Integration · Data Security
0 likes · 12 min read
DataFunSummit
Jan 29, 2023 · Big Data

Data Serviceization at JD: From Zero to One and Beyond

This article presents JD's data service platform, describing its origin, performance optimizations, flexible API generation, caching strategies, service orchestration, and governance, and includes a Q&A that addresses security, performance, and multi‑source data handling challenges.

API · Big Data · Caching
0 likes · 11 min read
DataFunTalk
Jan 28, 2023 · Big Data

Data Lake vs Data Warehouse: Differences, Evolution, and Integrated Lakehouse Design

This article explores the ongoing debate between data lakes and data warehouses, clarifies their distinct purposes and technologies, discusses how they can coexist or complement each other, and introduces the concept of an integrated lakehouse architecture while promoting a comprehensive data intelligence knowledge map.

Big Data · Data Lake · Data Warehouse
0 likes · 5 min read
DataFunSummit
Jan 27, 2023 · Big Data

Data Governance Strategies: Principles, Practices, and Case Studies

The article explains the importance of data governance, distinguishes narrow and broad governance, outlines strategic principles such as systemic engineering and prioritization, and presents eight case studies from leading Chinese tech companies illustrating practical implementations and effective strategies.

Big Data · Data Management · case study
0 likes · 8 min read
Tencent Cloud Developer
Jan 26, 2023 · Operations

Technical Article Digest: Operations, AI, Web3, Rust, Big Data, and More

This technical digest surveys Tencent’s health‑code operations architecture, dissects ChatGPT’s training pipeline, contrasts Web 2.0 and Web 3.0 on Ethereum, explains AI‑generated art, details WeChat’s overload controls and QQ Music’s high‑availability design, examines the rapid scaling of the “Sheep Sheep” mini‑game, introduces Rust for front‑end developers, showcases big‑data football prediction models, and outlines common C++ pitfalls and best‑practice recommendations.

Big Data · C++ · Rust
0 likes · 7 min read
DataFunTalk
Jan 26, 2023 · Big Data

Tencent Data Governance Practices and the WeData Platform

This article outlines Tencent's data governance challenges, internal practices across three maturity stages, and introduces the WeData platform that provides comprehensive capabilities for data assetization, cost control, quality assurance, security, and metadata management to support large‑scale big‑data operations.

Big Data · Tencent · WeData
0 likes · 15 min read
DataFunTalk
Jan 26, 2023 · Big Data

Data Governance Strategies: Principles, Practices, and Real‑World Case Studies

This article explains why data is a company's most valuable asset, distinguishes narrow and broad data‑governance approaches, outlines strategic design principles, and presents eight detailed case studies from leading Chinese tech firms illustrating practical governance implementations and lessons learned.

Big Data · data governance
0 likes · 8 min read
DataFunSummit
Jan 23, 2023 · Big Data

Design and Practice of the 58 Agile BI System (Starfire)

This article presents a comprehensive overview of the 58 Agile BI platform called Starfire, covering its background, technical architecture, core permission and query engine challenges, MPP cache acceleration, visualization resource library, developer services, and future development directions.

BI · Big Data · ClickHouse
0 likes · 13 min read
DataFunSummit
Jan 22, 2023 · Big Data

Applying Spark SQL at Ping An Insurance: Business Background, Deployment Choices, Migration Process, and Lessons Learned

This article details how Ping An Insurance migrated its offline Hive SQL workloads to Spark SQL, covering business background, deployment mode selection, migration workflow, typical challenges, optimization measures, and the resulting performance and resource utilization improvements.

Big Data · Cluster Migration · Deployment Modes
0 likes · 16 min read
DataFunSummit
Jan 21, 2023 · Big Data

Building and Evolving Data Management Systems: From IT to DT Era, Standards, Models, and Marketization

This article outlines the evolution of data management in the big‑data era, covering the history of the industry, key governance frameworks such as DMBOK, DCMM and DMM, the steps to construct a data‑management system, the requirements for a data‑factor market, and an introduction to the DataEasy company and its services.

Big Data · DCMM · DMBOK
0 likes · 15 min read
DataFunTalk
Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache Flink · Big Data · Change Data Capture
0 likes · 13 min read
DataFunTalk
Jan 19, 2023 · Big Data

Tencent Alluxio: Accelerating the Next Generation of Big Data and AI

This article presents a comprehensive overview of Tencent's Alluxio project, covering the evolution of big‑data architecture, recent Alluxio research progress, typical deployment cases, and future work, while highlighting performance improvements, integration with cloud and AI workloads, and community contributions.

AI · Alluxio · Big Data
0 likes · 21 min read
NetEase Cloud Music Tech Team
Jan 17, 2023 · Big Data

How NetEase Cloud Music Cut Data Pipeline Delays by 60% with Full‑Link Baseline Governance

This case study details NetEase Cloud Music's full‑link baseline governance initiative, outlining the challenges of massive data pipelines, the metrics used to measure success, the three‑pronged action plan (infrastructure, task optimization, and standards), and the resulting improvements in availability, resource utilization, and monitoring accuracy.

Big Data · baseline governance · data ops
0 likes · 11 min read
Huolala Tech
Jan 16, 2023 · Big Data

How Leading Logistics Companies Master Data Governance for Cost and Stability

At the 2022 DataFun Summit, data governance experts from Huolala, Zhongtong, and SF Express shared comprehensive practices—including governance drivers, quality monitoring, model management, master data processes, platform architecture, cost control, and stability measures—illustrating how large logistics firms implement end‑to‑end data governance to boost efficiency, compliance, and business value.

Big Data · Cost Management · Data Quality
0 likes · 13 min read
JD Tech
Jan 13, 2023 · Big Data

UData: Solving the Last Mile of Data Usage – Architecture, Query Engine Design, and Federated Query Enhancements

This article introduces the UData platform, explains its data‑integration architecture, details the StarRocks‑based query engine workflow from SQL parsing to distributed execution, and describes recent optimizations such as computation push‑down, support for JSF/HTTP/ClickHouse external tables, and a proxy‑based federated query framework.

Big Data · Data Integration · Query Engine
0 likes · 20 min read
DataFunSummit
Jan 12, 2023 · Big Data

Data Governance Strategies: Systemic Engineering and Practical Cases from Leading Companies

This article explains the importance of data governance, distinguishes narrow and broad governance, outlines its systemic and selective nature, and presents eight practical case studies from companies like Tencent, NetEase, and MobTech, offering actionable strategies for high‑quality data across its lifecycle.

Big Data · Data Management · Enterprise Strategy
0 likes · 8 min read
DataFunSummit
Jan 12, 2023 · Big Data

Industrial IoT Data Collection Platform: Neuron v2.0 Architecture, Design, and Case Studies

This article presents a comprehensive overview of EMQ's Neuron industrial IoT data collection platform, detailing the lessons learned from version 1.x, the redesigned v2.0 architecture, core modules, plugin mechanisms, data‑tag management, eKuiper integration, and two real‑world case studies in oil‑field and smart‑factory environments.

Big Data · Data Collection · IoT
0 likes · 16 min read
Ctrip Technology
Jan 12, 2023 · Big Data

Evolution of Ctrip's Log System: From Elasticsearch to ClickHouse and Log 3.0

This article details the evolution of Ctrip's log infrastructure, describing the shift from fragmented departmental logging to a unified Elasticsearch-based platform, the migration to ClickHouse for cost‑effective, high‑performance storage, and the subsequent Log 3.0 redesign that leverages Kubernetes, sharding, and a unified query governance layer to handle petabyte‑scale data.

Big Data · ClickHouse · ETL
0 likes · 16 min read
Alibaba Cloud Big Data AI Platform
Jan 12, 2023 · Operations

What Is DataOps and How Can It Transform Your Data Management?

DataOps, the data‑centric counterpart of DevOps, combines agile principles, standardized tools, and cross‑team collaboration to manage the full data lifecycle—from integration and development to storage, governance, and service—enabling organizations to handle massive, diverse datasets efficiently, reduce silos, and turn data into actionable value.

Big Data · Data Integration · Data Management
0 likes · 15 min read
vivo Internet Technology
Jan 11, 2023 · Cloud Native

Practices of Distributed Message Middleware at vivo: From RocketMQ to Kafka and Pulsar

vivo’s Internet Storage team details how it operates RocketMQ for low‑latency online services and Kafka for massive big‑data pipelines, outlines resource isolation, traffic balancing, intelligent throttling, and governance practices, and describes its migration from RabbitMQ and planned shift from Kafka to cloud‑native Pulsar.

Big Data · Kafka · Message Middleware
0 likes · 22 min read
Data Thinking Notes
Jan 10, 2023 · Big Data

How Bilibili Built a Scalable Data Quality Platform for Billions of Events

This article describes Bilibili’s data quality platform, outlining its background, objectives, theoretical models, workflow stages (recording, checking, alerting), DSL for metrics, root‑cause analysis, scheduling strategies, heterogeneous source integration, rule coverage, intelligent monitoring, and future plans to achieve automated, real‑time, high‑reliability data assurance for massive daily workloads.

Big Data · Data Quality · Root Cause Analysis
0 likes · 21 min read
dbaplus Community
Jan 10, 2023 · Big Data

Choosing the Right OLAP Engine: Druid vs ClickHouse and Optimization Tips

This article introduces OLAP concepts, compares major OLAP solutions such as Druid, Kylin, Doris, and ClickHouse, outlines their features and suitable scenarios, and shares practical optimization techniques—including materialized views, caching, node tiering, and query tuning—to improve performance for high‑concurrency analytical workloads.

Big Data · ClickHouse · Data Warehouse
0 likes · 16 min read