Tagged articles
DevOps
Nov 16, 2021 · Operations

Key Strategies and Recommendations for Successful Enterprise Digital Transformation

The article outlines how enterprises can assess digital transformation outcomes, formulate effective strategies, build large‑scale capabilities, foster agile culture, and continuously monitor progress, drawing on McKinsey research and real‑world examples to guide traditional firms toward sustainable digital growth.

Big Data · Digital Transformation · Enterprise Strategy
0 likes · 17 min read
Big Data Technology Architecture
Nov 13, 2021 · Big Data

Case Study: Migrating Baicaowei's On‑Premise Hadoop Data Platform to Alibaba Cloud Native Data Lake

This article details Baicaowei's migration from an IDC‑hosted Hadoop cluster to a cloud‑native data lake on Alibaba Cloud, outlining the business drivers, pain points of the legacy platform, architectural goals, design principles, solution selection, implementation steps, and future outlook for the new big‑data ecosystem.

Alibaba Cloud · Big Data · Delta Lake
0 likes · 16 min read
Big Data Technology & Architecture
Nov 9, 2021 · Fundamentals

Eight Key Aspects of Digital Transformation – Summary of Ma Xiaodong’s “Digital Transformation Methodology”

This article presents a concise PPT‑style summary of Ma Xiaodong’s book “Digital Transformation Methodology”, outlining eight essential topics—why, when, what, whether, who, how, tools, and case studies of digital transformation—along with numerous illustrative slides and links to related big‑data resources.

Big Data · Case Studies · Digital Transformation
0 likes · 5 min read
21CTO
Nov 8, 2021 · Big Data

How Baidu iFanFan Built a Real-Time Big Data Platform: Challenges & Lessons

Facing rapid business iteration, Baidu’s iFanFan data team designed a unified real‑time and offline big‑data platform, tackling business, technical, and organizational challenges through Lambda/Kappa architectures, data integration, storage, computation, governance, and scalable analytics to deliver timely, accurate, and valuable data products.

Big Data · Data Architecture · Data Warehouse
0 likes · 33 min read
DataFunSummit
Nov 8, 2021 · Big Data

Building JD's OLAP System: From Data Ingestion to Management and Future Plans

This article explains how JD.com designs and evolves its OLAP platform, covering data sources, ingestion, storage, real‑time and offline processing, key challenges such as timeliness, high throughput, consistency, and the solutions implemented to support massive e‑commerce analytics.

Big Data · Data Warehouse · Distributed Systems
0 likes · 13 min read
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full‑load and incremental reading mechanisms, details the slice partitioning, snapshot correction, and binlog handling logic, and provides a complete Java example that demonstrates how to configure Flink SQL, MySQL source, and Kafka sink.

Big Data · CDC · Data Integration
0 likes · 29 min read
Big Data Technology & Architecture
Nov 7, 2021 · Databases

Understanding Secondary Indexes and Coprocessor Solutions in HBase

This article explains the concept of secondary indexes in HBase, describes how coprocessors (including observers and endpoints) enable server‑side processing, compares coprocessor‑based solutions such as Apache Phoenix with non‑coprocessor approaches using Elasticsearch or Solr, and outlines their advantages and trade‑offs.

Big Data · Coprocessor · HBase
0 likes · 11 min read
Full-Stack Internet Architecture
Nov 6, 2021 · Big Data

Why User Profiling Projects Fail: Common Pitfalls and Deep Causes

The article analyzes why user profiling initiatives frequently collapse, highlighting surface mistakes such as confusing past behavior with future predictions, mixing behavior with motivation, and mistaking correlation for causation, while also exposing deeper issues like unrealistic business expectations, over‑reliance on static tags, and insufficient predictive modeling and causal analysis.

Big Data · Business Intelligence · Data Analysis
0 likes · 9 min read
Architecture Digest
Nov 2, 2021 · Databases

Comparative Analysis of MySQL and HBase: Architecture, Engine, and Use Cases

This article compares MySQL and HBase across architecture, storage engine, indexing structures (B+ tree vs LSM tree), data access features, and ecosystem integration, highlighting each system's strengths, limitations, and the scenarios where HBase is a suitable complement to MySQL for large‑scale data workloads.

B+Tree · Big Data · Databases
0 likes · 9 min read
21CTO
Nov 1, 2021 · Big Data

Essential Data Engineering Roadmap: Skills, Tools, and Technologies to Master

This guide outlines the fast‑growing data engineering career path, covering essential Linux fundamentals, programming languages, testing, database concepts, data warehouses, processing frameworks, messaging systems, cluster computing, workflow scheduling, monitoring, infrastructure as code, and CI/CD tools.

Big Data · Data engineering · data pipelines
0 likes · 5 min read
Kuaishou Big Data
Oct 28, 2021 · Big Data

How Kuaishou Cut Object Storage Costs by 50% with LRC Erasure Coding

Kuaishou halved its massive object storage costs by redesigning its architecture around HBase indexing, HDFS large‑file storage, MemoryCache, and a cross‑IDC LRC erasure‑coding warm layer that preserves disaster recovery while dynamically moving data from hot to warm to cold tiers.

Big Data · Kuaishou · LRC
0 likes · 12 min read
DataFunTalk
Oct 27, 2021 · Big Data

Data Value System and Cockpit Construction: A Case Study from CITIC Bank

This article explains how CITIC Bank's software development center built a data value system and management cockpit, detailing business objectives, overall architecture, digital management methodology, implementation steps, and real‑world usage to support the bank's digital transformation.

Big Data · Digital Transformation · banking analytics
0 likes · 16 min read
dbaplus Community
Oct 26, 2021 · Databases

Scaling JD.com Customer Service with Doris OLAP: Architecture & Caching

JD.com’s customer service team leverages the open‑source MPP database Doris to power real‑time and offline OLAP dashboards, detailing data ingestion pipelines, full‑link monitoring, dual‑stream high‑availability design, dynamic partition management, multi‑layer caching strategies, and performance optimizations applied during the 2020 11.11 shopping festival.

Big Data · Doris · OLAP
0 likes · 15 min read
DataFunSummit
Oct 26, 2021 · Big Data

Data Value System and Cockpit Construction: A Case Study from CITIC Bank

This article presents a comprehensive overview of CITIC Bank's data value system and cockpit construction, detailing business objectives, overall planning, digital management framework, methodology, implementation cases, and current usage, illustrating how data-driven analytics support the bank's digital transformation.

Big Data · Data Cockpit · Data Value
0 likes · 17 min read
High Availability Architecture
Oct 25, 2021 · Big Data

iQIYI Data Governance Practices: Event Tracking (Pingback) Governance and Application

The article details iQIYI's comprehensive data governance initiative for event tracking (Pingback), covering definitions, timing, quality requirements, governance challenges, standardized specifications, coordinate management, testing and gray‑release processes, upgrade workflows, and data security measures that together reduced event volume by 40% and cut resource consumption in half.

Analytics · Big Data · Data Quality
0 likes · 16 min read
DataFunSummit
Oct 25, 2021 · Big Data

Building a Multi-Dimensional Analysis System: Practice at Baixin Bank

This talk by Baixin Bank's BI leader outlines the bank's business model, multi-dimensional data analysis requirements, and the design of a laddered analysis solution, including indicator and analysis system construction, user‑product‑enterprise scenario modeling, and productization of data insights for operational decision‑making.

Big Data · Business Intelligence · Data Analysis
0 likes · 20 min read
DataFunSummit
Oct 21, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing two coordinator high‑availability solutions, and explaining the cross‑cluster scheduling architecture that leverages idle Presto resources to improve overall big‑data processing efficiency.

Big Data · Cloud Computing · Cross-Cluster Scheduling
0 likes · 16 min read
dbaplus Community
Oct 20, 2021 · Big Data

How JD Achieves ClickHouse High‑Availability for Billion‑Scale OLAP

JD's OLAP platform runs ClickHouse and Doris across 3,000 servers, handling billions of daily queries and petabytes of data. This article details the selection criteria, cluster deployment models, high‑availability architecture, operational challenges, and future roadmap.

Big Data · ClickHouse · Cluster Deployment
0 likes · 21 min read
21CTO
Oct 18, 2021 · Operations

What Emerging IT Roles Will Shape the Future of Tech?

The article surveys rapidly growing IT positions—from quantum computing engineers and security‑compliance managers to big‑data, analytics, and DataOps engineers—explaining how these roles combine advanced technologies, regulatory expertise, and operational practices to drive business transformation and meet the evolving demands of digital enterprises.

Big Data · CloudOps · DataOps
0 likes · 9 min read
Java High-Performance Architecture
Oct 17, 2021 · Backend Development

How to Choose the Right Tech Stack: Lessons from a Java Backend Veteran

The author, a seasoned Java backend developer, shares personal experiences and insights on evaluating efficiency, ecosystem, and team dynamics when selecting technologies—from legacy frameworks and databases to modern big‑data tools like Spark and Flink—offering practical guidance for developers and teams navigating today’s rapidly evolving tech landscape.

Big Data · Technology Selection · software engineering
0 likes · 11 min read
DataFunSummit
Oct 16, 2021 · Databases

Practical Use Cases of Materialized Views and Indexes in Doris

This article shares practical experiences with Doris, covering materialized view concepts, typical use cases, index principles, performance optimizations, and real‑world scenarios such as order analysis, PV/UV aggregation, and detailed queries, while also providing operational tips and Q&A insights.

Big Data · Doris · OLAP
0 likes · 16 min read
JD Retail Technology
Oct 15, 2021 · Big Data

How JD’s Activity Cockpit Supercharges Mega‑Sale Performance with Optimize Table, BitMap, and Materialized Views

The article explains how JD’s Activity Cockpit tackles mega‑sale challenges by monitoring the consumer golden‑link, applying Optimize Table, BitMap, and materialized view techniques to reduce data volume, accelerate queries, and enable precise real‑time marketing for brands.

Big Data · Performance Optimization · bitmap indexing
0 likes · 6 min read
iQIYI Technical Product Team
Oct 15, 2021 · Industry Insights

How iQIYI Streamlined Event Tracking: A Deep Dive into Data Governance

This article details iQIYI's comprehensive data‑governance practice for event tracking, covering the definition of pingback, the need for governance, the governance framework, coordinate management, gray‑data handling, and the upgrade process that reduced tracking volume by 40% while cutting resource consumption in half.

Analytics · Big Data · Data Quality
0 likes · 17 min read
21CTO
Oct 14, 2021 · Big Data

How LinkedIn Scaled Hadoop to 11,000 Nodes and Solved YARN Delays

LinkedIn’s engineers detail how they repeatedly doubled their Hadoop cluster to over 11,000 nodes, tackled YARN scheduling delays caused by workload imbalances, and created the DynoYARN simulation tool to predict performance impacts of massive scaling.

Big Data · DynoYARN · Hadoop
0 likes · 7 min read
IT Xianyu
Oct 14, 2021 · Databases

Comparing MySQL and HBase: Architecture, Engine, and Application Scenarios

This article compares MySQL and HBase by examining their architectural designs, storage engines, data access patterns, and ecosystem features, highlighting the strengths and trade‑offs of each system and outlining the scenarios where HBase is a suitable complement to MySQL.

B+Tree · Big Data · HBase
0 likes · 5 min read
Alibaba Cloud Developer
Oct 13, 2021 · Big Data

Why “Exactly‑Once” Doesn’t Guarantee Consistency in Stream Processing

This article examines the true meaning of consistency in stream computing, clarifies common misconceptions about exactly‑once semantics, formalizes consistency challenges, and reviews how major stream engines such as Google MillWheel, Apache Flink, Kafka Streams, and Spark Streaming implement end‑to‑end consistency.

Big Data · Exactly-Once · Stream Processing
0 likes · 29 min read
Java High-Performance Architecture
Oct 12, 2021 · Big Data

Unpacking the Core Technologies Behind Modern Big Data Platforms

This article breaks down a typical big data platform architecture into its four layers—data collection, storage and analysis, sharing, and real‑time computation—detailing the essential tools such as Flume, HDFS, Hive, Spark, DataX, and task scheduling systems that enable scalable, low‑latency data processing and delivery.

Big Data · Data Architecture · DataX
0 likes · 8 min read
Architecture Digest
Oct 11, 2021 · Big Data

Core Technologies and Architecture of a Big Data Platform

This article explains the typical architecture of a big‑data platform, detailing its four core layers—data collection, storage & analysis, data sharing, and application—and describing the key technologies such as Flume, DataX, HDFS, Hive, Spark, Spark Streaming, and task scheduling components.

Big Data · Data Architecture · DataX
0 likes · 8 min read
DataFunTalk
Oct 7, 2021 · Big Data

Impala Architecture, Concurrency, CBO Join Optimization, and Storage Layer in Tencent Financial Big Data Scenarios

This article introduces Impala's overall architecture, storage options, key features, concurrency mechanisms, CBO‑based join optimization techniques, storage‑layer principles and data‑filtering strategies, and summarizes practical performance‑tuning experiences from Tencent's financial big‑data platform.

Big Data · CBO · Concurrency
0 likes · 12 min read
Architect
Oct 6, 2021 · Big Data

Design and Implementation of a Real-time and Offline Integrated Query System

This article details the requirements, architecture, and implementation of a real-time and offline integrated query system, covering data ingestion via Debezium and Confluent Platform, storage in Kudu and HDFS, query engines Presto and Kylin, and strategies for data synchronization, partitioning, and scaling.

Big Data · Data Warehouse · Debezium
0 likes · 19 min read
Architects' Tech Alliance
Oct 4, 2021 · Industry Insights

Key Technologies and Trends Powering Enterprise Digital Transformation

This article outlines the concept of enterprise digital transformation, detailing network evolution, platform‑centric infrastructure, business deconstruction, customer‑focused data value creation, and the importance of measurable value improvement as a core metric for successful digital change.

Artificial Intelligence · Big Data · Blockchain
0 likes · 7 min read
DataFunTalk
Oct 2, 2021 · Artificial Intelligence

Baidu Data Federation Platform: Architecture, Applications, Federated Learning, and Explainability

This article presents an in‑depth overview of Baidu's Data Federation Platform, detailing its layered architecture, core technical capabilities, privacy‑preserving collaborative research on epidemic prediction and shared vehicle optimization, and explores federated learning types, PaddleFL implementations, and model explainability techniques.

Big Data · explainability · federated learning
0 likes · 22 min read
AntTech
Sep 28, 2021 · Databases

GeaGraph: Large-Scale Graph Computing System Wins World Internet Conference Award

GeaGraph, a large‑scale graph computing system jointly developed by Ant Group and Tsinghua University and recognized at the 2021 World Internet Conference, showcases world‑leading performance in trillion‑edge graph queries and exemplifies successful industry‑academia‑research collaboration for advanced database technology.

Big Data · GeaGraph · Industry-Academia Collaboration
0 likes · 8 min read
21CTO
Sep 27, 2021 · Big Data

Tech Highlights: China Crypto Ban, Huawei’s New Language, Kafka 3.0

A roundup of recent tech news covering China's crackdown on cryptocurrency, Huawei's upcoming programming language, the release of Apache Kafka 3.0, and other major developments in China's digital economy and industry leadership.

Apache Kafka · Big Data · Digital Economy
0 likes · 8 min read
Airbnb Technology Team
Sep 27, 2021 · Big Data

Midas Certification: Airbnb’s End-to-End Data Quality Framework

Airbnb’s Midas certification establishes a company‑wide, multi‑dimensional golden‑standard for data quality—covering accuracy, consistency, timeliness, cost, and completeness—by requiring collaborative design, automated health checks, and four review stages, ensuring certified data is reliable, well‑documented, and ready for reporting, experimentation, and machine‑learning.

Airbnb · Big Data · Data Quality
0 likes · 12 min read
Cloud Native Technology Community
Sep 26, 2021 · Big Data

Apache Kafka 3.0.0 Release Summary: New Features, Improvements, Bugs, Tasks, and Tests

Apache Kafka 3.0.0, released on September 21, 2021, introduces major changes such as deprecating Java 8 and Scala 2.12, adding Raft‑based metadata quorum, stronger producer delivery guarantees, removal of old message formats, numerous performance optimizations, extensive bug fixes, and a large set of new and updated JIRA issues across features, improvements, bugs, tasks, tests, and subtasks.

Apache · Big Data · Kafka3.0
0 likes · 37 min read
转转QA
Sep 26, 2021 · Big Data

A/B Testing Process Improvement and Validation Guide

This article outlines a comprehensive A/B testing workflow, covering historical issues, business test process improvements, detailed implementation steps, SQL validation scripts, data verification in analytics platforms, and practical notes to ensure accurate experiment data collection and analysis.

A/B testing · Big Data · Process Improvement
0 likes · 10 min read
Programmer DD
Sep 26, 2021 · Big Data

What’s New in Apache Kafka 3.0? Key Features and Improvements Explained

Apache Kafka 3.0.0 introduces a host of changes, including the deprecation of Java 8 and Scala 2.12 support, Raft metadata snapshots, stronger producer guarantees, MirrorMaker 2 upgrades, and Kafka Streams improvements, while continuing to serve real‑time data pipelines and streaming applications.

Apache Kafka · Big Data · Kafka 3.0
0 likes · 3 min read
DataFunTalk
Sep 23, 2021 · Databases

Practical Use Cases of Materialized Views and Indexes in Doris

This article shares practical experiences with Doris, covering materialized view concepts, typical use cases, advantages, creation syntax, prefix index principles, performance‑boosting scenarios such as order analysis, PV/UV counting, detail queries, and operational tips for high‑throughput and low‑latency workloads.

Big Data · Doris · OLAP
0 likes · 18 min read
Java Architect Essentials
Sep 21, 2021 · Big Data

Interview on Kuaishou's Billion‑Scale Big Data Architecture Evolution and Practices

The interview with Kuaishou senior architect Zhao Jianbo details the three‑phase evolution of its trillion‑scale big data platform, covering foundational Hadoop services, real‑time and OLAP extensions, deep customizations, Spring Festival Gala challenges, scheduling innovations, Hadoop usage, and the relationship between big data and cloud architectures.

Big Data · Flink · Hadoop
0 likes · 19 min read
Big Data Technology & Architecture
Sep 15, 2021 · Big Data

Linkis: Open‑Source Big Data Middleware Joins the Apache Incubator

Linkis, an open‑source computing middleware from WeBank, has entered the Apache Software Foundation Incubator, offering REST/WebSocket/JDBC interfaces to a wide range of engines such as Spark, Hive, Presto and Flink, and providing powerful governance, orchestration, and resource‑management capabilities for big‑data platforms.

Apache Incubator · Big Data · Data Platform
0 likes · 5 min read
Alibaba Cloud Developer
Sep 15, 2021 · Big Data

How to Pick Real-Time Dimension & Result Tables for Cloud‑Native Big Data

This article examines the evolution of big‑data architectures toward cloud‑native, real‑time processing, and provides a detailed comparison of dimension‑table and result‑table options—including MySQL, Redis, and Alibaba Cloud Tablestore—along with their performance, cost, and scalability characteristics for Flink SQL workloads.

Big Data · Flink SQL · Real-Time Computing
0 likes · 28 min read
IT Architects Alliance
Sep 12, 2021 · Industry Insights

Data Warehouse vs. Database: Core Differences and Building a Data Platform

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a data warehouse—including model selection, topic domain division, bus matrix, layered architecture, and data governance—then expands to the concept of a data middle platform and its distinction from data lakes and big‑data platforms.

Big Data · Data Platform · Data Warehouse
0 likes · 18 min read
Architects' Tech Alliance
Sep 11, 2021 · Big Data

Understanding Data Warehouses: Definitions, Differences, Architecture, Modeling, and Best Practices

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a warehouse—including model selection, subject‑area definition, bus matrix, layering, and data quality—while also covering related concepts such as data middle platforms, data lakes, metadata, and modeling techniques.

Big Data · Data Quality · Data Warehouse
0 likes · 16 min read
DataFunTalk
Sep 11, 2021 · Cloud Computing

Industrial Data Cloud Migration: Architecture, Core Technologies, and Case Studies with Alibaba Cloud IoT

This article explains the background, challenges, overall architecture, core technology optimizations, edge‑computing integration, data modeling, serialization, and real‑world case studies of moving industrial IoT data to Alibaba Cloud, illustrating how cloud‑native solutions enable digital transformation in manufacturing.

Big Data · Cloud Computing · Data Integration
0 likes · 16 min read
Tencent Tech
Sep 10, 2021 · Big Data

How Sohu Changyou Migrated 1 PB of Game Data to the Cloud Without Downtime

This article details how Sohu Changyou’s data team, together with Tencent Cloud engineers, planned and executed a seamless migration of over one petabyte of game data to Elastic MapReduce, Elasticsearch Service and Oceanus, achieving zero service impact and dramatically improving analytics performance.

Big Data · EMR · Game Analytics
0 likes · 9 min read
DataFunTalk
Sep 10, 2021 · Big Data

Presto High‑Performance Engine Practice at Meitu: Technical Selection, HA Design, and Cross‑Cluster Scheduling

This article details Meitu's adoption of the Presto ad‑hoc ROLAP engine, comparing it with Hive on Spark and Impala, describing enhancements for coordinator high‑availability, and explaining a cross‑cluster scheduling strategy that leverages idle Presto resources to improve overall big‑data workload efficiency.

Big Data · Cross-Cluster Scheduling · Data engineering
0 likes · 16 min read
Ctrip Technology
Sep 9, 2021 · Big Data

Building Data Lineage at Ctrip: Architecture, Implementation, and Real‑World Applications

This article describes how Ctrip built a data lineage system for its big data platform, covering the concept of data lineage, collection methods, open‑source tools such as Apache Atlas and DataHub, the in‑house table‑level and field‑level solutions, implementation details for Hive, Spark and Presto, storage in JanusGraph, and practical applications in data governance, metadata management, scheduling and sensitivity labeling.

Big Data · Hive · JanusGraph
0 likes · 16 min read
vivo Internet Technology
Sep 8, 2021 · Big Data

Overview of Vivo Marketing Automation Platform Architecture and Technical Design

The article outlines Vivo's marketing automation platform, explaining how it automates multi‑channel campaigns to solve timing, personalization, and ROI challenges, and describes its four business modules, layered system architecture—including gateway, service, compute, and storage components—and high‑availability features such as monitoring, smooth releases, rate limiting, and idempotent operations.

Big Data
0 likes · 14 min read
Selected Java Interview Questions
Sep 7, 2021 · Big Data

Elasticsearch Basics: Core Concepts, Indexing, Write and Search Processes, Cluster Management and Performance Tips

This article provides a comprehensive overview of Elasticsearch, covering its fundamental architecture, key concepts such as indices, shards and replicas, the complete write and search workflows, consistency mechanisms, master node election, and practical performance‑tuning recommendations for large‑scale deployments.

Big Data · Cluster Management · Elasticsearch
0 likes · 15 min read
Volcano Engine Developer Services
Sep 6, 2021 · Databases

How ByteDance Optimized ClickHouse for Real‑Time Recommendation and Ad Analytics

ByteDance’s ByteHouse, an enterprise‑grade ClickHouse, powers real‑time recommendation and ad‑delivery analytics at massive scale, detailing two case studies, technical selections, architectural designs, and performance optimizations such as asynchronous indexing, multi‑threaded Kafka consumption, and enhanced buffer engines to ensure data integrity.

Big Data · ByteHouse · ClickHouse
0 likes · 10 min read
Laravel Tech Community
Sep 5, 2021 · Artificial Intelligence

Comprehensive Collection of Open Data Sources and Datasets for AI and Data Analysis

This article provides a curated list of publicly available data query websites, simple universal datasets, large-scale collections, and specialized datasets for machine learning, image classification, text classification, and recommendation systems, offering valuable resources for AI research and data-driven projects.

Artificial Intelligence · Big Data · Image Classification
0 likes · 7 min read
IT Architects Alliance
Sep 5, 2021 · Big Data

Big Data Platform Architecture: Core Layers, Technologies, and Practices

This article outlines a typical big data platform architecture, detailing its core layers—data acquisition, storage and analysis, sharing, application, real‑time computation, and task scheduling—while introducing key technologies such as Flume, HDFS, Hive, Spark, DataX, and monitoring considerations.

Big Data · Data Platform · Hadoop
0 likes · 9 min read
Architects Research Society
Sep 4, 2021 · Databases

Why Data Scientists Should Learn PostgreSQL

This article explains why mastering SQL and PostgreSQL is essential for data scientists, outlines the core skills of the role, describes PostgreSQL’s features, lists its advantages and drawbacks for data science, and suggests resources for getting started.

Big Data · HTAP · PostgreSQL
0 likes · 10 min read
DataFunTalk
Sep 4, 2021 · Big Data

High‑Availability Practices of ClickHouse in JD.com: Architecture, Deployment, and Operations

The article details JD.com’s large‑scale OLAP strategy using ClickHouse as the primary engine and Doris as a secondary engine, covering application scenarios, component selection criteria, cluster deployment models, high‑availability architecture, fault‑handling procedures, performance tuning, and future cloud‑native plans.

Big Data · ClickHouse · Cluster Deployment
0 likes · 19 min read
DataFunTalk
Sep 3, 2021 · Big Data

Building an Exabyte‑Scale Data Lake with Apache Hudi at ByteDance: Architecture, Design Choices, and Performance Optimizations

This article details ByteDance's implementation of an exabyte‑scale data lake using Apache Hudi, covering scenario requirements, engine selection, functional support, schema management, extensive performance tuning, and future directions, while also noting recruitment opportunities within the team.

Apache Hudi · Big Data · ByteDance
0 likes · 9 min read
ByteDance ADFE Team
Aug 31, 2021 · Big Data

Evolution of the Big Data Technology Stack Over the Past Five Years

This article reviews the evolution of big data technologies in the last five years, covering streaming and batch processing frameworks, column‑store NoSQL databases, programming language trends, the cloud‑native multi‑model database Lindorm, and practical Flink/Blink usage with code examples.

Big Data · Data engineering · Database
0 likes · 24 min read
Baidu Geek Talk
Aug 30, 2021 · Artificial Intelligence

Baidu Credibility Certification Platform: Architecture, Core Capabilities, and Technical Design

Baidu Credibility Certification Platform is an AI‑powered verification service that offers unified authentication, qualification certification, workflow orchestration, and intelligent document validation for enterprises, institutions, and individuals, built on a mid‑platform architecture with shared components and future plans to expand content and service certification.

AI · Baidu · Big Data
0 likes · 15 min read
Programmer DD
Aug 30, 2021 · Big Data

Why Is Kafka So Fast? Unveiling the Secrets Behind Its High Throughput

This article explains how Kafka achieves remarkable speed and massive throughput by using sequential disk I/O, OS page cache, zero‑copy transfers, partitioned log segments with indexes, batch processing, and efficient compression, making it a cornerstone of modern big‑data pipelines.

Big Data · High Throughput · Kafka
0 likes · 9 min read
Tencent Cloud Developer
Aug 26, 2021 · Big Data

Recap of Shenzhen Elasticsearch Meetup – Community Growth, Compression Optimization, Real‑time Data Fusion, and Cluster Practices

The first Shenzhen Elasticsearch meetup on August 21, 2021, jointly hosted by the ES Chinese community and Tencent Cloud, gathered experts from Tencent, Tapdata, ByteDance and Vivo to showcase rapid community growth, compression‑encoding optimizations, real‑time ES‑MongoDB data fusion, custom kernel extensions, large‑scale cluster practices, and concluded with extensive Q&A and networking.

Big Data · Cluster Management · Elasticsearch
0 likes · 11 min read
Selected Java Interview Questions
Aug 25, 2021 · Databases

ClickHouse Overview: Architecture, MySQL Migration, Performance Testing, and Practical Tips

This article introduces ClickHouse, a high‑performance open‑source columnar database, explains its architecture versus row‑based systems, details migration from MySQL, showcases installation, performance benchmarks, data‑sync strategies, common pitfalls, and summarizes its benefits for large‑scale analytical workloads.

Big Data · ClickHouse · Columnar Database
0 likes · 7 min read
DataFunSummit
Aug 22, 2021 · Big Data

Evolution and Optimization of Meituan Waimai Offline Data Warehouse: Architecture, ETL, Modeling, Governance, and Future Plans

This article details the historical development, architectural layers, ETL migration to Spark, data modeling standards, governance processes, resource optimization, security measures, and future roadmap of Meituan Waimai's offline data warehouse, illustrating how the team addressed scalability and efficiency challenges.

Big Data · Data Warehouse · ETL
0 likes · 21 min read
Top Architect
Aug 18, 2021 · Big Data

Elasticsearch Indexing and Retrieval Optimization for Billion‑Scale Data

This article describes how a top architect optimized Elasticsearch for handling billions of records, covering Lucene fundamentals, index and shard design, DocValues, query performance tuning, bulk indexing strategies, hardware considerations, and testing methods to achieve sub‑second query responses across multi‑year data ranges.

Big Data · Elasticsearch · Index Optimization
0 likes · 12 min read
Architects' Tech Alliance
Aug 17, 2021 · Cloud Computing

Integrated Vehicle‑Road Cloud Control System Architecture

The integrated vehicle‑road cloud control system is a next‑generation cyber‑physical architecture that unifies vehicles, roads, and cloud services through edge, regional, and central clouds, providing real‑time perception, decision‑making, and control to improve traffic safety, efficiency, and sustainability.

Big Data · System architecture · edge computing
0 likes · 10 min read
dbaplus Community
Aug 17, 2021 · Big Data

How JD Transformed Its Data Warehouse with Delta Lake for Real‑Time Analytics

This article examines JD's shift from a traditional Lambda‑based data warehouse to a Delta Lake‑powered real‑time data lake, detailing the challenges of legacy architectures, the evaluation of open‑source table formats, Delta Lake's core mechanisms, and the resulting simplified batch‑stream development workflow.

Batch-Stream · Big Data · Data Lake
0 likes · 11 min read
DataFunTalk
Aug 14, 2021 · Databases

Evolution of OLAP Engines at Lenovo Liancheng Zhida and DorisDB Adoption

The article chronicles Lenovo Liancheng Zhida’s three‑stage evolution of OLAP engines—from early SQL Server scripts, through a Hadoop‑based Presto solution, to the adoption of DorisDB—detailing architecture, tool comparisons, implementation practices, and the performance and operational benefits achieved.

Analytics · Big Data · DorisDB
0 likes · 12 min read
IT Architects Alliance
Aug 14, 2021 · Big Data

An Introduction to Dimensional Modeling in Data Warehousing

This article provides a comprehensive overview of data warehouse concepts, compares classic warehouse models, explains dimensional modeling fundamentals such as fact and dimension tables, demonstrates a practical e‑commerce scenario with schema design and SQL query examples, and discusses real‑world trade‑offs.

Big Data · ETL · Star Schema
0 likes · 9 min read
Volcano Engine Developer Services
Aug 11, 2021 · Big Data

How Volcengine Solves Big Data Quality Challenges with a Unified Stream‑Batch Platform

Volcengine’s Data Quality Platform bridges the gap between data validation and resource‑intensive computation in large‑scale environments, offering unified stream‑batch monitoring, data exploration, comparison, and alerting across Hive, ClickHouse, Kafka, and more, while addressing scalability, latency, and resource optimization challenges.

Big Data · Data Quality · Stream Processing
0 likes · 19 min read
Baidu Intelligent Testing
Aug 10, 2021 · Backend Development

Evolution and Architecture of Baidu's Fengjing APM System

This article chronicles the four‑year evolution of Baidu's Fengjing performance‑monitoring platform, detailing its data collection, processing pipelines, successive architectural versions (1.0‑4.0), challenges such as probe intrusion and massive data volume, and the engineering solutions that enabled large‑scale, low‑cost, cloud‑native observability for thousands of Java services.

APM · Big Data · Java
0 likes · 9 min read
DataFunTalk
Aug 5, 2021 · Big Data

Building a Unified High‑Performance OLAP Platform with DorisDB at Beike Real Estate

The article describes how Beike Real Estate consolidated multiple OLAP engines into a single DorisDB‑based platform, detailing the business challenges, DorisDB’s technical advantages, extensive performance and concurrency benchmarks, and the resulting improvements in stability, query speed, and operational simplicity across various business scenarios.

Analytics · Benchmark · Big Data
0 likes · 14 min read
Baidu Intelligent Testing
Aug 5, 2021 · Operations

Baidu Search Stability Issue Analysis: Automated Fault Detection and Resolution Techniques

This article details Baidu Search's high‑availability engineering, describing eight major challenges in fault analysis and the corresponding innovations—index mirroring, streaming analysis, comprehensive label sets, feature engineering, query reconstruction, intelligent ranking, timeline analysis, and chaos engineering—that together enable near‑real‑time, 99% accurate detection and mitigation of search service failures.

Big Data · Reliability · Search
0 likes · 13 min read
Alimama Tech
Aug 4, 2021 · Big Data

Fast Attribution Engine (FAE): High‑Performance Distributed Computing for User Behavior and Advertising Attribution

FAE, Alibaba’s high‑performance distributed MPP engine, stores billions of user‑behavior events in a time‑ordered AFile model and uses stateless masters, importers, mergers and workers with Redis and MySQL metadata to deliver sub‑second, 10‑100× faster ad‑attribution queries across ad‑hoc, offline and near‑real‑time scenarios such as frequency, path and funnel analysis.

Ad Attribution · Big Data · Distributed computing
0 likes · 11 min read
Volcano Engine Developer Services
Aug 3, 2021 · Big Data

Inside ByteDance’s Traffic Platform: Powering Trillions of Real‑Time Events

This article, compiled from a Volcano Engine meetup, explains how ByteDance’s unified traffic platform designs, governs, and processes massive event‑tracking data in real time, covering embedding content solutions, link architecture, dynamic processing engines, and data‑governance practices that support trillions of daily events.

Big Data · Data engineering · Real-time Processing
0 likes · 16 min read
Efficient Ops
Aug 2, 2021 · Operations

How Alibaba Scales Massive Big Data Engines with an SRE Framework

This article describes Alibaba’s comprehensive SRE system for managing ultra‑large‑scale big data engines, detailing stability metrics, resource cost management, and intelligent operation productization, and introduces speaker Fu Tianyuan, a senior operations expert leading the MaxCompute and DataWorks SRE team.

Alibaba · Big Data · Cloud Computing
0 likes · 3 min read