Tagged articles

3697 articles

Aug 2, 2021 · Big Data

How to Build a Beginner Hadoop Cluster on CentOS 7

This article introduces the open‑source Apache Hadoop framework, explains its core components (HDFS, MapReduce, and YARN) and related ecosystem projects such as ZooKeeper, HBase, Hive, Pig, Mahout, Sqoop, Flume, Chukwa, Oozie, and Ambari, and outlines the steps to set up a beginner‑level Hadoop cluster on CentOS 7.

Big Data · CentOS 7 · HBase

0 likes · 11 min read

Big Data Technology & Architecture

Aug 2, 2021 · Big Data

Comprehensive Big Data Interview Question Guide for Major Tech Companies

This article compiles extensive interview questions and topics covering Hadoop, Spark, Flink, Hive, Kafka, MySQL, Redis, Java fundamentals, and algorithms, organized by companies such as Xiaomi, ByteDance, Alibaba, Shopee, Tencent, Meituan, NetEase, and Baidu, to help candidates prepare effectively for big‑data engineering roles.

Big Data · Flink · Hadoop

0 likes · 22 min read

ByteDance SE Lab

Jul 30, 2021 · Operations

Inside Salesforce’s Global Outage: What Went Wrong and How to Prevent It

The article examines Salesforce’s five‑hour global outage caused by a shortcut DNS deployment and the subsequent recovery challenges, then explores a viral experiment where twenty smartphones generated artificial traffic congestion, illustrating how real‑time data feeds and operational safeguards can prevent large‑scale service disruptions.

Big Data · Cloud Computing · Incident Management

0 likes · 7 min read

JD Tech

Jul 30, 2021 · Databases

Practical Use of HBase in a Logistics HR Data Preprocessing Platform

This article details how the logistics HR data preprocessing platform processes around 20 million daily records by adopting HBase for high‑performance, scalable, column‑oriented storage, covering its architecture, read/write mechanisms, best practices, and performance considerations.

Big Data · HBase · NoSQL

0 likes · 10 min read

DataFunTalk

Jul 29, 2021 · Big Data

Real-Time Data Warehouse Construction at TAL Using DorisDB

This article details TAL's transition from offline to real-time data warehousing, describing business drivers, pain points, architectural evolution through Hive, Flink+Kudu, and DorisDB, and outlining the system design, data flow, scheduling, monitoring, and the resulting business and cost benefits.

Airflow · Big Data · DorisDB

0 likes · 14 min read

Airbnb Technology Team

Jul 29, 2021 · Big Data

Airbnb’s Data Quality Improvement Plan: Organizational, Architectural, and Governance Practices

Airbnb’s 2019 Data Quality Improvement Plan reorganized its data‑engineering workforce, introduced a dedicated data‑engineer role, adopted a decentralized Minerva‑based architecture with Spark pipelines, instituted rigorous testing, governance, and certification processes, and established SLAs and monitoring to ensure timely, trustworthy, well‑documented data across the enterprise.

Airbnb · Big Data · Data Architecture

0 likes · 13 min read

DataFunTalk

Jul 28, 2021 · Big Data

Pravega Flink Connector: Past, Present, and Future – Architecture, Checkpoint Integration, and Upcoming Features

This article reviews the Pravega project and its Flink connector, covering Pravega's design for large‑scale streaming, the connector's evolution and exactly‑once semantics, Flink 1.11 integration challenges, checkpoint mechanisms, and future plans such as schema‑registry support and new Flink features.

Big Data · Checkpoint · Connector

0 likes · 10 min read

DataFunTalk

Jul 27, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Doris at Shuhai Supply Chain

This article describes how Shuhai Supply Chain upgraded its data warehouse from a complex, high‑cost 1.0 architecture to a streamlined, real‑time solution built around Apache Doris, detailing the motivations, design choices, zero‑code ingestion, metadata management, Flink connector, and the resulting performance gains.

Apache Doris · Big Data · Flink

0 likes · 13 min read

Big Data Technology Architecture

Jul 27, 2021 · Big Data

Key Components of the Big Data Ecosystem: Hadoop, Hive, HBase, Spark, Kafka, and Elasticsearch

This article introduces the most important and still mainstream components of the big data ecosystem—including Hadoop’s storage and compute framework, Hive data warehouse, HBase NoSQL database, Spark unified engine, Kafka messaging platform, and Elasticsearch search engine—explaining their core concepts, architectures, and typical use cases.

Big Data · Elasticsearch · HBase

0 likes · 9 min read

DataFunTalk

Jul 26, 2021 · Big Data

Accelerating Hive Daily Tables with Flink: A SmartNews Case Study

This article describes how SmartNews integrated Flink into its Airflow‑driven Hive batch pipeline to cut the actions table generation latency from three hours to about thirty‑four minutes, detailing the technical challenges, design decisions, and production results.

Big Data · Flink · Hive

0 likes · 12 min read

DataFunTalk

Jul 25, 2021 · Databases

Practical Application of Apache Kudu at NetEase: Architecture, Use Cases, Challenges and Future Directions

This article explains Apache Kudu’s architecture, schema design, update mechanism, and how NetEase leverages it for real‑time data ingestion, dimension table joins, data‑warehouse ETL, and AB‑testing, while also discussing encountered issues and upcoming feature requests.

Apache Kudu · Big Data · NetEase

0 likes · 11 min read

Architecture Digest

Jul 25, 2021 · Big Data

Design and Architecture of Hera Data Service for Unified Data Access at Vipshop

The article details the background, architecture, core features, scheduling mechanisms, Lisp‑based query DSL, and Alluxio integration of Vipshop's self‑developed Hera data service, illustrating how it unifies multi‑engine data access, improves SLA, and accelerates large‑scale crowd computing tasks.

Alluxio · Big Data · Data Service

0 likes · 21 min read

Architects Research Society

Jul 22, 2021 · Big Data

Enterprise Data Strategy: Aligning Tactics, Governance, and the Experience Economy

This article explores how a clear enterprise data strategy—distinguishing strategic goals from tactical steps, emphasizing clean and governed data, and integrating analytics with business missions—drives reliable outcomes and supports the experience economy through coordinated CXM platforms and data products.

Analytics · Big Data · Enterprise Data

0 likes · 9 min read

dbaplus Community

Jul 21, 2021 · Big Data

Youzan’s Blueprint: Data Governance, Quality Scoring, and Cost Reduction for AI

At Youzan, data governance evolves from massive data assets to AI readiness through systematic data assetization, quantitative quality scoring, cost measurement, and targeted operational tactics, enabling precise quality monitoring, cost allocation, and continuous improvement that drive both data value and cost efficiency.

AI readiness · Big Data · cost optimization

0 likes · 18 min read

Tencent Cloud Developer

Jul 21, 2021 · Big Data

Bloom Filter: Introduction, Theory, Construction, Query, and Applications

The article explains Bloom filters—a probabilistic, space‑efficient data structure using multiple hash functions on a bit array to answer set‑membership queries with controllable false‑positive rates, detailing their construction, query process, optimal parameters, and common uses such as URL deduplication, cache protection, and spam filtering.

Big Data · Cache Optimization · bloom-filter

0 likes · 8 min read

IT Architects Alliance

Jul 20, 2021 · Big Data

Understanding Data Middle Platform: Layers, Architecture, and Implementation Methodology

The article explains the concept of a data middle platform and details its three‑layer structure: data model, data service, and data development. Using a telecom operator example, it illustrates how data modeling enables cross‑domain integration, how services encapsulate data for flexible consumption, and how development tools support customized data applications.

Big Data · Data Architecture · Data Platform

0 likes · 2 min read

Huawei Cloud Developer Alliance

Jul 20, 2021 · Backend Development

From Non‑Tech Student to Cloud MVP: Go, AI, and Startup Insights

In this interview, Huawei Cloud MVP Wang Ming shares how a non‑computer‑science background led him to a successful IT career, discusses the advantages of interdisciplinary skills, offers entrepreneurship advice, predicts future tech trends, and explains the key concepts of his popular Go concurrency book.

Artificial Intelligence · Big Data · Entrepreneurship

0 likes · 7 min read

Xianyu Technology

Jul 20, 2021 · Big Data

Design and Implementation of a Content Flow Control System for Xianyu Community

The Xianyu “Play” tab flow‑control system combines task‑specific and rule‑based strategies with a dynamic architecture of strategy, control, and distribution chains that integrates real‑time data processing into the recommendation engine, delivering guaranteed exposure, boosting daily posts by 14.4%, and paving the way for multi‑objective, zero‑code control.

Big Data · Flow Control · Real-time Streaming

0 likes · 6 min read

21CTO

Jul 18, 2021 · Databases

Why Your MySQL Queries Are Slow and How ElasticSearch & HBase Can Help

This article examines common causes of slow MySQL queries, explains index mechanics and failures, then compares ElasticSearch’s fast tokenized search and HBase’s column‑oriented storage, offering practical guidance on when and how to use each technology.

Big Data · Database Performance · HBase

0 likes · 21 min read

Open Source Linux

Jul 17, 2021 · Big Data

Master Kafka Basics: Topics, Partitions, Producers & Consumers Explained

This article provides a clear, visual guide to Kafka’s core concepts—including producers, consumers, topics, partitions, consumer groups, message ordering, and the underlying ZooKeeper‑managed cluster architecture—helping readers grasp how Kafka enables reliable, scalable stream processing.

Big Data · Consumers · Partitions

0 likes · 6 min read

Architects' Tech Alliance

Jul 15, 2021 · Cloud Computing

Edge Computing: Challenges, Research Focus, and Related Paradigms

The article explains edge computing as a decentralized computing model that addresses high‑reliability, low‑latency demands, data‑center energy consumption, big‑data processing pressure, low resource utilization, intelligent front‑ends, and security‑privacy concerns, and it outlines key research areas and related paradigms such as fog, mobile edge, sea, and intelligent edge computing.

Big Data · Fog Computing · edge computing

0 likes · 8 min read

Xianyu Technology

Jul 13, 2021 · Big Data

Design and Implementation of Xianyu Real-Time Data Warehouse

To meet Xianyu’s billion‑event‑per‑day real‑time analysis needs, the team built a petabyte‑scale warehouse using Hologres for storage and Alibaba‑enhanced Flink (Blink) for streaming, organized into ODS, DWD, DWS, ADS and DIM layers, enabling minute‑level aggregations, rapid anomaly detection, and instant product‑team insights.

Big Data · Hologres · Stream Processing

0 likes · 12 min read

dbaplus Community

Jul 11, 2021 · Big Data

Scaling Real‑Time & Offline Analytics with Druid: Architecture, Optimizations, and Lessons

This article explains how Beike adopted the Druid OLAP engine to handle massive real‑time and offline query workloads, detailing its four‑component architecture, key technologies such as deep storage and metadata storage, practical optimizations for data ingestion, query caching, dynamic throttling, timeout control, and a roadmap for future enhancements.

Big Data · Druid · OLAP

0 likes · 19 min read

Python Crawling & Data Mining

Jul 10, 2021 · Big Data

Why Tags Are the Core of Data Middle Platforms: Unlock Business Value

This article explains what tags are, how they function as data assets, defines the concept and architecture of a data middle platform, and demonstrates why tags are the pivotal element that enables enterprises to turn raw data into valuable, reusable business services.

Big Data · Data Architecture · Data Assets

0 likes · 7 min read

Tech Musings

Jul 8, 2021 · Big Data

Building a Simple Single-Node MapReduce System: From Theory to Code

This article walks through implementing a lightweight single‑machine MapReduce framework inspired by the original MapReduce paper, covering the abstract Map/Reduce model, task scheduling between master and workers, core Go code for map, reduce, worker, and coordinator, and a brief reflection on its limitations.

Big Data · Distributed Systems · Lab

0 likes · 10 min read

DataFunTalk

Jul 7, 2021 · Big Data

Solving Data Island Challenges and Enabling Advanced OLAP Analysis on Heterogeneous Big Data Platforms – Kyligence Solution Overview

This article explains the growing analytical demands in the big‑data era, the limitations of traditional OLAP, and how Kyligence’s distributed OLAP engine addresses data‑island issues, multi‑dimensional and many‑to‑many analysis, unified security, and performance optimization with MDX on Spark, delivering a seamless Excel‑like experience.

Analytics · Big Data · Data Integration

0 likes · 9 min read

dbaplus Community

Jul 4, 2021 · Big Data

How Didi Scales MySQL‑to‑Hive Sync with Real‑Time Binlog Capture

This article explains Didi's end‑to‑end architecture for ingesting MySQL data into Hive using real‑time Binlog collection, a customized Canal component, message queues, HDFS storage, Dquality monitoring, and strategies for handling data drift and sharding in large‑scale big‑data environments.

Big Data · Canal · Hive

0 likes · 13 min read

DataFunTalk

Jul 2, 2021 · Big Data

Exploring JD Logistics’ Billion‑Scale Data Management and Analytics with Apache Doris

This article details JD Logistics’ challenges in handling petabyte‑level data, outlines their existing data architecture, and explains how they adopted Apache Doris for faster, scalable analytics, covering table management, data import workflows, visualization tools, and future roadmap for data engineering.

Apache Doris · Big Data · Data engineering

0 likes · 14 min read

37 Mobile Game Tech Team

Jul 2, 2021 · Big Data

Inside Flink Metrics: Adding, Retrieving, and Exposing Metrics in TaskManager

This article walks through Flink's metric system by explaining the core interfaces such as MetricReporter and MetricRegistry, showing how metrics are added, registered, and queried during TaskManager startup, and detailing both REST and Prometheus approaches for retrieving metric values.

Big Data · Flink · Java

0 likes · 16 min read

TAL Education Technology

Jul 1, 2021 · Big Data

Optimization of A/B Test Metric Computation Using Spark and ClickHouse

This article details the design and multi‑stage optimization of an A/B testing metric system, describing its product architecture, Spark‑based computation engine, ClickHouse OLAP layer, cumulative calculation improvements, and batch processing techniques that reduced processing time from hours to a few minutes for hundreds of experiments and metrics.

A/B testing · Big Data · ClickHouse

0 likes · 8 min read

Architect

Jul 1, 2021 · Big Data

Data Governance Practices at Meituan Hotel Travel Platform

This article presents a comprehensive case study of Meituan's hotel‑travel data governance, covering the background, challenges, strategic goals, standardized processes, technical systems, cost and security optimizations, measurable outcomes, and future plans for automated governance.

Big Data · Data Quality · Data Security

0 likes · 29 min read

Big Data Technology & Architecture

Jul 1, 2021 · Big Data

Data Governance: Concepts, Goals, Methodology, Tools, and Case Studies

This article explains what data governance is, why it is needed, its objectives, core components, implementation methodology, required tools, and real‑world practices from Meituan Delivery and Ant Financial, illustrating how organized data management drives business value and risk control.

Big Data · Data Management · Data Quality

0 likes · 26 min read

Youzan Coder

Jun 30, 2021 · Big Data

Online Monitoring Practices for Offline and Real-Time Data at Youzan

Youzan Data Report Center monitors offline batch and real‑time data pipelines using accuracy and timeliness rules, cross‑table checks, upstream‑downstream comparisons, and scheduled alerts to detect anomalies early; since 2021 it has generated over 25 alerts, and plans a unified data‑quality dashboard.

Big Data · Data Quality · Flink

0 likes · 12 min read

JD Retail Technology

Jun 29, 2021 · Big Data

The Value of Data and Data Products: From Concept to Practice

This article explains how data has become a critical production resource, outlines the limitations of traditional data‑analysis workflows, defines data products and their components, describes their advantages and key characteristics, and shares practical case studies of data‑product implementations in a large e‑commerce environment.

Big Data · Data Analysis · Data Product

0 likes · 16 min read

DataFunTalk

Jun 26, 2021 · Big Data

Building a Scalable Big Data Service System at Didi: Practices and Lessons

Zhang Liang shares Didi's four-stage journey of constructing and governing large‑scale open‑source big‑data engine services—including engine selection, hardware sizing, PaaS platform building, proxy architecture, and governance—highlighting practical challenges, solutions, and ROI‑driven best practices for Kafka, Elasticsearch, Flink, and related technologies.

Big Data · Data Infrastructure · Elasticsearch

0 likes · 16 min read

Architects Research Society

Jun 26, 2021 · Big Data

Comprehensive Overview of Over 50 Big Data Terms and Technologies

This article presents an extensive glossary of more than fifty big‑data concepts—including Apache projects, data‑analysis methods, storage formats, AI‑related terms, and emerging metrics—providing concise English explanations for each term.

Apache Hadoop · Big Data · Data engineering

0 likes · 17 min read

Laravel Tech Community

Jun 25, 2021 · Big Data

Apache Kudu 1.15.0 – New Features and Improvements

Apache Kudu 1.15.0 adds experimental multi‑row transaction support (currently INSERT and INSERT_IGNORE), Raft‑based master configuration tools, table comment synchronization with Hive Metastore, per‑table size and row‑count limits configurable via flags or the kudu table set_limit tool, a customizable Kerberos principal flag, and TLS v1.3 with optional cipher‑suite selection, collectively enhancing low‑latency random access and analytical capabilities in the Hadoop ecosystem.

Apache Kudu · Big Data · Hadoop

0 likes · 3 min read

DataFunTalk

Jun 25, 2021 · Big Data

Building Data Products and a Data Middle Platform at NetEase Yanxuan: Practices and Lessons

The article details NetEase Yanxuan's end‑to‑end data product ecosystem and data middle platform, describing four core data products, the architecture of the data middle platform, efficient high‑quality delivery, governance practices, and key performance metrics that support data‑driven decision making.

BI · Big Data · Data Product

0 likes · 14 min read

Yuewen Technology

Jun 25, 2021 · Big Data

Building Yuedu Group’s Overseas Big Data Platform: Architecture, Offline & Real‑Time Processing

This article details how Yuedu Group designed and implemented an overseas big data platform, covering overall system architecture, offline data‑warehouse construction with dimensional modeling, real‑time streaming using Oceanus and ClickHouse, and future plans for cost reduction and data quality assurance.

Big Data · Cloud Computing · Real-time Processing

0 likes · 12 min read

Architecture Digest

Jun 24, 2021 · Big Data

Kuaishou's Big Data Service Platform: Architecture, Key Technologies, and Future Outlook

This article describes how Kuaishou turned its data platform into a service, covering the background challenges for data engineers, the platform's architecture, and key technologies such as configuration‑driven development, multi‑mode APIs, data acceleration, and high‑availability mechanisms, and closes with a summary of achievements and future directions.

Big Data · Data Acceleration · Data Platform

0 likes · 12 min read

dbaplus Community

Jun 22, 2021 · Databases

HBase vs Kudu vs ClickHouse: Architecture, Deployment, and Operations Compared

This article provides a side‑by‑side technical comparison of HBase, Kudu, and ClickHouse, covering their installation dependencies, architectural designs, read/write workflows, query capabilities, real‑world use cases at Didi, NetEase, and Ctrip, and practical operational tips.

Big Data · ClickHouse · HBase

0 likes · 20 min read

Didi Tech

Jun 22, 2021 · Big Data

MySQL Binlog Real‑time Collection and Hive Ingestion at DiDi: Architecture and Practices

DiDi’s real‑time MySQL‑to‑Hive pipeline captures row‑mode binlog with a custom Canal component, converts it to JSON, streams it via Kafka to HDFS, restores it into Hive tables, and uses Dquality for integrity, achieving millisecond latency for over 19,000 daily sync tasks handling roughly 50 TB of data.

Big Data · Canal · ETL

0 likes · 13 min read

DevOps

Jun 22, 2021 · Operations

Building Digital Champion Capabilities: Integrating Customer Solutions, Operations, Technology, and Talent Ecosystems

The article outlines how digital‑champion enterprises achieve superior performance by integrating four core ecosystems—customer solutions, operations, technology, and talent—through strategic planning, partnership, and advanced technologies such as AI, big data, and industrial IoT, while highlighting maturity stages and practical implementation steps.

Artificial Intelligence · Big Data · Digital Transformation

0 likes · 28 min read

DataFunTalk

Jun 21, 2021 · Big Data

Flink + Iceberg 0.11 Practices in Qunar Data Platform

This article shares Qunar's experience using Flink together with Apache Iceberg 0.11 to address real‑time data warehouse challenges, covering background pain points, Iceberg architecture, solutions for Kafka data loss and Hive latency, and optimization practices such as small‑file handling, sorting, and checkpoint management.

Big Data · Data Lake · Flink

0 likes · 13 min read

Tencent Cloud Developer

Jun 21, 2021 · Industry Insights

How Hadoop YARN on Kubernetes Pods Supercharge Resource Utilization and Cut Costs

This article explains how Tencent Cloud EMR integrated Hadoop YARN with Kubernetes Pods to create a hybrid online‑offline deployment, implement elastic autoscaling and multi‑label resource allocation, and achieve several‑hundred‑percent improvements in CPU utilization while preserving cluster stability.

Autoscaling · Big Data · Hadoop

0 likes · 11 min read

Architecture Digest

Jun 21, 2021 · Databases

Using HBase for HR Performance Data Preprocessing Platform: Architecture, Concepts, and Best Practices

This article introduces the HR performance data preprocessing platform’s requirements, explains why HBase was selected as the storage solution, details its core concepts, architecture, data write/read processes, best practices, limitations, and presents performance metrics demonstrating its suitability for large‑scale, high‑throughput workloads.

Big Data · Database Architecture · HBase

0 likes · 12 min read

Qunar Tech Salon

Jun 21, 2021 · Big Data

Using Apache Iceberg 0.11 with Flink for Real‑time Data Lake: Architecture, Pain Points, and Solutions

This article examines the challenges of using Kafka, Flink, and Hive for real‑time data warehousing, introduces Apache Iceberg 0.11 as a solution, details its architecture, query planning, Flink integration, code examples, optimization techniques, and summarizes the benefits for large‑scale data processing.

Big Data · Data Lake · Flink

0 likes · 12 min read

DataFunTalk

Jun 20, 2021 · Databases

Xiaohongshu’s OLAP Architecture Evolution and DorisDB Adoption

This article details Xiaohongshu’s multi‑stage evolution of its OLAP infrastructure—from Redshift to Presto, ClickHouse, and finally DorisDB—describing the data pipeline, tool comparisons, advertising use‑case implementation, and the resulting performance and operational benefits.

Big Data · ClickHouse · DorisDB

0 likes · 12 min read

ITFLY8 Architecture Home

Jun 20, 2021 · Big Data

Why HBase Is the Ideal Choice for Large‑Scale HR Data Preprocessing

This article explains how HBase’s distributed column‑oriented architecture, high‑performance read/write capabilities, and flexible schema make it a cost‑effective solution for handling massive, unstructured HR performance data, covering its core concepts, cluster operation, best practices, and performance metrics.

Big Data · HBase · data preprocessing

0 likes · 11 min read

Ctrip Technology

Jun 17, 2021 · Big Data

Data Governance Practices and Cost Optimization at Ctrip's Data Asset Management Platform

The article outlines Ctrip's data governance framework, detailing background challenges, metadata construction, cost and quality optimization techniques, data flow improvements, platform modules, health metrics, and concludes with a summary of achievements and future directions.

Big Data · Ctrip · Data Quality

0 likes · 13 min read

Sohu Tech Products

Jun 16, 2021 · Big Data

Understanding Databases, Data Warehouses, Data Lakes, and the Emerging Lake House Architecture

This article explains the fundamental differences between databases, data warehouses, and data lakes, describes how they complement each other, and introduces the Lake House concept that integrates transactional and analytical workloads using cloud services such as Amazon S3, Redshift Spectrum, and Athena.

Big Data · Data Lake · Data Warehouse

0 likes · 11 min read

Efficient Ops

Jun 16, 2021 · Databases

Mastering ElasticSearch Data Migration and Disaster Recovery: Practical Strategies

This article presents a comprehensive guide to synchronizing heterogeneous data sources with ElasticSearch, migrating clusters across environments, and implementing robust disaster‑recovery solutions for both intra‑city and inter‑city high‑availability scenarios.

Big Data · Cluster Sync · Data Migration

0 likes · 16 min read

DevOps

Jun 16, 2021 · Operations

Understanding Digital Transformation: Definitions, Strategic Questions, Drivers, Frameworks, Roadmaps, Benefits and Pitfalls

The article provides a comprehensive overview of digital transformation, covering its definition, essential strategic questions, key drivers such as customer expectations, cloud and AI, priority areas in the value chain, practical frameworks, roadmap steps, expected benefits and common reasons for failure.

Artificial Intelligence · Big Data · Digital Transformation

0 likes · 20 min read

IT Architects Alliance

Jun 15, 2021 · Industry Insights

How Cloud Computing, Big Data, and AI Intertwine to Power Modern Services

This article explains the evolution of cloud computing from resource management to elastic virtualization, the emergence of IaaS, PaaS and SaaS service models, how big‑data processing relies on distributed cloud platforms, and why artificial intelligence now depends on massive data and cloud‑scale compute to deliver intelligent services.

Artificial Intelligence · Big Data · Cloud Computing

0 likes · 37 min read

Baidu Geek Talk

Jun 15, 2021 · Industry Insights

What Baidu Unveiled at QCon 2021: Key Takeaways from 7 Cutting‑Edge Sessions

This article compiles Baidu experts' presentations at QCon 2021, covering unified quality‑efficiency delivery for feed recommendation, software engineering capabilities, AIOps fault‑management practices, Apache Doris real‑time analytics, large‑scale Service Mesh deployment, massive service‑governance techniques, and deep‑learning platform innovations, with speaker details and audience benefits.

AI · Baidu · Big Data

0 likes · 12 min read

DataFunTalk

Jun 11, 2021 · Big Data

Comprehensive Guide to Fast and Stable Hive‑to‑HBase Data Transfer Using Bulkload, MapReduce, and Spark

This article explains how to efficiently move large volumes of data from Hive to HBase by leveraging HBase's bulkload mechanism, detailing the original MapReduce workflow, its performance bottlenecks, and a rewritten Spark‑based solution that simplifies ETL, improves partitioning, and achieves a several‑fold speedup.

Big Data · ETL · HBase

0 likes · 17 min read

Big Data Technology & Architecture

Jun 10, 2021 · Big Data

User Profiling: Concepts, Tag Classification, Tag‑System Construction, Applications and Implementation Steps

This article provides a comprehensive overview of user profiling, covering its definition, the five‑dimensional framework (goal, method, organization, standards, validation), various tag classifications, tag‑system architecture, modeling techniques, practical applications such as precise marketing and product innovation, and a step‑by‑step guide for building a profiling system using big‑data and AI methods.

Big Data · Customer Segmentation · data tagging

0 likes · 24 min read

Architecture Digest

Jun 10, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's streaming ETL solution built on Flink, covering business background, log characteristics, specialized and generic ETL services, architectural evolution, Python UDF integration, runtime optimizations, fault‑tolerance mechanisms, and future roadmap for unified real‑time and offline data warehouses.

Big Data · Flink · Log Processing

0 likes · 19 min read

58 Tech

Jun 9, 2021 · Big Data

Designing and Implementing a Unified Data Metric System for 58 Commercial Data Team

This article explains how 58's commercial data team built a comprehensive data metric system—from identifying common metric definition issues to establishing a domain‑driven hierarchy, distinguishing atomic and derived metrics, implementing a unified metric management platform, and providing APIs and examples for querying and visualizing metrics.

Big Data · Java · data governance

0 likes · 17 min read

Xianyu Technology

Jun 8, 2021 · Big Data

Longgong Data Analysis Platform: Architecture and Solutions for Large‑Scale Structured Data

The Longgong Data Analysis Platform enables Idle Fish to capture, store, and analyze billions of structured product attributes in real time across more than 8,000 categories, using TableStore, MySQL, ODPS, and a distributed scheduler to achieve a query speedup of over 50%, 80% category coverage, and rapid support for search and recommendation teams.

Alibaba · Big Data · Data Platform

0 likes · 9 min read

Alibaba Cloud Developer

Jun 8, 2021 · Artificial Intelligence

Can Low‑Code Bridge the Gap Between Business and AI? Insights on Its Future

The article explores how low‑code platforms can complement traditional algorithm development, enhance collaboration between business users and engineers, and accelerate big‑data and AI initiatives by improving data cleaning, modular design, and feedback loops, while highlighting the trade‑offs of abstraction and flexibility.

AI · Algorithm Development · Big Data

0 likes · 9 min read

Big Data Technology & Architecture

Jun 6, 2021 · Big Data

Understanding Data Warehouses: Concepts, Architecture, Modeling, and Governance

This article provides a comprehensive overview of data warehouses, explaining their purpose, differences from databases, OLTP vs OLAP, traditional versus internet data warehouse models, layered architecture, modeling theories, metric dictionaries, date dimensions, naming conventions, data governance, and incremental synchronization techniques with practical SQL examples.

Big Data · ETL · data governance

0 likes · 24 min read

DataFunTalk

Jun 6, 2021 · Big Data

Understanding Apache Pulsar: Cloud‑Native Messaging, Storage‑Compute Separation, and Batch‑Stream Fusion with Flink

This article explains Apache Pulsar’s cloud‑native, storage‑compute separated architecture, its data model and scalability features, and how it integrates with Flink to provide a unified platform for both real‑time streaming and batch processing in big‑data applications.

Apache Pulsar · Batch-Stream Integration · Big Data

0 likes · 17 min read

DataFunTalk

Jun 5, 2021 · Big Data

Building and Evolving a Data Service Platform for NetEase Cloud Music

The article details how NetEase Cloud Music co‑built a unified data service platform with NetEase YouShu, describing its architecture, phased development from internal use to online high‑concurrency services, feature enhancements such as API marketplace, multi‑source support, parameter conversion, and future roadmap for broader data products.

API Platform · Backend · Big Data

0 likes · 16 min read

dbaplus Community

Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and the upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data

0 likes · 14 min read

MaGe Linux Operations

Jun 3, 2021 · Big Data

Why Kafka Handles Billions of Messages: Architecture, Use Cases, and Fast Performance

This article introduces Kafka, LinkedIn’s high‑throughput distributed messaging system, explains its core concepts such as brokers, topics, partitions, offsets, producers, consumers, and consumer groups, outlines common use cases like asynchronous decoupling and data‑stream processing, and details its fast performance mechanisms, fault‑tolerance, installation, and configuration steps.

Big Data · Data Streaming · Installation

0 likes · 11 min read

ITFLY8 Architecture Home

Jun 3, 2021 · Big Data

Building a Real‑Time Flink Recommendation System: Architecture, Code & Deployment

This article walks through a complete Flink‑based recommendation system, detailing its v2.0 architecture, recommendation algorithms, front‑end and back‑end components, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink

0 likes · 10 min read

dbaplus Community

Jun 2, 2021 · Databases

How to Build a Mature Data Warehouse: 7 Essential Steps and Best Practices

This article explains why data warehouses are critical for decision‑making, outlines the challenges of immature warehouses, and provides a step‑by‑step framework—including goal setting, technology selection, problem identification, domain modeling, layer design, modeling principles, and governance standards—to help teams build a robust, maintainable data warehouse.

Big Data · Data Architecture · Data Warehouse

0 likes · 22 min read

Big Data Technology Architecture

Jun 2, 2021 · Big Data

Practical Operations of NetEase Big Data Platform: Architecture, EasyOps, Monitoring, and Experience Sharing

The presentation details NetEase's big data platform operations, covering current usage, the internally built EasyOps control system, a generic service‑operation framework based on Ansible, Prometheus‑Grafana monitoring, configuration management, network and storage optimizations, and lessons learned from cloud migration.

Ansible · Big Data · EasyOps

0 likes · 9 min read

Tencent Advertising Technology

Jun 2, 2021 · Big Data

Tencent Advertising Real-Time Strategy Data Framework: Architecture, Performance, and High Availability

The article presents a detailed overview of Tencent Advertising's real‑time strategy data framework, explaining its role in the ad system, the challenges of massive log volumes, and the architectural, performance, and high‑availability solutions implemented to achieve fast, reliable, and scalable ad decision making.

Big Data · Distributed Systems · Real-Time Strategy

0 likes · 24 min read

dbaplus Community

Jun 1, 2021 · Big Data

How Didi Boosted SQL Performance by 40%: Migrating 10k Hive Jobs to Spark

Didi migrated over 10,000 Hive SQL tasks to Spark SQL through a systematic migration process that addressed syntax, UDF, performance, and functional differences between the two engines. The migration brought Spark's share of tasks to 85%, cut execution time by 40%, and reduced CPU and memory usage by 21% and 49%, respectively.

Big Data · Hive · Performance Optimization

0 likes · 20 min read

Qunar Tech Salon

Jun 1, 2021 · Big Data

Integrating TensorFlow for Java with Spark‑Scala for Distributed Machine Learning Prediction

This article shares practical experience of building a high‑performance distributed prediction service by combining TensorFlow for Java with Spark‑Scala, covering framework selection, performance comparison, model training, loading, inference, deployment, and optimization techniques for large‑scale data processing.

Big Data · Java · Performance Optimization

0 likes · 16 min read

Top Architect

May 31, 2021 · Databases

How to Achieve Fast Queries: MySQL Index Optimization, Large‑Table Strategies, Elasticsearch Basics, and HBase Overview

This article explains common causes of slow MySQL queries, how proper indexing and lock handling can improve performance, introduces Elasticsearch’s inverted‑index advantages and suitable use cases, and outlines HBase’s column‑family storage model and row‑key design for large‑scale data.

Big Data · Database Optimization · HBase

0 likes · 18 min read

IT Architects Alliance

May 30, 2021 · Big Data

NetEase Game Streaming ETL Architecture and Practices Based on Flink

This article presents NetEase Game's Flink‑based streaming ETL system, detailing business background, log classifications, specialized and generic ETL services, Python UDF integration, runtime optimizations, HDFS write tuning, SLA metrics, fault‑tolerance mechanisms, and future roadmap for unified data lakes and PyFlink support.

Big Data · Data Integration · ETL

0 likes · 19 min read

DataFunTalk

May 28, 2021 · Artificial Intelligence

JD's Open‑Source Federated Learning Solution 9N‑FL: Architecture, Features, Timeline, and Business Impact

This article introduces JD's open‑source federated learning platform 9N‑FL, explaining the data‑island problem, the fundamentals and classifications of federated learning, its four key features, the system’s layered architecture, development timeline, real‑world advertising use case results, and future enhancements.

9N-FL · Big Data · Data Security

0 likes · 15 min read

58 Tech

May 28, 2021 · Big Data

Practical Upgrade Experience of Hadoop 3.2.1 in 58.com Data Platform: HDFS, YARN, and MR3

This article details the end‑to‑end upgrade of a 5000‑node Hadoop 2.6.0 cluster to Hadoop 3.2.1 at 58.com, covering HDFS migration, RBF and EC adoption, YARN federation and rolling upgrades, MR3 integration, extensive compatibility testing, and operational lessons learned for large‑scale big‑data platforms.

Big Data · Cluster Upgrade · HDFS

0 likes · 19 min read

IT Architects Alliance

May 27, 2021 · Big Data

Mastering Data Model Architecture: Layered Design & Naming Best Practices

This article presents a comprehensive guide to data model architecture, detailing layered data store definitions, classification structures, processing flow, naming conventions, and core design principles to help engineers build scalable, maintainable data warehouses.

Best Practices · Big Data · Data Architecture

0 likes · 8 min read

dbaplus Community

May 27, 2021 · Big Data

How Vipshop Scales Billion‑Row OLAP with ClickHouse, Presto, and Flink

This article details Vipshop's OLAP evolution, describing how Presto, Kylin, and ClickHouse are integrated, the deployment architecture with HAProxy and chproxy, containerization on Kubernetes, and the Flink‑ClickHouse pipeline that enables self‑service analysis of hundred‑billion‑row datasets, while addressing performance challenges and outlining the future roadmap.

Big Data · ClickHouse · Data Warehouse

0 likes · 28 min read

Tencent Cloud Developer

May 27, 2021 · Big Data

An Introduction to Kafka: Architecture, Core Components, Service Governance, Performance Optimizations, and Installation Guide

Kafka is a high‑throughput distributed publish‑subscribe system that uses brokers, topics, partitions, offsets, producers, consumers, and ZooKeeper for metadata and leader election, offering fast sequential disk writes, page‑cache zero‑copy transfers, and ISR‑based replication. The article also includes step‑by‑step installation of the JDK, ZooKeeper, and Kafka.

Big Data · Distributed Messaging · Installation

0 likes · 11 min read

Top Architect

May 26, 2021 · Big Data

Comprehensive Introduction to Apache Kafka: Concepts, Architecture, Installation, and Usage

This article provides a comprehensive guide to Apache Kafka, covering its core concepts, architecture, key APIs, topics and partitions, deployment steps, multi‑broker clustering, fault tolerance, and data integration using Kafka Connect, with detailed command‑line examples.

Big Data · Consumer · Distributed Streaming

0 likes · 26 min read

IT Architects Alliance

May 25, 2021 · Big Data

How Modern Data Middle Platforms Power Real‑Time and Offline Analytics

This article provides a comprehensive technical overview of data middle platforms, covering data aggregation, offline and real‑time development, smart operations, data asset management, governance, service layers, platform implementations, warehouse layering, and key differences between offline and real‑time data warehouses.

Big Data · Data Platform · Data Warehouse

0 likes · 26 min read

Alibaba Terminal Technology

May 25, 2021 · Frontend Development

Inside Alibaba’s Front‑End Visualization Showcase: Insights from CSIG’s Campus‑to‑Enterprise Event

The CSIG Visualization and Visual Analysis Committee’s visit to Alibaba’s Xixi Campus on May 21, 2021 brought together leading academics and industry experts to discuss graph data, big‑data research, spatio‑temporal data, low‑code design, and cutting‑edge visualization techniques, fostering deep industry‑academia collaboration.

Big Data · industry‑academia · low‑code

0 likes · 7 min read

Full-Stack Internet Architecture

May 25, 2021 · Backend Development

Comprehensive Interview Experience Summary and Preparation Guide for Major Tech Companies

This article compiles detailed interview experiences, question lists, and practical advice for candidates targeting backend, big‑data, and cloud positions at leading Chinese tech firms, offering timelines, personal background, preparation tips, and reflections to help job seekers navigate multi‑round technical interviews efficiently.

Big Data · Interview · System Design

0 likes · 28 min read

Architects Research Society

May 23, 2021 · Big Data

Data Architecture Trends: From Chaos to an Organized Era – Insights from Anthony J. Algmin

The article reviews Anthony J. Algmin’s reflections on past data‑architecture predictions, current hot topics such as cloud, AI/ML, data governance, and real‑time analytics, and forecasts future trends including metadata management, blockchain, and the evolving role of data architects within enterprises.

Artificial Intelligence · Big Data · Data Architecture

0 likes · 13 min read

DataFunTalk

May 22, 2021 · Databases

Combining HBase and Elasticsearch: Challenges and the Lindorm Searchindex Solution

The article examines the strengths and weaknesses of combining HBase and Elasticsearch for massive data storage and retrieval, outlines three integration patterns and their challenges, and presents Alibaba Cloud's Lindorm Searchindex as a SQL‑driven, low‑cost, strongly consistent solution that simplifies development and improves performance.

Big Data · Elasticsearch · HBase

0 likes · 11 min read

DeWu Technology

May 22, 2021 · Big Data

Unified Semantic Layer for Data Development: Addressing Pain Points and Optimizing Queries

A unified semantic layer for data development solves metric‑change ripple effects, developer burden, and large‑scale query performance problems by offering consistent metric definitions, multi‑view access, concise auto‑generated SQL, instant propagation of updates, and engine‑driven optimal query selection, thereby bridging business and engineering and cutting maintenance effort.

Big Data · Data engineering · OLAP

0 likes · 5 min read

Top Architect

May 22, 2021 · Big Data

Kafka Basics: Topics, Partitions, Producers, Consumers, and Cluster Architecture

This article provides a comprehensive introduction to Kafka, covering its role as a message system, core concepts such as topics, partitions, producers, consumers, messages, the cluster architecture with replicas and controllers, performance optimizations, log segmentation, and network design, all illustrated with diagrams and code examples.

Big Data · Kafka · Message queue

0 likes · 13 min read

Programmer DD

May 22, 2021 · Big Data

What Is a Data Lake? Origins, Architecture, and How It Powers Modern Big Data

This article explains the concept of a data lake—its origin in 2011, how it differs from traditional databases and data warehouses, its core characteristics such as raw data storage, on‑demand computing, and schema‑on‑read, as well as its advantages, challenges, architectural components, and future outlook within the big‑data ecosystem.

Big Data · Data Architecture · Data Lake

0 likes · 20 min read

IT Architects Alliance

May 22, 2021 · Big Data

Flink-Based Real‑Time Recommendation System: Architecture, Logic, and Docker Deployment Guide

This article presents a comprehensive walkthrough of a Flink‑powered recommendation system, detailing its v2.0 architecture, module functions, recommendation algorithms (hotness, product similarity, collaborative filtering), front‑end and back‑end UI, and step‑by‑step Docker deployment of MySQL, Redis, HBase, and Kafka services.

Big Data · Docker · Flink

0 likes · 11 min read

NetEase Game Operations Platform

May 22, 2021 · Big Data

Comprehensive Overview and Source Code Analysis of NetEase Spark Kyuubi

This article systematically introduces NetEase Kyuubi, an open‑source high‑performance JDBC and SQL execution engine built on Apache Spark, covering its background, core architecture, service discovery, session and operation management, startup processes, and key source‑code implementations with detailed code examples.

Apache Thrift · Big Data · Distributed computing

0 likes · 47 min read

Tencent Cloud Developer

May 21, 2021 · Big Data

Tencent Cloud Oceanus: Flink SQL Optimization and Extension Practices

Tencent Cloud Oceanus, a computing service powering internal apps like WeChat and external partners such as Bilibili, scales to over 30,000 cores handling 5 PB daily and 500,000 jobs. It tackles Flink SQL’s syntax, function and operational limits with table‑valued functions, incremental and enhanced tumble windows, and a caching‑based retraction optimization that cuts downstream data volume by up to 30× and improves join performance by about 20%.

Big Data · Flink SQL · Oceanus

0 likes · 19 min read

UCloud Tech

May 21, 2021 · Big Data

How US3 Hadoop Adapter Cuts Big Data Storage Costs and Boosts Performance

This article explains how UCloud's US3 object storage, combined with a custom Hadoop adapter, separates compute and storage, optimizes file system operations, and leverages caching and specialized APIs to dramatically reduce storage costs and improve read/write performance for large‑scale Hadoop workloads.

Big Data · Cache · Hadoop

0 likes · 13 min read

iQIYI Technical Product Team

May 21, 2021 · Big Data

Design and Implementation of iQIYI's User Feedback Analysis System

iQIYI built an in‑house user‑feedback analysis system that automatically ingests multi‑channel data, classifies and clusters issues, assesses feedback quality, localizes problems, and streamlines repair closure, boosting recall accuracy, alarm precision, closure rates and reducing cycle time across business lines to enhance user experience.

AI · Big Data · classification

0 likes · 15 min read

Byte Quality Assurance Team

May 19, 2021 · Big Data

Streaming 102: The World Beyond Batch

This article extends the concepts introduced in Streaming 101 by deeply exploring data processing patterns for unbounded data, covering windowing, watermarks, triggers, accumulation modes, and their practical implications for building robust low‑latency streaming pipelines.

Big Data · Streaming · Triggers

0 likes · 14 min read

Big Data Technology & Architecture

May 19, 2021 · Big Data

Comprehensive Guide to Data Governance: Metadata, Data Quality, Standards, and Asset Management

This article provides an extensive overview of data governance in the big‑data era, covering common pitfalls, the role of metadata, data quality management, data standardization, and data asset management, and offers practical recommendations for organizations to implement effective governance practices.

Big Data · Data Asset Management · Data Quality

0 likes · 42 min read

Tencent Cloud Developer

May 19, 2021 · Industry Insights

How Cloud‑Native Principles Transform Big Data Infrastructure

The article analyzes how cloud‑native concepts such as DevOps, micro‑services, continuous delivery, and containerization can be applied to big‑data foundations, outlining four guiding principles—industrialized delivery, cost quantification, load‑adaptive scaling, and data‑centric design—and describing concrete Hadoop‑based architectures and Tencent Cloud solutions that lower cost while boosting performance.

Big Data · Data Infrastructure · Hadoop

0 likes · 22 min read

UCloud Tech

May 18, 2021 · Big Data

Step‑by‑Step Guide to Deploy UCloud’s Free USDP for Big Data

This article provides a comprehensive tutorial on installing UCloud's free USDP version for private big‑data deployments, covering environment preparation, minimum node specifications, resource download, configuration files, one‑click initialization scripts, server startup, web UI access, license acquisition, and optional manual setup procedures.

Big Data · Linux · UCloud

0 likes · 16 min read

Alibaba Cloud Native

May 17, 2021 · Big Data

How Vineyard Accelerates Cloud‑Native Big Data Workflows with Zero‑Copy Memory Sharing

Vineyard, an open‑source distributed memory data‑sharing engine, tackles the inefficiencies of traditional file‑system based big‑data pipelines by enabling zero‑copy, in‑memory object exchange, Kubernetes‑aware scheduling, and plug‑in operators, delivering up to 1.34× faster end‑to‑end execution.

Big Data · Memory Sharing · Vineyard

0 likes · 10 min read

Beijing SF i-TECH City Technology Team

May 17, 2021 · Artificial Intelligence

AIOps Overview: Concepts, Applications, and Case Studies

This article provides a comprehensive overview of AIOps, covering its definition, evolution from manual to AI-driven operations, core capabilities, and real-world applications in capacity prediction, anomaly detection, and alarm merging, illustrated with case studies from a food‑retail giant and internal logistics.

Anomaly Detection · Artificial Intelligence · Big Data

0 likes · 13 min read

Architecture Digest

May 17, 2021 · Big Data

Technical Architecture Overview of Toutiao: Data Pipeline, User Modeling, Recommendation System, and Microservices

The article provides a comprehensive technical overview of Toutiao's rapid growth, detailing its massive user base, data collection and processing pipelines, user modeling, cold‑start strategies, recommendation engines, storage solutions, push notification mechanisms, and the underlying microservice and PaaS architecture.

Big Data · Hadoop · Kafka

0 likes · 8 min read
