Tagged articles
3697 articles
Beike Product & Technology
Nov 27, 2020 · Artificial Intelligence

Mining User Housing Preference Schemes with Supply‑Filtered Tree‑Based Methods

The article proposes a supply‑filtered, tree‑based approach to discover multi‑dimensional user housing preference schemes, contrasts it with fixed‑length preference mining methods, and details algorithmic modules such as split‑point search, similarity calculation, split suppression, and user clustering to improve interpretability and offline applicability.

AI · Big Data · housing recommendation
0 likes · 13 min read
Practical DevOps Architecture
Nov 27, 2020 · Big Data

Step-by-Step Guide to Install and Configure a Hadoop 2.8.2 Cluster

This tutorial provides a complete walkthrough for downloading Hadoop 2.8.2, setting up a three‑node master‑slave cluster, configuring core, HDFS, MapReduce and YARN settings, creating required directories, distributing the installation, starting the services, verifying the cluster status, and finally shutting it down.

Big Data · Cluster Setup · HDFS
0 likes · 5 min read
dbaplus Community
Nov 26, 2020 · Big Data

Silicon Valley's Data Middle Platform Secrets: EA, Twitter, Airbnb, Uber

This article examines how leading Silicon Valley companies such as EA, Twitter, Airbnb, and Uber design and operate data middle platforms—detailing their architectures, data collection pipelines, standardization efforts, real‑time and batch processing, and the business impact of shared data capabilities.

Big Data · Data Architecture · Data Platform
0 likes · 25 min read
DataFunTalk
Nov 26, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Architecture and Technology

This article details the evolution of 58.com’s commercial data warehouse across three phases—1.0, 2.0, and 3.0—covering its scale, four‑layer architecture, migration from legacy Hadoop‑MapReduce pipelines to Flume/Kafka and Flink streaming, code optimizations, monitoring, and productization for real‑time business insights.

Big Data · ETL · Hadoop
0 likes · 9 min read
DataFunTalk
Nov 24, 2020 · Artificial Intelligence

Building Next‑Generation Data Intelligence Infrastructure with Knowledge Graphs: From New Infrastructure to Cognitive AI Platforms

This presentation explains how knowledge graphs serve as the foundation for new‑infrastructure initiatives, detailing the evolution of AI from perception to cognition, the role of big‑data centers, DIKW modeling, intelligent data governance, and the construction of a cognitive AI middle‑platform for industry applications.

AI Infrastructure · Artificial Intelligence · Big Data
0 likes · 18 min read
Big Data Technology Architecture
Nov 24, 2020 · Big Data

Using DeltaLake for Industrial Data Platforms: Distributed Stream Processing, Batch‑Stream Fusion, and Transactional Support

This article shares practical experiences of building an industrial data middle‑platform with DeltaLake, covering heterogeneous distributed stream handling, batch‑stream unified analytics, and transactional/algorithm support to improve data timeliness, reliability, and operational efficiency in manufacturing environments.

Batch-Stream Fusion · Big Data · DeltaLake
0 likes · 11 min read
Alibaba Cloud Developer
Nov 23, 2020 · Big Data

How Alibaba’s CCO Built a Cloud‑Native Real‑Time Data Warehouse with Hologres

Alibaba’s Customer Experience (CCO) team transformed its real‑time data platform by evolving from a Lambda‑style database architecture to a cloud‑native real‑time data warehouse powered by Hologres and Flink, achieving higher throughput, lower latency, reduced costs, and self‑service analytics for massive Double‑11 traffic.

Alibaba · Big Data · Flink
0 likes · 15 min read
Alibaba Cloud Developer
Nov 19, 2020 · Databases

How AnalyticDB Powers Double 11: Cloud‑Native Data Warehouse Innovations

AnalyticDB, a cloud‑native MySQL‑compatible data warehouse, delivered extreme performance during Double 11 by handling billions of orders with ultra‑high write TPS, while introducing compute‑storage separation, hot‑cold tiering, resource groups, elastic scaling and intelligent optimization to meet demanding real‑time analytics workloads.

AnalyticDB · Big Data · Resource Groups
0 likes · 17 min read
Meituan Technology Team
Nov 19, 2020 · Big Data

Optimizing Apache Kylin for High‑Performance OLAP in Meituan's Sales System

Meituan’s sales system “Qingtian” boosted OLAP performance by migrating Apache Kylin’s build engine from MapReduce to Spark, consolidating Hive files, refining dictionary creation, applying a By‑layer algorithm, and bulk‑loading cuboid files to HBase, cutting resource consumption and halving build time, ultimately reaching a 100% SLA.

Apache Kylin · Big Data · Meituan
0 likes · 15 min read
Tencent Tech
Nov 19, 2020 · Cloud Computing

How Tencent Built a Massive Cloud Storage System to Power QQ Album and Beyond

This article chronicles Tencent's journey from the early development of the TFS distributed storage platform to large‑scale data migrations, flexible bandwidth strategies, and the creation of the cloud‑native YottaStore, illustrating how a small architecture team solved massive storage challenges for billions of users.

Big Data · Data Migration · YottaStore
0 likes · 15 min read
DeWu Technology
Nov 19, 2020 · Operations

HBase Operations and Use Cases for High‑Concurrency E‑commerce

In this talk, Yun Jin explains how HBase’s petabyte‑scale, horizontally‑scalable architecture—built on Hadoop, HMaster, RegionServers, and Zookeeper—enables e‑commerce platforms to handle extreme promotion‑day traffic by supporting high‑throughput reads/writes, time‑series monitoring, massive order storage, and robust HA, while covering essential table operations, monitoring, and troubleshooting techniques.

Big Data · HBase · Operations
0 likes · 6 min read
Java High-Performance Architecture
Nov 18, 2020 · Big Data

Why Pulsar Might Outperform Kafka: Key Advantages and Drawbacks

This article examines Apache Pulsar, an open‑source messaging platform created by Yahoo, compares it with Kafka by outlining Kafka’s common pain points, highlights Pulsar’s multi‑tenant architecture, layered storage, built‑in functions, and security features, and discusses the trade‑offs of each solution.

Apache Pulsar · Big Data · Distributed Systems
0 likes · 6 min read
JD Tech Talk
Nov 17, 2020 · Databases

JUST Engine: Novel Spatio‑Temporal Indexes and Data Models for Large‑Scale Urban Data Management

The article introduces the JUST engine, a spatio‑temporal data platform that extends GeoMesa with three new indexes (Z2T, XZ2T, time_range), defines nine common and three specialized data models, provides default indexing strategies, and offers detailed SQL usage guidelines for efficient querying of massive urban datasets.

Big Data · Databases · GeoMesa
0 likes · 25 min read
Big Data Technology & Architecture
Nov 16, 2020 · Big Data

Understanding Data Skew in Big Data: Causes, Symptoms, and Solutions for Hadoop and Spark

This article explains what data skew is, why it occurs in large‑scale Hadoop and Spark jobs, how to recognize its symptoms such as stuck reducers or OOM executors, and presents practical strategies—including business‑level adjustments, code refactoring, and platform‑specific tuning—to mitigate the problem.

Big Data · Hadoop · Spark
0 likes · 13 min read
Alibaba Cloud Native
Nov 16, 2020 · Cloud Native

What’s New in Fluid 0.4? DataLoad, Small‑File Boost, HDFS Support & Multi‑Dataset Deployment

Fluid 0.4 introduces a DataLoad custom resource for declarative data pre‑warming, enhances support for massive small‑file datasets, adds HDFS‑compatible access for Spark and other big‑data frameworks, and enables mixed‑deployment of multiple datasets on a single node, all backed by significant performance gains.

AI · Alluxio · Big Data
0 likes · 8 min read
DataFunSummit
Nov 15, 2020 · Big Data

Evolution of 58.com Commercial Data Warehouse: From 0‑1 to 3.0 Using Hadoop, Flume, Kafka, Spark, and Flink

This article details the three‑stage evolution of 58.com’s commercial data warehouse, describing its massive scale, four‑layer architecture, technical challenges, migrations from MapReduce to Hive and Flink, real‑time streaming upgrades, and the resulting improvements in stability, accuracy, and timeliness.

Big Data · Data Architecture · Data Warehouse
0 likes · 10 min read
Beike Product & Technology
Nov 13, 2020 · Big Data

Beike One‑Stop Big Data Development Platform: Architecture, Evolution, and Future Outlook

The article summarizes Beike's one‑stop big data development platform, describing its data business background, the evolution from a simple Hadoop‑Kafka‑Hive stack to a metadata‑driven, asset‑oriented platform, and outlines current capabilities in data management, integration, scheduling, quality, openness, and future plans.

Big Data · Data Platform · Data engineering
0 likes · 11 min read
Tencent Cloud Developer
Nov 13, 2020 · Big Data

Apache Spark Core: Architecture, Components, and Execution Flow

Apache Spark Core is a high‑performance, fault‑tolerant engine that abstracts distributed computation through SparkContext, DAG and Task schedulers, supports in‑memory and disk storage, runs on various cluster managers (YARN, Kubernetes, etc.), and unifies batch, streaming, ML and graph processing via its rich ecosystem.

Apache Spark · Big Data · DAG scheduler
0 likes · 17 min read
DataFunSummit
Nov 12, 2020 · Big Data

OLAP Engine Selection and Challenges in Large-Scale Data at Youku

This article explores the challenges big data brings to traditional data technologies and reviews various OLAP solutions—including MPP, batch processing, pre‑computation, and Hadoop‑based engines—while detailing Youku’s specific business scenarios and how different OLAP engines are selected to meet performance, scalability, and real‑time analysis requirements.

Analytics · Big Data · Data Warehouse
0 likes · 14 min read
Xianyu Technology
Nov 11, 2020 · Industry Insights

How Alibaba’s Double‑11 Tech Stack Powers Record‑Breaking Live Commerce

Alibaba’s Double 11 showcased a suite of cutting‑edge technologies, including the GRTN real‑time transmission network, edge‑AI voice interaction, massive digital infrastructure, AI‑driven smart sample rooms, and 3D virtual home‑decoration live streams, that together delivered sub‑second latency, a 30% cost reduction, and unprecedented merchant scalability.

3D virtual reality · Big Data · Digital Infrastructure
0 likes · 11 min read
Architect
Nov 11, 2020 · Big Data

Real-time Click Stream Data Warehouse with Flink and ClickHouse: Architecture, Layered Design, and Practical Tips

This article explains how to build a real‑time click‑stream data warehouse using Flink for stream processing and ClickHouse for near‑real‑time OLAP, covering click‑stream characteristics, dimensional modeling, layered warehouse design, async dimension joins, sink implementation, and data rebalancing strategies.

Big Data · Click Stream · ClickHouse
0 likes · 7 min read
DataFunTalk
Nov 11, 2020 · Big Data

Evolution and Practices of Cainiao's Real‑Time Data Warehouse for International Import Business

This article details the high‑complexity logistics scenario of Cainiao's international import business, explains the evolution from offline to real‑time data warehouses (versions 1.0 and 2.0), describes the layered architecture, enumerates technical challenges such as multi‑source joins, state explosion, out‑of‑order processing, and presents concrete solutions using Flink features, logical middle‑layers, union‑all joins, deduplication, timer services, and batch‑stream hybrid processing.

Big Data · Flink · state-management
0 likes · 21 min read
Big Data Technology & Architecture
Nov 8, 2020 · Big Data

Flume Tuning Guide for High‑Throughput Data Ingestion

This article explains how to identify and resolve performance bottlenecks in Apache Flume by configuring Taildir sources, optimizing channel capacities, tuning Kafka sinks, adjusting JVM options, and using simple monitoring scripts, enabling a single Flume‑NG agent to sustain over 50,000 RPS in production.

Big Data · Flume · Kafka
0 likes · 10 min read
iQIYI Technical Product Team
Nov 6, 2020 · Big Data

Recommended Technical Articles: iQiyi Effect Advertising Exploration and Druid Practice

This recommendation highlights two iTech Forum pieces—one detailing iQiyi’s effect advertising exploration and implementation, and another documenting the company’s Druid practice and technical evolution—providing readers with in‑depth case studies, performance insights, and practical guidance for similar large‑scale data and advertising systems.

Advertising · Big Data · Druid
0 likes · 1 min read
StarRing Big Data Open Lab
Nov 5, 2020 · Cloud Native

How Transwarp Scheduler Tackles Mixed Workloads in Unified Cloud‑Native Infrastructure

This article reviews the challenges of scheduling heterogeneous workloads—micro‑services, big‑data, AI, and HPC—on a unified cloud‑native platform, compares existing schedulers like Mesos and YARN, examines Kubernetes ecosystem extensions such as Volcano and YuniKorn, and details the design and components of the Transwarp Scheduler built on Kubernetes Scheduling Framework v2.

AI · Big Data · Scheduler
0 likes · 16 min read
dbaplus Community
Nov 3, 2020 · Big Data

How Ctrip Boosted Hotel Data Warehouse Performance 400% with ClickHouse

Ctrip’s hotel data team tackled a 3 TB daily data load by building a ClickHouse cluster on VMware, creating custom sync and execution tools, applying query optimizations, and handling merge and memory errors, ultimately achieving over 400% performance gains across multiple reporting themes.

Big Data · ClickHouse · Data Warehouse
0 likes · 7 min read
AntTech
Nov 2, 2020 · Frontend Development

Opportunities and Challenges of Enterprise Data Visualization Applications

The talk outlines why enterprise data visualization is essential for extracting value from massive, multi‑dimensional data, describes design and development challenges, presents AntV's comprehensive frontend visualization solutions, and predicts future trends such as intelligent, democratized, and decision‑integrated visual analytics.

AntV · Big Data · Data visualization
0 likes · 15 min read
Liangxu Linux
Nov 2, 2020 · Big Data

Master Shell Tricks to Analyze Beijing Points‑Based Residency Data in Seconds

This article demonstrates how to use standard shell utilities such as grep, cut, sort, uniq, awk, and join to quickly extract insights—like top companies, common surnames, popular given names, age distribution, and hometown rankings—from a JSON dataset of Beijing points‑based residency applicants.

Big Data · Data Analysis · JSON
0 likes · 13 min read
Top Architect
Oct 31, 2020 · Big Data

Building a Zhihu User Data Crawler and Large‑Scale Analysis with SpringBoot, SeimiCrawler, RabbitMQ, ElasticSearch, and Kibana

This article describes how to build a Java‑based crawler to collect millions of Zhihu user profiles, handle anti‑crawling measures with rotating user‑agents and a proxy pool, deduplicate data using a Bloom filter, import the results into ElasticSearch, and analyze the dataset with Kibana and ECharts visualizations.

Big Data · Elasticsearch · Java
0 likes · 15 min read
Tencent Cloud Middleware
Oct 30, 2020 · Cloud Computing

How KonaJDK Powers Tencent Cloud Java, Big Data, and Secure Computing

This article explains how Tencent's self‑developed KonaJDK underpins cloud Java services, enhances micro‑service monitoring, adds national cryptography support, optimizes large‑heap tools like jmap, and delivers performance gains for big‑data workloads, while contributing key features back to the OpenJDK community.

Big Data · Cloud Computing · JVM
0 likes · 11 min read
ITPUB
Oct 30, 2020 · Fundamentals

Why Java Remains the Dominant Programming Language Across Industries

The article outlines Java’s history, its widespread adoption by top companies, key features such as simplicity, portability and security, and its extensive use in big‑data frameworks, IoT, Android, finance, web development, scientific tools, and cloud services, arguing why it will stay popular.

Big Data · IoT · Java
0 likes · 11 min read
21CTO
Oct 30, 2020 · Big Data

Which Log Collection System Wins? Scribe, Chukwa, Kafka, Flume & ELK Compared

This article reviews the background, requirements, and architectural designs of major open‑source log collection systems—including Facebook’s Scribe, Apache’s Chukwa, LinkedIn’s Kafka, Cloudera’s Flume—and evaluates mature monitoring tools such as ELK, highlighting their features, use cases, advantages, and drawbacks for large‑scale log processing.

Big Data · ELK · Flume
0 likes · 18 min read
Zhongtong Tech
Oct 30, 2020 · Big Data

How Apache Kylin Supercharged OLAP at ZTO Express: A Deep Dive

This article details ZTO Express's journey of adopting Apache Kylin for OLAP, comparing it with Presto, describing platform architecture, performance gains, integration with scheduling and monitoring systems, and the practical optimizations and future plans that enabled sub‑second query responses on massive daily data volumes.

Apache Kylin · Big Data · HBase
0 likes · 16 min read
Alibaba Cloud Developer
Oct 29, 2020 · Frontend Development

How Big Data and AI Are Redefining Front‑End Development

From the early days of static web pages to today's data‑driven, AI‑enhanced interfaces, this article explores how the big‑data boom and artificial‑intelligence advances since 2010 have transformed front‑end technologies, driving innovations in data visualization, web‑based software, and diverse user interactions.

AI · Big Data · Data visualization
0 likes · 11 min read
Tencent Cloud Developer
Oct 19, 2020 · Big Data

Improving Spark Write Performance for Massive Files on Object Storage with Tencent Cloud EMR

The Tencent Cloud EMR case parallelized Spark’s driver‑side commit, trash, and move phases, which were previously single‑threaded and caused costly copy‑on‑rename when writing massive files to object storage. The result was a more than tenfold (1,100%) speedup, making object storage a viable alternative to HDFS.

Big Data · Distributed computing · EMR
0 likes · 8 min read
ITPUB
Oct 16, 2020 · Big Data

How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details NetEase Cloud Music's evolution of a real‑time data warehouse built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as AB‑testing, live reporting, and feature serving.

Big Data · Calcite · Flink
0 likes · 8 min read
Yuewen Technology
Oct 16, 2020 · Artificial Intelligence

How Intelligent Traffic Distribution Boosts New Book Exposure in Reading Apps

This article describes the design and implementation of an intelligent traffic distribution system for a reading platform, detailing its background, overall architecture, sub-modules such as the small‑traffic experiment platform, near‑line computation, retrieval strategies, pacing algorithms, and how it balances user personalization with content ecosystem growth.

AI · Big Data · Real-time Streaming
0 likes · 8 min read
Big Data Technology & Architecture
Oct 15, 2020 · Big Data

Meituan's OLAP Requirements and Apache Kylin Deployment: Architecture, Challenges, and Comparative Analysis

This article describes Meituan's massive OLAP workloads, the specific challenges of data scale, complex schemas, and precise counting, explains how Apache Kylin was integrated using wide tables and bitmap deduplication, compares its performance and features with Presto, Druid and other engines, and outlines future improvements.

Apache Kylin · Big Data · Data Warehouse
0 likes · 19 min read
Alibaba Cloud Developer
Oct 11, 2020 · Operations

How Alibaba’s SLS Powers a Unified Observability Platform for Massive Data

Alibaba Cloud’s Log Service (SLS) has evolved into a unified observability middle‑platform that handles tens of petabytes daily, offering integrated storage, processing, and AI‑driven analysis for logs, metrics, and traces, while addressing challenges of data ingestion, performance, and scalability across diverse Ops scenarios.

Big Data · Log Analytics · Observability
0 likes · 16 min read
ITPUB
Oct 10, 2020 · Big Data

How Didi Scaled Presto for Petabyte‑Scale Queries: Architecture & Optimizations

Didi’s three‑year journey with Presto transformed it into the company’s primary ad‑hoc and Hive‑SQL acceleration engine, serving over 6,000 users, processing 2‑3 PB of HDFS data daily, and achieving major gains in stability, performance, cost, and usability through extensive architectural tweaks, resource isolation, connector extensions, and monitoring enhancements.

Big Data · Cluster Management · Druid Connector
0 likes · 18 min read
JD Tech Talk
Oct 10, 2020 · Big Data

Discovering Real-Time Reachable Areas Using Trajectory Connections

This article presents a novel method for real-time reachable area analysis that leverages recent trajectory data, introduces a Skip Graph Index for efficient query processing, predicts optimal trajectory‑splicing parameters with machine learning, and demonstrates its effectiveness through extensive experiments on multiple real‑world datasets.

Big Data · k-value prediction · real-time reachable area
0 likes · 13 min read
Didi Tech
Oct 9, 2020 · Big Data

Presto at Didi: Architecture, Optimizations, and Operational Experience

At Didi, Presto has been the default ad‑hoc and Hive‑SQL engine for over three years, serving 6,000 users, processing 2‑3 PB daily and 30‑35 trillion rows, with mixed and dedicated clusters, migration to PrestoSQL 340, extensive Hive compatibility, label‑based isolation, a native Druid connector, usability and stability enhancements, and JVM‑level performance optimizations, while planning further resource‑saving upgrades.

Big Data · Cluster Management · Distributed SQL
0 likes · 17 min read
Alibaba Terminal Technology
Oct 9, 2020 · Frontend Development

How Big Data and AI Are Redefining Front‑End Development

From the early days of static web pages to today’s data‑driven, AI‑enhanced interfaces, this article explores how the rise of big data platforms like Alibaba Cloud’s Feitian has transformed front‑end development through advanced visualization, software‑Web convergence, and diverse new interactions.

Big Data · Cloud Computing · Data visualization
0 likes · 9 min read
DataFunTalk
Oct 7, 2020 · Big Data

Yanxuan Data Warehouse: Architecture, Standards, and Evaluation Framework

This article outlines the Yanxuan data warehouse’s layered architecture, the offline and real‑time development platforms, the comprehensive standards for metric definition, model design, and SQL development, and proposes a six‑dimensional evaluation system covering data norms, security, quality, stability, continuous improvement, and development efficiency.

Big Data · Data engineering · SQL Standards
0 likes · 12 min read
DataFunTalk
Sep 30, 2020 · Big Data

Real-time Data Warehouse Construction for Didi Ride-hailing's Carpool Service

This article details Didi's end‑to‑end real‑time data warehouse design for the carpool business, covering its objectives, architecture layers from ODS to application, naming conventions, StreamSQL development, operational tooling, challenges faced, and future batch‑stream integration plans.

Big Data · Didi · Flink
0 likes · 20 min read
IT Architects Alliance
Sep 29, 2020 · Big Data

How Qualitis Ensures High‑Availability Data Quality Monitoring on Big Data Platforms

Qualitis is a big‑data‑platform‑based data‑quality‑management service that defines, detects, and reports data‑set quality issues, featuring idempotent backend services, load‑balanced high‑availability, Zookeeper‑coordinated process synchronization, thread‑pool throttling, and clearly separated internal and external APIs.

Big Data · Data Quality · Qualitis
0 likes · 6 min read
Architects Research Society
Sep 29, 2020 · Big Data

Understanding DataOps: Principles, Benefits, and Implementation

DataOps, an Agile‑derived methodology that extends DevOps principles to data analytics, emphasizes automation, collaboration, and continuous delivery to accelerate and improve data processing, quality, and business insight, while outlining its benefits, relationship to Agile/DevOps, and practical steps for adoption.

Big Data · Continuous Analytics · DataOps
0 likes · 12 min read
Tencent Advertising Technology
Sep 29, 2020 · Artificial Intelligence

The Power of Data and AI: Highlights from the 2020 Tencent Advertising Algorithm Live Week

The 2020 Tencent Advertising Algorithm Live Week presented expert insights on federated learning, machine learning, big data, and deep‑learning applications in advertising, offering a comprehensive Q&A that explains how massive data fuels AI breakthroughs and reshapes business problem solving.

Big Data · machine learning
0 likes · 11 min read
High Availability Architecture
Sep 29, 2020 · Artificial Intelligence

Architecture Design Overview of Recommendation Systems

This article reviews the core algorithm modules of recommendation systems from an architectural perspective, discussing offline, near‑line, and online layers, the trade‑offs between personalization, timeliness, and resource consumption, system boundaries, external dependencies, and the practical design of each layer.

AI · Big Data · architecture
0 likes · 30 min read
DataFunTalk
Sep 25, 2020 · Big Data

Meituan Waimai Data Warehouse: Architecture Evolution, Governance, and Future Roadmap

The article details Meituan Waimai's offline data warehouse evolution from its initial V1.0 design through V2.0 improvements to the V3.0 modeling‑tool driven architecture, covering the four‑layer framework, Spark‑based ETL, data governance processes, resource optimization, security measures, and future development plans.

Big Data · ETL · Meituan
0 likes · 22 min read
Big Data Technology & Architecture
Sep 24, 2020 · Big Data

HiveSQL Classic Optimization Cases: Partitioning, Subset Decomposition, and Percentile Approximation Improvements

This article presents three HiveSQL optimization case studies—restructuring a large‑scale query with partitioned tables, breaking a complex window‑function query into smaller subsets with joins, and refactoring excessive PERCENTILE_APPROX calls—demonstrating how each change reduces execution time from hours to minutes and improves overall performance.

Big Data · Hive · HiveSQL
0 likes · 9 min read
Java Architect Essentials
Sep 23, 2020 · Big Data

Evolution of JD.com Order Center Elasticsearch Cluster Architecture

The article details how JD.com's order center migrated its massive order query workload from MySQL to Elasticsearch, iteratively improving cluster isolation, node deployment, replica tuning, master‑slave redundancy, version upgrades, and data synchronization while addressing performance pitfalls such as deep pagination and FieldData usage.

Big Data · Cluster Architecture · Elasticsearch
0 likes · 12 min read
JD Tech Talk
Sep 23, 2020 · Artificial Intelligence

Delivery Time Inference Based on Couriers' Trajectories

Leveraging large-scale courier trajectory data and spatiotemporal analytics, the DTInf framework infers parcel delivery times by detecting stay points, correcting delivery locations, and matching delivery events using a trained MLP model, achieving a mean absolute error of 401 seconds and outperforming baselines by over 30%.

Big Data · Logistics · courier trajectories
0 likes · 10 min read
Tencent Cloud Developer
Sep 22, 2020 · Big Data

Evolution and Architecture of Beike's OLAP Platform: From Hive/MySQL to Multi‑Engine Flexibility

Beike’s OLAP platform has progressed from a basic Hive‑MySQL batch pipeline to a Kylin‑based single‑engine solution, and now to a flexible multi‑engine architecture that uses a query‑engine layer to route metrics across Kylin, Druid, ClickHouse and Doris, dramatically cutting cube‑build times, supporting real‑time ingestion, and paving the way for further engine consolidation and automated performance routing.

Apache Druid · Apache Kylin · Beike
0 likes · 17 min read
Big Data Technology & Architecture
Sep 18, 2020 · Big Data

Understanding the Elasticsearch Master Election Process

This article explains when Elasticsearch triggers a master election, describes each election stage—including active master and candidate selection, Bully algorithm comparison, and master node responsibilities—while providing code excerpts that illustrate the underlying implementation details.
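As a rough illustration of the Bully-style comparison mentioned above, the sketch below picks a master from a list of candidates. It is not Elasticsearch's actual code: the `Candidate` type, the `elect_master` function, the quorum parameter, and the ordering rule (newer cluster state first, then smaller node ID) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical master-eligible node taking part in an election."""
    node_id: str
    cluster_state_version: int


def elect_master(candidates: List[Candidate], quorum: int) -> Optional[Candidate]:
    """Return the winning candidate, or None when too few candidates are visible.

    Assumed ordering: the candidate with the newest cluster state wins;
    ties are broken by the smallest node ID (the Bully-style comparison).
    """
    if len(candidates) < quorum:
        return None  # not enough master-eligible nodes to hold an election
    return min(candidates, key=lambda c: (-c.cluster_state_version, c.node_id))


if __name__ == "__main__":
    nodes = [Candidate("node-b", 5), Candidate("node-a", 5), Candidate("node-c", 4)]
    print(elect_master(nodes, quorum=2))
```

Requiring a quorum before electing is what keeps a partitioned minority from choosing its own master in this sketch.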

Big Data · Cluster Management · Distributed Systems
0 likes · 8 min read
Big Data Technology & Architecture
Sep 18, 2020 · Big Data

Understanding Kafka Consumer Groups, Partition Assignment, and Offset Management

This article explains how Kafka consumer groups accelerate message consumption by distributing partitions across multiple consumers, details the three key characteristics of consumer groups, and provides in‑depth guidance on partition assignment strategies and offset management with practical Java code examples.
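To make the idea of distributing partitions across a group concrete, here is a minimal sketch of a range-style assignment: for each topic, consumers are sorted and each receives a contiguous block of partitions, with the first few consumers taking one extra when the division is uneven. The function name and data shapes are hypothetical, and the article's own examples are in Java; this is only an illustration of the strategy, not Kafka's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def range_assign(
    partitions_per_topic: Dict[str, int], consumers: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Assign each topic's partitions to consumers in contiguous ranges."""
    assignment: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    members = sorted(consumers)  # deterministic order across group members
    for topic, num_partitions in sorted(partitions_per_topic.items()):
        base, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            # The first `extra` consumers each take one additional partition.
            count = base + (1 if i < extra else 0)
            assignment[member].extend((topic, p) for p in range(start, start + count))
            start += count
    return dict(assignment)


if __name__ == "__main__":
    print(range_assign({"orders": 5}, ["c2", "c1"]))
```

With more consumers than partitions, the surplus consumers receive an empty list, which matches the general rule that a partition is consumed by only one member of a group.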

Big Data · Kafka · Offset Management
0 likes · 13 min read
Youku Technology
Sep 18, 2020 · Big Data

Digitalization of Youku Long‑Video Content Supply Chain: Practices and Architecture

Youku’s digital content‑supply‑chain system transforms long‑video production by introducing a three‑stage framework—structured evaluation of talent and scripts, information‑driven production management, and a unified demand‑aligned content strategy—that curtails delays, mitigates risk, and saves over 100 million RMB while scaling to billions of data records daily.

Artificial Intelligence · Big Data · Content Supply Chain
0 likes · 11 min read
Full-Stack Internet Architecture
Sep 17, 2020 · Big Data

How Big Data Is Used for Price Discrimination and the New Regulations to Stop It

The article explains how big‑data algorithms enable online price discrimination—often called “kill‑familiar” pricing—illustrates real‑world e‑commerce examples, outlines the recently enacted Chinese online tourism regulation prohibiting such practices, and discusses broader data‑privacy and security concerns.

Big Data · Data Privacy · consumer rights
0 likes · 6 min read
Programmer DD
Sep 17, 2020 · Big Data

5 Open‑Source Quant Trading Tools Every Developer Should Explore

Discover five open‑source stock‑trading utilities—funds, ZVT, QUANTAXIS, StockAnalysisSystem, and match‑trade—each offering real‑time data, backtesting, multi‑asset support, and high‑performance matching to help programmers build powerful quantitative finance applications.

Big Data · Python · Quantitative Trading
0 likes · 5 min read
DataFunTalk
Sep 17, 2020 · Big Data

Design and Implementation of a Scalable User Tag Production Platform

The article explains how a flexible, high‑performance user‑tagging system is built on a batch‑stream integrated architecture using big‑data technologies such as Impala, HDFS, and Flink to support both offline and real‑time label generation for precise marketing, product improvement, and operational analytics.

Big Data · Flink · Impala
0 likes · 15 min read
Big Data Technology & Architecture
Sep 16, 2020 · Big Data

Understanding Flink CEP's NFAb Automaton for Complex Event Processing

This article explains how Flink's Complex Event Processing (CEP) library implements pattern matching using a nondeterministic finite automaton with a match buffer (NFAb), covering its theoretical foundation, construction, state transition semantics, event selection strategies, shared versioned match buffers, and computation state details.

Big Data · CEP · Flink
0 likes · 9 min read
Big Data Technology & Architecture
Sep 16, 2020 · Databases

Optimizing a Complex MySQL Slow Query for Article Comments

This article analyzes a 60‑second MySQL query that retrieves article comments with multiple filters, explains why the optimizer chooses a small table as the driver, and presents a step‑by‑step optimization (avoiding semi‑joins, improving index usage, refining range conditions, and moving GROUP BY into a subquery) that reduces execution time to 1.3 seconds, a roughly 46‑fold speedup.

Big Data · Database · MySQL
0 likes · 13 min read
Architects Research Society
Sep 15, 2020 · Big Data

Key Factors to Consider When Building Your Own Data Warehouse

This article examines essential considerations for selecting and designing a data warehouse—including data volume, scalability, on‑premises versus cloud options, pricing models, and ETL/ELT approaches—to help organizations choose the most suitable solution for their needs.

Big Data · Data Warehouse · Scalability
0 likes · 9 min read
Big Data Technology & Architecture
Sep 11, 2020 · Big Data

Evolution of JD.com Order Center Elasticsearch Cluster Architecture

This article details how JD.com's order center migrated its Elasticsearch cluster from a simple, default‑configured setup to a highly available, multi‑replica, dual‑cluster architecture with version upgrades, data synchronization strategies, and performance optimizations to support billions of documents and hundreds of millions of daily queries.

Big Data · Cluster Architecture · Elasticsearch
0 likes · 12 min read
Ctrip Technology
Sep 10, 2020 · Big Data

Design and Implementation of a Unified Log Framework for Ctrip Payment Center

The article describes the design, architecture, and operational details of a unified logging framework at Ctrip's payment center, covering log production via a Log4j2 extension, Kafka‑Camus collection, Hive/ORC storage, MapReduce parsing optimizations, and governance strategies for terabytes of log data per day.

Big Data · Camus · Hadoop
0 likes · 15 min read
DataFunTalk
Sep 10, 2020 · Databases

Graph‑Based Real‑Time Content Update Architecture at Youku: Challenges, Design, and Practice

This technical presentation explains how Youku tackles the massive, real‑time update problem of video‑content graphs by adopting a graph‑database architecture, sub‑graph partitioning, schema‑driven logical views, and Flink‑based pipelines to achieve second‑level updates for billions of entities and attributes.

Big Data · Flink · Graph Database
0 likes · 15 min read