Tagged articles
3697 articles
ITPUB
Sep 9, 2020 · Databases

How to Speed Up Massive MySQL User‑Log Tables: Partitioning, Indexing, and Migration Strategies

This article examines performance problems with a 20‑million‑row MySQL user‑log table on Alibaba Cloud RDS, outlines three solution paths—optimizing the existing database, migrating to a MySQL‑compatible high‑performance service, and adopting a big‑data engine—and provides detailed guidance on schema design, indexing, partitioning, and practical SQL tweaks.

Big Data · Database Optimization · MySQL
0 likes · 17 min read
DataFunTalk
Sep 9, 2020 · Big Data

NetEase Big Data User Profiling: Architecture, Tagging System, and Real‑World Applications

This presentation details NetEase's massive multi‑domain data ecosystem, the design of its user‑profile center—including basic, behavior, preference, and predictive tags—ID‑mapping techniques, quality assurance processes, and several real‑time and offline use cases such as marketing, recommendation, growth operations, advertising, and fraud detection.

Big Data · ID-Mapping · Tag Management
0 likes · 13 min read
Alibaba Cloud Developer
Sep 7, 2020 · Big Data

How Alibaba’s ADC Project Automates Real‑Time SQL Generation with Design Patterns and Priority Queues

This article explains how the Alibaba DChain Data Converger (ADC) automatically creates wide‑table SQL for real‑time cross‑database analytics by using a pipeline architecture, priority‑queue‑driven task scheduling, and specific design patterns to handle metadata, joins, and resource management.

Big Data · Real-time Data · SQL Generation
0 likes · 13 min read
DataFunTalk
Sep 7, 2020 · Big Data

Real‑time Data Warehouse Architecture and Best Practices in Alibaba Search Recommendation

This article presents Alibaba's search‑recommendation real‑time data warehouse, describing its business background, typical use cases, key requirements, the evolution from architecture 1.0 to 2.0 with Flink and Hologres, best‑practice patterns such as row/column storage, stream‑batch integration, high‑concurrency updates, and future directions like real‑time joins and persistent dimension storage.

Big Data · Flink · Hologres
0 likes · 13 min read
Architecture Digest
Sep 3, 2020 · Databases

Practical Elasticsearch Performance and Stability Tuning Guide

This article consolidates practical Elasticsearch tuning techniques—including configuration file adjustments, system‑level optimizations, and usage‑level settings—to improve cluster performance, stability, and resource efficiency for production environments.

Big Data · Cluster Configuration · Elasticsearch
0 likes · 15 min read
Big Data Technology & Architecture
Sep 2, 2020 · Big Data

An Overview of Apache Hudi: Architecture, Features, and Query Types

Apache Hudi is an open‑source data‑lake framework that leverages Spark to ingest, manage, and incrementally query large analytical datasets on HDFS‑compatible storage, offering features such as timeline management, copy‑on‑write and merge‑on‑read tables, and support for snapshot, incremental, and read‑optimized queries across engines like Hive, Spark SQL and Presto.

Apache Hudi · Big Data · Data Lake
0 likes · 12 min read
Big Data Technology & Architecture
Sep 1, 2020 · Big Data

Configuring Hadoop to Support LZO Compression

This guide explains how to enable LZO compression in Hadoop by installing the Twitter‑provided hadoop‑lzo library, updating core‑site.xml, synchronizing files across nodes, creating LZO indexes, and running a WordCount MapReduce job with LZO‑compressed output.

Big Data · Hadoop · LZO
0 likes · 6 min read
DataFunTalk
Sep 1, 2020 · Big Data

NetEase Real-Time Computing Platform (Sloth): Architecture, Practices, and Future Outlook

This article introduces NetEase's real-time computing platform Sloth, detailing its architecture, component layers, integrated IDE, operational tooling, unified metadata management, challenges such as Kudu write amplification, and proposes a tiered real‑time data‑warehouse model with a vision for storage‑compute separation and unified batch‑stream APIs.

Big Data · Flink · Kafka
0 likes · 13 min read
Xianyu Technology
Sep 1, 2020 · Artificial Intelligence

Interest-Based Live Stream Recommendation System for Xianyu

Within three weeks, the team built an interest‑based live‑stream recommendation platform for Xianyu that combined operational insights, BI analysis, and offline algorithms to generate user‑anchor interest tags, sync them to an online graph, and dramatically boost top‑room UV and click‑through rates.

Big Data · Graph Database · interest tagging
0 likes · 8 min read
Laravel Tech Community
Aug 31, 2020 · Big Data

Evolution of JD Daojia Order System Elasticsearch Cluster Architecture

This article details the step‑by‑step evolution of the JD Daojia order‑center Elasticsearch cluster—from an initial loosely configured deployment to a real‑time dual‑cluster architecture with replica tuning, master‑slave adjustments, data‑sync strategies, and lessons learned about pagination, fielddata, and doc values—highlighting how each phase improved query throughput, stability, and scalability for billions of documents.

Big Data · Cluster Architecture · Elasticsearch
0 likes · 12 min read
Big Data Technology & Architecture
Aug 30, 2020 · Big Data

Kylin Cube Construction Principles and Optimization Techniques

This article explains the fundamentals of Kylin Cube construction—including dimensions, measures, Cuboid generation, layer-by-layer and in‑memory building algorithms, storage mechanisms, and various optimization strategies such as derived dimensions, aggregation groups, row‑key design, and concurrency granularity—providing a comprehensive guide for big‑data OLAP practitioners.

Big Data · Cube · Kylin
0 likes · 14 min read
Big Data Technology & Architecture
Aug 27, 2020 · Big Data

HBase Architecture, Components, and Operations Overview

This article provides a comprehensive overview of Apache HBase’s architecture, detailing its core components such as RegionServer, HMaster, ZooKeeper, WAL, MemStore, and HFiles, and explains key processes including read/write paths, compaction, region splitting, load balancing, and recovery mechanisms.

Big Data · Database Architecture · Distributed Systems
0 likes · 17 min read
Tencent Cloud Developer
Aug 27, 2020 · Big Data

Elasticsearch Overview: Architecture, Lucene Foundations, Application Scenarios, and Optimizations

Elasticsearch, built on Apache Lucene, provides a distributed, near‑real‑time search platform that scales to billions of documents across thousands of nodes, supporting use cases such as log analytics, time‑series monitoring, and product search, while Tencent’s CES adds advanced availability, performance, and cost‑optimizing features.

Big Data · Elasticsearch · Performance Optimization
0 likes · 17 min read
Efficient Ops
Aug 24, 2020 · Operations

How to Scale Elasticsearch for PB‑Level Game Logs: Real‑World Strategies & Lessons

This article walks through a mid‑size gaming company's journey of deploying, tuning, and scaling an Elasticsearch cluster for massive log volumes, covering hot‑cold node architecture, ILM policies, shard management, Logstash‑Kafka optimization, emergency expansions, and the promise of searchable snapshots to achieve petabyte‑scale storage with cost efficiency.

Big Data · Elasticsearch · ILM
0 likes · 28 min read
Didi Tech
Aug 24, 2020 · Big Data

Evolution and Architecture of DiDi Data Channel Service

DiDi’s Data Channel Service evolved from a fragmented component system into a unified, SLA‑driven platform with a UI‑based Sync Center and Flink‑powered StreamSQL engine, dramatically improving task creation speed, resource utilization, and reliability while automating issue diagnosis for company‑wide real‑time and offline data synchronization.

Big Data · ETL · Flink
0 likes · 12 min read
58 Tech
Aug 24, 2020 · Big Data

Design and Practice of an Online Real-Time Feature System for Intelligent Risk Control

This article presents the concepts, architecture, and practical techniques of an online real‑time feature system used in intelligent risk‑control, covering feature definition, time‑window types, calculation functions, distributed processing, low‑latency storage, and operational challenges in high‑concurrency environments.

Big Data · Feature Engineering · Real-time Processing
0 likes · 16 min read
Big Data Technology & Architecture
Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi · Big Data · Data Lake
0 likes · 18 min read
Java Architect Essentials
Aug 21, 2020 · Big Data

Design and Integration of Flume, Kafka, Storm, Drools, and Redis for Real‑Time ETL Log Analysis

This article presents a modular architecture for real‑time ETL log analysis that combines Flume for log collection, Kafka as a buffering layer, Storm for stream processing, Drools for rule‑based data transformation, and Redis for fast storage, detailing installation, configuration, and code integration steps.

Big Data · Drools · Flume
0 likes · 23 min read
Huawei Cloud Developer Alliance
Aug 21, 2020 · Big Data

How Big Data and IoT Are Transforming Vehicle Networks: Opportunities and Challenges

This article explains the concepts of the Internet of Things and big data, explores how massive sensor data fuels smart transportation and vehicle networking, outlines practical applications such as real‑time traffic control and autonomous driving, and analyzes the technical and managerial bottlenecks hindering future growth.

Autonomous Driving · Big Data · IoT
0 likes · 13 min read
Liangxu Linux
Aug 19, 2020 · Operations

How to Quickly Analyze Beijing Residency Data with Shell Commands

This tutorial shows how to use standard Unix shell tools such as grep, cut, sort, uniq, awk, and join to extract insights—top companies, most common surnames, popular given names, age distribution, and hometown statistics—from a JSON dataset of over 6,000 Beijing residency applicants.

Big Data · Data Analysis · JSON
0 likes · 13 min read
Suning Technology
Aug 18, 2020 · Backend Development

Boosting Mega‑Sale Stability: Suning’s Backend Data Components in Action

The article details how Suning’s transaction middle‑platform leverages custom TPS collection, advanced flow‑control, big‑data analytics, and AI‑driven forecasting to ensure system stability, capacity planning, and intelligent inventory distribution during the high‑traffic 818 promotional event.

AI · Backend · Big Data
0 likes · 17 min read
Beike Product & Technology
Aug 17, 2020 · Big Data

Bitmap-Based User Segmentation in a DMP Platform Using ClickHouse

This article describes how a data management platform (DMP) at Beike leverages ClickHouse bitmap structures and Spark pipelines to generate global numeric user IDs, design tag-specific bitmap rules for enum, continuous, and date attributes, handle boundary cases, and produce high‑performance bitmap SQL for real‑time user group estimation and complex segment logic.

Big Data · ClickHouse · DMP
0 likes · 17 min read
Big Data Technology & Architecture
Aug 16, 2020 · Big Data

Comprehensive Overview of HDFS: Architecture, Advantages, Limitations, Commands, and Advanced Features

This article provides a detailed introduction to HDFS, covering its application scenarios, core architecture, fault‑tolerance benefits, drawbacks such as high latency and small‑file inefficiency, essential shell and API commands, cluster management procedures, and newer Hadoop 2.0 features like HA, Federation, snapshots, ACLs, and heterogeneous storage.

Big Data · CLI · Data Storage
0 likes · 10 min read
Big Data Technology & Architecture
Aug 15, 2020 · Big Data

Step-by-Step Guide to Building an ELK Stack with Kafka, Zookeeper, Logstash, and Filebeat for Log Collection

This tutorial provides a comprehensive, step-by-step procedure for setting up a log‑collection pipeline using Filebeat, Kafka, Zookeeper, Logstash, Elasticsearch, and Kibana across multiple servers, covering hardware preparation, system tuning, software installation, configuration files, and verification commands.

Big Data · ELK · Filebeat
0 likes · 11 min read
Big Data Technology & Architecture
Aug 15, 2020 · Big Data

Understanding Data Lakes: Concepts, Architecture, Vendor Solutions, and Practical Use Cases

This comprehensive article explains what a data lake is, outlines its core characteristics and reference architecture, compares major cloud providers' data‑lake offerings, presents typical advertising and gaming use cases, and proposes a practical, agile process for building and operating a data lake.

Big Data · Data Architecture · Data Lake
0 likes · 50 min read
Suning Technology
Aug 14, 2020 · Big Data

Building Suning’s Supply‑Chain Data Platform with DDD and Big‑Data Design

This article recounts Suning’s step‑by‑step journey of designing and implementing a supply‑chain data middle platform, outlining its business rationale, DDD‑based domain modeling, layered system architecture, and practical deployment insights that illustrate how a tailored big‑data solution can enhance data services and governance.

Big Data · DDD · Data Platform
0 likes · 11 min read
Huolala Tech
Aug 13, 2020 · Operations

How Huolala’s “Smart Brain” Uses AI and Optimization to Revolutionize Logistics

At the 2020 Global Logistics Technology Conference in Haikou, Huolala CTO Zhang Hao detailed the company’s self‑developed “Smart Brain” system, which leverages AI, big‑data analytics, IoT and custom optimization algorithms to achieve real‑time, intelligent dispatch, dynamic pricing and safer, more efficient logistics operations.

AI · Big Data · IoT
0 likes · 6 min read
Aikesheng Open Source Community
Aug 13, 2020 · Databases

Introduction to ClickHouse: Features, Installation, Performance Testing, and Comparison

This article introduces ClickHouse, an open‑source column‑oriented OLAP database, detailing its key features, appropriate use cases, installation steps, performance benchmark queries, and how it compares with other columnar storage solutions while highlighting its adoption by major internet companies.

Big Data · ClickHouse · Columnar Database
0 likes · 10 min read
Tencent Cloud Middleware
Aug 12, 2020 · Big Data

How Serverless Functions Can Replace Traditional Kafka Data Pipelines for Lower Cost and Easier Scaling

This article explains how Tencent Cloud CKafka works, describes the challenges of traditional open‑source data‑flow solutions, and demonstrates a Serverless Function approach—complete with architecture diagrams and code examples—to achieve low‑cost, auto‑scaling Kafka‑to‑Elasticsearch pipelines.

Big Data · CKafka · Elasticsearch
0 likes · 12 min read
IT Architects Alliance
Aug 12, 2020 · Big Data

Introduction to Confluent KSQL for Real-Time Stream Processing

This article introduces Confluent KSQL, a SQL‑based real‑time stream processing engine for Kafka, covering its architecture, stream vs table concepts, query lifecycle, Docker‑based setup, DDL commands, example joins, windowed aggregations, connectors, and its advantages and limitations.

Big Data · Docker · KSQL
0 likes · 9 min read
Architects' Tech Alliance
Aug 11, 2020 · Big Data

Comprehensive Overview of Data Middle Platform Architecture, Components, and Practices

This article provides an extensive summary of data middle platform concepts, covering data aggregation, collection tools, offline and real‑time development, data governance, service layers, warehouse construction, and operational practices, illustrating how enterprises build and manage a unified data ecosystem.

Big Data · Data Middle Platform · Data Warehouse
0 likes · 27 min read
Ctrip Technology
Aug 6, 2020 · Big Data

Data Governance Practices and Model Design in Ctrip Vacation Data Warehouse

This article shares the practical experience and thinking behind Ctrip's vacation data governance project, covering team efficiency optimization, demand sorting, data domain definition, warehouse layering, unified dimension modeling, metric standardization, and the overall benefits of a centralized data governance framework.

Big Data · Ctrip · Data Warehouse
0 likes · 17 min read
Youku Technology
Aug 6, 2020 · Big Data

Alibaba Entertainment Data Platform: The Journey Ahead

The presentation outlines how Alibaba's entertainment data platform has evolved to meet the real‑time, low‑cost, and scalable analytics demands of campaigns such as Double 11 and 618, detailing its architecture, real‑time processing, pre‑computed data cubes, practical design choices, and lessons learned from implementation challenges.

Big Data · Real-time Analytics
0 likes · 1 min read
Big Data Technology & Architecture
Aug 5, 2020 · Big Data

An Introduction to Apache Kylin: Architecture, Core Concepts, Installation, and Enterprise Use Cases

This article provides a comprehensive overview of Apache Kylin, covering its background, core OLAP concepts, technical architecture, installation steps, cube-building methods, real‑world enterprise deployments, and resources for further learning, illustrating how it enables sub‑second query performance on massive datasets.

Apache Kylin · Big Data · Cube
0 likes · 20 min read
21CTO
Aug 1, 2020 · Big Data

Mastering User Profiling: A Comprehensive Big Data Blueprint

This article explains how enterprises can leverage massive raw and business data to build detailed user profiles, covering tag types, data architecture, development modules, project phases, key deliverables, and a real-world e‑commerce case study.

Big Data · Data Warehouse · ETL
0 likes · 22 min read
DataFunTalk
Aug 1, 2020 · Big Data

User Profiling Methodology and Engineering Solutions

This article explains the fundamentals of user profiling in the big data era, covering tag types, data architecture, development modules, a step‑by‑step implementation process, a practical e‑commerce case study, table design strategies, and both quantitative and qualitative profiling methods.

Big Data · ETL · machine learning
0 likes · 22 min read

How Pandemic Data Visualization Evolved: From John Snow’s Cholera Map to Modern COVID Dashboards

This article traces the history and development of pandemic data visualization—from 19th‑century cholera maps and early 2000s SARS charts to sophisticated COVID‑19 dashboards—while outlining five essential design principles that make such visualizations clear, engaging, and impactful.

Big Data · COVID-19 · Design Principles
0 likes · 13 min read
Tencent Cloud Developer
Jul 30, 2020 · Big Data

Cost Governance Practices in Youzan's Data Middle Platform

Youzan's data middle platform saw costs grow faster than the business because of low resource utilization and inefficient storage. The team applied utilization standards, containerization, migration to COS storage, offline task optimization, and fine-grained cost billing, achieving a 12% compute boost, 17% batch savings, an 80% storage cost cut, and an overall cost reduction of more than 25%.

Big Data · Cloud Computing · Containerization
0 likes · 24 min read
Big Data Technology & Architecture
Jul 30, 2020 · Big Data

Understanding Bucket Sampling Queries in Hive

This article explains Hive's bucket sampling syntax, demonstrates how to use the TABLESAMPLE clause with various bucket parameters, provides concrete SQL examples, and clarifies the underlying hash‑based mechanism that determines which rows are returned.

Big Data · Bucket Sampling · Hive
0 likes · 4 min read
Tencent Cloud Developer
Jul 29, 2020 · Big Data

Case Study: Optimizing Tencent Cloud Elasticsearch for High‑Volume Game Log Analytics

To handle a gaming company's million‑QPS log stream, the team built a hot‑cold Tencent Cloud Elasticsearch cluster with ILM‑driven tiering, scaled CPU/heap, reduced shard count via shrink and replica tweaks, tuned Logstash‑Kafka pipelines, and employed COS snapshots and searchable snapshots, achieving stable performance and lower cost.

Big Data · Elasticsearch · ILM
0 likes · 29 min read
MaGe Linux Operations
Jul 28, 2020 · Big Data

How Leading Chinese Companies Scale Elasticsearch for Billions of Orders

This article surveys how major Chinese tech firms such as JD.com, Ctrip, Didi, and 58.com deploy and evolve Elasticsearch clusters to handle massive order data, log analysis, real‑time monitoring, and security tasks, detailing architecture choices, shard strategies, multi‑cluster designs, and performance optimizations.

Big Data · Elasticsearch · Order Management
0 likes · 11 min read
Xianyu Technology
Jul 28, 2020 · Operations

ShenTan: Automated Fault Localization System for Online Services

ShenTan is an automated fault‑localization platform for online services that quickly (under five seconds) pinpoints server‑side issues with developer‑level accuracy by aggregating real‑time metrics, applying a decision‑tree model enriched by expert knowledge and dynamic thresholds, and presenting results through an integrated alert and visualization system, while planning broader endpoint coverage and multi‑tenant support.

Big Data · Fault Localization · Operations
0 likes · 12 min read
dbaplus Community
Jul 26, 2020 · Big Data

How Prometheus Powers Scalable Monitoring for Massive Big Data Clusters

Facing thousands of nodes in expanding big‑data clusters, the author evaluates legacy monitoring stacks, selects Prometheus + Alertmanager + Grafana, and details its architecture, custom exporters, real‑time alerts, self‑healing mechanisms, and visual dashboards that now support ten large clusters and dozens of services.

Alertmanager · Big Data · Grafana
0 likes · 11 min read
Big Data Technology & Architecture
Jul 23, 2020 · Big Data

Comprehensive Kafka FAQ: Uses, Architecture, Offsets, and Partition Management

This article provides an extensive overview of Apache Kafka, covering its use cases, key concepts such as ISR, AR, HW, LEO, and LW, message ordering, the roles of partitioners, serializers and interceptors, producer and consumer client architecture, offset handling, multithreaded consumption, and topic partition management.

Big Data · Kafka · Message queue
0 likes · 16 min read
dbaplus Community
Jul 22, 2020 · Databases

How to Optimize Real‑Time Vector Tile Services for Millions of Features with PostgreSQL & PostGIS

This article explains how to efficiently browse and render millions of GIS features in real‑time vector tiles using PostgreSQL and PostGIS, covering background challenges, several thinning algorithms, their implementation steps, limitations, advantages, and a practical example with a 3‑million‑point dataset.

Big Data · Data Dilution · GIS
0 likes · 8 min read
Big Data Technology & Architecture
Jul 22, 2020 · Big Data

Kafka Architecture and Core Concepts: Producers, Brokers, and Consumers

This article explains Kafka's fundamental architecture, including the roles of producers, brokers, and consumers, key concepts such as topics, partitions, replicas, ISR, and controller, as well as detailed mechanisms of producer client structure, interceptors, serializers, partitioners, and consumer group rebalancing strategies.

Big Data · Distributed Systems · Kafka
0 likes · 22 min read
Alibaba Cloud Developer
Jul 22, 2020 · Big Data

Exploring the Apache Big Data Ecosystem: Hadoop, Spark, Flink, and More

This article surveys the rapidly evolving big data landscape by reviewing a wide range of Apache projects—including Hadoop, Spark, Flink, HBase, Kudu, Impala, Kafka, and others—detailing their core components, architectures, strengths, and typical use‑cases for building distributed data platforms.

Apache · Big Data · Data Processing
0 likes · 20 min read
Tencent Cloud Developer
Jul 21, 2020 · Big Data

Scaling Tencent Meeting Video Stream Quality Analysis with Tencent Cloud Elasticsearch

Facing explosive growth and massive video‑stream quality data, Tencent Meeting migrated its custom Lucene‑based analysis engine to Tencent Cloud Elasticsearch, which delivered over 1 million writes per second, automatic sharding, reduced latency from hours to seconds, and sustained 99.99% availability, proving a high‑performance, scalable solution for large‑scale video conferencing.

Big Data · Cloud Computing · Elasticsearch
0 likes · 16 min read
Big Data Technology & Architecture
Jul 19, 2020 · Big Data

An Overview of Hive, HBase Integration, Apache Phoenix, and Lealone in the Big Data Ecosystem

This article explains Hive's role as a Hadoop‑based data warehouse, its integration with HBase, the advantages and drawbacks of that combination, introduces Apache Phoenix as a high‑performance SQL layer on HBase, and describes the open‑source NewSQL database Lealone, providing practical usage scenarios and performance comparisons.

Big Data · Data Warehouse · HBase
0 likes · 9 min read
Ctrip Technology
Jul 16, 2020 · Big Data

Design and Architecture of the User Profiling System at Ctrip Business Travel

This article describes the concept, tag taxonomy, data flow architecture, and Lambda‑based query service design of Ctrip Business Travel's user profiling system, highlighting how batch and real‑time processing with Spark, Flink, Hive, MongoDB and Redis enable precise marketing, risk control and personalized services.

Big Data · Ctrip · data pipeline
0 likes · 12 min read
Architect
Jul 15, 2020 · Big Data

Understanding Flink Task Slots, Resource Allocation, and Slot Sharing Mechanisms

This article explains how Flink uses task slots to partition TaskManager resources, the benefits of slot sharing, the interaction between Scheduler, SlotPool, and ResourceManager, and the internal classes such as LogicalSlot, PhysicalSlot, and SlotSharingManager that enable resource isolation and sharing in stream processing jobs.

Big Data · Flink · Task Slot
0 likes · 6 min read
Youzan Coder
Jul 15, 2020 · Big Data

Design and Implementation of Youzan ABTest System for Data‑Driven Growth

Youzan created an internal A/B testing platform—combining Java/Node SDKs, a real‑time data pipeline, and a metadata‑driven workflow—to enable data‑driven product iteration, granular traffic allocation, automated logging, statistical analysis, and scalable growth insights across its merchant services, while planning further automation and integration.

A/B testing · Big Data · Experiment Platform
0 likes · 19 min read
Huolala Tech
Jul 15, 2020 · Big Data

How to Build Smart, Scalable Data Tracking Solutions for Comprehensive Analytics

This article explores the fundamentals, common schemes, pain points, and a smart end‑to‑end solution for data tracking (event instrumentation), offering practical guidelines, architectural diagrams, and a concrete example to help engineers implement comprehensive, controllable, and efficient event collection pipelines.

Analytics · Big Data · Data Tracking
0 likes · 9 min read
58 Tech
Jul 13, 2020 · Big Data

Design and Implementation of a Financial Data Warehouse: Architecture, Modeling, Quality Monitoring, and Metadata Management

This article presents a comprehensive design and implementation guide for a financial data warehouse, covering background needs, modeling methodology choices, a layered architecture, data quality monitoring, metadata management, naming and coding standards, and future development directions.

Big Data · Data Quality · Data Warehouse
0 likes · 11 min read
Big Data Technology & Architecture
Jul 12, 2020 · Big Data

Design and Implementation of Ozone Data Exploration Service (Recon Server)

This article explains the design of a data exploration service for large‑scale distributed storage systems, detailing metadata synchronization, index reconstruction, aggregation tables, node‑level statistics, a user console, and the transition from checkpoint‑based snapshots to delta updates using RocksDB WAL in Hadoop Ozone Recon Server.

Big Data · Delta Updates · Ozone
0 likes · 9 min read
Big Data Technology & Architecture
Jul 9, 2020 · Big Data

How ZooKeeper Supports HBase: Coordination, Fault Tolerance, Log Splitting, META Table Management, and Replication

This article explains how ZooKeeper functions as a distributed coordination service for HBase, detailing its role in master and RegionServer fault tolerance, log splitting, META table location tracking, and replication management, illustrating the underlying ZNode structures and failover mechanisms.

Big Data · Distributed Coordination · HBase
0 likes · 7 min read
Sohu Tech Products
Jul 8, 2020 · Big Data

Optimizing Workflow in Data Warehouse Construction: A Layered Task‑Instance Approach

The article analyzes data‑warehouse workflow scenarios, explains core concepts such as OLAP, multidimensional modeling and layer architecture, reviews existing workflow engines like Azkaban, Oozie and Airflow, and proposes a task‑and‑instance layered optimization that simplifies dependency configuration, improves collaboration, and supports complex scheduling in modern big‑data environments.

Big Data · ETL · Task Scheduling
0 likes · 21 min read