Tagged articles

3697 articles

Oct 14, 2019 · Operations

How AIOps Transforms IT Operations: Real-World Architecture and Lessons

This article shares a practical case study of implementing AIOps in an online‑education company, covering the background pain points of massive monitoring data, the designed architecture with real‑time processing and machine‑learning pipelines, and the challenges and opportunities of intelligent operations.

Big Data · IT Operations · aiops

14 min read

JD Retail Technology

Oct 14, 2019 · Databases

Overview of JDNoSQL Platform and Its Real-Time Advertising Use Cases

The article introduces JDNoSQL, a distributed column‑oriented key‑value store built on HDFS, outlines its core features, describes various business scenarios including real‑time ad computation, details the system architecture with Kafka and Flink, and presents table designs for ad impression and click statistics.

Big Data · Flink · Kafka

13 min read

Big Data Technology & Architecture

Oct 14, 2019 · Big Data

Optimizing Spark PageRank: Cache, Checkpoint, Data Skew, and Resource Utilization

This article presents a comprehensive analysis of Spark PageRank performance, detailing the algorithm's basics, the original example code, and four key optimizations—caching with checkpointing, memory‑efficient data structures, handling data skew, and maximizing executor and driver resource usage—backed by experimental results and practical recommendations.
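For reference, here is a plain-Python sketch of the update rule in Spark's bundled PageRank example. It does not use Spark. The function name and sample graph are illustrative, and each comment names the RDD operation that step stands in for. In the Spark version, a `join` reuses the `links` RDD on every iteration, so caching it pays off.

```python
from collections import defaultdict

def pagerank(edges, iterations=10):
    """Plain-Python mirror of the update rule in Spark's PageRank example."""
    # links: page -> distinct outgoing neighbours (Spark: distinct().groupByKey().cache())
    links = defaultdict(set)
    for src, dst in edges:
        links[src].add(dst)
    # Every page with outlinks starts at rank 1.0 (Spark: links.mapValues(v => 1.0))
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        contribs = defaultdict(float)
        # links.join(ranks): each page splits its rank evenly across its outlinks
        for page, rank in ranks.items():
            outs = links.get(page)
            if not outs:
                continue  # a page with no outlinks passes nothing on
            share = rank / len(outs)
            for dst in outs:
                contribs[dst] += share
        # reduceByKey(_ + _).mapValues(0.15 + 0.85 * _): damped update
        ranks = {page: 0.15 + 0.85 * total for page, total in contribs.items()}
    return ranks
```

Running `pagerank([("a", "b"), ("a", "c"), ("b", "a")], iterations=2)` gives page `a` a rank of about 0.639 and pages `b` and `c` about 0.575 each.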

Big Data · Cache · Checkpoint

18 min read

Big Data Technology & Architecture

Oct 13, 2019 · Big Data

Installing and Configuring Alibaba Canal for MySQL Binlog Capture

This guide explains how to download, install, and configure Alibaba Canal—including extracting the package, setting up canal.properties, instance.properties, and instance.xml files, and tuning key parameters—to enable reliable MySQL binlog capture for big‑data pipelines.

Big Data · Canal · Data Capture

13 min read

Big Data Technology & Architecture

Oct 13, 2019 · Big Data

Building a Simple Canal-to-Kafka Demo with Maven Dependencies and Java Code

This guide introduces the canal‑kafka integration package, outlines its constraints, and provides a step‑by‑step tutorial with Maven dependencies and Java source code for a SimpleCanalClient, a Kafka producer, and a server class, enabling a functional demo of Canal to Kafka data streaming.

Big Data · Canal · Data Integration

8 min read

58 Tech

Oct 10, 2019 · Big Data

Optimizing Real‑Time Feature Extraction at 58.com: Migrating from Spark Streaming to Flink

This article describes how 58.com’s commercial engineering team redesigned its real‑time feature‑mining pipeline—replacing a minute‑level Spark Streaming framework with Flink—to achieve sub‑second latency, higher throughput, stronger fault‑tolerance, and end‑to‑end exactly‑once semantics for user‑profile generation in the second‑hand‑car recommendation scenario.

Big Data · Exactly-Once · Flink

14 min read

Sohu Tech Products

Oct 9, 2019 · Databases

HBase Table Design Strategies: Data Model, Column Descriptors, RowKey, Region and Performance Optimization

This article explains HBase’s data model and provides comprehensive table‑design strategies—including column‑descriptor options, row‑key best practices, high‑vs‑wide table trade‑offs, region splitting and pre‑splitting techniques—to help achieve optimal performance and scalability in large‑scale NoSQL workloads.
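Salting is one widely used row-key practice. It prefixes monotonically increasing keys with a small hash bucket, so writes spread across pre-split regions instead of piling onto the last one. This sketch assumes that salting is the kind of technique the article covers; it is not the article's exact design. `NUM_BUCKETS`, the two-digit `NN|` prefix format, and both function names are hypothetical. The split keys it returns are the region boundaries you would pass when pre-creating the table, for example through the HBase shell's `SPLITS` option.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical: one salt bucket per pre-split region

def salted_rowkey(natural_key: str, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a stable, hash-derived two-digit salt ("NN|key").

    Sequential natural keys (timestamps, auto-increment ids) would otherwise all
    land in the last region; the salt spreads them over `buckets` regions.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("a two-digit salt supports 1..100 buckets")
    salt = hashlib.md5(natural_key.encode("utf-8")).digest()[0] % buckets
    return f"{salt:02d}|{natural_key}".encode("utf-8")

def split_keys(buckets: int = NUM_BUCKETS) -> list[bytes]:
    """Region boundaries matching the salt prefixes: N buckets need N-1 split keys."""
    if not 1 <= buckets <= 100:
        raise ValueError("a two-digit salt supports 1..100 buckets")
    return [f"{i:02d}".encode("utf-8") for i in range(1, buckets)]
```

The salt is derived from the key itself, so a point lookup can recompute the prefix. The trade-off is that a scan over a range of natural keys must fan out to every bucket.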

Big Data · Column Family · HBase

16 min read

Big Data Technology & Architecture

Oct 9, 2019 · Big Data

Choosing and Using Flink State Backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend

This article explains how Flink checkpoints persist state, compares the three built‑in state backends (MemoryStateBackend, FsStateBackend, RocksDBStateBackend), discusses their configurations, advantages, limitations, and provides guidance on selecting the appropriate backend for different big‑data streaming scenarios.

Big Data · Checkpoint · Flink

10 min read

Alibaba Cloud Infrastructure

Oct 9, 2019 · Cloud Computing

The Next Decade of Cloud Networking: Highlights from Alibaba Cloud Network Forum at Yunqi Conference 2019

The 2019 Yunqi Conference Cloud Network Forum gathered over two hundred network enthusiasts to review a decade of Alibaba data‑center networking evolution, explore emerging technologies such as AI, big data, and programmable chips, and outline the next ten years of high‑performance, data‑centric cloud networking.

Big Data · High‑Performance Networking · network architecture

9 min read

dbaplus Community

Oct 8, 2019 · Big Data

How to Master Large-Scale Cluster Management: 10 Real-World Troubleshooting Cases

This article shares a senior data‑platform engineer's hands‑on experience managing dozens of thousand‑node clusters, detailing nine common cluster problems and step‑by‑step solutions—including performance tuning, RPC fixes, HDFS cleanup, Hive metadata repair, Spark shuffle optimization, HBase region recovery, and Kafka bottleneck mitigation.

Big Data · Cluster Management · HBase

17 min read

Big Data Technology & Architecture

Oct 8, 2019 · Big Data

Handling Deprecated Flink API: Converting Legacy TypeInformation to DataTypes

Flink 1.9 deprecated the legacy type API in favor of DataTypes, so users run into missing TypeInformation methods on table schemas. This article explains the root cause and provides code that converts legacy types with TypeConversions and registers a TableSink.

Big Data · DataTypes · Flink

2 min read

Architects' Tech Alliance

Oct 7, 2019 · Industry Insights

How Google’s Vision Drove the PC Web, Big Data, and Cloud Revolutions

The article traces Google’s decade‑long influence from the PC Web era onward. It covers the company's pioneering technologies in search, email, infrastructure, big data, cloud computing, and mobile. It also explains how Google's philosophy propelled each wave of internet innovation yet caused the company to miss some of the resulting commercial opportunities.

Big Data · Cloud Computing · Google

11 min read

Big Data Technology & Architecture

Oct 3, 2019 · Big Data

Data Development Interview Tips and Career Guidance

This article offers practical advice for data development job interviews, explaining why Java is essential, comparing Java and Python, outlining required backend framework knowledge, discussing the role of SQL and data warehousing, and addressing work‑life concerns such as overtime and company size choices.

Big Data · Data Engineering · Interview

4 min read

Programmer DD

Sep 29, 2019 · Big Data

Can 1.4 Billion People Share a Single WeChat Group? A Technical Deep‑Dive

This article explores whether it is technically feasible to place all 1.4 billion Chinese users into one WeChat group, analyzing population statistics, message volume, CPU processing limits, network bandwidth, storage requirements, and cost implications with supporting calculations and references.

Big Data · Distributed Systems · Network Bandwidth

12 min read

Architects Research Society

Sep 28, 2019 · Artificial Intelligence

Data Mining and Machine Learning: Concepts, Process, and Software Catalog

This article explains the fundamentals of data mining and machine learning, outlines the knowledge discovery process and typical analytical tasks, and provides an extensive alphabetically ordered list of software tools used for these technologies.

AI · Big Data · Software

7 min read

Xueersi Online School Tech Team

Sep 27, 2019 · Big Data

Design Principles and Architecture of Apache Kylin for Sub‑Second OLAP Queries

This article explains how Apache Kylin, an open‑source distributed analytics engine built on Hadoop/Spark, achieves sub‑second OLAP query performance through pre‑computed cubes, a layered cuboid generation algorithm, bitmap‑based distinct counting, dimension optimization techniques, and tight integration with HBase for storage and fast SQL querying.
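To make the cube terminology concrete: with n dimensions, a full cube has 2^n cuboids. Layer-by-layer building computes each layer from the layer above it, which has one more dimension, instead of going back to the raw data. The Python sketch below does three things:

- enumerates the layers;
- picks a parent for a given cuboid;
- shows the bitmap idea behind exact distinct counts, where pre-aggregated cells store bitmaps that are OR-ed together at query time.

The function names are illustrative. Kylin's own parent selection and its compressed bitmap format are more involved than this.

```python
from itertools import combinations

def cuboid_layers(dimensions):
    """Map k -> all cuboids keeping exactly k dimensions (2**n cuboids in total)."""
    n = len(dimensions)
    return {k: [frozenset(c) for c in combinations(dimensions, k)] for k in range(n, -1, -1)}

def parent_of(cuboid, dimensions):
    """Return one parent a layer up: the cuboid plus the first dimension it lacks.
    Building the child from that parent only aggregates one dimension away."""
    for d in dimensions:
        if d not in cuboid:
            return cuboid | {d}
    return None  # the base cuboid (all dimensions) is built from the source data

def bitmap_of(ids):
    """Encode non-negative integer ids as a bitmap held in a Python int."""
    bits = 0
    for i in ids:
        bits |= 1 << i
    return bits

def distinct_count(*bitmaps):
    """Exact distinct count across cells: OR the bitmaps, then count the set bits."""
    merged = 0
    for b in bitmaps:
        merged |= b
    return bin(merged).count("1")
```

Suppose one cell holds users {1, 2, 3} and another holds users {3, 4}. Summing the per-cell counts gives 3 + 2 = 5 and counts user 3 twice. OR-ing the bitmaps gives the correct 4. This is why a bitmap can be rolled up to coarser cuboids but a plain integer count cannot.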

Apache Kylin · Big Data · Cube

15 min read

JD Retail Technology

Sep 27, 2019 · Big Data

How to Become a Spark Committer: The Journey of JD’s Zheng Ruifeng

The article chronicles JD engineer Zheng Ruifeng’s path to becoming a Spark Committer, highlighting his early involvement, key contributions to Spark’s ML and GraphX components, the community’s scale, and his vision for future improvements in the big‑data platform.

Apache Spark · Big Data · Committer

6 min read

Meituan Technology Team

Sep 26, 2019 · Big Data

Big Data Technology: Commercial Applications and Practice – A Collaborative Course between Meituan and Tsinghua University

Meituan’s big‑data team and Tsinghua’s Electronic Engineering Department have launched a master‑level, credit‑bearing course that blends theory with 24 hours of hands‑on training, showcases Meituan’s real‑world data infrastructure and applications, and aims to create a recurring bridge between academia and industry while recruiting top talent.

Big Data · Commercial Application · Meituan

6 min read

Big Data Technology & Architecture

Sep 25, 2019 · Big Data

Designing and Using Global Secondary Indexes in Apache Phoenix

This article explains how Apache Phoenix implements global secondary indexes using separate HBase tables, demonstrates index creation and data synchronization with example SQL, and provides design guidelines to optimize query latency and avoid full‑table scans in big‑data environments.
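A Phoenix global index lives in its own HBase table, and that table's row key starts with the indexed column value. A lookup by that column therefore becomes a short range scan instead of a full scan of the data table. The toy Python model below illustrates only that layout and the write-time synchronization. The class and method names are hypothetical, and real Phoenix maintains the index itself during writes. The covered columns mirror Phoenix's `INCLUDE` clause, which lets a query be answered from the index alone.

```python
import bisect

class IndexedTable:
    """Toy data table plus a global index kept as a separate, sorted table.
    The index row key is (indexed value, data row key), so all rows sharing a
    value are contiguous, as with HBase's sorted row keys."""

    def __init__(self, indexed_column, covered_columns=()):
        self.indexed_column = indexed_column
        self.covered_columns = tuple(covered_columns)
        self.data = {}          # data table: row key -> {column: value}
        self.index_keys = []    # index table row keys, kept sorted
        self.index_rows = {}    # index row key -> covered column values

    def upsert(self, row_key, row):
        old = self.data.get(row_key)
        if old is not None:
            # Keep the index in sync: drop the entry for the old indexed value first.
            stale = (old[self.indexed_column], row_key)
            self.index_keys.pop(bisect.bisect_left(self.index_keys, stale))
            del self.index_rows[stale]
        self.data[row_key] = row
        key = (row[self.indexed_column], row_key)
        bisect.insort(self.index_keys, key)
        self.index_rows[key] = {c: row[c] for c in self.covered_columns}

    def query(self, value):
        """Range-scan only the index entries for `value`, not the whole data table."""
        start = bisect.bisect_left(self.index_keys, (value,))
        hits = []
        for key in self.index_keys[start:]:
            if key[0] != value:
                break
            hits.append((key[1], self.index_rows[key]))
        return hits
```

Without the covered column, each hit would need a second lookup into the data table by row key. Avoiding that lookup is the usual reason to cover columns that queries select often.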

Big Data · HBase · Phoenix

4 min read

Huawei Cloud Developer Alliance

Sep 25, 2019 · Cloud Computing

How Huawei’s OceanConnect IoT Platform Powers Smart Cities, Connected Cars, and More

This article provides a comprehensive overview of Huawei's OceanConnect IoT platform, detailing its market positioning, key application scenarios such as connected vehicles and smart cities, core product capabilities, deployment options, security measures, and underlying software architecture.

Big Data · Cloud Computing · Connected Car

12 min read

dbaplus Community

Sep 24, 2019 · Big Data

How Weibo Turns Big Data into Revenue: Insights from a 2019 DAMS Talk

The presentation explains how Weibo leverages big‑data technologies, user profiling, and social‑first advertising models to drive commercial growth, detailing data‑driven product development, real‑time and offline data warehouses, scientific experiments, and case studies that illustrate the impact on revenue and user engagement.

Advertising · Big Data · Data Warehouse

24 min read

Alibaba Cloud Developer

Sep 24, 2019 · Artificial Intelligence

How Semi‑Supervised Deep Learning Detects Road Closures in Real‑Time

Gaode’s engineering team presents a semi‑supervised deep‑learning framework that models road networks, extracts traffic, routing, deviation and heatmap features, and combines LSTM with ResNet to accurately identify dynamic road‑closure events, enabling both offline and real‑time detection with high confidence and business‑aligned validation.

Big Data · LSTM · ResNet · Semi-supervised Learning

12 min read

Snowball Engineer Team

Sep 24, 2019 · Big Data

Snowball Data Middle Platform (AIBO): Architecture, Capabilities, and Future Outlook

The article introduces Snowball's AIBO data middle platform, detailing its storage‑compute separation architecture, core capabilities such as data integration, catalog, tagging, analysis tools, micro‑service data APIs, and outlines future enhancements for security, lineage, and continuous business‑driven iteration.

Big Data · Data Analysis · Data Catalog

12 min read

Alibaba Cloud Developer

Sep 24, 2019 · Big Data

Inside Alibaba’s 10‑Year Search Engine: Architecture, Data Flow, and Indexing

Alibaba’s 10‑year‑old search engine combines data source aggregation, incremental and real‑time indexing, and online services through platforms like Tisplus, Bahamut, Maat, Ha3, Build Service and Drogo, illustrating a comprehensive architecture that powers 1688’s search capabilities across multiple engines and deployment pipelines.

Backend Architecture · Big Data · Distributed Systems

10 min read

Big Data Technology & Architecture

Sep 23, 2019 · Big Data

Applying Apache Kylin for Large‑Scale OLAP at Meituan: Architecture, Challenges, and Performance Evaluation

This article describes Meituan’s large‑scale OLAP requirements, how Apache Kylin was integrated to meet them, the architectural solutions, performance benchmarks against other engines, and future work, providing practical insights for building stable, precise, and high‑performance analytics platforms.

Apache Kylin · Big Data · Data Warehouse

20 min read

Big Data Technology & Architecture

Sep 22, 2019 · Databases

Alibaba Cloud BDS Service for Non‑Stop HBase Cluster Migration

This article explains how Alibaba Cloud's BDS migration service enables continuous, high‑performance migration of HBase clusters—including schema, full data, and incremental sync—across version upgrades, hardware changes, network migrations, and cross‑region scenarios, while ensuring stability and minimal impact on live workloads.

Alibaba Cloud · BDS · Big Data

10 min read

Big Data Technology & Architecture

Sep 21, 2019 · Big Data

Deploying Apache Flink on Kubernetes: A Step‑by‑Step Guide

This tutorial explains how to run Apache Flink jobs on Kubernetes by building Docker images, deploying JobManager and TaskManager components with Kubernetes manifests, configuring high availability with ZooKeeper and HDFS, and using savepoints and scaling techniques to manage and extend Flink streaming applications.

Big Data · Docker · Flink

14 min read

Beike Product & Technology

Sep 20, 2019 · Big Data

Understanding DStream Construction and Execution in Spark Streaming

This article explains how Spark Streaming's DStream abstraction is built from InputDStream through successive transform operators, details the internal ForEachDStream implementation, describes the job generation and scheduling workflow, and outlines how Beike's real‑time platform leverages these mechanisms for large‑scale streaming tasks.

Big Data · Dstream · Real-time Processing

10 min read

Suning Technology

Sep 20, 2019 · Big Data

How Suning’s Big Data Engine Powers Smart Retail Transformation

Suning’s big‑data center, built on a 30‑year retail evolution and leveraging technologies like AI, cloud, and IoT, showcases how integrated data platforms and robust security can drive smart retail, improve services for 600 million users, and create a new competitive edge.

AI · Big Data · Cloud Computing

6 min read

Big Data Technology & Architecture

Sep 19, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article presents a comprehensive analysis of Meituan's Hadoop YARN fair scheduler, detailing its architecture, resource abstractions, scheduling workflow, performance bottlenecks, fine‑grained metrics, and a series of optimization techniques—including sorting improvements, job‑skip reduction, parallel queue sorting, and robust rollout strategies—to achieve high‑throughput, low‑latency scheduling for large‑scale offline, streaming, and machine‑learning workloads.
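The sort at the heart of the fair scheduler decides which queue or app is offered resources next. The comparator below follows the memory-only ordering of Hadoop's classic `FairSharePolicy`:

1. Schedulables below their min share come first.
2. Within each group, lowest usage relative to min share (or to weight) wins.
3. Ties are broken by start time, then by name.

The details vary across Hadoop versions. The `Schedulable` fields are simplified stand-ins, and none of Meituan's optimizations are shown. The comparison runs on every scheduling pass, so its cost grows with the number of queues and apps. That cost is the overhead the sorting optimizations in the summary target.

```python
from dataclasses import dataclass
from functools import cmp_to_key

@dataclass
class Schedulable:
    """Simplified stand-in for a queue or app (memory only, in MB)."""
    name: str
    usage_mb: int
    min_share_mb: int
    demand_mb: int
    weight: float = 1.0
    start_time: int = 0

def _sign(x):
    return (x > 0) - (x < 0)

def fair_share_cmp(a, b):
    """Negative if `a` should be offered resources before `b`."""
    # A min share larger than the demand cannot be used, so cap it at the demand.
    min_a = min(a.min_share_mb, a.demand_mb)
    min_b = min(b.min_share_mb, b.demand_mb)
    needy_a = a.usage_mb < min_a
    needy_b = b.usage_mb < min_b
    if needy_a != needy_b:
        return -1 if needy_a else 1  # whoever is below its min share goes first
    if needy_a:
        # Both needy: the one furthest below its min share (lowest usage/minShare) wins.
        res = _sign(a.usage_mb / max(min_a, 1) - b.usage_mb / max(min_b, 1))
    else:
        # Neither needy: lowest usage per unit of weight wins.
        res = _sign(a.usage_mb / a.weight - b.usage_mb / b.weight)
    if res == 0:
        # Tie on fairness: earlier start time, then name.
        res = _sign(a.start_time - b.start_time) or _sign((a.name > b.name) - (a.name < b.name))
    return res

def scheduling_order(schedulables):
    return sorted(schedulables, key=cmp_to_key(fair_share_cmp))
```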

Big Data · Fair Scheduler · Performance Optimization

24 min read

Big Data Technology & Architecture

Sep 19, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink and Ensuring Exactly‑once Semantics

This article demonstrates how to develop a real‑time ETL job using Apache Flink, covering project setup, Kafka as a source, custom bucket assigners for HDFS, checkpointing, savepoints, and deployment on YARN to achieve exactly‑once processing guarantees.

Apache Flink · Big Data · Exactly-Once

11 min read

FunTester

Sep 19, 2019 · Operations

Emerging Technologies Shaping DevOps and Software Testing in the Next Decade

Over the next decade, rapid advances in IoT, AI, big data, and pervasive automation such as cognitive RPA will transform DevOps practices, driving more integrated, intelligent testing and continuous delivery pipelines, while organizations mature their digital transformation journeys to meet increasingly complex, data‑driven operational demands.

AI · Big Data · IoT

8 min read

Efficient Ops

Sep 18, 2019 · Databases

Why the DBA Role Is Becoming a Narrowed, High‑Risk Career Path

The article analyzes how the DBA job market is shrinking as traditional enterprises shift away from legacy systems, cloud adoption reshapes responsibilities, and DBAs face limited advancement unless they transition to architecture or data‑analytics roles, highlighting the growing risk and low reward of staying in pure DBA work.

Big Data · DBA · Database Administration

7 min read

Big Data Technology & Architecture

Sep 18, 2019 · Big Data

Understanding Flink Checkpoint Mechanism and Configuration

This article explains Flink's checkpoint mechanism, its execution flow, common configuration options, and the benefits and considerations of incremental checkpoints using the RocksDB state backend, providing practical code examples and YAML settings for reliable stream processing.

Big Data · Checkpoint · Flink

12 min read

Youzan Coder

Sep 18, 2019 · Big Data

Applying Newton's Law of Cooling to Transaction Scoring in DMP User Profiling

The article proposes using Newton’s law of cooling to score DMP user transactions, assigning higher weights to recent purchases that decay exponentially over time, deriving a cooling constant from boundary conditions, and normalizing the resulting heat‑based scores through log‑scaling and a sigmoid‑like mapping to a 0‑100 range.
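Newton's law of cooling gives each transaction a "heat" T(t) = T0·e^(−kt) that decays with its age t. A boundary condition such as "after D days only a fraction f of the heat remains" fixes k = −ln(f)/D. The Python sketch below strings the summarized steps together. The summary does not give the article's constants, so several choices are hypothetical:

- the 365-day / 10% boundary condition;
- using the order amount as the initial heat;
- the logistic `midpoint` and `steepness` parameters.

```python
import math

def cooling_constant(days, remaining_fraction):
    """Solve exp(-k * days) = remaining_fraction for the cooling constant k."""
    return -math.log(remaining_fraction) / days

def transaction_heat(transactions, k):
    """Total heat of (amount, age_in_days) pairs; each decays as amount * exp(-k * age)."""
    return sum(amount * math.exp(-k * age) for amount, age in transactions)

def to_score(heat, midpoint=5.0, steepness=1.0):
    """Log-scale the heat, then map it onto 0-100 with a logistic (sigmoid) curve.
    midpoint and steepness are hypothetical tuning knobs."""
    x = math.log1p(heat)
    return 100.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

# Hypothetical boundary condition: a purchase keeps 10% of its heat after a year.
K = cooling_constant(365, 0.10)
```

Log-scaling first keeps a few very large orders from saturating the sigmoid. The sigmoid then bounds every user's score strictly between 0 and 100.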

Big Data · DMP · Newton cooling law

4 min read

DataFunTalk

Sep 17, 2019 · Artificial Intelligence

Machine Learning for Personalized Education Paths – Case Study and Reflections

This lecture explores how machine learning can generate individualized learning pathways for students by building knowledge dependency graphs, defining optimization goals, and leveraging historical data to rank candidate routes, while reflecting on data, model, business, and demand challenges in AI-driven education.
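Any candidate route through a knowledge dependency graph must respect prerequisites, so a topological order of the graph is a natural starting point before routes are ranked. The sketch below uses Python's standard `graphlib` to produce one valid order and to drop knowledge points the student has already mastered. The knowledge points are made up. The sketch does not show how historical data ranks the many valid orders, which is the part the lecture focuses on.

```python
from graphlib import TopologicalSorter

def learning_path(prerequisites, mastered=()):
    """Order knowledge points so each prerequisite precedes what depends on it.

    `prerequisites` maps a knowledge point to the points it depends on.
    Points the student has already mastered are left out of the path.
    """
    order = TopologicalSorter(prerequisites).static_order()
    done = set(mastered)
    return [point for point in order if point not in done]
```

A cyclic prerequisite graph makes `static_order()` raise `graphlib.CycleError`, which is a useful sanity check on hand-built dependency graphs.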

AI · Big Data · Knowledge Graph

10 min read

Big Data Technology & Architecture

Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · DataFlow

22 min read

Big Data Technology & Architecture

Sep 15, 2019 · Big Data

Flink Interview Guide: Concepts, Basics, Advanced Topics, and Source Code

This article presents a comprehensive collection of Flink interview questions covering fundamental concepts, advanced topics, and source‑code details to help candidates prepare effectively for Flink‑related technical interviews.

Apache Flink · Big Data · Data Processing

6 min read

Big Data Technology & Architecture

Sep 14, 2019 · Big Data

Comparison of Open-Source OLAP Engines for Real-Time Data Warehousing

This article reviews the concepts, criteria, and characteristics of major open‑source OLAP engines—including Hive, HAWQ, Spark SQL, Presto, Kylin, Impala, Druid, Greenplum, and ClickHouse—providing guidance on selecting the most suitable solution for various big‑data analytics scenarios.

Big Data · Data Warehouse · OLAP

19 min read

Big Data Technology & Architecture

Sep 13, 2019 · Big Data

Differences and Relationship Between HBase and Hive in Big Data Architecture

The article explains that HBase and Hive occupy distinct roles in big‑data systems—HBase handles real‑time random queries on massive detail data, while Hive provides batch‑oriented SQL‑based processing on HDFS—and describes how they are typically combined in a data pipeline.

Batch processing · Big Data · Data Architecture

5 min read

iQIYI Technical Product Team

Sep 12, 2019 · Artificial Intelligence

AI Technology Practice and Application in Entertainment

The iQIYI Technology Salon’s AI Technology Practice and Application series explains how AI reshapes entertainment by automating video and audio production, optimizing short‑video flows, enabling intelligent search, and leveraging big‑data analytics for behavior analysis, intent recognition, and personalized recommendations, supported by iQIYI’s robust AI platform.

AI technology · Big Data · Entertainment Industry

7 min read

Big Data Technology & Architecture

Sep 11, 2019 · Big Data

Big Data Technology and Architecture: Case Studies of Taobao, Didi, and Meituan

This article reviews the evolution and key components of big data platforms at leading Chinese internet companies—Taobao, Didi, and Meituan—detailing their data sources, synchronization tools, storage layers, processing engines, and scheduling systems to provide practical guidance for building robust big data infrastructures.

Big Data · Data Platform · ETL

9 min read

Tencent Cloud Developer

Sep 11, 2019 · Big Data

YARN Practice and Technical Evolution at Kuaishou

Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.

Big Data · Cluster Optimization · Distributed Systems

16 min read

DataFunTalk

Sep 10, 2019 · Big Data

Why We Should Ride the Big Data Carriage: Business Perspectives on Data Growth and Machine Learning

The article explains why businesses must embrace the rapid, non‑linear growth of data and machine‑learning technologies, illustrating how data volume and richer information can drive exponential business value, improve competitiveness, and create sustainable positive feedback loops across various industry scenarios.

AI · Big Data · Digital Transformation

13 min read

Tencent Cloud Developer

Sep 9, 2019 · Databases

Tencent Optimizes Elasticsearch High-Concurrency Write Performance, Cutting 10M Data Load Time by 20%

Tencent engineers improved Elasticsearch’s high‑concurrency write path, reducing the time to load ten million records from eighteen to fifteen minutes (a 20% throughput gain). The work earned thanks from Elastic’s CEO and showcases the company’s broader open‑source contributions and strategic cloud‑search partnership.
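Read as a throughput gain, the quoted times give the 20% figure. The load time itself shrinks by about 17%:

```latex
\frac{18\ \text{min}}{15\ \text{min}} = 1.2 \;\Rightarrow\; +20\%\ \text{records per minute},
\qquad
\frac{18 - 15}{18} = \frac{1}{6} \approx 16.7\%\ \text{less load time}
```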

Big Data · Elasticsearch · Open Source

6 min read

Alibaba Cloud Developer

Sep 9, 2019 · Big Data

Unlocking the Power of Unstructured Data: From AI Breakthroughs to Business Value

This article explains how unstructured data (documents, images, audio, video, and more) now makes up over 80% of all data. It outlines the characteristics and challenges of such data and compares it with structured data. It then showcases real-world AI applications such as ImageNet, intelligent customer service, and smart security, and proposes a roadmap for building a unified unstructured‑data asset.

Big Data · data analytics · machine learning

15 min read

58 Tech

Sep 6, 2019 · Big Data

Architecture and Technical Implementation of the WMDA Data Analytics Platform

The article details WMDA's end‑to‑end data analytics architecture. It covers codeless data collection (no manual event instrumentation) and real‑time and offline processing pipelines built on Spark Streaming, Druid, Hadoop, Kettle, and TaskServer. It also explains how these components work together to deliver comprehensive user behavior analysis.

Big Data · Druid · ETL

11 min read

Big Data Technology & Architecture

Sep 5, 2019 · Databases

Understanding HBase Connection Management and Best Practices

The article explains why HBase client connections should not be pooled, describes common misuse patterns, and details how the heavyweight, thread‑safe Connection object internally manages connections to HMaster, RegionServers, and ZooKeeper, recommending a single shared Connection per application.

Big Data · Database · HBase

10 min read

Big Data Technology & Architecture

Sep 5, 2019 · Big Data

Applying Flink CEP for Complex Event Processing at Haolo Mobility

This article explains how Flink CEP, a complex event processing library for Apache Flink, is employed at Haolo Mobility to detect intricate patterns in endless data streams by modeling patterns as states and using pattern conditions for state transitions, illustrating its practical application in real‑world big‑data scenarios.
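Flink CEP turns a pattern into a state machine. Each partial match waits in a state, and an event that satisfies that state's condition moves it to the next state. The tiny Python matcher below illustrates that idea with relaxed contiguity, where non-matching events are skipped (roughly like `followedBy`), plus a `within` time window. It is not Flink's NFA: it ignores after-match skip strategies and keyed streams. The function name and event format are hypothetical.

```python
def match_pattern(events, conditions, within):
    """Return every completed match of `conditions` (applied in order) over (ts, event) pairs.

    A partial match is (state, start_ts, matched), where `state` indexes the next condition.
    """
    partial = []
    matches = []
    for ts, event in events:
        # Expire partial matches that can no longer complete inside the window.
        partial = [p for p in partial if ts - p[1] <= within]
        still_open = []
        for state, start, matched in partial:
            if conditions[state](event):
                matched = matched + [(ts, event)]
                if state + 1 == len(conditions):
                    matches.append(matched)  # reached the final state
                else:
                    still_open.append((state + 1, start, matched))
            else:
                still_open.append((state, start, matched))  # relaxed contiguity: skip it
        # Any event satisfying the first condition also starts a new partial match.
        if conditions[0](event):
            if len(conditions) == 1:
                matches.append([(ts, event)])
            else:
                still_open.append((1, ts, [(ts, event)]))
        partial = still_open
    return matches
```

Take failures at t=1, 3, and 4 with a non-matching event at t=2. With a 10-unit window, the three failures form one match. Shrinking the window to 2 units rejects that match, because the first failure expires before the third one arrives.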

Big Data · CEP · Flink

2 min read

Big Data Technology & Architecture

Sep 4, 2019 · Big Data

Understanding Druid: Real‑time OLAP Architecture, Features, Ingestion, and Querying

This article provides a comprehensive overview of Apache Druid, covering its real‑time OLAP design, core features, six‑component architecture, segment storage model, data ingestion pipelines (including Tranquility and Kafka), native and SQL query interfaces, and practical tuning tips with code examples.

Apache · Big Data · Druid

17 min read

Big Data Technology & Architecture

Sep 4, 2019 · Artificial Intelligence

Understanding the Relationship Between AI, Big Data, and Cloud Computing

This article explores the historical development of artificial intelligence, its interplay with big data and cloud computing, examines realistic expectations for AI applications, and explains how massive data and scalable cloud resources together drive modern AI advancements.

AI · Big Data · Cloud Computing

13 min read

360 Tech Engineering

Sep 4, 2019 · Big Data

XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑threshold, highly stable distributed query engine that supports federated queries across heterogeneous data sources, offering push‑down optimization, metadata decentralization, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

Big Data · Distributed Query · SQL Federation

14 min read

Alibaba Cloud Developer

Sep 4, 2019 · Big Data

How Structured Big Data Storage Powers Modern Data Systems

This article explores the core components of data systems, the evolution toward lightweight, intelligent big data architectures, the distinction between primary and secondary storage, challenges of data replication, and how Alibaba Cloud's Tablestore implements advanced features such as storage‑compute separation, CDC, and multi‑model indexing for scalable, cost‑effective structured big data storage.

Big Data · CDC · Cloud Services

24 min read

DataFunTalk

Sep 3, 2019 · Big Data

The Value of Big Data in Machine Learning: Detailed Illustration and Insights

This article explains how big data enhances machine learning by enabling finer-grained data characterization, improving confidence in statistical conclusions, and supporting smarter learning through multiple stages of model development, illustrated with concrete examples and a discussion of sample size dilemmas.
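One way to see why more data raises confidence in a statistical conclusion uses the usual normal approximation. The 95% confidence interval for an observed rate p from n samples has half-width 1.96·√(p(1−p)/n), so quadrupling the sample only halves the uncertainty. The sketch below is a generic illustration of that standard formula, not a method taken from the article.

```python
import math

def ci_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion.
    z=1.96 corresponds to roughly 95% confidence."""
    return z * math.sqrt(p * (1.0 - p) / n)
```

For p = 0.5, 100 samples give about ±9.8 percentage points, while 10,000 samples give about ±0.98.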

Big Data · Data Analysis · machine learning

10 min read

360 Zhihui Cloud Developer

Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · Data Integration · SQL Engine

12 min read

Tongcheng Travel Technology Center

Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink

16 min read

Tencent Cloud Developer

Aug 30, 2019 · Big Data

How Tencent Cloud Leverages Spark, Elasticsearch, and Flink for PB‑Scale Data Warehousing

The Cloud+ Community and Kuaishou hosted a big‑data technology salon where experts described the evolution, architecture, and production deployments of Spark‑based cloud data warehouses, Elasticsearch, YARN, and Flink, highlighting trends, optimization techniques, and future directions for enterprise data analytics.

Big Data · Cloud Computing · Data Warehouse

22 min read

Beike Product & Technology

Aug 29, 2019 · Big Data

TiSpark Integration with TiDB/TiKV for Efficient Data Synchronization and OLAP in the Databus Project

This article introduces TiSpark—an extension of Spark that tightly integrates with TiDB/TiKV to enable high‑performance, scalable data synchronization and OLAP queries, details its architecture, key configuration, performance advantages over Spark SQL and Sqoop, and outlines its role in the Databus data‑integration platform.

Big Data · Data Integration · Performance Optimization

10 min read

360 Smart Cloud

Aug 29, 2019 · Artificial Intelligence

360 Selected to Build a National New‑Generation AI Open Innovation Platform for a Security Brain

At the 2019 World Artificial Intelligence Conference, the Ministry of Science and Technology announced ten national AI open‑innovation platforms, selecting 360 to lead the security‑brain platform, highlighting its role in AI‑driven cybersecurity, big‑data analytics, cloud and blockchain technologies.

360 · Big Data · Information Security

4 min read

58 Tech

Aug 29, 2019 · Information Security

Graph-Based Anomaly Detection Framework for Security Threats

The article presents a graph‑based anomaly detection architecture that counters fraudsters who keep rotating black‑market resources. It builds complex user‑traffic networks, mines graph similarities, and applies multi‑dimensional strategies to achieve high‑accuracy detection while meeting timeliness, performance, and interpretability requirements.

Anomaly Detection · Big Data · Information Security

8 min read

Xianyu Technology

Aug 28, 2019 · Big Data

Unified Search System Architecture and Automation for Multiple Business Scenarios

To avoid building separate search services for each Xianyu business, the team created a unified, generic search architecture based on Alibaba’s HA3 engine and a control layer that automates data dumping, indexing, query translation, and result ranking across five subsystems, enabling new services to be onboarded in minutes instead of weeks.

Big Data · automation · data pipeline

18 min read

dbaplus Community

Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing

18 min read

Big Data Technology & Architecture

Aug 27, 2019 · Big Data

Building a Data Warehouse: Architecture, ETL, Layering, Modeling, and Governance

This article explains how to build a data warehouse from scratch, covering its definition, system and collaboration layers, ETL requirements, data layering design, modeling steps, common challenges, and governance practices such as temporary table management and coding standards.

Big Data · Data Warehouse · ETL

13 min read

Big Data Technology & Architecture

Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Data Engineering

10 min read

Big Data Technology & Architecture

Aug 25, 2019 · Big Data

Tencent Oceanus: Evolution, Productization, and Optimizations of Real‑Time Stream Computing with Flink

This article recounts Tencent's journey from adopting Flink to building the Oceanus platform, detailing its architecture, product features, and a series of deep extensions—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, watermark idle detection, and log isolation—aimed at supporting trillion‑scale real‑time data processing.

Big Data · Flink · Oceanus

18 min read

Architects' Tech Alliance

Aug 24, 2019 · Big Data

Reimagining Big Data in a Post‑Hadoop World

The article analyzes the decline of Hadoop as the dominant big‑data platform, explains how cloud‑based services are replacing its complex on‑premises architecture, and outlines the lessons and future directions for enterprises navigating a post‑Hadoop landscape.

Big Data · Distributed Systems · Hadoop

12 min read

Youzan Coder

Aug 23, 2019 · Big Data

How to Build a Robust Event Logging Quality System with Real‑Time Validation

This article outlines common event‑logging quality problems and the remedies that together form a comprehensive quality center for big‑data platforms: a systematic registration and real‑time validation framework built on Flink, configurable rule syntax, explainable results, continuous monitoring, targeted optimizations, and an evaluation model.

Big Data · Data Quality · Flink

11 min read

Qunar Tech Salon

Aug 22, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article details Meituan's experience optimizing the Hadoop YARN fair scheduler, covering background challenges, architectural components, resource abstractions, scheduling flow, performance metrics, a series of code‑level optimizations, stability strategies for production rollout, and future directions for large‑scale cluster scheduling.

Big Data · Fair Scheduler · Load Simulation

23 min read

Big Data Technology Architecture

Aug 21, 2019 · Big Data

Key Big Data Terminology: Offline vs Real-time Computing, Real-time vs Ad Hoc Queries, OLTP vs OLAP, Row vs Column Storage

This article explains fundamental big‑data concepts by comparing offline (batch) and real‑time (stream) computing, distinguishing real‑time queries from ad‑hoc queries, clarifying OLTP versus OLAP workloads, and outlining the differences between row‑based and column‑based storage architectures.

Big Data · Column Storage · OLAP

5 min read

Big Data Technology & Architecture

Aug 20, 2019 · Big Data

OPPO’s Real‑Time Data Warehouse Construction with Apache Flink

The article summarizes a 2019 Apache Flink Meetup in Shenzhen, where the head of OPPO’s big‑data platform explained how the company built a real‑time data warehouse with Flink SQL extensions and presented four key aspects of the project, including its evolution, application cases, and future directions.

Big Data · Flink · OPPO

3 min read

Architects' Tech Alliance

Aug 20, 2019 · Big Data

Current State and Future Trends of Hadoop in the Big Data Landscape

Despite recent market turbulence and negative headlines, Hadoop market revenue continues to grow. The growth is driven by cloud migration, evolving storage solutions, and increasing adoption of related projects such as Spark and Kafka, and it positions Hadoop as a leading data‑lake technology.

Apache Spark · Big Data · Data Lake

8 min read

21CTO

Aug 20, 2019 · Big Data

How Mogu’s Advertising Platform Built a Real‑Time Data Pipeline with Storm, Flink, and Kylin

This article explains how Mogu’s advertising system designs and evolves a real‑time data pipeline—covering merchant and operation needs, data collection, cleaning, processing with Storm, Flink, and Kylin, and service guarantees—to enable high‑quality, low‑latency analytics for advertisers and the platform.

Advertising · Big Data · Flink

12 min read

DataFunTalk

Aug 20, 2019 · Artificial Intelligence

The Story of Machine Learning: Why Machines Can Learn and How Statistical Learning Makes It Possible

This article explains why machine learning relies on big‑data statistical learning, illustrating human learning through induction and deduction, presenting case studies that highlight the limits of anecdotal reasoning, and introducing the law of large numbers and probabilistic trust as foundations for reliable AI models.

Big Data · Learning Theory · machine learning

19 min read

Big Data Technology & Architecture

Aug 18, 2019 · Big Data

Flink Application Scenarios and Scale at Kuaishou

The article details how Kuaishou leverages Apache Flink for large‑scale stream processing, describing its application scenarios, cluster sizing, interval join optimization, RocksDB performance challenges, source throttling strategies, JobManager stability, frequent job failures, and platform‑wide improvements.

Big Data · Flink · Kuaishou

2 min read

Architects' Tech Alliance

Aug 18, 2019 · Big Data

Oracle Architecture and ASM Storage Configuration Overview

This article provides a comprehensive overview of Oracle database architecture, detailing memory, physical and logical structures, I/O characteristics of various files, differences between OLTP and OLAP workloads, and practical ASM configuration and storage optimization recommendations for high‑performance environments.

ASM · Big Data · Database Storage

12 min read

Didi Tech

Aug 17, 2019 · Industry Insights

How Didi’s Ride‑Sharing Data Transforms Automotive Finance Risk Management

This article analyzes how Didi applies big data from its unique ride‑hailing scenarios to automotive finance. It covers the business model, asset‑side and end‑to‑end risk challenges, data‑driven solutions, and future prospects for intelligent credit risk control in both enterprise and retail lending.

Big Data · Credit Scoring · Didi

14 min read

Youku Technology

Aug 15, 2019 · Big Data

Youku's Migration from Hadoop to Alibaba Cloud MaxCompute: Benefits and Technical Insights

Youku’s 2017 migration from an on‑premises Hadoop cluster to Alibaba Cloud MaxCompute delivered a unified, elastic data pipeline that cut compute and storage costs by roughly half, handled billions of daily log records, boosted performance and scalability, and empowered analysts with self‑service tools and a rich ecosystem.

Big Data · Data Migration · MaxCompute

12 min read

DataFunTalk

Aug 14, 2019 · Artificial Intelligence

Understanding Recommendation Systems: From Information Overload to Personalized AI Solutions

The article explores how the rapid growth of the internet has created information overload, discusses the challenges of recommendation systems such as sparsity and timeliness, outlines a four‑step personalized content pipeline, and highlights the interdisciplinary nature of building effective AI‑driven recommendation solutions.

AI · Big Data · Data Engineering

16 min read

Youzan Coder

Aug 14, 2019 · Big Data

Comprehensive Guide to Data Collection, Event Modeling, and Tracking in Big Data Platforms

The guide explains how comprehensive data collection in big‑data platforms relies on several pieces: a standardized event model, passive (codeless) and code‑based event tracking, multi‑platform SDKs, a log‑middleware layer, and precise location tracking. A tracking‑management platform ties them together, supporting workflow, testing, quality monitoring, and scalable infrastructure for future enhancements.

Analytics · Big Data · Data Collection

19 min read

Architecture Digest

Aug 14, 2019 · Big Data

Kafka Overview: Architecture, Storage Mechanism, Replication, and Consumer/Producer Model

Kafka is a distributed, partitioned, replicated messaging system originally developed by LinkedIn, offering high throughput, low latency, fault tolerance, and scalability; this article explains its core concepts, file storage design, partition replication, leader election, consumer groups, delivery guarantees, and operational considerations for big‑data pipelines.

Big Data · Distributed Systems · Kafka

56 min read

Amap Tech

Aug 13, 2019 · Artificial Intelligence

2019 Alibaba Cloud Yunqi (Apsara) Conference – Gaode Technology Session (Sept 27)

At the 2019 Alibaba Cloud Yunqi (Apsara) Conference in Hangzhou, Gaode Technology held a technical forum with talks from Gaode and Alibaba experts. Topics included visual intelligence, autonomous‑driving perception, the evolution of Gaode's client and traffic‑access architecture, fine‑grained positioning, route‑planning algorithms, and spatio‑temporal data applications.

Big Data · Location‑based services · cloud-native

8 min read

Big Data Technology & Architecture

Aug 12, 2019 · Big Data

Spark SQL Parameter Tuning and Performance Optimization (Spark 2.3.2)

This article explains how to troubleshoot and tune Spark SQL configuration parameters—covering exception‑related settings such as spark.sql.hive.convertMetastoreParquet, file‑ignore options, and partition verification, as well as performance‑focused tweaks like broadcast join thresholds, adaptive execution, and parquet schema merging—while providing a comprehensive parameter reference table.

Big Data · Hive Migration · Parameter Tuning

23 min read

Big Data Technology & Architecture

Aug 11, 2019 · Big Data

Deep Dive into Flink’s Network Stack: Credit‑Based Flow Control and Thread Model Optimizations

This article examines Flink’s industrial‑scale network stack, detailing the credit‑based flow control introduced in version 1.5, the refactored task‑IO thread collaboration, and serialization optimizations that together improve throughput and latency for large‑scale stream processing workloads.

Big Data · Credit-based Flow Control · Flink

12 min read

DevOps Cloud Academy

Aug 11, 2019 · Big Data

Overview of the MFS (MooseFS) Distributed File System: A GFS‑Like Architecture

The article explains the MFS (MooseFS) distributed file system and its four components (Master, Metalogger, Chunkserver, and Client). It also covers hardware recommendations, metadata handling, replication strategies, and FUSE‑based client mounting, serving as a guide to building a storage cluster modeled on the Google File System (GFS).

Big Data · Distributed File System · MFS

5 min read

360 Tech Engineering

Aug 9, 2019 · Information Security

Zhou Hongyi Highlights the Growing Threat of Cyber Warfare and the Need for Advanced Security Intelligence

In a speech at a digital summit in Sanya, Zhou Hongyi warned that cyber warfare has become a major national‑level threat. He outlined four key shifts in enterprise security and described 360's big‑data security brain, along with the company's plans to build a nationwide defensive ecosystem.

APT · Big Data · Cyber Warfare

5 min read

Big Data Technology & Architecture

Aug 8, 2019 · Big Data

Comprehensive Guide to Apache Kylin: Architecture, Concepts, Cube Design and Optimization

This article provides an in‑depth overview of Apache Kylin’s pre‑computation architecture, data‑warehouse concepts, step‑by‑step cube creation from Hive tables, and advanced optimization techniques such as derived dimensions, aggregation groups, and HBase row‑key encoding to achieve sub‑second OLAP queries on massive datasets.

Apache Kylin · Big Data · Cube

20 min read

360 Quality & Efficiency

Aug 8, 2019 · Big Data

An Introduction to Kafka: Architecture, Design Principles, and Common Issues

This article introduces Kafka, covering its definition, core concepts such as topics, partitions, offsets, producers and consumers, typical use cases, underlying design principles including message‑partition allocation and retention policies, processing mechanisms, and common troubleshooting questions for real‑world deployments.

Big Data · Distributed Messaging · Kafka

7 min read

vivo Internet Technology

Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging

36 min read

Ctrip Technology

Aug 7, 2019 · Big Data

Improving Log Replay Efficiency with Flink and Elasticsearch at Ctrip Ticket Frontend

The article describes how Ctrip's ticket front‑end team replaced a slow, manual log‑pulling process with a Flink‑based real‑time pipeline that streams Kafka data, indexes it in Elasticsearch, and enables second‑level log retrieval for automated scenario replay, dramatically reducing CI cycle time.

Automation testing · Big Data · Elasticsearch

7 min read

dbaplus Community

Aug 6, 2019 · Databases

How ClickHouse Powers Real‑Time Hotel Data Analytics at Ctrip

This article details the challenges Ctrip's hotel data platform faces with billions of daily updates and nearly a million queries, and explains why ClickHouse was chosen after evaluating several storage options. It then describes the full‑load and incremental pipelines, monitoring, server clustering, and practical tips that enable sub‑second query performance at massive scale.

Big Data · Ctrip · Data Warehouse

13 min read

Big Data Technology & Architecture

Aug 5, 2019 · Big Data

Apache Spark Latest Technological Developments and Outlook for Spark 3.0+

The article provides a comprehensive overview of recent Apache Spark advancements—including Delta Lake, Data Source V2, runtime optimizations, relational cache, cloud‑native challenges, AI integration via Project Hydrogen, and the anticipated features of Spark 3.0—highlighting how these innovations address modern data‑warehouse, cloud, and machine‑learning workloads.

Apache Spark · Big Data · Data Warehouse

17 min read

Alibaba Cloud Developer

Aug 5, 2019 · Cloud Computing

How Tmall’s Smart Stores Are Redefining New Retail with Cloud and Data

Alibaba’s senior tech expert Mu Jian explains how Tmall’s smart stores embody the new retail paradigm by leveraging cloud computing, big data, and digital tools to transform offline retail, enhance consumer experiences, streamline operations, and create integrated online‑offline ecosystems through cloud stores, cloud POS, and innovative marketing solutions.

Big Data · Cloud Computing · Digital Transformation

25 min read

Big Data Technology & Architecture

Aug 4, 2019 · Big Data

Apache Pulsar vs Apache Kafka: Architecture, Performance, and Advantages

This article compares Apache Kafka and Apache Pulsar, detailing Kafka's scalability challenges, Pulsar's architectural benefits, performance gains, multi‑tenant support, security features, and provides code examples and migration guidance for large‑scale streaming applications.

Apache Pulsar · Big Data · Distributed Systems

11 min read

Big Data Technology & Architecture

Aug 3, 2019 · Big Data

Understanding SparkEnv Initialization: Components and Their Setup

This article walks through the SparkEnv initialization process in Apache Spark, detailing how the driver and executor environments are created, the key components such as SecurityManager, RpcEnv, SerializerManager, BroadcastManager, MapOutputTracker, ShuffleManager, MemoryManager, BlockManager, MetricsSystem, and OutputCommitCoordinator are instantiated, and how the final SparkEnv instance is assembled and stored.

Big Data · Distributed computing · Scala

13 min read

Suning Technology

Aug 2, 2019 · Big Data

How Suning Uses Big Data to Revolutionize Retail Supply Chains

At the 15th China (Nanjing) International Software Expo, Suning's VP shared how the company applies big‑data analytics, the C2M model, and flexible manufacturing to personalize retail experiences, bridge online‑offline gaps, and drive data‑driven product development and supply‑chain efficiency.

Big Data · C2M · data driven

9 min read

Meituan Technology Team

Aug 1, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

Meituan improved its customized Hadoop YARN Fair Scheduler by pre‑computing resource usage, filtering out jobs with zero resource demand, and parallelizing queue sorting. These changes cut the time spent on sorting from 30 s to 5 s out of every minute and raised scheduling throughput to 50,000 containers per second. They also enabled live rollbacks and prepared the system for clusters of up to 10,000 nodes, with future scaling to hundreds of thousands.

Big Data · Fair Scheduler · Hadoop

24 min read

21CTO

Jul 31, 2019 · Artificial Intelligence

How JD Built a Scalable AI‑Powered Recommendation System

The article outlines JD’s evolution from rule‑based product suggestions in 2012 to a sophisticated, AI‑driven, multi‑screen personalized recommendation platform, detailing its product types, system architecture, data collection, offline and online computation, and the core recommendation engine that powers features like “Guess You Like.”

AI · Big Data · JD.com

14 min read

360 Tech Engineering

Jul 31, 2019 · Backend Development

Design and Key Technologies of the 360 Search Engine for Billion‑Scale Web Retrieval

This article explains how 360 Search processes billions of web pages daily, detailing its backend architecture, offline indexing, online retrieval, index organization, and relevance models that enable efficient search over a hundred‑billion‑scale web corpus.

Big Data · Distributed Systems · HBase

21 min read

dbaplus Community

Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka

14 min read

Big Data Technology & Architecture

Jul 29, 2019 · Databases

Comprehensive Comparison of Apache Kylin and Apache Doris: Architecture, Data Models, Storage, Query, and Operations

This article provides an in‑depth technical comparison of Apache Kylin and Apache Doris, covering their system architectures, aggregation and detail data models, storage engines, data import processes, query execution, deduplication, metadata handling, performance, high availability, maintainability, usability, schema‑change capabilities, features, and community ecosystems.

Apache Doris · Apache Kylin · Big Data

21 min read