Tagged articles
3697 articles
Efficient Ops
Oct 14, 2019 · Operations

How AIOps Transforms IT Operations: Real-World Architecture and Lessons

This article shares a practical case study of implementing AIOps at an online‑education company, covering the pain points created by massive volumes of monitoring data, the architecture built around real‑time processing and machine‑learning pipelines, and the challenges and opportunities of intelligent operations.

Big Data · IT Operations · aiops
14 min read
JD Retail Technology
Oct 14, 2019 · Databases

Overview of JDNoSQL Platform and Its Real-Time Advertising Use Cases

The article introduces JDNoSQL, a distributed column‑oriented key‑value store built on HDFS, outlines its core features, describes various business scenarios including real‑time ad computation, details the system architecture with Kafka and Flink, and presents table designs for ad impression and click statistics.

Big Data · Flink · Kafka
13 min read
Big Data Technology & Architecture
Oct 14, 2019 · Big Data

Optimizing Spark PageRank: Cache, Checkpoint, Data Skew, and Resource Utilization

This article presents a comprehensive analysis of Spark PageRank performance, detailing the algorithm's basics, the original example code, and four key optimizations—caching with checkpointing, memory‑efficient data structures, handling data skew, and maximizing executor and driver resource usage—backed by experimental results and practical recommendations.

Big Data · Cache · Checkpoint
18 min read
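For readers new to the algorithm the Spark article starts from, the core PageRank iteration can be sketched in plain Python. This is a minimal single-machine illustration, not the article's Spark code: each page splits its current rank evenly across its outgoing links, and the new rank is 0.15 + 0.85 × (sum of incoming contributions), the same update the classic Spark example expresses with `reduceByKey` and `mapValues`. The function name `pagerank` and the default of 20 iterations are assumptions made for this sketch.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 20) -> Dict[str, float]:
    """Iterative PageRank over an adjacency list {page: [outgoing pages]}.

    Uses the same update as the classic Spark example:
    new_rank = (1 - damping) + damping * sum(incoming contributions).
    """
    ranks: Dict[str, float] = {page: 1.0 for page in links}
    for _ in range(iterations):
        # Every page starts each round with no incoming contribution.
        contribs: Dict[str, float] = {page: 0.0 for page in links}
        for page, outs in links.items():
            if not outs:
                continue  # dangling page: its rank is simply dropped in this variant
            share = ranks[page] / len(outs)
            for dest in outs:
                contribs[dest] = contribs.get(dest, 0.0) + share
        ranks = {page: (1 - damping) + damping * total
                 for page, total in contribs.items()}
    return ranks
```

One small difference: pages that receive no incoming links keep the base rank of 0.15 here, whereas the join-based Spark example drops them from `ranks`. In Spark, the `links` data is reused on every iteration while the `ranks` lineage keeps growing, which is why caching and checkpointing are the first optimizations the article examines.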
58 Tech
Oct 10, 2019 · Big Data

Optimizing Real‑Time Feature Extraction at 58.com: Migrating from Spark Streaming to Flink

This article describes how 58.com’s commercial engineering team redesigned its real‑time feature‑mining pipeline—replacing a minute‑level Spark Streaming framework with Flink—to achieve sub‑second latency, higher throughput, stronger fault‑tolerance, and end‑to‑end exactly‑once semantics for user‑profile generation in the second‑hand‑car recommendation scenario.

Big Data · Exactly-Once · Flink
14 min read
Sohu Tech Products
Oct 9, 2019 · Databases

HBase Table Design Strategies: Data Model, Column Descriptors, RowKey, Region and Performance Optimization

This article explains HBase’s data model and presents table‑design strategies, including column‑descriptor options, row‑key best practices, tall‑versus‑wide table trade‑offs, and region splitting and pre‑splitting techniques, to help achieve optimal performance and scalability in large‑scale NoSQL workloads.

Big Data · Column Family · HBase
16 min read
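One widely used row-key practice of the kind the HBase article covers is salting: prefixing a monotonically increasing key (a timestamp or sequence ID) with a hash bucket so that writes spread across regions, and then pre-splitting the table at the bucket boundaries. The sketch below illustrates the idea under stated assumptions and is not the article's scheme; `salted_rowkey`, `split_points`, the MD5 hash (used only for even distribution, not security), the `|` separator, and the 16-bucket default are all hypothetical choices.

```python
import hashlib
from typing import List


def salted_rowkey(key: str, buckets: int = 16) -> str:
    """Prefix `key` with a deterministic two-digit hash bucket, e.g. '07|key'.

    Sequential keys then land in different regions instead of all hitting
    the single region that holds the newest key.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("two-digit prefixes only sort correctly for 1..100 buckets")
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return f"{bucket:02d}|{key}"


def split_points(buckets: int = 16) -> List[str]:
    """Region boundaries for pre-splitting, one region per salt bucket.

    The first region implicitly starts at the empty row key, so only
    buckets - 1 boundaries ('01', '02', and so on) are needed.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("two-digit prefixes only sort correctly for 1..100 buckets")
    return [f"{b:02d}" for b in range(1, buckets)]
```

The boundaries returned by `split_points()` could be supplied when the table is created, for example through the `SPLITS` option of the HBase shell `create` command. The trade-off is that a range scan over the natural key must now fan out to every bucket.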
Big Data Technology & Architecture
Oct 9, 2019 · Big Data

Choosing and Using Flink State Backends: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend

This article explains how Flink checkpoints persist state, compares the three built‑in state backends (MemoryStateBackend, FsStateBackend, RocksDBStateBackend), discusses their configurations, advantages, limitations, and provides guidance on selecting the appropriate backend for different big‑data streaming scenarios.

Big Data · Checkpoint · Flink
10 min read
Alibaba Cloud Infrastructure
Oct 9, 2019 · Cloud Computing

The Next Decade of Cloud Networking: Highlights from Alibaba Cloud Network Forum at Yunqi Conference 2019

The 2019 Yunqi Conference Cloud Network Forum gathered over two hundred network enthusiasts to review a decade of Alibaba data‑center networking evolution, explore emerging technologies such as AI, big data, and programmable chips, and outline the next ten years of high‑performance, data‑centric cloud networking.

Big Data · High‑Performance Networking · network architecture
9 min read
dbaplus Community
Oct 8, 2019 · Big Data

How to Master Large-Scale Cluster Management: 10 Real-World Troubleshooting Cases

This article shares a senior data‑platform engineer's hands‑on experience managing dozens of thousand‑node clusters, detailing nine common cluster problems and step‑by‑step solutions—including performance tuning, RPC fixes, HDFS cleanup, Hive metadata repair, Spark shuffle optimization, HBase region recovery, and Kafka bottleneck mitigation.

Big Data · Cluster Management · HBase
17 min read
Architects' Tech Alliance
Oct 7, 2019 · Industry Insights

How Google’s Vision Drove the PC Web, Big Data, and Cloud Revolutions

The article traces Google’s decade‑long influence on the PC Web era and its pioneering technologies in search, email, infrastructure, big data, cloud computing, and mobile, explaining how its philosophy both propelled each wave of internet innovation and caused it to miss commercial opportunities along the way.

Big Data · Cloud Computing · Google
11 min read
Big Data Technology & Architecture
Oct 3, 2019 · Big Data

Data Development Interview Tips and Career Guidance

This article offers practical advice for data development job interviews, explaining why Java is essential, comparing Java and Python, outlining required backend framework knowledge, discussing the role of SQL and data warehousing, and addressing work‑life concerns such as overtime and company size choices.

Big Data · Data Engineering · Interview
4 min read
Programmer DD
Sep 29, 2019 · Big Data

Can 1.4 Billion People Share a Single WeChat Group? A Technical Deep‑Dive

This article explores whether it is technically feasible to place all 1.4 billion Chinese users into one WeChat group, analyzing population statistics, message volume, CPU processing limits, network bandwidth, storage requirements, and cost implications with supporting calculations and references.

Big Data · Distributed Systems · Network Bandwidth
12 min read
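The heart of the single-WeChat-group question is fan-out: every message sent in a group must be delivered to every other member, so delivery load grows with group size multiplied by send rate. A minimal back-of-envelope helper is sketched below, assuming a uniform message size and ignoring protocol overhead, retries, and offline storage; `group_fanout` and its parameters are hypothetical and are not the article's model or figures.

```python
from typing import Dict


def group_fanout(members: int, msgs_per_sec: float, bytes_per_msg: int) -> Dict[str, float]:
    """Back-of-envelope delivery load for one chat group.

    members       -- number of people in the group
    msgs_per_sec  -- messages sent per second by the whole group
    bytes_per_msg -- average payload size (protocol overhead ignored)

    Each message is written once but delivered to members - 1 recipients.
    """
    deliveries_per_sec = msgs_per_sec * (members - 1)
    egress_bits_per_sec = deliveries_per_sec * bytes_per_msg * 8
    return {
        "deliveries_per_sec": deliveries_per_sec,
        "egress_gbps": egress_bits_per_sec / 1e9,
    }
```

Plugging in 1.4 billion members shows why the question is really about fan-out: even a single message per second across the whole group implies roughly 1.4 billion deliveries per second.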
Xueersi Online School Tech Team
Sep 27, 2019 · Big Data

Design Principles and Architecture of Apache Kylin for Sub‑Second OLAP Queries

This article explains how Apache Kylin, an open‑source distributed analytics engine built on Hadoop/Spark, achieves sub‑second OLAP query performance through pre‑computed cubes, a layered cuboid generation algorithm, bitmap‑based distinct counting, dimension optimization techniques, and tight integration with HBase for storage and fast SQL querying.

Apache Kylin · Big Data · Cube
15 min read
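The precomputation idea behind Kylin can be illustrated with a toy in-memory cube builder. With n dimensions there are 2^n cuboids, one per dimension subset. Building them layer by layer means the base cuboid (all n dimensions) is aggregated from the raw rows once, and each cuboid in the next layer is derived from an already-built parent with one more dimension, which is much cheaper than rescanning the source. This is a simplified sketch under stated assumptions: it handles only a SUM measure, keeps everything in Python dicts instead of HBase, and omits the bitmap-based distinct counting and dimension optimizations the article describes; `build_cube` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Tuple

Cuboid = FrozenSet[str]


def build_cube(rows: List[Dict[str, Any]], dims: List[str],
               measure: str) -> Dict[Cuboid, Dict[Tuple, float]]:
    """Precompute SUM(measure) for every subset of `dims` (2**n cuboids).

    The base cuboid is aggregated once from the raw rows. Each later layer
    is derived from the previous one by dropping a single dimension from an
    already-built parent. Group keys are tuples ordered as in `dims`.
    """
    cube: Dict[Cuboid, Dict[Tuple, float]] = {}
    base: Cuboid = frozenset(dims)
    base_agg: Dict[Tuple, float] = defaultdict(float)
    for row in rows:
        base_agg[tuple(row[d] for d in dims)] += row[measure]
    cube[base] = dict(base_agg)

    layer = [base]
    while layer:
        next_layer = []
        for parent in layer:
            parent_dims = [d for d in dims if d in parent]
            for drop in parent_dims:
                child = parent - {drop}
                if child in cube:
                    continue  # already derived from another parent
                keep = [i for i, d in enumerate(parent_dims) if d != drop]
                child_agg: Dict[Tuple, float] = defaultdict(float)
                for key, value in cube[parent].items():
                    child_agg[tuple(key[i] for i in keep)] += value
                cube[child] = dict(child_agg)
                next_layer.append(child)
        layer = next_layer
    return cube
```

A query such as `SELECT city, SUM(sales) FROM t GROUP BY city` (table `t` is hypothetical) then becomes a lookup in `cube[frozenset({"city"})]` rather than a scan, which is the source of the sub-second latency; the cost is that storage grows with the number of cuboids, hence the dimension optimizations.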
Meituan Technology Team
Sep 26, 2019 · Big Data

Big Data Technology: Commercial Applications and Practice – A Collaborative Course between Meituan and Tsinghua University

Meituan’s big‑data team and Tsinghua’s Electronic Engineering Department have launched a master‑level, credit‑bearing course that blends theory with 24 hours of hands‑on training, showcases Meituan’s real‑world data infrastructure and applications, and aims to create a recurring bridge between academia and industry while recruiting top talent.

Big Data · Commercial Application · Meituan
6 min read
dbaplus Community
Sep 24, 2019 · Big Data

How Weibo Turns Big Data into Revenue: Insights from a 2019 DAMS Talk

The presentation explains how Weibo leverages big‑data technologies, user profiling, and social‑first advertising models to drive commercial growth, detailing data‑driven product development, real‑time and offline data warehouses, scientific experiments, and case studies that illustrate the impact on revenue and user engagement.

Advertising · Big Data · Data Warehouse
24 min read
Alibaba Cloud Developer
Sep 24, 2019 · Artificial Intelligence

How Semi‑Supervised Deep Learning Detects Road Closures in Real‑Time

Gaode’s engineering team presents a semi‑supervised deep‑learning framework that models road networks, extracts traffic, routing, deviation and heatmap features, and combines LSTM with ResNet to accurately identify dynamic road‑closure events, enabling both offline and real‑time detection with high confidence and business‑aligned validation.

Big Data · LSTM · ResNet · Semi-supervised Learning
12 min read
Snowball Engineer Team
Sep 24, 2019 · Big Data

Snowball Data Middle Platform (AIBO): Architecture, Capabilities, and Future Outlook

The article introduces Snowball's AIBO data middle platform, detailing its storage‑compute separation architecture, core capabilities such as data integration, catalog, tagging, analysis tools, micro‑service data APIs, and outlines future enhancements for security, lineage, and continuous business‑driven iteration.

Big Data · Data Analysis · Data Catalog
12 min read
Alibaba Cloud Developer
Sep 24, 2019 · Big Data

Inside Alibaba’s 10‑Year Search Engine: Architecture, Data Flow, and Indexing

Alibaba’s 10‑year‑old search engine combines data source aggregation, incremental and real‑time indexing, and online services through platforms like Tisplus, Bahamut, Maat, Ha3, Build Service and Drogo, illustrating a comprehensive architecture that powers 1688’s search capabilities across multiple engines and deployment pipelines.

Backend Architecture · Big Data · Distributed Systems
10 min read
Big Data Technology & Architecture
Sep 23, 2019 · Big Data

Applying Apache Kylin for Large‑Scale OLAP at Meituan: Architecture, Challenges, and Performance Evaluation

This article describes Meituan’s large‑scale OLAP requirements, how Apache Kylin was integrated to meet them, the architectural solutions, performance benchmarks against other engines, and future work, providing practical insights for building stable, precise, and high‑performance analytics platforms.

Apache Kylin · Big Data · Data Warehouse
20 min read
Big Data Technology & Architecture
Sep 22, 2019 · Databases

Alibaba Cloud BDS Service for Non‑Stop HBase Cluster Migration

This article explains how Alibaba Cloud's BDS migration service enables continuous, high‑performance migration of HBase clusters—including schema, full data, and incremental sync—across version upgrades, hardware changes, network migrations, and cross‑region scenarios, while ensuring stability and minimal impact on live workloads.

Alibaba Cloud · BDS · Big Data
10 min read
Big Data Technology & Architecture
Sep 21, 2019 · Big Data

Deploying Apache Flink on Kubernetes: A Step‑by‑Step Guide

This tutorial explains how to run Apache Flink jobs on Kubernetes by building Docker images, deploying JobManager and TaskManager components with Kubernetes manifests, configuring high availability with ZooKeeper and HDFS, and using savepoints and scaling techniques to manage and extend Flink streaming applications.

Big Data · Docker · Flink
14 min read
Beike Product & Technology
Sep 20, 2019 · Big Data

Understanding DStream Construction and Execution in Spark Streaming

This article explains how Spark Streaming's DStream abstraction is built from InputDStream through successive transform operators, details the internal ForEachDStream implementation, describes the job generation and scheduling workflow, and outlines how Beike's real‑time platform leverages these mechanisms for large‑scale streaming tasks.

Big Data · Dstream · Real-time Processing
10 min read
Suning Technology
Sep 20, 2019 · Big Data

How Suning’s Big Data Engine Powers Smart Retail Transformation

Suning’s big‑data center, built on a 30‑year retail evolution and leveraging technologies like AI, cloud, and IoT, showcases how integrated data platforms and robust security can drive smart retail, improve services for 600 million users, and create a new competitive edge.

AI · Big Data · Cloud Computing
6 min read
Big Data Technology & Architecture
Sep 19, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article presents a comprehensive analysis of Meituan's Hadoop YARN fair scheduler, detailing its architecture, resource abstractions, scheduling workflow, performance bottlenecks, fine‑grained metrics, and a series of optimization techniques—including sorting improvements, job‑skip reduction, parallel queue sorting, and robust rollout strategies—to achieve high‑throughput, low‑latency scheduling for large‑scale offline, streaming, and machine‑learning workloads.

Big Data · Fair Scheduler · Performance Optimization
24 min read
FunTester
Sep 19, 2019 · Operations

Emerging Technologies Shaping DevOps and Software Testing in the Next Decade

Over the next decade, rapid advances in IoT, AI, big data, and pervasive automation such as cognitive RPA will transform DevOps practices, driving more integrated, intelligent testing and continuous delivery pipelines, while organizations mature their digital transformation journeys to meet increasingly complex, data‑driven operational demands.

AI · Big Data · IoT
8 min read
Efficient Ops
Sep 18, 2019 · Databases

Why the DBA Role Is Becoming a Narrowed, High‑Risk Career Path

The article analyzes how the DBA job market is shrinking as traditional enterprises shift away from legacy systems, cloud adoption reshapes responsibilities, and DBAs face limited advancement unless they transition to architecture or data‑analytics roles, highlighting the growing risk and low reward of staying in pure DBA work.

Big Data · DBA · Database Administration
7 min read
Youzan Coder
Sep 18, 2019 · Big Data

Applying Newton's Law of Cooling to Transaction Scoring in DMP User Profiling

The article proposes using Newton’s law of cooling to score DMP user transactions, assigning higher weights to recent purchases that decay exponentially over time, deriving a cooling constant from boundary conditions, and normalizing the resulting heat‑based scores through log‑scaling and a sigmoid‑like mapping to a 0‑100 range.

Big Data · DMP · Newton cooling law
4 min read
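The scoring scheme summarized above can be sketched as follows. Newton's law of cooling with an ambient temperature of 0 gives a purchase that is t days old a weight of e^(-k·t); fixing a boundary condition such as "a purchase keeps half its weight after 30 days" pins down k = -ln(0.5)/30. The specific boundary condition, the use of `log1p`, and the tanh-shaped squashing into [0, 100) are assumptions for illustration, since the summary does not give the article's exact constants or mapping; all function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def cooling_constant(days: float, remaining: float) -> float:
    """Solve e^(-k * days) = remaining for k.

    Assumed boundary condition: a purchase keeps `remaining` of its
    initial weight after `days` days (e.g. half after 30 days).
    """
    return -math.log(remaining) / days


def transaction_heat(purchases: Iterable[Tuple[float, float]], k: float) -> float:
    """Sum amount * e^(-k * age_days) over (amount, age_days) pairs.

    Newton's law of cooling with ambient temperature 0: each purchase
    starts at its amount and cools exponentially with age.
    """
    return sum(amount * math.exp(-k * age) for amount, age in purchases)


def score(heat: float) -> float:
    """Map a non-negative heat value onto [0, 100).

    log1p compresses very large spenders; 2 * sigmoid(x) - 1, which equals
    tanh(x / 2), then gives an S-shaped curve that saturates near 100.
    """
    x = math.log1p(heat)
    return 100.0 * math.tanh(x / 2.0)
```

For example, with `k = cooling_constant(30, 0.5)`, a purchase made today counts four times as much as an identical one made 60 days ago, because e^(-k·60) = 0.25.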
DataFunTalk
Sep 17, 2019 · Artificial Intelligence

Machine Learning for Personalized Education Paths – Case Study and Reflections

This lecture explores how machine learning can generate individualized learning pathways for students by building knowledge dependency graphs, defining optimization goals, and leveraging historical data to rank candidate routes, while reflecting on data, model, business, and demand challenges in AI-driven education.

AI · Big Data · Knowledge Graph
10 min read
Big Data Technology & Architecture
Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · DataFlow
22 min read
iQIYI Technical Product Team
Sep 12, 2019 · Artificial Intelligence

AI Technology Practice and Application in Entertainment

The iQiyi Technology Salon’s AI Technology Practice and Application series explains how AI reshapes entertainment by automating video and audio production, optimizing short‑video flows, enabling intelligent search, and leveraging big‑data analytics for behavior analysis, intent recognition, and personalized recommendations, supported by iQiyi’s robust AI platform.

AI technology · Big Data · Entertainment Industry
7 min read
Big Data Technology & Architecture
Sep 11, 2019 · Big Data

Big Data Technology and Architecture: Case Studies of Taobao, Didi, and Meituan

This article reviews the evolution and key components of big data platforms at leading Chinese internet companies—Taobao, Didi, and Meituan—detailing their data sources, synchronization tools, storage layers, processing engines, and scheduling systems to provide practical guidance for building robust big data infrastructures.

Big Data · Data Platform · ETL
9 min read
Tencent Cloud Developer
Sep 11, 2019 · Big Data

YARN Practice and Technical Evolution at Kuaishou

Jiaoxiao Fang’s talk details Kuaishou’s YARN deployment, covering its architecture, support for offline, real‑time and ML workloads, and recent enhancements such as event‑handling stability, refined preemption, high‑throughput parallel scheduling, shuffle‑caching for small I/O, plus plans for job protection and multi‑cluster resource utilization.

Big Data · Cluster Optimization · Distributed Systems
16 min read
DataFunTalk
Sep 10, 2019 · Big Data

Why We Should Ride the Big Data Carriage: Business Perspectives on Data Growth and Machine Learning

The article explains why businesses must embrace the rapid, non‑linear growth of data and machine‑learning technologies, illustrating how data volume and richer information can drive exponential business value, improve competitiveness, and create sustainable positive feedback loops across various industry scenarios.

AI · Big Data · Digital Transformation
13 min read
Tencent Cloud Developer
Sep 9, 2019 · Databases

Tencent Optimizes Elasticsearch High-Concurrency Write Performance, Cutting 10M Data Load Time by 20%

Tencent engineers improved Elasticsearch’s high‑concurrency write path, reducing the time to load ten million records from eighteen to fifteen minutes (a 20% throughput gain, or roughly 17% less load time). The work earned thanks from Elastic’s CEO and showcases the company’s broader open‑source contributions and strategic cloud‑search partnership.

Big Data · Elasticsearch · Open Source
6 min read
Alibaba Cloud Developer
Sep 9, 2019 · Big Data

Unlocking the Power of Unstructured Data: From AI Breakthroughs to Business Value

This article explains how unstructured data (documents, images, audio, video, and more) now accounts for over 80% of all data, outlines its characteristics and challenges, compares it with structured data, and showcases real-world AI applications such as ImageNet, intelligent customer service, and smart security, while proposing a roadmap for building a unified unstructured‑data asset.

Big Data · data analytics · machine learning
15 min read
58 Tech
Sep 6, 2019 · Big Data

Architecture and Technical Implementation of the WMDA Data Analytics Platform

The article details WMDA's end‑to‑end data analytics architecture, covering codeless (no‑instrumentation) event collection, real‑time and offline processing pipelines built on Spark Streaming, Druid, Hadoop, Kettle, and TaskServer, and explains how these components work together to deliver comprehensive user‑behavior analysis.

Big Data · Druid · ETL
11 min read
360 Tech Engineering
Sep 4, 2019 · Big Data

XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑threshold, highly stable distributed query engine that supports federated queries across heterogeneous data sources, offering push‑down optimization, metadata decentralization, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

Big Data · Distributed Query · SQL Federation
14 min read
Alibaba Cloud Developer
Sep 4, 2019 · Big Data

How Structured Big Data Storage Powers Modern Data Systems

This article explores the core components of data systems, the evolution toward lightweight, intelligent big data architectures, the distinction between primary and secondary storage, challenges of data replication, and how Alibaba Cloud's Tablestore implements advanced features such as storage‑compute separation, CDC, and multi‑model indexing for scalable, cost‑effective structured big data storage.

Big Data · CDC · Cloud Services
24 min read
DataFunTalk
Sep 3, 2019 · Big Data

The Value of Big Data in Machine Learning: Detailed Illustration and Insights

This article explains how big data enhances machine learning by enabling finer-grained data characterization, improving confidence in statistical conclusions, and supporting smarter learning through multiple stages of model development, illustrated with concrete examples and a discussion of sample size dilemmas.

Big Data · Data Analysis · machine learning
10 min read
360 Zhihui Cloud Developer
Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · Data Integration · SQL Engine
12 min read
Tongcheng Travel Technology Center
Sep 3, 2019 · Big Data

Practical Experiences and Lessons Learned in Building a Flink‑Based Real‑Time Computing Platform at Tongcheng‑Elong

This article details the design, implementation, and optimization of a Flink‑based real‑time computing platform at Tongcheng‑Elong, covering the evolution from Storm to Flink, support for FlinkSQL and FlinkStream, metric collection, logging, data lineage, savepoint management, and numerous stability fixes contributed back to the open‑source community.

Big Data · Data Lineage · Flink
16 min read
Tencent Cloud Developer
Aug 30, 2019 · Big Data

How Tencent Cloud Leverages Spark, ElasticSearch, and Flink for PB‑Scale Data Warehousing

The Cloud+ Community and Kuaishou hosted a big‑data technology salon where experts detailed the evolution, architecture, and practical deployments of Spark‑based cloud data warehouses, Elasticsearch, YARN, and Flink, highlighting trends, optimization techniques, and future directions for enterprise data analytics.

Big Data · Cloud Computing · Data Warehouse
22 min read
Beike Product & Technology
Aug 29, 2019 · Big Data

TiSpark Integration with TiDB/TiKV for Efficient Data Synchronization and OLAP in the Databus Project

This article introduces TiSpark—an extension of Spark that tightly integrates with TiDB/TiKV to enable high‑performance, scalable data synchronization and OLAP queries, details its architecture, key configuration, performance advantages over Spark SQL and Sqoop, and outlines its role in the Databus data‑integration platform.

Big Data · Data Integration · Performance Optimization
10 min read
360 Smart Cloud
Aug 29, 2019 · Artificial Intelligence

360 Selected to Build a National New‑Generation AI Open Innovation Platform for a Security Brain

At the 2019 World Artificial Intelligence Conference, the Ministry of Science and Technology announced ten national AI open‑innovation platforms, selecting 360 to lead the security‑brain platform, highlighting its role in AI‑driven cybersecurity, big‑data analytics, cloud and blockchain technologies.

360 · Big Data · Information Security
4 min read
58 Tech
Aug 29, 2019 · Information Security

Graph-Based Anomaly Detection Framework for Security Threats

The article presents a graph‑based anomaly detection architecture that tackles black‑market resource switching by constructing complex user‑traffic networks, mining graph similarities, and applying multi‑dimensional strategies to achieve high‑accuracy detection while meeting timeliness, performance, and interpretability requirements.

Anomaly Detection · Big Data · Information Security
8 min read
Xianyu Technology
Aug 28, 2019 · Big Data

Unified Search System Architecture and Automation for Multiple Business Scenarios

To avoid building separate search services for each Xianyu business, the team created a unified, generic search architecture based on Alibaba’s HA3 engine and a control layer that automates data dumping, indexing, query translation, and result ranking across five subsystems, enabling new services to be onboarded in minutes instead of weeks.

Big Data · automation · data pipeline
18 min read
dbaplus Community
Aug 27, 2019 · Big Data

How eBay Scales Real‑Time Monitoring with Flink: Metadata‑Driven Streaming

This article explains how eBay’s Sherlock.IO monitoring platform processes billions of logs, events, and metrics daily using Flink Streaming jobs, detailing a metadata‑driven architecture, shared job strategies, Heartbeat‑based monitoring, job isolation, back‑pressure handling, and real‑world use cases such as Event Alerting, Eventzon, and Netmon.

Big Data · Flink · Real-time Processing
18 min read
Big Data Technology & Architecture
Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Data Engineering
10 min read
Big Data Technology & Architecture
Aug 25, 2019 · Big Data

Tencent Oceanus: Evolution, Productization, and Optimizations of Real‑Time Stream Computing with Flink

This article recounts Tencent's journey from adopting Flink to building the Oceanus platform, detailing its architecture, product features, and a series of deep extensions—including UI redesign, JobManager failover, checkpoint handling, enhanced windows, LocalKeyBy, watermark idle detection, and log isolation—aimed at supporting trillion‑scale real‑time data processing.

Big Data · Flink · Oceanus
18 min read
Architects' Tech Alliance
Aug 24, 2019 · Big Data

Reimagining Big Data in a Post‑Hadoop World

The article analyzes the decline of Hadoop as the dominant big‑data platform, explains how cloud‑based services are replacing its complex on‑premises architecture, and outlines the lessons and future directions for enterprises navigating a post‑Hadoop landscape.

Big Data · Distributed Systems · Hadoop
12 min read
Youzan Coder
Aug 23, 2019 · Big Data

How to Build a Robust Event Logging Quality System with Real‑Time Validation

This article outlines common event‑logging quality problems, a systematic registration and real‑time validation framework built on Flink, configurable rule syntax, explainable results, continuous monitoring, targeted optimizations, and an evaluation model that together form a comprehensive quality‑center for big‑data platforms.

Big Data · Data Quality · Flink
11 min read
Qunar Tech Salon
Aug 22, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article details Meituan's experience optimizing the Hadoop YARN fair scheduler, covering background challenges, architectural components, resource abstractions, scheduling flow, performance metrics, a series of code‑level optimizations, stability strategies for production rollout, and future directions for large‑scale cluster scheduling.

Big Data · Fair Scheduler · Load Simulation
23 min read
Big Data Technology Architecture
Aug 21, 2019 · Big Data

Key Big Data Terminology: Offline vs Real-time Computing, Real-time vs Ad Hoc Queries, OLTP vs OLAP, Row vs Column Storage

This article explains fundamental big‑data concepts by comparing offline (batch) and real‑time (stream) computing, distinguishing real‑time queries from ad‑hoc queries, clarifying OLTP versus OLAP workloads, and outlining the differences between row‑based and column‑based storage architectures.

Big Data · Column Storage · OLAP
0 likes · 5 min read
DataFunTalk
Aug 20, 2019 · Artificial Intelligence

The Story of Machine Learning: Why Machines Can Learn and How Statistical Learning Makes It Possible

This article explains why machine learning relies on big‑data statistical learning, illustrating human learning through induction and deduction, presenting case studies that highlight the limits of anecdotal reasoning, and introducing the law of large numbers and probabilistic trust as foundations for reliable AI models.

Big Data · Learning Theory · machine learning
0 likes · 19 min read
Big Data Technology & Architecture
Aug 18, 2019 · Big Data

Flink Application Scenarios and Scale at Kuaishou

The article details how Kuaishou leverages Apache Flink for large‑scale stream processing, describing its application scenarios, cluster sizing, interval join optimization, RocksDB performance challenges, source throttling strategies, JobManager stability, frequent job failures, and platform‑wide improvements.

Big Data · Flink · Kuaishou
0 likes · 2 min read
Architects' Tech Alliance
Aug 18, 2019 · Big Data

Oracle Architecture and ASM Storage Configuration Overview

This article provides a comprehensive overview of Oracle database architecture, detailing memory, physical and logical structures, I/O characteristics of various files, differences between OLTP and OLAP workloads, and practical ASM configuration and storage optimization recommendations for high‑performance environments.

ASM · Big Data · Database Storage
0 likes · 12 min read
Didi Tech
Aug 17, 2019 · Industry Insights

How Didi’s Ride‑Sharing Data Transforms Automotive Finance Risk Management

This article analyzes how big data from Didi’s unique ride‑hailing scenarios is applied to automotive finance, detailing the business model, asset‑side and full‑process risk challenges, data‑driven solutions, and future prospects for intelligent credit risk control in both enterprise and retail lending.

Big Data · Credit Scoring · Didi
0 likes · 14 min read
Youku Technology
Aug 15, 2019 · Big Data

Youku's Migration from Hadoop to Alibaba Cloud MaxCompute: Benefits and Technical Insights

Youku’s 2017 migration from an on‑premises Hadoop cluster to Alibaba Cloud MaxCompute delivered a unified, elastic data pipeline that cut compute and storage costs by roughly half, handled billions of daily log records, boosted performance and scalability, and empowered analysts with self‑service tools and a rich ecosystem.

Big Data · Data Migration · MaxCompute
0 likes · 12 min read
DataFunTalk
Aug 14, 2019 · Artificial Intelligence

Understanding Recommendation Systems: From Information Overload to Personalized AI Solutions

The article explores how the rapid growth of the internet has created information overload, discusses the challenges of recommendation systems such as sparsity and timeliness, outlines a four‑step personalized content pipeline, and highlights the interdisciplinary nature of building effective AI‑driven recommendation solutions.

AI · Big Data · Data Engineering
0 likes · 16 min read
Youzan Coder
Aug 14, 2019 · Big Data

Comprehensive Guide to Data Collection, Event Modeling, and Tracking in Big Data Platforms

The guide explains how comprehensive data collection in big‑data platforms relies on a standardized event model, passive and code‑based tracking instrumentation, multi‑platform SDKs, a log‑middleware layer, precise location tracking, and an instrumentation management platform that supports workflow, testing, quality monitoring, and scalable infrastructure for future enhancements.

Analytics · Big Data · Data Collection
0 likes · 19 min read
Architecture Digest
Aug 14, 2019 · Big Data

Kafka Overview: Architecture, Storage Mechanism, Replication, and Consumer/Producer Model

Kafka is a distributed, partitioned, replicated messaging system originally developed by LinkedIn, offering high throughput, low latency, fault tolerance, and scalability; this article explains its core concepts, file storage design, partition replication, leader election, consumer groups, delivery guarantees, and operational considerations for big‑data pipelines.

Big Data · Distributed Systems · Kafka
0 likes · 56 min read
Amap Tech
Aug 13, 2019 · Artificial Intelligence

2019 Alibaba Cloud Yunqi Conference – Gaode Technology Session (Sept 27)

At the 2019 Alibaba Cloud Yunqi Conference in Hangzhou, Gaode Technology presented a comprehensive technical forum covering visual intelligence, autonomous-driving perception, the evolution of its client and traffic-access architecture, fine-grained positioning, route-planning algorithms, and spatio-temporal data applications, featuring talks by Gaode and Alibaba experts.

Big Data · Location‑based services · cloud-native
0 likes · 8 min read
Big Data Technology & Architecture
Aug 12, 2019 · Big Data

Spark SQL Parameter Tuning and Performance Optimization (Spark 2.3.2)

This article explains how to troubleshoot and tune Spark SQL configuration parameters—covering exception‑related settings such as spark.sql.hive.convertMetastoreParquet, file‑ignore options, and partition verification, as well as performance‑focused tweaks like broadcast join thresholds, adaptive execution, and parquet schema merging—while providing a comprehensive parameter reference table.

Big Data · Hive Migration · Parameter Tuning
0 likes · 23 min read
Big Data Technology & Architecture
Aug 11, 2019 · Big Data

Deep Dive into Flink’s Network Stack: Credit‑Based Flow Control and Thread Model Optimizations

This article examines Flink’s industrial‑scale network stack, detailing the credit‑based flow control introduced in version 1.5, the refactored task‑IO thread collaboration, and serialization optimizations that together improve throughput and latency for large‑scale stream processing workloads.

Big Data · Credit-based Flow Control · Flink
0 likes · 12 min read
DevOps Cloud Academy
Aug 11, 2019 · Big Data

Overview of the MFS (MooseFS) Distributed File System: A GFS‑like Architecture

The article explains the MFS distributed file system, detailing its four components (Master, Metalogger, Chunkserver, and Client) along with hardware recommendations, metadata handling, replication strategies, and FUSE‑based client mounting, providing a guide to building a storage cluster modeled on the Google File System (GFS).

Big Data · Distributed File System · MFS
0 likes · 5 min read
Big Data Technology & Architecture
Aug 8, 2019 · Big Data

Comprehensive Guide to Apache Kylin: Architecture, Concepts, Cube Design and Optimization

This article provides an in‑depth overview of Apache Kylin’s pre‑computation architecture, data‑warehouse concepts, step‑by‑step cube creation from Hive tables, and advanced optimization techniques such as derived dimensions, aggregation groups, and HBase row‑key encoding to achieve sub‑second OLAP queries on massive datasets.

Apache Kylin · Big Data · Cube
0 likes · 20 min read
360 Quality & Efficiency
Aug 8, 2019 · Big Data

An Introduction to Kafka: Architecture, Design Principles, and Common Issues

This article introduces Kafka, covering its definition, core concepts such as topics, partitions, offsets, producers and consumers, typical use cases, underlying design principles including message‑partition allocation and retention policies, processing mechanisms, and common troubleshooting questions for real‑world deployments.

Big Data · Distributed Messaging · Kafka
0 likes · 7 min read
vivo Internet Technology
Aug 7, 2019 · Big Data

Understanding Apache Kafka: Concepts, Architecture, Deployment, Monitoring and Offset Management

The article gives a thorough overview of Apache Kafka, explaining its core concepts, architecture, deployment steps, monitoring tools, and offset management, including broker and topic structures, producer/consumer APIs, replication, leader election, consumer groups, offset committing, and practical configuration and troubleshooting guidance.

Big Data · Kafka · Messaging
0 likes · 36 min read
dbaplus Community
Aug 6, 2019 · Databases

How ClickHouse Powers Real‑Time Hotel Data Analytics at Ctrip

This article details the challenges of Ctrip's hotel data platform, which handles billions of daily updates and nearly a million queries, evaluates various storage options, explains why ClickHouse was chosen, and describes the full‑load and incremental pipelines, monitoring, server clustering, and practical tips that enable sub‑second query performance at massive scale.

Big Data · Ctrip · Data Warehouse
0 likes · 13 min read
Big Data Technology & Architecture
Aug 5, 2019 · Big Data

Apache Spark Latest Technological Developments and Outlook for Spark 3.0+

The article provides a comprehensive overview of recent Apache Spark advancements—including Delta Lake, Data Source V2, runtime optimizations, relational cache, cloud‑native challenges, AI integration via Project Hydrogen, and the anticipated features of Spark 3.0—highlighting how these innovations address modern data‑warehouse, cloud, and machine‑learning workloads.

Apache Spark · Big Data · Data Warehouse
0 likes · 17 min read
Alibaba Cloud Developer
Aug 5, 2019 · Cloud Computing

How Tmall’s Smart Stores Are Redefining New Retail with Cloud and Data

Alibaba’s senior tech expert Mu Jian explains how Tmall’s smart stores embody the new retail paradigm by leveraging cloud computing, big data, and digital tools to transform offline retail, enhance consumer experiences, streamline operations, and create integrated online‑offline ecosystems through cloud stores, cloud POS, and innovative marketing solutions.

Big Data · Cloud Computing · Digital Transformation
0 likes · 25 min read
Big Data Technology & Architecture
Aug 3, 2019 · Big Data

Understanding SparkEnv Initialization: Components and Their Setup

This article walks through the SparkEnv initialization process in Apache Spark, detailing how the driver and executor environments are created, the key components such as SecurityManager, RpcEnv, SerializerManager, BroadcastManager, MapOutputTracker, ShuffleManager, MemoryManager, BlockManager, MetricsSystem, and OutputCommitCoordinator are instantiated, and how the final SparkEnv instance is assembled and stored.

Big Data · Distributed computing · Scala
0 likes · 13 min read
Suning Technology
Aug 2, 2019 · Big Data

How Suning Uses Big Data to Revolutionize Retail Supply Chains

At the 15th China (Nanjing) International Software Expo, Suning's VP shared how the company applies big‑data analytics, the C2M (customer‑to‑manufacturer) model, and flexible manufacturing to personalize retail experiences, bridge online‑offline gaps, and drive data‑driven product development and supply‑chain efficiency.

Big Data · C2M · data driven
0 likes · 9 min read
Meituan Technology Team
Aug 1, 2019 · Big Data

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

Meituan improved its custom Hadoop YARN Fair Scheduler by pre‑computing resource usage, filtering out jobs with zero resource demand, and parallelizing queue sorting. These changes cut the time spent on sorting from 30 s to 5 s per minute, raised throughput to 50 k containers per second, enabled live roll‑backs, and prepared the system for clusters of up to 10 k nodes and future scaling to hundreds of thousands.

Big Data · Fair Scheduler · Hadoop
0 likes · 24 min read
21CTO
Jul 31, 2019 · Artificial Intelligence

How JD Built a Scalable AI‑Powered Recommendation System

The article outlines JD’s evolution from rule‑based product suggestions in 2012 to a sophisticated, AI‑driven, multi‑screen personalized recommendation platform, detailing its product types, system architecture, data collection, offline and online computation, and the core recommendation engine that powers features like “Guess You Like.”

AI · Big Data · JD.com
0 likes · 14 min read
dbaplus Community
Jul 30, 2019 · Big Data

Spark vs Flink: Which Real‑Time Engine Should You Choose for Kafka Streams?

With the surge in real‑time data from sensors and devices, choosing the right streaming engine is critical; this article compares Apache Spark and Apache Flink—examining their architectures, micro‑batch vs continuous processing, strengths, limitations, and use‑case suitability for Kafka‑driven pipelines.

Big Data · Flink · Kafka
0 likes · 14 min read
Big Data Technology & Architecture
Jul 29, 2019 · Databases

Comprehensive Comparison of Apache Kylin and Apache Doris: Architecture, Data Models, Storage, Query, and Operations

This article provides an in‑depth technical comparison of Apache Kylin and Apache Doris, covering their system architectures, aggregation and detail data models, storage engines, data import processes, query execution, deduplication, metadata handling, performance, high availability, maintainability, usability, schema‑change capabilities, features, and community ecosystems.

Apache Doris · Apache Kylin · Big Data
0 likes · 21 min read