Tagged articles

3697 articles

Feb 16, 2021 · Big Data

Understanding Presto: Architecture, Query Execution, and Youzan’s Practical Experience

This article explains Presto’s core architecture and low‑latency query execution process, describes how Youzan adopts Presto for various data‑platform scenarios, discusses the evolution of its deployment, and outlines the performance challenges and future enhancements such as Alluxio integration and session property management.

Big Data · Performance Optimization · Youzan

0 likes · 13 min read

Architect

Feb 15, 2021 · Big Data

Elasticsearch Optimization Practices for Large-Scale Data Queries

This article explains how to optimize Elasticsearch for cross‑month and multi‑year queries on billions of records, covering Lucene fundamentals, index and search performance tweaks, configuration settings, and practical testing results to achieve sub‑second response times.

Big Data · Elasticsearch · Optimization

0 likes · 14 min read

DataFunTalk

Feb 15, 2021 · Big Data

Flink-Driven Incremental Data Warehouse Production at Meituan: Architecture, Streaming Integration, and Future Plans

This article presents Meituan's use of Flink to enable incremental data warehouse production, covering the warehouse architecture, streaming data integration evolution, real-time OLAP applications, platform design, and future directions for unified stream‑batch processing.

Big Data · Flink · Incremental Processing

0 likes · 11 min read

Architecture Digest

Feb 15, 2021 · Operations

ELK Stack Overview, Architecture, Installation and Configuration Guide (Version 7.7.0)

This article provides a comprehensive introduction to the ELK stack—Elasticsearch, Logstash, Kibana, and Filebeat—including its components, why it’s used for centralized log management, detailed architecture diagrams, step‑by‑step installation commands, configuration examples, and a practical Kafka‑based data pipeline demonstration.

Big Data · ELK · Elasticsearch

0 likes · 22 min read

DataFunTalk

Feb 14, 2021 · Big Data

Impala at NetEase: Architecture, Iceberg Integration, Management System, Optimizations and Future Roadmap

This talk presents NetEase's practical experience with Impala, covering its core architecture, new features in version 3.x, integration with Apache Iceberg, a custom management platform, profiling and statistics enhancements, as well as future plans involving Kubernetes, Alluxio caching and pre‑computation strategies.

Apache Iceberg · Big Data · Cluster Management

0 likes · 13 min read

DataFunTalk

Feb 13, 2021 · Databases

Improving HBase Availability and Reducing Latency Spikes with Replication‑Based Multi‑Path Reads and ZGC

This article describes how the Didi HBase team tackled HBase’s weak availability and GC‑induced latency spikes by introducing a replication‑based client multi‑path read mechanism, configuring hedged reads, and adopting the Z Garbage Collector, and presents the resulting performance improvements and remaining challenges.

Big Data · HBase · Multi-Path Read

0 likes · 11 min read

DataFunTalk

Feb 12, 2021 · Big Data

Apache Flink at Kuaishou: Past, Present, and Future

Zhao Jianbo, head of Kuaishou's big data architecture team, presents an in‑depth overview of Apache Flink's adoption at Kuaishou, covering reasons for selection, development history, business data flows, technical innovations such as the Slimbase state engine, stability improvements, and future roadmap.

Apache Flink · Big Data · Kuaishou

0 likes · 16 min read

DataFunTalk

Feb 10, 2021 · Big Data

AirWorks Data Intelligence Platform: Architecture, Cloud‑Native Ingestion, and Financial Asset Management Use Case

The article presents Entropy Simplify's AirWorks data intelligence platform, detailing its three‑layer architecture, cloud‑native multi‑source data ingestion system, low‑code ETL capabilities, technical features such as multi‑engine cooperation and data‑skew handling, and a financial asset‑management case study.

Big Data · ETL · Financial Services

0 likes · 16 min read

Alibaba Cloud Native

Feb 10, 2021 · Cloud Native

Accelerate AI and Big Data Workloads on Kubernetes with Fluid’s JindoRuntime

Fluid is an open‑source Kubernetes‑native engine that orchestrates and accelerates distributed datasets for AI and big‑data workloads, and this guide explains its core concepts, the JindoRuntime implementation, performance benefits, and step‑by‑step instructions to deploy and test JindoRuntime on a K8s cluster.

AI · Big Data · Data Acceleration

0 likes · 14 min read

DataFunTalk

Feb 9, 2021 · Big Data

Design and Implementation of a Full‑Chain Marketing Data Product at NetEase Yanxuan

This article details NetEase Yanxuan's business background, market characteristics, data product requirements, and the end‑to‑end design of a full‑chain marketing data product, covering attribution, metric evaluation, analysis frameworks, scenario‑based recommendations, and practical Q&A for data‑driven growth.

Big Data · Data Product · Marketing Analytics

0 likes · 18 min read

dbaplus Community

Feb 9, 2021 · Operations

How Suning Integrated ClickHouse into a Full‑Link Monitoring Platform for Real‑Time OLAP Insights

This article explains how Suning's big‑data team incorporated ClickHouse into their end‑to‑end monitoring ecosystem, detailing the architecture, trace‑ID propagation, slow‑query tracking, MergeTree health checks, replica delay analysis, and the role of Chproxy in delivering comprehensive observability for high‑performance OLAP workloads.

Big Data · ClickHouse · OLAP

0 likes · 15 min read

DataFunTalk

Feb 8, 2021 · Big Data

Ozone: The Next‑Generation Distributed Storage System Aiming to Replace HDFS

This article explains how Apache Ozone, built on the HDDS layer, addresses the scalability, memory, and performance limitations of HDFS by splitting metadata services, using RocksDB, implementing fine‑grained locking and Raft‑based HA, and offering rich APIs, and it outlines current challenges and the future roadmap.

Big Data · HDDS · HDFS

0 likes · 29 min read

Fangduoduo Tech

Feb 8, 2021 · Big Data

Why Build Your Own Data Lineage Engine? Lessons from Apache Atlas to Duo-Lineage

This article explains what data lineage is, why it is essential for data governance in large‑scale big‑data platforms, compares Apache Atlas with a custom solution, and details the technical choices, architecture, and performance optimizations behind the self‑built duo‑lineage system.

Apache Atlas · Big Data · Data Lineage

0 likes · 14 min read

Efficient Ops

Feb 7, 2021 · Artificial Intelligence

How NLP Transforms Big Data Operations: Real-World AIOps Case Studies

This article explores the intersection of natural language processing and operations, outlines common text‑handling challenges, and presents three concrete AIOps case studies—log Q&A, anomaly detection, and ticket recommendation—while reflecting on a closed‑loop AI workflow and future research directions.

Big Data · NLP · aiops

0 likes · 9 min read

Architects' Tech Alliance

Feb 7, 2021 · Operations

Understanding the Essence and Implementation of Enterprise Digital Transformation

The article explains what digital transformation truly means for enterprises, outlines its three development stages, describes the core connection‑data‑intelligence framework, compares internal capability rebuilding with external ecosystem integration, and offers practical guidance on why and how companies should embark on digital transformation.

Big Data · Digital Transformation · Operations

0 likes · 24 min read

DataFunTalk

Feb 7, 2021 · Big Data

Optimizations and Extensions for Flink SQL in Tencent Real‑Time Computing Platform

This article, presented by Tencent senior engineer Du Li, details the current state of Flink SQL, compares Jar, Canvas, and SQL modes, introduces window‑function extensions, retract‑stream optimizations, and outlines future roadmap plans for cost‑based optimization and new features in the real‑time computing platform.

Big Data · Flink · Retract Stream

0 likes · 19 min read

Open Source Linux

Feb 7, 2021 · Big Data

Mastering Kafka: Core Concepts, Architecture, and High‑Performance Deployment

This comprehensive guide explains Kafka's role as a message system, detailing topics, partitions, producers, consumers, replication, controller, ZooKeeper coordination, performance optimizations like sequential writes and zero‑copy, and practical recommendations for hardware, configuration, and cluster deployment.

Big Data · Cluster Deployment · Kafka

0 likes · 22 min read

DataFunTalk

Feb 5, 2021 · Big Data

Design and Implementation of Beike's Data Management Platform (DMP)

This article details how Beike built a comprehensive Data Management Platform (DMP) that integrates user behavior and business data across multiple apps, outlines its five‑layer architecture, discusses data collection, processing, storage, real‑time profiling, and presents performance results and future optimization directions.

Big Data · DMP · Data engineering

0 likes · 20 min read

NetEase Yanxuan Technology Product Team

Feb 5, 2021 · Big Data

NetEase Yanxuan Data Task Governance Practice: Pre‑, In‑, and Post‑Operation Strategies

NetEase Yanxuan tackled data‑task governance by establishing pre‑operation guarantees, baseline‑driven in‑operation controls, and post‑operation interventions, delivering stable task output, reduced alarms, lineage awareness, rapid incident recovery, and reusable best‑practice products that earned the 2020 Technology Sharing Co‑building Award.

Baseline Management · Big Data · Task Operation

0 likes · 25 min read

ITFLY8 Architecture Home

Feb 4, 2021 · Big Data

Unlocking Data Middle Platform: From Ingestion to Real‑Time Analytics

This article provides a comprehensive overview of data middle platform concepts, covering data aggregation, ingestion tools, offline and real‑time development, scheduling, baseline control, heterogeneous storage, recommendation dependencies, data permissions, layered data architecture (ODS, DW, DWD, DWS, TDM, ADS), asset management, governance, service APIs, query and analysis services, as well as monitoring, alerting, and operational best practices for building robust big‑data solutions.

Big Data · Data Warehouse · ETL

0 likes · 25 min read

Full-Stack Internet Architecture

Feb 1, 2021 · Big Data

Kafka Overview: Architecture, Advantages, Disadvantages, and Core Concepts

This article provides a comprehensive introduction to Apache Kafka, covering its distributed publish‑subscribe architecture, its key components such as brokers, topics, partitions, producers, consumers, and ZooKeeper, as well as its advantages, drawbacks, storage mechanisms, partition assignment strategies, and reliability guarantees for high‑throughput big‑data streaming.

Big Data · Distributed Systems · Message queue

0 likes · 20 min read

DataFunTalk

Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data

0 likes · 21 min read

Top Architect

Feb 1, 2021 · Big Data

The Origin of Elasticsearch: From a Cooking App Prototype to a Distributed Search Engine

This article recounts how Shay Banon's early cooking‑app project led to the creation of Compass, the evolution of Apache Lucene, and ultimately the development of Elasticsearch—a powerful, distributed search platform built with extensive testing infrastructure and inspired by futuristic data‑interaction concepts.

Apache Lucene · Big Data · Elasticsearch

0 likes · 9 min read

Architects' Tech Alliance

Jan 29, 2021 · Artificial Intelligence

Comprehensive Overview of Machine Learning: Types, Industry Chain, and Key Technologies

This article provides a detailed introduction to machine learning, covering its definition, learning modes such as supervised, unsupervised and reinforcement learning, shallow versus deep learning, the full industry chain from AI chips to cloud and big‑data services, and the major open‑source frameworks and platforms driving the field.

AI chips · Big Data · Unsupervised Learning

0 likes · 11 min read

Big Data Technology & Architecture

Jan 28, 2021 · Big Data

Understanding Data Lakes: Definitions, Benefits, Architectures, and Technology Choices

Data lakes, emerging since 2020, are centralized repositories that store structured and unstructured data at any scale, offering flexible analytics, but require robust management to avoid becoming data swamps; this article explains definitions, advantages, typical architectures, and compares cloud and open‑source solutions such as AWS Lake Formation, Alibaba Cloud, Delta, Iceberg, and Hudi.

Analytics · Big Data · cloud storage

0 likes · 13 min read

JD Cloud Developers

Jan 28, 2021 · Big Data

How JD’s Energy Management Platform Leverages ClickHouse for Real‑Time OLAP at Scale

This article explains how JD’s Energy Management Platform uses ClickHouse as an MPP‑based OLAP engine to ingest, store, and provide multi‑dimensional real‑time analytics on energy consumption data, covering architecture decisions, data pipelines, replication, sharding, and a generic query interface.

Big Data · ClickHouse · OLAP

0 likes · 12 min read

Practical DevOps Architecture

Jan 28, 2021 · Operations

Step-by-Step Guide to Installing Zookeeper and Kafka on a Kubernetes Cluster

This tutorial walks through preparing three Kubernetes nodes, extracting and distributing ZooKeeper, configuring its zoo.cfg and myid files, starting and verifying the ZooKeeper ensemble, then installing Kafka, adjusting its server.properties, and finally launching Kafka across the cluster.

Big Data · Installation · Kafka

0 likes · 6 min read

dbaplus Community

Jan 27, 2021 · Big Data

How We Upgraded a 1500-Node Flink Cluster to 1.10: Challenges and Solutions

This article describes the migration of a 1500‑node Flink 1.4.2 cluster, handling over 12,000 tasks and 30 trillion daily events, to Flink 1.10, covering new DDL/Catalog support, SQL enhancements, memory tuning, compatibility patches, extensive testing, and engine optimizations such as task‑load metrics and balanced sub‑task scheduling.

Big Data · Flink · Performance Optimization

0 likes · 13 min read

Full-Stack Internet Architecture

Jan 27, 2021 · Big Data

Introduction to Hadoop: Architecture, HDFS, MapReduce, and YARN Overview

This article provides a comprehensive overview of Hadoop, covering its origins, core components such as HDFS, MapReduce, and YARN, their architectures, data storage and processing mechanisms, fault‑tolerance features, scheduling strategies, and practical optimization techniques for large‑scale distributed computing.

Big Data · Distributed computing · HDFS

0 likes · 33 min read

Alibaba Cloud Developer

Jan 25, 2021 · Big Data

Why 2020 Was the Breakthrough Year for Apache Flink’s Ecosystem

In 2020, Apache Flink surged to become the most active Apache project, releasing three major versions that advanced its unified stream‑batch engine, introduced cloud‑native K8s support, expanded AI capabilities with PyFlink, and fostered a thriving Chinese community, solidifying its role as the de‑facto standard for real‑time computing.

AI Integration · Apache Flink · Big Data

0 likes · 21 min read

Architects' Tech Alliance

Jan 24, 2021 · Big Data

Outline of Distributed Storage Systems: HDFS, GlusterFS, OpenStack Swift, and Ceph

This article outlines the fundamental concepts and key issues of distributed storage, provides an overview of four open‑source distributed file systems—HDFS, GlusterFS, OpenStack Swift, and Ceph—and compares their functionalities, accompanied by illustrative slide images.

Big Data · Ceph · GlusterFS

0 likes · 2 min read

Architect

Jan 22, 2021 · Big Data

Understanding Kafka Topic Partitions, Producer Partitioning Strategies, and Consumer Assignment

This article explains how Kafka producers decide which partition to send messages to, how topic partition counts are configured, and how consumer groups assign partitions to instances using default range and round‑robin strategies, with code examples for illustration.

Big Data · Consumer · Kafka

0 likes · 17 min read

Full-Stack Internet Architecture

Jan 22, 2021 · Databases

An Overview of HBase: Architecture, Design Principles, and Performance Characteristics

This article provides a comprehensive introduction to HBase, covering its origins, column‑oriented NoSQL design, storage on HDFS, logical and physical structures, read/write workflows, performance optimizations, and common interview questions for big‑data engineers.

Big Data · Columnar Database · HBase

0 likes · 24 min read

Didi Tech

Jan 22, 2021 · Big Data

Erasure Coding Practice in HDFS at Didi: Principles, Implementation, and Lessons Learned

Didi migrated HDFS to Hadoop 3.2 and implemented erasure coding—using XOR and Reed‑Solomon RS(6,3) striping—to replace three‑replica storage for cold data, building back‑ported clients, automated conversion tools, and cross‑datacenter backup pipelines, while addressing operational bugs and noting performance trade‑offs.

Big Data · Didi · HDFS

0 likes · 11 min read

DataFunTalk

Jan 22, 2021 · Big Data

Practical Experience of Apache Flink at ByteDance: Architecture, Optimizations, and Future Directions

This article presents ByteDance's real‑world use of Apache Flink, covering the platform's overall architecture, SQL extensions, custom connectors, UI‑driven SQL platform, performance optimizations such as window mini‑batch and custom windows, dimension‑table enhancements, checkpoint recovery improvements, stream‑batch integration, and upcoming roadmap items.

Apache Flink · Big Data · ByteDance

0 likes · 15 min read

Top Architect

Jan 18, 2021 · Big Data

Migrating Over 2 Billion MySQL Records to Google BigQuery Using Kafka

This article details a real‑world solution for migrating more than two billion MySQL records to Google BigQuery by streaming data through Kafka, employing partitioned tables, data filtering, and incremental migration to avoid downtime and reduce storage costs.

Big Data · BigQuery · Data Migration

0 likes · 7 min read

New Oriental Technology

Jan 18, 2021 · Information Security

Kafka Security Authentication and Authorization Configuration Guide (SASL/PLAIN and SASL/SCRAM)

This guide explains Kafka's authentication and authorization mechanisms, covering SASL/PLAIN and SASL/SCRAM setups, JAAS file creation, server property configuration, ACL management, and provides complete Java producer and consumer examples for secure communication.

Authentication · Authorization · Big Data

0 likes · 19 min read

Efficient Ops

Jan 17, 2021 · Big Data

Understanding Kafka: Core Concepts, Architecture, and Performance Secrets

This article introduces Kafka’s fundamental role as a messaging system, explains topics, partitions, producers, consumers, replicas, consumer groups, and the controller, and explores its cluster architecture, performance optimizations like sequential writes and zero-copy, providing a comprehensive overview for building scalable data pipelines.

Big Data · Distributed Systems · Message queue

0 likes · 11 min read

DataFunTalk

Jan 16, 2021 · Big Data

Practical Application of Flink + Kafka at NetEase Cloud Music: Architecture, Platform Design, and Lessons Learned

This article presents a detailed case study of NetEase Cloud Music’s real‑time analytics platform built on Kafka and Flink, covering background, architectural choices, platform‑level design, operational challenges, solutions such as the Magina framework, and a Q&A on reliability and monitoring.

Big Data · Flink · Kafka

0 likes · 11 min read

Programmer DD

Jan 16, 2021 · Artificial Intelligence

Can AI Really Predict Employee Work Status? Inside Baidu’s New Patent

The article examines Baidu’s newly filed patent for predicting employee work status, explaining its big‑data‑driven methodology, the company’s claim that it is a talent‑management tool, and the broader debate over workplace surveillance amid the ongoing 996 controversy.

AI prediction · Baidu patent · Big Data

0 likes · 4 min read

Big Data Technology & Architecture

Jan 15, 2021 · Big Data

Evolution and Architecture of Major Chinese Big Data Platforms: Taobao, Didi, Meituan, 360, Kuaishou, and JD

This article reviews the evolution, architecture, and key components of major Chinese big‑data platforms—including those of Taobao, Didi, Meituan, 360, Kuaishou, and JD—highlighting data ingestion, storage, processing engines, scheduling systems, and service‑oriented designs that underpin their large‑scale data operations.

Big Data · Data Platform · Hadoop

0 likes · 14 min read

DataFunTalk

Jan 15, 2021 · Big Data

Optimizing Apache Kylin for Meituan's Sales OLAP: From MapReduce to Spark and Resource Tuning

This article presents a detailed case study of how Meituan's in‑store dining sales team identified severe efficiency issues in their Apache Kylin‑based OLAP system, dissected the construction process, and applied a step‑by‑step optimization roadmap—including engine migration, dimension pruning, resource configuration, and Spark‑based layered building—to boost query performance and achieve near‑perfect SLA.

Apache Kylin · Big Data · Meituan

0 likes · 16 min read

Didi Tech

Jan 14, 2021 · Cloud Computing

Design and Implementation of Didi's Logi‑KafkaManager Multi‑tenant Kafka Cloud Platform

Didi’s Logi‑KafkaManager is a multi‑tenant Kafka cloud platform that consolidates dozens of clusters into a secure, isolated gateway‑driven service offering intuitive web‑based topic management, real‑time metrics visualization, automated diagnostics, quota governance and safe scaling, delivering high internal satisfaction and enterprise commercialization.

Big Data · Data Security · Kafka

0 likes · 17 min read

Meituan Technology Team

Jan 14, 2021 · Big Data

Design and Implementation of an SSD‑Based Application‑Layer Cache Architecture for Kafka in Meituan Data Platform

Meituan built an SSD‑based application‑layer cache for Kafka that bypasses PageCache contention between real‑time and delayed jobs, classifies log segments across SSD and HDD, limits flush rates, and achieves up to 80% latency reduction while guaranteeing stable real‑time consumption.

Big Data · Kafka · LogSegment

0 likes · 19 min read

NetEase Smart Enterprise Tech+

Jan 14, 2021 · Big Data

How Yidun Achieves Real-Time, High-Performance Public-Opinion Data Cleaning with Groovy and JVM

Yidun’s public-opinion monitoring platform transforms massive raw web data into a unified format by separating dynamic Groovy-script-driven cleaning from static processing, achieving real-time source integration, high throughput, scalability, and high availability while addressing format diversity, team coordination, and performance-flexibility trade-offs.

Big Data · Data cleaning · ETL

0 likes · 5 min read

Architects Research Society

Jan 13, 2021 · Fundamentals

Master Data Management (MDM): Concepts, Business Value, Technical Challenges, and Architectural Considerations

The article explains master data management (MDM) as a framework for creating a single, reliable source of truth, outlines its growing business relevance, discusses key technical challenges such as data governance and scalability, and explores next‑generation architectures involving graph databases, big data, and machine learning.

Big Data · Graph Database · Master Data Management

0 likes · 10 min read

vivo Internet Technology

Jan 13, 2021 · Big Data

Statistical Monitoring Using Normal Distribution and Boxplot: Theory, Implementation, and API Design

The article explains the origin of the normal distribution, the central limit theorem, and how boxplots identify anomalies, then describes a Java‑based API that partitions data into five median‑centered levels using same‑period and year‑over‑year ratios to automatically detect and classify abnormal trends in daily metrics.

Anomaly Detection · Big Data · Boxplot

0 likes · 11 min read

dbaplus Community

Jan 11, 2021 · Databases

Why eBay Switched Its Ad Analytics from Druid to ClickHouse – A Deep Dive

eBay’s ad data platform, originally built on a custom SQL engine and later migrated to Druid, was re‑engineered to use ClickHouse, highlighting challenges such as massive data volume, atomic offline replacements, schema design, compression, and operational simplifications, and demonstrating performance and scalability gains for advertisers.

Ad Analytics · Big Data · ClickHouse

0 likes · 18 min read

Big Data Technology & Architecture

Jan 11, 2021 · Big Data

Evolution of a Real‑Time Data Warehouse Architecture and Practical Lessons

This article recounts the author’s journey building a real‑time data warehouse using Flink, Kafka, Redis, and ClickHouse, describing the initial batch‑oriented setup, successive architectural evolutions, challenges with wide tables and dimension data, and the final OLAP‑centric solution with secondary caching.

Big Data · ClickHouse · Flink

0 likes · 9 min read

DataFunSummit

Jan 10, 2021 · Big Data

Business Model and Digital Transformation of Internet Consumer Finance: A Case Study of CMB’s Flash Loan

The article analyzes the business architecture, value proposition, channels, revenue model, core resources, and digital transformation of internet consumer finance using China Merchants Bank’s fast‑approval “Flash Loan” as a case study, highlighting the role of big data, AI, and cloud computing in modern retail lending.

Big Data · Business Model · Digital Transformation

0 likes · 13 min read

Architects Research Society

Jan 9, 2021 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Guidance

This article explains the purpose, semantics, and design of Apache Kafka’s transaction API, detailing how it enables exactly‑once processing for stream‑processing applications, the role of transaction coordinators and logs, Java API usage, performance considerations, and best‑practice guidance.

Apache Kafka · Big Data · Java API

0 likes · 19 min read

Amap Tech

Jan 8, 2021 · Industry Insights

How AI‑Driven Data Mining Revives POI Freshness: A Deep Dive into Expired POI Detection

This article examines the technical evolution of POI expiration detection, covering attribute‑based, behavior‑based, and human‑place relationship mining methods, their machine‑learning models, and how they collectively improve map freshness and user experience at scale.

AI · Big Data · Map Freshness

0 likes · 17 min read

21CTO

Jan 7, 2021 · Big Data

How Kuaishou Built a Scalable Big Data Service Platform to Eliminate Redundant Development

This article explains Kuaishou's data service platform, detailing the background challenges of high development barriers and duplicated work, the platform's architecture and key technologies such as configuration‑driven development, multi‑mode APIs, data acceleration, and high‑availability mechanisms, and concludes with future directions.

Big Data · Data Acceleration · Data Platform

0 likes · 12 min read

360 Tech Engineering

Jan 7, 2021 · Big Data

Overview of the Qirin Big Data Platform Architecture and Core Modules

The article introduces the Qirin big data platform—a one‑stop solution covering resource management, metadata, data ingestion, task development, interactive querying, and self‑service analysis—detailing its modular architecture, typical processing workflow, and future development plans for enterprise‑wide data services.

Big Data · Data Platform · Metadata

0 likes · 11 min read

vivo Internet Technology

Jan 6, 2021 · Big Data

How HyperLogLog Estimates Cardinality in Massive Data Sets

This article explains the cardinality‑counting problem behind DAU/MAU and unique visitor metrics, compares naïve solutions like Set, Bitmap and Bloom filter, introduces big‑data algorithms such as Linear Counting, LogLog and HyperLogLog, and shows how Redis implements HyperLogLog with dense and sparse storage optimizations.

Big Data · Cardinality · HyperLogLog

0 likes · 17 min read

DataFunTalk

Jan 6, 2021 · Big Data

Didi's Presto Engine: Architecture, Optimizations, and Operational Practices

This article presents Didi's three‑year experience with Presto, detailing its architecture, low‑latency design, large‑scale deployment, extensive Hive compatibility work, resource isolation, Druid connector integration, usability enhancements, stability engineering, performance tuning, and future directions for the ad‑hoc query engine.

Big Data · Distributed Systems · Druid Connector

0 likes · 17 min read

dbaplus Community

Jan 5, 2021 · Big Data

How Ctrip Built a Scalable Unified Log Framework for Payment Data

Facing massive, heterogeneous logs from numerous payment services, Ctrip’s data team designed a unified logging framework that extends log4j2, streams logs via Kafka to HDFS using a customized Camus pipeline, and partitions and stores the data in ORC for efficient Hive analysis, while addressing format, storage, and performance challenges.

Big Data · Camus · Hadoop

0 likes · 16 min read

58 Tech

Jan 4, 2021 · Big Data

Building a Real‑Time Data Warehouse with Flink: Architecture, Implementation and Lessons Learned

This article describes how a fast‑growing company built a layered real‑time data warehouse on Flink, detailing the evolution from a simple 1.0 pipeline to a 2.0 architecture with ODS, DWD and ADS layers, dimension joins, exactly‑once sinks, HDFS partitioning, monitoring, and future improvements.

Big Data · ETL · Flink

0 likes · 14 min read

Alibaba Cloud Developer

Jan 4, 2021 · Databases

Why Cloud‑Native Distributed Databases Are the Future of Enterprise Data

The article reviews the evolution of database systems driven by cloud computing, big‑data demands and distributed architectures, highlights Alibaba Cloud’s cloud‑native offerings such as PolarDB and AnalyticDB, and discusses trends, security, and best practices for modern enterprise data platforms.

Alibaba Cloud · Big Data · cloud-native

0 likes · 14 min read

DataFunTalk

Jan 3, 2021 · Artificial Intelligence

iQIYI Machine Learning Platform: Development History, Features, and Practical Experience

This article details the evolution of iQIYI's machine learning platform—from its early Javis‑based deep‑learning system to three major versions that introduced visual workflow, distributed scheduling, auto‑tuning, large‑scale training support, model management, and online prediction—while sharing practical lessons and a real anti‑cheat use case.

Big Data · Model Management · Platform

0 likes · 13 min read

Java Backend Technology

Jan 2, 2021 · Information Security

Why Your Personal Data Is Worthless: The Dark Reality of Big Data Privacy Leaks

The article exposes how the promise of big‑data convenience masks rampant privacy violations—from celebrity photo leaks and app data sales to weak legal penalties—illustrating that ordinary users’ personal information has become a cheap commodity with little protection.

Big Data · China · Data Protection

0 likes · 6 min read

DataFunTalk

Dec 31, 2020 · Artificial Intelligence

Introduction to Graph Neural Networks and Their Applications in Recommendation Systems

This article introduces graph neural networks, explains their underlying sampling and aggregation mechanisms, and demonstrates how they are applied in large‑scale recommendation scenarios such as video and content feeds at Tencent, highlighting practical results and lessons learned.

Artificial Intelligence · Big Data · GraphSAGE

0 likes · 10 min read

Tencent Cloud Developer

Dec 30, 2020 · Big Data

How Alluxio Boosts Tencent Cloud EMR: Cutting Bandwidth by 50% and Accelerating IO‑Intensive Workloads

This article analyzes the challenges of traditional monolithic big‑data architectures, explains how Tencent Cloud EMR integrates Alluxio for compute‑storage separation, presents detailed performance benchmarks showing 20‑50% bandwidth reduction and 5‑40% query speedup, and outlines the specific tuning measures applied.

Alluxio · Big Data · Cloud Computing

0 likes · 10 min read

JD Tech Talk

Dec 30, 2020 · Databases

Architecture and Application Practice of JD Urban Spatio-Temporal Data Engine (JUST)

The presentation details the design, implementation, and real‑world applications of the JD Urban Spatio‑Temporal Data Engine (JUST), a distributed, scalable database that handles massive, complex spatio‑temporal data with novel storage, indexing, and query techniques, demonstrating high performance and ease of use across smart‑city scenarios.

Big Data · Database · GIS

0 likes · 26 min read

Alibaba Cloud Developer

Dec 29, 2020 · Fundamentals

What Are the 10 Tech Trends Shaping the Post-Pandemic Era?

Alibaba DAMO Academy outlines ten pivotal technology trends for 2021, ranging from third‑generation semiconductors and quantum computing to AI‑driven drug discovery, cloud‑native IT, data‑intelligent agriculture, and smart city operation centers, highlighting how these innovations will drive post‑pandemic growth.

Artificial Intelligence · Big Data · Quantum Computing

0 likes · 9 min read

Youzan Coder

Dec 28, 2020 · Big Data

How Youzan’s BI Platform Turns Massive Data into Interactive Visual Insights

This article explains the design, features, and technical implementation of Youzan’s BI platform, covering its target users, visualization workflow, supported chart types, filtering, permissions, drill‑down, calculated fields, SQL generation logic, and future development directions.

Analytics · BI · Big Data

0 likes · 20 min read

Alibaba Terminal Technology

Dec 28, 2020 · Big Data

Unlocking Massive-Scale User Behavior Analysis: From Funnels to Intelligent Links

This talk explores how to conduct user behavior analysis on massive data sets, compares existing analytics tools, and presents Alibaba DataWorks' end‑to‑end solution, which combines funnel and link visualizations, a big‑data processing architecture, and future intelligent link capabilities to uncover and resolve user‑experience issues efficiently.

Alibaba Cloud · Big Data · Data visualization

0 likes · 16 min read

Big Data Technology & Architecture

Dec 28, 2020 · Big Data

Implementing Historical Slowly Changing Dimension (Chain) Tables with PL/pgSQL

This article explains the concept of historical chain (slowly changing dimension) tables in data warehousing, demonstrates how to create source and target tables, provides a PL/pgSQL stored procedure to handle inserts, updates, and deletions, and shows step‑by‑step testing with sample SQL scripts.

Big Data · PL/pgSQL · Slowly Changing Dimension

0 likes · 10 min read

dbaplus Community

Dec 27, 2020 · Big Data

How ClickHouse Powers a 700 B‑Row Real‑Time Data Platform at Ctrip

This article details how Ctrip's senior engineering manager leveraged ClickHouse to build a high‑availability, sub‑second response data platform handling nearly 700 billion rows, describing the motivations, architecture, data synchronization processes, performance gains, challenges, and practical recommendations for large‑scale analytics.

Big Data · ClickHouse · Data Architecture

0 likes · 28 min read

Architect

Dec 27, 2020 · Big Data

Optimizing Billion‑Scale Hive Queries: Partitioning, Indexing, Bucketing, Active‑User Segmentation, and Data Structure Refactoring

This article walks through the challenges of querying a 300‑billion‑row Hive table, analyzes why traditional partitioning, indexing, and bucketing fall short, and presents a practical solution that combines active‑user segmentation and a redesigned array‑based data model to cut query time from hours to minutes.

Big Data · Data Partitioning · Hive

0 likes · 10 min read

DataFunTalk

Dec 27, 2020 · Information Security

Evolution of 58.com Risk Control Architecture: From Early Stages to Intelligent Auditing

This talk outlines 58.com’s risk control evolution, detailing the platform’s four development stages, the challenges of fraud, fake traffic, and content abuse, and how architecture, algorithms, and operational strategies have been refined to achieve high‑throughput, intelligent auditing.

AI · Big Data · Information Security

0 likes · 12 min read

Youzan Coder

Dec 25, 2020 · Big Data

Metadata Governance and Collection in a Data Asset Platform

The platform implements comprehensive metadata governance by extracting, standardizing, and ingesting basic, trend, resource, lineage, and task metadata from offline and real‑time systems via a Kafka‑based SDK, enabling unified storage, monitoring, alerts, and future automation to improve data asset visibility and quality.

Big Data · Data Collection · Metadata

0 likes · 18 min read

Big Data Technology & Architecture

Dec 24, 2020 · Big Data

Big Data Interview Questions and Solutions for Massive Data Processing

This article presents ten big‑data interview problems, covering scenarios such as finding the most frequent IP, answering top‑K queries, and counting word frequencies under memory limits, and solves them efficiently with techniques such as hashing, bitmaps, tries, heaps, and external sorting.

Algorithms · Big Data · Data Processing

0 likes · 11 min read

Big Data Technology & Architecture

Dec 24, 2020 · Big Data

Common Techniques for Processing Massive Data Sets

This article summarizes a range of practical methods, including Bloom filters, hashing, bitmaps, heaps, bucket partitioning, database indexes, inverted indexes, external sorting, tries, and MapReduce, that are commonly used to handle, deduplicate, and query extremely large data volumes in big‑data applications.

Big Data · Hashing · Heap

0 likes · 11 min read

Code Ape Tech Column

Dec 23, 2020 · Fundamentals

Technical Concepts Illustrated Through Relationship Analogies

The article humorously maps various relationship scenarios to core IT concepts such as backup strategies, high‑availability mechanisms, scaling methods, security measures, cloud services, and big‑data techniques, providing an engaging overview of fundamental system design principles.

Big Data · Cloud Computing · Scaling

0 likes · 8 min read

dbaplus Community

Dec 22, 2020 · Big Data

How eBay Migrated 10 PB of HDFS Data Across Namespaces in Just 2 Hours

This article details how eBay's ADI Hadoop team tackled a massive 10 PB, 10‑million‑file migration by optimizing DistCp with Fastcopy, load‑balancing, ACL handling, and failure recovery, ultimately completing the transfer within a two‑hour window while preserving cluster stability and performance.

Big Data · Distcp · HDFS

0 likes · 16 min read

Architect

Dec 22, 2020 · Big Data

Dimensional Modeling in Data Warehousing: Concepts, Theory, and Practical Example

This article explains data warehouse fundamentals, reviews classic warehouse models such as ER, dimensional, Data Vault and Anchor, then dives deep into dimensional modeling concepts, star and snowflake schemas, and demonstrates a practical e‑commerce scenario with SQL examples and trade‑offs.

Big Data · Data Warehouse · ETL

0 likes · 11 min read

21CTO

Dec 21, 2020 · Big Data

5 Emerging Big Data Trends Shaping Business, Health, and Climate in 2021

This article outlines five key big‑data trends for 2021—including the rise of augmented analytics, the convergence of big data with blockchain, the growing importance of knowledge graphs, data‑driven health innovations, and climate‑focused analytics—highlighting their impact on organizations and future technological landscapes.

Big Data · Blockchain · Knowledge Graph

0 likes · 8 min read

Didi Tech

Dec 21, 2020 · Big Data

HBase Availability and Latency Optimizations: Replication‑Based Multi‑Read and ZGC Adoption

To overcome HBase’s weak availability and GC‑induced latency spikes, the DiDi team introduced a replication‑based client multi‑read (hedged‑read) mechanism and migrated to the Z Garbage Collector, which together dramatically cut maximum and 99.9th‑percentile latencies while keeping services online during region disruptions.

Big Data · HBase · Low latency

0 likes · 12 min read

Full-Stack Internet Architecture

Dec 20, 2020 · Big Data

Using Flinkx for Data Synchronization in Sharded MySQL Environments

This article explains how to leverage Flinkx and Flink Stream API to create a unified data‑sync task that extracts data from sharded MySQL tables, splits the workload, and pushes it to an MQ cluster, while detailing the underlying InputFormat and Reader architecture.

Big Data · Flink · FlinkX

0 likes · 8 min read

Python Crawling & Data Mining

Dec 19, 2020 · Big Data

Scrape and Analyze Bilibili’s “马保国” Videos with Python – A Complete Guide

This tutorial shows how to use Python to fetch data from Bilibili’s “马保国” channel via its public API, extract video metadata, clean and visualize 14,000 records, and generate insights such as top‑viewed videos and a comment word cloud.

Big Data · Bilibili · Python

0 likes · 5 min read

Youzan Coder

Dec 18, 2020 · Big Data

Design and Implementation of a Configurable Real-Time Rule Engine for Live‑Streaming Product Audits

The paper presents a configurable real‑time rule engine for live‑streaming product audits that decouples data aggregation from rule execution, uses QLExpress for dynamic conditions, supports Dubbo and HTTP sources, and enables safe gray‑release updates, cutting the rule‑change cycle from weeks to near‑real‑time.

Big Data · QLExpress · Real-time Data

0 likes · 8 min read

Laiye Technology Team

Dec 18, 2020 · Big Data

Comprehensive Overview of Laiye Technology's Business Intelligence Ecosystem

This article provides a detailed, end‑to‑end description of Laiye Technology's BI ecosystem, covering its background, development stages, data acquisition, transmission, transformation, loading, modeling, storage layers, statistical analysis, real‑time metrics, visualization, and future challenges, illustrating how the company builds a scalable, cloud‑native data‑driven platform.

Analytics · BI · Big Data

0 likes · 22 min read

Alibaba Cloud Developer

Dec 17, 2020 · Big Data

Why GraphScope is Revolutionizing Large-Scale Graph Computing for AI and Big Data

GraphScope, an open‑source one‑stop platform from Alibaba DAMO Academy, unifies interactive queries, graph analytics, and graph learning on massive, rapidly evolving graphs, offering high‑performance distributed memory management, Gremlin optimization, and seamless Python integration to tackle real‑world AI and big‑data challenges.

Big Data · Distributed Systems · Python

0 likes · 21 min read

MaGe Linux Operations

Dec 17, 2020 · Information Security

Mastering Apache Ranger: Secure Hadoop Data Access with Real‑World Examples

This guide explains Apache Ranger’s role as a centralized security framework for Hadoop, detailing its core features, architecture, workflow, policy creation, auditing, field‑level masking, row‑level filtering, and how to automate policy management via its REST API and Java code.

Apache Ranger · Big Data · Data access control

0 likes · 13 min read

Bitu Technology

Dec 16, 2020 · Big Data

Customizing Spark SQL with Macro‑Based Extensions for Column Exclusion and JSON Path Support

This article explains how Tubi customizes Spark SQL using lightweight macro‑based extensions to simplify column exclusion, JSON path queries, and other complex operations without modifying Spark's source code, detailing the two‑stage processing, example macros, and benefits for big‑data workloads.

Big Data · Custom SQL · Macros

0 likes · 9 min read

macrozheng

Dec 15, 2020 · Big Data

How Kafka Achieves Million‑TPS Through Sequential I/O, MMAP, and Zero‑Copy

Kafka can sustain millions of transactions per second by writing data sequentially to disk, leveraging memory‑mapped files, employing zero‑copy DMA transfers, and batching messages, each technique reducing I/O overhead and CPU involvement, which together enable its high‑throughput performance in big‑data pipelines.

Big Data · High Throughput · Kafka

0 likes · 11 min read

Youzan Coder

Dec 15, 2020 · Industry Insights

How Youzan Built a Full‑Scale Data Cost Billing System: From SDK to Multi‑Dimensional Analysis

This article details Youzan's end‑to‑end construction of a unified data‑center cost billing system, covering background goals, multi‑type cost support, SDK‑based information collection, cost quantification for offline, real‑time and platform tools, full‑business coverage, multi‑dimensional analysis models, operational rollout, and future plans.

Big Data · Data Platform · Industry Insights

0 likes · 19 min read

Programmer DD

Dec 10, 2020 · Artificial Intelligence

Discover Didi’s 40+ Open‑Source Projects in AI, Big Data & Cloud

DiDi’s open‑source portfolio, now exceeding 40 projects, spans AI runtimes, speech recognition, traffic analytics, middleware, big‑data loaders, monitoring tools, mobile frameworks, and frontend libraries, offering developers ready‑to‑use solutions for edge AI, intelligent transportation, data processing, and system reliability.

Artificial Intelligence · Big Data · Mobile Development

0 likes · 23 min read

Youzan Coder

Dec 9, 2020 · Big Data

Youzan Big Data Technology Salon: Practices in Data Cost Governance, Apache Iceberg, Flink, and Data-Driven Growth

The Youzan Big Data Technology Salon brought together Youzan, NetEase and Didi to share practical approaches for cutting data‑infrastructure costs, building an Apache Iceberg‑based data lake, scaling Flink real‑time workloads, and creating a data‑driven growth platform that leverages tracking, A/B testing and analytics.

Apache Iceberg · Big Data · Data Cost Governance

0 likes · 5 min read

DataFunTalk

Dec 8, 2020 · Artificial Intelligence

Financial Big Data Risk Control Models: Techniques, Applications, and COVID‑19 Challenges

This article presents a comprehensive overview of financial big‑data risk control models at Du Xiaoman, covering traditional scoring cards, AI‑driven time‑series and text processing, graph‑based networks, model interpretability, probability calibration, stability analysis, and the specific challenges introduced by the COVID‑19 pandemic.

Artificial Intelligence · Big Data · Credit Scoring

0 likes · 14 min read

Xianyu Technology

Dec 8, 2020 · Big Data

Supply-Demand Modeling and Category Optimization for the Idle Second-Hand Market

The article describes a supply‑demand modeling framework for the idle second‑hand market that extracts and structures product attributes, builds a decision‑tree‑based index from price, inventory, search‑hotspot and demand‑activation sub‑models, and uses the index to optimize category allocation, boost scarce supply, and drive overall growth.

Big Data · Product Modeling · category optimization

0 likes · 7 min read

Tencent Cloud Developer

Dec 7, 2020 · Big Data

Searchable Snapshots in Elasticsearch 7.10: Features, Usage, and Future Outlook

Elasticsearch 7.10 adds searchable snapshots, letting users query indices stored directly in remote repositories such as S3 or COS, which halves storage costs, decouples storage from compute, supports manual mounting and ILM cold‑phase policies, and promises future full storage‑compute separation without local caching.

Big Data · Data Tiering · Elasticsearch

0 likes · 12 min read

JavaEdge

Dec 5, 2020 · Big Data

How Kafka Chooses Its Partition Leaders: ZAB, Raft, and Controller Election Explained

This article explains the leader election mechanisms used in big‑data systems: ZAB in ZooKeeper and Raft’s role‑based election, their drawbacks such as split‑brain and ZooKeeper overload, and how Kafka’s controller‑based design solves these issues with efficient partition leader selection.

Big Data · Kafka · Raft

0 likes · 7 min read

DataFunSummit

Dec 1, 2020 · Artificial Intelligence

Building an AI Ecosystem with Flink: AI Flow Architecture, Components, and Applications

This article explains how Flink enables end‑to‑end AI workflows through the AI Flow platform, covering the Lambda architecture background, AI task pipeline stages, the reasons for choosing Flink, AI Flow’s graph model, core services, integration with ML pipelines, and real‑world advertising recommendation use cases.

AI Flow · AI Pipeline · Big Data

0 likes · 12 min read

Huawei Cloud Developer Alliance

Dec 1, 2020 · Databases

Why Time Series Databases Are Crucial for IoT and Cloud Monitoring

This article explains the fundamentals, application scenarios, key requirements, and open‑source options for time series databases, highlighting how GaussDB (For Influx) addresses high‑performance writes, massive timelines, low storage cost, and elastic scaling for IoT and cloud monitoring workloads.

Big Data · GaussDB · InfluxDB

0 likes · 10 min read

DataFunTalk

Nov 30, 2020 · Fundamentals

DataFunTalk Annual Conference – Full Program and Speaker Details

The DataFunTalk year‑end conference will be held online on December 19‑20, featuring over 90 speakers across multiple forums covering recommendation algorithms, knowledge graphs, AI, big data, security, and product development, with detailed session schedules, speaker bios, and registration information.

AI · Big Data · Knowledge Graph

0 likes · 76 min read

JD Tech Talk

Nov 30, 2020 · Big Data

Scalable Time Series Similarity Search in Big Data: Partitioning, Dimensionality Reduction, and LSH Approaches

This article examines the challenges of performing time‑series similarity queries on massive datasets and presents three scalable solutions—partition‑based indexing, dimensionality‑reduction using MinHash, and a combined approach with Locality Sensitive Hashing—to reduce computation while preserving similarity accuracy.

Big Data · LSH · Minhash

0 likes · 10 min read

ITFLY8 Architecture Home

Nov 28, 2020 · Fundamentals

What 19 Core Topics Every Software Architect Must Master

This article outlines a comprehensive knowledge framework for software architects, covering nineteen essential areas such as responsibilities, foundational concepts, internet system challenges, distributed caching, messaging, load balancing, performance testing, operating systems, algorithms, networking, database design, JVM internals, flash-sale systems, microservices, domain‑driven design, security, high‑availability, big data, and blockchain.

Big Data · Software Architecture · System Design

0 likes · 6 min read

dbaplus Community

Nov 28, 2020 · Operations

How a Chinese City Bank Integrated DevOps, AI, and Big Data to Transform Operations

This case study details how a city bank combined DevOps and ITIL integration, AI‑driven monitoring, and Spark‑based big‑data analytics to build a unified development‑testing‑operations platform, improve service availability, shorten deployment cycles, and achieve near‑99.99% system uptime across its core banking services.

AI · Big Data · DevOps

0 likes · 17 min read
