Tag

distributed computing

Views collected around this technical topic.

Alibaba Cloud Infrastructure
Jun 3, 2025 · Artificial Intelligence

Deploying and Managing Ray on Alibaba Cloud ACK with KubeRay: Architecture, Code Samples, and Scheduling Strategies

This article explains how to build flexible machine‑learning infrastructure on Alibaba Cloud ACK using Ray and KubeRay, covering Ray's core components and AI libraries, deployment options on VMs and Kubernetes, code examples for data processing and model serving, and advanced scheduling and quota‑management techniques.

AI · Alibaba Cloud · KubeRay
0 likes · 17 min read
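The Ray workflow such code samples build on is task fan-out: decorate a function, submit many invocations, gather the results. Below is a minimal stdlib sketch of the same submit-then-gather shape, with local threads standing in for a Ray cluster; `preprocess` and `fan_out` are illustrative names, not from the article.

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(record: int) -> int:
    # Stand-in for one per-record task (decode, feature-extract, ...).
    return record * record

def fan_out(records):
    # Same submit-then-gather shape as Ray's
    #   futures = [preprocess.remote(r) for r in records]; ray.get(futures)
    # but scheduled on local threads rather than across a cluster.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(preprocess, records))

squares = fan_out(range(5))
```

With Ray installed, the same structure scales out by replacing the executor with `@ray.remote` tasks; the calling code barely changes.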
AntData
Dec 11, 2024 · Big Data

Flex: A Stream‑Batch Integrated Vectorized Engine for Flink

This article introduces Flex, a Flink‑compatible stream‑batch vectorized engine built on Velox and Gluten, explains the SIMD‑based execution model, details native operator optimizations, fallback mechanisms, correctness and usability improvements, and presents performance results and future development plans.

Flink · Performance · SIMD
0 likes · 17 min read
Rare Earth Juejin Tech Community
Nov 29, 2024 · Big Data

How ByteDance Builds Large-Scale Data Processing Pipelines for Multimodal Models with Ray

The article details ByteDance's use of Ray and Ray Data to construct scalable audio and video data processing pipelines for multimodal AI models, addressing the challenges of massive data volume, resource constraints, and fault tolerance through pipeline design, Ray Core enhancements, and custom scheduling optimizations.

AI · Big Data · ByteDance
0 likes · 16 min read
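The pipeline shape described, streaming records through decode, transform, and batch stages so that no stage materializes the full dataset, can be sketched with plain generators. Stage names and data are illustrative, not ByteDance's code.

```python
from itertools import islice

def decode(paths):
    # Stage 1: turn each input path into a raw record (stubbed here).
    for p in paths:
        yield {"path": p, "frames": len(p)}

def transform(records):
    # Stage 2: per-record feature extraction.
    for r in records:
        yield r["frames"] * 2

def batch(stream, size):
    # Stage 3: group results into fixed-size batches for the model.
    it = iter(stream)
    while chunk := list(islice(it, size)):
        yield chunk

paths = [f"video_{i}.mp4" for i in range(5)]
batches = list(batch(transform(decode(paths)), size=2))
```

Because every stage is lazy, only one batch's worth of records is in flight at a time, which is the property that lets such pipelines process data far larger than memory.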
Baidu Geek Talk
Nov 18, 2024 · Big Data

Optimizing Multi-Dimensional User Count Statistics in Big Data Computing: A Data Tagging Approach

By replacing exponential row expansion with a data‑tagging strategy that encodes dimension combinations and aggregates at the user level, the authors cut Baidu Feed’s multi‑dimensional user‑count computation time from 49 to 14 minutes and shuffle size from 16 TB to 800 GB, enabling scalable analysis across dozens of dimensions for billions of daily users.

Big Data Optimization · Hive SQL · Performance Tuning
0 likes · 12 min read
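A toy reconstruction of the tagging idea (illustrative Python, not the article's Hive SQL): the naive plan expands every event into one row per dimension combination before counting distinct users, while the tagging plan aggregates to one row per user first, so the shuffle stays proportional to the user count rather than to events × 2^dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Toy events: (user_id, dimension values); names and data are illustrative.
events = [
    ("u1", {"os": "ios", "city": "bj"}),
    ("u1", {"os": "ios", "city": "bj"}),
    ("u2", {"os": "android", "city": "bj"}),
    ("u3", {"os": "ios", "city": "sh"}),
]

def combos(dims):
    # Every dimension combination a row contributes to: (city,), (os,), (city, os).
    keys = sorted(dims)
    for r in range(1, len(keys) + 1):
        for ks in combinations(keys, r):
            yield tuple((k, dims[k]) for k in ks)

# Naive plan: expand each event into one row per combination, then count
# distinct users per combination -- rows explode before the shuffle.
naive = defaultdict(set)
for user, dims in events:
    for tag in combos(dims):
        naive[tag].add(user)
naive_counts = {tag: len(users) for tag, users in naive.items()}

# Tagging plan: dedupe to one row per user first, then emit the combination
# tags and simply count rows -- the expensive shuffle sees ~one row per user.
per_user = {(user, tuple(sorted(dims.items()))) for user, dims in events}
tagged = defaultdict(int)
for user, dims in per_user:
    for tag in combos(dict(dims)):
        tagged[tag] += 1
```

Both plans agree on the counts; the difference is where the 2^d expansion happens, which is what drives the shuffle-size reduction the article reports.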
IT Services Circle
Oct 23, 2024 · Fundamentals

World’s Largest Known Prime Discovered Using GPUs: 2^136279841−1

A former Nvidia engineer, working through the GIMPS distributed project and leveraging thousands of GPUs across dozens of data centers, confirmed that 2^136279841−1, a 41,024,320‑digit Mersenne prime, is the largest known prime, surpassing the previous record holder by more than 16 million digits.

GIMPS · GPU computing · Mersenne prime
0 likes · 7 min read
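The primality check behind every GIMPS result is the Lucas–Lehmer test; a direct, unoptimized version is only a few lines. Production clients replace the modular squaring with FFT-based multiplication, which is the step the GPUs accelerate.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True iff the Mersenne number 2**p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1            # the Mersenne number M_p
    s = 4                       # Lucas-Lehmer seed
    for _ in range(p - 2):
        s = (s * s - 2) % m     # the squaring GIMPS clients do via FFTs on GPUs
    return s == 0

# Mersenne prime exponents begin 3, 5, 7, 13, 17, 19, 31, ...
```

At p = 136,279,841 each of the p − 2 iterations squares a number of over 136 million bits, which is why the record search is distributed across data centers.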
360 Tech Engineering
Oct 15, 2024 · Artificial Intelligence

Implementation and Optimization of 360 AI Compute Center: Infrastructure, Network, Kubernetes, and Training/Inference Acceleration

The article details the design and deployment of 360's AI Compute Center, covering GPU server selection, high‑performance networking, Kubernetes‑based cluster management, advanced scheduling, training and inference acceleration techniques, and a comprehensive AI development platform with visualization and fault‑tolerance features.

AI infrastructure · GPU Cluster · Inference Acceleration
0 likes · 21 min read
DataFunSummit
Aug 1, 2024 · Big Data

Deep Dive into Apache Spark SQL: Concepts, Core Components, and API

This article provides a comprehensive overview of Apache Spark SQL, covering its fundamental concepts such as TreeNode, AST, and QueryPlan, the distinction between logical and physical plans, the rule‑execution framework, core components like SparkSqlParser and Analyzer, as well as the Spark Session, Dataset/DataFrame, and various writer APIs, supplemented by a detailed Q&A session.

Apache Spark · Big Data · Data Processing
0 likes · 19 min read
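The rule-execution framework described there applies rewrite rules to a TreeNode-based query plan until nothing changes. A toy sketch of that idea on an arithmetic expression tree, with constant folding as the single rule; the classes and names are illustrative, not Catalyst's.

```python
from dataclasses import dataclass

# A miniature TreeNode hierarchy: literals, column references, addition.
@dataclass(frozen=True)
class Lit:
    value: int

@dataclass(frozen=True)
class Col:
    name: str

@dataclass(frozen=True)
class Add:
    left: object
    right: object

def fold_constants(node):
    """One rewrite rule, applied bottom-up: Add(Lit, Lit) -> Lit."""
    if isinstance(node, Add):
        left, right = fold_constants(node.left), fold_constants(node.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return node

# (1 + 2) + (3 + x): only the all-literal subtree folds.
folded = fold_constants(Add(Add(Lit(1), Lit(2)), Add(Lit(3), Col("x"))))
```

Spark's Analyzer and Optimizer are batches of many such rules run to a fixed point over logical plans rather than arithmetic.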
DataFunSummit
Jul 11, 2024 · Big Data

Design Principles of the Spark Core – DataFun Introduction to Apache Spark (Part 1)

This article provides a comprehensive overview of Apache Spark, covering its origins, key characteristics, core concepts such as RDD, DAG, partitioning and dependencies, the internal architecture including SparkConf, SparkContext, SparkEnv, storage and scheduling systems, as well as deployment models and the company behind the product.

Apache Spark · Big Data · Data Processing
0 likes · 16 min read
Tencent Cloud Developer
May 29, 2024 · Artificial Intelligence

Distributed Network Embedding Algorithm for Billion‑Scale Graph Data in Tencent Games

Tencent’s Game Social Algorithm Team presents a Spark‑based distributed network embedding framework that recursively partitions hundred‑billion‑edge game graphs into manageable subgraphs, runs node2vec locally, and fuses results, enabling efficient link prediction and node classification across multiple games within hours.

Big Data · Spark · distributed computing
0 likes · 7 min read
JD Tech
Mar 18, 2024 · Artificial Intelligence

High‑Performance Inference Architecture: Distributed Graph Heterogeneous Computing Framework and GPU Multi‑Stream Optimization

The article describes how JD’s advertising team tackled the high‑concurrency, low‑latency challenges of online recommendation inference by designing a distributed graph heterogeneous computing framework, optimizing GPU kernel launches with TensorBatch, deep‑learning compiler techniques, and a multi‑stream GPU architecture, achieving significant throughput and latency improvements.

AI inference · GPU optimization · deep learning compiler
0 likes · 14 min read
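The TensorBatch aggregation described, collecting concurrent inference requests and running them as one batched call, cuts kernel-launch overhead roughly in proportion to the batch size. A single-threaded toy sketch of the batching idea; class and function names are hypothetical.

```python
def model_batched(batch):
    # Stand-in for one batched GPU inference call; real systems launch one
    # kernel over the whole batch instead of one per request.
    return [x * 2 for x in batch]

class TensorBatchAggregator:
    """Collect requests until max_batch is reached, then run them together."""
    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.pending = []
        self.calls = 0          # how many batched launches we actually made

    def submit(self, request):
        self.pending.append(request)
        if len(self.pending) >= self.max_batch:
            return self.flush()
        return []

    def flush(self):
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        self.calls += 1
        return model_batched(batch)

agg = TensorBatchAggregator(max_batch=4)
results = []
for r in range(10):
    results += agg.submit(r)
results += agg.flush()      # drain the final partial batch
```

Ten requests trigger only three launches here; a production aggregator would also flush on a latency deadline so a partial batch never waits indefinitely.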
DataFunTalk
Jan 29, 2024 · Artificial Intelligence

PAI‑ChatLearn: A Flexible Large‑Scale RLHF Training Framework for Massive Models

The article introduces PAI‑ChatLearn, a flexible and high‑performance framework developed by Alibaba Cloud's PAI team that supports full‑pipeline RLHF training for large models, explains the evolution of parallel training strategies, details the framework’s architecture and configuration, and showcases performance results and practical usage examples.

AI Framework · Large Model Training · PAI-ChatLearn
0 likes · 17 min read
JD Retail Technology
Jan 25, 2024 · Artificial Intelligence

Optimizing High‑Concurrency Online Inference for Recommendation Models with Distributed Heterogeneous Computing and GPU Acceleration

This article describes how JD Retail's advertising technology team tackled the high‑compute demands of modern recommendation models by designing a distributed graph‑partitioned heterogeneous computing framework, introducing TensorBatch request aggregation, leveraging deep‑learning compiler bucketing and asynchronous compilation, and implementing a multi‑stream GPU architecture to dramatically improve online inference throughput and latency.

GPU Acceleration · Online Inference · Recommendation systems
0 likes · 13 min read
Architects' Tech Alliance
Dec 23, 2023 · Artificial Intelligence

Future Development Paths of Computing Power Technology (2023): Chip Architecture, Near‑Memory Computing, and Distributed xPU Systems

The article outlines the accelerating demand for high‑performance computing driven by AI, AR/VR, biotech and other workloads, examines the limits of Moore's law, and presents emerging solutions such as advanced chip architectures, chiplet integration, near‑memory/in‑memory computing, and distributed xPU‑based systems for scalable, efficient compute.

AI acceleration · Chiplet · chip architecture
0 likes · 11 min read
HomeTech
Nov 24, 2023 · Backend Development

Implementing Task Scheduling and Distributed Processing with Celery and Redis in Python

This article explains how to use Celery together with Redis to manage and execute periodic and asynchronous tasks in Python, covering basic concepts, architecture, configuration steps, single‑worker and multi‑worker setups, distributed processing strategies, and practical considerations for reliable task execution.

Celery · Python · Redis
0 likes · 8 min read
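Celery's architecture puts a broker (here Redis) between producers that enqueue task messages and worker processes that consume them. The worker loop can be mimicked with the stdlib alone for illustration; a real setup would instead declare `app = Celery("tasks", broker="redis://localhost:6379/0")` and call `add.delay(i, i)` on an `@app.task` function.

```python
import queue
import threading

# Stdlib stand-in for the broker queue that Redis provides in a real Celery setup.
broker = queue.Queue()
results = []

def worker():
    # Each Celery worker similarly loops: pull a task message, execute it, ack.
    while True:
        task = broker.get()
        if task is None:            # sentinel: shut this worker down
            break
        func, args = task
        results.append(func(*args))
        broker.task_done()

def add(x, y):
    return x + y

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for i in range(5):
    broker.put((add, (i, i)))       # analogous to add.delay(i, i)
broker.join()                       # wait until every queued task has run
for t in threads:
    broker.put(None)
for t in threads:
    t.join()
```

What Celery adds on top of this skeleton is exactly what the article covers: persistence and acknowledgement via the broker, periodic (beat) scheduling, retries, and workers spread across machines.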
Zhengcaiyun Tech (政采云技术)
Sep 19, 2023 · Big Data

Techniques for Processing Massive Data: Sorting, Querying, Top‑K, and Deduplication

This article explains core concepts and practical solutions for handling massive datasets that cannot fit into memory, covering batch processing, distributed sorting, bitmap indexing, hash‑based lookups, top‑K extraction, and deduplication techniques with code examples and multi‑machine strategies.

Big Data · bitmap indexing · deduplication
0 likes · 18 min read
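Two of the techniques covered translate directly into short sketches: a bounded min-heap for top-K extraction without sorting the whole stream, and a bitmap that deduplicates integer ids in n/8 bytes instead of a hash set. Toy code in the spirit of the article, not taken from it.

```python
import heapq

def top_k(stream, k):
    """Keep a min-heap of size k; a new value evicts the smallest if larger."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)   # pop the smallest, push x
    return sorted(heap, reverse=True)

class Bitmap:
    """One bit per possible id: membership tests in n/8 bytes of memory."""
    def __init__(self, size: int):
        self.bits = bytearray((size + 7) // 8)

    def add(self, i: int) -> bool:
        byte, mask = i // 8, 1 << (i % 8)
        seen = bool(self.bits[byte] & mask)
        self.bits[byte] |= mask
        return not seen                  # True if this id was new

bm = Bitmap(100)
new_ids = [i for i in [3, 5, 3, 99, 5] if bm.add(i)]
```

Both run in a single pass, which is what makes them usable on datasets that never fit in memory: top-K holds only k items, and a bitmap for a billion ids needs about 125 MB.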
Baidu Geek Talk
Aug 28, 2023 · Cloud Native

Baidu Search Vertical Offline Computing System Architecture Evolution

Baidu's search vertical offline computing system evolved through four stages—from a fragmented pre‑2018 processing setup to a unified business framework, then serverless functions, and finally a data‑intelligent architecture with multi‑layer abstraction, graph and multi‑language engines, achieving 5‑10× efficiency gains and dramatically reducing failures.

Baidu Search · Cloud Native · DAG Processing
0 likes · 23 min read
DataFunSummit
Aug 25, 2023 · Big Data

Big Data Meets Cloud Native: Tencent's Cloud‑Native Big Data Architecture, Challenges, and Practices

This article explores how Tencent integrates big data with cloud‑native technologies, detailing the evolution, opportunities, challenges, the peak‑range (FengLuan) architecture, engine and scheduling layers, mixed‑workload strategies, runtime optimizations, and future directions for large‑scale data platforms.

Big Data · Cloud Native · Kubernetes
0 likes · 17 min read
DataFunTalk
Aug 22, 2023 · Artificial Intelligence

Building Complex Distributed Systems with Ray: An AutoML Case Study and Cloud‑Native Deployment

This article explains how the Ray distributed computing engine simplifies the design, deployment, and operation of complex cloud‑native distributed systems—illustrated through an AutoML service example—by detailing system complexity, Ray’s core concepts, resource customization, runtime environments, monitoring, and ecosystem integrations.

AI · AutoML · Cloud Native
0 likes · 26 min read
Architects' Tech Alliance
Aug 2, 2023 · Fundamentals

Emerging Trends in Digital Infrastructure: Beyond Moore's Law, Chiplet, and Compute‑in‑Memory

The article surveys recent digital‑infrastructure trends, explaining why traditional Moore's Law scaling is slowing, describing More‑Moore and Beyond‑CMOS approaches, and detailing new chip architectures such as DSA, 3D stacking, Chiplet, compute‑in‑memory, and distributed xPU‑centric systems that together address the growing compute demands of AI, AR/VR, and bio‑pharma workloads.

AI acceleration · Compute-in-Memory · Moore's law
0 likes · 11 min read
DataFunSummit
Aug 2, 2023 · Big Data

Loop Detection in Risk Control: Challenges, Distributed Graph Computing Optimizations, and ArcNeural Engine Case Studies

This article discusses the challenges of loop detection in financial risk control, presents distributed graph computing optimization techniques—including pruning, multi‑graph handling, and memory‑efficient algorithms—shows experimental results, and shares real‑world ArcNeural engine case studies and future directions.

ArcNeural · Big Data · distributed computing
0 likes · 13 min read