Tag

GPU Acceleration

19 articles collected around this technical thread.

DataFunSummit
Jun 12, 2025 · Artificial Intelligence

How Alibaba Cloud’s AI Search Evolves with Agentic RAG and Multi‑Model Innovations

This article details Alibaba Cloud AI Search’s development journey, covering its dual product lines, the evolution of Agentic RAG technology, multi‑agent architectures, vector retrieval breakthroughs, GPU‑accelerated indexing, NL2SQL capabilities, deployment models, and future directions for AI‑driven search solutions.

AI Search · GPU Acceleration · Large Models
0 likes · 33 min read
DeWu Technology
Feb 17, 2025 · Artificial Intelligence

Optimizing Large Model Inference: High‑Performance Frameworks and Techniques

The article reviews high‑performance inference strategies for large language models such as DeepSeek‑R1, detailing CPU‑GPU process separation, Paged and Radix Attention, Chunked Prefill, output‑length reduction, tensor‑parallel multi‑GPU scaling, and speculative decoding, each shown to markedly boost throughput and cut latency in real deployments.

AI · Distributed Inference · GPU Acceleration
0 likes · 22 min read
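One of the techniques the article above covers, speculative decoding, can be illustrated with a toy sketch: a cheap draft model proposes a short run of tokens and the target model verifies them, keeping the longest accepted prefix. Both "models" below are deterministic stand-ins invented for illustration, not anything from the article; in a real serving stack the verification step is a single batched forward pass on the GPU.

```python
def draft_propose(prefix, k):
    # Hypothetical draft model: guesses each next token as previous + 1 (mod 10).
    out, last = [], (prefix[-1] if prefix else -1)
    for _ in range(k):
        last = (last + 1) % 10
        out.append(last)
    return out

def target_next(prefix):
    # Hypothetical target model: agrees with the +1 pattern except every
    # 5th position, where it emits 0 (so drafts are sometimes rejected).
    if len(prefix) % 5 == 4:
        return 0
    return ((prefix[-1] if prefix else -1) + 1) % 10

def greedy_decode(n_tokens):
    # Baseline: one target-model call per token.
    seq = []
    while len(seq) < n_tokens:
        seq.append(target_next(seq))
    return seq

def speculative_decode(n_tokens, k=4):
    seq = []
    while len(seq) < n_tokens:
        proposal = draft_propose(seq, k)
        # Verify draft tokens against the target; accept while they match.
        for tok in proposal:
            if tok == target_next(seq) and len(seq) < n_tokens:
                seq.append(tok)          # accepted draft token
            else:
                break
        else:
            continue                     # all k accepted; draft again
        if len(seq) < n_tokens:
            seq.append(target_next(seq)) # rejection: take the target's token
    return seq
```

Because accepted tokens always match what the target would have produced, the output is identical to plain greedy decoding, but the target is consulted far fewer times when the draft's acceptance rate is high.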
Architect
Feb 2, 2025 · Artificial Intelligence

Deploying DeepSeek‑R1 Locally with Ollama and Accessing It via Spring Boot and Spring AI

This guide explains how to install Ollama, download and run the open‑source DeepSeek‑R1 language model locally, configure GPU acceleration, and integrate the model into a Spring Boot application using Spring AI to provide an API service for AI inference.

AI model deployment · DeepSeek-R1 · GPU Acceleration
0 likes · 12 min read
DevOps
Jan 6, 2025 · Artificial Intelligence

Ten Popular Large Language Model Deployment Engines and Tools: Features, Advantages, and Limitations

This article reviews ten mainstream LLM deployment solutions—including WebLLM, LM Studio, Ollama, vLLM, LightLLM, OpenLLM, HuggingFace TGI, GPT4ALL, llama.cpp, and Triton Inference Server—detailing their technical characteristics, strengths, drawbacks, and example deployment workflows for both personal and enterprise environments.

AI inference · GPU Acceleration · LLM
0 likes · 16 min read
360 Zhihui Cloud Developer
Dec 17, 2024 · Artificial Intelligence

Translate Foreign Videos into Chinese with Whisper, Ollama & FFmpeg

This guide shows how to automatically extract subtitles from English videos using OpenAI's Whisper, translate them into Chinese with a locally‑deployed Ollama large language model, and finally merge the bilingual subtitles back into the video using FFmpeg, all with GPU acceleration.

AI · FFmpeg · GPU Acceleration
0 likes · 11 min read
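The final step of the pipeline above, merging subtitles back into the video with FFmpeg, can be sketched as building the muxing command programmatically. The file names below are placeholders and the exact flags the article uses are not given; these follow standard FFmpeg usage for embedding an SRT track into an MP4 without re-encoding.

```python
def build_mux_command(video_in, srt_in, video_out):
    # Assemble an FFmpeg invocation that adds a subtitle stream as a
    # soft-sub track, copying audio/video as-is.
    return [
        "ffmpeg",
        "-i", video_in,      # source video
        "-i", srt_in,        # bilingual subtitle file produced earlier
        "-c", "copy",        # copy audio/video streams (no re-encode)
        "-c:s", "mov_text",  # MP4-compatible subtitle codec
        video_out,
    ]

cmd = build_mux_command("talk.mp4", "talk_bilingual.srt", "talk_zh.mp4")
```

For hard-burned subtitles one would instead re-encode with a `subtitles=` video filter, which is where FFmpeg's GPU decode/encode options (e.g. `-hwaccel cuda`) become relevant.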
AntTech
Nov 29, 2024 · Artificial Intelligence

AI Inference with Trusted Execution Environment: HyperGPU and DistMSM Accelerated Zero‑Knowledge Proofs Win 2024 Financial Cipher Cup Innovation Award

The award‑winning solution combines a GPU‑accelerated TEE framework (HyperGPU) and a multi‑GPU zkSNARK acceleration scheme (DistMSM) to provide fast, privacy‑preserving AI inference proofs, earning the third‑place Innovation Team prize at the 2024 Financial Cipher Cup competition.

AI · DistMSM · Financial Cipher
0 likes · 6 min read
DataFunSummit
Jun 17, 2024 · Artificial Intelligence

Strategies for Reducing Cost and Improving Efficiency in Recommendation Systems with Alibaba Cloud PAI‑Rec

This article discusses how Alibaba Cloud’s AI platform PAI‑Rec reduces recommendation system costs and boosts efficiency by optimizing training resources, leveraging FeatureStore, EasyRec and TorchEasyRec frameworks, detailing workflow stages, feature consistency, GPU acceleration, componentized model configuration, and practical deployment timelines.

AI Platform · Feature Store · GPU Acceleration
0 likes · 14 min read
DataFunTalk
Jun 3, 2024 · Artificial Intelligence

Deploying Speech AI Services Quickly with NVIDIA Riva

This article explains how to use NVIDIA Riva to rapidly deploy speech AI services, covering Riva's overview, Chinese ASR model updates, TTS capabilities, customization options, the Quickstart tool, and a Q&A session that clarifies deployment, model fine‑tuning, and integration with NeMo and Triton.

ASR · GPU Acceleration · NVIDIA Riva
0 likes · 13 min read
Didi Tech
Apr 16, 2024 · Artificial Intelligence

Optimizing DSP Deep Model Latency by Externalizing Feature Processing with EzFeaFly

By externalizing feature processing with the EzFeaFly tool and feeding a dense index/value tensor directly to the GPU, the DSP platform decouples feature transformation from model inference, cutting instance usage by ~40%, reducing inference latency by 70‑80%, and achieving an end‑to‑end latency improvement of over 60% while lowering costs.

DSP · GPU Acceleration · Performance Optimization
0 likes · 11 min read
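The "dense index/value tensor" idea above can be sketched in miniature: each sample's sparse features (id → value) are flattened into two fixed-width arrays that a GPU kernel can consume directly, with unused slots padded. EzFeaFly's actual layout is internal to the article; the function below is purely illustrative, with `width` and `pad_id` as assumed parameters.

```python
def to_dense(samples, width, pad_id=0):
    # samples: list of {feature_id: value} dicts, one per request.
    # Returns parallel [batch x width] index and value rows.
    indices, values = [], []
    for feats in samples:
        ids = sorted(feats)[:width]                 # truncate to fixed width
        row_idx = ids + [pad_id] * (width - len(ids))
        row_val = [feats[i] for i in ids] + [0.0] * (width - len(ids))
        indices.append(row_idx)
        values.append(row_val)
    return indices, values
```

Because the shape is fixed per batch, the result can be copied to the device in one transfer and indexed without per-sample branching, which is what lets feature processing move off the inference critical path.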
DataFunSummit
Apr 14, 2024 · Artificial Intelligence

TensorRT-LLM: NVIDIA’s Scalable LLM Inference Framework – Overview, Features, Workflow, Performance, and Future Directions

This article presents a comprehensive overview of NVIDIA’s TensorRT-LLM, detailing its product positioning as a scalable LLM inference solution, key features such as model support, low-precision and quantization techniques, parallelism strategies, the end-to-end usage workflow, performance highlights, future roadmap, and answers to common technical questions.

GPU Acceleration · LLM inference · Nvidia
0 likes · 13 min read
Bilibili Tech
Mar 5, 2024 · Game Development

Bilibili Color Space Conversion Engine for Video Processing

Bilibili's color space conversion engine processes user‑uploaded videos with varied color parameters into a unified format, using layered filters, precomputed optimizations, CPU and CUDA implementations, handling transformations, quantization, chroma subsampling, matrix conversion, transfer functions, gamut and tone mapping, HDR dynamic metadata, and achieving high performance for millions of users.

Color space · GPU Acceleration · HDR
0 likes · 19 min read
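One stage of such a conversion pipeline, weighting RGB by a standard's luma coefficients, is small enough to show directly. The BT.601 and BT.709 coefficients below are the standard published values; everything else (function names, the scalar per-pixel formulation) is an illustrative simplification of what a production engine would fuse into vectorized CPU or CUDA kernels.

```python
# Standard luma coefficients (Kr, Kg, Kb) for the two common SDR color spaces.
BT601_LUMA = (0.299, 0.587, 0.114)
BT709_LUMA = (0.2126, 0.7152, 0.0722)

def luma(rgb, coeffs):
    # Weighted sum of linear R, G, B components; coefficients sum to 1,
    # so white (1, 1, 1) maps to luma 1.0 under either standard.
    r, g, b = rgb
    kr, kg, kb = coeffs
    return kr * r + kg * g + kb * b
```

The difference between the two coefficient sets is exactly why content tagged BT.601 looks subtly wrong when decoded as BT.709, and why an engine like the one described must normalize every upload's color metadata before transcoding.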
DataFunTalk
Jan 31, 2024 · Artificial Intelligence

Introduction to NVIDIA TensorRT-LLM Inference Framework

TensorRT-LLM is NVIDIA's scalable inference framework for large language models that combines TensorRT compilation, fast kernels, multi‑GPU parallelism, low‑precision quantization, and a PyTorch‑like API to deliver high‑performance LLM serving with extensive customization and future‑focused enhancements.

Artificial Intelligence · GPU Acceleration · LLM inference
0 likes · 12 min read
JD Retail Technology
Jan 30, 2024 · Artificial Intelligence

Next-Generation Multi‑GPU Synchronous Training Architecture for Large‑Scale Sparse Recommendation Models

The article details JD Retail's evolution from TensorFlow‑based sparse training to a custom high‑performance parameter server and a fully GPU‑accelerated, multi‑node, multi‑card synchronous training framework that leverages GPU‑RDMA, two‑level CPU‑DRAM/GPU‑HBM caching, and pipeline parallelism to overcome storage, I/O, and compute challenges of trillion‑parameter recommendation systems.

AI infrastructure · GPU Acceleration · Recommendation systems
0 likes · 12 min read
JD Retail Technology
Jan 25, 2024 · Artificial Intelligence

Optimizing High‑Concurrency Online Inference for Recommendation Models with Distributed Heterogeneous Computing and GPU Acceleration

This article describes how JD Retail's advertising technology team tackled the high‑compute demands of modern recommendation models by designing a distributed graph‑partitioned heterogeneous computing framework, introducing TensorBatch request aggregation, leveraging deep‑learning compiler bucketing and asynchronous compilation, and implementing a multi‑stream GPU architecture to dramatically improve online inference throughput and latency.

GPU Acceleration · Online Inference · Recommendation systems
0 likes · 13 min read
DataFunTalk
Dec 23, 2023 · Artificial Intelligence

NVIDIA Merlin: Product Overview, Models, Distributed Embeddings, Hierarchical KV and Parameter Server

This article introduces NVIDIA's Merlin recommendation system suite, detailing its product overview, model and system libraries, TensorFlow Distributed Embedding plugin, hierarchical key‑value store, and hierarchical parameter server, while highlighting integration with NVTabular, Triton, and performance gains on GPU‑accelerated training and inference.

Distributed Embedding · GPU Acceleration · Hierarchical KV
0 likes · 13 min read
DataFunSummit
Nov 19, 2023 · Artificial Intelligence

Overview of NVIDIA Merlin for Recommendation Systems

This article introduces NVIDIA's Merlin suite, covering product overview, Merlin Models & Systems, the TensorFlow Distributed Embedding (TFDE) plugin, the Hierarchical‑KV library, and the Hierarchical Parameter Server (HPS), while highlighting their architecture, performance benefits, and ease of integration for large‑scale recommendation workloads.

Distributed Embedding · GPU Acceleration · Hierarchical KV
0 likes · 13 min read
DaTaobao Tech
May 24, 2023 · Mobile Development

Understanding and Optimizing Mobile Page Performance and Jank

Effective mobile page performance requires identifying three jank types—screen tearing, frame drops, and long unresponsiveness—monitoring metrics such as response time, animation latency, idle time, and SM, understanding the CPU‑GPU rendering pipeline, and applying optimizations like hardware acceleration, transform‑based animations, reduced layout thrashing, task slicing, and GPU‑friendly techniques.

FPS · GPU Acceleration · Jank
0 likes · 13 min read
Bilibili Tech
Apr 21, 2023 · Artificial Intelligence

Design and Optimization of Bilibili's Large-Scale Video Duplicate Detection System

Bilibili built a massive video‑duplicate detection platform that trains a self‑supervised ResNet‑50 feature extractor, removes black borders, and uses a two‑stage ANN‑plus‑segment‑level matching pipeline accelerated by custom GPU decoding and inference, boosting duplicate rejection 7.5×, recall 3.75×, and cutting manual misses from 65 to 5 per day.

Feature Extraction · GPU Acceleration · System Architecture
0 likes · 19 min read
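The first stage of a two-stage matching pipeline like the one described, candidate retrieval over feature embeddings, reduces to similarity search. The toy below does an exhaustive cosine-similarity scan over an in-memory index; a production system would use a GPU-backed ANN index instead, and the threshold value is an arbitrary illustration, not Bilibili's.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def candidates(query, index, threshold=0.9):
    # Rank stored video fingerprints by similarity to the query embedding
    # and keep those above the threshold, best match first.
    scored = [(vid, cosine(query, emb)) for vid, emb in index.items()]
    return [vid for vid, s in sorted(scored, key=lambda t: -t[1]) if s >= threshold]
```

Candidates that survive this stage would then go to the finer segment-level matcher, which is where most of the false positives from embedding-only comparison get filtered out.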
DataFunSummit
Apr 9, 2023 · Artificial Intelligence

PGLBox: An Industrial-Scale GPU‑Accelerated Graph Learning Framework

This article introduces the development trends of graph learning frameworks, explains GPU acceleration techniques such as UVA and multi‑GPU pipelines, details the design of the PaddlePaddle Graph Learning (PGL) framework and its large‑scale engine PGLBox, and demonstrates how these technologies enable industrial‑grade graph representation learning with billions of nodes and edges.

GPU Acceleration · Graph Neural Networks · Message Passing
0 likes · 18 min read