Tag

AI acceleration

20 articles collected under this tag.

Architects' Tech Alliance
Jun 9, 2025 · Artificial Intelligence

What Makes Nvidia’s Blackwell GPUs a Game-Changer for AI Performance?

In March 2024 Nvidia unveiled the Blackwell GPU family and the GB200 NVL72 architecture, featuring a 4 nm‑class process, redesigned CUDA cores, next‑generation ray tracing, upgraded DLSS, major FP16/FP8 compute gains, 8 TB/s of memory bandwidth, and NVLink Gen5, while also posing significant power, cooling, and packaging challenges for large‑scale AI deployments.

AI acceleration · Blackwell · GPU
0 likes · 6 min read
Architects' Tech Alliance
Jun 6, 2025 · Artificial Intelligence

B30 vs H20: Which NVIDIA GPU Wins for AI Workloads and Budgets?

This article compares NVIDIA’s China‑specific B30 and the higher‑end H20 GPU, detailing their GPU architecture updates, memory technologies, performance metrics, power and cooling characteristics, and price positioning, to help enterprises and developers choose the most suitable accelerator for AI and deep‑learning workloads.

AI acceleration · B30 · GPU
0 likes · 13 min read
Architects' Tech Alliance
Apr 28, 2025 · Artificial Intelligence

NVLink High‑Speed Interconnect: Architecture, Evolution, and Performance

NVLink, NVIDIA's high‑bandwidth interconnect introduced with the P100 GPU, replaces PCIe by offering significantly higher data rates and lower latency for GPU‑GPU and GPU‑CPU communication, and has evolved through multiple generations to support modern AI and high‑performance computing workloads.

AI acceleration · GPU interconnect · High Performance Computing
0 likes · 9 min read
AntTech
Mar 19, 2025 · Artificial Intelligence

Award-Winning HPCA 2025 Papers on Near‑DRAM Processing (UniNDP) and GPU‑Accelerated Fully Homomorphic Encryption (WarpDrive)

At HPCA 2025, two standout papers—UniNDP, a unified compilation and simulation tool for near‑DRAM processing architectures, and WarpDrive, a GPU‑based fully homomorphic encryption accelerator leveraging Tensor and CUDA cores—demonstrate significant performance gains for AI workloads and privacy‑preserving computation.

AI acceleration · Fully Homomorphic Encryption · GPU
0 likes · 5 min read
DataFunTalk
Feb 26, 2025 · Artificial Intelligence

DeepGEMM: An Open‑Source FP8 GEMM Library for Efficient AI Model Training and Inference

DeepGEMM is an open‑source FP8‑precision GEMM library that delivers up to 1350 TFLOPS on NVIDIA Hopper GPUs, offering JIT‑compiled, lightweight code (~300 lines) for dense and MoE matrix multiplication, with easy deployment, configurable environment variables, and performance advantages over CUTLASS for large AI models.
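To make the FP8 idea concrete, here is a minimal pure‑Python sketch of the scale‑quantize‑rescale pattern that low‑precision GEMM libraries build on. The helper names are hypothetical, and real FP8 (E4M3) kernels such as DeepGEMM's run on Hopper tensor cores with much finer‑grained scaling:

```python
# Toy illustration of scaled low-precision GEMM: quantize both operands
# with per-tensor scales, multiply in low precision, then rescale the
# accumulator once. Not DeepGEMM's actual implementation.

def quantize(mat, levels=448.0):  # 448 is the E4M3 maximum normal value
    amax = max(abs(x) for row in mat for x in row) or 1.0
    scale = levels / amax
    # Crude rounding stands in for the FP8 cast.
    q = [[round(x * scale) for x in row] for row in mat]
    return q, scale

def scaled_gemm(a, b):
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    n, k, m = len(qa), len(qb), len(qb[0])
    # Accumulate in the quantized domain, then undo both scales at once.
    return [[sum(qa[i][t] * qb[t][j] for t in range(k)) / (sa * sb)
             for j in range(m)] for i in range(n)]

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]
print(scaled_gemm(a, b))  # recovers A (multiplied by the identity)
```

The single deferred rescale is what lets the inner loop stay in cheap low‑precision arithmetic.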

AI acceleration · DeepGEMM · FP8
0 likes · 7 min read
Baidu Geek Talk
Jan 15, 2025 · Artificial Intelligence

Understanding Large Model Inference Engines and Reducing Token Interval (TPOT)

Large‑model inference engines turn prompts into responses through a prefill stage followed by autoregressive decoding, with latency measured by TTFT (time to first token) and TPOT (time per output token). Baidu’s AIAK suite improves TPOT by separating tokenization from generation, using static slot scheduling, and executing asynchronously, cutting the token interval from ~35 ms to ~14 ms and raising GPU utilization to about 75%, while quantization and speculative decoding deliver further throughput gains.
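The two metrics above are easy to pin down in code. This is a hypothetical helper for illustration, not part of AIAK:

```python
# TTFT (time to first token) and TPOT (time per output token, i.e. the
# average token interval) computed from emission timestamps.

def ttft_and_tpot(request_start: float, token_times: list[float]) -> tuple[float, float]:
    """Given the request start time and the wall-clock time at which each
    output token was emitted, return (TTFT, TPOT) in seconds."""
    ttft = token_times[0] - request_start
    # TPOT averages the gaps between consecutive decode tokens.
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    return ttft, sum(gaps) / len(gaps)

# Example: first token after 120 ms, then one token every 14 ms
# (the post-optimization interval the article cites).
times = [0.120 + 0.014 * i for i in range(11)]
ttft, tpot = ttft_and_tpot(0.0, times)
print(f"TTFT = {ttft*1000:.0f} ms, TPOT = {tpot*1000:.0f} ms")
```

TTFT is dominated by prefill compute, TPOT by per-step decode latency, which is why the two are optimized separately.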

AI acceleration · GPU utilization · TPOT
0 likes · 10 min read
DataFunSummit
Dec 4, 2024 · Artificial Intelligence

Accelerating Large Language Model Inference with the YiNian LLM Framework

This article presents the YiNian LLM framework, detailing how KVCache, prefill/decode separation, continuous batching, PagedAttention, and multi‑hardware scheduling are used to speed up large language model inference while managing GPU memory and latency.
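For readers new to the technique, a minimal sketch of the KVCache idea, in hypothetical framework‑agnostic Python (real engines store tensors per layer and head, often in paged blocks):

```python
class KVCache:
    """Per-sequence cache of attention keys/values, one entry per token.

    Caching means each past token's key/value projections are computed
    once and reused at every subsequent decode step.
    """
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = KVCache()
# Prefill: one append per prompt token.
for token_k, token_v in [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.2, 0.8])]:
    cache.append(token_k, token_v)
# Decode: only the newly generated token is projected and appended.
cache.append([1.0, 1.0], [0.9, 0.1])
print(len(cache))  # 3 cached tokens
```

The trade-off the article explores is that this cache grows linearly with sequence length, which is what paged block management and scheduling exist to control.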

AI acceleration · GPU · KVCache
0 likes · 20 min read
DataFunSummit
Oct 2, 2024 · Artificial Intelligence

NVIDIA’s Solutions for Large Language Models: NeMo Framework, TensorRT‑LLM, and Retrieval‑Augmented Generation

This article explains NVIDIA’s end‑to‑end stack for large language models, covering the NeMo Framework for data processing, training, and deployment, the open‑source TensorRT‑LLM inference accelerator, and the Retrieval‑Augmented Generation (RAG) technique that enriches model outputs with external knowledge.

AI acceleration · NeMo · Nvidia
0 likes · 17 min read
Xiaohongshu Tech REDtech
Sep 19, 2024 · Artificial Intelligence

Target-Driven Distillation (TDD): A Multi‑Goal Distillation Method for Accelerating Diffusion Models

Target‑Driven Distillation (TDD) is a multi‑goal distillation method that flexibly selects short‑range target steps and decouples guidance during training, enabling 4‑to‑8‑step diffusion generation that preserves high‑resolution detail, works with LoRA, ControlNet, and InstantID, and outperforms existing consistency‑distillation techniques in speed and quality.

AI acceleration · Image Generation · diffusion models
0 likes · 9 min read
DataFunSummit
Sep 5, 2024 · Artificial Intelligence

NVIDIA’s End‑to‑End Solutions for Large Language Models: NeMo Framework, TensorRT‑LLM, and Retrieval‑Augmented Generation

This article introduces NVIDIA’s comprehensive solutions for large language models, covering the NeMo Framework’s full‑stack development pipeline, the open‑source TensorRT‑LLM inference accelerator, and Retrieval‑Augmented Generation techniques, while detailing data preprocessing, distributed training, model fine‑tuning, deployment, and performance optimizations.

AI acceleration · NeMo Framework · Nvidia
0 likes · 16 min read
ByteDance SYS Tech
Aug 12, 2024 · Cloud Native

How mGPU Enables Efficient GPU Sharing for AI Workloads

This article explains the mGPU solution that virtualizes NVIDIA GPUs for containers, detailing its driver architecture, compute and memory isolation mechanisms, performance benchmarks on ResNet‑50 inference, and how it boosts GPU utilization by over 50% for AI and high‑performance computing tasks.
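As a toy illustration of the compute‑isolation idea (not mGPU's actual driver mechanism), proportional sharing of GPU time can be sketched as follows; the function name and container weights are made up:

```python
# Toy model of weighted GPU time-slicing: each container is granted
# scheduling slices in proportion to its configured weight, which is the
# high-level contract compute-isolation layers like mGPU enforce.

def share_gpu(weights: dict[str, int], total_slices: int) -> dict[str, int]:
    """Distribute total_slices scheduling slices proportionally to weights."""
    total_w = sum(weights.values())
    granted = {c: (w * total_slices) // total_w for c, w in weights.items()}
    # Hand out any integer-division remainder round-robin so no slice is lost.
    leftover = total_slices - sum(granted.values())
    for c in list(weights)[:leftover]:
        granted[c] += 1
    return granted

print(share_gpu({"training": 3, "inference": 1}, 100))
# {'training': 75, 'inference': 25}
```

Memory isolation is the harder half in practice, since over-allocation by one container must fail its allocation rather than evict a neighbor's.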

AI acceleration · Container Orchestration · GPU sharing
0 likes · 10 min read
Rare Earth Juejin Tech Community
May 1, 2024 · Artificial Intelligence

Hyper‑SD: Trajectory‑Segmented Consistency Model for Accelerating Diffusion Image Generation

Hyper‑SD introduces a trajectory‑segmented consistency distillation framework that combines trajectory‑preserving and trajectory‑reconstruction strategies, integrates human‑feedback learning and score distillation, and achieves state‑of‑the‑art low‑step image generation performance on both SD1.5 and SDXL models.

AI acceleration · Image Generation · Model Distillation
0 likes · 10 min read
DevOps Operations Practice
Apr 29, 2024 · Fundamentals

Introduction to CPUs and GPUs: Functions, Advanced Features, and Key Differences

This article explains the basic functions of CPUs and GPUs, their advanced capabilities and real‑world applications, and compares their architectures, processing models, and roles in environments such as IoT, mobile devices, Kubernetes, and AI workloads.

AI acceleration · CPU · GPU
0 likes · 7 min read
Sohu Tech Products
Mar 27, 2024 · Artificial Intelligence

NVIDIA NeMo Framework, TensorRT‑LLM, and RAG for Large Language Model Solutions

NVIDIA’s comprehensive LLM ecosystem combines the full‑stack NeMo Framework for data curation, distributed training, fine‑tuning, inference acceleration with TensorRT‑LLM and Triton, plus Retrieval‑Augmented Generation and Guardrails, enabling efficient, low‑latency, knowledge‑grounded model deployment across clusters.

AI acceleration · NeMo Framework · Nvidia
0 likes · 16 min read
AntTech
Jan 9, 2024 · Artificial Intelligence

ATorch: Ant Group’s Open‑Source Distributed Training Acceleration Library for Large‑Scale AI Models

Ant Group’s newly open‑sourced ATorch library extends PyTorch with a layered architecture and automated resource‑aware strategies, raising hardware utilization in large‑model training to as much as 60%, improving stability, and delivering significant throughput gains across multi‑node, multi‑GPU deployments.

AI acceleration · Large Models · PyTorch
0 likes · 6 min read
Architects' Tech Alliance
Dec 23, 2023 · Artificial Intelligence

Future Development Paths of Computing Power Technology (2023): Chip Architecture, Near‑Memory Computing, and Distributed xPU Systems

The article outlines the accelerating demand for high‑performance computing driven by AI, AR/VR, biotech and other workloads, examines the limits of Moore's law, and presents emerging solutions such as advanced chip architectures, chiplet integration, near‑memory/in‑memory computing, and distributed xPU‑based systems for scalable, efficient compute.

AI acceleration · Chiplet · chip architecture
0 likes · 11 min read
Architects' Tech Alliance
Nov 15, 2023 · Fundamentals

FPGA: A Versatile Chip Igniting New Momentum and the Future of Domestic Substitution (2023)

The article analyzes the rapid growth of FPGA technology, its flexible architecture and low‑cost development, the expanding role of FPGA in data‑center acceleration, the strategic moves of AMD, Intel and Nvidia in heterogeneous computing, and forecasts a strong market expansion worldwide through 2025.

AI acceleration · FPGA · data center
0 likes · 10 min read
Architects' Tech Alliance
Sep 11, 2023 · Artificial Intelligence

Open Acceleration Specification AI Server Design Guide (2023): Architecture, OAM Modules, UBB Board, and System Design

The 2023 Open Acceleration Specification AI Server Design Guide details the hardware architecture, OAM module and UBB board specifications, cooling, management, fault diagnosis, and software platform needed to build high‑performance, scalable AI compute clusters for large‑model training.

AI acceleration · Hardware Architecture · Large Model Training
0 likes · 10 min read
Architects' Tech Alliance
Sep 4, 2023 · Artificial Intelligence

Overview of AI Chip Types, Architectures, and Market Trends

The article explains the various AI‑capable chips such as CPUs, GPUs, FPGAs, NPUs, and TPUs, compares their performance and efficiency, describes heterogeneous CPU+xPU solutions, and provides market share data while highlighting the growing adoption of specialized AI accelerators.

AI acceleration · AI chips · CPU
0 likes · 7 min read
Architects' Tech Alliance
Aug 2, 2023 · Fundamentals

Emerging Trends in Digital Infrastructure: Beyond Moore's Law, Chiplet, and Compute‑in‑Memory

The article surveys recent digital‑infrastructure trends, explaining why traditional Moore's Law scaling is slowing, describing More‑Moore and Beyond‑CMOS approaches, and detailing new chip architectures such as DSA, 3D stacking, Chiplet, compute‑in‑memory, and distributed xPU‑centric systems that together address the growing compute demands of AI, AR/VR, and bio‑pharma workloads.

AI acceleration · Compute-in-Memory · Moore's law
0 likes · 11 min read