Tag: Hardware Architecture

Articles collected around this technical thread.

Cognitive Technology Team
Mar 25, 2025 · Fundamentals

Understanding the Java Memory Model and Its Interaction with Hardware Memory Architecture

This article explains how the Java Memory Model defines the interaction between threads, thread stacks, and the heap, illustrates these concepts with diagrams and example code, and discusses how modern hardware memory architecture, caches, and CPU registers affect visibility and race conditions in concurrent Java programs.

Hardware Architecture · Java · Memory Model
0 likes · 11 min read
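The visibility problem this summary refers to can be sketched in a few lines of Java. This is a minimal illustration, not code from the article; all class and field names are my own. Without `volatile`, the reader thread may keep `ready` in a register or cache line and spin forever; the volatile write/read pair establishes the happens-before edge that also publishes the plain write to `payload`.

```java
public class VisibilityDemo {
    private static volatile boolean ready = false; // volatile: ordered, visible across threads
    private static int payload = 0;                // plain field, published via the volatile write

    static int publishAndRead() {
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile write becomes visible */ }
            seen[0] = payload; // happens-before guarantees this observes 42
        });
        reader.start();
        payload = 42;  // plain write...
        ready = true;  // ...made visible by this subsequent volatile write
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("payload seen by reader: " + publishAndRead());
    }
}
```

Dropping the `volatile` keyword makes the spin loop a potential hang on some JVM/CPU combinations, which is exactly the cache-and-register visibility hazard the article maps onto hardware memory architecture.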
Tencent Technical Engineering
Mar 21, 2025 · Fundamentals

Fundamentals of GPU Architecture and Programming

The article explains GPU fundamentals—from the end of Dennard scaling and why GPUs excel in parallel throughput, through CUDA programming basics like the SAXPY kernel and SIMT versus SIMD execution, to the evolution of the SIMT stack, modern scheduling, and a three‑step core architecture design.

CUDA · GPU · GPU programming
0 likes · 42 min read
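The SAXPY kernel (y = a·x + y) mentioned above is CUDA's canonical first example; as a hedged, CPU-side Java sketch of the same data-parallel pattern, each index is computed independently, which is precisely the property that lets a GPU assign one SIMT thread per element. Names here are illustrative, not the article's.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class Saxpy {
    static void saxpy(float a, float[] x, float[] y) {
        // parallel() plays the role of the CUDA grid: one logical worker per index i,
        // no cross-iteration dependence, so iterations may run in any order
        IntStream.range(0, y.length).parallel()
                 .forEach(i -> y[i] = a * x[i] + y[i]);
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f, 4f};
        float[] y = {10f, 20f, 30f, 40f};
        saxpy(2f, x, y); // y becomes {12, 24, 36, 48}
        System.out.println(Arrays.toString(y));
    }
}
```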
Python Programming Learning Circle
Jan 6, 2025 · Fundamentals

Beyond Moore's Law: Software, Algorithms, and Architecture as New Performance Drivers

The article examines how, as Moore's Law ends, performance gains will increasingly rely on software optimization, algorithmic advances, and hardware architecture innovations, illustrated by matrix multiplication benchmarks and discussions of Dennard scaling, parallelism, and emerging technologies.

Algorithms · Hardware Architecture · Moore's law
0 likes · 10 min read
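One software-level lever that matrix-multiplication benchmarks of this kind exercise is loop interchange. The Java sketch below (my own rendition, not the article's code) swaps the classic i-j-k order to i-k-j so the inner loop walks rows of B and C contiguously, a cache-friendly change that often yields large speedups at big n with no algorithmic change at all.

```java
public class MatMul {
    static double[][] multiplyIkj(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                double aik = a[i][k];          // hoisted: constant over the inner loop
                for (int j = 0; j < n; j++)
                    c[i][j] += aik * b[k][j];  // unit-stride access to both b and c
            }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiplyIkj(a, b);
        System.out.println(c[0][0] + " " + c[0][1] + " " + c[1][0] + " " + c[1][1]);
        // 19.0 22.0 43.0 50.0
    }
}
```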
Refining Core Development Skills
Jun 14, 2024 · Fundamentals

Why Server Memory Modules Have More Chips Than Desktop Memory

The article explains that server memory modules contain more chips than desktop modules because they add ECC error‑correction chips, plus register and data‑buffer chips in RDIMM/LRDIMM designs; the extra chips raise the count but improve signal integrity and enable larger capacities.

ECC · Hardware Architecture · LRDIMM
0 likes · 9 min read
Python Programming Learning Circle
May 28, 2024 · Fundamentals

Beyond Moore's Law: Leveraging Software, Algorithms, and Architecture for Future Performance Gains

With Moore's Law reaching its limits, a recent Science paper by MIT, Nvidia, and Microsoft researchers argues that future computing performance will rely on improvements in the software stack, algorithmic innovations, and hardware architecture, as demonstrated by performance engineering benchmarks and evolving hardware trends.

Algorithms · Hardware Architecture · Moore's law
0 likes · 9 min read
DevOps Operations Practice
Apr 29, 2024 · Fundamentals

Introduction to CPUs and GPUs: Functions, Advanced Features, and Key Differences

This article explains the basic functions of CPUs and GPUs, their advanced capabilities and real‑world applications, and compares their architectures, processing models, and roles in environments such as IoT, mobile devices, Kubernetes, and AI workloads.

AI acceleration · CPU · GPU
0 likes · 7 min read
Architects' Tech Alliance
Oct 2, 2023 · Fundamentals

Resource‑Decoupled Data Center Architecture and Emerging Technologies (DPU, IPU, CXL)

The article explains the limitations of traditional server‑centric data centers, introduces resource‑decoupled architectures that separate compute, storage, and networking resources, and reviews key enabling technologies such as DPUs, IPUs, and the CXL interconnect, highlighting their roles in modern cloud and AI workloads.

CXL · DPU · Hardware Architecture
0 likes · 11 min read
Architects' Tech Alliance
Sep 17, 2023 · Fundamentals

FPGA Overview: Architecture, Memory Hierarchy, and NoC Advantages

This article provides a comprehensive overview of FPGA technology, detailing its programmable logic cells, input/output blocks, switch matrices, historical evolution, flexibility versus ASIC and GPU, memory hierarchy including on‑chip and HBM2e, and the benefits of Network‑on‑Chip architectures for performance, power and design modularity.

ASIC · FPGA · GPU
0 likes · 12 min read
Architects' Tech Alliance
Sep 11, 2023 · Artificial Intelligence

Open Acceleration Specification AI Server Design Guide (2023): Architecture, OAM Modules, UBB Board, and System Design

The 2023 Open Acceleration Specification AI Server Design Guide details the hardware architecture, OAM module and UBB board specifications, cooling, management, fault diagnosis, and software platform needed to build high‑performance, scalable AI compute clusters for large‑model training.

AI acceleration · Hardware Architecture · Large Model Training
0 likes · 10 min read
Architects' Tech Alliance
Aug 14, 2023 · Fundamentals

How Many PCBs Does an AI Server Use? Detailed Breakdown of NVIDIA DGX A100

This report dissects the NVIDIA DGX A100 AI server, quantifying the PCB area and monetary value of its five hardware sections—GPU board, CPU motherboard, fans, storage, and power—revealing a total PCB consumption of 1.474 m² worth ¥15,321 per machine.

AI server · Hardware Architecture · NVIDIA DGX A100
0 likes · 11 min read
Architects' Tech Alliance
Jul 29, 2023 · Artificial Intelligence

AI Server Market Overview and Technical Architecture

The article provides a comprehensive analysis of the AI server market, detailing server hardware components, cost distribution, logical architecture, firmware, rapid market growth, competitive landscape, AI-driven heterogeneous computing, and future industry trends, while highlighting key vendors and deployment configurations.

AI Servers · GPU · Hardware Architecture
0 likes · 10 min read
Architects' Tech Alliance
Jul 10, 2023 · Fundamentals

Aligning the PCI‑Express Roadmap with the Cadence of Compute Engines and Networks

The article argues that PCI‑Express specifications, controllers, and switches must adopt a coordinated two‑year release cadence that matches CPU, GPU, and accelerator roadmaps, urging the PCI‑SIG to accelerate to PCI‑Express 7.0 to meet the bandwidth demands of modern data‑center and AI workloads.

CPU · GPU · Hardware Architecture
0 likes · 13 min read
Architects' Tech Alliance
Apr 23, 2023 · Fundamentals

Understanding FPGA: Architecture, Advantages, and Market Overview

This article explains what FPGA chips are, how they differ from CPUs, GPUs and ASICs, describes their internal programmable architecture and LUT-based logic, highlights their short development cycle and parallel computing benefits, and provides a detailed market analysis of Chinese FPGA applications and future growth prospects.

Digital Integrated Circuits · FPGA · Hardware Architecture
0 likes · 16 min read
Architects' Tech Alliance
Mar 29, 2023 · Fundamentals

Stream Multiprocessor (SM) Architecture and Execution Pipeline in GPUs

This article provides a comprehensive overview of GPU stream multiprocessors, detailing their micro‑architecture, instruction fetch‑decode‑execute pipeline, SIMT/SIMD organization, warp scheduling, scoreboard mechanisms, and techniques for handling thread divergence and deadlock in GPGPU designs.

GPU · Hardware Architecture · Instruction Pipeline
0 likes · 16 min read
Architects' Tech Alliance
Jan 9, 2023 · Fundamentals

GPU Overview: Principles, Use Cases, Limitations, and Market Landscape

This article explains GPU fundamentals, describing its role as a graphics‑oriented co‑processor, the reasons for using GPUs and other accelerators, the tasks they excel at and those they cannot handle, and outlines current market trends and architectural trade‑offs.

GPU · Hardware Architecture · Parallel Computing
0 likes · 9 min read
Architects' Tech Alliance
Sep 28, 2022 · Fundamentals

Comprehensive Overview of Server Architecture, Industry Chain, and Market Trends (2022)

This article provides a detailed analysis of server hardware architectures, industry supply chain, cost structures, market share, and emerging CPU trends such as X86 dominance and ARM growth, while also offering downloadable resources and insights into China's domestic substitution policies.

CPU · Cloud Computing · Hardware Architecture
0 likes · 11 min read
Baidu Tech Salon
Jul 4, 2022 · Artificial Intelligence

Kunlun Chip XPU Architecture, Software Stack, and Programming Model Overview

Kunlun Chip’s XPU‑R architecture combines high‑performance SDNN and Cluster compute units, 512 GB/s GDDR6 memory, and PCIe 4.0 interconnect, supported by an LLVM‑based software stack, CUDA‑like programming model, and seamless PaddlePaddle integration, enabling efficient AI training and inference with significant cost and performance gains.

AI chip · Hardware Architecture · PaddlePaddle
0 likes · 16 min read
Architects' Tech Alliance
May 31, 2022 · Fundamentals

AMD’s Next‑Gen Navi 31 GPU Is Likely a Single‑Chip Design, Not a Multi‑Chiplet Monster

Recent analysis suggests that AMD’s upcoming top‑tier RDNA 3 GPU, the Navi 31, will abandon the rumored multi‑chiplet architecture in favor of a single, powerful compute die, reducing shader count and TFLOP ratings while still promising strong performance for gaming and data‑center workloads.

AMD · GPU · Hardware Architecture
0 likes · 7 min read
Architects' Tech Alliance
Aug 4, 2021 · Cloud Computing

Edge Computing Hardware Architecture and Emerging Trends

The article examines edge computing hardware architecture, discussing diverse use cases, evolving server and processor trends—including ARM, Intel, Nvidia, AMD, FPGA, and DPU—open hardware standards, reliability, virtual networking, and storage innovations, highlighting how these developments shape the future of cloud and edge infrastructures.

ARM · Cloud Computing · DPU
0 likes · 16 min read
Architects' Tech Alliance
Mar 7, 2021 · Fundamentals

Understanding the Linux Graphics Stack from a GPU Perspective

This article explains the role of GPUs in computing, traces the evolution of graphics standards and GPU architectures, and details the development of the Linux graphics stack from early X11 to modern Wayland, providing a comprehensive overview for developers and hardware enthusiasts.

GPU · Graphics Stack · Hardware Architecture
0 likes · 3 min read