Heterogeneous Computing: Overview of CPU, GPU, FPGA, ASIC, and NPU
This article explains heterogeneous computing and compares major processing units—CPU, GPU, FPGA, ASIC, and NPU—highlighting their architectures, strengths, and typical use cases, especially in deep‑learning and AI workloads.
Heterogeneous computing refers to a system that combines processing units with different instruction sets and architectures. Computing has evolved from single-core to multi-core and now to heterogeneous architectures; multi-core processors entered the mainstream in the mid-2000s. Typical processing-unit categories include CPU, GPU, DSP, ASIC, and FPGA.
1. About GPU
GPU (Graphics Processing Unit) is designed for parallel floating‑point operations. In the traditional von Neumann architecture, the CPU must fetch data from memory, decode instructions, and handle control flow for each operation, which limits performance for data‑intensive tasks such as deep learning. GPUs have many simpler cores, massive parallel pipelines, and higher memory bandwidth, making them far faster than CPUs for the massive matrix computations required by modern AI models.
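The gap between serial and parallel matrix computation can be illustrated in software. The sketch below (illustrative only; NumPy's BLAS-backed multiply stands in for parallel hardware, and `matmul_naive` is a made-up name) contrasts a scalar triple loop — one multiply-add at a time, as a simple serial core would do — with a single vectorized call:

```python
import numpy as np

def matmul_naive(A, B):
    """Scalar triple loop: one multiply-add per step, like serial CPU code."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.random((64, 64))
B = rng.random((64, 64))

C_loop = matmul_naive(A, B)
C_vec = A @ B  # dispatched to an optimized BLAS kernel; on a GPU, to thousands of cores

assert np.allclose(C_loop, C_vec)
```

Both paths compute the same result; the difference is that the vectorized form exposes all the independent multiply-adds at once, which is exactly the structure a GPU's many cores exploit.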
GPGPU (general-purpose computing on GPUs) took off in the late 2000s — NVIDIA released CUDA in 2007 — allowing the graphics processor to perform non-graphics computations. Because GPUs excel at SIMD (single-instruction, multiple-data) workloads in which computation far outweighs data movement, they can dramatically outperform traditional x86 CPUs on such tasks.
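"Computation far outweighs data movement" can be quantified as arithmetic intensity: FLOPs performed per byte transferred. A back-of-the-envelope sketch (an assumption-laden model, not a benchmark — it counts only the ideal reads of A and B and the write of C) shows why matrix multiplication suits GPUs: intensity grows linearly with matrix size, so large matrices keep the cores busy relative to memory traffic.

```python
def arithmetic_intensity(n):
    """Ideal FLOPs per byte for an n x n float32 matrix multiply C = A @ B."""
    flops = 2 * n ** 3            # one multiply + one add per inner-product step
    bytes_moved = 3 * n * n * 4   # read A, read B, write C, 4 bytes per float32
    return flops / bytes_moved    # simplifies to n / 6

for n in (64, 1024, 8192):
    print(n, round(arithmetic_intensity(n), 1))
```

At n = 64 the ratio is about 10.7 FLOPs/byte; at n = 8192 it exceeds 1300 — the regime where a GPU's compute throughput, not memory bandwidth, sets the pace.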
By 2010, heterogeneous computing that combines CPUs and GPUs became common, aiming to keep both processors busy and push overall system performance to new heights.
2. About FPGA
FPGA (Field-Programmable Gate Array) is a semi-custom device containing a large number of configurable logic blocks and on-chip memory. Users program the interconnections to implement custom hardware pipelines, achieving high throughput and low latency. This pipeline parallelism makes FPGAs well suited to protocol processing, data-format conversion, and inference-stage deep-learning workloads, where streaming data flows through fixed-function stages and fixed-point (integer) arithmetic is often sufficient.
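The pipeline idea can be sketched in software with chained generators. This is only an analogy (the stage names are invented for illustration): in real FPGA hardware every stage runs concurrently, accepting a new item each clock cycle, whereas Python executes the stages cooperatively.

```python
# Software analogy of an FPGA-style streaming pipeline: small fixed stages,
# each consuming the previous stage's output item by item.

def parse(stream):        # stage 1: e.g. protocol parsing (hex -> int)
    for raw in stream:
        yield int(raw, 16)

def convert(stream):      # stage 2: e.g. data-format conversion
    for value in stream:
        yield value * 2

def accumulate(stream):   # stage 3: e.g. a running checksum
    total = 0
    for value in stream:
        total += value
        yield total

packets = ["0a", "ff", "10"]
out = list(accumulate(convert(parse(packets))))
print(out)  # [20, 530, 562]
```

In hardware, once the pipeline is full, one result emerges per cycle regardless of how many stages it contains — the source of the FPGA's throughput and latency advantages.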
Compared with GPUs, FPGAs lack a deep cache hierarchy; data flows directly through the configured logic, which can yield lower latency but comes with higher design complexity. Their advantages are reprogrammability, low power consumption, and the ability to implement custom interfaces such as high-speed SerDes.
However, complex algorithms are harder to implement on an FPGA, and overall performance is generally lower than that of an ASIC fabricated in the same process node.
3. About ASIC
ASIC (Application‑Specific Integrated Circuit) is a fully custom chip designed for a particular function. ASICs provide the best performance‑per‑watt and cost‑per‑unit at high volumes, but they require long development cycles and are not re‑programmable.
Examples include Google's TPU, Cambricon's AI chips, and Horizon Robotics' BPU. By simplifying control logic and stripping away general-purpose hardware, Google reported that its first-generation TPU delivered roughly 15–30× the inference performance and 30–80× the performance per watt of contemporary CPUs and GPUs.
4. About NPU
NPU (Neural Processing Unit) architectures mimic biological neural networks. Designs such as IBM's TrueNorth integrate memory, compute, and communication on a single die, mitigating the von Neumann bottleneck and enabling massively parallel spike-based processing.
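The basic unit such neuromorphic chips implement is a spiking neuron. Below is a minimal leaky integrate-and-fire sketch — the parameters and reset behavior are illustrative, not TrueNorth's actual neuron model:

```python
# Leaky integrate-and-fire neuron (illustrative parameters, not TrueNorth's).

def lif_neuron(inputs, leak=0.9, threshold=1.0):
    """Return a 0/1 spike train for a sequence of input currents."""
    v = 0.0          # membrane potential
    spikes = []
    for current in inputs:
        v = v * leak + current    # integrate input, with leakage
        if v >= threshold:        # fire when threshold is crossed...
            spikes.append(1)
            v = 0.0               # ...then reset the potential
        else:
            spikes.append(0)
    return spikes

spikes = lif_neuron([0.5, 0.5, 0.5, 0.0, 1.2])
print(spikes)  # [0, 0, 1, 0, 1]
```

Because a neuron's state and its compute live together, and neurons exchange only sparse spike events, a chip of such units avoids shuttling every operand through a shared memory bus.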
Research also explores nanoscale artificial neurons that can perform high‑speed unsupervised learning.
5. Simple Summary
CPU: General-purpose control and arithmetic; only a small fraction of the die is devoted to ALUs, with most area spent on large caches and control logic; high clock speed; suitable for a wide range of tasks but limited in parallel compute.
GPU: Specialized for floating‑point parallelism, thousands of cores, high memory bandwidth, delivers dozens to hundreds of times the compute of a CPU for suitable workloads.
FPGA: Lower clock speed but massive parallel pipelines and data‑parallelism; programmable hardware accelerates specific algorithms.
ASIC/NPU: Fixed‑function, highly optimized for AI inference or training, offering the best energy efficiency and performance for targeted applications.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.