
Overview of AI Chip Development Paths: CPU, GPU, FPGA, ASIC, and Neuromorphic Chips

The article reviews the two main development routes for artificial intelligence chips—traditional architectures that enhance CPU, GPU, and FPGA performance and emerging neuromorphic designs like IBM TrueNorth—detailing their structures, advantages, limitations, and industry adoption.

Architects' Tech Alliance

Artificial intelligence chips currently follow two development paths: one continues the traditional computing architecture, accelerating hardware performance with three main chip types—GPU, FPGA, and ASIC—while CPUs remain indispensable; the other abandons the classic von Neumann model in favor of brain‑inspired neuromorphic structures, exemplified by IBM TrueNorth.

1. Traditional CPU

Since the 1960s, CPUs have evolved dramatically in form and design, yet their fundamental operation—comprising a controller and an arithmetic‑logic unit (ALU)—has remained stable. The internal structure is shown in the diagram below.

Traditional CPU internal structure (ALU module)

The ALU is the sole component that performs data calculations; other modules mainly ensure sequential instruction execution. While increasing CPU clock frequency can boost performance, deep‑learning workloads demand massive parallel data processing, which traditional CPUs struggle to provide, especially under power‑consumption constraints.
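The serial bottleneck can be sketched in a few lines of Python (used here only as a stand-in for any von Neumann machine; the function and sizes are illustrative, not from the article): a hand-written loop issues one multiply-accumulate at a time, mirroring a single ALU stepping through instructions, while a vectorized library call dispatches many element operations per instruction.

```python
import time
import numpy as np

def serial_dot(a, b):
    """One multiply-accumulate per step, mirroring a single ALU
    executing one instruction at a time."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return acc

n = 100_000
a = np.random.rand(n)
b = np.random.rand(n)

t0 = time.perf_counter()
s_serial = serial_dot(a, b)
t_serial = time.perf_counter() - t0

t0 = time.perf_counter()
s_vec = np.dot(a, b)   # vectorized: many multiplies per instruction
t_vec = time.perf_counter() - t0

assert abs(s_serial - s_vec) < 1e-6 * n
print(f"serial {t_serial:.4f}s  vectorized {t_vec:.5f}s")
```

On a typical machine the vectorized call is orders of magnitude faster, which is exactly the gap that grows unmanageable when the workload is a deep network rather than one dot product.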

2. Parallel‑Accelerating GPU

GPUs were the first widely adopted processors for parallel acceleration, offering far higher throughput than CPUs on data-parallel workloads and more flexible programming than fixed-function accelerators.

Unlike CPUs, which follow a serial instruction flow, GPUs feature a highly parallel architecture with many ALUs, making them well‑suited for dense data processing. The structural comparison between CPU and GPU is illustrated below.

CPU and GPU structural comparison

GPU‑based programs can run tens to thousands of times faster than single‑core CPU implementations on suitable workloads. The evolution of GPUs can be divided into three stages: pre‑1999 GPUs (fixed‑function geometry engines), 1999–2005 GPUs (early programmable pipelines with limited flexibility), and post‑2006 GPUs (full general‑purpose programming environments such as CUDA and OpenCL).

Today, major companies like Google, Facebook, Microsoft, Twitter, and Baidu use GPUs for image, video, and audio analysis, as well as autonomous driving and VR/AR applications. However, GPUs excel at training deep‑learning models but are less efficient for inference on single inputs.
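The training-versus-inference asymmetry comes down to batching. A rough sketch (NumPy standing in for GPU kernels; the layer shape and batch size are invented for illustration): one large batched matrix multiply keeps thousands of ALUs busy at once, while single-input inference degenerates into many small matrix-vector products that leave most of that parallelism idle.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 784))   # hypothetical layer weights
X = rng.standard_normal((64, 784))    # a batch of 64 inputs

# Batched: one big matrix multiply, the regime GPUs are built for
# (training, or large-batch inference).
Y_batch = X @ W.T

# One input at a time: 64 small matrix-vector products, the
# single-input inference pattern where GPU parallelism goes unused.
Y_single = np.stack([W @ x for x in X])

assert np.allclose(Y_batch, Y_single)
print(Y_batch.shape)  # → (64, 256)
```

Both paths compute identical results; the difference is purely how much parallel hardware each launch can saturate.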

3. Semi‑Custom FPGA

FPGAs evolved from earlier programmable devices such as PAL, GAL, and CPLD. Users load a configuration file that defines the logic gates and their interconnections, allowing the same FPGA to serve as a microcontroller, an audio codec, or other specialized hardware.
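At the fabric level, FPGA logic is built from small lookup tables (LUTs) whose contents come from the configuration bitstream. This toy Python model (class and bit layout invented for illustration) shows how the same physical cell becomes an AND gate or an XOR gate purely by loading different configuration bits:

```python
class Lut2:
    """Toy model of a 2-input FPGA lookup table: the four config
    bits are the truth-table outputs for inputs (0,0)..(1,1)."""
    def __init__(self, config_bits):
        assert len(config_bits) == 4
        self.table = list(config_bits)

    def __call__(self, a, b):
        # The inputs simply index into the configured truth table.
        return self.table[(a << 1) | b]

# "Loading a configuration file" amounts to choosing the bits:
and_gate = Lut2([0, 0, 0, 1])
xor_gate = Lut2([0, 1, 1, 0])

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
assert [and_gate(a, b) for a, b in inputs] == [0, 0, 0, 1]
assert [xor_gate(a, b) for a, b in inputs] == [0, 1, 1, 0]
```

Real devices use 4- to 6-input LUTs wired through a configurable routing fabric, but the principle is the same: reconfiguration rewrites truth tables and routing, not silicon.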

FPGA can perform both data‑parallel and task‑parallel computation, often completing specific operations in a single clock cycle, which would require many cycles on a CPU.

Because FPGA logic is fixed after configuration, there is no instruction fetch, decode, or shared-memory arbitration at runtime, dramatically reducing energy per operation. Reconfigurability also enables hardware-level control techniques that are difficult to achieve on CPUs and too costly to iterate on with ASICs.
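The "one result per clock cycle" behavior comes from pipelining: every stage of a fixed-function circuit advances simultaneously on each clock tick. A minimal simulation (the 3-stage a·x + b pipeline and its register names are invented for illustration) makes the fill latency and the steady one-result-per-cycle throughput explicit:

```python
def pipelined_axpb(xs, a, b):
    """Toy model of a fixed-function FPGA pipeline computing a*x + b.
    Stage 1 multiplies, stage 2 adds, stage 3 outputs. On every clock
    tick all stages advance at once, so after a 2-cycle fill latency
    the circuit delivers one finished result per cycle."""
    mul_reg = add_reg = None       # pipeline registers between stages
    outputs, cycles = [], 0
    for x in list(xs) + [None, None]:     # two extra ticks to drain
        cycles += 1
        if add_reg is not None:           # stage 3: emit result
            outputs.append(add_reg)
        # Stages are updated back-to-front so each register holds
        # the value produced one tick earlier.
        add_reg = mul_reg + b if mul_reg is not None else None  # stage 2
        mul_reg = a * x if x is not None else None              # stage 1
    return outputs, cycles

out, cycles = pipelined_axpb([1, 2, 3, 4], a=3, b=1)
assert out == [4, 7, 10, 13]
assert cycles == len(out) + 2   # N results in N + fill-latency cycles
```

A CPU would spend several instructions (fetch, multiply, add, store) per element; the pipelined circuit amortizes all of that into one clock per result.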

FPGA applications in AI

4. Fully‑Custom ASIC

Current AI workloads often rely on existing parallel chips (GPU, FPGA) to avoid the high cost and risk of designing dedicated ASICs. However, these general‑purpose chips have inherent performance and power limitations for deep‑learning tasks.

GPU limitations include: (1) insufficient parallel advantage during inference, (2) fixed hardware architecture that cannot adapt to evolving algorithms, and (3) lower energy efficiency compared to FPGA.

FPGA drawbacks include limited per‑unit compute capability, lower proportion of compute resources due to extensive routing, and higher cost at large scale.

As AI algorithms mature and the ASIC ecosystem develops, fully custom AI ASICs are emerging, offering optimal performance, power, and area for deep‑learning workloads. Representative companies and their chip projects are shown below.

Overview of AI‑specific chip development

When deep‑learning models stabilize, ASIC design can achieve full customization, delivering the best trade‑offs for performance, power, and silicon area.

5. Neuromorphic (Brain‑Inspired) Chips

Neuromorphic chips abandon the von Neumann architecture and adopt a brain‑like structure; IBM TrueNorth is a representative example. IBM treats storage cells as synapses, compute units as neurons, and interconnects as axons.

TrueNorth is fabricated in a 28 nm low‑power process, contains 5.4 billion transistors, and integrates 4,096 neurosynaptic cores (one million programmable neurons and 256 million synapses), while consuming only about 70 mW. IBM has also experimented with phase‑change memory (PCM) as a non‑volatile, CMOS‑compatible synaptic element.
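TrueNorth's internal design is not public code, but its computational style is the spiking neuron. A generic leaky integrate-and-fire model (parameters chosen arbitrarily for illustration) captures the idea: a membrane potential leaks over time, accumulates weighted input spikes, and fires an output spike only when it crosses a threshold, so energy is spent on events rather than on a continuous instruction stream.

```python
def lif_neuron(spike_train, weight=0.6, leak=0.9, threshold=1.0):
    """Minimal leaky integrate-and-fire neuron: the membrane
    potential decays each step, accumulates weighted input spikes,
    and emits a spike (then resets) on crossing the threshold."""
    v, out = 0.0, []
    for s in spike_train:
        v = v * leak + weight * s   # leak, then integrate the input
        if v >= threshold:          # fire and reset
            out.append(1)
            v = 0.0
        else:
            out.append(0)
    return out

# A steady input stream: the neuron fires only after enough
# charge has accumulated, then the cycle repeats.
print(lif_neuron([1, 1, 1, 1, 1, 1]))  # → [0, 1, 0, 1, 0, 1]
```

In a neuromorphic chip, thousands of such neurons sit next to their synaptic storage and stay idle between spikes, which is where the milliwatt-scale power figures come from.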

Source: Tsinghua University 2018 AI Chip Research Report.


Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
