
High‑Efficiency Neural Network Computing Architectures and the Thinker AI Chip Family by Prof. Yin Shouyi

Prof. Yin Shouyi of Tsinghua University presented a reconfigurable, low‑bit quantized neural‑network architecture and the Thinker‑I, Thinker‑II, and Thinker‑S chips, demonstrating ultra‑low power consumption and high energy efficiency for AI deployment on edge devices.


In recent years, breakthroughs in deep learning have driven progress in machine vision, speech recognition, and natural language processing, sparking intense research interest and development enthusiasm. At the 2018 Hangzhou Yunqi Hardware Infrastructure Forum, Prof. Yin Shouyi of Tsinghua University presented his work on high‑efficiency neural‑network computing architectures, outlining the applications, challenges, and solutions in AI systems.

Due to the massive memory access, storage, and compute demands of deep neural networks, power consumption has become a major obstacle for "Deploy AI Everywhere," limiting the widespread use of AI algorithms on mobile, wearable, and IoT devices.

To overcome these bottlenecks, Prof. Yin’s team conducted systematic research on low‑bit quantization methods, computing architectures, and circuit implementations for neural networks, proposing a reconfigurable architecture that supports high‑efficiency computation of low‑bitwidth networks.
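As a rough illustration of low‑bit quantization, the sketch below maps floating‑point weights to signed low‑bit integers with a single per‑tensor scale. This is a generic, hypothetical example; the article does not specify the exact quantization scheme used by the Thinker chips.

```python
import numpy as np

def quantize(weights, bits):
    """Uniformly quantize a weight tensor to signed `bits`-bit integers.

    Hypothetical sketch of low-bit weight quantization, not the
    Thinker chips' actual method.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.max(np.abs(weights)) / qmax  # map the largest magnitude to qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int32)
    return q, scale

w = np.array([0.9, -0.45, 0.1, -0.02])
q, s = quantize(w, 4)   # 4-bit integers in [-7, 7] plus one float scale
approx = q * s          # cheap integer weights, approximate reconstruction
```

Storing 4‑bit integers instead of 32‑bit floats cuts weight memory by 8x, which is what reduces the memory‑access energy that dominates edge inference.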

They designed and developed the general‑purpose neural‑network computing chips Thinker‑I, Thinker‑II, and the speech‑recognition‑focused Thinker‑S.

Thinker‑I is fabricated in a 65 nm process, operates up to 200 MHz, and consumes 4 mW–447 mW, achieving a peak efficiency of 5.09 TOPS/W.

Thinker‑II uses a 28 nm process and runs from 20 MHz to 400 MHz, with power consumption below 100 mW and requiring only 12 mW for face detection and recognition. It incorporates binary/ternary convolution optimizations and a hierarchical load‑balancing scheduler that improves resource utilization through two‑level hardware‑software task dispatch.
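When weights and activations are constrained to {-1, +1}, a dot product collapses into an XNOR followed by a population count, which is far cheaper in hardware than multiply‑accumulate. The sketch below is a hypothetical software illustration of that trick; the article does not describe Thinker‑II's actual datapath.

```python
def binary_dot(a_bits, w_bits, n):
    """Dot product of two length-n vectors with entries in {-1, +1},
    each packed into an integer bit mask (bit i set means element i is +1).

    With XNOR, matching bits contribute +1 and mismatches -1, so
    dot = 2 * popcount(XNOR(a, w)) - n. Hypothetical sketch of the
    XNOR-popcount trick behind binary-convolution accelerators.
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")  # popcount of XNOR
    return 2 * matches - n

# activations [+1, -1, +1, -1] -> bits 0 and 2 set -> 0b0101
# weights     [+1, +1, -1, -1] -> bits 0 and 1 set -> 0b0011
print(binary_dot(0b0101, 0b0011, 4))  # (+1)(+1)+(-1)(+1)+(+1)(-1)+(-1)(-1) = 0
```

A 64‑element binary dot product thus costs one XNOR and one popcount instead of 64 multiply‑accumulates, which is where the order‑of‑magnitude energy savings of binary convolution come from.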

Thinker‑S, also built on a 28 nm process, operates at 2 MHz–50 MHz with power between 0.2 mW and 5 mW, achieving an energy cost as low as 304 nJ per frame. It features a binarized convolutional neural network combined with a user‑adaptive speech‑recognition framework, leveraging time‑domain data reuse, approximate computing, and weight regularization to greatly accelerate inference.

Prof. Yin noted that Turing Award winners John Hennessy and David Patterson have declared a new golden age of computer architecture. He offered the interpretation "AI = Architecture + Intelligence," emphasizing that architectural innovation is the fundamental support for artificial intelligence.

Tags: low-power AI, AI hardware, neural network accelerator, reconfigurable architecture, Thinker chip
Written by

Alibaba Cloud Infrastructure
