High‑Efficiency Neural Network Computing Architectures and the Thinker AI Chip Family by Prof. Yin Shouyi
Prof. Yin Shouyi of Tsinghua University presented a reconfigurable, low‑bit quantized neural‑network architecture and the Thinker‑I, Thinker‑II, and Thinker‑S chips, demonstrating ultra‑low power consumption and high energy efficiency for AI deployment on edge devices.
In recent years, breakthroughs in deep learning have driven progress in machine vision, speech recognition, and natural language processing, sparking intense research interest and development enthusiasm. At the 2018 Hangzhou Yunqi Hardware Infrastructure Forum, Prof. Yin Shouyi from Tsinghua University shared the computing architecture for high‑efficiency neural networks, outlining applications, challenges, and solutions in AI systems.
Due to the massive memory access, storage, and compute demands of deep neural networks, power consumption has become a major obstacle for "Deploy AI Everywhere," limiting the widespread use of AI algorithms on mobile, wearable, and IoT devices.
To overcome these bottlenecks, Prof. Yin’s team conducted systematic research on low‑bit quantization methods, computing architectures, and circuit implementations for neural networks, proposing a reconfigurable architecture that supports high‑efficiency computation of low‑bitwidth networks.
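Low‑bitwidth quantization maps full‑precision weights onto a small set of integer levels so that multiply‑accumulate hardware can be narrower and cheaper. A minimal sketch in Python, assuming a generic uniform symmetric scheme (the team's actual quantization method is not detailed in this summary):

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric quantization of a weight tensor to `bits` bits.

    Illustrative only: a generic scheme, not the specific method used
    in the Thinker chips.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4-bit weights
    scale = np.max(np.abs(w)) / qmax      # map the largest weight onto qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q.astype(np.int8), scale       # integer codes plus a per-tensor scale

w = np.array([0.42, -0.17, 0.08, -0.91])
q, s = quantize_symmetric(w, bits=4)
# dequantized approximation of w is q * s
```

The integer codes `q` are what the datapath operates on; the floating‑point `scale` is folded back in only at layer boundaries.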
They designed and developed the general‑purpose neural‑network computing chips Thinker‑I, Thinker‑II, and the speech‑recognition‑focused Thinker‑S.
Thinker‑I is fabricated in a 65 nm process, operates at up to 200 MHz, and consumes between 4 mW and 447 mW, achieving a peak energy efficiency of 5.09 TOPS/W.
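For a sense of scale, that headline efficiency figure can be converted into energy per operation with simple unit arithmetic (illustrative only, not a calculation from the talk):

```python
# 5.09 TOPS/W means 5.09e12 operations per joule.
tops_per_watt = 5.09
joules_per_op = 1.0 / (tops_per_watt * 1e12)   # (J/s) / (ops/s) = J/op
picojoules_per_op = joules_per_op * 1e12       # roughly 0.2 pJ per operation
```

At that energy per operation, even a budget of a few milliwatts sustains billions of operations per second, which is what makes always‑on edge inference plausible.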
Thinker‑II uses a 28 nm process, runs from 20 MHz to 400 MHz, and keeps power below 100 mW, performing face detection and recognition at roughly 12 mW. It incorporates binary/ternary convolution optimizations and a hierarchical load‑balancing scheduler that improves resource utilization through two‑level hardware–software task dispatch.
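The appeal of binary convolution is that when both activations and weights are constrained to ±1, a dot product collapses into bitwise XNOR followed by a population count. A minimal sketch of that identity (the actual Thinker‑II datapath is not described in this article):

```python
def binary_dot_xnor(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors packed as bitmasks.

    Encoding: bit 1 -> +1, bit 0 -> -1. Then
        dot(a, w) = 2 * popcount(XNOR(a, w)) - n,
    since XNOR counts the positions where the signs agree.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ w_bits) & mask
    return 2 * bin(xnor).count("1") - n

# a = [+1, -1, +1, +1], w = [+1, +1, -1, +1]
# elementwise products: +1, -1, -1, +1 -> sum 0
a = 0b1011
w = 0b1101
```

In hardware this replaces an n‑wide multiply‑accumulate with one XNOR gate array and a popcount tree, which is where the large energy savings of binarized networks come from.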
Thinker‑S, also built on a 28 nm process, operates at 2 MHz–50 MHz with power between 0.2 mW and 5 mW, requiring as little as 304 nJ per frame. It combines a binarized convolutional neural network with a user‑adaptive speech‑recognition framework, leveraging time‑domain data reuse, approximate computing, and weight regularization to greatly accelerate inference.
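The per‑frame energy and the power envelope are two views of the same budget: energy per frame times frame rate equals average power. A quick illustrative check against the reported figures (the arithmetic is ours, not from the talk):

```python
# Relate Thinker-S's per-frame energy to its power envelope.
energy_per_frame_j = 304e-9                   # 304 nJ per frame, as reported
power_w = 0.2e-3                              # low end of the 0.2-5 mW range
frames_per_s = power_w / energy_per_frame_j   # sustainable frame rate at 0.2 mW
```

Even at the 0.2 mW floor this works out to hundreds of frames per second, comfortably above real‑time speech frame rates, which is consistent with the chip's always‑on positioning.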
Prof. Yin noted that Turing Award laureates John Hennessy and David Patterson have declared a new golden age of computer architecture; he framed this view as "AI = Architecture + Intelligence," emphasizing that architectural innovation is the fundamental support for artificial intelligence.