
AI DSA: Architecture Features, Industry Trends, and Software Stack Challenges

The article summarizes Dr. Tang Shan's presentation on AI domain‑specific architectures, covering their background, the explosion of diverse AI hardware designs, and the significant software‑stack challenges that arise from fragmented tools and the need for full‑stack solutions.

DataFunTalk

The talk by Dr. Tang Shan at the DataFunSummit AI Foundations Software Architecture Summit introduced the background and industry status of AI DSA (Domain‑Specific Architecture), explaining how the rapid growth of AI models such as GPT‑3 and AlphaFold has created an exponential increase in compute demand that general‑purpose processors can no longer satisfy.

Two main problems were highlighted: (1) the widening gap between computational needs and the limited performance gains of traditional CPUs due to the slowdown of Moore's Law, and (2) the diverse and complex AI workloads across cloud, edge, and device scenarios, which require different performance, precision, and cost trade‑offs.

Consequently, a variety of AI‑specific hardware has emerged. Traditional architectures augmented with DSA features (e.g., NVIDIA GPUs with Tensor Cores) coexist with truly domain‑specific designs such as Google's TPU, Huawei's DaVinci architecture, smartphone NPUs, Tesla's FSD chip and Dojo D1, Graphcore's IPU, Cerebras's wafer‑scale engine, and CGRA‑based designs.

The software stack for AI DSA faces three major challenges:

1. Designing new architectures requires full‑stack support, including new instruction sets, programming models, compilers, libraries, and verification methodologies.

2. The ecosystem is highly fragmented: different vendors provide divergent hardware features and software implementations, leading to duplicated effort and difficulty in porting frameworks.

3. Ideal layered abstractions (a clean "cheese" model) are hard to achieve in practice; instead, developers often have to "poke holes" across layers, especially for critical operations like GEMM, memory management, and synchronization, which complicates optimization.
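The "poking holes" problem can be sketched in a few lines. This is an illustrative toy, not any vendor's real stack: the names `GenericBackend`, `VendorBackend`, `fast_gemm`, and `dispatch` are invented here to show how one performance‑critical op (GEMM) ends up bypassing the clean layered interface.

```python
# Hypothetical sketch: a framework normally dispatches every op through a
# generic layered interface, but a vendor "pokes a hole" through the layers
# for GEMM so it can control tiling, memory placement, and synchronization.
import numpy as np

class GenericBackend:
    """Clean layered abstraction: every op goes through one interface."""
    def run(self, op, *args):
        if op == "gemm":
            return np.matmul(*args)        # portable reference path
        if op == "relu":
            return np.maximum(args[0], 0)
        raise NotImplementedError(op)

class VendorBackend(GenericBackend):
    """Vendor stack that exposes a direct fast path for GEMM."""
    def fast_gemm(self, a, b):
        # Stand-in for a hand-tuned vendor kernel (e.g., a Tensor Core call).
        return np.matmul(a, b)

def dispatch(backend, op, *args):
    # The framework special-cases GEMM when a vendor fast path exists,
    # breaking the clean layering in exchange for performance.
    if op == "gemm" and hasattr(backend, "fast_gemm"):
        return backend.fast_gemm(*args)
    return backend.run(op, *args)

a = np.ones((2, 3))
b = np.ones((3, 4))
out = dispatch(VendorBackend(), "gemm", a, b)
print(out.shape)  # (2, 4)
```

Once such holes exist for several ops, every layer above them must know about each vendor's escape hatches, which is exactly what makes cross‑layer optimization hard to maintain.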

Efforts to mitigate fragmentation include developing common intermediate representations (IR) and unified interfaces to reduce the gap between hardware diversity and software portability.
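A common IR can be illustrated with a toy lowering step. Everything below is invented for illustration (the IR tuples, the backend names `npu_a`/`npu_b`, and their mnemonics); real systems in this space include MLIR and TVM, whose details differ. The point is only that frameworks emit one vendor‑neutral program, and each backend supplies its own lowering.

```python
# A tiny vendor-neutral IR: a list of (op_name, argument) tuples.
ir_module = [
    ("load",  "x"),
    ("gemm",  "w"),
    ("relu",  None),
    ("store", "y"),
]

# Per-backend lowering tables map IR ops to hardware-specific mnemonics.
# Both backends are hypothetical.
LOWERING = {
    "npu_a": {"load": "DMA_IN", "gemm": "MMA_TILE", "relu": "VEC_MAX0", "store": "DMA_OUT"},
    "npu_b": {"load": "LD",     "gemm": "MATMUL",   "relu": "RELU",     "store": "ST"},
}

def lower(module, target):
    """Lower one IR module to a target backend's instruction mnemonics."""
    table = LOWERING[target]
    return [(table[op], arg) for op, arg in module]

prog_a = lower(ir_module, "npu_a")
prog_b = lower(ir_module, "npu_b")
print(prog_a[1])  # ('MMA_TILE', 'w')
print(prog_b[1])  # ('MATMUL', 'w')
```

With this split, framework authors target the IR once instead of porting to each vendor's stack, which is the duplicated effort the fragmentation problem describes.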

In summary, AI DSA arose because general‑purpose chips cannot meet modern AI compute patterns; successful AI DSA requires a complete hardware‑software stack, multiple architectures will likely coexist for the foreseeable future, and robust software support is essential for realizing hardware innovations.

Finally, the speaker thanked the audience and invited further engagement.

AI · Hardware · heterogeneous computing · software stack · chip architecture · DSA
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
