
Hammer: An Integrated Hardware-Aware Model Compression Framework

Hammer is an integrated hardware-aware model compression tool developed by Kuaishou in collaboration with universities, combining pruning, quantization, search, and distillation to achieve efficient and accurate neural network models tailored to specific hardware.


Hammer is an integrated hardware-aware model compression framework developed by Kuaishou in collaboration with universities like UT-Austin and Tsinghua. It addresses the critical need to compress complex, slow-running neural networks into efficient models while maintaining inference accuracy. The framework combines multiple compression strategies including pruning, quantization, neural architecture search (NAS), and distillation, while also considering hardware-specific characteristics to achieve optimal performance on different devices.
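Hammer's actual API is not shown in this article; as an illustrative sketch, two of the strategies it combines, magnitude pruning followed by uniform quantization, can be expressed on a plain weight list (all names here are hypothetical, not Hammer's):

```python
# Illustrative sketch only -- not Hammer's real API. Shows magnitude
# pruning chained with uniform symmetric quantization, two of the
# strategies the framework is described as combining.

def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k]
    return [0.0 if abs(w) < threshold else w for w in weights]

def quantize(weights, bits=8):
    """Snap each weight to a uniform grid spanning the max magnitude."""
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1)
    return [round(w / scale) * scale for w in weights]

w = [0.02, -0.5, 0.31, -0.04, 0.9, 0.11]
compressed = quantize(prune(w, sparsity=0.5), bits=8)
```

An integrated framework applies such passes jointly, rather than tuning each in isolation, so that the error introduced by one step is accounted for by the others.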

The framework abstracts neural networks as directed acyclic graphs, enabling multiple compression techniques to be applied within a single, integrated pass. It supports hardware-aware optimization by directly constraining inference latency on the target hardware: latency tables measured on-device are consulted during training so that the resulting model meets user-specified timing requirements. This approach has proven superior to traditional FLOPs-based compression methods, since FLOPs only loosely correlate with real on-device latency.
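The latency-table idea can be sketched in a few lines. This is a hypothetical illustration of the general mechanism, not Hammer's code: per-layer latencies are measured once on the target device, then summed for any candidate architecture and used to penalize budget overruns during search or training.

```python
# Hypothetical sketch of a measured latency table. Keys are
# (op type, channel width); values are milliseconds profiled on-device.
LATENCY_TABLE = {
    ("conv3x3", 16): 0.8,
    ("conv3x3", 32): 1.9,
    ("conv3x3", 64): 4.1,
    ("conv1x1", 16): 0.2,
    ("conv1x1", 32): 0.5,
}

def predict_latency(arch):
    """Estimate total latency of a candidate by summing table entries."""
    return sum(LATENCY_TABLE[layer] for layer in arch)

def objective(accuracy, arch, budget_ms, penalty=10.0):
    """Accuracy minus a penalty for exceeding the user's latency budget."""
    overshoot = max(0.0, predict_latency(arch) - budget_ms)
    return accuracy - penalty * overshoot

arch = [("conv3x3", 32), ("conv1x1", 32), ("conv3x3", 16)]
estimated_ms = predict_latency(arch)
```

Because the table holds measured numbers rather than FLOP counts, the constraint reflects how the device actually behaves, including effects such as memory bandwidth that FLOPs ignore.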

Hammer demonstrates significant advantages over existing open-source tools through its integrated approach. It integrates at both the software and hardware levels, allowing compression to be customized per target device. The framework also includes unusual capabilities such as converting single-frame image models into video models for temporal processing, which is particularly valuable for video-centric applications.
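The article does not detail how Hammer performs the image-to-video conversion; one common technique with the same intent is "inflating" a 2D convolution kernel along a new temporal axis (popularized by I3D), scaled so a static video reproduces the original 2D output. A minimal sketch on nested lists, assuming that approach:

```python
# Assumed technique (I3D-style inflation), not confirmed as Hammer's method.

def inflate_kernel_2d_to_3d(kernel2d, t):
    """Repeat a KxK kernel t times along time, scaling by 1/t so a 3D
    conv over a video of identical frames matches the 2D conv output."""
    return [[[w / t for w in row] for row in kernel2d] for _ in range(t)]

k2d = [[1.0, 2.0], [3.0, 4.0]]
k3d = inflate_kernel_2d_to_3d(k2d, t=2)

# Summing the inflated kernel over time recovers the original 2D kernel.
recovered = [[sum(k3d[f][i][j] for f in range(2)) for j in range(2)]
             for i in range(2)]
```

This initialization lets the video model inherit the pretrained image weights before fine-tuning on temporal data.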

Experimental results show Hammer's effectiveness across a range of scenarios. For ResNet-56 on CIFAR-10, its integrated NAS-pruning-quantization approach outperformed existing algorithms. In video deblurring, Hammer improved inference efficiency by 528%, enabling real-time video processing. The framework has been deployed across multiple Kuaishou business lines, including recommendation systems, video processing, and real-time detection, delivering substantial business benefits such as a 100% increase in supported materials and a 49% increase in QPS in recommendation scenarios.

The development team, Kuaishou's AI Platform AutoML team, focuses on using automation to improve model accuracy, inference efficiency, and energy consumption. They maintain close collaborations with academic institutions and have successfully deployed many self-developed and cutting-edge AutoML technologies across various internal scenarios.

model compression, quantization, neural networks, pruning, AI framework, Kuaishou, NAS, hardware-aware optimization
Written by

Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
