HardCompress: A Novel Hardware-based Low-power Compression Scheme for DNN Accelerators

Ayush Arunachalam1, Shamik Kundu1, Arnab Raha2, Suvadeep Banerjee3, Suriya Natarajan2, Kanad Basu1
1University of Texas at Dallas, 2Intel Corporation, 3Intel Labs, Intel


Abstract

The ever-increasing computing requirements of Deep Neural Networks (DNNs) have accentuated the deployment of such networks on hardware accelerators. Inference execution of large DNNs often manifests as an energy bottleneck in such accelerators, especially when they are used in resource-constrained Internet-of-Things (IoT) edge devices. This can be primarily attributed to the substantial energy incurred in accessing the millions of trained parameters stored in on-chip memory, as demonstrated in existing research. To address this challenge, we propose HardCompress, which, to the best of our knowledge, is the first compression solution for commercial DNN accelerators. Our three-step approach involves hardware-based post-quantization trimming of weights, followed by dictionary-based compression of the trimmed weights, and subsequent decompression by a low-power hardware engine during inference in the accelerator. The efficiency of our proposed approach is evaluated on both lightweight networks trained on the MNIST dataset and large DNNs trained on the ImageNet dataset. Our results demonstrate that HardCompress, without any loss in accuracy on large DNNs, furnishes a maximum compression of 99.27%, equivalent to a 137x reduction in memory footprint in the systolic array-based DNN accelerator.
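To make the dictionary-based compression and decompression steps concrete, the following is a minimal software sketch of the general idea: quantized (and trimmed) weights are mapped to a small codebook of distinct values, only the per-weight codebook indices are stored, and the original weights are recovered by a simple table lookup, analogous to what a low-power hardware decompression engine would perform. The function names, codebook size, and index width below are illustrative assumptions, not the paper's actual scheme.

```python
# Illustrative sketch of dictionary-based weight compression/decompression.
# All names and parameters here are hypothetical, not the HardCompress design.
import numpy as np

def compress(weights: np.ndarray, codebook_bits: int = 4):
    """Map quantized weights to a small codebook and store per-weight indices."""
    codebook = np.unique(weights)                     # distinct weight values, sorted
    assert codebook.size <= (1 << codebook_bits), "too many distinct weights"
    indices = np.searchsorted(codebook, weights)      # dictionary index for each weight
    return codebook, indices.astype(np.uint8)

def decompress(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Reconstruct weights by codebook lookup (the hardware engine's role)."""
    return codebook[indices]

# Example: 8-bit quantized weights trimmed to a handful of distinct values.
w = np.random.choice(np.array([-96, -32, 0, 32, 96], dtype=np.int8), size=(64, 64))
book, idx = compress(w)
assert np.array_equal(decompress(book, idx), w)
```

In this toy setting, storage drops from one 8-bit value per weight to one 4-bit index per weight plus a small codebook; the much larger ratios reported in the abstract arise from the full trimming and compression pipeline described in the paper.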