Compressing CNNs by Exponent Sharing in Weights using IEEE Single Precision Format

Prachi Kashikar and Sharad Sinha
Indian Institute of Technology Goa


Abstract

Computer vision and speech recognition applications have achieved performance exceeding human accuracy levels on benchmark datasets, with models such as ResNet. This success is largely due to Convolutional Neural Networks (CNNs) in deep learning. As CNNs grow deeper, model size, performance, power consumption and, in turn, cost are all affected. On high-end computers, these parameters can be optimized for the best performance, but on edge devices, power and memory budgets are very stringent. Many compression techniques exist to address these bottlenecks in deploying computer vision applications on hardware, but all of them have some impact on accuracy. We propose a novel model compression approach that incurs no accuracy loss, enabling these applications to be deployed on embedded platforms. Weights contribute the most to the size of CNN models. Building on the IEEE floating-point format standard, we propose a method to share exponents among weights by using common referencing. We demonstrate our technique on five different trained models, achieving nearly 10% compression in storage while requiring less than 1.5 times the original execution time.
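The abstract does not specify the exact storage layout, so the following is only a minimal sketch of the general idea, assuming one table of distinct 8-bit exponents per weight tensor and a fixed-width index into that table for every weight. The function names (split_float32, share_exponents, decode, index_bits, encoded_bits) are hypothetical and are not taken from the paper. Because the sign and 23-bit mantissa are kept untouched and the exponent is recovered exactly from the table, decoding reproduces the original single-precision values bit for bit, which is why such a scheme is lossless.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical encoding: (sign bit, index into exponent table, 23-bit mantissa)
Encoded = Tuple[int, int, int]


def split_float32(x: float) -> Tuple[int, int, int]:
    """Split an IEEE 754 single-precision value into sign, biased exponent, mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF      # 23-bit fraction field
    return sign, exponent, mantissa


def join_float32(sign: int, exponent: int, mantissa: int) -> float:
    """Reassemble the three fields into a single-precision value."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def share_exponents(weights: List[float]) -> Tuple[List[int], List[Encoded]]:
    """Store each distinct exponent once; each weight keeps only an index to it."""
    table: List[int] = []
    lookup: Dict[int, int] = {}
    encoded: List[Encoded] = []
    for w in weights:
        s, e, m = split_float32(w)
        if e not in lookup:
            lookup[e] = len(table)
            table.append(e)
        encoded.append((s, lookup[e], m))
    return table, encoded


def decode(table: List[int], encoded: List[Encoded]) -> List[float]:
    """Recover the original float32 weights by dereferencing the shared exponents."""
    return [join_float32(s, table[i], m) for s, i, m in encoded]


def index_bits(table: List[int]) -> int:
    """Bits needed per weight to address any entry of the exponent table."""
    return max(1, (len(table) - 1).bit_length())


def encoded_bits(table: List[int], n_weights: int) -> int:
    """Assumed storage cost: per-weight sign + index + mantissa, plus the 8-bit table."""
    per_weight = 1 + index_bits(table) + 23
    return n_weights * per_weight + 8 * len(table)
```

For example, `share_exponents([0.5, -0.25, 0.75, 1.5, -0.125])` yields a four-entry exponent table, so each weight needs a 2-bit index instead of an 8-bit exponent. The saving per weight depends on how many distinct exponents a tensor uses, and the table itself adds a small fixed overhead; this sketch does not attempt to reproduce the storage or execution-time figures reported above.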