Dynamic Reconfiguration of CNNs for Input-Dependent Approximation

Maedeh Hemmat1 and Azadeh Davoodi2
1University of Wisconsin-Madison, 2University of Wisconsin, Madison


Abstract

In this work, we propose a novel framework that enables dynamic reconfiguration of an already-trained Convolutional Neural Network (CNN) in hardware during inference. The reconfiguration approximates the CNN in an input-dependent manner to save power without much degradation in classification accuracy. For each input, our framework conducts inference using only a fraction of the CNN’s edge weights, selected based on that input, with the remaining weights left at 0. Power is saved because, for the majority of inputs, fewer weights are fetched from off-chip memory and fewer multiplications are performed. More specifically, we propose a clustering algorithm that groups similar weights in the CNN based on their importance, and an iterative framework that decides how many clusters of weights should be fetched from off-chip memory for each individual input. We also propose new hardware structures to implement our framework on top of a recently proposed FPGA-based CNN accelerator. In our experiments with popular CNNs, we show significant power saving with almost no degradation in classification accuracy, since inference uses only a fraction of the edge weights for the majority of inputs.
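
The idea of fetching weight clusters incrementally per input can be illustrated with a small software sketch. It is not the paper's algorithm or hardware design. It assumes a single fully connected layer, uses weight magnitude as a stand-in for importance, and stops fetching once the softmax confidence reaches a threshold. The abstract specifies none of these choices, and the names `cluster_by_importance`, `adaptive_inference`, and `threshold` are hypothetical.

```python
import math


def cluster_by_importance(weights, num_clusters):
    """Group weight positions into clusters, most important first.

    Assumption: importance is approximated by absolute magnitude.
    """
    positions = [(r, c) for r in range(len(weights))
                 for c in range(len(weights[0]))]
    positions.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]), reverse=True)
    size = math.ceil(len(positions) / num_clusters)
    return [positions[i:i + size] for i in range(0, len(positions), size)]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_inference(weights, clusters, x, threshold=0.9):
    """Classify x, fetching one more cluster per iteration.

    Weights that have not been fetched stay at 0. Assumption: iteration
    stops when the top softmax probability reaches `threshold`, or when
    every cluster has been fetched. `clusters` must be non-empty.
    """
    active = [[0.0] * len(x) for _ in weights]
    for used, cluster in enumerate(clusters, start=1):
        for r, c in cluster:  # models one fetch from off-chip memory
            active[r][c] = weights[r][c]
        logits = [sum(w * v for w, v in zip(row, x)) for row in active]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            break
    return best, used
```

In this sketch, an input that is classified confidently with the first cluster never triggers further fetches, while an ambiguous input falls back to the full set of weights.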