BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency

Yixing Li and Fengbo Ren
Arizona State University


Abstract

A binary neural network (BNN) is a compact form of neural network. Both the weights and activations in a BNN can be binary values, which leads to a significant reduction in both parameter size and computational complexity compared to its full-precision counterpart. Such reductions can directly translate into reduced memory footprint and computation cost in hardware, making BNNs highly suitable for a wide range of hardware accelerators. However, it is unclear whether and how a BNN can be further pruned for ultimate compactness. As both 0s and 1s are non-trivial in BNNs, it is not appropriate to adopt existing pruning methods designed for full-precision networks, which treat 0s as trivial. In this paper, we present a pruning method tailored to BNNs and show that BNNs can be further pruned by using weight flipping frequency as an informative indicator. Experiments performed on a 9-layer Network-in-Network (NIN) and AlexNet on the CIFAR-10 dataset show that the proposed BNN pruning method achieves a 20-40% reduction in binary operations with a 0.5-1.0% accuracy drop, which leads to a 15-40% runtime speedup on a Titan X GPU.
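
The sketch below illustrates the core indicator described above: counting how often each weight's sign flips across training updates and aggregating the flip frequency per output channel as a pruning signal. This is a minimal sketch assuming a PyTorch-style setup with latent real-valued weights that are binarized by sign; the `FlipTracker` class, the channel-level granularity, and the choice of which end of the ranking to prune are illustrative assumptions, not the paper's exact procedure.

```python
import torch


class FlipTracker:
    """Track how often each weight's sign flips across training updates.

    The per-channel average flip frequency is then used as a pruning
    indicator (channel granularity is an assumption for illustration).
    """

    def __init__(self, weight):
        self.prev_sign = torch.sign(weight.detach())
        self.flip_count = torch.zeros_like(weight)
        self.num_updates = 0

    def update(self, weight):
        # Compare the current binarized sign with the previous one and
        # accumulate a flip wherever the sign has changed.
        sign = torch.sign(weight.detach())
        self.flip_count += (sign != self.prev_sign).float()
        self.prev_sign = sign
        self.num_updates += 1

    def channel_flip_frequency(self):
        # Average flip frequency per output channel (dim 0 of a conv weight).
        freq = self.flip_count / max(self.num_updates, 1)
        return freq.view(freq.size(0), -1).mean(dim=1)


# Toy usage: one 4-D conv weight tensor updated over a few steps.
weight = torch.randn(16, 8, 3, 3)              # latent real-valued weights
tracker = FlipTracker(weight)
for _ in range(100):
    weight += 0.1 * torch.randn_like(weight)   # stand-in for an optimizer step
    tracker.update(weight)

freq = tracker.channel_flip_frequency()
# Rank channels by flip frequency; the least informative ones under the
# flipping-frequency criterion become pruning candidates.
candidates = torch.argsort(freq)
print(candidates[:4])
```

In a real training run, `tracker.update()` would be called after each optimizer step on the layer's latent weights, and the resulting per-channel frequencies would guide how aggressively each layer or channel is pruned.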