Runtime Long-Term Reliability Management Using Stochastic Computing in Deep Neural Networks

Yibo Liu, Shuyuan Yu, Shaoyi Peng, Sheldon Tan
University of California, Riverside


Abstract

In this paper, we propose a new dynamic reliability technique using an accuracy-reconfigurable stochastic computing (ARSC) framework for deep learning computing. Unlike conventional stochastic computing, which makes the accuracy versus power/energy trade-off at design time, the new ARSC design can adjust the data bit-width at run time. Hence, ARSC can mitigate long-term aging effects by slowing down the system clock frequency while maintaining inference throughput through reduced data bit-width, at a small cost in accuracy. We show how to implement the recently proposed counter-based SC multiplication and bit-width reduction on top of a layer-wise quantization scheme for convolutional neural networks (CNNs) with dynamic fixed-point data. We validate an ARSC-based five-layer CNN design for the MNIST dataset in Vivado HLS, with constraints from the Xilinx Zynq-7000 family xc7z045 platform. Experimental results show that the new ARSC-based DNN can fully compensate for NBTI-induced aging effects over 10 years with marginal classification accuracy loss, while maintaining or even exceeding the pre-aging computing throughput. At the same time, the proposed ARSC computing framework also reduces active power consumption due to the frequency scaling, which can further improve system reliability through reduced temperature.
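To make the bit-width versus accuracy/latency trade-off concrete, the sketch below illustrates a deterministic unipolar SC multiplication in which counter-generated bit-streams are combined with an AND gate; shrinking the bit-width n shortens the streams (and thus the latency per multiply) at the cost of coarser quantization. This is only a minimal illustration of the general SC principle, not the specific counter-based multiplier or HLS implementation used in the paper; the function name and stream-generation scheme are assumptions for the example.

```cpp
// Illustrative sketch (not the paper's design): deterministic unipolar SC
// multiplication with counter-generated bit-streams of length 2^n per operand.
#include <cstdint>
#include <cstdio>
#include <cmath>

// Multiply two unipolar values a, b in [0, 1] using deterministic bit-streams.
// Stream A is indexed by the fine counter (i mod 2^n) and stream B by the
// coarse counter (i / 2^n), so the AND of the two streams contains exactly
// a_q * b_q ones out of 2^(2n) bits, i.e. the quantized product.
double sc_multiply(double a, double b, unsigned n) {
    const uint32_t len = 1u << n;                          // bits per operand
    const uint32_t a_q = (uint32_t)std::lround(a * len);   // quantized operand a
    const uint32_t b_q = (uint32_t)std::lround(b * len);   // quantized operand b
    uint64_t ones = 0;
    for (uint32_t i = 0; i < len * len; ++i) {
        bool bit_a = (i % len) < a_q;   // fine counter generates stream A
        bool bit_b = (i / len) < b_q;   // coarse counter generates stream B
        ones += (bit_a & bit_b);        // AND gate acts as the SC multiplier
    }
    return (double)ones / (double)(len * len);
}

int main() {
    // Reducing n trades quantization accuracy for shorter streams, mirroring
    // the run-time bit-width knob that ARSC uses to keep throughput after
    // the clock frequency is lowered.
    for (unsigned n = 8; n >= 4; n -= 2)
        std::printf("n=%u: 0.37*0.62 ~= %f\n", n, sc_multiply(0.37, 0.62, n));
    return 0;
}
```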