Alleviating Bottlenecks for DNN Execution on GPUs via Opportunistic Computing

Xianwei Cheng1, Hui Zhao2, Mahmut Kandemir3, Saraju Mohanty1, Beilei Jiang2
1University of North Texas, 2UNT, 3PSU


Edge computing and IoT applications are severely constrained by limited hardware resources. This makes memory-consuming DNN frameworks inapplicable to edge computing. Simple algorithms such as direct convolution are finding their way into embedded machine learning. As one of the most widely used platforms for DNN acceleration, GPUs face the bottleneck of on-chip bandwidth. This work introduces a GPU DNN execution architecture that targets the on-chip bandwidth bottleneck by reducing data movement through opportunistic computing. We first investigate data access patterns from the hardware view rather than the software view. Then we propose two opportunistic computing techniques that predictably perform computation when data is available, with the help of assistant warps. By moving computation to data, our techniques are able to significantly reduce data movement and relieve the DNN execution bottleneck. Our evaluation results show that the proposed techniques can improve DNN application performance by as much as 55%.
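
For readers unfamiliar with direct convolution, the following is a minimal sketch in plain Python. It is not the implementation evaluated in this work; the function name `direct_conv2d` and the single-channel, stride-1, no-padding setting are assumptions chosen for illustration. It shows that the output is computed straight from the input and the filter with no intermediate buffer, and that neighboring output positions re-read overlapping input elements.

```python
def direct_conv2d(image, kernel):
    """Single-channel 2D direct convolution, stride 1, no padding.

    Hypothetical helper for illustration only. As is common in DNN
    frameworks, the kernel is applied without flipping (cross-correlation).
    """
    in_h, in_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1

    output = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            acc = 0.0
            # Each output element reads a k_h x k_w window of the input.
            # Adjacent windows overlap, so the same input element is
            # fetched repeatedly by different output positions.
            for ky in range(k_h):
                for kx in range(k_w):
                    acc += image[oy + ky][ox + kx] * kernel[ky][kx]
            output[oy][ox] = acc
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = direct_conv2d(image, kernel)
```

On a GPU, each output element (or tile of elements) would typically be assigned to a thread, so the repeated reads of overlapping windows translate into on-chip data movement.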