PSO Optimized Design of Error Balanced Weight Stationary Systolic Array Architecture for CNN

Dantu Nandini Devi1, Gandi Ajay Kumar2, Bindu G Gowda2, Madhav Rao3
1International Institute of Information Technology Bangalore, 2International Institute of Information Technology Bangalore, 3International Institute of Information Technology Bangalore


Approximate computing hardware in Convolutional Neural Networks (CNNs) offers notable advantages, including faster execution, better power efficiency, and a compact design footprint. Systolic Array (SA) architectures, optimized for matrix multiplication and convolution operations, have been extensively studied for stand-alone image processing applications, but their potential for CNN workloads has not been thoroughly assessed. An SA consists of an array of Processing Elements (PEs) structured to perform multiplications and accumulations. Incorporating inexact computing units in the SA introduces deviations from the exact results, which makes it challenging to sustain approximate hardware accelerator designs for CNN workloads. This paper presents a strategy for the optimal placement of both positive and negative error-distributed multipliers as PEs to create an error-diluted SA structure. The proposed strategy is evaluated on the Prewitt filter and three other filters extracted from the first layer of a CNN. The paper also introduces an optimization framework for selecting the most suitable PEs from a pool of positive and negative error-distributed multipliers, aiming to balance hardware efficiency against image quality metrics. The framework and hardware design files are made available to the designer and researcher community for further use.
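
The idea of balancing positive and negative multiplier errors across the PEs of a weight-stationary array can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper's design files: the two toy multipliers (`mul_pos` sets and `mul_neg` clears the `K` low bits of the exact product), the non-negative 3x3 kernel, the pixel windows, and the function names are all hypothetical. The title names PSO as the optimizer; because a 3x3 array has only 512 possible placements, the sketch substitutes a plain exhaustive search and considers output error only, ignoring hardware cost.

```python
from itertools import product

K = 2  # number of low product bits the toy multipliers distort (assumed)

def mul_neg(a, b):
    """Toy under-estimating multiplier: clears the K low bits of a*b."""
    return ((a * b) >> K) << K

def mul_pos(a, b):
    """Toy over-estimating multiplier: sets the K low bits of a*b."""
    return (a * b) | ((1 << K) - 1)

def exact(weights, pixels):
    """Reference multiply-accumulate result with exact multipliers."""
    return sum(w * x for w, x in zip(weights, pixels))

def mac(weights, pixels, placement):
    """One output value; PE i holds weights[i] and uses a 'P' or 'N' multiplier."""
    total = 0
    for w, x, kind in zip(weights, pixels, placement):
        total += mul_pos(w, x) if kind == "P" else mul_neg(w, x)
    return total

def total_abs_error(weights, windows, placement):
    """Sum of absolute output errors over all pixel windows."""
    return sum(abs(mac(weights, px, placement) - exact(weights, px))
               for px in windows)

def best_placement(weights, windows):
    """Exhaustive stand-in for the optimizer: try every P/N assignment."""
    candidates = product("PN", repeat=len(weights))
    return min(candidates, key=lambda p: total_abs_error(weights, windows, p))

# Hypothetical non-negative smoothing kernel and two flattened 3x3 windows.
weights = [1, 2, 1, 2, 4, 2, 1, 2, 1]
windows = [
    [12, 40, 7, 33, 90, 21, 5, 64, 18],
    [3, 3, 3, 9, 9, 9, 27, 27, 27],
]

best = best_placement(weights, windows)
print("placement:", "".join(best))
print("error, all negative:", total_abs_error(weights, windows, "N" * 9))
print("error, all positive:", total_abs_error(weights, windows, "P" * 9))
print("error, best mix:    ", total_abs_error(weights, windows, best))
```

With non-negative operands, an all-`N` array can only under-estimate and an all-`P` array can only over-estimate, so a mixed placement lets the individual errors partly cancel in the accumulator. For larger arrays the search space grows as 2^n, which is where a metaheuristic such as PSO, together with a cost term for hardware efficiency, becomes the more practical choice.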