Performance-Aware Design of Approximate Integrated MAC Factored Systolic Array Accelerators

Dantu Nandini Devi1, Gandi Ajay Kumar2, Bindu G Gowda2, Madhav Rao3
1International Institute of Information Technology Bangalore, 2International Institute of Information Technology Bangalore, 3International Institute of Information Technology Bangalore


Approximate computing has gained traction owing to the combined benefits it achieves in hardware design parameters without disturbing the overall outcome of error-resilient image processing. The systolic array (SA) architecture is one of the core hardware accelerator units for general matrix multiplication (GEMM), including convolution-based image processing. Typically, approximate computing units incorporated in an SA cause the result to deviate from the error-free one. This paper presents three categories of approximate integrated MAC (AIMAC) factored SA configurations and benchmarks them against 8 existing categories of state-of-the-art (SOTA) approximate SA designs. The trade-off between hardware metrics and accuracy was found to be tighter in the proposed AIMAC SA designs than in the existing SOTA methods. Compared to the SOTA designs, the proposed AIMAC SA architecture reduces design footprint by up to 21.53%, saves up to 13.84% in power, and improves critical path delay by up to 11.11%. On the application side, the proposed AIMAC SA method was evaluated on convolution-based Gaussian blurring and matrix-multiplication-based DCT compression, where it showed comparable or better image quality metrics than the SOTA-derived output.
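To make the accuracy side of this trade-off concrete, the following minimal Python sketch computes a small GEMM by repeated multiply-accumulate (MAC) steps, once with an exact multiplier and once with an approximate one. It is an illustration only and not the AIMAC design: the operand-truncating multiplier `approx_mul`, its `drop_bits` parameter, the helper `mac_gemm`, and the sample matrices are all assumptions introduced here, and the loop models only the per-cell accumulation order, not the timing or dataflow of a real systolic array.

```python
def exact_mul(a, b):
    """Reference error-free multiplier."""
    return a * b


def approx_mul(a, b, drop_bits=2):
    """Hypothetical approximate multiplier (an assumption, not AIMAC):
    zero the lowest `drop_bits` bits of each non-negative operand,
    then multiply the truncated values."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


def mac_gemm(A, B, mul=exact_mul):
    """Compute C = A x B by MAC accumulation.

    Each output cell (i, j) keeps a running sum and adds one product
    per step k, loosely mirroring what a processing element in an
    output-stationary array would do.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for k in range(inner):
        for i in range(rows):
            for j in range(cols):
                C[i][j] += mul(A[i][k], B[k][j])
    return C


# Small illustrative operands (made up for this sketch).
A = [[7, 5], [12, 3]]
B = [[9, 4], [6, 10]]

exact = mac_gemm(A, B)
approx = mac_gemm(A, B, approx_mul)

# Per-cell absolute deviation of the approximate result from the exact one.
abs_error = [[exact[i][j] - approx[i][j] for j in range(2)] for i in range(2)]
```

Because the assumed multiplier only ever discards low-order bits of non-negative operands, every approximate entry is at most its exact counterpart; a real approximate MAC would have its own error profile, which is what image quality metrics on Gaussian blurring or DCT outputs are meant to capture.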