HIE-DRAM: High-Performance Efficient In-DRAM Computing Architecture for SIMD

Mayank Kabra1, Prashanth H C2, Kedar Deshpande1, Madhav Rao3
1Student, 2IIIT-Bangalore, 3International Institute of Information Technology-Bangalore


Abstract

In-memory and near-memory computing allow processing elements to be placed around the periphery of, or inside, the memory blocks. Performing the computation as soon as the data is available in the memory sub-blocks avoids waiting for the processor to manage the data movement. This paper presents a new 11-transistor (11T) computing design and a novel operation that deliver energy savings and performance improvements over the current state-of-the-art (SOTA) single-instruction-multiple-data in-DRAM (SIMDRAM) computing. The novel 11T pass transistor design is structured to offer logical AND, OR, XNOR, and its complement operations. These are sequenced to generate the desired operational output with a minuscule 4-row circuitry change that corresponds to a footprint expense of 0.05% when compared to the existing DRAM architecture. Based on these logical operations, 13 scalar instructions covering arithmetic, predication, reduction, and relational function types are characterized. These scalar operations are a mix of logarithmic, quadratic, and linear functions applied to either a single operand or multiple operands. With respect to the single-instruction-multiple-data (SIMD) topology, vector operations comprising addition, multiplication, sparse multiplication, selection, unique, reduction, and prefix summation are also realized. All these operations were compared with the current SOTA SIMDRAM architectural design to demonstrate substantial computing time benefits and energy savings. The proposed 11T in-DRAM-compute design offers a 5.18% to 50.57% improvement in computing latency and energy across 10 scalar operations over the SIMDRAM architecture. The novel high-performance and efficient in-DRAM computing (HIE-DRAM) implementation is a step towards real-time in-memory vector data processing for autonomous applications.
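To illustrate how a small set of row-wide logical operations can be sequenced into a SIMD arithmetic result, the following is a minimal functional sketch in Python, not the HIE-DRAM circuit or its actual operation sequence. It assumes a bit-serial (vertical) data layout in which each row holds one bit position of every lane, and it reads "complement" as giving XOR from XNOR; neither assumption is stated in the abstract. The names `LANES`, `to_bit_planes`, `from_bit_planes`, and `simd_add` are hypothetical.

```python
# Functional model only: each Python int stands for one DRAM row,
# and bit j of that int is the value held by SIMD lane j.
LANES = 8                      # assumed number of lanes (hypothetical)
MASK = (1 << LANES) - 1        # keeps results within LANES bits


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def XNOR(a, b):
    return ~(a ^ b) & MASK


def NOT(a):
    return ~a & MASK


def XOR(a, b):
    # Obtained as the complement of XNOR (an assumed reading of the abstract).
    return NOT(XNOR(a, b))


def to_bit_planes(values, width):
    """Transpose lane values into rows: planes[i] holds bit i of every lane."""
    planes = []
    for i in range(width):
        row = 0
        for lane, v in enumerate(values):
            row |= ((v >> i) & 1) << lane
        planes.append(row)
    return planes


def from_bit_planes(planes, lanes):
    """Inverse of to_bit_planes: rebuild one integer per lane."""
    return [
        sum(((row >> lane) & 1) << i for i, row in enumerate(planes))
        for lane in range(lanes)
    ]


def simd_add(a_planes, b_planes):
    """Ripple-carry addition over all lanes at once, one bit position per step."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        p = XOR(a, b)                          # propagate
        out.append(XOR(p, carry))              # sum bit
        carry = OR(AND(a, b), AND(p, carry))   # carry out
    out.append(carry)                          # final carry is the top bit
    return out


if __name__ == "__main__":
    a = [1, 2, 3, 250, 255, 0, 127, 128]
    b = [1, 5, 9, 10, 255, 0, 1, 128]
    result = simd_add(to_bit_planes(a, 8), to_bit_planes(b, 8))
    print(from_bit_planes(result, LANES))
```

Each loop iteration uses only row-wide AND, OR, XNOR, and complement steps, so the number of steps grows linearly with the operand bit width while all lanes are processed in parallel.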