MAPIM: Mat Parallelism for High Performance Processing in Non-volatile Memory Architecture

Joonseop Sim1, Minsu Kim2, Yeseong Kim3, Saransh Gupta4, Behnam Khaleghi4, Tajana Rosing5
1University of California, San Diego, 2University of Minnesota, 3University of California San Diego, 4University of California, San Diego, 5UCSD


In the Internet of Things (IoT) era, data movement between processing units and memory is a critical factor in overall system performance. Processing-in-memory (PIM) is a promising solution to this bandwidth bottleneck: it performs a portion of the computation inside the memory. Many prior studies have enabled various PIM operations on non-volatile memory (NVM) by modifying sense amplifiers (SAs). Because a single SA circuit occupies a much larger area than a 1-bit NVM cell, these designs use one SA to handle multiple bitlines through a multiplexer (MUX). This sharing limits the parallelism that PIM techniques could ideally achieve. In this paper, we propose MAPIM, mat parallelism for high-performance processing in non-volatile memory architecture. Our design serves requests to multiple bitlines (BLs) under a MUX in parallel using two novel design components: a multi-column/row latch (MCRL) and shared SA routing (SSR). The MCRL allows the address decoder to activate multiple addresses in both the column and row directions by buffering consecutively requested addresses. The activated bits are then sensed simultaneously by multiple SAs across a MUX using the SSR technique. Experimental results show that MAPIM is up to 339X faster and 221X more energy efficient than a GPGPU. Compared to state-of-the-art PIM designs, our design is 16X faster and 1.8X more energy efficient, with insignificant area overhead.
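
The parallelism limit described above can be illustrated with a small counting model. This is only a sketch under simplifying assumptions, not the MAPIM circuit: it assumes each MUX group owns exactly one SA, that one SA senses one bitline per step, and that with shared routing any idle SA can sense any requested bitline. The function names `baseline_steps` and `shared_steps` and the parameters `mux_ratio` and `num_sas` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def baseline_steps(requested_bitlines, mux_ratio):
    """Sensing steps when each MUX group has one private SA (assumed model).

    Bitlines b with the same b // mux_ratio share one SA, so requests that
    fall under the same MUX are serialized.
    """
    per_group = Counter(b // mux_ratio for b in set(requested_bitlines))
    return max(per_group.values(), default=0)


def shared_steps(requested_bitlines, num_sas):
    """Sensing steps when any SA may serve any requested bitline (assumed model)."""
    return math.ceil(len(set(requested_bitlines)) / num_sas)


# Hypothetical mat: 32 bitlines, 8:1 MUXes, hence 4 SAs.
requests = [0, 1, 2, 3]  # four bitlines that all sit under the first MUX
print(baseline_steps(requests, mux_ratio=8))  # serialized on one SA
print(shared_steps(requests, num_sas=4))      # spread over all four SAs
```

In this toy setting the four requests take four steps when they are confined to one SA and a single step when the otherwise idle SAs can be used. The model ignores latching, routing delay, and energy, so it says nothing about the speedups reported above.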