Analysis of the Effect of Off-chip Memory Access on the Performance of an NPU System

Keonjoo Lee, Donghyun Kang, Duseok Kang, Soonhoi Ha
Seoul National University


Numerous CNN accelerators, called neural processing units (NPUs), have recently been proposed and developed to accelerate CNN computation on customized chips. To minimize the DRAM access volume, NPUs commonly have a large on-chip memory and try to maximize reuse of the data fetched from off-chip DRAM. While extensive research has been conducted on minimizing the effect of off-chip DRAM access on performance in NPU design, little attention has been paid to a detailed analysis of how off-chip DRAM access overhead affects NPU performance. In this paper, using a cycle-accurate system simulation environment, we analyze the effect of off-chip DRAM access latency on NPU performance and how the off-chip SRAM changes the DRAM access latency.