Asynchronous Design Flow for Neuromorphic Chips

Prasad Joshi
Intel Corporation


Abstract

For a growing number of applications, traditional clocked design methods are struggling with the challenges posed by the latest process nodes. We see this most vividly today in long-range interconnect, where rising wire resistance and process variability create daunting mesochronous integration challenges that add design complexity and cost area, energy efficiency, and latency. Rising power densities motivate new architectures, such as neuromorphic computing, and create a need for timing resiliency to support aggressive voltage/frequency scaling and near-threshold operation. Asynchronous design techniques have long been seen as an eventual solution to these problems because they offer timing modularity, robustness to variability, and dramatically relaxed chip-wide timing constraints.

Intel has the most advanced asynchronous design flow in the industry, obtained through its acquisition of Fulcrum Microsystems in 2011. This flow has produced five generations of Ethernet switch products and, more recently, Loihi, a self-learning neuromorphic research test chip introduced in November 2017. Neuromorphic systems act as accelerators for spiking neural networks (SNNs), which emulate the neural networks found in biological brains. Each “neuron” in an SNN can independently send pulsed signals (spikes) to other neurons in the network, with information encoded both in the spikes themselves and in their timing. Loihi is a many-core mesh of 128 neuromorphic cores, in which each core implements up to 1,024 such neurons in a time-multiplexed manner. Because biological neurons make no assumption of explicit synchronization, they are fundamentally asynchronous. Accordingly, event-driven asynchronous design is widely accepted as the appropriate tool for prototyping SNNs in silicon because of the benefits it provides.
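
To make the idea of time-multiplexed neurons concrete, here is a minimal Python sketch of one core that sweeps a single shared update routine over a bank of stored neuron states. It assumes a simple leaky integrate-and-fire (LIF) neuron model; the class name `Core`, the `decay` and `threshold` parameters, and their values are hypothetical illustrations, not Loihi's actual neuron model or interface. Only the 1,024-neuron per-core limit comes from the text above.

```python
MAX_NEURONS_PER_CORE = 1024  # per-core limit stated in the abstract


class Core:
    """Hypothetical model of one neuromorphic core.

    One update routine (standing in for a shared datapath) is applied to
    each neuron's stored state in turn, i.e. the neurons are
    time-multiplexed. The LIF dynamics used here are an assumption.
    """

    def __init__(self, n_neurons, decay=0.9, threshold=1.0):
        if not 0 < n_neurons <= MAX_NEURONS_PER_CORE:
            raise ValueError(f"a core holds 1..{MAX_NEURONS_PER_CORE} neurons")
        self.decay = decay          # assumed leak factor per timestep
        self.threshold = threshold  # assumed firing threshold
        self.v = [0.0] * n_neurons  # membrane potential of each neuron

    def step(self, inputs):
        """Advance one timestep.

        inputs maps a neuron index to the summed synaptic input it received
        this timestep; neurons absent from the map received no spikes.
        Returns the indices of the neurons that spiked.
        """
        fired = []
        for i in range(len(self.v)):  # visit each neuron's state in turn
            v = self.v[i] * self.decay + inputs.get(i, 0.0)  # leak, then integrate
            if v >= self.threshold:
                fired.append(i)
                v = 0.0  # reset the membrane potential after a spike
            self.v[i] = v
        return fired


core = Core(n_neurons=4)
print(core.step({0: 0.6, 2: 1.2}))  # [2]
print(core.step({0: 0.6}))          # [0]
```

Keeping all neuron state in one array and sweeping a single routine over it mirrors, at a very high level, how one core can share its logic among many neurons; an event-driven implementation could additionally skip neurons with no pending activity.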

Activity gating provided by asynchronous flow control eliminates the power that a free-running clock would waste in these sparsely active SNNs, whose workloads are often bursty and unpredictable. At the architectural level, an asynchronous barrier synchronization scheme that synchronizes all cores across the multi-chip mesh significantly improves global performance: it eliminates the needless mesh-wide idle time that a clocked design would require to accommodate worst-case spiking activity.

The primary hurdle to widespread use of asynchronous design is the lack of adequate tool support and of a standardized design methodology. At Intel, we have developed a hierarchical asynchronous design flow that leverages commercial synchronous EDA tools with an almost entirely standard-cell library. This talk will cover our “push-button” RTL-to-GDSII backend flow as well as our approach to scan insertion, ATPG, timing closure, FPGA emulation, and design validation.
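
The performance argument for barrier synchronization can be illustrated with a back-of-the-envelope model. The Python sketch below assumes that each core's work per timestep is proportional to its spike count, that a clocked design must size every timestep for a fixed worst-case duration, and that the asynchronous barrier adds a fixed overhead per timestep. The function names, cost parameters, and activity trace are hypothetical and are not measurements from Loihi.

```python
def clocked_time(activity, worst_case_step_ns):
    """Clocked mesh: every timestep is budgeted for worst-case spiking."""
    return len(activity) * worst_case_step_ns


def barrier_time(activity, ns_per_spike, barrier_overhead_ns):
    """Barrier-synchronized mesh: a timestep ends once the busiest core
    finishes its work and the barrier completes (assumed fixed overhead)."""
    total = 0
    for spikes_per_core in activity:
        busiest = max(spikes_per_core) * ns_per_spike
        total += busiest + barrier_overhead_ns
    return total


# Hypothetical sparse, bursty activity: spike counts for 4 cores over 3 timesteps.
activity = [
    [0, 1, 0, 2],
    [0, 0, 0, 0],
    [5, 1, 0, 0],
]
# Assumed worst case: 64 spikes per core at 10 ns each = 640 ns per timestep.
print(clocked_time(activity, worst_case_step_ns=640))                  # 1920
print(barrier_time(activity, ns_per_spike=10, barrier_overhead_ns=5))  # 85
```

Under these assumptions, a quiet timestep costs only the barrier overhead instead of a full worst-case period, so the advantage grows as activity becomes sparser.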