FUSE: A Framework for Unified System Evaluation of Accuracy in Edge Inference Accelerators

Shamik Kundu1, Arnab Raha2, Deepak Mathaikutty2
1Intel Corporation, 2Intel Labs


Abstract

The proliferation of AI inference on edge devices has driven aggressive specialization of neural processing units (NPUs) toward compute-in-memory (CiM) and mixed-precision architectures. To sustain throughput and energy efficiency within strict power and area budgets, these accelerators rely on hardware approximations such as reduced precision, analog computation, and selective digital fallback. While these techniques improve utilization and efficiency, they introduce numeric deviations whose cumulative effect on end-to-end model fidelity remains poorly understood. Existing evaluation approaches often analyze circuit- or layer-level behavior in isolation, providing limited visibility into system-level accuracy trade-offs. We present FUSE, a Framework for Unified System Evaluation of Accuracy in Edge Inference Accelerators, which bridges the gap between hardware-level numeric design and application-level performance. FUSE integrates detailed numeric modeling of CiM architectures directly into full-network inference pipelines. It injects per-layer nonidealities such as quantization, pre-/post-alignment, scaling, converter precision limits, and hybrid digital offload, enabling quantitative evaluation of their effect on metrics such as Top-1 accuracy and perplexity. Case studies on vision and language models demonstrate how alignment width, ADC resolution, and scaling granularity jointly shape accuracy loss under constrained precision. By providing a unified, reproducible methodology for cross-domain accuracy analysis, FUSE enables principled, accuracy-centric co-design of future edge accelerators.
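To make the injection methodology concrete, the sketch below shows, in heavily simplified form, how per-layer nonidealities of the kind the abstract describes (weight quantization and ADC precision limits on a CiM dot product) can be modeled in software. All names and parameter values here are hypothetical illustrations, not the FUSE implementation or API.

```python
# Hypothetical sketch of CiM nonideality injection (not the FUSE API):
# symmetric uniform quantization of stored weights, followed by an
# ADC-resolution limit on the analog accumulation readout.

def quantize(x, bits, scale):
    """Round x to a signed integer grid of the given bit width, then rescale.
    Values outside the representable range are clipped (saturation)."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale

def cim_dot(weights, activations, w_bits=4, adc_bits=6, scale=0.1):
    # Weights are quantized as the CiM array would store them.
    wq = [quantize(w, w_bits, scale) for w in weights]
    acc = sum(w * a for w, a in zip(wq, activations))
    # The ADC limits the precision of the analog accumulation readout.
    return quantize(acc, adc_bits, scale)

# Compare the ideal dot product against the nonideal CiM version.
w, a = [0.23, -0.41, 0.77], [1.0, 0.5, -0.25]
exact = sum(wi * ai for wi, ai in zip(w, a))
approx = cim_dot(w, a)
print(exact, approx)  # the deviation is what FUSE-style analysis aggregates
```

Sweeping `w_bits`, `adc_bits`, and `scale` across layers of a full network, and comparing end-to-end metrics against a reference run, is the system-level view the abstract argues for, as opposed to inspecting a single layer's error in isolation.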