Design and Evaluation of Parametric NTT Hardware Unit using different Multiplier based Modular Reduction Techniques

Lokesh Maji1, Aman Prajapati1, Madhav Rao2
1International Institute of Information Technology Bangalore, 2International Institute of Information Technology-Bangalore


The throughput of efficient lattice-based cryptosystems typically depends on the performance of the Number Theoretic Transform (NTT). NTT-based multiplication over polynomial rings is preferred over direct polynomial multiplication because it reduces computational complexity. The NTT has become a major computational kernel for cryptographic modules such as hash functions, homomorphic encryption, key encapsulation, and digital signatures. This paper presents the design and evaluation of multiplier techniques for the modular reduction block, which forms the core of a hardware NTT design. The study explores three multiplier designs: the Karatsuba (KA) multiplier, the overlap-free Karatsuba (OKA) multiplier, and the Radix-4 Booth multiplier. Each is characterized within the modular reduction block and then adopted in the complete NTT design. The multiplier-based designs are compared against a state-of-the-art (SOTA) design based on processing elements (PEs) to show the impact of the multiplier choice on different design parameters. All four variants, including the SOTA design, were evaluated through ASIC synthesis using a 45 nm GPDK library and FPGA synthesis using the Xilinx tool on a Zynq ZedBoard. The modular reduction block built on the Radix-4 Booth multiplier demonstrated power, performance, and area (PPA) gains over the other designs, including the SOTA work. The same gains are reflected in the NTT hardware unit, and the trend is consistent across designs with different numbers of PEs and different coefficient sizes. The proposed Radix-4 Booth multiplier based NTT with 8 PEs was found to be the best configuration for a 32-bit width, with power and footprint gains of above 16% each in the FPGA design and above 7% each in silicon, at performance comparable to the SOTA design.
The proposed NTT design incorporating the Radix-4 Booth multiplier is a first step towards developing a power-performance-area (PPA) efficient hardware security block for modern applications. All design files are made freely available to designers and researchers for further adoption and use.
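
To make the multiplier options concrete, the following is a minimal software sketch of two of the schemes named above: a recursive Karatsuba multiply and a radix-4 Booth multiply, each followed by a modular reduction. It is an illustration under stated assumptions, not the hardware design evaluated in this work. The function names, the 8-bit recursion cutoff, the modulus `Q`, and the use of the `%` operator as the reduction step are all hypothetical choices made for the example; the overlap-free Karatsuba variant is not sketched.

```python
# Illustrative software model only; names and parameters are assumptions.

Q = 2013265921  # 15 * 2**27 + 1, an NTT-friendly prime chosen for illustration


def karatsuba(x: int, y: int, width: int = 32) -> int:
    """Multiply unsigned integers using three half-width products per level."""
    if width <= 8:  # assumed cutoff: small operands use a direct multiply
        return x * y
    half = width // 2
    mask = (1 << half) - 1
    x_lo, x_hi = x & mask, x >> half
    y_lo, y_hi = y & mask, y >> half
    low = karatsuba(x_lo, y_lo, half)
    high = karatsuba(x_hi, y_hi, half)
    # The operand sums can be one bit wider than a half word.
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, half + 1) - low - high
    return (high << (2 * half)) + (mid << half) + low


def booth_radix4(x: int, y: int, width: int = 32) -> int:
    """Multiply unsigned integers by recoding y into digits in {-2,...,2}."""
    acc = 0
    prev = 0  # implicit bit to the right of y[0]
    # One extra digit covers the zero extension needed for an unsigned y.
    for i in range(0, width + 2, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digit = b0 + prev - 2 * b1
        acc += (digit * x) << i  # one partial product per 2 bits of y
        prev = b1
    return acc


def mod_mul(a: int, b: int, multiply=booth_radix4, q: int = Q) -> int:
    """Modular multiply: full product, then reduce (here simply with %)."""
    return multiply(a, b) % q
```

For example, `mod_mul(a, b, multiply=karatsuba)` and `mod_mul(a, b, multiply=booth_radix4)` return the same residue for any 32-bit `a` and `b`; the schemes differ only in how the full product is formed. Radix-4 Booth recoding halves the number of partial products compared with a bit-serial multiply, while Karatsuba replaces four half-width products with three at each level.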