Preliminary Program

Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multi-lingual Neural Machine Translation

Mukul Lokhande¹, Tanushree Dewangan¹, Sharik Mansoori², Tejas Chaudhari³, Akarsh J.¹, Damayanti Lokhande⁴, Adam Teman⁵, Santosh Vishvakarma⁶
¹Indian Institute of Technology Indore, ²Undergraduate, ³Indian Institute of Technology, Indore, ⁴Independent, ⁵Bar-Ilan University, ⁶IIT Indore

Abstract

This paper introduces Bhasha-Rupantarika, a lightweight and efficient multilingual translation system tailored through algorithm-hardware co-design for resource-limited settings. The method investigates model deployment at sub-octet precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1× reduction in model size (FP4) and a 4.2× speedup in inference, corresponding to a 66 tokens/s increase in throughput (4.8×). This underscores the importance of ultra-low precision quantization for real-time deployment in IoT devices using FPGA accelerators, achieving performance on par with expectations. Our evaluation covers bidirectional translation between Indian and international languages, showcasing its adaptability in low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96× reduction in LUTs and a 1.65× decrease in FFs, resulting in a 2.2× increase in throughput compared to OPU and a 4.6× increase compared to HPTA. Overall, the evaluation provides a viable solution based on quantization-aware translation along with hardware efficiency suitable for deployable multilingual AI systems. The entire code and dataset for reproducibility are publicly available at https://github.com/mukullokhande99/Bhasha-Rupantarika/, facilitating rapid integration and further development by researchers.
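
As a rough illustration of what low-precision deployment involves, the sketch below applies generic symmetric per-tensor integer quantization to a small list of weights and estimates packed storage size. It is not the authors' method: the abstract does not describe their quantization scheme, the function names (`quantize_symmetric`, `dequantize`, `storage_bytes`) and the sample weights are hypothetical, and the floating-point formats (FP8, FP4) are not covered here.

```python
def quantize_symmetric(weights, bits):
    """Quantize floats to signed integers in [-qmax, qmax], qmax = 2**(bits-1) - 1.

    Assumption: one scale per tensor, chosen from the largest absolute weight.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clip to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to approximate float weights."""
    return [v * scale for v in q]


def storage_bytes(num_weights, bits):
    """Bytes needed to store num_weights values packed at the given bit width."""
    return (num_weights * bits + 7) // 8


# Hypothetical example weights, for illustration only.
weights = [0.5, -1.0, 0.25, 0.0, 0.75, -0.3]

q4, s4 = quantize_symmetric(weights, bits=4)
q8, s8 = quantize_symmetric(weights, bits=8)

err4 = max(abs(w - r) for w, r in zip(weights, dequantize(q4, s4)))
err8 = max(abs(w - r) for w, r in zip(weights, dequantize(q8, s8)))

print("INT4 levels:", q4, "max abs error:", err4)
print("INT8 levels:", q8, "max abs error:", err8)
print("bytes for 1000 weights at 16/8/4 bits:",
      storage_bytes(1000, 16), storage_bytes(1000, 8), storage_bytes(1000, 4))
```

Halving the bit width halves the packed weight storage, while the worst-case rounding error grows with the step size `scale`. Reported end-to-end size reductions depend on the baseline precision and on any per-tensor metadata, which this sketch does not model.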