Bhasha-Rupantarika: An Algorithm-Hardware Co-design Approach for Multi-lingual Neural Machine Translation

Mukul Lokhande1, Tanushree Dewangan1, Sharik Mansoori2, Tejas Chaudhari3, Akarsh J.1, Damayanti Lokhande4, Adam Teman5, Santosh Vishvakarma6
1Indian Institute of Technology Indore, 2Undergraduate, 3Indian Institute of Technology, Indore, 4Independent, 5Bar-Ilan University, 6IIT Indore


Abstract

This paper introduces Bhasha-Rupantarika, a lightweight and efficient multilingual translation system tailored for resource-limited settings through algorithm-hardware co-design. The method investigates model deployment at octet and sub-octet precisions (FP8, INT8, INT4, and FP4). Experimental results indicate a 4.1× reduction in model size (FP4) and a 4.2× inference speedup, corresponding to a throughput gain of 66 tokens/s (4.8×). These results underscore the importance of ultra-low-precision quantization for real-time deployment on IoT devices using FPGA accelerators, with performance on par with expectations. Our evaluation covers bidirectional translation between Indian and international languages, showing the system's adaptability in low-resource linguistic contexts. The FPGA deployment achieved a 1.96× reduction in look-up tables (LUTs) and a 1.65× reduction in flip-flops (FFs), alongside a 2.2× throughput increase over OPU and a 4.6× increase over HPTA. Overall, the results point to a viable approach that combines quantization-aware translation with hardware efficiency for deployable multilingual AI systems. The complete code and dataset are publicly available for reproducibility at https://github.com/mukullokhande99/Bhasha-Rupantarika/, facilitating rapid integration and further development by researchers.
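As a rough illustration of what sub-octet (INT4) weight storage involves, the sketch below applies symmetric per-tensor quantization to signed 4-bit integers in [-8, 7] and packs two values per byte. This is an assumed, generic scheme chosen for clarity. The function names are hypothetical, and it is not necessarily the quantization method used in Bhasha-Rupantarika. FP4 and FP8 use a floating-point bit layout (sign, exponent, mantissa) instead and are not shown.

```python
def quantize_int4_symmetric(weights):
    """Map floats to signed 4-bit integers in [-8, 7] using one per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    # Divide by 7 (not 8) so that +max_abs lands inside the representable range.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from 4-bit codes."""
    return [v * scale for v in q]


def pack_int4(q):
    """Pack pairs of 4-bit two's-complement values into single bytes (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        packed.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(packed)


def unpack_int4(packed, n):
    """Unpack n signed 4-bit values from bytes produced by pack_int4."""
    out = []
    for b in packed:
        for nibble in (b & 0x0F, b >> 4):
            # Sign-extend the 4-bit two's-complement nibble.
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:n]


if __name__ == "__main__":
    weights = [0.7, -0.3, 0.1, -0.7, 0.0]
    q, scale = quantize_int4_symmetric(weights)
    print(q)  # → [7, -3, 1, -7, 0]
    packed = pack_int4(q)
    print(len(packed), "bytes for", len(q), "weights")
    print(dequantize(unpack_int4(packed, len(q)), scale))
```

Packing two codes per byte halves storage relative to INT8. The rounding error per weight is bounded by half the scale, which is why a single outlier weight (large `max_abs`) degrades precision for the whole tensor under per-tensor scaling.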