MambaLiteSR: Image Super-Resolution with Low-Rank Mamba using Knowledge Distillation

Romina Aalishah, Mozhgan Navardi, Tinoosh Mohsenin
Johns Hopkins University


Abstract

Generative Artificial Intelligence (AI) has gained significant attention in recent years, revolutionizing various applications across industries. Among these, advanced vision models for image super-resolution are in high demand, particularly for deployment on edge devices where real-time processing is crucial. However, deploying such models on edge devices is challenging due to their limited computing power and memory. In this paper, we present MambaLiteSR, a novel lightweight image Super-Resolution (SR) model based on the Vision Mamba architecture. It integrates State Space Blocks and a reconstruction module for efficient feature extraction. To optimize efficiency without sacrificing performance, MambaLiteSR employs knowledge distillation, transferring essential information from a larger Mamba-based teacher model to a smaller student model through hyperparameter tuning. Through a mathematical analysis of model parameters and their impact on the Peak Signal-to-Noise Ratio (PSNR), we identify key factors and adjust them accordingly. Our comprehensive evaluation shows that MambaLiteSR outperforms state-of-the-art edge SR methods by reducing latency and power consumption while maintaining competitive PSNR and SSIM scores across benchmark datasets such as Set5, Set14, and BSD100. It also reduces power usage during training by adopting low-rank approximation. Moreover, MambaLiteSR reduces the total number of parameters without degrading performance, enabling the efficient deployment of generative AI models on resource-constrained devices. Deployment on the embedded NVIDIA Jetson Orin Nano confirms that MambaLiteSR achieves a superior balance of model size, latency, and resource efficiency. The experimental results show that MambaLiteSR achieves performance comparable to both the baseline and other edge models while using 15% fewer parameters than the baseline. It also reduces power consumption by up to 3.4x and offers faster inference than state-of-the-art SR edge models, all while maintaining low energy consumption during training.