SpotOn: A Gradient-based Targeted Data Poisoning Attack on Deep Neural Networks

Yash Khare1, Kumud Lakara2, Sparsh Mittal3, Arvind Kaushik4, Rekha Singhal5
1Amrita Vishwa Vidyapeetham, 2Manipal Institute of Technology, 3IIT Roorkee, 4NXP Semiconductors, 5TCS Research


Abstract

Deep neural networks (DNNs) have reached human-level accuracy in many computer-vision tasks, yet they fail badly on adversarial inputs. As DNNs find increasing use in security-critical domains, their vulnerability to adversarial attacks becomes a matter of grave concern. Adversarial examples are created by adding minor perturbations to genuine inputs. From an attacker's perspective, the added perturbations need to be as inconspicuous as possible to evade detection by a human validator. However, previous gradient-based adversarial attacks, such as the ``fast gradient sign method'' (FGSM), add an equal amount (say $\epsilon$) of noise to every pixel of an image. This causes a significant loss in image quality, so a human validator can easily detect the resulting adversarial samples.
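
For reference, FGSM is commonly written as below. The symbols $x$ (input image), $y$ (true label), $\theta$ (model parameters), and $J$ (training loss) are not defined in this abstract and are introduced here only for illustration.

\[
x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)
\]

Wherever the gradient is nonzero, each pixel shifts by exactly $\pm\epsilon$, which is the uniform, image-wide perturbation described above.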

We propose a novel gradient-based adversarial attack technique named SpotOn, which seeks to keep the quality of adversarial images high. In SpotOn, we first identify an image's region of importance (ROI) using a ``class activation map'' approach such as Grad-CAM. SpotOn has three variants. Two variants attack only the ROI, whereas the third variant adds noise of magnitude $\epsilon$ to the ROI and a much smaller amount of noise (say $\epsilon/3$) to the rest of the image. Experimental results on the Caltech101 dataset show that, compared to FGSM, SpotOn achieves comparable degradation in CNN accuracy while maintaining much higher image quality (measured in terms of SSIM). For example, for $\epsilon=0.1$, FGSM degrades VGG19 accuracy from 92\% to 8\% and yields an SSIM value of 0.48 by attacking all pixels in an image. By contrast, SpotOn-VariableNoise attacks only 34.8\% of the pixels in the image, degrades accuracy to 10.5\%, and maintains an SSIM value of 0.78. This makes SpotOn an effective data-poisoning attack technique. Further, SpotOn reduces the hardware overhead of poisoning the image, since it requires poisoning far fewer pixels than the conventional technique.
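
The region-dependent noise described above (magnitude $\epsilon$ inside the ROI, $\epsilon/3$ elsewhere) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the loss gradient and a Grad-CAM-style importance map normalized to [0, 1] have already been computed, and that the ROI is obtained by thresholding that map. The function names, the `threshold` and `ratio` parameters, and the clipping of pixels to [0, 1] are all hypothetical choices made for this example.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def roi_mask(heatmap, threshold=0.5):
    """Mark pixels whose importance score is at least `threshold`.

    Assumption: the ROI is a simple threshold on a heatmap normalized
    to [0, 1]; the abstract does not state how the ROI is delimited.
    """
    return [[score >= threshold for score in row] for row in heatmap]


def variable_noise_attack(image, grad, heatmap, eps=0.1, ratio=3.0, threshold=0.5):
    """Apply a signed-gradient step of size eps inside the ROI and
    eps / ratio outside it (hypothetical helper for illustration).

    image, grad, and heatmap are 2-D lists of equal shape; pixel values
    are assumed to lie in [0, 1] and are clipped back into that range.
    """
    mask = roi_mask(heatmap, threshold)
    adversarial = []
    for img_row, grad_row, mask_row in zip(image, grad, mask):
        row = []
        for pixel, g, in_roi in zip(img_row, grad_row, mask_row):
            step = eps if in_roi else eps / ratio
            row.append(min(1.0, max(0.0, pixel + step * sign(g))))
        adversarial.append(row)
    return adversarial, mask


def roi_fraction(mask):
    """Fraction of pixels that fall inside the ROI."""
    flat = [inside for row in mask for inside in row]
    return sum(flat) / len(flat)
```

Setting the outside-ROI step to zero instead of `eps / ratio` would give an ROI-only perturbation in the spirit of the other two variants, although the abstract does not say how those two differ from each other.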