Hardware Trojans for Confidence Reduction and Misclassifications on Neural Networks

Mahdieh Grailoo1, Mairo Leier2, Samuel Pagliarini2
1Dept. of Computer Systems, Tallinn University of Technology, Estonia, 2Tallinn University of Technology (TalTech)


With the rapid development of deep learning models, neural networks (NNs) have become a prominent solution for many complex problems. As such, NNs are increasingly deployed on hardware (HW) accelerators to achieve best-in-class performance in these tasks. Nevertheless, once a NN is deployed in HW, it becomes susceptible to an array of attacks that have no direct counterpart in software, e.g., hardware trojan horses (HTs). The malicious logic inserted by a rogue element can alter the behavior of a NN in a stealthy way, escaping both test-based detection and visual inspection. In this work, we propose eight specialized HTs for NNs that are architecture-independent and cover varied adversarial goals, including misclassifications and confidence reduction. The proposed trojans require neither toolchain manipulation nor access to the NN model, making their deployment feasible. Results on the MNIST set of handwritten digits show that our trojans can achieve a 100% attack success rate for all adversarial goals while incurring small average overheads of about 0.2% in resource usage, 2% in delay, and 3% in dynamic power. To quantify trojan detectability, we apply a reverse engineering technique, which reveals that the payload of some of our trojans has little impact on the netlist structure.