SAC: A Novel Multi-hop Routing Policy in Hybrid Distributed IoT System based on Multi-agent Reinforcement Learning

Wen Zhang1, Tao Liu2, Mimi Xie3, Jun Zhang4, Chen Pan1
1Texas A&M University-Corpus Christi, 2Lawrence Technological University, 3The University of Texas at San Antonio, 4Harvard University


Energy harvesting (EH) IoT devices have attracted considerable attention in both academia and industry because they can operate sustainably by harvesting energy from the ambient environment. However, because harvested power is weak and transient, EH technology cannot support power-intensive IoT devices such as IoT edge servers. Hybrid IoT systems, in which EH and non-EH IoT devices coexist, are therefore emerging. This paper explores the routing problem in such a hybrid distributed IoT system. We first propose a comprehensive multi-hop routing mechanism for this hybrid system. We then propose a distributed multi-agent deep reinforcement learning algorithm, spatial asynchronous advantage actor-critic (SAC). SAC optimizes the system's routing policy and energy allocation while maximizing the total amount of transmitted data and the overall data delivery to the sink node. Experimental results indicate that, on average, SAC achieves at least 1.5× the transmission rate and 12.9× the sink packet delivery rate of the baselines.