Efficient analysis of on-chip power delivery networks is one of the most challenging problems that EDA is confronted with. This paper addresses the problem of simulating general multi-layer power delivery networks with significant via resistances. An iterative solution method is combined with an efficient and extremely parallel preconditioning mechanism based on the application of a 3D Fast Transform solver, which enables harnessing the computational resources of massively parallel architectures, such as GPUs. Experimental evaluation of the proposed methodology on a set of large-scale industrial benchmarks demonstrates a speed-up of 290.2X for a 2.62M-node design over a state-of-the-art parallel direct solver, and a speed-up of 75.5X for a 10.51M-node design over a parallel iterative solver with a general-purpose preconditioner, when GPUs are utilized.