All Spin Logic (ASL) employs multiple nano-magnets interacting through spin-torque using metallic interconnect. ASL gates, being magneto-metallic, can operate at ultra low terminal voltage of few millivolts, and hence can be exploited for low power computation. Since, nano-magnets can preserve their state upon withdrawal of supply voltage, ASL can be pipelined for higher performance, without insertion of extra latches. However, pipelining requires the use of clocked CMOS transistors, which significantly increase the required supply voltage. In this work we analyse the design of an 8-bit, pipelined ASL multiplier, integrated with CMOS clocking circuitry. We propose a design scheme for 3-D ASL, which involves stacking of multiple ASL layers that are clocked using the same CMOS transistors. Stacking of N ASL layers using the proposed scheme can enhance the power saving as well as area density by factor of N. The proposed design scheme for magneto-metallic computational blocks can achieve more than two order of magnitude higher density and 20x lower power consumption for 100Mhz operation, as compared to 15nm CMOS design.