Through silicon measurements of test chips designed with two standard cell libraries in 40 nm, this paper shows that properly balancing the current distribution between the NMOS and PMOS networks improves the speed, power efficiency, and variation resilience of standard cells in the sub-threshold region. Compared with the commercial library cells, our combinational cells are 2× faster in the sub-threshold region and have up to 60% less leakage power from the sub-threshold to the super-threshold region, without any area penalty. Our flip-flops have 1.3× better propagation delay and a 40 mV lower first-failure voltage. Simulation results for an in-house hardware accelerator further show that, with the sub-threshold custom cell library, the accelerator achieves 2× higher speed, 46% less variation, and 20% energy savings at 0.3 V.