In-memory computing (IMC) performs highly parallel vector-matrix multiplication directly within high-density memory arrays, eliminating costly data transfer between memory and processor, and has been proposed to improve the form factor, cost, and power consumption of future deep learning hardware. However, the requirements for memory devices differ from those of conventional storage applications: their limited precision and inherent variability pose severe design constraints at the architecture level. Furthermore, reducing the energy and area costs of the peripheral circuits requires circuit-level innovation. In this talk, I will discuss a cross-layer optimization strategy for energy-efficient and variation-aware IMC designs. Solutions spanning the device, circuit, and architecture levels will be illustrated using state-of-the-art STT-MRAM and ferroelectric tunnel junction devices.
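To make the two device constraints named above concrete, the following is a minimal numerical sketch (illustrative only, not from the talk) of an in-memory vector-matrix multiplication. It assumes a crossbar model in which weights are quantized to a small number of conductance levels (limited precision) and each device carries multiplicative random variation; the function names and the variation model are my own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, levels=16):
    """Map weights to a limited set of conductance levels (limited precision)."""
    w_min, w_max = w.min(), w.max()
    step = (w_max - w_min) / (levels - 1)
    return np.round((w - w_min) / step) * step + w_min

def crossbar_vmm(x, w, levels=16, sigma=0.02):
    """Model an analog crossbar VMM: quantized, variation-affected y = x @ G."""
    g = quantize(w, levels)
    # Multiplicative device-to-device variation (a common simple model).
    g_noisy = g * (1 + sigma * rng.standard_normal(g.shape))
    return x @ g_noisy

w = rng.standard_normal((8, 4))   # ideal weight matrix
x = rng.standard_normal(8)        # input activation vector
y_ideal = x @ w
y_imc = crossbar_vmm(x, w)
rel_err = np.linalg.norm(y_imc - y_ideal) / np.linalg.norm(y_ideal)
```

Sweeping `levels` and `sigma` in such a model is one simple way to quantify how device precision and variability propagate into output error, which is the kind of trade-off a cross-layer design strategy must manage.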