Abstract—At advanced VLSI nodes, detecting potential yield detractors in the physical design is becoming increasingly challenging. These design weak points, or hotspots, tend to be complex geometric patterns that pass all the design rules but are very difficult to manufacture. In this article, we demonstrate how machine-learning-based techniques can be used to detect these hotspots in a VLSI design. We propose a scalable data generation flow that can be used to train any machine learning model. We use this flow to generate a large, balanced dataset and train several models to systematically study the effects of parameters such as the dataset size, the clip diameter, and the number of extracted features. We evaluate several standard machine learning algorithms on this dataset and demonstrate models with very high hotspot detection accuracy.