Machine learning techniques have recently been applied to the problem of lithographic hotspot detection, and it is widely believed that they can identify hotspot patterns absent from the training data. The quality of a machine learning method is conventionally measured by accuracy rates obtained from experiments that randomly partition benchmark samples into training and testing sets. In this paper, we demonstrate that these accuracy rates may not reflect a method's true predictive capability. We introduce two metrics, the predictive and memorizing accuracy rates, that quantitatively characterize a method's ability to capture hotspots. We further argue that the number of false alarms per detected hotspot reflects both the method's performance and the difficulty of detecting hotspots in the test set. With the proposed metrics, a designer can conduct a fair comparison between different hotspot detection tools and adopt the one better suited to the verification needs.
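To make the proposed metrics concrete, the following is a minimal sketch of how they might be computed. The function name, the set-based representation of patterns, and the split of test hotspots into "known" (also present in training) versus "novel" (unseen during training) are illustrative assumptions, not the paper's exact definitions:

```python
def hotspot_metrics(train_hotspots, test_hotspots, detected, num_false_alarms):
    """Sketch of predictive/memorizing accuracy and false alarms per hit.

    Assumes hotspot patterns are hashable identifiers; the paper's actual
    notion of pattern equivalence may differ.
    """
    known = test_hotspots & train_hotspots    # hotspots the model has seen
    novel = test_hotspots - train_hotspots    # hotspots unseen in training

    # Memorizing accuracy: fraction of previously seen hotspots detected.
    memorizing = len(detected & known) / len(known) if known else 0.0
    # Predictive accuracy: fraction of genuinely new hotspots detected.
    predictive = len(detected & novel) / len(novel) if novel else 0.0

    hits = len(detected & test_hotspots)
    # False alarms per detected hotspot: lower is better.
    fa_per_hit = num_false_alarms / hits if hits else float("inf")
    return predictive, memorizing, fa_per_hit


# Hypothetical example: two known and two novel hotspots in the test set.
p, m, f = hotspot_metrics(
    train_hotspots={"A", "B"},
    test_hotspots={"A", "B", "C", "D"},
    detected={"A", "C"},
    num_false_alarms=3,
)
# p = 0.5 (one of two novel hotspots found),
# m = 0.5 (one of two known hotspots found),
# f = 1.5 (three false alarms over two true detections)
```

A single aggregate accuracy over a random split would blur the distinction between `p` and `m`, which is precisely the issue the abstract raises.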