In order to improve the layout efficiency of VLSI designs, many placement tools employ clustering algorithms to prune the optimization space and produce a design that can balance multiple design constraints. An intelligent clustering algorithm can guide a placement tool to reduce wire length, reduce cycle time, tune additional metrics or trade off a combination of these objectives. Given the myriad of choices, clustering algorithms can be time consuming to run, and this can impact our ability to fully explore the clustering space. Heterogeneous systems have been growing in popularity due to their attractive processing capabilities. Heterogeneous computing systems can be leveraged to improve the performance of clustering algorithms. In this paper, we present an OpenCL-based parallel clustering algorithm used during placement that targets heterogeneous systems. Our algorithm splits the computation, and exploits both CPU and GPU concurrently to balance the memory usage and the computational load in the heterogeneous system. We compare our parallel algorithm CL-Choice to a number of previously proposed algorithms. Simulation results show that our parallel implementation run on a GPU can achieve a 27x improvement over a number of serial algorithms in terms of speed, while not significantly impacting design quality.