Parallelization of Kmeans++ using CUDA
K-means++ is an algorithm designed to improve the selection of initial seeds for the K-means algorithm. It chooses the initial seeds one at a time, each with probability proportional to the squared distance from the nearest center already chosen. The main drawback is that, when run serially, this seeding step slows down clustering. In this paper, we parallelize the most time-consuming steps of the K-means++ algorithm. Our goal is to reduce the running time while maintaining the quality of the serial algorithm.
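For orientation, the serial seeding procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard K-means++ weighting, in which a point is sampled with probability proportional to its squared distance to the nearest seed chosen so far, and the names `squared_distance` and `kmeanspp_seeds` are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k seeds from points using serial K-means++ sampling (a sketch)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed, only to make the sketch repeatable

    # The first seed is drawn uniformly at random.
    seeds = [points[rng.randrange(len(points))]]

    # d2[i] holds the squared distance from point i to its nearest seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]

    while len(seeds) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a seed; fall back to a uniform draw.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability d2[i] / total
            # by walking the running sum until it passes r.
            r = rng.random() * total
            cumulative = 0.0
            idx = len(points) - 1
            for i, weight in enumerate(d2):
                cumulative += weight
                if cumulative > r:
                    idx = i
                    break
        new_seed = points[idx]
        seeds.append(new_seed)

        # Refresh the nearest-seed distances. Each point is updated
        # independently of all the others.
        d2 = [min(old, squared_distance(p, new_seed))
              for p, old in zip(points, d2)]

    return seeds
```

In this sketch the distance refresh and the sum over `d2` each touch every point once per seed, so their cost grows with both the number of points and `k`. Because the per-point update is independent, it is the kind of step that maps naturally to one GPU thread per point, with the sum handled by a parallel reduction. The abstract does not state which steps the authors actually parallelize, so that mapping is an assumption.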