Linear KernelPCA and K-Means Clustering Using New Estimated Eigenvectors of the Sample Covariance Matrix
Abstract
In this article, random matrix theory is used to propose a new K-means clustering algorithm via linear PCA. Our approach is devoted to linear PCA estimation when the number of the features d and the number of samples n go to infinity at the same rate. More precisely, we deal with the problem of building a consistent estimator of the eigenvectors of the covariance data matrix. Numerical results, based on the normalized mutual information (NMI) and the final error rate (ER), are provided and support our algorithm, even for a small number of features/samples. We also compare our approach to spectral clustering, K-means and traditional PCA methods.