European Journal of Operational Research, vol.314, no.3, pp.1065-1077, 2024 (SCI-Expanded)
Cluster ensembles have emerged as a powerful tool to obtain clusters of data points by combining a library of clustering solutions into a consensus solution. In this paper, we address the cluster ensemble selection problem and design a multi-objective optimization-based solution framework to produce consensus solutions. Given a library of clustering solutions, we first design a preprocessing procedure that measures the agreement of each clustering solution with the other solutions and eliminates the ones that may mislead the process. We then develop a multi-objective optimization algorithm that selects representative clustering solutions from the preprocessed library with respect to size, coverage, and diversity criteria and combines them into a single consensus solution, for which the true number of clusters is assumed to be unknown. We conduct experiments on different benchmark data sets. The results show that our approach yields more accurate consensus solutions compared to full-ensemble and the existing approaches for most data sets. We also present an application on the customer segmentation problem, where our approach is used to segment customers and to find a consensus solution for each segment, simultaneously.