Clustering for Different Scales of Measurement - The Gap Ratio Weighted K-Means Algorithm

Authors

Joris Guerin, Olivier Gibaru, Stephane Thiery and Eric Nyiri, Laboratoire des Sciences de l'Information et des Systemes, France

Abstract

This paper describes a method for clustering data that are spread out over large regions and whose dimensions are on different scales of measurement. The algorithm was developed for a robotics application that sorts and stores objects in an unsupervised way. The toy dataset used to validate this application consists of Lego bricks of different shapes and colors. The uncontrolled lighting conditions and the use of RGB color features respectively lead to data with a large spread and to different levels of measurement across data dimensions. To handle the combination of these two characteristics, we use a weighted K-means algorithm, which weights each dimension of the feature space before running K-means. The novelty of this paper lies in the introduction of new weights suited to the combination of large spread and different scales. The weight associated with a feature is proportional to the ratio between the biggest gap between two consecutive data points along that feature and the average of all the other gaps. We call this algorithm gap-ratio K-means. This method is compared with two other variants of K-means on the Lego bricks clustering problem, as well as on two other common classification datasets.
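The weighting idea summarized above can be illustrated with a small stdlib-only Python sketch. This is not the authors' implementation; several details are assumptions not stated in the abstract: each feature is min-max scaled to [0, 1] before weighting, the gap ratios are normalized so the weights sum to 1, each scaled coordinate is multiplied by its weight (so squared Euclidean distances weight feature j by w_j squared), and clustering uses plain Lloyd's K-means with random initial centers. The function names (`gap_ratio`, `gap_ratio_weights`, `min_max_scale`, `kmeans`, `gap_ratio_kmeans`) and the small epsilon guard are hypothetical.

```python
import math
import random


def gap_ratio(values):
    """Largest gap between consecutive sorted values divided by the mean
    of all the other gaps (requires at least 3 points)."""
    xs = sorted(values)
    gaps = [b - a for a, b in zip(xs, xs[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least 3 data points")
    biggest = max(gaps)
    if biggest == 0:
        return 0.0  # constant feature: no gap, so no clustering information
    others = list(gaps)
    others.remove(biggest)  # drop one occurrence of the largest gap
    mean_others = sum(others) / len(others)
    return biggest / max(mean_others, 1e-12)  # assumed guard against division by zero


def gap_ratio_weights(data):
    """One weight per feature, proportional to its gap ratio (assumed to sum to 1)."""
    ratios = [gap_ratio(column) for column in zip(*data)]
    total = sum(ratios)
    if total == 0:
        return [1.0 / len(ratios)] * len(ratios)  # all features constant
    return [r / total for r in ratios]


def min_max_scale(data):
    """Rescale every feature to [0, 1] (an assumed preprocessing step)."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(x - lo) / s for x, lo, s in zip(row, lows, spans)] for row in data]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm with initial centers sampled from the data."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def gap_ratio_kmeans(data, k, seed=0):
    """Weight each scaled feature by its gap ratio, then run K-means."""
    weights = gap_ratio_weights(data)
    scaled = min_max_scale(data)
    weighted = [[w * x for w, x in zip(weights, row)] for row in scaled]
    return kmeans(weighted, k, seed=seed), weights


if __name__ == "__main__":
    # Feature 0: two tight groups far apart, on a small scale.
    # Feature 1: evenly spread values on a much larger scale.
    data = [[0.0, 100], [0.1, 900], [0.2, 400], [5.0, 200], [5.1, 800], [5.2, 500]]
    labels, weights = gap_ratio_kmeans(data, k=2)
    print(weights)  # feature 0 receives almost all of the weight
    print(labels)
```

In this toy example, feature 0 has one large gap (4.8) against small remaining gaps (about 0.1), giving a ratio near 48, while feature 1 has a ratio of 300 / 125 = 2.4. The weighting therefore lets the well-separated feature drive the clustering even though its raw values are far smaller than those of feature 1.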

Keywords

Unsupervised Learning, Weighted K-means, Scales of measurement, Robotics application

Volume 7, Number 6