
Table 2 Description of clustering goals

From: A practical approach to cluster validation in the energy sector

| Goal | Index |
| --- | --- |
| Within-cluster dissimilarities should be small: the points within a cluster are all relatively similar to one another. | \(I_{avg\_wc}\) |
| Between-cluster dissimilarities should be large: clusters are clearly distinguishable and differ strongly in their characteristics. | \(I_{psep}\) |
| Points of a cluster should be well represented by a centroid: a representative of the cluster that is not an original datapoint reflects the characteristics of the datapoints within the cluster in the best possible way. | \(I_{centroid}\) |
| Members of a cluster should be well represented by a specific datapoint within the dataset (a representative): a single point that is an original datapoint reflects the characteristics of the datapoints within the cluster in the best possible way. | - |
| Clusters should correspond to connected areas in data space with high density: datapoints within a cluster always have very similar neighbors, yet might not be very similar to every datapoint in the cluster (exception: spherical clusters). | \(I_{widestgap}\) |
| All clusters should have roughly the same size. | \(I_{entropy}\) |
| The density of clusters should be roughly the same. | \(I_{cvdens}\) |
| The number of clusters should be low (the values of many indices increase as the number of clusters increases (Hennig 2015)). | \(I_{parsimony}\) |
| The number of clusters should be within a certain range of values. | \(I_{targetrange}\)* |
| It should be possible to characterize the clusters using a small number of variables: this is especially useful if the result is used for complexity reduction, i.e., to create personas. | \(I_{pps}\)* |
*Introduced in the “Clustering of municipalities” section.
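
To make two of the goals above concrete, the following is a minimal Python sketch of how "roughly the same size" and "small within-cluster dissimilarities" could be quantified. It is an illustration under stated assumptions, not the article's implementation: the function names `size_entropy` and `avg_within_dissimilarity` are hypothetical, Euclidean distance is assumed as the dissimilarity, the entropy is normalized by the logarithm of the number of clusters, and the within-cluster value is a plain pooled mean over all same-cluster pairs. The indices used in the article may weight clusters differently or rescale the result.

```python
import math
from collections import Counter
from itertools import combinations


def size_entropy(labels):
    """Normalized entropy of cluster sizes.

    Returns 1.0 when all clusters have the same size and smaller values
    the more unbalanced the sizes are (assumed normalization: log of k).
    """
    counts = Counter(labels)
    k = len(counts)
    if k < 2:
        return 0.0  # convention chosen for this sketch only
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def avg_within_dissimilarity(points, labels):
    """Pooled mean Euclidean distance over all pairs sharing a cluster.

    Smaller values mean more homogeneous clusters. This is a raw,
    unscaled simplification; it is not rescaled to [0, 1].
    """
    total, pairs = 0.0, 0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        if lp == lq:
            total += math.dist(p, q)
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy data: two tight groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
balanced = [0, 0, 1, 1]
unbalanced = [0, 0, 0, 1]

print(size_entropy(balanced), size_entropy(unbalanced))
print(avg_within_dissimilarity(points, balanced))
```

On the toy data, the balanced labeling reaches the maximum size entropy, while the unbalanced labeling scores lower and also has a larger average within-cluster distance because it merges a distant point into the first cluster.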