A practical approach to cluster validation in the energy sector

Bogensperger, Alexander; Fabel, Yann

doi:10.1186/s42162-021-00177-1

Energy Informatics

Table 2 Description of clustering goals


Goal	Index
Within-cluster dissimilarities should be small: the points within a cluster are all relatively similar to one another.	I_avg_wc
Between-cluster dissimilarities should be large: clusters are clearly distinguishable and differ markedly in their characteristics.	I_p-sep
Points of a cluster should be well represented by a centroid: a representative of the cluster (which is not an original datapoint) reflects the characteristics of the datapoints within the cluster as well as possible.	I_centroid
Members of a cluster should be well represented by a specific datapoint within the dataset (a representative): a single original datapoint reflects the characteristics of the datapoints within the cluster as well as possible.	-
Clusters should correspond to connected, high-density areas in data space: every datapoint within a cluster has very similar neighbors, yet it need not be very similar to every other datapoint in the cluster (exception: spherical clusters).	I_widestgap
All clusters should have roughly the same size.	I_entropy
The density of clusters should be roughly the same.	I_cvdens
The number of clusters should be low, since many indices tend to increase with the number of clusters (Hennig 2015).	I_parsimony
The number of clusters should be within a certain range of values.	I_targetrange*
It should be possible to characterize the clusters using a small number of variables: this is especially useful if the result is used for complexity reduction, i.e., to create personas.	I_pps*

*Introduced in the “Clustering of municipalities” section.
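The distance-based goals in Table 2 can be made concrete with a small NumPy sketch. It assumes Euclidean distance, and the function names (avg_within, p_separation, centroid_index, widest_gap) are hypothetical. The formulas are simplified readings of the within-cluster, separation, centroid and widest-gap indices named in the table, not the authors' implementation, and they return raw, uncalibrated values. Smaller values are better for avg_within, centroid_index and widest_gap; larger values are better for p_separation.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix (n x n), computed by broadcasting."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def avg_within(X, labels):
    """Size-weighted mean within-cluster pairwise distance (smaller is better)."""
    labels = np.asarray(labels)
    D = pairwise_distances(X)
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        if len(idx) < 2:
            continue  # a singleton cluster has no pairs
        # Sum over ordered pairs i != j (the diagonal is zero), divided by n_k - 1.
        total += D[np.ix_(idx, idx)].sum() / (len(idx) - 1)
    return total / len(labels)


def p_separation(X, labels, p=0.1):
    """Mean distance to the nearest point of another cluster, taken over the
    ceil(p * n_k) points of each cluster closest to another cluster
    (larger is better). Requires at least two clusters."""
    labels = np.asarray(labels)
    D = pairwise_distances(X)
    other = labels[:, None] != labels[None, :]
    d_min = np.where(other, D, np.inf).min(axis=1)
    chosen = []
    for k in np.unique(labels):
        dk = np.sort(d_min[labels == k])
        chosen.extend(dk[: int(np.ceil(p * len(dk)))])
    return float(np.mean(chosen))


def centroid_index(X, labels):
    """Mean distance of each point to its cluster mean (smaller is better)."""
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        total += np.linalg.norm(pts - pts.mean(axis=0), axis=1).sum()
    return total / len(labels)


def widest_gap(X, labels):
    """Longest edge of any within-cluster minimum spanning tree, built with
    Prim's algorithm (smaller is better)."""
    labels = np.asarray(labels)
    D = pairwise_distances(X)
    widest = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        sub = D[np.ix_(idx, idx)]
        in_tree = np.zeros(len(idx), dtype=bool)
        in_tree[0] = True
        best = sub[0].copy()  # cheapest link from each node to the tree
        for _ in range(len(idx) - 1):
            cand = np.where(in_tree, np.inf, best)
            j = int(np.argmin(cand))
            widest = max(widest, float(cand[j]))
            in_tree[j] = True
            best = np.minimum(best, sub[j])
    return widest


# Toy data: two tight, well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])
print(round(avg_within(X, labels), 3))      # 1.138
print(round(p_separation(X, labels), 3))    # 13.454
print(round(centroid_index(X, labels), 3))  # 0.654
print(round(widest_gap(X, labels), 3))      # 1.0
```

With p = 0.1 and three points per cluster, p_separation averages only the single most "boundary" point of each cluster, which is why it reflects the gap between the two groups rather than their overall spread.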
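The size-related goals can be sketched in plain Python. size_entropy measures how evenly the points are spread across clusters, in the spirit of I_entropy (larger means more balanced cluster sizes); dividing by log K is an assumption made here so that equal sizes give 1.0. parsimony is a hypothetical stand-in for I_parsimony that uses 1 - K/K_max, one simple way to reward fewer clusters; it is not taken from the article, and K_max has to be chosen by the analyst.

```python
import math
from collections import Counter


def size_entropy(labels, normalize=True):
    """Shannon entropy of the cluster-size distribution.

    Larger values mean more evenly sized clusters. With normalize=True the
    entropy is divided by log(K) (an assumption), so K equal sizes give 1.0.
    """
    n = len(labels)
    counts = list(Counter(labels).values())
    h = -sum((c / n) * math.log(c / n) for c in counts)
    if not normalize:
        return h
    k = len(counts)
    return h / math.log(k) if k > 1 else 0.0


def parsimony(k, k_max):
    """Hypothetical parsimony score 1 - K / K_max: fewer clusters score higher."""
    if not 1 <= k <= k_max:
        raise ValueError("k must lie in [1, k_max]")
    return 1 - k / k_max


print(round(size_entropy([0, 0, 1, 1]), 3))  # balanced sizes: 1.0
print(round(size_entropy([0, 0, 0, 1]), 3))  # unbalanced sizes: 0.811
print(parsimony(2, 10))
```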
