Table 5 Clustering goals and decision rules for municipalities

From: A practical approach to cluster validation in the energy sector

| Goal | Explanation | Mathematical formulation | Simos Rank | Weight in % |
| --- | --- | --- | --- | --- |
| Members of a cluster should be well represented by a specific datapoint within the dataset. | This is necessary in order to a) simulate a real municipality and b) let it be as similar to other points in the cluster as possible. Input features are a lower dimensional representation of municipalities. | \(\max (I_{cp2cent})\) | 13 | 13.20 |
| The number of clusters should be as low as possible. | Since the resulting clusters are the basis for a subsequent optimization with high computation time, a lower number is favored. | \(\max (I_{parsimony})\) | 9 | 9.13 |
| Clusters should be clearly distinguishable. | Since one goal is to create “personas” with the clusters in order to improve explainability, clusters should be distinguishable. | \(\max (I_{psep})\) | 9 | 9.13 |
| Communities within a cluster should be structurally similar. | As similarity is defined by Euclidean distance, pairwise distances should correlate with cluster affiliation. | \(\max (I_{pearson})\) | 9 | 9.13 |
| The number of clusters should be between 5 and 30. | The experts in the simulation software estimate an upper limit of 30 possible simulations. In order to make the clustering viable, a minimum of 5 clusters was determined by the participants. | \(\max (I_{targetrange})\) | 7 | 7.10 |
| Within-cluster dissimilarities should be small. | This makes sure that not only the representative but also all datapoints in a cluster are comparable. | \(\max (I_{avg\_wc})\) | 7 | 7.10 |
| Clusters should be describable by a low number of features. | Next to having unique and distinguishable characteristics, in order to create understandable “personas”, the number of characterizing features should be as low as possible. | \(\max (I_{pps})\) | 5 | 5.07 |
| Clusters should be relatively even in size. | A clustering with 90% of the datapoints in one cluster is not desirable. Hence the participants agreed on this parameter. | \(\max (I_{entropy})\) | 1 | 1.00 |
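
The ranks and weights listed in the table can be checked and applied with a short sketch. The listed weights are consistent, to two decimals, with a linear mapping from Simos rank to weight between the endpoints (1, 1.00) and (13, 13.20). The table does not say how the indices are combined, so the aggregation here is an assumption: each index is taken to be scaled to [0, 1] with larger values being better, and the indices are combined by a weighted mean. Because the eight listed weights sum to 60.86 rather than 100, the sketch divides by their total. The names `TABLE5`, `weight_from_rank` and `aggregate_score` are hypothetical and do not come from the article.

```python
# Ranks and weights copied from the table; keys follow the index subscripts.
TABLE5 = {
    "cp2cent": (13, 13.20),
    "parsimony": (9, 9.13),
    "psep": (9, 9.13),
    "pearson": (9, 9.13),
    "targetrange": (7, 7.10),
    "avg_wc": (7, 7.10),
    "pps": (5, 5.07),
    "entropy": (1, 1.00),
}


def weight_from_rank(rank, r_min=1, r_max=13, w_min=1.00, w_max=13.20):
    """Linearly interpolate a weight between the lowest and highest listed rows."""
    step = (w_max - w_min) / (r_max - r_min)
    return w_min + (rank - r_min) * step


def aggregate_score(index_values, table=TABLE5):
    """Weighted mean of index values.

    Assumption (not stated in the table): every index is already scaled
    to [0, 1] and larger is better, so the result also lies in [0, 1].
    """
    total_weight = sum(weight for _, weight in table.values())
    weighted_sum = sum(table[name][1] * index_values[name] for name in table)
    return weighted_sum / total_weight


# The listed weights agree with the linear rule up to rounding to two decimals.
for name, (rank, weight) in TABLE5.items():
    assert abs(weight_from_rank(rank) - weight) < 0.005, name

# Made-up index values for one candidate clustering, for illustration only.
candidate = {
    "cp2cent": 0.8, "parsimony": 0.6, "psep": 0.7, "pearson": 0.5,
    "targetrange": 1.0, "avg_wc": 0.6, "pps": 0.4, "entropy": 0.9,
}
print(f"aggregate score: {aggregate_score(candidate):.3f}")
```

Under these assumptions, a candidate clustering that scores well on \(I_{cp2cent}\) is favored far more strongly than one that scores well only on \(I_{entropy}\), which mirrors the ranking in the table.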