From: A practical approach to cluster validation in the energy sector
Goal | Index |
---|---|
Within-cluster dissimilarities should be small: this implies that the points within a cluster are all relatively similar to one another. | \(I_{avg\_wc}\) |
Between-cluster dissimilarities should be large: clusters are clearly distinguishable and very different in their characteristics. | \(I_{p-sep}\) |
Points of a cluster should be well represented by a centroid: a representative of the cluster (that is not an original datapoint) reflects the characteristics of the datapoints within a cluster in the best possible way. | \(I_{centroid}\) |
Members of a cluster should be well represented by a specific datapoint within the dataset (a representative): a single point (that is an original datapoint) reflects the characteristics of the datapoints within a cluster in the best possible way. | - |
Clusters should correspond to connected areas in data space with high density: datapoints within a cluster always have very similar neighbors yet might not be very similar to every datapoint in the cluster (exception: spherical clusters). | \(I_{widestgap}\) |
All clusters should have roughly the same size. | \(I_{entropy}\) |
The density of clusters should be roughly the same. | \(I_{cvdens}\) |
The number of clusters should be low: many indices increase with the number of clusters (Hennig 2015). | \(I_{parsimony}\) |
The number of clusters should be within a certain range of values. | \(I_{targetrange}\)* |
It should be possible to characterize the clusters using a small number of variables: this is especially useful if the result is used for complexity reduction, i.e., to create personas. | \(I_{pps}\)* |
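
Two of the goals in the table can be illustrated with a minimal Python sketch: small within-cluster dissimilarities and roughly equal cluster sizes. The function names, the use of Euclidean distance, and the exact formulas are assumptions made for illustration. They follow one common formulation (average distance of each point to the other members of its cluster, and entropy of the cluster size proportions divided by the logarithm of the number of clusters) and do not reproduce the normalization or calibration used for the indices in the table.

```python
import numpy as np

def avg_within_dissimilarity(X, labels):
    """Raw average within-cluster dissimilarity (hypothetical helper).

    For every point, take the mean Euclidean distance to the other
    members of its own cluster, then average over all points.
    Smaller values mean more homogeneous clusters.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    per_point = np.zeros(len(X))
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        if len(idx) < 2:
            continue  # singleton cluster: no within-cluster pairs, contributes 0
        pts = X[idx]
        # Full pairwise distance matrix for this cluster (diagonal is 0).
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        per_point[idx] = d.sum(axis=1) / (len(idx) - 1)
    return float(per_point.mean())

def size_entropy(labels):
    """Normalized entropy of cluster sizes (hypothetical helper).

    Equals 1 when all clusters have the same size and decreases as the
    sizes become more unequal. Returns 0.0 for a single cluster
    (an assumption made here to avoid dividing by log(1) = 0).
    """
    _, counts = np.unique(labels, return_counts=True)
    if len(counts) < 2:
        return 0.0
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))

# Toy data: two tight, well-separated pairs of points.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
balanced = [0, 0, 1, 1]
unbalanced = [0, 0, 0, 1]

print(avg_within_dissimilarity(X, balanced))
print(size_entropy(balanced), size_entropy(unbalanced))
```

On the toy data, the balanced partition keeps each pair together, so its within-cluster dissimilarity is small and its size entropy is maximal. The unbalanced partition merges a distant point into the first cluster, which raises the within-cluster dissimilarity and lowers the entropy.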