
Overview of Clustering
Clustering techniques are a type of unsupervised machine learning algorithm. As the name suggests, they are used to group similar data points together. A good grouping is one where the distance between the different groups is high and the distance between the data points inside each cluster is low. These are known as the inter-class distance (between the different groups) and the intra-class distance (within each group).
Clustering techniques determine similarity through distance, which can be measured in a number of ways, such as with the Manhattan, Euclidean, or Minkowski distance.
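To make those three concrete, here is a minimal sketch using SciPy's distance functions (the two sample points are made up purely for illustration):

# Manhattan, Euclidean, and Minkowski distances via SciPy
from scipy.spatial import distance

a = [1.0, 2.0]
b = [4.0, 6.0]

print(distance.cityblock(a, b))       # Manhattan: |1-4| + |2-6| = 7.0
print(distance.euclidean(a, b))       # Euclidean: sqrt(3^2 + 4^2) = 5.0
print(distance.minkowski(a, b, p=3))  # Minkowski; p=1 and p=2 recover the two above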
There are many types of clustering techniques that you can employ. The distinction I want to make in this blog is between hierarchical and nonhierarchical algorithms.
Looking into hierarchical algorithms, there are two types that you can use: agglomerative and divisive. The difference is that the former starts with n clusters, where n is the number of observations, and repeatedly combines the two most similar clusters, while the latter starts with one giant cluster and repeatedly divides it until the desired number of clusters is reached. A sketch of the agglomerative approach is shown below.
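As a rough sketch, scikit-learn ships an agglomerative implementation (divisive clustering is less commonly found in libraries); the sample points here are made up for illustration:

# Agglomerative clustering: merge the most similar clusters until 2 remain
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0], [1.5, 1.0], [0.5, 1.5], [8.0, 8.0], [8.0, 8.5]])

model = AgglomerativeClustering(n_clusters=2)
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1] -- the two tight groups emerge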
A nonhierarchical algorithm requires you to choose the number of clusters up front. The center of each cluster is typically initialized at random, and the centers then move iteratively until optimal positions are reached for the k clusters.
A closer look at K-means
K-Means Clustering is a type of nonhierarchical algorithm. This means we need to decide on the number of clusters we want before we employ the technique. In many scenarios this is a hard decision, as you often won't know how many clusters your data should contain. To help you evaluate how many clusters, or which value of k, to settle on, there are several metrics you can look at before finalizing your decision. Among them are the Calinski-Harabasz score, the elbow plot, and the silhouette score.
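Before looking at the metrics, here is a minimal K-Means run using scikit-learn, on toy data generated with make_blobs (all parameters here are illustrative, not a recommendation):

# Fit K-Means with a fixed choice of k on toy data
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)  # final positions of the 3 centers
print(kmeans.labels_[:10])      # cluster assignments for the first 10 points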
Calinski-Harabasz Score
The Calinski-Harabasz score evaluates a clustering based on the ratio of the between-cluster sum of squares to the within-cluster sum of squares. The higher the score, the better the clustering fits your data. This rings true with what we talked about earlier: you want to maximize the distance between different clusters and minimize the distance between data points within each cluster.
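In scikit-learn this is a one-liner; here is a sketch that compares the score across candidate values of k (the toy data and the range of k are illustrative):

# Compare Calinski-Harabasz scores across candidate k values; higher is better
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, calinski_harabasz_score(X, labels))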

Silhouette Score
The silhouette score is calculated from the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample: the per-sample silhouette is (b - a) / max(a, b), averaged over all samples. Values range from -1 to 1, where the higher the value, the better the clustering fit is.
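As with the previous metric, scikit-learn provides this directly; the same illustrative setup works here:

# Compare silhouette scores across candidate k values; closer to 1 is better
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(k, silhouette_score(X, labels))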

Elbow Plot
An elbow plot is a general term for a plot with a clustering metric (commonly inertia, the within-cluster sum of squares) on the y-axis and the value of k on the x-axis. Its purpose is to find the point of diminishing returns for naturally occurring clusters in a data set. The value of k is read off the point where the plot bends like an arm's elbow.
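One common version plots K-Means inertia against k; here is a sketch with the same illustrative toy data:

# Elbow plot: inertia versus k; look for the bend in the curve
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_ for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("Inertia (within-cluster sum of squares)")
plt.title("Elbow plot")
plt.show()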

Conclusion
Determining the optimal number of clusters in your data depends on the circumstances; there is no single right answer that fits every problem. However, with these three metrics, you will have more information for deciding what number of clusters makes the most sense for you, your data, and the story you are trying to tell.