Hierarchical Clustering in Python: A Step-by-Step Tutorial

Dr. Soumen Atta, Ph.D.
9 min read · Apr 3, 2023

Hierarchical clustering is a widely used technique that groups data points into a nested hierarchy of clusters according to how similar (or dissimilar) they are. It is particularly useful in exploratory data analysis, where the goal is to identify underlying patterns or structure in the data.

Hierarchical clustering falls into two categories: agglomerative and divisive.

  • In agglomerative clustering, each data point is initially treated as a separate cluster, and the algorithm then repeatedly merges the closest pair of clusters until all data points belong to a single cluster.
  • In divisive clustering, the opposite approach is used, starting with a single cluster and then recursively dividing it into smaller clusters.
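
The agglomerative merge loop described above can be sketched in a few lines of plain Python. This is only a naive illustration, not the scikit-learn implementation: the function names `single_linkage` and `agglomerative`, the `n_clusters` stopping parameter, and the sample points are all assumptions made for this example, and single linkage with Euclidean distance is just one possible choice of linkage criterion and distance measure.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: their closest pair of points."""
    return min(dist(p, q) for p in cluster_a for q in cluster_b)


def agglomerative(points, n_clusters=1):
    """Naive agglomerative clustering (hypothetical helper for illustration).

    Returns the final clusters and the history of merges.
    """
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = single_linkage(clusters[i], clusters[j])
        history.append((clusters[i], clusters[j], distance))
        # Replace the two closest clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


# Made-up sample data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
clusters, history = agglomerative(points, n_clusters=3)
for cluster in clusters:
    print(cluster)
```

Leaving `n_clusters` at its default of 1 runs the merging all the way to a single cluster, and the recorded `history` is the information a dendrogram would draw. The sketch recomputes every pairwise distance on each pass, so it is only suitable for very small inputs.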

In this tutorial, we will focus on agglomerative hierarchical clustering, which is the most common type of hierarchical clustering used in practice. We will start by explaining the basic concepts of hierarchical clustering, including linkage criteria, distance measures, and dendrograms. We will then proceed to the step-by-step implementation of hierarchical clustering in Python, using the popular scikit-learn library. We will also discuss how to visualize the results of hierarchical clustering using dendrograms and…



Written by Dr. Soumen Atta, Ph.D.

I am a Postdoctoral Researcher at the Faculty of IT, University of Jyväskylä, Finland. You can find more about me on my homepage: https://www.soumenatta.com/
