
K-Median Clustering Algorithm in Machine Learning and its Python Implementation

Feb 19, 2024

The k-median algorithm partitions a dataset into k clusters, each represented by the median of its data points. Unlike the k-means algorithm, which uses the mean as the centroid, k-median uses the median, which makes it more robust to outliers.

1. Introduction to k-median algorithm

1.1 What is k-median?

The k-median algorithm is a partitional clustering algorithm that aims to partition a dataset into k clusters in a way that minimizes the total distance between data points and their respective cluster medians.

1.2 Use Cases

K-median is particularly useful when a dataset contains outliers that would distort the cluster means and you want a more robust measure of central tendency for each cluster.

1.3 Key Concepts

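Median: The median is the middle value of a dataset when it is ordered (for an even number of values, the average of the two middle values). For multi-dimensional data it is computed separately for each feature. It is less sensitive to extreme values than the mean.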
Objective Function: The algorithm minimizes an objective function, which is the sum of distances between data points and their respective cluster medians.
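To make these two concepts concrete, here is a small sketch. The sample values and the helper name k_median_objective are assumptions made for illustration, and the objective uses Manhattan distance as one possible choice of metric.

```python
import numpy as np

# Hypothetical 1-D sample with one extreme value (illustrative data only).
values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

mean = values.mean()        # 22.0, pulled toward the outlier
median = np.median(values)  # 3.0, stays at the middle value


def k_median_objective(points, medians, labels):
    """Sum of Manhattan (L1) distances from each point to its assigned median.

    points:  array of shape (n, d)
    medians: array of shape (k, d)
    labels:  integer array of shape (n,), cluster index of each point
    """
    # medians[labels] picks, for every point, the median of its cluster.
    return np.abs(points - medians[labels]).sum()
```

A single outlier moves the mean from the 1 to 4 range all the way to 22, while the median stays at 3. That is the robustness the algorithm relies on.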

2. How k-median algorithm works

2.1 Algorithm Steps

Initialization: Randomly select k data points as initial cluster medians.
Assignment: Assign each data point to the nearest cluster median.
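Update Medians: Recalculate the median of each cluster from the data points currently assigned to it.
Repeat Assignment and Update: Repeat the Assignment and Update Medians steps until convergence, that is, until the cluster assignments no longer change.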
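The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a definitive implementation. The function name k_median, the max_iter and seed parameters, the choice of Manhattan distance, and the decision to leave an empty cluster's median unchanged are all assumptions.

```python
import numpy as np


def k_median(points, k, max_iter=100, seed=0):
    """Sketch of k-median using Manhattan distance and coordinate-wise medians."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the starting medians.
    start = rng.choice(len(points), size=k, replace=False)
    medians = points[start].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assignment: L1 distance from every point to every median, shape (n, k).
        dists = np.abs(points[:, None, :] - medians[None, :, :]).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Convergence: stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update: coordinate-wise median of each non-empty cluster.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                medians[j] = np.median(members, axis=0)
    return medians, labels
```

Calling k_median(data, k=2) on an (n, d) array returns the k medians and one cluster label per row. Because the initial medians are chosen at random, different seeds can lead to different results on harder datasets.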

2.2 Distance Metrics

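Common metrics for measuring the distance between data points and cluster medians include Euclidean distance and Manhattan distance, and any other metric that suits your data and requirements can be used. Manhattan distance is the natural pairing, because the coordinate-wise median is the point that minimizes the sum of Manhattan distances to the members of a cluster.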
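As a quick illustration, the two metrics named above can be written as follows. The function names manhattan and euclidean are chosen here for the example and do not come from a library.

```python
import numpy as np


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return np.abs(a - b).sum(axis=-1)


def euclidean(a, b):
    """Euclidean (L2) distance: square root of the sum of squared differences."""
    return np.sqrt(((a - b) ** 2).sum(axis=-1))


p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])

d_manhattan = manhattan(p, q)  # 7.0
d_euclidean = euclidean(p, q)  # 5.0
```

Both functions reduce over the last axis, so they also work on arrays of points that broadcast against each other.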

3. Python Implementation

Here’s a simple Python implementation using NumPy:

import numpy as np

def…