Master SciPy Clustering: K‑Means and Hierarchical Methods with Python
This guide introduces SciPy's clustering modules: it explains vector quantization and the k‑means algorithm in scipy.cluster.vq, then demonstrates hierarchical clustering with scipy.cluster.hierarchy, with complete Python code examples and visualizations to help you apply these techniques to real data.
Clustering (scipy.cluster)
scipy.cluster.vq
Clustering algorithms are useful in information theory, target detection, communications, compression, and other fields. The vq module supports only vector quantization and the k‑means algorithm.
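As a minimal sketch of the vector-quantization workflow (the data below is made up for illustration): whiten rescales each feature to unit variance, kmeans learns a codebook of centroids, and vq assigns each observation to its nearest code:

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs (illustrative data)
obs = np.concatenate([rng.normal(0, 1, (20, 2)),
                      rng.normal(5, 1, (20, 2))])

# whiten rescales each feature to unit variance, which kmeans expects
w = whiten(obs)
# kmeans returns a codebook (centroids) and the mean distortion
codebook, distortion = kmeans(w, 2)
# vq maps each observation to its nearest centroid
codes, dists = vq(w, codebook)
print(codes)
```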
The k‑means algorithm attempts to minimize the Euclidean distance between observations and their assigned centroids, and it offers several initialization methods.
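A short sketch of those initialization methods, using kmeans2's documented minit options on made-up data ('random' samples from a Gaussian fit to the data, 'points' picks random observations, '++' uses k-means++ seeding):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(42)
# Two separated blobs (illustrative data)
data = np.concatenate([rng.normal(-3, 1, (30, 2)),
                       rng.normal(3, 1, (30, 2))])

# Same data, different centroid initializations
for minit in ('random', 'points', '++'):
    centroid, label = kmeans2(data, 2, minit=minit, seed=1234)
    print(minit, np.round(centroid, 1))
```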
The scipy.cluster.hierarchy module provides functions for hierarchical clustering. It generates hierarchical clusters from a distance matrix, computes cluster statistics, cuts the hierarchy to produce flat clusters, and visualizes the hierarchy with dendrograms.
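As a small sketch of those statistics (again with made-up points), cophenet measures how faithfully a linkage preserves the original pairwise distances, and fcluster cuts the tree into flat clusters:

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist

rng = np.random.default_rng(7)
# Two tight, well-separated groups (illustrative data)
pts = np.concatenate([rng.normal(0, 0.5, (10, 2)),
                      rng.normal(4, 0.5, (10, 2))])

d = pdist(pts)                       # condensed distance matrix
Z = hierarchy.ward(d)                # linkage matrix via Ward's method
c, coph = hierarchy.cophenet(Z, d)   # cophenetic correlation coefficient
labels = hierarchy.fcluster(Z, t=2, criterion='maxclust')
print(round(c, 3), np.unique(labels))
```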
Code
K‑Means
<code>import numpy as np
from scipy.cluster.vq import kmeans2
import matplotlib.pyplot as plt
np.random.seed(12345678)
a = np.random.multivariate_normal([0, 6], [[2, 1], [1, 1.5]], size=45)
b = np.random.multivariate_normal([2, 0], [[1, -1], [-1, 3]], size=30)
c = np.random.multivariate_normal([6, 4], [[5, 0], [0, 1.2]], size=25)
z = np.concatenate((a, b, c))
np.random.shuffle(z)
centroid, label = kmeans2(z, 3, minit='points')
w0 = z[label == 0]
w1 = z[label == 1]
w2 = z[label == 2]
_ = plt.plot(w0[:, 0], w0[:, 1], 'o', alpha=0.5, label='cluster 0')
_ = plt.plot(w1[:, 0], w1[:, 1], 'd', alpha=0.5, label='cluster 1')
_ = plt.plot(w2[:, 0], w2[:, 1], 's', alpha=0.5, label='cluster 2')
_ = plt.plot(centroid[:, 0], centroid[:, 1], 'k*', label='centroids')
_ = plt.axis('equal')
_ = plt.legend(shadow=True)
_ = plt.show()
</code>
Hierarchical Clustering
<code>from scipy.cluster import hierarchy
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
# Build the linkage matrix with Ward's method from the condensed distances
Z = hierarchy.ward(pdist(X))
# Cut the tree into two flat clusters
hierarchy.fcluster(Z, t=2, criterion='maxclust')
_ = plt.figure()
# Visualize the hierarchy (note: dendrogram takes the linkage matrix directly;
# passing it back through linkage() would wrongly treat it as observations)
dn = hierarchy.dendrogram(Z)
plt.show()
</code>
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".