Clustering: K-Means
Clustering is the best-known unsupervised learning technique. It finds structure in unlabeled data by grouping similar data points.
Key Concepts
Review core concepts you need to learn to master this subject
K-Means: Inertia
Unsupervised Learning Basics
K-Means Algorithm: Intro
K-Means Algorithm: 2nd Step
Scikit-Learn Datasets
K-Means Using Scikit-Learn
Cross Tabulation Overview
K-Means: Reaching Convergence
K-Means: Inertia
Inertia measures how well a dataset was clustered by K-Means. It is calculated by measuring the distance between each data point and the centroid of its cluster, squaring this distance, and summing these squares over every point in the dataset.
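The calculation can be sketched in a few lines of NumPy. This is only an illustration: the helper name `inertia`, the four sample points, and their cluster labels are made up for the example, and the sum runs over every point in the dataset, which matches how scikit-learn defines the `inertia_` attribute of a fitted `KMeans` model.

```python
import numpy as np


def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    # centroids[labels] picks, for every point, the centroid of its cluster.
    diffs = points - centroids[labels]
    return float(np.sum(diffs ** 2))


# Hypothetical toy data: four 2-D points already assigned to two clusters.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])

# Each centroid is the mean of the points assigned to that cluster.
centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

total = inertia(points, centroids, labels)
print(f"inertia = {total}")
```

Tighter clusters give smaller squared distances, so a lower value means the points sit closer to their centroids.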
A good model is one with low inertia and a low number of clusters (K). However, this is a tradeoff because as K increases, inertia decreases.
To find the optimal K for a dataset, use the Elbow method: find the point where the decrease in inertia begins to slow. K=3 is the “elbow” of this graph.
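One way to try the Elbow method is to fit scikit-learn's `KMeans` for a range of K values and compare the `inertia_` of each fit. The sketch below makes several assumptions for illustration: the data is synthetic (three well-separated blobs generated with NumPy), the range K=1 to 6 is arbitrary, and `n_init=10` and `random_state=0` are chosen only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 9.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit one model per candidate K and record its inertia.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    inertias[k] = model.inertia_

# Inspect how quickly inertia falls as K grows.
for k, value in inertias.items():
    print(f"K={k}: inertia={value:.1f}")
```

On data like this, inertia should fall sharply up to K=3 and only slightly afterwards, which is the bend the Elbow method looks for. Plotting `inertias` against K makes the bend easier to see than the printed numbers.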