Classification: K-Nearest Neighbors

K-Nearest Neighbors is a supervised machine learning algorithm for classification. You will implement and test this algorithm on several datasets.

Key Concepts

Review core concepts you need to learn to master this subject

Euclidean Distance

The Euclidean distance between two points can be computed from the coordinates of those points.

On a 2-D plane, the distance between two points p and q is the square root of the sum of the squared differences of their x and y components. This follows from the Pythagorean theorem, a^2 + b^2 = c^2: the x and y differences are the legs of a right triangle, and the distance between the points is its hypotenuse.

We can write a function to compute this distance. Let's assume that points are represented by tuples of the form (x_coord, y_coord). The square root of a value n can be computed in a couple of ways: math.sqrt(n), using the math module, or n ** 0.5 (n raised to the power of 1/2).

def distance(p1, p2):
  # p1 and p2 are (x_coord, y_coord) tuples
  x_diff_squared = (p1[0] - p2[0]) ** 2
  y_diff_squared = (p1[1] - p2[1]) ** 2
  # Square root of the sum of the squared differences
  return (x_diff_squared + y_diff_squared) ** 0.5

distance((0, 0), (3, 4))  # => 5.0
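The text mentions two ways to take a square root. As a small sketch, the variant below uses math.sqrt instead of ** 0.5 and compares the result with math.dist from the standard library (available in Python 3.8 and later). The name distance_sqrt is made up for this illustration and is not part of the course material.

```python
import math

def distance_sqrt(p1, p2):
    # Hypothetical variant of distance() that uses math.sqrt
    x_diff = p1[0] - p2[0]
    y_diff = p1[1] - p2[1]
    return math.sqrt(x_diff ** 2 + y_diff ** 2)

a, b = (0, 0), (3, 4)
print(distance_sqrt(a, b))  # 5.0
print(math.dist(a, b))      # 5.0 (standard library, Python 3.8+)
```

Both approaches should agree for 2-D points; math.dist also accepts points with more than two coordinates.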
Distance Formula
Lesson 1 of 2
  1. In this lesson, you will learn three different ways to define the distance between two points: 1. Euclidean Distance 2. Manhattan Distance 3. Hamming Distance Before diving into the distance form...

  2. Euclidean Distance is the most commonly used distance formula. To find the Euclidean distance between two points, we first calculate the squared distance between each dimension. If we add up al...

  3. Manhattan Distance is extremely similar to Euclidean distance. Rather than summing the squared difference between each dimension, we instead sum the absolute value of the difference between eac...

  4. Hamming Distance is another slightly different variation on the distance formula. Instead of finding the difference of each dimension, Hamming distance only cares about whether the dimensions a...

  5. Now that you've written these three distance formulas yourself, let's look at how to use them using Python's SciPy library: - Euclidean Distance [...] - Manhattan Distance [...] - Hamming Di...
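The three distance definitions summarized in this lesson can be sketched in plain Python as follows. This is a minimal illustration, not the lesson's own solution code: the function names are assumptions, points are assumed to be equal-length sequences of numbers, and Hamming distance is taken here as the count of dimensions that differ.

```python
def euclidean_distance(pt1, pt2):
    # Sum the squared difference in each dimension, then take the square root
    squared_sum = sum((a - b) ** 2 for a, b in zip(pt1, pt2))
    return squared_sum ** 0.5

def manhattan_distance(pt1, pt2):
    # Sum the absolute difference in each dimension
    return sum(abs(a - b) for a, b in zip(pt1, pt2))

def hamming_distance(pt1, pt2):
    # Count the dimensions in which the two points differ
    return sum(1 for a, b in zip(pt1, pt2) if a != b)

print(euclidean_distance((0, 0, 0), (2, 3, 6)))     # 7.0
print(manhattan_distance((0, 0, 0), (2, 3, 6)))     # 11
print(hamming_distance((1, 0, 1, 1), (1, 1, 1, 0)))  # 2
```

Note that some libraries report Hamming distance as the fraction of differing dimensions instead of the count, so results may need rescaling when comparing against them.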

  1. K-Nearest Neighbors (KNN) is a classification algorithm. The central idea is that data points with similar attributes tend to fall into similar categories. Consider the image to the right. Th...

  2. Before diving into the K-Nearest Neighbors algorithm, let's first take a minute to think about an example. Consider a dataset of movies. Let's brainstorm some features of a movie data point. A fe...

  3. In the first exercise, we were able to visualize the dataset and estimate the [...] nearest neighbors of an unknown point. But a computer isn’t going to be able to do that! We need to define wha...

  4. Making a movie rating predictor based on just the length and release date of movies is pretty limited. There are so many more interesting pieces of data about movies that we could use! So let’s add...

  5. In the next three lessons, we'll implement the three steps of the K-Nearest Neighbor Algorithm: 1. Normalize the data 2. Find the [...] nearest neighbors 3. Classify the new point based on t...

  6. The K-Nearest Neighbor Algorithm: 1. Normalize the data 2. Find the [...] nearest neighbors 3. Classify the new point based on those neighbors --- Now that our data has been normalized and...

  7. The K-Nearest Neighbor Algorithm: 1. Normalize the data 2. Find the [...] nearest neighbors 3. Classify the new point based on those neighbors --- We've now found the [...] nearest neigh...

  8. Nice work! Your classifier is now able to predict whether a movie will be good or bad. So far, we've only tested this on a completely random point [...] . In this exercise we're going to pick a re...

  9. You’ve now built your first K Nearest Neighbors algorithm capable of classification. You can feed your program a never-before-seen movie and it can predict whether its IMDb rating was above or belo...

  10. In the previous exercise, we found that our classifier got one point in the training set correct. Now we can test every point to calculate the validation accuracy. The validation accuracy change...

  11. The graph to the right shows the validation accuracy of our movie classifier as [...] increases. When [...] is small, overfitting occurs and the accuracy is relatively low. On the other hand, w...

  12. You've now written your own K-Nearest Neighbor classifier from scratch! However, rather than writing your own classifier every time, you can use Python's [...] library. [...] is a Python librar...

  13. Congratulations! You just implemented your very own classifier from scratch and used Python's [...] library. In this lesson, you learned some techniques very specific to the K-Nearest Neighbor al...
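The three steps named in this lesson (normalize the data, find the k nearest neighbors, classify by those neighbors) and the validation accuracy check can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the course's solution: all function names, the min-max choice of normalization, the dictionary layout, and the tiny toy dataset are invented for the example, and labels are assumed to be 0 or 1.

```python
from collections import Counter

def min_max_normalize(values):
    # Step 1 (assumed min-max scaling): rescale a feature column to [0, 1]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def euclidean_distance(pt1, pt2):
    return sum((a - b) ** 2 for a, b in zip(pt1, pt2)) ** 0.5

def classify(unknown, dataset, labels, k):
    # Step 2: sort training points by distance to the unknown point
    distances = sorted(
        (euclidean_distance(features, unknown), name)
        for name, features in dataset.items()
    )
    neighbors = [name for _, name in distances[:k]]
    # Step 3: majority vote among the k nearest neighbors
    votes = Counter(labels[name] for name in neighbors)
    return votes.most_common(1)[0][0]

def validation_accuracy(dataset, labels, val_set, val_labels, k):
    # Fraction of validation points whose predicted label matches the true one
    correct = sum(
        1 for name, features in val_set.items()
        if classify(features, dataset, labels, k) == val_labels[name]
    )
    return correct / len(val_set)

# Hypothetical, already-normalized toy data (not from the course)
training = {"a": [0.1, 0.2], "b": [0.2, 0.1], "c": [0.9, 0.8],
            "d": [0.8, 0.9], "e": [0.7, 0.7]}
training_labels = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
validation = {"v1": [0.2, 0.2], "v2": [0.9, 0.9], "v3": [0.75, 0.8]}
validation_labels = {"v1": 0, "v2": 1, "v3": 0}

print(min_max_normalize([1, 2, 3, 5]))                       # [0.0, 0.25, 0.5, 1.0]
print(classify([0.15, 0.15], training, training_labels, 3))  # 0
print(classify([0.85, 0.85], training, training_labels, 3))  # 1
print(validation_accuracy(training, training_labels,
                          validation, validation_labels, 3))
```

With two classes, an odd k avoids tied votes. Rerunning validation_accuracy for several values of k is one simple way to compare them, in the spirit of the accuracy graph the lesson describes.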

What you'll create

Portfolio projects that showcase your new skills

How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory
