K-Nearest Neighbors
Using sklearn

You’ve now written your own K-Nearest Neighbor classifier from scratch! However, rather than writing your own classifier every time, you can use Python’s sklearn library. sklearn (short for scikit-learn) is a Python library for machine learning. It has a large number of features, but for now, we’re only going to investigate its K-Nearest Neighbor classifier.

There are a couple of steps we’ll need to go through in order to use the library. First, you need to create a KNeighborsClassifier object. The parameter we care about here is n_neighbors, which sets k. For example, the code below will create a classifier where k = 3:

from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=3)

Next, we’ll need to train our classifier. The .fit() method takes two parameters. The first is a list of points, and the second is a list of the labels associated with those points, one label per point and in the same order. So for our movie example, we might have something like this:

training_points = [
  [0.5, 0.2, 0.1],
  [0.9, 0.7, 0.3],
  [0.4, 0.5, 0.7]
]
training_labels = [0, 1, 1]
classifier.fit(training_points, training_labels)
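As a rough illustration of how points and labels pair up, the sketch below fits the same toy data and then tries a label list that is one entry short. It assumes scikit-learn is installed. The `classes_` attribute and the `ValueError` on mismatched lengths are scikit-learn behavior that this lesson does not cover, and `bad_labels` is a made-up name used only for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

training_points = [[0.5, 0.2, 0.1], [0.9, 0.7, 0.3], [0.4, 0.5, 0.7]]
training_labels = [0, 1, 1]

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(training_points, training_labels)

# After fitting, the distinct labels seen during training are stored on the model.
seen_labels = [int(label) for label in classifier.classes_]
print(seen_labels)

# Each point needs exactly one label, so a length mismatch is rejected.
bad_labels = [0, 1]  # hypothetical: one label short
try:
    KNeighborsClassifier(n_neighbors=3).fit(training_points, bad_labels)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```

If the second check prints True, the library refused to train on the inconsistent lists, which is usually a sign that a row or a label was dropped while preparing the data.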

Finally, after training the model, we can classify new points. The .predict() method takes a list of points that you want to classify. It returns an array of its guesses for those points, one guess per point and in the same order:

unknown_points = [
  [0.2, 0.1, 0.7],
  [0.4, 0.7, 0.6],
  [0.5, 0.8, 0.1]
]
guesses = classifier.predict(unknown_points)
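Putting the three steps together, a minimal end-to-end sketch might look like the following. It assumes scikit-learn is installed and reuses the toy values from this lesson. Because k equals the number of training points here, every prediction is simply the majority label of the whole training set, so this data only demonstrates the call sequence, not a meaningful model.

```python
from sklearn.neighbors import KNeighborsClassifier

training_points = [[0.5, 0.2, 0.1], [0.9, 0.7, 0.3], [0.4, 0.5, 0.7]]
training_labels = [0, 1, 1]
unknown_points = [[0.2, 0.1, 0.7], [0.4, 0.7, 0.6], [0.5, 0.8, 0.1]]

# Step 1: create the classifier with k = 3.
classifier = KNeighborsClassifier(n_neighbors=3)

# Step 2: train it on the labeled points.
classifier.fit(training_points, training_labels)

# Step 3: classify new points; the result is a NumPy array, one label per point.
guesses = classifier.predict(unknown_points)
print(guesses.tolist())  # [1, 1, 1]: all 3 training points vote, and label 1 has 2 of the 3 votes
```

Note that .predict() needs at least k training points to find k neighbors, so a larger k such as 5 only works once the training set has five or more points.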



We’ve imported sklearn for you. Create a KNeighborsClassifier named classifier that uses k=5.


We’ve also imported some movie data. Train your classifier using movie_dataset as the training points and labels as the training labels.


Let’s classify some movies. Classify the following movies: [.45, .2, .5], [.25, .8, .9], [.1, .1, .9]. Print the classifications!

Which movies were classified as good movies and which were classified as bad movies?

Remember, those three numbers associated with a movie are the normalized budget, run time, and year of release.
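For reference, here is a small sketch of how one feature column could be scaled into the 0 to 1 range. It assumes min-max normalization, which this lesson does not confirm, and both the `min_max_normalize` helper and the budget figures are hypothetical, made up purely for illustration.

```python
def min_max_normalize(values):
    """Scale a list of numbers so the smallest maps to 0 and the largest to 1."""
    lowest = min(values)
    highest = max(values)
    span = highest - lowest
    if span == 0:
        # All values are identical; map them all to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(value - lowest) / span for value in values]

# Hypothetical raw budgets in dollars (not taken from the lesson's dataset).
budgets = [10_000_000, 55_000_000, 100_000_000]
normalized_budgets = min_max_normalize(budgets)
print(normalized_budgets)  # [0.0, 0.5, 1.0]
```

Scaling each feature this way keeps one with large raw values, such as budget, from dominating the distance calculation over features with small raw values.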
