Skip to Content
Learn
Logistic Regression
Scikit-Learn

Now that you know the inner workings of how Logistic Regression works, let’s learn how to easily and quickly create Logistic Regression models with sklearn! sklearn is a Python library that helps build, train, and evaluate Machine Learning models.

To take advantage of sklearn‘s abilities, we can begin by creating a LogisticRegression object.

model = LogisticRegression()

After creating the object, we need to fit our model on the data. When we fit the model with sklearn it will perform gradient descent, repeatedly updating the coefficients of our model in order to minimize the log-loss. We train — or fit — the model using the .fit() method, which takes two parameters. The first is a matrix of features, and the second is a matrix of class labels.

model.fit(features, labels)

Now that the model is trained, we can access a few useful attributes of the LogisticRegression object.

  • model.coef_ is a vector of the coefficients of each feature
  • model.intercept_ is the intercept b_0

With our trained model we are able to predict whether new data points belong to the positive class using the .predict() method! .predict() takes a matrix of features as a parameter and returns a vector of labels 1 or 0 for each sample. In making its predictions, sklearn uses a classification threshold of 0.5.

model.predict(features)

If we are more interested in the predicted probability of the data samples belonging to the positive class than the actual class, we can use the .predict_proba() method. predict_proba() also takes a matrix of features as a parameter and returns a vector of probabilities, ranging from 0 to 1, for each sample.

model.predict_proba(features)

Before proceeding, one important note is that sklearn‘s Logistic Regression implementation requires feature data to be normalized. Normalization scales all feature data to vary over the same range. sklearn‘s Logistic Regression requires normalized feature data due to a technique called Regularization that it uses under the hood. Regularization is out of the scope of this lesson, but in order to ensure the best results from our model, we will be using a normalized version of the data from our Codecademy University example.

Instructions

1.

Let’s build, train and evaluate a Logistic Regression model in sklearn for our Codecademy University data! We’ve imported sklearn and the LogisiticRegression classifier for you. Create a Logistic Regression model named model.

2.

Train the model using hours_studied_scaled as the training features and passed_exam as the training labels.

3.

Save the coefficients of the model to the variable calculated_coefficients, and the intercept of the model to intercept. Print calculated_coefficients and intercept.

4.

The next semester a group of students in the Introductory Machine Learning course want to predict their final exam scores based on how much they intended to study for the exam. The number of hours each student thinks they will study, normalized, is given in guessed_hours_scaled. Use model to predict the probability that each student will pass the final exam, and save the probabilities to passed_predictions.

5.

That same semester, the Data Science department decides to update the final exam passage model to consider two features instead of just one. During the final exam, students were asked to estimate how much time they spent studying, as well as how many previous math courses they have taken. The student responses, along with their exam results, were split into training and test sets. The training features, normalized, are given to you in exam_features_scaled_train, and the students’ results on the final are given in passed_exam_2_train.

Create a new Logistic Regression model named model_2 and train it on exam_features_scaled_train and passed_exam_2_train.

6.

Use the model you just trained to predict whether each student in the test set, exam_features_scaled_test, will pass the exam and save the predictions to passed_predictions_2. Print passed_predictions_2.

Compare the predictions to the actual student performance on the exam in the test set. How well did your model do?

Folder Icon

Sign up to start coding

Already have an account?