Learn

While we can view a binary categorical variable as a way of creating two new regression equations with different intercepts, we don’t need to make these equations every time we want to interpret a binary predictor in a multiple regression equation.

In the survey dataset, breakfast is a binary variable that is equal to 1 for students who ate breakfast on test day and 0 for those who didn’t. For predicting score based on hours_studied and breakfast, the multiple regression equation is:

$\text{score} = 32.7 + 8.5 \times \text{hours\_studied} + 22.5 \times \text{breakfast}$
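For reference, substituting the two possible values of breakfast into this equation gives one line per group. This is a worked substitution using only the coefficients above; both lines share the slope 8.5 and differ only in their intercepts:

$\text{breakfast} = 0:\quad \text{score} = 32.7 + 8.5 \times \text{hours\_studied}$

$\text{breakfast} = 1:\quad \text{score} = (32.7 + 22.5) + 8.5 \times \text{hours\_studied} = 55.2 + 8.5 \times \text{hours\_studied}$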

Take a look at the scatter plot with regression lines on top:

We can interpret the regression coefficients as follows:

• The breakfast variable has a coefficient of 22.5. The interpretation is: holding all other variables constant, students who ate breakfast scored 22.5 points higher, on average, than students who did not. “Holding all other variables constant” means that we’re comparing breakfast groups among students who studied the same number of hours. Visually, this means that the vertical distance between the two regression lines is always 22.5 for any value of hours_studied (the dotted lines in the picture above are all the same length).

• The intercept (32.7) is the predicted average value of the response variable when all predictors in the equation are equal to 0. According to our full regression equation, this means that students who didn’t study (hours_studied = 0) and didn’t eat breakfast (breakfast = 0) are predicted to earn an average score of 32.7 (the y-intercept for the blue line).
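Both interpretations can be checked numerically. The following is a minimal sketch that hard-codes the coefficients from the equation above rather than fitting a model; the function name `predict_score` and the constant names are illustrative assumptions, not part of the lesson's code.

```python
# Coefficients copied from the regression equation in this lesson.
INTERCEPT = 32.7
B_HOURS = 8.5
B_BREAKFAST = 22.5


def predict_score(hours_studied, breakfast):
    """Predicted score from the equation; breakfast is 1 (ate) or 0 (did not)."""
    return INTERCEPT + B_HOURS * hours_studied + B_BREAKFAST * breakfast


# The gap between the two regression lines at several values of hours_studied.
for hours in [0, 1, 2.5, 4]:
    gap = predict_score(hours, 1) - predict_score(hours, 0)
    print(f"hours_studied={hours}: breakfast gap = {gap:.1f}")

# The intercept is the prediction when every predictor equals 0.
print(f"prediction with all predictors at 0: {predict_score(0, 0):.1f}")
```

The gap is the same at every value of hours_studied because the hours_studied term cancels when the two predictions are subtracted, leaving only the breakfast coefficient.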

Instructions

1.

Suppose that we fit a model to predict port3 (final Portuguese score) with predictors math1 (first semester math score) and address (urban or rural residence). The coefficients are printed below.

# Output:
# Intercept       3.234071
# math1           0.475892

In the file interpretations.txt, write a one-sentence interpretation of the intercept. Does this interpretation make practical sense?

2.

Add a one-sentence interpretation to interpretations.txt for the coefficient on address in terms of the average Portuguese scores (port3) of students from rural areas (R or address = 0) and students from urban areas (U or address = 1). Check your solution against the sample solutions in solutions.txt.