Linear Regression in R
Assessing Multiple Linear Regression

Time to pull it all together! The interpretation of coefficients in multiple linear regression differs slightly from that of coefficients in simple linear regression. The coefficient of a continuous independent variable, like `podcast`, represents the change in the predicted value of `sales` for each one-dollar increase in podcast spending, given that all other variables in the model, including `TV`, are held constant. Given the output of calling `summary(model)` below, we can say that for every one-dollar increase in podcast advertising spending, while holding `TV` and `newspaper` constant, the predicted total sales of the related product increase by about 1.049 dollars.

```r
summary(model)

# output
Call:
lm(formula = sales ~ TV + podcast + newspaper, data = train)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.583386   1.024616   4.473 1.65e-05 ***
TV          3.006340   1.004924   7.380 1.62e-11 ***
podcast     1.049249   1.027665   5.395 3.10e-07 ***
newspaper   1.006340   1.002924   6.380 1.12e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
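To see what "held constant" means in practice, the sketch below fits the same formula to a small simulated data set. The `train` values, the true coefficients, and the 0/1 coding of `newspaper` are all made up for illustration and are not the lesson's data. It predicts sales for two hypothetical observations that differ only by one dollar of podcast spending; the gap between the two predictions should match the `podcast` coefficient.

```r
# Hypothetical simulated stand-in for `train`; not the lesson's real data
set.seed(42)
n <- 200
train <- data.frame(
  TV        = runif(n, 0, 100),
  podcast   = runif(n, 0, 50),
  newspaper = rbinom(n, 1, 0.5)  # assumed 0/1 indicator for print ads
)
train$sales <- 4.5 + 3 * train$TV + 1 * train$podcast +
  1 * train$newspaper + rnorm(n, sd = 5)

model <- lm(sales ~ TV + podcast + newspaper, data = train)

# Two observations identical except for one extra dollar of podcast spending
obs          <- data.frame(TV = 50, podcast = 20, newspaper = 1)
obs_plus_one <- data.frame(TV = 50, podcast = 21, newspaper = 1)

diff_pred <- predict(model, obs_plus_one) - predict(model, obs)

# The change in predicted sales equals the estimated podcast coefficient
all.equal(unname(diff_pred), unname(coef(model)["podcast"]))
#> [1] TRUE
```

Because the model is linear, any starting values of `TV`, `podcast`, and `newspaper` give the same one-dollar gap.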

In addition, the interpretation of boolean (two-category) variables differs slightly from that of continuous variables. The coefficient of a boolean categorical variable represents the difference in the predicted outcome between its two categories, holding the other variables constant. For instance, `newspaper` indicates whether print advertisements were run, so its coefficient of 1.006 tells us that running print advertisements is associated with a 1.006-dollar increase in predicted `sales` compared with not running them, holding `TV` and `podcast` constant.
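A similar check works for a two-category predictor. This sketch uses made-up simulated data and stores the hypothetical `newspaper` indicator as `TRUE`/`FALSE`, so R treats `FALSE` as the reference category and labels the coefficient `newspaperTRUE`. A 0/1 numeric column (which the plain `newspaper` label in the output above may suggest) would be labelled `newspaper` and read the same way.

```r
# Hypothetical simulated data; `newspaper` is TRUE when print ads were run
set.seed(1)
n <- 200
train <- data.frame(
  TV        = runif(n, 0, 100),
  podcast   = runif(n, 0, 50),
  newspaper = runif(n) < 0.5
)
train$sales <- 4.5 + 3 * train$TV + 1 * train$podcast +
  1 * train$newspaper + rnorm(n, sd = 5)

model <- lm(sales ~ TV + podcast + newspaper, data = train)
coef(model)  # includes a term named "newspaperTRUE"

# Same TV and podcast spending; only the print-ad indicator changes
scenarios <- data.frame(TV = 50, podcast = 20, newspaper = c(FALSE, TRUE))
gap <- diff(predict(model, scenarios))

# The predicted gap between the two categories is the newspaperTRUE coefficient
all.equal(unname(gap), unname(coef(model)["newspaperTRUE"]))
```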

As we’ve suggested throughout this lesson, data scientists often build many variations of a model with different combinations of independent variables before committing to the model that best fits the test data. Let’s practice building, interpreting, and selecting the best-fitting multiple linear regression model for our `convert_clean` dataset!

### Instructions

1.

Build a multiple linear regression model that regresses `total_convert` on `impressions`, `clicks`, and `gender`, using our `train` dataset. Save the result to a variable called `model`, then call `summary(model)` to view the model results.
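In R's formula syntax the dependent variable goes on the left of `~`, so regressing `total_convert` on `impressions`, `clicks`, and `gender` looks like the call below. The real `convert_clean` data isn't shown here, so this sketch simulates a hypothetical `train` with the same column names; the value ranges, the `"F"`/`"M"` coding of `gender`, and the built-in effects are all assumptions.

```r
# Hypothetical stand-in for the lesson's `train` split of `convert_clean`
set.seed(7)
n <- 300
train <- data.frame(
  impressions = round(runif(n, 100, 10000)),
  clicks      = rpois(n, 20),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))  # assumed coding
)
train$total_convert <- 1 + 0.0002 * train$impressions + 0.05 * train$clicks -
  0.3 * (train$gender == "M") + rnorm(n, sd = 0.5)

# Outcome on the left of ~, predictors on the right
model <- lm(total_convert ~ impressions + clicks + gender, data = train)
summary(model)
```

With a factor whose levels are `"F"` and `"M"`, R uses the first level (`"F"`) as the reference, so the gender term appears as `genderM` and estimates the difference for `"M"` relative to `"F"`.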

2.

How might we interpret the coefficient estimate for `gender`? Set the variable `gender_coefficient` equal to the letter of the statement that most correctly interprets the estimate: `"a"`, `"b"`, or `"c"`.

A. The coefficient of the `gender` variable is not statistically significant, so we cannot draw any substantive conclusion from its value.

B. The coefficient of the `gender` variable is negative, which means that as `total_convert`, `clicks`, and `impressions` increase, men are less likely to purchase an advertised product.

C. The coefficient of the `gender` variable is negative. This means that men are less likely than women, with the same values of `clicks` and `impressions`, to purchase an advertised product.
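Choosing among these statements comes down to two numbers from the `gender` row of the coefficient table: the sign of the estimate and its p-value. The sketch below pulls both out of `coef(summary(model))`. It runs on hypothetical simulated data with `gender` coded as `"F"`/`"M"` (so the row is named `genderM`), and the 0.05 cutoff is the conventional threshold, assumed here.

```r
# Hypothetical simulated data mirroring the lesson's column names
set.seed(7)
n <- 300
train <- data.frame(
  impressions = round(runif(n, 100, 10000)),
  clicks      = rpois(n, 20),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)
train$total_convert <- 1 + 0.0002 * train$impressions + 0.05 * train$clicks -
  0.3 * (train$gender == "M") + rnorm(n, sd = 0.5)

model <- lm(total_convert ~ impressions + clicks + gender, data = train)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(model))
gender_est <- coef_table["genderM", "Estimate"]
gender_p   <- coef_table["genderM", "Pr(>|t|)"]

cat("Estimate:", gender_est, "\n")
cat("p-value:", gender_p, "\n")
cat("Significant at 0.05?", gender_p < 0.05, "\n")
```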

3.

Let’s build a second, simpler model so that we can check whether adding `gender` to our model improves its fit. Build a multiple linear regression model that regresses `total_convert` on `impressions` and `clicks`. Save the result to a variable called `model2`.

4.

Compute the R-squared values for `model` and `model2`, and save the results to `rsq_model` and `rsq_model2`, respectively. Call both variables to view their values.
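Both R-squared values can be read from the object that `summary()` returns for an `lm` fit, through its `r.squared` element. This sketch fits both models to hypothetical simulated data with the lesson's column names, so it runs on its own.

```r
# Hypothetical simulated data; not the real convert_clean values
set.seed(7)
n <- 300
train <- data.frame(
  impressions = round(runif(n, 100, 10000)),
  clicks      = rpois(n, 20),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)
train$total_convert <- 1 + 0.0002 * train$impressions + 0.05 * train$clicks -
  0.3 * (train$gender == "M") + rnorm(n, sd = 0.5)

model  <- lm(total_convert ~ impressions + clicks + gender, data = train)
model2 <- lm(total_convert ~ impressions + clicks, data = train)

# Proportion of variance in total_convert explained by each model
rsq_model  <- summary(model)$r.squared
rsq_model2 <- summary(model2)$r.squared

rsq_model
rsq_model2
```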

5.

Which model best fits our data? Set the variable `best_fit` equal to the larger R-squared value.
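Picking the larger value is a one-liner with `max()`. One caveat: for nested least-squares models like these, adding a predictor can never lower R-squared, so the larger model always comes out ahead on this metric. Adjusted R-squared, stored in the `summary()` object as `adj.r.squared`, penalizes extra predictors and is printed alongside for comparison. The data below are hypothetical and simulated.

```r
# Hypothetical simulated data with the lesson's column names
set.seed(7)
n <- 300
train <- data.frame(
  impressions = round(runif(n, 100, 10000)),
  clicks      = rpois(n, 20),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)
train$total_convert <- 1 + 0.0002 * train$impressions + 0.05 * train$clicks -
  0.3 * (train$gender == "M") + rnorm(n, sd = 0.5)

model  <- lm(total_convert ~ impressions + clicks + gender, data = train)
model2 <- lm(total_convert ~ impressions + clicks, data = train)

rsq_model  <- summary(model)$r.squared
rsq_model2 <- summary(model2)$r.squared

# The larger R-squared identifies the better-fitting model on this metric
best_fit <- max(rsq_model, rsq_model2)
best_fit

# Adjusted R-squared accounts for the number of predictors in each model
c(model = summary(model)$adj.r.squared, model2 = summary(model2)$adj.r.squared)
```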

6.

Set the variable `gender_diff` equal to the difference between `rsq_model` and `rsq_model2` (that is, `rsq_model - rsq_model2`). Uncomment the formatted string at the bottom of the file to see how we might build a narrative around the effect of gender on interaction with online advertisements.
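The formatted string in the exercise file isn't reproduced here, so the `sprintf()` message below is only a hypothetical stand-in for that kind of narrative. It computes `gender_diff` on simulated data and reports it as a change in R-squared.

```r
# Hypothetical simulated data; not the real convert_clean values
set.seed(7)
n <- 300
train <- data.frame(
  impressions = round(runif(n, 100, 10000)),
  clicks      = rpois(n, 20),
  gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)
train$total_convert <- 1 + 0.0002 * train$impressions + 0.05 * train$clicks -
  0.3 * (train$gender == "M") + rnorm(n, sd = 0.5)

model  <- lm(total_convert ~ impressions + clicks + gender, data = train)
model2 <- lm(total_convert ~ impressions + clicks, data = train)
rsq_model  <- summary(model)$r.squared
rsq_model2 <- summary(model2)$r.squared

# Extra share of variance in total_convert explained once gender is added
gender_diff <- rsq_model - rsq_model2

cat(sprintf(
  "Adding gender to the model raised R-squared by %.4f (%.2f percentage points).\n",
  gender_diff, 100 * gender_diff
))
```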