Hypothesis Testing with R
Hypothesis Formulation

You begin the statistical hypothesis testing process by defining a hypothesis: an assumption about your population that you want to test. A hypothesis can be written in words, but it can also be expressed in terms of the sample and population means you just learned about.

Say you are developing a website and want to compare the time spent on different versions of a homepage. You could run a hypothesis test to see if version A or B makes users stay on the page significantly longer. Your hypothesis might be:

"The average time spent on homepage A is greater than the average time spent on homepage B."

While this is a fine hypothesis to make, data analysts are often very hesitant people. They don’t like to make bold claims without having data to back them up! Thus, when constructing hypotheses for a hypothesis test, you start by formulating a null hypothesis. A null hypothesis states that there is no difference between the populations you are comparing, and it implies that any difference seen in the sample data is due to sampling error. A null hypothesis for the same scenario is as follows:

"The average time spent on homepage A is the same as the average time spent on homepage B."

You could also restate this in terms of population means:

"The population mean of time spent on homepage A is the same as the population mean of time spent on homepage B."
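
The same statements can be written compactly in symbols. The notation below is an assumption for illustration and does not come from the lesson: \mu_A and \mu_B stand for the population mean time spent on homepage A and homepage B, H_0 is the null hypothesis, and H_1 is the original, bolder claim.

```latex
\begin{align*}
H_0 &: \mu_A = \mu_B \\
H_1 &: \mu_A > \mu_B
\end{align*}
```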

After collecting some sample data on how users interact with each homepage, you can then run a hypothesis test on that data to determine whether the null hypothesis can be rejected (i.e., the data provide evidence of a difference in time spent on homepage A and homepage B). A hypothesis test never proves the null hypothesis true; you either reject it or fail to reject it.
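
As a rough illustration of what running such a test might look like in R, the sketch below compares two samples of time on page. Everything in it is an assumption made for the example: the data are simulated rather than collected, the variable names `time_a` and `time_b` are hypothetical, a two-sample t-test (`t.test()`) is only one possible choice of test, and the 0.05 threshold is a common convention rather than something this lesson specifies.

```r
# Hypothetical data: simulated time on page (in seconds) for 100 visitors
# to each homepage version. Real data would come from your experiment.
set.seed(42)
time_a <- rnorm(100, mean = 62, sd = 15)
time_b <- rnorm(100, mean = 60, sd = 15)

# Null hypothesis: the population mean for A equals the population mean for B.
# alternative = "greater" tests the claim that the mean for A is larger.
result <- t.test(time_a, time_b, alternative = "greater")

# Assumed significance threshold; reject the null only if the p-value is below it.
alpha <- 0.05
reject_null <- result$p.value < alpha

print(result$p.value)
print(reject_null)
```

If `reject_null` is `FALSE`, the appropriate conclusion is that the sample does not provide enough evidence of a difference, not that the two homepages are proven to perform the same.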

Instructions

1.

A researcher at a pharmaceutical company is working on the development of a new medication to lower blood pressure, DeePressurize. They run an experiment with a control group of 100 patients who receive a placebo (a sugar pill) and an experimental group of 100 patients who receive DeePressurize. Blood pressure measurements are taken on both groups of patients after a 3-month period.

The researcher wants to run a hypothesis test to compare the resulting datasets. Two hypotheses, hypo_a and hypo_b, are given in notebook.Rmd. Which could be a null hypothesis for comparing the two sets of data? Update the value of null_hypo_1 to the string "hypo_a" or "hypo_b" based on your answer.

2.

A product manager at a dating app company is developing a new user profile page with a different picture layout. They want to see if the new layout results in more matches between users than the current layout. 50% of profiles are updated to the new layout, and over a 1-month period the number of matches is recorded for users with the new layout and for users with the original layout.

The product manager wants to run a hypothesis test to compare the resulting datasets. Two hypotheses, hypo_c and hypo_d, are given in notebook.Rmd. Which could be a null hypothesis for comparing the two sets of data? Update the value of null_hypo_2 to the string "hypo_c" or "hypo_d" based on your answer.
