Hypothesis Testing with R

Hypothesis Formulation

You begin the statistical hypothesis testing process by defining a *hypothesis*, or an assumption about your population that you want to test. A hypothesis can be written in words, but can also be explained in terms of the sample and population means you just learned about.

Say you are developing a website and want to compare the time spent on different versions of a homepage. You could run a hypothesis test to see if version A or B makes users stay on the page significantly longer. Your hypothesis might be:

`"The average time spent on homepage A is greater than the average time spent on homepage B."`

While this is a fine hypothesis to make, data analysts are often very hesitant people. They don't like to make bold claims without having data to back them up! Thus, when constructing hypotheses for a hypothesis test, you want to formulate a null hypothesis. A *null hypothesis* states that there is no difference between the populations you are comparing, and it implies that any difference seen in the sample data is due to sampling error. A null hypothesis for the same scenario is as follows:

`"The average time spent on homepage A is the same as the average time spent on homepage B."`

You could also restate this in terms of population mean:

`"The population mean of time spent on homepage A is the same as the population mean of time spent on homepage B."`
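
The same statement can be sketched in symbols. This assumes $\mu_A$ and $\mu_B$ (notation that is not part of the lesson itself) stand for the population mean time spent on homepage A and homepage B:

$$
H_0: \mu_A = \mu_B
$$

Under this notation, the original claim that homepage A holds users longer would read $\mu_A > \mu_B$. A claim of this kind is commonly called an alternative hypothesis.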

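After collecting some sample data on how users interact with each homepage, you can then run a hypothesis test using the data collected to determine whether the null hypothesis can be rejected (i.e., whether the data provide evidence of a difference in time spent on homepage A and homepage B). Note that a hypothesis test never proves the null hypothesis true; it only tells you whether the sample data give you enough reason to reject it.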
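
As a rough illustration of what running such a test might look like in R, the sketch below simulates time-on-page data and applies base R's `t.test()`. The data, the variable names (`time_a`, `time_b`, `reject_null`), the choice of a two-sample t-test, and the `0.05` cutoff are all assumptions made for this example and are not taken from the lesson.

```r
# Hypothetical data: simulated seconds spent on each homepage version.
# In practice these values would come from real user sessions.
set.seed(42)
time_a <- rnorm(100, mean = 62, sd = 15)
time_b <- rnorm(100, mean = 60, sd = 15)

# Sample means are estimates of the unknown population means.
mean_a <- mean(time_a)
mean_b <- mean(time_b)

# Two-sample t-test of the null hypothesis that the
# population means of the two groups are equal.
result <- t.test(time_a, time_b)

# Assumed convention: reject the null hypothesis when the p-value
# falls below 0.05. Otherwise, fail to reject it.
reject_null <- result$p.value < 0.05

print(c(mean_a = mean_a, mean_b = mean_b))
print(result$p.value)
print(reject_null)
```

If `reject_null` is `FALSE`, the data do not show that the two homepages perform the same; they only fail to provide enough evidence of a difference.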

A researcher at a pharmaceutical company is working on the development of a new medication to lower blood pressure, DeePressurize. They run an experiment with a control group of `100` patients who receive a placebo (a sugar pill), and an experimental group of `100` patients who receive DeePressurize. Blood pressure measurements are taken after a 3-month period on both groups of patients.

The researcher wants to run a hypothesis test to compare the resulting datasets. Two hypotheses, `hypo_a` and `hypo_b`, are given in `notebook.Rmd`. Which could be a null hypothesis for comparing the two sets of data? Update the value of `null_hypo_1` to the string `"hypo_a"` or `"hypo_b"` based on your answer.

A product manager at a dating app company is developing a new user profile page with a different picture layout. They want to see if the new layout results in more matches between users than the current layout. `50%` of profiles are updated to the new layout, and over a `1`-month period the number of matches is recorded for users with the new layout and for users with the original layout.

The product manager wants to run a hypothesis test to compare the resulting datasets. Two hypotheses, `hypo_c` and `hypo_d`, are given in `notebook.Rmd`. Which could be a null hypothesis for comparing the two sets of data? Update the value of `null_hypo_2` to the string `"hypo_c"` or `"hypo_d"` based on your answer.