Hypothesis Testing with R

Designing an Experiment

Suppose you want to know if students who study history are more interested in volleyball than students who study chemistry. Before doing anything else to answer your original question, you come up with a null hypothesis: `"History and chemistry students are interested in volleyball at the same rates."`

To test this hypothesis, you need to design an experiment and collect data. You invite `100` history majors and `100` chemistry majors from your university to join an extracurricular volleyball team. After one week, `34` history majors sign up (`34%`), and `39` chemistry majors sign up (`39%`). More chemistry majors than history majors signed up, but is this a “real”, or significant difference? Can you conclude that students who study chemistry are more interested in volleyball than students who study history?

In your experiment, the `100` history and `100` chemistry majors at your university are samples of their respective populations (all history and chemistry majors). The sample means are the percentages of history majors (`34%`) and chemistry majors (`39%`) that signed up for the team, and the difference in sample means is `39%` - `34%` = `5%`. The population means are the percentages of history and chemistry majors worldwide that would sign up for an extracurricular volleyball team if given the chance.
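As a rough sketch, the sample percentages and their difference could be computed in R as follows. The variable names are illustrative and do not come from the lesson.

```r
# Counts taken from the experiment described above
n_history <- 100        # history majors invited
n_chemistry <- 100      # chemistry majors invited
signups_history <- 34
signups_chemistry <- 39

# Sample sign-up rates (the "sample means" in this lesson)
rate_history <- signups_history / n_history
rate_chemistry <- signups_chemistry / n_chemistry

# Difference in sample means, expressed in percentage points
difference <- (rate_chemistry - rate_history) * 100
print(difference)
```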

You want to know if the difference you observed in these sample means (`5%`) reflects a difference in the population means, or if the difference was caused by sampling error, and the samples of students you chose do not represent the greater populations of history and chemistry students.

Restating the null hypothesis in terms of the population means yields the following:

`"The percentage of all history majors who would sign up for volleyball is the same as the percentage of all chemistry majors who would sign up for volleyball, and the observed difference in sample means is due to sampling error."`

This is the same as saying, “If you gave the same volleyball invitation to every history and chemistry major in the world, they would sign up at the same rate, and the sample of `200` students you selected is not representative of those populations.”
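One way to build intuition for sampling error is to simulate it. The sketch below assumes, purely for illustration, that both populations share a true sign-up rate equal to the pooled rate of the two samples. That rate, the seed, and the number of repetitions are assumptions, not values from the lesson. It draws many pairs of samples of `100` and estimates how often a gap of at least 5 percentage points appears by chance alone.

```r
set.seed(42)  # arbitrary seed so the run is reproducible

shared_rate <- (34 + 39) / 200   # assumed common population rate under the null
n_per_group <- 100
n_simulations <- 10000           # arbitrary number of repetitions

# Simulated sign-up counts for each group when the null hypothesis is true
history_signups <- rbinom(n_simulations, size = n_per_group, prob = shared_rate)
chemistry_signups <- rbinom(n_simulations, size = n_per_group, prob = shared_rate)

# Absolute gap in sign-ups for every simulated experiment.
# With 100 students per group, 5 sign-ups equals 5 percentage points.
simulated_gaps <- abs(chemistry_signups - history_signups)

# Share of simulated experiments with a gap at least as large as the observed one
share_at_least_observed <- mean(simulated_gaps >= 5)
print(share_at_least_observed)
```

If a large share of the simulated experiments shows a gap of this size, a 5-point difference is unsurprising under the null hypothesis. A small share would point the other way.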

Your friend is a dog walker who specializes in working with Golden Retrievers and Goldendoodles. They are interested in knowing if there is a significant difference in the lengths of the two breeds. After a few weeks of data collection, they give you a spreadsheet of `10` Golden Retrievers’ lengths and `10` Goldendoodles’ lengths.

The lengths of the dogs are given in `retriever_lengths` and `doodle_lengths`. Calculate the mean of each breed and save the results to `mean_retriever_l` and `mean_doodle_l`. View `mean_retriever_l` and `mean_doodle_l`.

Calculate the difference between `mean_retriever_l` and `mean_doodle_l` and save the result to `mean_difference`. View `mean_difference`.
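The two steps above might look like the following sketch. The lesson supplies the real `retriever_lengths` and `doodle_lengths`. The values below are made-up placeholders (units unspecified) so that the snippet runs on its own.

```r
# Placeholder data: NOT the lesson's values, only here so the example runs
retriever_lengths <- c(23, 24, 22, 25, 23, 24, 26, 22, 23, 24)
doodle_lengths <- c(20, 22, 21, 23, 20, 22, 21, 19, 22, 20)

# Mean length of each breed
mean_retriever_l <- mean(retriever_lengths)
mean_doodle_l <- mean(doodle_lengths)

# View the two sample means
mean_retriever_l
mean_doodle_l

# Difference between the two sample means
mean_difference <- mean_retriever_l - mean_doodle_l

# View the difference
mean_difference
```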

You want to run a hypothesis test to see if there is a significant difference in the lengths of Golden Retrievers and Goldendoodles. Which of the two statements could be a formulation of the null hypothesis?

Update the value of `null_hypo` with `"st_1"` or `"st_2"` depending on your answer.