Hypothesis Testing with R

One Sample T-Test

Consider the fictional business BuyPie, which sends ingredients for pies to your household so that you can make them from scratch. Suppose that a product manager hypothesizes the average age of visitors to BuyPie.com is `30`. In the past hour, the website had `100` visitors and the average age was `31`. Are the visitors older than expected? Or is this just the result of chance (sampling error) and a small sample size?

You can test this using a One Sample T-Test. A *One Sample T-Test* compares a sample mean to a hypothetical population mean. It answers the question “If the population mean really were the hypothesized value, how likely would it be to see a sample mean at least this far from it?”

The first step is formulating a null hypothesis, which again is the hypothesis that there is no difference between the populations you are comparing. In a One Sample T-Test, the second population is the hypothetical population you choose. The null hypothesis that this test examines can be phrased as follows: `"The set of samples belongs to a population with the target mean."`

One result of a One Sample T-Test will be a *p-value*, which tells you whether or not you can reject this null hypothesis. If the p-value you receive is less than your significance level, normally `0.05`, you can reject the null hypothesis and state that there is a significant difference.
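The decision rule above can be sketched in a few lines of R. The names `significance_level`, `p_value`, and `decision` are illustrative, and the p-value here is a made-up number standing in for the output of a real test.

```r
# Significance level, chosen before looking at the data
significance_level <- 0.05

# Hypothetical p-value, standing in for the result of a real test
p_value <- 0.16

# Reject the null hypothesis only when the p-value falls below the threshold
if (p_value < significance_level) {
  decision <- "reject the null hypothesis"
} else {
  decision <- "fail to reject the null hypothesis"
}

print(decision)
```

With this assumed p-value, the sketch ends in the "fail to reject" branch, because `0.16` is not below `0.05`.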

R has a function called `t.test()` in the `stats` package which can perform a One Sample T-Test for you.
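To get a feel for what `t.test()` does for you, here is a rough sketch of the same calculation done by hand. The vector `visitor_ages` is invented for illustration and is not BuyPie data; the other variable names are assumptions as well.

```r
# Hypothetical sample of visitor ages (illustration only, not real data)
visitor_ages <- c(25, 28, 31, 33, 35, 29, 32, 36, 30, 34)
hypothesized_mean <- 30

n <- length(visitor_ages)
standard_error <- sd(visitor_ages) / sqrt(n)

# t-statistic: distance of the sample mean from the hypothesized mean,
# measured in standard errors
t_stat <- (mean(visitor_ages) - hypothesized_mean) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in test should agree with the manual calculation
p_builtin <- t.test(visitor_ages, mu = hypothesized_mean)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

The two printed values should match, which suggests that the p-value is simply the tail probability of the t-statistic under the null hypothesis.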

For a One Sample T-Test, pass `t.test()` two arguments, a sample of values and an expected mean (`mu` defaults to `0` if omitted):

`results <- t.test(sample_distribution, mu = expected_mean)`

- `sample_distribution` is the sample of values that were collected
- `mu` is an argument indicating the desired mean of the hypothetical population
- `expected_mean` is the value of the desired mean
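Putting those pieces together, a minimal call might look like the following. The values in `sample_distribution` are invented for illustration only.

```r
# Hypothetical ages, invented for illustration
sample_distribution <- c(23, 27, 31, 35, 39, 28, 30, 32, 34, 31)
expected_mean <- 30

# One Sample T-Test of the sample against the hypothesized mean
results <- t.test(sample_distribution, mu = expected_mean)

# Printing the result shows the t-statistic, degrees of freedom, p-value,
# a confidence interval, and the sample mean
print(results)
```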

`t.test()` will return, among other information we will not cover here, a p-value. This is the probability of seeing a sample mean at least as far from the specified mean as yours, assuming the sample really did come from a distribution with that mean.
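The object returned by `t.test()` can be treated like a list, so the p-value can be pulled out by name. A small sketch, using an invented vector called `ages_sample`:

```r
# Hypothetical ages, invented for illustration
ages_sample <- c(24, 29, 33, 31, 36, 27, 35, 30, 32, 34)

results <- t.test(ages_sample, mu = 30)

# The p-value is stored in the `p.value` component
p_value <- results$p.value
print(p_value)

# A few of the other components that come back with it
print(results$estimate)   # sample mean
print(results$statistic)  # t-statistic
print(results$conf.int)   # 95% confidence interval for the population mean
```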

P-values give you an idea of how confident you can be in a result. Just because you don’t have enough data to detect a difference doesn’t mean that there isn’t one. Generally, the larger your sample, the smaller a difference you can detect.
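The effect of sample size can be illustrated with a small sketch. The ages below are invented: the same five-value pattern, with a mean of `31`, is repeated to build a small and a large sample, so both have the same mean and a similar spread.

```r
# Hypothetical pattern of ages with mean 31 (illustration only)
age_pattern <- c(21, 26, 31, 36, 41)

# Repeat the pattern to build samples of different sizes with the same mean
small_sample <- rep(age_pattern, times = 2)    # 10 values
large_sample <- rep(age_pattern, times = 200)  # 1000 values

# Test both samples against the same hypothesized mean of 30
p_small <- t.test(small_sample, mu = 30)$p.value
p_large <- t.test(large_sample, mu = 30)$p.value

print(c(small = p_small, large = p_large))
```

Under these assumptions, the one-year gap between the sample mean and `30` is not significant at the `0.05` level with 10 values, but it is with 1000.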

We have provided a small dataset called `ages`, representing the ages of visitors to BuyPie.com in the past hour, in `notebook.Rmd`.

Even with a small dataset like this, it is hard to make judgments from just looking at the numbers.

To understand the data better, let’s look at the mean. Calculate the mean of `ages`, and store the result in a variable called `ages_mean`. View `ages_mean`.

Use the `t.test()` function with `ages` to see what p-value the experiment returns for this distribution, where we expect the mean to be `30`.

Store the results of the test in a variable called `results`.
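As a rough sketch of how these steps fit together, the following uses a made-up `ages` vector as a stand-in for the one provided in `notebook.Rmd`; the real dataset will give different numbers.

```r
# Hypothetical stand-in for the `ages` vector loaded in notebook.Rmd
ages <- c(27, 33, 29, 35, 31, 30, 34, 28, 32, 36)

# Mean of the sample, stored and then viewed
ages_mean <- mean(ages)
print(ages_mean)

# One Sample T-Test against the hypothesized mean age of 30
results <- t.test(ages, mu = 30)
print(results)
```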

Does the p-value you got with the One Sample T-Test make sense, knowing the mean of `ages`?