Learn
Aggregates in R
Calculating Aggregate Functions I

When we have a bunch of data, we often want to calculate aggregate statistics (mean, standard deviation, median, percentiles, etc.) over certain subsets of the data.

Suppose we have a grade book with columns `student`, `assignment_name`, and `grade`:

Amy Assignment 1 96
Amy Assignment 2 87
Bob Assignment 1 91
Bob Assignment 2 75
Chris Assignment 1 83
Chris Assignment 2 88

We want to get an average grade for each student across all assignments. We can do this using the helpful dplyr function `group_by()`.

For this example, we’d use the following piece of code:

``````grades <- df %>%
group_by(student) %>%

The output might look something like this:

Amy 91.5
Bob 83
Chris 85.5

In general, we use the following syntax to calculate aggregates:

``````df %>%
group_by(column_1) %>%
summarize(aggregate_name = command(column_2))``````
• `column_1` (`student` in our example) is the column that we want to `group_by()`
• `column_2` (`grade` in our example) is the column that we want to apply `command()`, a summary function, to using `summarize()`
• `aggregate_name` is the name assigned to the calculated aggregate

In addition to the summary functions discussed in the last exercise (`mean()`, `median()`, `sd()`, `var()`, `min()`, `max()`, `IQR()` and `n_distinct()`), another helpful summary function, especially for grouped data, is `n()`. `n()` will return the count of the rows within a group, and does not require a column as an argument. To get the count of the rows in each group of students from our example:

``````grades <- df %>%
group_by(student) %>%
summarize(count = n())``````

### Instructions

1.

Let’s return to our `orders` data from ShoeFly.com.

In the previous exercise, our finance department wanted to know the most expensive shoe that we sold.

Now, they want to know the price of the most expensive shoe for each `shoe_type` (i.e., the price of the most expensive boot, the price of the most expensive ballet flat, etc.).

Save your answer to the variable `pricey_shoes`, and view it.

2.

The inventory team wants to know how many of each `shoe_type` has been sold so they can forecast inventory for the future.

Save your answer to the variable `shoes_sold`, and view it.