Understanding Aggregates

Seaborn can also calculate *aggregate statistics* for large datasets. To understand why this is helpful, we must first understand what an *aggregate* is.

An aggregate statistic, or aggregate, is a single number used to describe a set of data. One example of an aggregate is the average, or *mean*, of a data set. There are many other aggregate statistics as well.

Suppose we have a grade book with columns `student`, `assignment_name`, and `grade`, as shown below.

student | assignment_name | grade |
---|---|---|
Amy | Assignment 1 | 75 |
Amy | Assignment 2 | 82 |
Bob | Assignment 1 | 99 |
Bob | Assignment 2 | 90 |
Chris | Assignment 1 | 72 |
Chris | Assignment 2 | 66 |
… | … | … |

To calculate a student’s current grade in the class, we need to *aggregate* the grade data by student. To do this, we’ll calculate the average of each student’s grades, resulting in the following data set:

student | grade |
---|---|
Amy | 78.5 |
Bob | 94.5 |
Chris | 69 |
… | … |

On the other hand, we may be interested in understanding the relative difficulty of each assignment. In this case, we would aggregate by assignment, taking the average of all students’ scores on each assignment:

assignment_name | grade |
---|---|
Assignment 1 | 82 |
Assignment 2 | 79.3 |
… | … |
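Both aggregations can be reproduced in a few lines. The following is a minimal sketch, assuming the grade book is held in a pandas DataFrame and containing only the six rows visible in the table above (the elided rows are not filled in). Using `groupby` here is an illustrative choice, not something the lesson itself prescribes.

```python
import pandas as pd

# Only the six rows shown in the grade book table; the rest of the data is elided there.
gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris"],
    "assignment_name": ["Assignment 1", "Assignment 2"] * 3,
    "grade": [75, 82, 99, 90, 72, 66],
})

# Aggregate by student: one mean grade per student.
by_student = gradebook.groupby("student")["grade"].mean()

# Aggregate by assignment: one mean grade per assignment.
by_assignment = gradebook.groupby("assignment_name")["grade"].mean()

print(by_student)
print(by_assignment)
```

The same rows produce two different summaries depending on which column defines the groups.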

In both of these cases, the function we used to aggregate our data was the average or mean, but there are many types of aggregate statistics including:

- Median
- Mode
- Standard Deviation
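As a rough illustration of these three statistics, the sketch below computes each one for a small, hypothetical list of grades (the values are made up for this example and do not come from the grade book). NumPy provides `np.median` and `np.std`; it has no mode function, so the example assumes Python's standard `statistics` module for that.

```python
import statistics

import numpy as np

# Hypothetical grades, chosen so that one value repeats.
grades = [70, 80, 80, 90, 100]

median_grade = np.median(grades)      # middle value of the sorted data
mode_grade = statistics.mode(grades)  # most frequent value
std_grade = np.std(grades)            # population standard deviation (ddof=0)

print(median_grade, mode_grade, std_grade)
```

Note that `np.std` divides by the number of values by default; pass `ddof=1` if you want the sample standard deviation instead.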

In Python, you can compute aggregates quickly and easily using NumPy, a popular Python library for numerical computing. You’ll use NumPy in this exercise to compute aggregates for a DataFrame.

To calculate aggregates using NumPy, you’ll first need to import the NumPy library at the top of **script.py**.

Type the following at the top of your file:

`import numpy as np`

Next, take a minute to understand the data you’ll analyze. The DataFrame `gradebook` contains the complete gradebook for a hypothetical classroom. Use `print` to examine `gradebook`.

Select all rows from the `gradebook` DataFrame where `assignment_name` is equal to `Assignment 1`. Save the result to the variable `assignment1`.
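One common way to select rows like this is a boolean mask. The sketch below assumes `gradebook` is a pandas DataFrame and rebuilds it from only the six sample rows shown earlier in this lesson, so its output will differ from the full exercise data.

```python
import pandas as pd

# Stand-in for the exercise's gradebook, built from the six sample rows only.
gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris"],
    "assignment_name": ["Assignment 1", "Assignment 2"] * 3,
    "grade": [75, 82, 99, 90, 72, 66],
})

# The comparison yields a True/False value per row; indexing with it keeps the True rows.
assignment1 = gradebook[gradebook.assignment_name == "Assignment 1"]

print(assignment1)
```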

Check out the DataFrame you just created. Print `assignment1`.

Now use NumPy to calculate the median grade in `assignment1`.

Use `np.median()` to calculate the median of the column `grade` from `assignment1` and save it to `asn1_median`.

Display `asn1_median` using `print`. What is the median grade on Assignment 1?
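For reference, here is a minimal sketch of the median step. It assumes `assignment1` holds only the three Assignment 1 rows shown in the sample table, so the printed value is the median of that sample and not necessarily the answer for the complete gradebook in the exercise.

```python
import numpy as np
import pandas as pd

# Stand-in for assignment1: just the three Assignment 1 rows from the sample table.
assignment1 = pd.DataFrame({
    "student": ["Amy", "Bob", "Chris"],
    "assignment_name": ["Assignment 1"] * 3,
    "grade": [75, 99, 72],
})

# np.median accepts a DataFrame column and returns the middle value of its sorted data.
asn1_median = np.median(assignment1["grade"])

print(asn1_median)
```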