Understanding Aggregates

Seaborn can also calculate aggregate statistics for large datasets. To understand why this is helpful, we must first understand what an aggregate is.

An aggregate statistic, or aggregate, is a single number used to describe a set of data. One example of an aggregate is the average, or mean, of a data set. There are many other aggregate statistics as well.

Suppose we have a grade book with columns student, assignment_name, and grade, as shown below.

student  assignment_name  grade
Amy      Assignment 1     75
Amy      Assignment 2     82
Bob      Assignment 1     99
Bob      Assignment 2     90
Chris    Assignment 1     72
Chris    Assignment 2     66

To calculate a student’s current grade in the class, we need to aggregate the grade data by student. To do this, we’ll calculate the average of each student’s grades, resulting in the following data set:

student  grade
Amy      78.5
Bob      94.5
Chris    69
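
As a rough sketch of this per-student aggregation, the snippet below rebuilds the grade book by hand and averages each student's grades. It assumes pandas is available; the variable name grade_by_student is chosen here for illustration and does not come from the lesson.

```python
import pandas as pd

# The grade book from the table above, typed in by hand for illustration.
gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris"],
    "assignment_name": ["Assignment 1", "Assignment 2"] * 3,
    "grade": [75, 82, 99, 90, 72, 66],
})

# Group the rows by student, then take the mean of each group's grades.
# grade_by_student is a hypothetical name used only in this sketch.
grade_by_student = gradebook.groupby("student")["grade"].mean().reset_index()

print(grade_by_student)
```

Each student's two rows collapse into a single row holding the mean, which is what makes the result an aggregate.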

On the other hand, we may be interested in understanding the relative difficulty of each assignment. In this case, we would aggregate by assignment, taking the average of all students’ scores on each assignment:

assignment_name  grade
Assignment 1     82
Assignment 2     79.3
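
The same data can be aggregated along the other column. The sketch below, which again assumes pandas and a hand-built copy of the grade book, groups by assignment_name instead of student; grade_by_assignment is a name made up for this example.

```python
import pandas as pd

# The grade book from the lesson's table, typed in by hand for illustration.
gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris"],
    "assignment_name": ["Assignment 1", "Assignment 2"] * 3,
    "grade": [75, 82, 99, 90, 72, 66],
})

# Group the rows by assignment, average the grades in each group,
# and round to one decimal place to match the table.
# grade_by_assignment is a hypothetical name used only in this sketch.
grade_by_assignment = (
    gradebook.groupby("assignment_name")["grade"].mean().round(1).reset_index()
)

print(grade_by_assignment)
```

Only the grouping column changes between the two aggregations; the aggregate function (the mean) stays the same.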

In both of these cases, the function we used to aggregate our data was the average, or mean, but there are many other types of aggregate statistics, including:

  • Median
  • Mode
  • Standard Deviation

In Python, you can compute aggregates quickly and easily using NumPy, a popular Python library for numerical computing. You’ll use NumPy in this exercise to compute aggregates for a DataFrame.
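
To make the other aggregates concrete, here is a minimal sketch that applies NumPy to the six grades from the grade book above. NumPy provides np.median() and np.std() but no dedicated mode function, so the mode is found here by counting values with np.unique(); the list repeated_grades is invented for illustration, since no grade in the grade book repeats.

```python
import numpy as np

# The six grades from the grade book table.
grades = np.array([75, 82, 99, 90, 72, 66])

# Median: the middle value of the sorted grades
# (the average of the two middle values when the count is even).
grade_median = np.median(grades)

# Standard deviation: by default np.std() computes the population
# standard deviation (ddof=0).
grade_std = np.std(grades)

# Mode: the most frequent value. repeated_grades is a made-up list
# used only to show a value that occurs more than once.
repeated_grades = np.array([70, 85, 85, 90])
values, counts = np.unique(repeated_grades, return_counts=True)
grade_mode = values[np.argmax(counts)]

print(grade_median)
print(grade_std)
print(grade_mode)
```

If several values tie for most frequent, np.argmax() picks the smallest of them, which may or may not be what an analysis needs.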

Instructions

1.

To calculate aggregates using NumPy, you’ll first need to import the NumPy library at the top of script.py.

Type the following at the top of your file:

import numpy as np

2.

Next, take a minute to understand the data you’ll analyze. The DataFrame gradebook contains the complete gradebook for a hypothetical classroom. Use print to examine gradebook.

3.

Select all rows from the gradebook DataFrame where assignment_name is equal to Assignment 1. Save the result to the variable assignment1.

4.

Check out the DataFrame you just created. Print assignment1.

5.

Now use NumPy to calculate the median grade in assignment1.

Use np.median() to calculate the median of the column grade from assignment1 and save it to asn1_median.

6.

Display asn1_median using print. What is the median grade on Assignment 1?
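
The steps above can be sketched as follows. This is one possible approach, not necessarily the lesson's reference solution. In the exercise, gradebook is already provided; the stand-in DataFrame built here uses only the six rows from the earlier table, so the printed median applies to this stand-in data and may differ from the result for the full gradebook.

```python
import numpy as np
import pandas as pd

# Stand-in for the exercise's gradebook DataFrame (an assumption:
# the real one is supplied by the exercise and may contain more rows).
gradebook = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris"],
    "assignment_name": ["Assignment 1", "Assignment 2"] * 3,
    "grade": [75, 82, 99, 90, 72, 66],
})
print(gradebook)

# Keep only the rows whose assignment_name equals "Assignment 1".
assignment1 = gradebook[gradebook.assignment_name == "Assignment 1"]
print(assignment1)

# Median of the grade column for those rows.
asn1_median = np.median(assignment1.grade)
print(asn1_median)
```

The comparison inside the square brackets produces a column of True/False values, and indexing the DataFrame with it keeps only the rows marked True.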
