If we want to compare two different distributions, we can put multiple histograms on the same plot. This could be useful, for example, in comparing the heights of a group of men and the heights of a group of women. However, it can be hard to read two histograms drawn on top of each other. Bars are opaque by default, so the histogram plotted second (orange) can cover part of the one plotted first (blue), and we can’t see all of the blue plot.

We have two ways we can solve a problem like this:

- Use the keyword `alpha`, which can be a value between 0 and 1. This sets the transparency of the histogram. A value of 0 would make the bars entirely transparent. A value of 1 would make the bars completely opaque.

      plt.hist(a, range=(55, 75), bins=20, alpha=0.5)
      plt.hist(b, range=(55, 75), bins=20, alpha=0.5)

  This would make both histograms visible on the plot.

- Use the keyword `histtype` with the argument `'step'` to draw just the outline of a histogram:

      plt.hist(a, range=(55, 75), bins=20, histtype='step')
      plt.hist(b, range=(55, 75), bins=20, histtype='step')

  This results in a chart that shows only the outline of each histogram.
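To compare the two options side by side, here is a minimal sketch that draws both on one figure. The sample data, the seed, the subplot layout, and the output filename are assumptions made for illustration; only the `alpha` and `histtype` keywords come from the text above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample data: two overlapping normal distributions
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
a = rng.normal(loc=64, scale=2, size=1000)
b = rng.normal(loc=66, scale=2, size=1000)

fig, (ax_alpha, ax_step) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: semi-transparent filled bars
_, _, bars_a = ax_alpha.hist(a, range=(55, 75), bins=20, alpha=0.5, label="a")
ax_alpha.hist(b, range=(55, 75), bins=20, alpha=0.5, label="b")
ax_alpha.set_title("alpha=0.5")
ax_alpha.legend()

# Option 2: unfilled outlines only
_, _, outline_a = ax_step.hist(a, range=(55, 75), bins=20, histtype="step", label="a")
ax_step.hist(b, range=(55, 75), bins=20, histtype="step", label="b")
ax_step.set_title("histtype='step'")
ax_step.legend()

fig.savefig("overlapping_histograms.png")  # hypothetical output filename
```

In an interactive session, `plt.show()` can replace the `savefig` call, provided the `matplotlib.use("Agg")` line is removed.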

Another problem we face is that our histograms might have different numbers of samples, making one much bigger than the other. We can see how this makes it difficult to compare qualitatively by adding a dataset `b` with a much bigger `size` value:

    # Assumes `normal` is NumPy's random normal sampler
    from numpy.random import normal
    import matplotlib.pyplot as plt

    a = normal(loc=64, scale=2, size=10000)
    b = normal(loc=70, scale=2, size=100000)

    plt.hist(a, range=(55, 75), bins=20)
    plt.hist(b, range=(55, 75), bins=20)
    plt.show()

The result is two histograms that are very difficult to compare, because `b` has ten times as many samples as `a` and its bars are correspondingly taller.

To solve this, we can normalize our histograms using `density=True` (older Matplotlib versions used `normed=True`, which was removed in Matplotlib 3.1). This keyword divides the height of each column by a constant such that the total shaded area of the histogram sums to 1.

    a = normal(loc=64, scale=2, size=10000)
    b = normal(loc=70, scale=2, size=100000)

    # density=True replaces the older normed=True keyword
    plt.hist(a, range=(55, 75), bins=20, alpha=0.5, density=True)
    plt.hist(b, range=(55, 75), bins=20, alpha=0.5, density=True)
    plt.show()

Now we can more easily see the differences between the blue set and the orange set.
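To check what normalization does to the bar heights, the sketch below uses `np.histogram`, which accepts the same `range`, `bins`, and `density` arguments, and compares raw counts with normalized heights. The seed and the helper name `bar_heights` are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
a = rng.normal(loc=64, scale=2, size=10_000)
b = rng.normal(loc=70, scale=2, size=100_000)

def bar_heights(data, density):
    """Return bar heights and bin widths for the binning used in the lesson."""
    heights, edges = np.histogram(data, range=(55, 75), bins=20, density=density)
    return heights, np.diff(edges)

# Raw counts: the tallest bar of b dwarfs the tallest bar of a
counts_a, _ = bar_heights(a, density=False)
counts_b, _ = bar_heights(b, density=False)
print("tallest raw bars:", counts_a.max(), counts_b.max())

# Normalized heights: each histogram's total area (height * width) is 1
dens_a, widths = bar_heights(a, density=True)
dens_b, _ = bar_heights(b, density=True)
area_a = float(np.sum(dens_a * widths))
area_b = float(np.sum(dens_b * widths))
print("areas after normalizing:", area_a, area_b)
```

Because both areas equal 1, the two normalized histograms sit on the same vertical scale even though `b` has ten times as many samples.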

### Instructions

**1.** We’ve provided another dataset in the file **sales_times_s2.csv** that represents the 371 sales at MatplotSip’s second location from 8am to 10pm on the same day. This data has the same structure as the sales times data from store 1, with an `id`, a `card_no`, and a `time`. Take a look at the data in the `csv` and familiarize yourself with it.

Using **script.py**, we’ve imported the times into a list called `sales_times2`. You can see how we did this in **script.py**, but you’ll only be interacting with the lists `sales_times1` and `sales_times2` in **histogram.py**, so don’t worry if you don’t understand the conversion from `csv` to list.

**2.** Plot the histogram of times from the second location on top of the one from the last exercise.

**3.** Notice that the histogram we plotted second completely obscures the first histogram we plotted. Modify the transparency value of both histograms to be `0.4` so that we can see the separate histograms better.

**4.** Normalize both histograms so that we can compare the patterns between them despite the differences in sample size.
