Different Plot Types
Multiple Histograms

If we want to compare two different distributions, we can put multiple histograms on the same plot. This could be useful, for example, in comparing the heights of a bunch of men and the heights of a bunch of women. However, it can be hard to read two histograms on top of each other. For example, in this histogram, we can’t see all of the blue plot, because it’s covered by the orange one:

overlap_hist

We have two ways we can solve a problem like this:

  1. use the keyword alpha, which can be a value between 0 and 1. This sets the transparency of the histogram. A value of 0 would make the bars entirely transparent. A value of 1 would make the bars completely opaque.

    plt.hist(a, range=(55, 75), bins=20, alpha=0.5)
    plt.hist(b, range=(55, 75), bins=20, alpha=0.5)

    This would make both histograms visible on the plot: alpha_histograms

  2. use the keyword histtype with the argument 'step' to draw just the outline of a histogram:

    plt.hist(a, range=(55, 75), bins=20, histtype='step')
    plt.hist(b, range=(55, 75), bins=20, histtype='step')

    which results in a chart like: step_histogram
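The two options can also be tried side by side. The following is a minimal sketch, assuming NumPy and Matplotlib are installed. The datasets a and b are synthetic stand-ins drawn from normal distributions, and the seed, the means, and the output file name are illustrative choices rather than part of the lesson.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import numpy as np
from matplotlib import pyplot as plt

# Synthetic stand-in data (assumed for illustration only)
rng = np.random.default_rng(0)
a = rng.normal(loc=64, scale=2, size=10000)
b = rng.normal(loc=66, scale=2, size=10000)

fig, (ax_alpha, ax_step) = plt.subplots(1, 2, figsize=(10, 4))

# Option 1: semi-transparent bars, so the histogram behind stays visible
counts_a, edges, _ = ax_alpha.hist(a, range=(55, 75), bins=20, alpha=0.5)
ax_alpha.hist(b, range=(55, 75), bins=20, alpha=0.5)
ax_alpha.set_title("alpha=0.5")

# Option 2: draw only the outline of each histogram
ax_step.hist(a, range=(55, 75), bins=20, histtype="step")
ax_step.hist(b, range=(55, 75), bins=20, histtype="step")
ax_step.set_title("histtype='step'")

fig.savefig("overlapping_histograms.png")
```

Both calls in each panel use the same range and bins, so the bars of the two datasets line up and can be compared bin by bin.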

Another problem we face is that our histograms might have different numbers of samples, making one much bigger than the other. We can see how this makes it difficult to compare qualitatively, by adding a dataset b with a much bigger size value:

    from matplotlib import pyplot as plt
    from numpy.random import normal

    a = normal(loc=64, scale=2, size=10000)
    b = normal(loc=70, scale=2, size=100000)
    plt.hist(a, range=(55, 75), bins=20)
    plt.hist(b, range=(55, 75), bins=20)
    plt.show()

The result is two histograms that are very difficult to compare: different_hist

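To solve this, we can normalize our histograms using density=True. (Older Matplotlib versions called this keyword normed; it has since been removed, so current versions require density.) This option divides the height of each column by a constant such that the total shaded area of the histogram sums to 1.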

    a = normal(loc=64, scale=2, size=10000)
    b = normal(loc=70, scale=2, size=100000)
    plt.hist(a, range=(55, 75), bins=20, alpha=0.5, density=True)
    plt.hist(b, range=(55, 75), bins=20, alpha=0.5, density=True)
    plt.show()

Now, we can more easily see the differences between the blue set and the orange set: normalized_hist
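To check what normalization does to the bar heights, the area can be computed directly. This is a small sketch that uses np.histogram, which accepts the same range, bins, and density arguments, in place of plt.hist. The helper name histogram_area and the fixed seed are assumptions made for illustration.

```python
import numpy as np

# Synthetic datasets with very different sample sizes (seed chosen arbitrarily)
rng = np.random.default_rng(0)
a = rng.normal(loc=64, scale=2, size=10000)
b = rng.normal(loc=70, scale=2, size=100000)

def histogram_area(data):
    """Return the total area of a normalized histogram: sum of height * bin width."""
    heights, edges = np.histogram(data, range=(55, 75), bins=20, density=True)
    return float(np.sum(heights * np.diff(edges)))

area_a = histogram_area(a)
area_b = histogram_area(b)
print(area_a, area_b)
```

Both areas should come out as 1 up to floating-point rounding, even though b has ten times as many samples as a. That is why the normalized histograms can be compared on the same vertical scale.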

Instructions

1.

We’ve provided another dataset in the file sales_times_s2.csv that represents the 371 sales at MatplotSip’s second location from 8am to 10pm on the same day. This data has the same structure as the sales times data from store 1, with an id, a card_no, and a time. Take a look at the data in the CSV and familiarize yourself with it.

Using script.py, we’ve imported the times into a list called sales_times2. You can see how we did this in script.py, but you’ll only be interacting with the lists sales_times1 and sales_times2 in histogram.py, so don’t worry if you don’t understand the conversion from csv to list.

2.

Plot the histogram of times from the second location on top of the one from the last exercise.

3.

Notice that the histogram we plotted second completely obscures the first histogram we plotted. Modify the transparency value of both histograms to be 0.4 so that we can see the separate histograms better.

4.

Normalize both histograms so that we can compare the patterns between them despite the difference in sample size.
