# Statistics in NumPy

Learn how to analyze different statistical distributions using NumPy.

- **1.** You're a citizen scientist who has started collecting data about rising water in the river next to where you live. For months, you painstakingly measure the water levels and enter your findings int...

- **2.** The first statistical concept we'll explore is the *mean*, also commonly referred to as an average. The mean is a useful measure of the center of a dataset. NumPy has a built-in function to cal...

- **3.** We can also use [...] to calculate the percent of array elements that have a certain property. As we know, a logical operator will evaluate each item in an array to see if it matches the specif...
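
As a minimal sketch, NumPy's `np.mean` can compute both quantities described in steps 2 and 3. The readings in `water_levels` are hypothetical values chosen for illustration, not data from the lesson.

```python
import numpy as np

# Hypothetical water-level readings in centimeters (illustrative data only)
water_levels = np.array([102, 98, 110, 105, 95, 120, 101])

# np.mean adds the values and divides by the number of values
average_level = np.mean(water_levels)

# A comparison produces a boolean array; True counts as 1 and False as 0,
# so the mean of that array is the fraction of elements that satisfy it
fraction_above_100 = np.mean(water_levels > 100)

print(average_level)
print(fraction_above_100)
```

Multiplying `fraction_above_100` by 100 turns it into a percentage.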

- **4.** If we have a two-dimensional array, [...] can calculate the mean of the whole array as well as the means of the interior arrays. Let's imagine a game of ring toss at a carnival. In this game, you have thr...
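
A sketch of the two-dimensional case, assuming a hypothetical `ring_toss` array in which each row is one player and each column one toss (1 for a ring that lands, 0 for a miss). The `axis` argument of `np.mean` selects which dimension is averaged away.

```python
import numpy as np

# Hypothetical ring-toss results: one row per player, one column per toss
# (1 = ring landed, 0 = miss). Values are illustrative only.
ring_toss = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [1, 0, 1]])

overall_rate = np.mean(ring_toss)               # mean of every element
per_toss_rate = np.mean(ring_toss, axis=0)      # collapse rows: one mean per column (toss)
per_player_rate = np.mean(ring_toss, axis=1)    # collapse columns: one mean per row (player)

print(overall_rate, per_toss_rate, per_player_rate)
```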

- **5.** As we can see, the mean is a helpful way to quickly understand different parts of our data. However, the mean is highly influenced by the specific values in our data set. What happens when one of t...

- **6.** One way to quickly identify outliers is by sorting our data. Once our data is sorted, we can quickly glance at the beginning or end of an array to see if some values lie far beyond the expected ran...
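
A small illustration of sorting to spot an outlier, using hypothetical `readings`. `np.sort` returns a sorted copy, so the extreme value lands at the end, and dropping it shows how strongly one value can pull the mean.

```python
import numpy as np

# Hypothetical readings with one suspicious value (illustrative only)
readings = np.array([104, 99, 101, 350, 97, 103])

# np.sort returns a sorted copy; the original array is unchanged
sorted_readings = np.sort(readings)
print(sorted_readings)  # the extreme value ends up at the far end

# Compare the mean with and without the largest value
mean_all = np.mean(readings)
mean_without_max = np.mean(sorted_readings[:-1])
print(mean_all, mean_without_max)
```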

- **7.** Another key metric that we can use in data analysis is the *median*. The median is the middle value of a dataset that's been ordered in terms of magnitude (from lowest to highest). Let's look at ...

- **8.** In a dataset, the median value can provide an important comparison to the mean. Unlike the mean, the median is much less sensitive to outliers. This becomes important in *skewed* datasets, datasets whose va...

- **9.** As we know, the median is the middle of a dataset: it is the number for which 50% of the samples are below, and 50% of the samples are above. But what if we wanted to find a point at which 40% of t...
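
Both ideas map onto NumPy functions: `np.median` for the middle value and `np.percentile(a, q)` for any percentile, with `q` given on a 0 to 100 scale. The array below is hypothetical.

```python
import numpy as np

# Hypothetical readings (illustrative only); 350 is an outlier
readings = np.array([97, 99, 101, 103, 104, 350])

# With an even number of values, np.median averages the two middle values
median_level = np.median(readings)  # (101 + 103) / 2

# np.percentile(a, q): q is on a 0-100 scale, so 40 means the 40th percentile
p40 = np.percentile(readings, 40)

print(median_level, p40)
```

The outlier barely moves the median, while it would drag the mean well above 100.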

- **10.** Some percentiles have specific names:
  - The **25th percentile** is called the *first quartile*.
  - The **50th percentile** is called the *median*.
  - The **75th percentile** is called the *third quarti...

- **11.** While the mean and median can tell us about the center of our data, they do not reflect how spread out the data is. That's where *standard deviation* comes in. Similar to the interquartile range, the ...

- **12.** As we saw in the last exercise, knowing the standard deviation of a dataset can help us understand how spread out our dataset is. We can find the standard deviation of a dataset using the NumPy f...
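
A sketch of the quartile and spread calculations on a hypothetical `data` array. Note that `np.std` returns the population standard deviation by default (`ddof=0`); pass `ddof=1` if you need the sample standard deviation.

```python
import numpy as np

# Hypothetical dataset (illustrative only)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1, q3 = np.percentile(data, [25, 75])  # first and third quartiles
iqr = q3 - q1                           # interquartile range

# Population standard deviation (ddof=0 is the default)
spread = np.std(data)

print(q1, q3, iqr, spread)
```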

- **13.** Let's review! In this lesson, you learned how to use NumPy to analyze single-variable datasets. Here's what we covered:
  - Using the [...] method to locate outliers.
  - Calculating central positio...

- **1.** A university wants to keep track of the popularity of different programs over time, to ensure that programs are allocated enough space and resources. You work in the admissions office and are asked...

- **2.** When we first look at a dataset, we want to be able to quickly understand certain things about it:
  - Do some values occur more often than others?
  - What is the range of the dataset (i.e., the min ...

- **3.** Suppose we had a larger dataset with values ranging from 0 to 50. We might not want to know exactly how many 0s, 1s, 2s, etc. we have. Instead, we might want to know how many values fall betwee...
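
One way to count values per range without plotting is `np.histogram`, which returns the count in each bin together with the bin edges. The `values` array below is hypothetical.

```python
import numpy as np

# Hypothetical values between 0 and 50 (illustrative only)
values = np.array([3, 7, 12, 15, 18, 22, 35, 41, 49, 50])

# Five equal-width bins over [0, 50]; every bin is half-open [low, high)
# except the last, which also includes its right edge (50)
counts, edges = np.histogram(values, bins=5, range=(0, 50))

print(counts)  # number of values in each bin
print(edges)   # bin boundaries
```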

- **4.** We can graph histograms using a Python library known as *Matplotlib*. We're not going to go into detail about Matplotlib's plotting functions, but if you're interested in learning more, take our co...

- **5.** Histograms and their datasets can be classified based on the shape of the graphed values. In the next two exercises, we'll look at two different ways of describing histograms. One way to classify...

- **6.** Most of the datasets that we'll be dealing with will be unimodal (one peak). We can further classify unimodal distributions by describing where most of the numbers are relative to the peak. A *sym...
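
As a rough rule of thumb, not a guarantee, a long tail pulls the mean toward it while the median stays near the bulk of the data, so comparing the two hints at skew. The arrays below are hypothetical.

```python
import numpy as np

# Hypothetical data (illustrative only)
symmetric = np.array([1, 2, 3, 4, 5, 6, 7])
right_skewed = np.array([1, 2, 2, 3, 3, 3, 4, 10, 15])  # long tail to the right

# Mean and median coincide for the symmetric array
sym_mean, sym_median = np.mean(symmetric), np.median(symmetric)

# The tail pulls the mean (about 4.78) above the median (3.0)
skew_mean, skew_median = np.mean(right_skewed), np.median(right_skewed)

print(sym_mean, sym_median)
print(skew_mean, skew_median)
```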

- **7.** The most widely used distribution in statistics is the *normal distribution*, which is a symmetric, unimodal distribution. Many real-world quantities approximately follow a normal distribution:
  - The heights of a large...

- **8.** We can generate our own normally distributed datasets using NumPy. Using these datasets can help us better understand the properties and behavior of different distributions. We can also use them to...
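
A minimal sketch of generating normally distributed data. The lesson may use the older `np.random.normal(loc, scale, size)`; this version uses NumPy's `Generator` API, whose `normal` method takes the same parameters. The mean of 0, standard deviation of 1, seed, and sample size are arbitrary choices.

```python
import numpy as np

# Seeded Generator so results are reproducible
rng = np.random.default_rng(seed=42)

# loc = mean, scale = standard deviation, size = number of samples
samples = rng.normal(loc=0, scale=1, size=100_000)

# With many samples, these should be close to 0 and 1
print(np.mean(samples), np.std(samples))
```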

- **9.** In a normal distribution, we know that the mean and the standard deviation determine certain characteristics of the shape of our data, but how exactly? Let's do some exploration to find out!

- **10.** We know that the standard deviation affects the "shape" of our normal distribution. The last exercise helps give us a more quantitative understanding of this. Suppose that we have a normal dis...
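
One way to explore this quantitatively is to simulate a large normal sample and measure the fraction of values within 1, 2, and 3 standard deviations of the mean. For a normal distribution these fractions are roughly 68%, 95%, and 99.7%. The parameters below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

mu, sigma = 50, 10  # hypothetical mean and standard deviation
data = rng.normal(loc=mu, scale=sigma, size=200_000)

# Fraction of samples within k standard deviations of the mean
fractions = {k: np.mean(np.abs(data - mu) < k * sigma) for k in (1, 2, 3)}

for k, frac in fractions.items():
    print(f"within {k} std: {frac:.3f}")
```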

- **11.** It's known that a certain basketball player makes 30% of his free throws. In Friday night's game, he had the chance to shoot 10 free throws. How many free throws might you expect him to make? We ...

- **12.** There are some complicated formulas for determining these types of probabilities. Luckily for us, we can use NumPy, specifically its ability to generate random numbers. We can use these random nu...
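
A sketch of simulating the free-throw scenario with a binomial distribution, where `n` is the number of shots and `p` the chance of making each one. This uses `Generator.binomial`; the legacy equivalent is `np.random.binomial(n, p, size)`. The number of simulated games and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Simulate 10,000 games in which the player takes 10 free throws,
# each with a 30% chance of success
made_per_game = rng.binomial(n=10, p=0.30, size=10_000)

# The average should be close to the expected value 10 * 0.30 = 3
print(np.mean(made_per_game))
```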

- **13.** Let's return to our original question: Our basketball player has a 30% chance of making any individual basket. He took 10 shots and made 4 of them, even though we only expected him to make 3. Wh...
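
To estimate how unusual 4 made shots is, one approach is to simulate many 10-shot games and take the mean of a boolean comparison, the same technique used for percentages in the first lesson. The exact binomial probabilities are about 0.200 for exactly 4 and about 0.350 for 4 or more, so the estimates should land close to those values. The seed and number of games are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
made_per_game = rng.binomial(n=10, p=0.30, size=100_000)

# Fraction of simulated games with exactly 4 made shots
p_exactly_4 = np.mean(made_per_game == 4)

# Fraction of simulated games with 4 or more made shots
p_at_least_4 = np.mean(made_per_game >= 4)

print(p_exactly_4, p_at_least_4)
```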

- **14.** Let's review! In this lesson, you learned how to use NumPy to analyze different distributions and generate random numbers to produce datasets. Here's what we covered:
  - What is a histogram and how...
