# Statistics in NumPy

Learn how to analyze different statistical distributions using NumPy.

## Key Concepts

Review core concepts you need to learn to master this subject

NumPy’s Mean and Axis

Conditions in `numpy.mean()`

NumPy Percentile Function

NumPy’s Percentile and Quartiles

NumPy’s Sort Function

Definition of Percentile

Datasets and their Histograms

Normal Distribution Using the Python NumPy Module

NumPy’s Mean and Axis

We will use the following two-dimensional array for this example:

```py
import numpy as np

ring_toss = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [1, 0, 1]])
```

The code below calculates the average of each row:

```py
np.mean(ring_toss, axis=1)
# Output: array([0.33333333, 0.33333333, 0.66666667])
```

In a two-dimensional array, you may want the mean of each row or of each column rather than of the whole array. In Python, the NumPy `.mean()` function can be used to find these values. To find the mean of each row, set the `axis` parameter to 1. To find the mean of each column, set the `axis` parameter to 0.
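As a minimal sketch of the `axis` behavior described above, the snippet below reuses the ring toss array and computes the overall mean, the per-row means, and the per-column means. The variable names `overall_mean`, `row_means`, and `column_means` are illustrative choices, not part of the original lesson.

```python
import numpy as np

# Same ring toss array as in the example above.
ring_toss = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [1, 0, 1]])

# No axis argument: mean over all nine values.
overall_mean = np.mean(ring_toss)

# axis=1: average across the columns of each row, giving one value per row.
row_means = np.mean(ring_toss, axis=1)

# axis=0: average down the rows of each column, giving one value per column.
column_means = np.mean(ring_toss, axis=0)

print(overall_mean)
print(row_means)
print(column_means)
```

One way to remember the convention: the axis you pass is the one that gets averaged away, so `axis=1` leaves one result per row and `axis=0` leaves one result per column.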

- 1: You’re a citizen scientist who has started collecting data about rising water in the river next to where you live. For months, you painstakingly measure the water levels and enter your findings int…
- 2: The first statistical concept we’ll explore is *mean*, also commonly referred to as an average. The mean is a useful measurement to get the center of a dataset. NumPy has a built-in function to cal…
- 3: We can also use `np.mean` to calculate the percent of array elements that have a certain property. As we know, a logical operator will evaluate each item in an array to see if it matches the specif…
- 4: If we have a two-dimensional array, `np.mean` can calculate the means of the larger array as well as the interior values. Let’s imagine a game of ring toss at a carnival. In this game, you have thr…
- 6: One way to quickly identify outliers is by sorting our data. Once our data is sorted, we can quickly glance at the beginning or end of an array to see if some values lie far beyond the expected ran…
- 7: Another key metric that we can use in data analysis is the *median*. The median is the middle value of a dataset that’s been ordered in terms of magnitude (from lowest to highest). Let’s look at …
- 8: In a dataset, the median value can provide an important comparison to the mean. Unlike a mean, the median is not affected by outliers. This becomes important in *skewed* datasets, datasets whose va…
- 9: As we know, the median is the middle of a dataset: it is the number for which 50% of the samples are below, and 50% of the samples are above. But what if we wanted to find a point at which 40% of t…
- 10: Some percentiles have specific names:
  - The **25th percentile** is called the *first quartile*
  - The **50th percentile** is called the *median*
  - The **75th percentile** is called the *third quarti…
- 11: While the mean and median can tell us about the center of our data, they do not reflect the range of the data. That’s where *standard deviation* comes in. Similar to the interquartile range, the …
- 12: As we saw in the last exercise, knowing the standard deviation of a dataset can help us understand how spread out our dataset is. We can find the standard deviation of a dataset using the NumPy f…
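The lessons listed above touch on conditions inside `np.mean`, sorting, the median, percentiles, and standard deviation. The following is a rough sketch of how those NumPy calls might be used together. The `water_levels` array and its values are hypothetical, invented only for illustration, and do not come from the course.

```python
import numpy as np

# Hypothetical river measurements (made-up values, one obvious outlier).
water_levels = np.array([4.2, 3.9, 4.0, 9.5, 4.1, 3.8, 4.3])

# The mean of a boolean array is the fraction of elements meeting the condition.
share_above_4 = np.mean(water_levels > 4.0)

# Sorting puts unusually small or large readings at the ends of the array.
sorted_levels = np.sort(water_levels)

# The mean is pulled toward the outlier; the median is not.
mean_level = np.mean(water_levels)
median_level = np.median(water_levels)

# Quartiles are the 25th and 75th percentiles; their difference is the
# interquartile range.
first_quartile = np.percentile(water_levels, 25)
third_quartile = np.percentile(water_levels, 75)
interquartile_range = third_quartile - first_quartile

# Standard deviation summarizes how spread out the readings are.
spread = np.std(water_levels)

print(share_above_4, median_level, mean_level)
print(sorted_levels)
print(first_quartile, third_quartile, interquartile_range, spread)
```

In this made-up dataset the single large reading raises the mean above the median, which is the kind of skew the median lesson describes. Note that `np.std` computes the population standard deviation by default; pass `ddof=1` if a sample standard deviation is wanted.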

## What you'll create

Portfolio projects that showcase your new skills

## How you'll master it

Stress-test your knowledge with quizzes that help commit syntax to memory