Statistics in NumPy
Learn how to analyze different statistical distributions using NumPy.
Key Concepts
Review core concepts you need to learn to master this subject
NumPy’s Mean and Axis
Conditions in NumPy .mean()
NumPy Percentile Function
NumPy’s Percentile and Quartiles
NumPy’s Sort Function
Definition of Percentile
Datasets and their Histograms
Normal Distribution Using the Python NumPy Module
NumPy’s Mean and Axis
We will use the following 2-dimensional array for this example:
```py
import numpy as np

ring_toss = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [1, 0, 1]])
```
The code below will calculate the average of each row.
```py
np.mean(ring_toss, axis=1)
# Output: array([ 0.33333333, 0.33333333, 0.66666667])
```
In a two-dimensional array, you may want the mean of each row or of each column rather than the mean of the whole array. The NumPy .mean() function takes an axis parameter for this: set axis to 1 to get the mean of each row, and set axis to 0 to get the mean of each column.
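As a complement to the row example, here is a minimal sketch of the column case and the whole-array case, using the same ring_toss array. The variable names row_means, column_means, and overall_mean are chosen only for illustration.

```python
import numpy as np

ring_toss = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [1, 0, 1]])

# axis=1 averages across the columns of each row: one value per row
row_means = np.mean(ring_toss, axis=1)

# axis=0 averages down the rows of each column: one value per column
column_means = np.mean(ring_toss, axis=0)

# With no axis argument, the mean is taken over every element
overall_mean = np.mean(ring_toss)

print(row_means)     # roughly [0.33, 0.33, 0.67]
print(column_means)  # roughly [0.67, 0.0, 0.67]
print(overall_mean)  # roughly 0.44
```

A quick way to remember the parameter: axis names the dimension that gets collapsed, so axis=0 collapses the rows and leaves one value per column.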
- 1. You’re a citizen scientist who has started collecting data about rising water in the river next to where you live. For months, you painstakingly measure the water levels and enter your findings int…
- 2. The first statistical concept we’ll explore is mean, also commonly referred to as an average. The mean is a useful measurement to get the center of a dataset. NumPy has a built-in function to cal…
- 3. We can also use np.mean to calculate the percent of array elements that have a certain property. As we know, a logical operator will evaluate each item in an array to see if it matches the specif…
- 4. If we have a two-dimensional array, np.mean can calculate the means of the larger array as well as the interior values. Let’s imagine a game of ring toss at a carnival. In this game, you have thr…
- 6. One way to quickly identify outliers is by sorting our data. Once our data is sorted, we can quickly glance at the beginning or end of an array to see if some values lie far beyond the expected ran…
- 7. Another key metric that we can use in data analysis is the median. The median is the middle value of a dataset that’s been ordered in terms of magnitude (from lowest to highest). Let’s look at …
- 8. In a dataset, the median value can provide an important comparison to the mean. Unlike a mean, the median is not affected by outliers. This becomes important in skewed datasets, datasets whose va…
- 9. As we know, the median is the middle of a dataset: it is the number for which 50% of the samples are below, and 50% of the samples are above. But what if we wanted to find a point at which 40% of t…
- 10. Some percentiles have specific names: the 25th percentile is called the first quartile, the 50th percentile is called the median, and the 75th percentile is called the third quarti…
- 11. While the mean and median can tell us about the center of our data, they do not reflect the range of the data. That’s where standard deviation comes in. Similar to the interquartile range, the …
- 12. As we saw in the last exercise, knowing the standard deviation of a dataset can help us understand how spread out our dataset is. We can find the standard deviation of a dataset using the NumPy f…
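The functions these lessons cover can be sketched together in one short example. This is only an illustration, not the course's own exercise code: the water_levels readings and the 3.5 threshold are invented values, and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical water-level readings; 30.0 is a deliberate outlier
water_levels = np.array([4.0, 2.0, 3.0, 5.0, 1.0, 30.0, 3.0])

# A comparison yields booleans; their mean is the fraction that are True
share_above = np.mean(water_levels > 3.5)

# Sorting puts unusually small or large values at the ends of the array
sorted_levels = np.sort(water_levels)

# The mean is pulled toward the outlier; the median is not
mean_level = np.mean(water_levels)
median_level = np.median(water_levels)

# First and third quartiles, and the interquartile range between them
q1, q3 = np.percentile(water_levels, [25, 75])
iqr = q3 - q1

# Standard deviation (np.std uses the population formula by default)
spread = np.std(water_levels)

print(share_above)               # 3 of 7 readings, roughly 0.43
print(sorted_levels)             # outlier 30.0 appears last
print(mean_level, median_level)  # roughly 6.86 versus 3.0
print(q1, q3, iqr)               # 2.5 4.5 2.0
print(spread)
```

Note how the single outlier lifts the mean well above the median while leaving the quartiles almost untouched, which is why the median and interquartile range are often preferred for skewed data.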