Python NumPy Tutorial

In this lesson on the Python NumPy library, we will look at how it lets us work with N-dimensional array objects and at the functions it provides to manipulate and operate on these arrays. To make this lesson complete, we will cover the following sections:

  • What is Python NumPy package?
  • NumPy arrays
  • Different operations which can be done over NumPy arrays
  • Some more special functions

What is Python NumPy package?

Simply put, NumPy stands for ‘Numerical Python’, and that is what it aims to deliver: it lets us perform complex numerical operations on N-dimensional array objects easily and intuitively. It is the core library used in scientific computing with Python, with functions for linear algebra and statistical operations.

One of the most fundamental (and attractive) concepts in NumPy is its use of N-dimensional array objects. We can think of a two-dimensional array as a collection of rows and columns, just like an MS-Excel sheet. It is possible to convert a Python list into a NumPy array and apply functions to it.

NumPy Array representation

Just a note before starting, we use a virtual environment for this lesson which we made with the following command:

python -m virtualenv numpy
source numpy/bin/activate

Once the virtual environment is active, we can install the NumPy library inside it so that the examples we create next can be executed:

pip install numpy

When we execute the above command, pip downloads NumPy and reports that it was installed successfully.

Let’s quickly test if the NumPy package has been installed correctly with the following short code snippet:

import numpy as np
a = np.array([1,2,3])
print(a)

Once you run the above program, you should see the following output:

[1 2 3]

We can also have multi-dimensional arrays with NumPy:

multi_dimension = np.array([(1, 2, 3), (4, 5, 6)])
print(multi_dimension)

This will produce an output like:

[[1 2 3]
 [4 5 6]]

You can also use Anaconda to run these examples, which is easier, and that is what we have used above. If you want to install it on your machine, look at the lesson which describes “How to Install Anaconda Python on Ubuntu 18.04 LTS” and share your feedback. Now, let us move on to the various types of operations that can be performed with Python NumPy arrays.

Using NumPy arrays over Python lists

It is fair to ask: when Python already has a sophisticated data structure to hold multiple items, why do we need NumPy arrays at all? NumPy arrays are preferred over Python lists for the following reasons:

  • They are convenient for mathematical and compute-intensive operations, thanks to the NumPy functions built to work on them
  • They are much faster due to the way they store data internally
  • They use less memory
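The speed claim can be checked with a rough timing sketch. The array size and repetition count below are arbitrary values picked for illustration, and the measured times will vary from machine to machine, so treat the numbers as indicative only:

```python
import timeit

import numpy as np

n = 100_000  # illustrative size, not taken from the lesson
python_list = list(range(n))
numpy_arr = np.arange(n, dtype=np.int64)

# Square every element: a Python-level loop vs. one vectorized expression
list_result = [x * x for x in python_list]
numpy_result = numpy_arr * numpy_arr

# Time 20 repetitions of each approach
list_time = timeit.timeit(lambda: [x * x for x in python_list], number=20)
numpy_time = timeit.timeit(lambda: numpy_arr * numpy_arr, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy expression:   {numpy_time:.4f} s")
```

Both approaches produce the same values; the NumPy version avoids running the loop in the Python interpreter, which is typically where the time goes.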

Let us prove that NumPy arrays occupy less memory. This can be done by writing a very simple Python program:

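import sys

import numpy as np

# Approximate a 500-item list as 500 separate Python int objects
python_list = list(range(500))
print(sys.getsizeof(1) * len(python_list))

# A NumPy array stores its 500 items in one contiguous buffer
numpy_arr = np.arange(500)
print(numpy_arr.size * numpy_arr.itemsize)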

When we run the above program, we will get the following output:

14000
4000

This shows that the Python list takes 3.5 times as much memory as a NumPy array holding the same 500 items.
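The estimate above counts only the integer objects. As a slightly fuller sketch, we can also count the list container itself, which stores one pointer per item, and read the array's buffer size directly from its nbytes attribute. The exact byte counts depend on the Python version and platform, so no output is shown here:

```python
import sys

import numpy as np

python_list = list(range(500))
numpy_arr = np.arange(500)

# A list holds pointers to separate int objects, so count both parts
list_container = sys.getsizeof(python_list)
list_items = sum(sys.getsizeof(item) for item in python_list)
print(list_container + list_items)

# nbytes is the size of the array's data buffer: size * itemsize
print(numpy_arr.nbytes)
```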

Performing NumPy operations

In this section, let us quickly glance over the operations that can be performed on NumPy arrays.

Finding dimensions in array

A NumPy array can have any number of dimensions. We can find the number of dimensions of an array with the following code snippet:

import numpy as np

numpy_arr = np.array([(1,2,3),(4,5,6)])
print(numpy_arr.ndim)

We will see the output as “2” as this is a 2-dimensional array.
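Alongside ndim, the shape attribute reports the length of each dimension. A small sketch with three illustrative arrays (the names are made up for this example):

```python
import numpy as np

one_d = np.array([1, 2, 3])
two_d = np.array([(1, 2, 3), (4, 5, 6)])
three_d = np.zeros((2, 3, 4))  # 2 blocks of 3 rows and 4 columns

for arr in (one_d, two_d, three_d):
    # ndim is the number of axes; shape gives the length of each axis
    print(arr.ndim, arr.shape)

# 1 (3,)
# 2 (2, 3)
# 3 (2, 3, 4)
```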

Finding datatype of items in array

A NumPy array can hold many data types, but all items in one array share a single type. Let’s now find out the data type of the data an array contains:

other_arr = np.array([('awe', 'b', 'cat')])
print(other_arr.dtype)

numpy_arr = np.array([(1,2,3),(4,5,6)])
print(numpy_arr.dtype)

We used different types of elements in the above code snippet. Here is the output this script will show:

<U3
int64

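The first dtype, <U3, denotes Unicode strings of at most 3 characters, which is the length of the longest strings in the array ('awe' and 'cat'). The second, int64, is the default integer type on most 64-bit platforms.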
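NumPy normally infers the dtype from the data, but we can also request one explicitly or convert afterwards. A minimal sketch, with variable names chosen only for illustration:

```python
import numpy as np

# Ask for a specific dtype when creating the array
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float.dtype)  # float64

# astype returns a converted copy of an existing array
as_int32 = as_float.astype(np.int32)
print(as_int32.dtype)  # int32

# Mixing integers and floats upcasts every item to a float type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```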

Reshape items of an array

If a NumPy array consists of 2 rows and 4 columns, it can be reshaped to contain 4 rows and 2 columns. Let’s write a simple code snippet for the same:

original = np.array([('1', 'b', 'c', '4'), ('5', 'f', 'g', '8')])
print(original)
reshaped = original.reshape(4, 2)
print(reshaped)

Once we run the above code snippet, we will get the following output with both arrays printed to the screen:

[['1' 'b' 'c' '4']
 ['5' 'f' 'g' '8']]

[['1' 'b']
 ['c' '4']
 ['5' 'f']
 ['g' '8']]

Note how NumPy filled the new rows with the elements in their original row-by-row order.
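Two details are worth a quick sketch: passing -1 lets NumPy work out one dimension for us, and a reshape only succeeds when the total number of items stays the same. The array here is an illustrative one, not the array from the example above:

```python
import numpy as np

original = np.arange(8)  # [0 1 2 3 4 5 6 7]

# -1 asks NumPy to infer that dimension: 8 items / 4 rows = 2 columns
reshaped = original.reshape(4, -1)
print(reshaped.shape)  # (4, 2)

# 8 items cannot fill a 3 x 3 array, so this raises ValueError
try:
    original.reshape(3, 3)
except ValueError as err:
    print("reshape failed:", err)
```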

Mathematical operations on items of an array

Performing mathematical operations on the items of an array is very simple. We will start by writing a simple code snippet to find the maximum, minimum, sum and mean of all items of the array. Here is the code snippet:

numpy_arr = np.array([(1, 2, 3, 4, 5)])
print(numpy_arr.max())
print(numpy_arr.min())
print(numpy_arr.sum())
print(numpy_arr.mean())
print(np.sqrt(numpy_arr))
print(np.std(numpy_arr))

In the last 2 operations above, we also calculated the square root of each array item and the standard deviation of the whole array. The above snippet will provide the following output:

5
1
15
3.0
[[1.         1.41421356 1.73205081 2.         2.23606798]]
1.4142135623730951
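On arrays with more than one dimension, these functions accept an axis argument so that the result is computed per column or per row instead of over every item. A short sketch on an illustrative 2 by 3 array:

```python
import numpy as np

grid = np.array([(1, 2, 3), (4, 5, 6)])

# axis=0 collapses the rows, giving one result per column
column_sums = grid.sum(axis=0)
print(column_sums)  # [5 7 9]

# axis=1 collapses the columns, giving one result per row
row_sums = grid.sum(axis=1)
print(row_sums)  # [ 6 15]

row_max = grid.max(axis=1)
print(row_max)  # [3 6]
```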

Converting Python lists to NumPy arrays

Even if you have been using Python lists in your existing programs and don’t want to change all of that code, you can still use NumPy arrays in new code, because a Python list is easily converted to a NumPy array. Here is an example:

# Create 2 new lists height and weight
height = [2.37,  2.87, 1.52, 1.51, 1.70, 2.05]
weight = [91.65, 97.52, 68.25, 88.98, 86.18, 88.45]

# Create 2 numpy arrays from height and weight
np_height = np.array(height)
np_weight = np.array(weight)

Just to check, we can now print out the type of one of the variables:

print(type(np_height))

And this will show:

<class 'numpy.ndarray'>

We can now perform a mathematical operation over all the items at once. Let’s see how we can calculate the BMI of the people:

# Calculate bmi
bmi = np_weight / np_height ** 2

# Print the result
print(bmi)

This will show the BMI of all the people calculated element-wise:

[16.31682957 11.8394056  29.54033934 39.02460418 29.8200692  21.04699584]

Isn’t that easy and handy? We can even filter data easily with a condition in place of an index inside square brackets:

bmi[bmi > 25]

This will give:

array([29.54033934, 39.02460418, 29.8200692 ])
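The condition inside the brackets is itself an array of booleans, so conditions can be combined with & (and) or | (or), and the same mask can filter a different array of the same length. A sketch reusing the height and weight data from above; the 25 to 30 range is only an illustrative threshold:

```python
import numpy as np

np_height = np.array([2.37, 2.87, 1.52, 1.51, 1.70, 2.05])
np_weight = np.array([91.65, 97.52, 68.25, 88.98, 86.18, 88.45])
bmi = np_weight / np_height ** 2

# Each comparison yields a boolean array; parentheses are required around them
mask = (bmi > 25) & (bmi < 30)
print(mask)  # [False False  True False  True False]

# The same mask selects the matching entries of another array
print(np_height[mask])  # [1.52 1.7 ]
```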

Create random sequences & repetitions with NumPy

NumPy has many features for creating data and arranging it in a required form, so NumPy arrays are often used to generate test datasets when debugging and testing. For example, to create an array of the integers from 0 up to n - 1, we can use the arange function (note the single ‘r’), as in the given snippet:

print(np.arange(5))

This will return the output as:

[0 1 2 3 4]

The same function accepts a lower bound, so that the array starts from a number other than 0:

print(np.arange(4, 12))

This will return the output as:

[ 4  5  6  7  8  9 10 11]

The numbers need not be consecutive; they can skip by a fixed step, like:

print(np.arange(4, 14, 2))

This will return the output as:

[ 4  6  8 10 12]

We can also get the numbers in decreasing order with a negative step value:

print(np.arange(14, 4, -1))

This will return the output as:

[14 13 12 11 10  9  8  7  6  5]

It is possible to find n numbers between x and y with equal spacing using the linspace function. Here is the code snippet for the same:

np.linspace(start=10, stop=70, num=10, dtype=int)

This will return the output as:

array([10, 16, 23, 30, 36, 43, 50, 56, 63, 70])

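Please note that the output items are not exactly equally spaced. The evenly spaced values are fractional (the step is 60 / 9, about 6.67), and because we asked for dtype=int they are converted to integers, so the gaps vary between 6 and 7. Do not rely on integer output from linspace being evenly spaced.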
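To see where the uneven gaps come from, we can compare the default floating-point result with the integer one. This is a small sketch using np.diff to compute the gap between neighbouring items:

```python
import numpy as np

# Without dtype=int, linspace returns evenly spaced floats
points = np.linspace(start=10, stop=70, num=10)
print(points)

# Every gap equals (70 - 10) / (10 - 1), up to floating-point error
gaps = np.diff(points)
print(np.allclose(gaps, 60 / 9))  # True

# Converting to integers discards the fractional parts, so the gaps differ
as_int = np.linspace(start=10, stop=70, num=10, dtype=int)
print(np.diff(as_int))
```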

Finally, let us look at how we can generate random numbers with NumPy, one of its most used features for testing purposes. We pass NumPy a lower and an upper bound for the random numbers:

print(np.random.randint(0, 10, size=[2,2]))

The above snippet creates a 2 by 2 NumPy array of random integers from 0 up to, but not including, 10. Here is a sample output:

[[0 4]
 [8 3]]

Please note that as the numbers are random, the output can differ between two runs, even on the same machine.
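When a test needs the same "random" data on every run, the generator can be seeded first. A minimal sketch; the seed value 42 is an arbitrary choice:

```python
import numpy as np

# Seeding fixes the sequence of numbers the generator will produce
np.random.seed(42)
first = np.random.randint(0, 10, size=[2, 2])

# Re-seeding with the same value replays the same sequence
np.random.seed(42)
second = np.random.randint(0, 10, size=[2, 2])

print(np.array_equal(first, second))  # True
```

Newer NumPy versions also offer np.random.default_rng(seed), which returns a separate generator object instead of changing global state.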

Conclusion

In this lesson, we looked at various aspects of this computing library, which we can use with Python to solve simple as well as complex mathematical problems that arise in various use cases. NumPy is one of the most important computation libraries when it comes to data engineering and working with numerical data, and it is definitely a skill we need to have under our belt.

Please share your feedback on the lesson on Twitter with @sbmaggarwal and @LinuxHint.

About the author

Shubham Aggarwal

I’m a Java EE Engineer with about 4 years of experience in building quality products. I have excellent problem-solving skills in Spring Boot, Hibernate ORM, AWS, Git, Python and I am an emerging Data Scientist.