NumPy Array of Strings

A string is a sequence of characters. A string array works like any other array, except that each element stores a string of a fixed maximum length. Like many other languages and libraries, NumPy supports character string arrays, and it offers a simple and practical way to store both byte strings and Unicode strings. The constructor covered in this article can build the array on top of an existing buffer. If the buffer is None, it creates a new array with strides in “C order”, unless the shape has at least two dimensions and order='F' is passed, in which case the strides are in “Fortran order”.

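Note that “C” and “Fortran” here name memory layouts borrowed from the two programming languages: C order is row-major and Fortran order is column-major. For this purpose, NumPy provides the numpy.chararray class. The difference between creating an array with numpy.chararray() and creating a regular array with a string datatype is that this class adds a few conveniences. When values are indexed, chararray automatically removes the whitespace at the end. In the same way, the comparison operators ignore trailing whitespace when they compare values. The class also exposes vectorized string operations as methods (such as endswith) and as infix operators (such as + and *). The NumPy documentation notes that chararray exists mainly for backward compatibility with Numarray and recommends regular arrays of string datatype together with the numpy.char functions for new code.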
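The whitespace handling described above can be seen in a small sketch. It assumes that np.char.chararray refers to the same class as np.chararray (the top-level name may emit a deprecation warning in newer NumPy releases). The names plain and chars are made up for illustration.

```python
import numpy as np

# A regular bytes array: each element holds up to 4 bytes.
plain = np.array([b"ab  ", b"cd"], dtype="S4")

# View the same data as a chararray (no copy is made).
chars = plain.view(np.char.chararray)

print(repr(plain[0]))   # regular array: trailing spaces are kept
print(repr(chars[0]))   # chararray: trailing spaces are stripped on indexing

# chararray comparison ignores trailing whitespace, so b"ab  " matches b"ab".
print(chars == b"ab")
```

Only the view differs here. The underlying bytes are the same in both arrays, so the stripping happens when a value is read, not when it is stored.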

Syntax

The syntax to use the numpy.chararray() is as follows:

class numpy.chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0, strides=None, order=None)

Now, let us discuss the arguments that are passed to the function.

shape: It specifies the shape of the array, as an integer or a tuple of integers. It is the only required parameter.
itemsize: It is an optional integer parameter that sets the length of each array element in characters. The default is 1.
unicode: It is an optional Boolean parameter that tells whether the elements are Unicode strings (True) or byte strings (False). The default is False.
buffer: It is an optional parameter: an object that exposes the buffer interface and supplies the memory where the array data starts. If it is None, a new array is created.
offset: It is an optional parameter that gives the offset of the array data from the start of the buffer. The default is 0.
strides: It is an optional parameter that sets the strides of the array (see ndarray.strides). The default is None.
order: It is an optional parameter that sets the memory layout: “C” for row-major or “F” for column-major.
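As a minimal sketch of how these arguments interact, the snippet below builds a byte-string array, a Unicode array, and a column-major array. It assumes np.char.chararray is the same class as np.chararray. The variable names byte_arr, text_arr, and f_arr are hypothetical.

```python
import numpy as np

# Pass the shape and an itemsize; the other arguments use their defaults.
byte_arr = np.char.chararray((2, 3), itemsize=5)                 # byte strings
text_arr = np.char.chararray((2, 3), itemsize=5, unicode=True)   # Unicode strings

byte_arr[:] = b"hello"
text_arr[:] = "hello world"   # longer than itemsize, so it is truncated

print(byte_arr.dtype.kind, byte_arr.dtype.itemsize)   # kind and bytes per element
print(text_arr[0, 0])                                 # only the first 5 characters remain

# order="F" requests a column-major layout for a two-dimensional array.
f_arr = np.char.chararray((2, 3), order="F")
print(f_arr.flags.f_contiguous)
```

Because every element has the same fixed length, a value longer than itemsize is cut off silently, so it is worth choosing itemsize for the longest string you expect to store.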

Example 1

To understand numpy.chararray in more detail, let’s discuss an example. In the following example, after importing the NumPy library, we create a variable str_arr and assign it the result of calling numpy.chararray. We pass only one argument, (4, 5), which is the shape of our array. The other parameters are optional, so the call works without them and uses their default values.

In the next line, we assign the character “a” to every element of str_arr with the slice str_arr[:]. Because itemsize defaults to 1, each element has room for exactly one character. Finally, we print str_arr to see what it holds.

import numpy as np
str_arr = np.chararray((4, 5))
str_arr[:] = 'a'
print(str_arr)

Running the code prints a two-dimensional array in which every element is b'a'. Let’s discuss why. Because we passed (4, 5) as the shape, the function creates a character array with 4 rows and 5 columns, for a total of 20 elements. The assignment str_arr[:] = 'a' then stores the character “a” in each of those elements. The b prefix in the output appears because unicode defaults to False, so the elements are stored as byte strings.
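The row, column, and element counts discussed above can be checked directly. This is a short sketch that repeats Example 1, assuming np.char.chararray is the same class as np.chararray.

```python
import numpy as np

str_arr = np.char.chararray((4, 5))
str_arr[:] = 'a'

print(str_arr.shape)    # (rows, columns)
print(str_arr.size)     # total number of elements
print(str_arr[0, 0])    # a single element, stored as a byte string
```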

Example 2

In the previous example, we saw how the chararray function works. In this example, we check whether a string array can be converted (typecast) to another datatype. To check, we use two variables, str_array and int_arr. As the names suggest, str_array stores the string array and int_arr stores the integer array. We pass 5 to our function, which means that our array is one-dimensional and has five elements.

We pass the numbers as byte strings so that the array stores them as characters. After that, we create a regular array with np.array, pass our string array to it, and pass np.int32 as its datatype. Now, we execute our code to check whether it converts our string array to an integer array.

import numpy as np
str_array = np.chararray(5)
str_array[:] = [b'1', b'0', b'1', b'0', b'1']
int_arr = np.array(str_array, dtype=np.int32)
print(str_array)
print(int_arr)

Running the code prints both arrays so that we can compare them. The first output is the string array: [b'1' b'0' b'1' b'0' b'1']. Every element carries a b prefix and is enclosed in single quotes because the elements are byte strings. So, from the first output, we can say that our string array stores the numbers in string format. Now, let’s look at the second output.

The second output is [1 0 1 0 1]. The numbers are the same as in the first array, but they are not enclosed in quotation marks, because integers are printed without quotes. So, by looking at our output, we can say that we successfully changed the type of our array from string to integer.
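The same conversion can be sketched for multi-digit numbers. This variation is not part of the original example: it assumes that itemsize is raised to 3 so that each element can hold up to three characters, and that np.char.chararray is the same class as np.chararray.

```python
import numpy as np

# itemsize=3 leaves room for up to three characters per element.
str_array = np.char.chararray(3, itemsize=3)
str_array[:] = [b'12', b'7', b'300']

# np.array returns a regular integer array parsed from the byte strings.
int_arr = np.array(str_array, dtype=np.int32)

print(int_arr.dtype)
print(int_arr + 1)   # arithmetic now works element by element
```

Once the values are integers, numeric operations such as addition apply to them, which is not the case while they are stored as strings.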

Conclusion

In this tutorial, we briefly discussed string arrays in NumPy. Arrays can hold many kinds of data, such as integers and characters. We took a look at the numpy.chararray() function of the NumPy library and explored the behavior of string arrays through two examples. We also typecast an array from string to int successfully. There are many other ways to store and operate on string arrays in NumPy, but we focused on np.chararray, which provides a convenient view of arrays of string and Unicode values.

About the author

Omar Farooq