pandas Array to DataFrame

This pandas tutorial will teach you how to turn a NumPy array into a pandas DataFrame object. There are many ways to create a data frame. For instance, a pandas data frame can be produced from a dictionary. It is very simple to convert a 2-dimensional array into a data frame because pandas data frame objects are already 2-dimensional data structures. The pd.DataFrame() constructor is used to convert a NumPy array, similar to converting a dictionary.
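For comparison, here is a minimal sketch of both routes mentioned above. The variable names (`data_dict`, `df_from_dict`, `df_from_array`) and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a dictionary maps column names to lists of values.
data_dict = {'num': [1, 2], 'chr': ['a', 'b']}
df_from_dict = pd.DataFrame(data_dict)

# A 2-dimensional NumPy array goes through the same constructor.
df_from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]))

print(df_from_dict)
print(df_from_array)
```

With a dictionary, the keys become the column names; with an array, no names are available, so pandas assigns default integer labels.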

What is a NumPy Array in Python?

Multidimensional arrays store values along more than one dimension. A two-dimensional array can be compared to a matrix, or to a grid in which each box holds a value, while a 3D array can be compared to a cube. In Python, multidimensional arrays are often built with the NumPy module. Storing data this way makes it simpler to arrange large volumes of data in a manageable form. If necessary, you can also change an array’s data type, for example to integer.
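The following sketch illustrates these ideas with made-up values: a 2-dimensional array, a 3-dimensional array, and a data type conversion with `astype()`. The names `matrix`, `cube`, and `as_int` are chosen only for this illustration.

```python
import numpy as np

# A 2-dimensional array (a matrix, or grid of values).
matrix = np.array([[1.5, 2.5], [3.5, 4.5]])

# A 3-dimensional array (a "cube") filled with zeros.
cube = np.zeros((2, 2, 2))

# Convert the float array to integers; the fractional part is dropped.
as_int = matrix.astype(int)

print(matrix.ndim, cube.ndim)  # 2 3
print(as_int)
```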

Similar to an Excel sheet, a data frame is a table with columns and rows. In most cases, each row holds one observation and the columns hold the variables that describe it. In short, a pandas data frame is a two-dimensional table.

Syntax of DataFrame() constructor:

df = pd.DataFrame(numpy_array, index=['a','b'], columns=['num','chr'])

data: The first parameter, data, is the only one necessary. This is where you pass the array that you want to turn into a data frame. Note that other inputs also work here, for example data stored in a Python dictionary.

index: The row labels, given as an Index or array-like object. If we don’t use this parameter, a RangeIndex (0, 1, 2, …, n) is used by default.

columns: The column labels, given as an Index or array-like object. Again, a RangeIndex (0, 1, 2, …, n) is used by default if we don’t use this argument.
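To make the defaults concrete, here is a small sketch that calls the constructor with and without the optional parameters. The 2x2 array and its values are hypothetical; the labels mirror the syntax line above.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 array used only for illustration.
numpy_array = np.array([[1, 2], [3, 4]])

# Only the data argument: rows and columns both get a default RangeIndex.
df_default = pd.DataFrame(numpy_array)

# Explicit row and column labels.
df_labeled = pd.DataFrame(numpy_array, index=['a', 'b'], columns=['num', 'chr'])

print(df_default.index)    # a RangeIndex
print(df_default.columns)  # a RangeIndex
print(df_labeled)
```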

How to Create a pandas Data Frame From a NumPy Array?

To create a pandas data frame from a NumPy array, follow these steps:

  1. The NumPy and pandas modules will be imported in the first step.
  2. After importing the modules, we will create a NumPy array.
  3. Then, for the data frame, we will make a list of the index values and column values.
  4. In the fourth step, we will create the data frame.
  5. In the final step, we will display the data frame.
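The five steps above can be sketched as follows. The array values and the label names (`row_labels`, `col_labels`) are assumptions made for this illustration.

```python
# Step 1: import the NumPy and pandas modules.
import numpy as np
import pandas as pd

# Step 2: create a NumPy array (made-up values).
arr = np.array([[10, 20, 30], [40, 50, 60]])

# Step 3: make lists of the index values and column values.
row_labels = ['row1', 'row2']
col_labels = ['A', 'B', 'C']

# Step 4: create the data frame.
df = pd.DataFrame(arr, index=row_labels, columns=col_labels)

# Step 5: display the data frame.
print(df)
```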

In this article, we’ll go over a few examples showing how to convert the array into a data frame in Python.

Example 1: Converting the NumPy Array Into a pandas Data Frame

Let’s first create an array. After importing NumPy, we pass a nested list to np.array() to form a 2-dimensional array.
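A minimal sketch of this step is shown here. The original values are not available, so the numbers in the nested list are made up.

```python
import numpy as np

# Each inner list becomes one row of the 2-dimensional array.
n_array = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

print(n_array.shape)  # (3, 3)
print(n_array.ndim)   # 2
```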

We imported the NumPy module and created an array named “n_array” from nested lists. Next, we import the pandas module into our environment and pass the array to the DataFrame constructor.

Here, the DataFrame class is called with only the data parameter, using the array (n_array) created earlier. As a result, pandas assigns default integer labels to both the rows and the columns.
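A sketch of the whole of Example 1, with a made-up 3x3 array standing in for the original values:

```python
import numpy as np
import pandas as pd

# Hypothetical array; the values are for illustration only.
n_array = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])

# Only the data parameter is passed.
df = pd.DataFrame(n_array)
print(df)
# Output:
#    0  1  2
# 0  1  2  3
# 1  4  5  6
# 2  7  8  9
```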

You can see that if the columns argument is not used, the column names are numbers. This is often undesirable because later analysis is difficult when you don’t know which variables the numbers represent. You can either rename the columns of the data frame afterward or set the names when the data frame is created.
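Renaming after creation could look like the sketch below, which uses the `rename()` method with a mapping from old labels to new ones. The array values and the new names `x`, `y`, and `z` are hypothetical.

```python
import numpy as np
import pandas as pd

# Data frame with default numeric column names 0, 1, 2.
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))

# rename() returns a new data frame; the original is left unchanged.
renamed = df.rename(columns={0: 'x', 1: 'y', 2: 'z'})

print(renamed)
```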

Example 2: Array to Data Frame Conversion With Column Names

As in the first example, we first create an array from nested lists.

After creating the array, we convert it to a data frame and define the column names in the same step by passing a list of names to the columns parameter.
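A sketch of Example 2, assuming a four-row, two-column array with made-up values and the hypothetical column names `first` and `second`:

```python
import numpy as np
import pandas as pd

# Hypothetical array with four rows and two columns.
n_array = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# One name per column of the array.
df = pd.DataFrame(n_array, columns=['first', 'second'])

print(df)
```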

It is essential to understand that the list passed to the columns parameter must have as many entries as the array has columns. For instance, since our NumPy array now has two columns, we must supply two column names. If we had a three-column array, we would need a list of three names.
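The sketch below shows what happens when the lengths do not match: pandas raises a `ValueError`. The array values and column names are made up, and `mismatch_error` is a helper variable introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical array with two columns.
n_array = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

mismatch_error = None
try:
    # Three names for a two-column array: this fails.
    pd.DataFrame(n_array, columns=['first', 'second', 'third'])
except ValueError as err:
    mismatch_error = err
    print('Conversion failed:', err)

# Two names for two columns: this works.
df = pd.DataFrame(n_array, columns=['first', 'second'])
```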

As you can see, the index consists of numbers (0 to 3 for our data frame). The next example uses the index parameter to change the row labels. If your data includes dates, you can change a column’s data type to datetime after creating the data frame. You can also count the occurrences of each value in a column with the value_counts() method in pandas.
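As a brief illustration of the two follow-up operations mentioned here, the sketch below converts a column to datetime with `pd.to_datetime()` and counts values with `value_counts()`. The array contents and the column names `date` and `color` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical array of strings: a date and a category per row.
raw = np.array([['2023-01-01', 'red'],
                ['2023-01-02', 'blue'],
                ['2023-01-03', 'red']])
df = pd.DataFrame(raw, columns=['date', 'color'])

# Change the data type of the date column to datetime.
df['date'] = pd.to_datetime(df['date'])

# Count how often each value occurs in the color column.
counts = df['color'].value_counts()
print(counts)
```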

Example 3: Data Frame Creation From an Array With Customized Indexes

To demonstrate this example, we have to create an array using multiple lists.

When converting an array to a data frame, pass a list of labels to the index parameter to create a custom index.
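A sketch of Example 3, assuming a four-row array with made-up values; the index labels `w`, `x`, `y`, and `z` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical array with four rows.
n_array = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# One label per row of the array.
df = pd.DataFrame(n_array, index=['w', 'x', 'y', 'z'])

print(df)
```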

Keep in mind that the list passed to the index parameter becomes the row labels of the data frame. As with column names, the list must have the same length as the dimension it labels, which here is the number of rows. Because our sample array has four rows, we need a list with four members. After conversion, those four labels replace the default numeric index.

Remember that you can still set an index after creating the pandas data frame. For instance, call the set_index() method and specify the desired column as the index. Once your data is in a data frame object, you can begin studying it. For instance, pandas has methods that let you build scatter plots and histograms and add new columns to a data frame.
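Setting the index after creation might look like this sketch. The array values and the column names `id` and `value` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with two named columns.
df = pd.DataFrame(np.array([[1, 10], [2, 20], [3, 30]]),
                  columns=['id', 'value'])

# Use the id column as the index; set_index() returns a new data frame.
df_indexed = df.set_index('id')

print(df_indexed)
```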

Example 4: Array to pandas Data Frame Conversion With Custom Indexes and Column Names

As in the previous examples, we import the pandas and NumPy modules and then create an array from lists (the lists may contain more than one data type, although NumPy converts mixed values to a common type unless dtype=object is specified). We also create two more lists, one for the index names and one for the column names. In the last step, we create the data frame with the pd.DataFrame() function, passing the array, the index list, and the columns list as arguments.

To display this data frame, we can simply write “df” (the name we gave our data frame) in an interactive session, or we can print it using the print() function.
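Putting Example 4 together, a sketch could look like the following. All values and labels are made up, and `dtype=object` is an assumption added here so that the strings and integers in the lists keep their original types.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data; dtype=object keeps strings and integers as they are.
n_array = np.array([['Alice', 25], ['Bob', 30], ['Carol', 35]], dtype=object)

# One list for the index names and one for the column names.
index_names = ['r1', 'r2', 'r3']
column_names = ['name', 'age']

# Pass the array, the index, and the columns to the constructor.
df = pd.DataFrame(n_array, index=index_names, columns=column_names)

print(df)
```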

As you can see, the pd.DataFrame() function has successfully converted the array to the pandas data frame.

Conclusion

You have learned how to convert an array into a data frame in this pandas tutorial. We first looked at NumPy arrays and pandas data frame objects, then discussed the syntax of the DataFrame class, which we use to generate data frame objects. After that, we worked through four examples that transform NumPy arrays into pandas data frames. In short, the conversion takes two easy steps: import pandas and NumPy, then pass your array to pd.DataFrame().

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.