pandas.DataFrame.from_records

The pandas.DataFrame.from_records() function creates a DataFrame object from a list of tuples, a list of dictionaries, a structured ndarray, or another DataFrame object. We can specify the column labels while creating the DataFrame with the “columns” parameter. Other parameters set the row indices, exclude columns, or limit the number of rows that are read. We will see them under the parameters.

Syntax:

Let’s see the syntax and parameters that are passed to this function:

pandas.DataFrame.from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
  1. The “data” parameter is the only required parameter. It can be a list of tuples, a list of dictionaries, a structured array (for example, a NumPy ndarray with named fields), or another DataFrame. Passing a DataFrame is deprecated since pandas 2.1.
  2. The “index” parameter (by default = None) sets the row indices of the DataFrame. It takes either the name of a field in the data to use as the index or a list of index labels. If it is not specified, the DataFrame is created with the default row indices [0, 1, 2, ..., n-1].
  3. We can exclude the existing columns from the data while creating the DataFrame using the “exclude” parameter (by default = None). It takes a list of column labels to be left out.
  4. The “columns” parameter (by default = None) specifies the column labels. If the data has no labels of its own (such as a list of tuples), the parameter names the columns. If the data already holds the labels (dictionary keys or dtype field names), the parameter does not rename them. It selects the listed columns and arranges them in the given order.
  5. The “coerce_float” parameter (by default = False) is set to “True” if you want pandas to attempt to convert the values of non-string, non-numeric objects (such as decimal.Decimal) to floating point.
  6. The “nrows” parameter (by default = None) reads a specific number of rows into the DataFrame if the data is an iterator. It reads all the rows if the parameter is not specified. It takes an integer that specifies the number of rows.
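
The “coerce_float” parameter is not demonstrated elsewhere in this article, so here is a minimal sketch of its effect. The rows and the “Price” values are made up for illustration; the only assumption is that the prices are stored as decimal.Decimal objects.

```python
import pandas
from decimal import Decimal

# Hypothetical rows: the second element is a decimal.Decimal, not a float
rows = [(101, Decimal("34000.50")), (102, Decimal("45000.25"))]

# Default (coerce_float=False): the Decimal values stay as Python objects
as_objects = pandas.DataFrame.from_records(rows, columns=['P_id', 'Price'])
print(as_objects['Price'].dtype)

# coerce_float=True: the Decimal values are converted to floating point
as_floats = pandas.DataFrame.from_records(rows, columns=['P_id', 'Price'], coerce_float=True)
print(as_floats['Price'].dtype)
```

Comparing the two printed dtypes shows whether the conversion took place.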

1. Create a DataFrame

Let’s see how to create a DataFrame from a list of tuples, a list of dictionaries, and a structured array with separate examples.

Example 1: From a List of Tuples 

Let’s create a “Products” DataFrame from a list of tuples. There are five tuples in the list and each tuple holds two elements. The first element is the integer and the second element is the string.

import pandas

list_of_tuples = [(101, 'Furniture'), (102, 'Paints'), (103, 'Steel/Iron'), (104, 'Plastic'), (105, 'Cement & sand')]

# Create DataFrame from list of tuples using from_records()
Products = pandas.DataFrame.from_records(list_of_tuples)
print(Products)

Output:

The “Products” DataFrame is created with five rows. The first column holds the integer values and the second column holds the strings from the tuples. The columns and indices of the DataFrame start with 0 by default.
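
As a quick check of the default labels described above, the following sketch inspects the shape, the column labels, and the row indices of a DataFrame that is built from a shortened version of the same list of tuples.

```python
import pandas

list_of_tuples = [(101, 'Furniture'), (102, 'Paints'), (103, 'Steel/Iron')]
Products = pandas.DataFrame.from_records(list_of_tuples)

print(Products.shape)          # (number of rows, number of columns)
print(list(Products.columns))  # default integer column labels
print(list(Products.index))    # default integer row indices
```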

Example 2: From a List of Dictionaries

Let’s create a “Products” DataFrame from a list of dictionaries. There are five dictionaries in the list, and each dictionary holds two keys: “P_id” (an integer) and “P_name” (a string).

import pandas

list_of_dictionaries =  [{'P_id': 101, 'P_name': 'Furniture'},
 {'P_id': 102, 'P_name': 'Paints'},
 {'P_id': 103, 'P_name': 'Steel/Iron'},
 {'P_id': 104, 'P_name': 'Plastic'},
 {'P_id': 105, 'P_name': 'Cement & sand'}]

# Create DataFrame from list of dictionaries using from_records()
Products = pandas.DataFrame.from_records(list_of_dictionaries)
print(Products)

Output:

The “Products” DataFrame is created with five rows along with the column labels (from the dictionary keys). The first column holds the integer values and the second column holds the strings from the dictionaries.
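
Since the column labels come from the dictionary keys, it may help to see what happens when one dictionary lacks a key. The sketch below uses a made-up record without “P_name”; the corresponding cell is filled with a missing value.

```python
import pandas

# Hypothetical data: the second dictionary has no 'P_name' key
list_of_dictionaries = [{'P_id': 101, 'P_name': 'Furniture'},
                        {'P_id': 102}]

Products = pandas.DataFrame.from_records(list_of_dictionaries)
print(Products)

# isna() marks the cell that had no value in the source dictionary
print(Products['P_name'].isna().tolist())
```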

Example 3: From a Structured Array

Let’s create a “Products” DataFrame from a NumPy structured array. The array is created from five tuples with two elements each. While creating the array, we also specify the field names and their data types. The first field, “P_id”, uses the “i” (integer) type code, and the second field, “P_name”, uses “U16” (a Unicode string of up to 16 characters).

import pandas
import numpy

array = numpy.array([(101, 'Furniture'), (102, 'Paints'), (103, 'Steel/Iron'), (104, 'Plastic'), (105, 'Cement & sand')],
                    dtype=[('P_id', 'i'), ('P_name', 'U16')])

# Create DataFrame from the structured ndarray using from_records()
Products = pandas.DataFrame.from_records(array)
print(Products)

Output:

The “Products” DataFrame is created with five rows along with the column labels (from dtype).
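
To see where the column labels come from, the sketch below prints the field names of a shortened structured array next to the columns and dtypes of the resulting DataFrame. The exact integer width behind the “i” type code depends on the platform, so only the printed dtypes should be taken as authoritative.

```python
import numpy
import pandas

array = numpy.array([(101, 'Furniture'), (102, 'Paints')],
                    dtype=[('P_id', 'i'), ('P_name', 'U16')])

# Field names defined in the dtype of the structured array
print(array.dtype.names)

Products = pandas.DataFrame.from_records(array)

# The same names become the column labels of the DataFrame
print(list(Products.columns))
print(Products.dtypes)
```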

2. With the Columns Parameter

Let’s see how to create a DataFrame by passing the “columns” parameter to the function.

Example 1:

Let’s create a “Products” DataFrame from a list of dictionaries. There are five dictionaries with key:value pairs.

  1. First, we create a DataFrame with the existing columns that are present in the dictionaries.
  2. Then, we pass the “columns” parameter with the columns in the order [“Price”, “P_id”, “P_name”].

import pandas

list_of_dictionaries =  [{'P_id': 101, 'P_name': 'Furniture','Price':34000},
 {'P_id': 102, 'P_name': 'Paints','Price':45000},
 {'P_id': 103, 'P_name': 'Steel/Iron','Price':50000},
 {'P_id': 104, 'P_name': 'Plastic','Price':12000},
 {'P_id': 105, 'P_name': 'Cement & sand','Price':22000}]

# Without columns parameter
Products = pandas.DataFrame.from_records(list_of_dictionaries)
print(Products,"\n")

# With columns parameter
Products = pandas.DataFrame.from_records(list_of_dictionaries,columns=['Price','P_id','P_name'])
print(Products)

Output:

In the first output, the DataFrame is created from the list of dictionaries with the columns in their original order. In the second output, we specify the columns in a different order, so the columns are rearranged to [“Price”, “P_id”, “P_name”].
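
With a list of dictionaries, the “columns” parameter can also be used to keep only some of the keys. The following is a minimal sketch with a shortened version of the same data, where “P_id” is left out simply by not listing it.

```python
import pandas

list_of_dictionaries = [{'P_id': 101, 'P_name': 'Furniture', 'Price': 34000},
                        {'P_id': 102, 'P_name': 'Paints', 'Price': 45000}]

# Only the listed keys become columns, in the listed order
subset = pandas.DataFrame.from_records(list_of_dictionaries, columns=['P_name', 'Price'])
print(subset)
```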

Example 2:

Let’s create a “Products” DataFrame from a list of tuples. There are five tuples with two elements each. Create a DataFrame from a list of tuples by specifying the columns [“P_id”, “P_name”].

import pandas

list_of_tuples = [(101, 'Furniture'), (102, 'Paints'), (103, 'Steel/Iron'), (104, 'Plastic'), (105, 'Cement & sand')]

Products = pandas.DataFrame.from_records(list_of_tuples,columns=['P_id','P_name'])
print(Products,"\n")

Output:

The DataFrame is created with column labels.

3. With the Index Parameter

Let’s see how to create a DataFrame by including the custom row indices.

Example:

Let’s create a “Products” DataFrame from a list of dictionaries by specifying the row indices as [“product-1”, “product-2”, “product-3”, “product-4”, “product-5”].

import pandas

list_of_dictionaries =  [{'P_id': 101, 'P_name': 'Furniture'},
 {'P_id': 102, 'P_name': 'Paints'},
 {'P_id': 103, 'P_name': 'Steel/Iron'},
 {'P_id': 104, 'P_name': 'Plastic'},
 {'P_id': 105, 'P_name': 'Cement & sand'}]

# Specify the index
Products = pandas.DataFrame.from_records(list_of_dictionaries,index=['product-1','product-2','product-3','product-4','product-5'])
print(Products,"\n")

Output:

The DataFrame is created from a list of dictionaries with the specified row indices.
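
Besides a list of labels, the “index” parameter also accepts the name of a field in the data. The sketch below, on a shortened version of the same data, uses “P_id” as the index; that field then no longer appears as a regular column.

```python
import pandas

list_of_dictionaries = [{'P_id': 101, 'P_name': 'Furniture'},
                        {'P_id': 102, 'P_name': 'Paints'}]

# Use the values of the 'P_id' field as the row indices
Products = pandas.DataFrame.from_records(list_of_dictionaries, index='P_id')
print(Products)
print(Products.index.name)
```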

4. With the Exclude Parameter

Let’s see how to create a DataFrame by excluding the columns.

  1. Create a “Products” DataFrame from a list of dictionaries with all the columns.
  2. Create the same DataFrame by excluding a single column (“Price”).
  3. Create the same DataFrame by excluding all three columns (“Price”, “P_id”, “P_name”).

import pandas

list_of_dictionaries =  [{'P_id': 101, 'P_name': 'Furniture','Price':34000},
 {'P_id': 102, 'P_name': 'Paints','Price':45000},
 {'P_id': 103, 'P_name': 'Steel/Iron','Price':50000},
 {'P_id': 104, 'P_name': 'Plastic','Price':12000},
 {'P_id': 105, 'P_name': 'Cement & sand','Price':22000}]
Products = pandas.DataFrame.from_records(list_of_dictionaries)
print(Products,"\n")

# exclude a single column
Products = pandas.DataFrame.from_records(list_of_dictionaries,exclude=['Price'])
print(Products,"\n")

# exclude all columns
Products = pandas.DataFrame.from_records(list_of_dictionaries,exclude=['Price','P_id','P_name'])
print(Products)

Output:

  1. The DataFrame is created by excluding the “Price” column in the second output.
  2. In the last output, the DataFrame is created by excluding all the columns. So, the DataFrame is empty.
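
One way to reason about “exclude” is to compare it with dropping the column after the DataFrame is created. The sketch below, on a shortened version of the same data, builds the DataFrame both ways and compares the results with equals(). It is only an illustration of the idea, not a statement about how pandas implements the parameter.

```python
import pandas

list_of_dictionaries = [{'P_id': 101, 'P_name': 'Furniture', 'Price': 34000},
                        {'P_id': 102, 'P_name': 'Paints', 'Price': 45000}]

# Leave out 'Price' while creating the DataFrame
excluded = pandas.DataFrame.from_records(list_of_dictionaries, exclude=['Price'])

# Create the full DataFrame first, then drop 'Price'
dropped = pandas.DataFrame.from_records(list_of_dictionaries).drop(columns=['Price'])

print(list(excluded.columns))
print(excluded.equals(dropped))
```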

5. With the Nrows Parameter

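Let’s see how to limit the number of rows that are read when the data is an iterator.

  1. Create a “Products” DataFrame from an iterator over a list of tuples with three rows (nrows = 3).
  2. Create the same DataFrame with one row (nrows = 1).
  3. Create the same DataFrame with 0 rows (nrows = 0).
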
import pandas

list_of_tuples = [(101, 'Furniture'), (102, 'Paints'), (103, 'Steel/Iron'), (104, 'Plastic'), (105, 'Cement & sand')]

# nrows applies only when the data is an iterator. An iterator is consumed
# as it is read, so a fresh iterator is passed to each call to start from the first row.

# Read 3 rows
Products = pandas.DataFrame.from_records(iter(list_of_tuples),columns=['P_id','P_name'],nrows=3)
print(Products,"\n")

# Read 1 row
Products = pandas.DataFrame.from_records(iter(list_of_tuples),columns=['P_id','P_name'],nrows=1)
print(Products,"\n")

# Read 0 rows
Products = pandas.DataFrame.from_records(iter(list_of_tuples),columns=['P_id','P_name'],nrows=0)
print(Products)

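Output:

The first output holds three rows, the second output holds a single row, and the last DataFrame is empty because no rows are read.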
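
One detail worth knowing with “nrows”: an iterator is consumed as it is read, so when the same iterator object is passed to several calls, each call continues from where the previous one stopped. The following sketch illustrates this with a single shared iterator (the variable names are chosen for this illustration).

```python
import pandas

# One iterator shared by both calls
rows = iter([(101, 'Furniture'), (102, 'Paints'), (103, 'Steel/Iron'),
             (104, 'Plastic'), (105, 'Cement & sand')])

# Reads the first three tuples from the iterator
first = pandas.DataFrame.from_records(rows, columns=['P_id', 'P_name'], nrows=3)

# Continues with the next unread tuple instead of starting over
second = pandas.DataFrame.from_records(rows, columns=['P_id', 'P_name'], nrows=1)

print(first['P_id'].tolist())
print(second['P_id'].tolist())
```

To read from the beginning each time, create a new iterator for every call, for example with iter() on the original list.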

Conclusion

Now, you can create a pandas DataFrame from a structured array, a list of tuples, and a list of dictionaries using the pandas.DataFrame.from_records() function. Under the first scenario, we learned how to create the DataFrame from each type of data. Then, we specified the column labels with the “columns” parameter and the row indices with the “index” parameter. A single column or multiple columns can be excluded from the DataFrame during creation by specifying the “exclude” parameter. Lastly, we learned how to create a DataFrame with a specific number of rows by passing the “nrows” parameter.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain