
How to Fix Cannot convert non-finite values (NA or inf)

Data cleaning is usually the first step before processing data. A float column that contains missing values cannot be converted directly to an integer type, because NaN and infinity have no integer representation. If you try the conversion anyway, pandas raises IntCastingNaNError with the message "Cannot convert non-finite values (NA or inf) to integer". In this guide, we will see how to resolve this error by removing the missing values or replacing them with finite values.

Error: Cannot Convert Non-finite Values (NA or inf) to Integer

The error occurs when non-finite values (NaN, None, or infinity) are present in the pandas DataFrame column being converted. If you try to convert such a column to integers without cleaning it first, pandas raises this error. First, we will reproduce the error to get a better idea.

Create a pandas DataFrame named budgets with one column, 'Budget', that holds 10 elements (4 of them missing). Then try to convert the Budget column into the integer type.

import pandas

# Create pandas DataFrame with one column
budgets = pandas.DataFrame({'Budget': [2000.00,None,5600.78,None,None,
6000,7000,8000,None,2500]})
print(budgets)

# Try to convert the Budget column into integer type.
budgets['Budget'].astype(int)

Output

You can see that IntCastingNaNError is encountered.
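Before choosing a fix, it can help to confirm how many values are missing and to see the exception message in text form. The following is a minimal sketch, assuming the same budgets DataFrame as above; it catches ValueError because IntCastingNaNError is a subclass of it, and the variable names missing_count and cast_failed are only illustrative.

```python
import pandas

budgets = pandas.DataFrame({'Budget': [2000.00, None, 5600.78, None, None,
                                       6000, 7000, 8000, None, 2500]})

# Count the missing entries before attempting the cast.
missing_count = int(budgets['Budget'].isna().sum())
print("Missing values:", missing_count)

# IntCastingNaNError derives from ValueError, so catching ValueError
# also covers pandas versions without the dedicated exception class.
try:
    budgets['Budget'].astype(int)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print(type(err).__name__, "-", err)
```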

Solution 1: Filling Missing Values with Finite Values

A common approach is to replace the missing values with a finite value and then convert the column. The pandas.DataFrame.fillna() function fills the NA/NaN values with the specified value. After that, we can convert the column into integers using astype(int). Note that fillna() does not replace infinite values.

Specification

To fill all the missing values with 0:

pandas.DataFrame.fillna(0)

To fill all the missing values with the mean of a column:

pandas.DataFrame.fillna(DataFrame['column'].mean())

Example 1

Using the above DataFrame, first fill the missing values with 0, and then convert the Budget column to integers.

import pandas

budgets = pandas.DataFrame({'Budget': [2000.00,None,5600.78,None,None,6000,7000,8000,None,2500]})

# Fill Missing values with 0
budgets = budgets.fillna(0)
print(budgets,"\n")
print(budgets['Budget'].dtype,"\n")

# Convert the Budget column into integer type.
budgets['Budget']=budgets['Budget'].astype(int)
print(budgets['Budget'].dtype)

Output

The NaN values are replaced with 0s. The Budget column was float64 before the conversion and is int64 afterwards.

Example 2

Using the above DataFrame, first fill the missing values with the mean of the Budget column, and then convert the column to integers.

import pandas

budgets = pandas.DataFrame({'Budget': [2000.00,None,5600.78,None,None,6000,7000,8000,None,2500]})

# Fill Missing values with mean
budgets = budgets.fillna(budgets['Budget'].mean())
print(budgets,"\n")
print(budgets['Budget'].dtype,"\n")

# Convert the Budget column into integer type.
budgets['Budget']=budgets['Budget'].astype(int)
print(budgets['Budget'].dtype)

Output

The NaN values are replaced with the mean value, which is 5183.463333. The Budget column was float64 before the conversion and is int64 afterwards.
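One detail worth checking after the conversion is that astype(int) discards the fractional part rather than rounding. The sketch below compares a plain cast with a cast that rounds first; whether rounding is the right choice depends on your data, so treat it as one option and not as the method used in this guide.

```python
import pandas

budgets = pandas.DataFrame({'Budget': [2000.00, None, 5600.78, None, None,
                                       6000, 7000, 8000, None, 2500]})

# Fill the missing values with the column mean, as in Example 2.
filled = budgets['Budget'].fillna(budgets['Budget'].mean())

# A plain cast drops the fractional part (5600.78 becomes 5600).
truncated = filled.astype(int)

# Rounding first moves each value to the nearest whole number (5600.78 becomes 5601).
rounded = filled.round().astype(int)

print(truncated.tolist())
print(rounded.tolist())
```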

Solution 2: Dropping the Missing Values

To eliminate the error, we can drop all the rows with missing values from the DataFrame before converting the column to an integer type. The pandas.DataFrame.dropna() function does this.

Specification

By default, when no parameter is specified, this function drops every row that contains at least one missing value.

pandas.DataFrame.dropna()

Example

Convert the Budget column into integer type by dropping all the NaN values.

import pandas

# Create pandas DataFrame with one column
budgets = pandas.DataFrame({'Budget': [2000.00,None,5600.78,None,None,6000,7000,8000,None,2500]})

# Drop the Missing values
budgets = budgets.dropna()
print(budgets,"\n")
print(budgets['Budget'].dtype,"\n")

# Convert the Budget column into integer type.
budgets['Budget']=budgets['Budget'].astype(int)
print(budgets['Budget'].dtype)

Output

The Budget column of the original DataFrame has 4 NaN values, and dropna() removes those rows first. Then, we convert the Budget column into an integer type.
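When a DataFrame has more than one column, the default dropna() removes a row if any column is missing, which may discard more data than intended. The sketch below restricts the check to the Budget column with the subset parameter and renumbers the index afterwards. The 'Department' column and its values are hypothetical and are added only for illustration.

```python
import pandas

# 'Department' is a hypothetical second column, not part of the original example.
budgets = pandas.DataFrame({
    'Department': ['A', 'B', 'C', None],
    'Budget': [2000.00, None, 5600.78, 7000],
})

# Drop only the rows where Budget is missing; the row with a missing
# Department is kept because it is not listed in subset.
cleaned = budgets.dropna(subset=['Budget']).reset_index(drop=True)

# With no NaN left in Budget, the integer cast succeeds.
cleaned['Budget'] = cleaned['Budget'].astype(int)
print(cleaned)
```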

Solution 3: Using numpy.nan_to_num()

The numpy.nan_to_num() function converts the NaN values to 0 and infinite values to large finite numbers. We will apply this function to the DataFrame column before converting it into an integer.

Specification

Let’s see the syntax and the parameters that are passed to this function:

numpy.nan_to_num(input_data, copy, nan, posinf, neginf)
  1. input_data is mandatory for this function. In our case, it will be the DataFrame column.
  2. copy controls whether a copy of the input is returned (True, the default) or the values are replaced in place where possible (False).
  3. nan is the value that replaces all the NaN values. If this parameter is not specified, all NaN values are filled with 0s.
  4. posinf is the value that replaces all the positive infinity values. If it is not specified, they are replaced with a very large finite number.
  5. neginf is the value that replaces all the negative infinity values. If it is not specified, they are replaced with a very large negative finite number.
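The effect of these parameters is easiest to see on a small array. The following sketch, which uses an illustrative array named values, compares the default replacements with explicitly chosen ones. The default replacements for infinity are the largest and smallest finite float64 values, which are far outside the integer range, so it is safer to pass posinf and neginf explicitly when the result will be cast to integers.

```python
import numpy

# A small illustrative array with one NaN and both infinities.
values = numpy.array([1.5, numpy.nan, numpy.inf, -numpy.inf])

# Defaults: NaN -> 0.0, +inf -> largest finite float, -inf -> most negative finite float.
defaults = numpy.nan_to_num(values)

# Explicit replacements chosen by the caller.
custom = numpy.nan_to_num(values, nan=-1.0, posinf=10000.0, neginf=0.0)

print(defaults)
print(custom)
```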

Example 1

Consider a DataFrame whose Budget column holds NaN along with some finite values. Convert this column to integers by replacing NaN with 0s.

import pandas

import numpy

# Create pandas DataFrame with one column
budgets = pandas.DataFrame({'Budget': [2000.00,None,5600.78,None,None,6000,7000,8000,None,2500]})
print(budgets,"\n")
print(budgets['Budget'].dtype,"\n")

# Convert the Budget column into integer type.
budgets['Budget']=numpy.nan_to_num(budgets['Budget']).astype(int)
print(budgets['Budget'].dtype)

Output

The only non-finite values in this DataFrame are NaNs; there are no positive or negative infinity values. The NaN values are replaced with 0s. The Budget column was float64 before the conversion and is int64 afterwards.

Example 2

Consider a DataFrame whose Budget column holds positive and negative infinity values along with some finite values. Convert this column to integers by replacing positive infinity values with 10000 and negative infinity values with 0.

import pandas

import numpy

# Create pandas DataFrame with one column
budgets = pandas.DataFrame({'Budget': [2000.00,numpy.inf,5600.78,numpy.inf,-numpy.inf,6000,7000,8000,-numpy.inf,2500]})
print(budgets,"\n")
print(budgets['Budget'].dtype,"\n")

# Convert the Budget column into integer type.
budgets['Budget']=numpy.nan_to_num(budgets['Budget'],posinf=10000,neginf=0).astype(int)
print(budgets['Budget'])

Output

The only non-finite values in this DataFrame are infinities (there are no NaNs). Positive infinity is replaced with 10000 and negative infinity with 0. The Budget column was float64 before the conversion and is int64 afterwards.
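A column can also contain NaN and infinity at the same time. The sketch below, which uses a shortened illustrative DataFrame, shows that fillna() alone leaves the infinities in place, while a single numpy.nan_to_num() call with all three replacement parameters handles both cases. The replacement values 0 and 10000 are carried over from the examples above and are not recommendations.

```python
import numpy
import pandas

# Illustrative column mixing NaN, +inf and -inf with finite values.
budgets = pandas.DataFrame({'Budget': [2000.00, None, numpy.inf, -numpy.inf, 2500]})

# fillna() only replaces NaN, so two non-finite values remain afterwards.
only_nan_filled = budgets['Budget'].fillna(0)
still_non_finite = int((~numpy.isfinite(only_nan_filled)).sum())
print("Non-finite values after fillna:", still_non_finite)

# Replace NaN and both infinities in one call, then cast to integer.
converted = numpy.nan_to_num(budgets['Budget'], nan=0, posinf=10000, neginf=0).astype(int)
print(converted)
```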

Conclusion

We first reproduced the IntCastingNaNError (Cannot convert non-finite values (NA or inf) to integer) and then resolved it with three solutions on the budgets DataFrame: replacing the missing values with a finite value such as 0 or the column mean, dropping the missing values, and using numpy.nan_to_num(). After any of these steps, the column can be converted to the integer type without the error.

About the author

Gottumukkala Sravan Kumar

B tech-hon's in Information Technology; Known programming languages - Python, R , PHP MySQL; Published 500+ articles on computer science domain