Pandas Fill NaN with 0

Missing data is common in data science. You can either discard the affected rows or columns, or substitute a value for each missing cell. Dropping is often wasteful: removing a row throws away its valid values too, and removing a column eliminates that metric for every row. NaN, which stands for “Not a Number”, is one of the typical ways to mark a value that is missing from a set of data. Handling NaN correctly matters because it affects the results of later calculations. Let’s find out how to change the NaN values in a single column, or in an entire Pandas DataFrame, to 0.

Method 1: Using fillna()

The “fillna()” function fills the NA/NaN values with the value that you provide. It returns a new object and leaves the original unchanged.

If you want to fill the NaN values for a single column, the syntax is as follows:

DataFrame_obj['column'].fillna(0)

 
If you want to fill the NaN values in the entire DataFrame, the syntax is as follows:

DataFrame_obj.fillna(0)
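
Because fillna() returns a new object, the result has to be stored if you want to keep it. The following is a minimal sketch of that pattern; the “scores” DataFrame and its column names are hypothetical and exist only for this illustration.

```python
import numpy
import pandas

# Hypothetical DataFrame used only for illustration.
scores = pandas.DataFrame({'math': [90.0, numpy.nan, 75.0],
                           'art': [numpy.nan, 60.0, numpy.nan]})

# fillna() returns a new DataFrame; "scores" itself is not modified.
filled = scores.fillna(0)
print(scores, "\n")
print(filled, "\n")

# Assign the result back to keep the change for a single column.
scores['math'] = scores['math'].fillna(0)
print(scores)
```

After the assignment, only the “math” column of “scores” is filled; the “art” column still holds its NaN values.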

 
Example 1: Single Column

Let’s create a DataFrame named “documents” with two columns, “Color” and “size”, that include some NaN values. We create the NaN values using numpy.nan. Then, we fill the NaN values with 0 in each column, separately.

import pandas
import numpy

# Consider the DataFrame
documents=pandas.DataFrame({'Color':["red","blue",numpy.nan,numpy.nan,numpy.nan],
                        'size':[numpy.nan,45,60,78,numpy.nan]})

print(documents,"\n")

# Fill NaN with 0 in the size column.
print(documents['size'].fillna(0),"\n")

# Fill NaN with 0 in the Color column.
print(documents['Color'].fillna(0))

 
Output:

  Color  size
0   red   NaN
1  blue  45.0
2   NaN  60.0
3   NaN  78.0
4   NaN   NaN

0     0.0
1    45.0
2    60.0
3    78.0
4     0.0
Name: size, dtype: float64

0     red
1    blue
2       0
3       0
4       0
Name: Color, dtype: object

 
Explanation:

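First, we fill the NaN values with 0 in the “size” column, which keeps its float64 dtype. Then, we fill the NaN values with 0 in the “Color” column, which now mixes strings with the integer 0. In both cases, fillna() returns a new Series, so the “documents” DataFrame itself is unchanged.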
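
When only some columns should be filled, fillna() also accepts a dictionary that maps column names to fill values. The following sketch assumes a hypothetical numeric DataFrame named “stock”; it is not part of the examples above.

```python
import numpy
import pandas

# Hypothetical numeric DataFrame used only for illustration.
stock = pandas.DataFrame({'size': [numpy.nan, 45, 60],
                          'weight': [1.5, numpy.nan, numpy.nan]})

# Keys are column names; columns that are not listed are left as they are.
partly_filled = stock.fillna({'size': 0})
print(partly_filled)
```

Here, only the “size” column is filled with 0, and the NaN values in “weight” remain.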

Example 2: Multiple Columns

Let’s fill the NaN values with 0 in the entire DataFrame.

import pandas
import numpy

# Consider the DataFrame
documents=pandas.DataFrame({'Color':["red","blue",numpy.nan,numpy.nan,numpy.nan],
                        'size':[numpy.nan,45,60,78,numpy.nan]})

# Fill NaN with 0 in entire DataFrame
print(documents.fillna(0))

 
Output:

  Color  size
0   red   0.0
1  blue  45.0
2     0  60.0
3     0  78.0
4     0   0.0

 
Explanation:

fillna(0) returns a new DataFrame in which every NaN value is replaced with 0. The “documents” DataFrame itself still holds its NaN values unless the result is assigned back to it.
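
One way to confirm that no NaN values remain is to count them per column with isna().sum() before and after filling. The sketch below uses a hypothetical numeric DataFrame named “readings”.

```python
import numpy
import pandas

# Hypothetical numeric DataFrame used only for illustration.
readings = pandas.DataFrame({'size': [numpy.nan, 45, 60, 78, numpy.nan],
                             'weight': [2.0, numpy.nan, 3.5, 4.0, 1.0]})

# Count the missing values in each column before filling.
missing_before = readings.isna().sum()
print(missing_before, "\n")

# Fill, store the result, and count again.
readings = readings.fillna(0)
missing_after = readings.isna().sum()
print(missing_after)
```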

Method 2: Using replace()

The “replace()” function takes the value that has to be replaced as the first parameter (here, numpy.nan) and the value that replaces it as the second parameter (here, 0).

To replace the NaN values in a single column, the syntax is as follows:

DataFrame_obj['column'].replace(numpy.nan,0)

 
To replace the NaN values in the whole DataFrame, the syntax is as follows:

DataFrame_obj.replace(numpy.nan,0)
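
Unlike fillna(), replace() is a general substitution function, so its first parameter can also be a list of values. As a small sketch, the hypothetical “ratios” Series below has both a NaN and an infinite value replaced with 0 in one call.

```python
import numpy
import pandas

# Hypothetical Series holding a NaN and an infinite value.
ratios = pandas.Series([1.0, numpy.nan, numpy.inf, 4.0])

# Every value in the list is replaced with 0.
cleaned = ratios.replace([numpy.nan, numpy.inf], 0)
print(cleaned)
```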

 
Example 1: Single Column

Let’s create a DataFrame named “orders” with three columns, two of which include some NaN values. We create the NaN values using numpy.nan. Then, we replace the NaN values with 0 in the “price” and “product” columns, separately.

import pandas
import numpy

# Consider the DataFrame
orders=pandas.DataFrame({'product':["one","two",numpy.nan,numpy.nan,numpy.nan],
                        'price':[numpy.nan,45,60,78,numpy.nan],
                         'id':[1,2,3,4,5]})

print(orders,"\n")

# Replace NaN with 0 in the price column.
print(orders['price'].replace(numpy.nan,0),"\n")

# Replace NaN with 0 in the product column.
print(orders['product'].replace(numpy.nan,0))

 
Output:

  product  price  id
0     one    NaN   1
1     two   45.0   2
2     NaN   60.0   3
3     NaN   78.0   4
4     NaN    NaN   5

0     0.0
1    45.0
2    60.0
3    78.0
4     0.0
Name: price, dtype: float64

0    one
1    two
2      0
3      0
4      0
Name: product, dtype: object

 
Explanation:

There are two NaN values in the “price” column and three NaN values in the “product” column. First, we replace the NaN values with 0 in the “price” column. Then, we replace the NaN values with 0 in the “product” column.

Example 2: Multiple Columns

Let’s reuse the “documents” DataFrame from Method 1 and replace the NaN values with 0 in the entire DataFrame.

import pandas
import numpy

# Consider the DataFrame
documents=pandas.DataFrame({'Color':["red","blue",numpy.nan,numpy.nan,numpy.nan],
                        'size':[numpy.nan,45,60,78,numpy.nan]})

# Replace NaN with 0 in entire DataFrame
print(documents.replace(numpy.nan,0),"\n")

 
Output:

  Color  size
0   red   0.0
1  blue  45.0
2     0  60.0
3     0  78.0
4     0   0.0

 
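Explanation:

There are five NaN values in the “documents” DataFrame: three in the “Color” column and two in the “size” column. replace() returns a new DataFrame in which all of them are replaced with 0.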
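
For the specific task of turning NaN into 0, both methods are expected to give the same result. The sketch below checks this on a hypothetical numeric DataFrame named “sales”; it is an illustration, not a general statement about every dtype.

```python
import numpy
import pandas

# Hypothetical numeric DataFrame used only for illustration.
sales = pandas.DataFrame({'price': [numpy.nan, 45, 60, 78, numpy.nan],
                          'id': [1, 2, 3, 4, 5]})

via_fillna = sales.fillna(0)
via_replace = sales.replace(numpy.nan, 0)

# DataFrame.equals() compares the values and dtypes of both results.
same_result = via_fillna.equals(via_replace)
print(same_result)
```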

Conclusion

Dealing with missing entries in a DataFrame is a fundamental step in data analysis, because unhandled NaN values complicate later calculations. Pandas provides several options to cope with this problem. In this guide, we covered two of them for changing NaN to 0: fillna() and replace().

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; Known programming languages: Python, R, PHP, MySQL; Published 500+ articles in the computer science domain