Real-world datasets typically contain missing data. You can either discard the affected row or column, or fill the missing cell with a value. Dropping a whole column is often too costly because it removes that metric for every row, and dropping rows throws away the values that are present. NaN, which stands for “Not a Number”, is the usual marker for a missing value in Pandas. Handling NaN correctly matters for getting the intended results. Let’s find out how to change the NaN values in a row or column of a Pandas DataFrame to 0.
Method 1: Using fillna()
The “fillna()” function fills the NA/NaN values with the value that you provide. In the following syntax, “DataFrame” and “column” are placeholders for your own names.
If you want to fill the NaN values for a single column, the syntax is as follows:
DataFrame['column'].fillna(0)
If you want to fill the NaN values in the entire DataFrame, the syntax is as follows:
DataFrame.fillna(0)
Example 1: Single Column
Let’s create a DataFrame named “documents” with two columns that include some NaN values, which we create with numpy.nan. Now, let’s fill the NaN values with 0 in both columns, separately.
import numpy
import pandas
# Consider the DataFrame
documents=pandas.DataFrame({'Color':["red","blue",numpy.nan,numpy.nan,numpy.nan],
'size':[numpy.nan,45,60,78,numpy.nan]})
print(documents,"\n")
# Fill NaN with 0 in the size column.
print(documents['size'].fillna(0),"\n")
# Fill NaN with 0 in the Color column.
print(documents['Color'].fillna(0))
Output:
  Color  size
0   red   NaN
1  blue  45.0
2   NaN  60.0
3   NaN  78.0
4   NaN   NaN

0     0.0
1    45.0
2    60.0
3    78.0
4     0.0
Name: size, dtype: float64

0     red
1    blue
2       0
3       0
4       0
Name: Color, dtype: object
Explanation:
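First, we fill the NaN values with 0 in the “size” column, which keeps its float64 dtype. Then, we fill the NaN values with 0 in the “Color” column, which now mixes strings with the number 0 and is reported with the object dtype. Both calls only print the filled result, so “documents” itself still contains the NaN values.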
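To keep the filled values, the result has to be assigned back to the column. The following is a minimal sketch of that step, assuming a cut-down version of the “documents” DataFrame with only the numeric column; the variable names “filled”, “remaining_before”, and “remaining_after” are illustrative and not part of the original example.

```python
import numpy
import pandas

# Cut-down version of the "documents" example (numeric column only).
documents = pandas.DataFrame({'size': [numpy.nan, 45, 60, 78, numpy.nan]})

# fillna() returns a new Series; the original column still holds NaN.
filled = documents['size'].fillna(0)
remaining_before = int(documents['size'].isna().sum())

# Assign the result back to make the change stick.
documents['size'] = documents['size'].fillna(0)
remaining_after = int(documents['size'].isna().sum())

print(remaining_before, remaining_after)
```

Assigning back is usually the simplest way to persist the change, because it makes the update explicit at the point where it happens.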
Example 2: Multiple Columns
Let’s fill the NaN values with 0 in the entire DataFrame.
import numpy
import pandas
# Consider the DataFrame
documents=pandas.DataFrame({'Color':["red","blue",numpy.nan,numpy.nan,numpy.nan],
'size':[numpy.nan,45,60,78,numpy.nan]})
# Fill NaN with 0 in entire DataFrame
print(documents.fillna(0))
Output:
  Color  size
0   red   0.0
1  blue  45.0
2     0  60.0
3     0  78.0
4     0   0.0
Explanation:
We fill the NaN values with 0 in the entire DataFrame with a single call. The printed result contains no NaN values. The “documents” DataFrame itself is unchanged unless the result is assigned back to it.
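When only some columns should receive a 0, “fillna()” also accepts a dictionary that maps column names to fill values. The sketch below reuses the “documents” data and fills only the “size” column; the name “size_only” is an illustrative choice.

```python
import numpy
import pandas

documents = pandas.DataFrame({'Color': ["red", "blue", numpy.nan, numpy.nan, numpy.nan],
                              'size': [numpy.nan, 45, 60, 78, numpy.nan]})

# A dict maps column names to fill values; columns not listed are left alone.
size_only = documents.fillna({'size': 0})

print(size_only['size'].tolist())
print(int(size_only['Color'].isna().sum()))
```

This avoids putting the number 0 into a text column such as “Color”, where a numeric placeholder may not be meaningful.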
Method 2: Using replace()
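The “replace()” function swaps one value for another. We pass the value to be replaced (NaN) as the first parameter and the replacement value (0) as the second parameter. In the following syntax, “DataFrame” and “column” are placeholders for your own names.
To replace the NaN values in a single column, the syntax is as follows:
DataFrame['column'].replace(numpy.nan, 0)
To replace the NaN values in the whole DataFrame, we use the following syntax:
DataFrame.replace(numpy.nan, 0)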
Example 1: Single Column
Let’s create a DataFrame named “orders” with three columns, two of which include some NaN values that we create with numpy.nan. Now, let’s replace the NaN values with 0 in the “price” and “product” columns, separately.
import numpy
import pandas
# Consider the DataFrame
orders=pandas.DataFrame({'product':["one","two",numpy.nan,numpy.nan,numpy.nan],
'price':[numpy.nan,45,60,78,numpy.nan],
'id':[1,2,3,4,5]})
print(orders,"\n")
# Replace NaN with 0 in the price column.
print(orders['price'].replace(numpy.nan,0),"\n")
# Replace NaN with 0 in the product column.
print(orders['product'].replace(numpy.nan,0))
Output:
  product  price  id
0     one    NaN   1
1     two   45.0   2
2     NaN   60.0   3
3     NaN   78.0   4
4     NaN    NaN   5

0     0.0
1    45.0
2    60.0
3    78.0
4     0.0
Name: price, dtype: float64

0    one
1    two
2      0
3      0
4      0
Name: product, dtype: object
Explanation:
There are three non-missing values in the “price” column and two non-missing values in the “product” column. First, we replace the NaN values with 0 in the “price” column. Then, we replace the NaN values with 0 in the “product” column. The “id” column has no NaN values and is left as it is.
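For the NaN-to-0 case, “replace()” and “fillna()” should produce the same column. The following sketch checks this on the “price” column of a reduced “orders” DataFrame; the names “via_replace”, “via_fillna”, and “same_result” are illustrative.

```python
import numpy
import pandas

orders = pandas.DataFrame({'price': [numpy.nan, 45, 60, 78, numpy.nan],
                           'id': [1, 2, 3, 4, 5]})

# Two ways to turn NaN into 0 in a single column.
via_replace = orders['price'].replace(numpy.nan, 0)
via_fillna = orders['price'].fillna(0)

# Series.equals() compares both the values and the dtype.
same_result = via_replace.equals(via_fillna)
print(same_result)
```

The choice between the two is mostly a matter of intent: “fillna()” targets missing values only, while “replace()” can swap any value for another.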
Example 2: Multiple Columns
Let’s reuse the “documents” DataFrame with the “Color” and “size” columns, both of which include some NaN values. Now, let’s replace the NaN values with 0 in the entire DataFrame.
import numpy
import pandas
# Consider the DataFrame
documents=pandas.DataFrame({'Color':["red","blue",numpy.nan,numpy.nan,numpy.nan],
'size':[numpy.nan,45,60,78,numpy.nan]})
# Replace NaN with 0 in entire DataFrame
print(documents.replace(numpy.nan,0),"\n")
Output:
  Color  size
0   red   0.0
1  blue  45.0
2     0  60.0
3     0  78.0
4     0   0.0
Explanation:
There are five NaN values in the “documents” DataFrame: three in the “Color” column and two in the “size” column. We replace all of them with 0 in a single call.
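The NaN count can be checked before and after the replacement with “isna()”. This is a minimal sketch on the same data, assuming the “Color” column is given the object dtype explicitly so that strings and the number 0 can share a column; the names “color”, “cleaned”, “nan_before”, and “nan_after” are illustrative.

```python
import numpy
import pandas

# Assumption: object dtype is spelled out so strings and numbers can be mixed.
color = pandas.Series(["red", "blue", numpy.nan, numpy.nan, numpy.nan], dtype=object)
documents = pandas.DataFrame({'Color': color,
                              'size': [numpy.nan, 45, 60, 78, numpy.nan]})

# isna() marks missing cells; summing twice counts them over the whole frame.
nan_before = int(documents.isna().sum().sum())

cleaned = documents.replace(numpy.nan, 0)
nan_after = int(cleaned.isna().sum().sum())

print(nan_before, nan_after)
```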
Conclusion
Dealing with the missing entries in a DataFrame is a fundamental step in data analysis, because unhandled NaN values can distort the results. Pandas provides a few options to cope with this problem. In this guide, we covered two of them: filling the NaN values with 0 using “fillna()” and replacing them with 0 using “replace()”, each applied to a single column and to the entire DataFrame.