Python Pandas

Pandas.DataFrame.Update()

The pandas.DataFrame.update() is used to update/modify the values (Non-Null) from another DataFrame/Series. It considers the column labels from both the DataFrames and rows that are matched if both the columns are equal. The operation is done inplace (the existing DataFrame is updated). We will see how to use this function to update the missing and non-null values in the existing DataFrame and discuss the “ValueError: Data overlaps” when the “errors” parameter is set to “raise”.

Syntax:

Let’s see the syntax and parameters that are passed to this function:

pandas.DataFrame.update(other, join, overwrite, filter_func, errors)

1. other: The DataFrame object with same/different columns. At least one column should match to perform the update operation. If the Series is passed, the name attribute must be set and it is used as the column name for the DataFrame for a match.
2. join (By default = ‘left’): The LEFT JOIN is the default join to do the update. No other joins are allowed.
3. overwrite (By default = True): All the elements in the existing DataFrame are updated by default. Only the NaN values in the existing DataFrame are updated if this parameter is set to “False”.
4. filter_func (By default = None): We can control the updations based on the function. This parameter accepts a function that takes a column (as 1D – Array) as a parameter and returns the Boolean array that indicates whether the value has to be updated or not.
5. errors (By default = ‘ignore’): If both the DataFrames hold the non-null values and update the elements, it raises the “ValueError: Data overlaps”. To get rid of this, set this parameter to “ignore” or leave it as default.

With the Default Parameters

Let’s see how to update the existing DataFrame with the default parameters: join=’left’, overwrite=True, filter_func=None, errors=’ignore’.

1. Create two DataFrames: “Camp1_budgets” and “Camp2_budgets”. The “Camp1_budgets” holds “Day1” and “Day2” and the “Camp1_budgets” holds the “Day1” and “Day3”.
2. Use the update() function to update the “Camp1_budgets” with “Camp2_budgets” by specifying the default parameters.

import pandas

# first DataFrame - Camp1_budgets
Camp1_budgets = pandas.DataFrame([[1200,1500],[2400,5400],[260,500],[500,500],[2000,4500]],
                                 columns=['Day1','Day2'])
print(Camp1_budgets,"\n")

# second DataFrame - Camp2_budgets
Camp2_budgets = pandas.DataFrame([[400,1500],[200,5400],[360,500],[300,500],[800,4500]],
                                 columns=['Day1','Day3'])
print(Camp2_budgets,"\n")

# update() with default parameters
Camp1_budgets.update(Camp2_budgets,join='left', overwrite=True, filter_func=None, errors='ignore')
print(Camp1_budgets,"\n")

Output:

The “Day1” column matches in both DataFrames. So, only the values that are present in the “Day1” columns are updated. Finally, the “Day1” column of the “Camp1_budgets” holds [400, 200, 360, 300, 800].

With the Filter_Func Parameter

Let’s see how to use the filter_func parameter.

Example 1:

1. Create two DataFrames: “Camp1_budgets” and “Camp2_budgets”. The “Camp1_budgets” hold “Day1” and “Day2” and the “Camp1_budgets” holds “Day1” and “Day3”.
2. Use the update() function to update the “Camp1_budgets” with “Camp2_budgets” by specifying the filter_func parameter such that only the values that are greater than 500 are updated. The custom function that are passed to this parameter is as follows:

import pandas

# first DataFrame - Camp1_budgets
Camp1_budgets = pandas.DataFrame([[1200,1500],[2400,5400],[260,500],[500,500],[2000,4500]],
                                 columns=['Day1','Day2'])
print(Camp1_budgets,"\n")

# second DataFrame - Camp2_budgets
Camp2_budgets = pandas.DataFrame([[800,1500],[200,5400],[1100,500],[300,500],[300,4500]],
                                 columns=['Day1','Day3'])
print(Camp2_budgets,"\n")

def cust_function(value):
   return (value > 500)

# update() with filter_func = cust_function
Camp1_budgets.update(Camp2_budgets, filter_func=cust_function)
print(Camp1_budgets,"\n")

Output:

In the “Day1” column of the first DataFrame, three values (1200, 2400 and 2000) are greater than 500. Only these three are updated to 800, 200, and 300.

Example 2:

1. Create two DataFrames: “Camp1_names” and “Camp2_names”. Both hold the same column.
2. Use the update() function to update the “Camp1_names” with “Camp2_names” by specifying the filter_func parameter such that only the elements that are equal to “Sales” are updated to “Sales-Linuxhint”. The custom function that are passed to this parameter is as follows:

import pandas

# first DataFrame - Camp1_names
Camp1_names = pandas.DataFrame([['Sales'],['Service']],
                                 columns=['Name'])
print(Camp1_names,"\n")

# second DataFrame - Camp2_names
Camp2_names = pandas.DataFrame([['Sales-Linuxhint'],['Service-Linuxhint']],
                                 columns=['Name'])
print(Camp2_names,"\n")

def cust_function(element):
   return (element == 'Sales')

# update() with filter_func = cust_function
Camp1_names.update(Camp2_names, filter_func=cust_function)
print(Camp1_names,"\n")

Output:

There is only one element which is “Sales”. It is updated to “Sales-Linuxhint” and the “Service” remains unchanged.

With Errors Parameter

Let’s see how to raise errors if the DataFrames hold non-null values and ignore the errors.

Example 1: Errors = “Ignore”

Let’s update the “Camp1_budgets” DataFrame with “Camp2_budgets” DataFrame by setting the errors parameter to “ignore”.

import pandas

# first DataFrame - Camp1_budgets
Camp1_budgets = pandas.DataFrame([[1200,1500],[2400,5400],[260,500],[500,500],[2000,4500]],
                                 columns=['Day1','Day2'])
print(Camp1_budgets,"\n")

# second DataFrame - Camp2_budgets
Camp2_budgets = pandas.DataFrame([[None,1500],[200,5400],[None,500],[300,500],[None,4500]],
                                 columns=['Day1','Day3'])
print(Camp2_budgets,"\n")

# update() with errors='ignore'
Camp1_budgets.update(Camp2_budgets, errors='ignore')
print(Camp1_budgets,"\n")

Output:

The “Day1” column in the existing DataFrame is updated with the “Day1” column in the other DataFrame without raising any error.

Example 2: Errors = “Raise”

Create the “Camp1_budgets” DataFrame with all missing values and the “Camp2_budgets” DataFrame with non-null values.

import pandas

# first DataFrame - Camp1_budgets
Camp1_budgets = pandas.DataFrame([[None],[None],[None],[None],[None]],
                                 columns=['Day1'])
print(Camp1_budgets,"\n")

# second DataFrame - Camp2_budgets
Camp2_budgets = pandas.DataFrame([[1200],[2400],[260],[500],[2000]],
                                 columns=['Day1'])
print(Camp2_budgets,"\n")

# update() with errors='raise'
Camp1_budgets.update(Camp2_budgets, errors='raise')
print(Camp1_budgets,"\n")

Output:

The “Day1” column in the existing DataFrame (Camp1_budgets) is updated with “Camp2_budgets” which has the “Day1” column values. The error is not raised since all null values are mapped with non-null values.

Example 3: Errors = “Raise” : Data Overlaps

Set it to “True” to update the values in the “Camp1_budgets” DataFrame with “Camp2_budgets” DataFrame with errors = “raise”.

import pandas

# first DataFrame - Camp1_budgets
Camp1_budgets = pandas.DataFrame([[1200],[2400],[260],[500],[2000]],
                                 columns=['Day1'])
print(Camp1_budgets,"\n")

# second DataFrame - Camp2_budgets
Camp2_budgets = pandas.DataFrame([[None],[None],[567],[None],[None]],
                                 columns=['Day1'])
print(Camp2_budgets,"\n")

# update() with errors='raise'
Camp1_budgets.update(Camp2_budgets, errors='raise')
print(Camp1_budgets,"\n")

Output:

The “Day1” column in the existing other DataFrame (Camp2_budgets) contains one non-null value. This value is overlapped with a non-null value, i.e. 260 overlaps with 567. So, the ValueError is raised.

With Overwrite Parameter

Example 1: Overwrite = True

Let’s update the “Camp1_budgets” DataFrame with “Camp2_budgets” DataFrame by setting the overwrite to “True”.

import pandas

# first DataFrame - Camp1_budgets
Camp1_budgets = pandas.DataFrame([[None,1500],[200,5400],[None,500],[300,500],[None,4500]],
                                 columns=['Day1','Day3'])
print(Camp1_budgets,"\n")

# second DataFrame - Camp2_budgets
Camp2_budgets = pandas.DataFrame([[1200,1500],[2400,5400],[260,500],[500,500],[2000,4500]],
                                 columns=['Day1','Day2'])
print(Camp2_budgets,"\n")

# update() with overwrite=True
Camp1_budgets.update(Camp2_budgets, overwrite=True)
print(Camp1_budgets,"\n")

Output:

All the values that are present in the “Day1” column of “Camp1_budgets” are updated with the values of the “Day1” column from the “Camp2_budgets”.

Example 2: Overwrite = False

Let’s update the “Camp1_budgets” DataFrame with “Camp2_budgets” DataFrame by setting the overwrite to “False”.

import pandas

# first DataFrame - Camp1_budgets
Camp1_budgets = pandas.DataFrame([[None,1500],[200,5400],[None,500],[300,500],[None,4500]],
                                 columns=['Day1','Day3'])
print(Camp1_budgets,"\n")

# second DataFrame - Camp2_budgets
Camp2_budgets = pandas.DataFrame([[1200,1500],[2400,5400],[260,500],[500,500],[2000,4500]],
                                 columns=['Day1','Day2'])
print(Camp2_budgets,"\n")

# update() with overwrite=False
Camp1_budgets.update(Camp2_budgets, overwrite=False)
print(Camp1_budgets,"\n")

Output:

Only the NaN values that are present in the “Day1” column of “Camp1_budgets” are updated with the values in the “Day1” column from “Camp2_budgets”. There are three NaN values in the “Camp1_budgets” and are updated to 1200, 260, and 2000.

Conclusion

In Pandas DataFrame, the data updates are possible using the pandas.DataFrame.update() function. The pandas.DataFrame.update() is used to update/modify the values (non-null) from another DataFrame/Series. All the parameters are discussed with examples to understand the concept better. The data overlap error is discussed in detail with an example under the “raise” errors parameter example.

About the author

Gottumukkala Sravan Kumar

B tech-hon's in Information Technology; Known programming languages - Python, R , PHP MySQL; Published 500+ articles on computer science domain