Pandas Mask

The mask() function in Pandas replaces the values that satisfy a condition and keeps every other value unchanged. This is the opposite of the where() function, which keeps the values that satisfy the condition and replaces the rest. In this guide, we will see how to use pandas.Series.mask() with a Series and pandas.DataFrame.mask() with a DataFrame. The functionality is the same, but the data structures are different.
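As a minimal sketch of this relationship (the sample values and variable names are made up for illustration), masking with a condition should give the same result as calling where() with the inverted condition:

```python
import pandas

# Hypothetical sample data, chosen only for illustration
values = pandas.Series([10, 20, 30, 40])
cond = values > 25

# mask() replaces the values where cond is True (NaN by default)
masked = values.mask(cond)

# where() keeps the values where its condition is True,
# so inverting the condition targets the same positions
kept = values.where(~cond)

print(masked)
print(masked.equals(kept))
```

Both calls leave 10 and 20 in place and replace 30 and 40 with NaN.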

Pandas.Series.Mask

The pandas.Series.mask() function replaces the values for which the condition is “True”. Where the condition is not satisfied, the original value is kept. Where it is satisfied, the value is replaced with the corresponding value from another Series, with a scalar, or with the result of a callable.

Syntax:

pandas.Series.mask(cond, other=nan, inplace=False, axis=None, level=None)

1. cond: The condition is the first parameter. It can be a Boolean Series, an array-like object, or a callable. The values are replaced where it is “True”.

2. other: A scalar, another Series, or a callable that supplies the replacement values where the condition is satisfied. If this parameter is not specified, the values that satisfy the condition are replaced with a missing value (NaN).

3. axis: The alignment axis, if needed. Since a Series is a one-dimensional data structure, its only axis is 0 and this parameter has no effect.

4. level: The alignment level, if needed.

5. inplace: We can perform the operation in place on the given Series by setting this parameter to “True”. It is “False” by default, in which case a new Series is returned.
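The different forms that the “other” parameter can take can be sketched as follows. The data and the variable names (scores, fallback, low) are hypothetical and serve only to illustrate the parameters listed above:

```python
import pandas

# Hypothetical sample data
scores = pandas.Series([45, 80, 30, 95])
fallback = pandas.Series([50, 50, 50, 50])

# Condition: True for the values that should be replaced
low = scores < 50

# other omitted: matching values become NaN
no_other = scores.mask(low)

# other as a scalar: matching values become 0
scalar_other = scores.mask(low, 0)

# other as a Series: matching values are taken from the same position in fallback
series_other = scores.mask(low, fallback)

# other as a callable: it receives the Series and returns the replacement values
callable_other = scores.mask(low, lambda s: s * 2)

print(no_other, scalar_other, series_other, callable_other, sep="\n\n")
```

In every case, 80 and 95 are kept because the condition is “False” for them.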

Example 1:

1. Create the account_status Series with five strings.

2. Create a new “active” Series from the existing Series such that an element is replaced with NaN if it is “Active”. Otherwise, the element that is present in account_status is kept as it is.

import pandas

# Create Series - account_status with 5 strings
account_status = pandas.Series(['Active', 'Inactive', 'Active', 'Inactive', 'Inactive'])
print(account_status,"\n")

# Create new Series based on the above Series
active=account_status.mask(account_status == 'Active')
print(active)

Output:

There are two elements with “Active” in the first Series. In the second Series, these two elements are replaced with NaN and the remaining three keep the “Inactive” value.

Example 2:

Use the same Series and create a new “active” Series from it. If an element is not equal to “Active”, it is replaced with the “Not activated” string in the new Series. If an element in account_status is equal to “Active”, it is kept unchanged in the “active” Series.

import pandas

# Create Series - account_status with 5 strings
account_status = pandas.Series(['Active', 'Inactive', 'Active', 'Inactive', 'Inactive'])
print(account_status,"\n")

# Create new Series based on the above Series
active=account_status.mask(account_status != 'Active',"Not activated")
print(active)

Output:

There are three elements that are not equal to “Active” in the first Series (account_status), so they are set to “Not activated” in the “active” Series. The two elements that are “Active” are copied into the “active” Series unchanged.

Example 3:

Create two Series, “budget1” and “budget2”, with five integers each and check which values in “budget1” are greater than 1000. Where the condition is satisfied, the value is replaced with the value at the same position in “budget2”. Call the pandas.Series.mask() function twice, once with the “inplace” parameter set to “False” and once with it set to “True”.

import pandas

# Create Series - budget1 with 5 values
budget1 = pandas.Series([1200,234,500,2500,100])
print(budget1,"\n")

# Create Series - budget2 with 5 values
budget2 = pandas.Series([200,455,900,700,600])
print(budget2,"\n")

# inplace = False
budget1.mask(budget1 > 1000,budget2,inplace=False)
print(budget1,"\n")

# inplace = True
budget1.mask(budget1 > 1000,budget2,inplace=True)
print(budget1)

Output:

1. The “budget1” Series is not updated since inplace is “False”.

2. The “budget1” Series is updated based on the condition since inplace is “True”. There are three values (234, 500, and 100) that are not greater than 1000. So, these three remain the same. 1200 and 2500 are replaced with the values (200, 700) from “budget2”.
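The difference between the two calls can be checked directly. The following sketch reuses the values from this example. The names original, replacement, new_series, and in_place are introduced only for illustration:

```python
import pandas

original = pandas.Series([1200, 234, 500, 2500, 100])
replacement = pandas.Series([200, 455, 900, 700, 600])

# Without inplace, mask() returns a new Series and leaves the caller untouched
new_series = original.mask(original > 1000, replacement)

# With inplace=True, the Series that mask() is called on is modified.
# A copy is used here so that "original" stays available for comparison.
in_place = original.copy()
in_place.mask(in_place > 1000, replacement, inplace=True)

print(original.tolist())    # still contains 1200 and 2500
print(new_series.tolist())  # 1200 and 2500 replaced with 200 and 700
print(in_place.tolist())    # same values as new_series
```

Capturing the returned Series, as done for new_series, is the form that leaves the source data unchanged.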

Pandas.DataFrame.Mask

The pandas.DataFrame.mask() function replaces the values in a DataFrame where the condition is “True”. Depending on the condition, this can affect individual cells, the values of a single column, or entire rows. Where the condition is not satisfied, the original value is kept. Where it is satisfied, the value is replaced with the corresponding value from the other DataFrame/Series, or with NaN if no replacement is given.

Syntax:

Look at the syntax of the pandas.DataFrame.mask() function. The parameters are the same as Series.

pandas.DataFrame.mask(cond, other=nan, inplace=False, axis=None, level=None)
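When the condition is itself a Boolean DataFrame of the same shape, the replacement happens cell by cell. A minimal sketch, assuming a small numeric DataFrame (the “sales” data and the cap of 200 are hypothetical):

```python
import pandas

# Hypothetical numeric DataFrame
sales = pandas.DataFrame({'q1': [100, 250, 90],
                          'q2': [300, 80, 120]})

# sales > 200 is a Boolean DataFrame with the same shape as sales.
# Every cell where it is True is replaced with the scalar 200.
capped = sales.mask(sales > 200, 200)

print(capped)
```

Only the two cells that hold 250 and 300 change. The “sales” DataFrame itself is not modified because inplace is left at its default.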

Example 1:

Create the “Accounts” Pandas DataFrame with four columns and five rows. The values in the “Revenue” column are replaced with NaN if the revenue is greater than 20000. The mask is applied twice, once with the “inplace” parameter set to “False” and once with it set to “True”.

import pandas

# Create pandas DataFrame
Accounts = pandas.DataFrame({'Name':["Government","Agriculture","Banking","Education","Finance"],
                  'Type' :['Partner','Customer-Direct','Partner','Partner','Customer-Direct'],
                  'Revenue':[25000,45000,20000,15000,10000],
                  'Rating':['Hot','Cool','Cool','Cool','Hot']})
print(Accounts,"\n")

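# inplace=False: the masked column is returned and Accounts is left unchanged
Accounts['Revenue'].mask(Accounts.Revenue > 20000,inplace=False)
print(Accounts,"\n")

# inplace=True: mask a separate float copy of the column in place (NaN is a float),
# then store it back so that the update does not rely on chained assignment
revenue = Accounts['Revenue'].astype(float)
revenue.mask(revenue > 20000,inplace=True)
Accounts['Revenue'] = revenue
print(Accounts)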

Output:

The first output shows the original DataFrame. The second output is the same as the first since the operation is not done in place, so the DataFrame is not updated. In the last output, we can see that the values are changed in the “Revenue” column. There are two values in the original DataFrame (25000 and 45000) that are greater than 20000. So, these are replaced with NaN and the other three values remain the same.
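A common alternative to inplace=True is to assign the Series that mask() returns back to the column. The sketch below uses a reduced, two-column version of the DataFrame from this example. The lowercase name “accounts” is chosen only for illustration:

```python
import pandas

# Reduced version of the Accounts DataFrame (two columns only)
accounts = pandas.DataFrame({'Name': ["Government", "Agriculture", "Banking", "Education", "Finance"],
                             'Revenue': [25000, 45000, 20000, 15000, 10000]})

# mask() returns a new Series; assigning it back replaces the column
accounts['Revenue'] = accounts['Revenue'].mask(accounts['Revenue'] > 20000)

print(accounts)
```

The “Revenue” column becomes a float column because NaN cannot be stored in an integer column.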

Example 2:

Use the same DataFrame that is created in the first example and replace the values in the “Name” column that are not equal to “Government” with “Not Government”. We need to pass the condition together with the other and inplace parameters: Accounts.Name != 'Government', "Not Government", inplace=True. The mask() function replaces the values where the condition is “True”. So, to keep “Government” and replace everything else, we need to specify the not equal operator.

import pandas


# Create pandas DataFrame
Accounts = pandas.DataFrame({'Name':["Government","Agriculture","Banking","Education","Finance"],
                  'Type' :['Partner','Customer-Direct','Partner','Partner','Customer-Direct'],
                  'Revenue':[25000,45000,20000,15000,10000],
                  'Rating':['Hot','Cool','Cool','Cool','Hot']})
print(Accounts,"\n")


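# Replace the values in the Name column that are not equal to "Government" with "Not Government".
# The column is masked in place as a separate Series and then stored back in the DataFrame.
name = Accounts['Name'].copy()
name.mask(name != 'Government',"Not Government",inplace=True)
Accounts['Name'] = name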
print(Accounts)

Output:

The first output returns the actual DataFrame. There are four elements that are not equal to “Government” in the “Name” column. So, they are replaced with “Not Government” in the “Name” column.

Example 3:

Let’s specify multiple conditions. To keep only the rows with the Cool rating and the Partner type, we pass the opposite condition, (Accounts['Rating'] != 'Cool') | (Accounts['Type'] != 'Partner'), to pandas.DataFrame.mask(). Every row that satisfies this condition is replaced with NaN. The rows that remain are those for which the condition is “False”. Negating it turns each “!=” into “==” and the “|” (OR) into “&” (AND), so the remaining rows are exactly those with the Cool rating and the Partner type.

import pandas


# Create pandas DataFrame
Accounts = pandas.DataFrame({'Name':["Government","Agriculture","Banking","Education","Finance"],
                  'Type' :['Partner','Customer-Direct','Partner','Partner','Customer-Direct'],
                  'Revenue':[25000,45000,20000,15000,10000],
                  'Rating':['Hot','Cool','Cool','Cool','Hot']})
print(Accounts,"\n")


Accounts_filtered= Accounts.mask((Accounts['Rating']  != 'Cool') | (Accounts['Type']  != 'Partner'))
print(Accounts_filtered)

Output:

There are two rows with the Cool rating and the Partner type (Banking and Education). The remaining three rows are not removed. All of their values are replaced with NaN.
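The equivalence between the two forms of the condition, and one way to drop the masked rows afterwards, can be sketched as follows. The names keep, masked, filtered, and direct are introduced for illustration. Using dropna() and Boolean indexing here is a suggestion and not part of the original example:

```python
import pandas

accounts = pandas.DataFrame({'Name': ["Government", "Agriculture", "Banking", "Education", "Finance"],
                             'Type': ['Partner', 'Customer-Direct', 'Partner', 'Partner', 'Customer-Direct'],
                             'Revenue': [25000, 45000, 20000, 15000, 10000],
                             'Rating': ['Hot', 'Cool', 'Cool', 'Cool', 'Hot']})

# Rows we want to keep: Cool rating AND Partner type
keep = (accounts['Rating'] == 'Cool') & (accounts['Type'] == 'Partner')

# Masking with the negated condition hides every other row.
# ~keep is True for the same rows as (Rating != 'Cool') | (Type != 'Partner').
masked = accounts.mask(~keep)

# mask() keeps the shape of the DataFrame; drop the all-NaN rows to really filter
filtered = masked.dropna(how='all')

# Boolean indexing selects the same rows and keeps the original integer Revenue values
direct = accounts[keep]

print(filtered)
print(direct)
```

If the goal is only to select rows, Boolean indexing is usually the shorter route. mask() is more useful when the shape of the DataFrame has to be preserved.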

Conclusion

We learned how to replace the values that satisfy a condition in a Pandas DataFrame using the pandas.DataFrame.mask() function. Similarly, the values in a Series are replaced using the pandas.Series.mask() function. This functionality is the opposite of the where() function. In both scenarios, the syntax and functionality are the same but the data structures are different (the Series is 1D and the DataFrame is 2D). Multiple conditions can also be combined in the mask() function, as shown in the last DataFrame example.

About the author

Gottumukkala Sravan Kumar

B tech-hon's in Information Technology; Known programming languages - Python, R , PHP MySQL; Published 500+ articles on computer science domain