Data scientists use Python libraries such as NumPy and pandas to perform fast, modular, and efficient data analysis. The functions and methods of these libraries cover many common tasks, including creating a new DataFrame column whose values depend on a condition. Python offers several ways to do this.
In this guide, you will learn how to create a DataFrame column based on a condition using the methods described below.
Method 1: Create a DataFrame Column Based on Condition Using “List Comprehension”
A list comprehension can build a DataFrame column by evaluating a condition for each value of an existing column. Here, the new column “Group” is set to “A” when the age is greater than or equal to 18 and to “B” otherwise:
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# Label each row 'A' when Age is at least 18, otherwise 'B'
df['Group'] = ['A' if x >= 18 else 'B' for x in df['Age']]
print(df)
The new “Group” column holds “A” for Lily, Sam, and Henry, and “B” for Joseph and Anna.
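A list comprehension can also test more than one column at a time. The following is a minimal sketch, assuming the same sample data as above; the column name “AdultMale” and the use of `zip()` to walk two columns together are choices made here for illustration and are not part of the original example.

```python
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

# Single condition, as in the example above
df['Group'] = ['A' if x >= 18 else 'B' for x in df['Age']]

# Two conditions: zip() pairs each age with the matching sex value.
# 'AdultMale' is a hypothetical column name used only in this sketch.
df['AdultMale'] = ['A' if age >= 18 and sex == 'M' else 'B'
                   for age, sex in zip(df['Age'], df['Sex'])]

print(df['Group'].tolist())      # ['A', 'B', 'B', 'A', 'A']
print(df['AdultMale'].tolist())  # ['B', 'B', 'B', 'A', 'A']
```

Because the comprehension runs ordinary Python code for every row, it is easy to read but may be slower than the NumPy-based methods on large DataFrames.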
Method 2: Create a DataFrame Column Based on Condition Using “numpy.where()” Method
When called with three arguments, the “numpy.where()” function returns an array whose elements come from the second argument where the condition is true and from the third argument where it is false. (Called with the condition alone, it returns the indices of the elements that satisfy the condition.) In the code below, we first create a DataFrame with three columns. After that, “numpy.where()” constructs the new column from three arguments: the condition, the value to assign where the condition is true, and the value to assign where the condition is false:
import numpy
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# 'A' where Sex is 'M', otherwise 'B'
df['Group'] = numpy.where(df['Sex'] == 'M', 'A', 'B')
print(df)
The new “Group” column holds “A” for the rows where “Sex” is “M” and “B” for the rows where it is “F”.
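The two calling forms of “numpy.where()” are easy to confuse, so the sketch below runs both on the same condition. It assumes the sample data used above; the variable names `is_male`, `labels`, and `positions` are introduced here for illustration only.

```python
import numpy
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

is_male = df['Sex'] == 'M'  # boolean Series, one entry per row

# Three-argument form: 'A' where the condition is True, 'B' elsewhere
labels = numpy.where(is_male, 'A', 'B')
print(labels.tolist())     # ['B', 'A', 'B', 'A', 'A']

# One-argument form: a tuple of index arrays marking the True positions
positions = numpy.where(is_male)[0]
print(positions.tolist())  # [1, 3, 4]
```

Only the three-argument form produces values that can be assigned directly as a new column.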
Method 3: Create a DataFrame Column Based on Condition Using “numpy.select()” Method
The “numpy.select()” function returns an array whose elements are drawn from a list of choices according to a list of conditions. It takes three arguments: the list of conditions to test, the list of values to use where the corresponding condition is satisfied, and a default value to use where no condition is satisfied. Here, a single combined condition assigns a salary of 1000 to males aged 18 or over and 500 to everyone else:
import numpy
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# 1000 where Age >= 18 and Sex is 'M'; 500 where the condition is not met
df['Salary'] = numpy.select([(df['Age'] >= 18) & (df['Sex'] == 'M')], [1000], default=500)
print(df)
The new “Salary” column holds 1000 for Sam and Henry, the two males aged 18 or over, and 500 for everyone else.
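“numpy.select()” is most useful when there are several conditions, each with its own value. The following is a minimal sketch under that assumption; the age bands, their labels, and the column name “AgeBand” are hypothetical and chosen only to illustrate the call.

```python
import numpy
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

# Conditions are checked in order; the first one that is True wins,
# so an age of 12 is labelled 'child' even though it is also below 18.
conditions = [df['Age'] < 13, df['Age'] < 18]
choices = ['child', 'teen']

# Rows that match no condition receive the default value
df['AgeBand'] = numpy.select(conditions, choices, default='adult')

print(df['AgeBand'].tolist())  # ['adult', 'teen', 'child', 'adult', 'adult']
```

The order of the condition list matters here: placing the broader test first would label every age under 18 as “teen”.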
Method 4: Create a DataFrame Column Based on Condition Using “Series.apply()” Method
In the code below, the pandas “Series.apply()” method (NumPy has no “apply()” function) calls the specified function on every value of a column and returns the results as a new Series. Here, the built-in “len” function is applied to the “Name” column, so the new column holds the number of characters in each name:
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# Apply len() to every name to get its character count
df['Name Character'] = df['Name'].apply(len)
print(df)
The new “Name Character” column holds the length of each name: 4, 6, 4, 3, and 5.
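The function passed to “apply()” can itself contain a condition. The sketch below, which assumes the same sample data, applies a conditional function to one column and then uses “DataFrame.apply()” with `axis=1` to test two columns per row. The helper name `age_group` and the lambda are illustrative and not taken from the original example.

```python
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

def age_group(age):
    """Hypothetical helper: 'A' for ages 18 and over, 'B' otherwise."""
    return 'A' if age >= 18 else 'B'

# Series.apply(): the function receives one value at a time
df['Group'] = df['Age'].apply(age_group)

# DataFrame.apply() with axis=1: the function receives one row at a time
df['Salary'] = df.apply(
    lambda row: 1000 if row['Age'] >= 18 and row['Sex'] == 'M' else 500,
    axis=1)

print(df['Group'].tolist())   # ['A', 'B', 'B', 'A', 'A']
print(df['Salary'].tolist())  # [500, 500, 500, 1000, 1000]
```

Row-wise “apply()” is flexible but calls a Python function for every row, so the NumPy functions shown earlier are usually a better fit for simple conditions on large data.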
Method 5: Create a DataFrame Column Based on Condition Using “Series.map()” Method
The “Series.map()” method applies a dictionary or a function to each element of a Series. In this example, we create a new column called “Group” by calling “map()” on the “Sex” column. The dictionary maps the value “M” to “A” and the value “F” to “B”. The method returns a new Series holding the mapped values, which is then assigned to the “Group” column of the DataFrame:
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# Translate each Sex value through the dictionary: 'M' -> 'A', 'F' -> 'B'
df['Group'] = df['Sex'].map({'M': 'A', 'F': 'B'})
print(df)
The new “Group” column holds “A” where “Sex” is “M” and “B” where it is “F”.
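One detail worth knowing when mapping with a dictionary is that values without a matching key become missing values. The sketch below illustrates this with a small hypothetical DataFrame that contains an extra “X” value; the “Unknown” fallback label is likewise an assumption made for the example.

```python
import pandas

# Hypothetical data: 'X' has no entry in the mapping dictionary
df = pandas.DataFrame({'Sex': ['F', 'M', 'X']})

mapped = df['Sex'].map({'M': 'A', 'F': 'B'})

# The unmapped value shows up as missing
print(mapped.isna().tolist())  # [False, False, True]

# fillna() supplies a fallback label for the unmapped rows
df['Group'] = mapped.fillna('Unknown')
print(df['Group'].tolist())    # ['B', 'A', 'Unknown']
```

Checking for missing values after a dictionary-based “map()” is a simple way to catch categories that the dictionary does not cover.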
Conclusion
A list comprehension and the “numpy.where()”, “numpy.select()”, “Series.apply()”, and “Series.map()” methods can each be used to create a DataFrame column based on a condition. Between them, they cover single conditions, multiple conditions, arbitrary functions, and dictionary lookups. This guide demonstrated each approach with an example.