
Pandas get_dummies()

When preparing datasets for machine learning algorithms, it is often necessary to convert categorical variables into dummy variables. A categorical variable takes one of a fixed set of categories rather than a numerical value. In Python, the “pandas.get_dummies()” function of the “pandas” module performs this conversion.

This write-up explains the “pandas.get_dummies()” function using multiple examples.

What is the Python “pandas.get_dummies()” Function?

In Python, the “pandas.get_dummies()” function transforms categorical variables into dummy/indicator variables. Each unique value of a categorical variable is represented by a new column that holds “1” where the original value equals that category and “0” otherwise. From pandas 2.0 onward, the new columns are Boolean by default, so the same information is shown as “True” and “False” unless “dtype=int” is passed.
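To make the encoding concrete, the following minimal sketch builds the indicator columns by hand and compares them with the library result. The data and the variable names (“colors”, “manual”, “encoded”) are illustrative assumptions, and “dtype=int” is passed so that the result holds 0/1 integers regardless of the installed pandas version:

```python
import pandas

# Illustrative data; the names in this sketch are assumptions, not part of pandas.
colors = pandas.Series(["red", "green", "red", "blue"])

# Hand-rolled encoding: one column per unique value, 1 where the row matches.
manual = pandas.DataFrame(
    {value: (colors == value).astype(int) for value in sorted(colors.unique())}
)

# Library encoding; dtype=int requests 0/1 integers instead of Booleans.
encoded = pandas.get_dummies(colors, dtype=int)

print(encoded)
# Element-wise comparison of the two encodings
print((manual.to_numpy() == encoded.to_numpy()).all())
```

The hand-rolled version is only meant to show what each column represents. In practice, the library function is the more convenient choice.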

Syntax

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Parameters

Here:

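  • The “data” parameter is the array-like object, Series, or DataFrame that needs to be encoded.
  • The “prefix” parameter is the string (or a list or dictionary of strings) placed in front of the new column names. For a DataFrame, it defaults to the name of the encoded column.
  • The “prefix_sep” parameter is the separator placed between the prefix and the category value. The default is “_”.
  • The “dummy_na” parameter adds an extra column that marks NaN values. By default, NaN values are ignored.
  • The “columns” parameter lists the names of the DataFrame columns to encode. If it is omitted, all columns with the object, string, or category dtype are encoded.
  • The “sparse” parameter, when “True”, backs the dummy columns with sparse arrays instead of regular NumPy arrays.
  • The “drop_first” parameter, when “True”, removes the first category level so that k categories produce k-1 columns.
  • The “dtype” parameter sets the data type of the new columns.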
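The “dtype” and “sparse” parameters are not covered by the examples later in this write-up, so here is a small sketch of both. The Series “sizes” and the result names are assumptions made for illustration only:

```python
import pandas

sizes = pandas.Series(["S", "M", "L", "M"])  # illustrative data

# dtype=int stores 0/1 integers rather than the library's default dtype.
as_int = pandas.get_dummies(sizes, dtype=int)

# sparse=True backs each indicator column with a sparse array, which can
# save memory when there are many categories and the columns are mostly zeros.
as_sparse = pandas.get_dummies(sizes, sparse=True, dtype=int)

print(as_int.dtypes)
print(as_sparse.dtypes)
```

Both results carry the same indicator values. Only the storage of the columns differs.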

Return Value

The “pandas.get_dummies()” function returns a DataFrame containing the dummy-coded data. The indicator values are 0s and 1s, or “False” and “True” when the columns are Boolean.

Example 1: Applying the “pandas.get_dummies()” Function to Convert Categorical Variables into Dummy Variables

In the code below, the “pandas” module is imported, and a Series object is created with the “pandas.Series()” function. Next, the “pandas.get_dummies()” function converts the string categories into indicator variables. A new column is created for each unique value in the Series, and each entry indicates the presence or absence of that value in the corresponding row:

import pandas
# Four distinct categories, one per row
list_value = ['A', 'B', 'C', 'D']
data = pandas.Series(list_value)
# One indicator column is created for each unique value
print(pandas.get_dummies(data))

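The output is a DataFrame with four rows and the columns “A”, “B”, “C”, and “D”. Each row marks exactly one column: the row at index “0” marks “A”, the row at index “1” marks “B”, and so on.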

Example 2: Applying the “pandas.get_dummies()” Function to Convert Categorical Variables into Dummy Variables Along With NaN Values

In this example, the “pandas” and “numpy” modules are imported, and a list that contains “NaN” values is initialized. Next, the “pandas.Series()” function creates the Series object from the list. Finally, the “pandas.get_dummies()” function takes the Series and “dummy_na=True” as arguments to convert the categorical values into dummy variables and add a separate column for the NaN values. Here is an example that demonstrates this:

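import pandas
import numpy
# The list contains two missing values (NaN)
list_value = ['A', 'B', numpy.nan, 'C', 'D', numpy.nan]
data = pandas.Series(list_value)
# dummy_na=True adds an indicator column for the NaN values
print(pandas.get_dummies(data, dummy_na=True))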

The output contains an additional column labeled “NaN” that marks the rows at index “2” and “5”, where the original values were missing.
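To see what “dummy_na” changes, the sketch below encodes the same Series with and without it and sums each row. The shortened data and the names “without_nan” and “with_nan” are assumptions for illustration, and “dtype=int” is used so that the sums are plain integers:

```python
import pandas
import numpy

data = pandas.Series(["A", "B", numpy.nan, "C"])  # illustrative data

# Default (dummy_na=False): the missing row gets 0 in every column.
without_nan = pandas.get_dummies(data, dtype=int)

# dummy_na=True: an extra column labeled NaN flags the missing row.
with_nan = pandas.get_dummies(data, dummy_na=True, dtype=int)

# Row sums show whether every row is marked in some column.
print(without_nan.sum(axis=1).tolist())
print(with_nan.sum(axis=1).tolist())
```

Without “dummy_na”, a missing value cannot be told apart from a row that simply matches no category, which is why the extra column can be useful.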

Example 3: Applying the “pandas.get_dummies()” Function to Convert Categorical Variables of DataFrame Columns into Dummy Variables

In this code, the “pandas.get_dummies()” function is applied to the DataFrame column named “grade”. The function converts the categorical values of that column into dummy variables. The code below illustrates this:

import pandas
df = pandas.DataFrame({"name": ["Joseph", "Anna", "Lily", "Henry"],
                       "age": [15, 25, 19, 13],
                       "grade": ["A", "B", "A", "C"]})
print(df, '\n')
# Encode only the "grade" column; "name" and "age" are left unchanged
print(pandas.get_dummies(df, columns=['grade']))

The above code returns a DataFrame in which the original “grade” column is removed and replaced with three new columns: “grade_A”, “grade_B”, and “grade_C”. Each new column marks the rows where its value appeared in the original column. For example, the value “A” appears at index “0” and “2”, so the “grade_A” column holds “1” (or “True”) in those rows and “0” (or “False”) in all other rows. The remaining columns are created in the same manner.
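The “columns” argument matters because, when it is omitted, the function encodes every text column of the DataFrame. The following sketch, which reuses the same data, compares the two calls. The result names “encode_all” and “encode_grade” are assumptions for illustration:

```python
import pandas

df = pandas.DataFrame({"name": ["Joseph", "Anna", "Lily", "Henry"],
                       "age": [15, 25, 19, 13],
                       "grade": ["A", "B", "A", "C"]})

# No columns argument: both text columns ("name" and "grade") are encoded.
encode_all = pandas.get_dummies(df, dtype=int)

# columns=["grade"]: only "grade" is encoded; "name" and "age" pass through.
encode_grade = pandas.get_dummies(df, columns=["grade"], dtype=int)

print(list(encode_all.columns))
print(list(encode_grade.columns))
```

Encoding an identifier-like column such as “name” produces one column per row value, which is rarely useful, so naming the columns explicitly is usually the safer choice.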

Example 4: Applying the “pandas.get_dummies()” Function to Convert Categorical Variables of DataFrame Columns into Dummy Variables By Dropping the First Column

Passing “drop_first=True” to the “pandas.get_dummies()” function drops the dummy column of the first category level, so k categories produce k-1 columns. Here is an example code that drops the first column:

import pandas
df = pandas.DataFrame({"name": ["Joseph", "Anna", "Lily", "Henry"],
                       "age": [15, 25, 19, 13],
                       "grade": ["A", "B", "A", "C"]})
print(df, '\n')
# drop_first=True omits the indicator column of the first level ("A")
print(pandas.get_dummies(df, columns=['grade'], drop_first=True))

As a result, the “grade_A” column, which belongs to the first category, is not created. Only “grade_B” and “grade_C” remain, and a row with “0” in both columns belongs to grade “A”.
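Dropping the first column does not lose information, because the dropped level can be inferred from the remaining columns. The sketch below checks this on the “grade” values. The names “full”, “reduced”, and “recovered_a” are assumptions for illustration:

```python
import pandas

grades = pandas.Series(["A", "B", "A", "C"], name="grade")

# Full encoding and the reduced encoding without the first level
full = pandas.get_dummies(grades, prefix="grade", dtype=int)
reduced = pandas.get_dummies(grades, prefix="grade", drop_first=True, dtype=int)

# A row belongs to the dropped level "A" exactly when all remaining
# indicator columns are 0.
recovered_a = (reduced.sum(axis=1) == 0).astype(int)

print(list(reduced.columns))
print(recovered_a.tolist())
print(full["grade_A"].tolist())
```

This inference assumes the data has no missing values, since a NaN row would also show 0 in every remaining column.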

Example 5: Applying the “pandas.get_dummies()” Function to Convert Categorical Variables of DataFrame Columns into Dummy Variables By Specifying the Prefix

In the code below, the “prefix” parameter replaces the default prefix, which is the original column name, in the names of the dummy columns. Take a look at the following example:

import pandas
df = pandas.DataFrame({"name": ["Joseph", "Anna", "Lily", "Henry"],
                       "age": [15, 25, 19, 13],
                       "grade": ["A", "B", "A", "C"]})
print(df, '\n')
# prefix replaces "grade" in the names of the new columns
print(pandas.get_dummies(df, columns=['grade'], prefix='new_grade'))

The dummy columns are named “new_grade_A”, “new_grade_B”, and “new_grade_C” instead of “grade_A”, “grade_B”, and “grade_C”.
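When several columns are encoded at once, the “prefix” parameter also accepts a dictionary that maps each column name to its own prefix. The following is a minimal sketch. The “city” column and the short prefixes are assumptions added for illustration:

```python
import pandas

# Illustrative data with two categorical columns
df = pandas.DataFrame({"grade": ["A", "B", "A"],
                       "city": ["Oslo", "Rome", "Oslo"]})

# A dictionary maps each encoded column to its own prefix.
encoded = pandas.get_dummies(
    df,
    columns=["grade", "city"],
    prefix={"grade": "g", "city": "c"},
    dtype=int,
)

print(list(encoded.columns))
```

A single string prefix would be applied to every encoded column, so the dictionary form helps keep the new column names distinguishable.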

Conclusion

In Python, the “pandas.get_dummies()” function of the “pandas” module converts categorical variables into dummy variables. It returns a new DataFrame with one column for each level of the categorical variable. When a DataFrame column is encoded, each new column name is the level name prefixed with the original column name. This write-up covered the “pandas.get_dummies()” function, its parameters, and its usage with examples.

About the author

Haroon Javed

Hi, I'm Haroon. I am an electronics engineer and a technical content writer. I am a tech geek who loves to help people to the best of my knowledge.