
Pandas Trim Whitespace from Column

This short article will discuss how you can trim leading or trailing whitespace characters from a Pandas DataFrame.

Sample DataFrame

For illustration purposes, we will use the sample DataFrame shown below:

import pandas as pd

df = pd.DataFrame({
    "product_name": [' product_1', 'product_2\t', 'product_3\n', '\nproduct_4\t', 'product_5'],
    "price": [10.00, 20.50, 100.30, 500.25, 101.30]
})

The DataFrame above contains whitespace characters such as newline characters, spaces, and tabs.
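Because tabs and newlines are rendered rather than displayed when a DataFrame is printed, it can help to look at the raw values first. The sketch below rebuilds the sample DataFrame and prints the column as a plain Python list; the names raw_names and has_padding are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": [' product_1', 'product_2\t', 'product_3\n', '\nproduct_4\t', 'product_5'],
    "price": [10.00, 20.50, 100.30, 500.25, 101.30],
})

# Printing a list shows each string's repr, so tabs and newlines
# appear as \t and \n instead of being rendered.
raw_names = df["product_name"].tolist()
print(raw_names)
# [' product_1', 'product_2\t', 'product_3\n', '\nproduct_4\t', 'product_5']

# Flag the rows whose value changes when surrounding whitespace is stripped.
has_padding = df["product_name"] != df["product_name"].str.strip()
print(has_padding.tolist())
# [True, True, True, True, False]
```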

Remove Leading Whitespace Characters

We can use the lstrip() function to remove leading whitespace characters from a DataFrame column, as shown:

df.product_name.str.lstrip()

The call returns a new Series in which the leading space of ' product_1' and the leading newline of '\nproduct_4\t' are removed. Trailing tab and newline characters are left in place.
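As a minimal, self-contained sketch of the behavior described above, the snippet below applies lstrip() to the same five values held in a standalone Series. The variable names are illustrative. The second call passes a set of characters to strip, which limits the trim to spaces only.

```python
import pandas as pd

names = pd.Series([' product_1', 'product_2\t', 'product_3\n', '\nproduct_4\t', 'product_5'])

# With no argument, lstrip() removes any leading whitespace (space, tab, newline).
left_trimmed = names.str.lstrip()
print(left_trimmed.tolist())
# ['product_1', 'product_2\t', 'product_3\n', 'product_4\t', 'product_5']

# Passing a string restricts the trim to those characters; here only leading spaces.
spaces_only = names.str.lstrip(" ")
print(spaces_only.tolist())
# ['product_1', 'product_2\t', 'product_3\n', '\nproduct_4\t', 'product_5']
```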

Remove Trailing Whitespace Characters

We can use the rstrip() function to remove trailing whitespace characters from a column.

An example is as shown:

df.product_name.str.rstrip()

The call returns a new Series with the trailing tab and newline characters removed. The leading space in ' product_1' and the leading newline in '\nproduct_4\t' are left in place.
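The following sketch shows the effect of rstrip() on the same five values, again using a standalone Series with illustrative variable names.

```python
import pandas as pd

names = pd.Series([' product_1', 'product_2\t', 'product_3\n', '\nproduct_4\t', 'product_5'])

# rstrip() removes trailing whitespace only; leading characters are untouched.
right_trimmed = names.str.rstrip()
print(right_trimmed.tolist())
# [' product_1', 'product_2', 'product_3', '\nproduct_4', 'product_5']
```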

Remove Both Leading and Trailing Whitespace Characters

You can also remove both the leading and trailing whitespace characters from a column using the strip() function.

An example usage is as shown:

df.product_name.str.strip()

In this case, the call returns a new Series in which every value, from 'product_1' through 'product_5', has no surrounding spaces, tabs, or newlines.
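One detail worth spelling out is that the .str methods return a new Series and do not modify the DataFrame in place. The sketch below, which uses illustrative variable names, shows that the column keeps its whitespace until the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": [' product_1', 'product_2\t', 'product_3\n', '\nproduct_4\t', 'product_5'],
    "price": [10.00, 20.50, 100.30, 500.25, 101.30],
})

# Calling strip() without assigning the result leaves the column unchanged.
df["product_name"].str.strip()
unchanged = df["product_name"].tolist()
print(unchanged[0] == ' product_1')  # True: the leading space is still there

# Assign the result back to persist the cleaned values.
df["product_name"] = df["product_name"].str.strip()
cleaned = df["product_name"].tolist()
print(cleaned)
# ['product_1', 'product_2', 'product_3', 'product_4', 'product_5']
```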

Using Replace

You can also use the replace() function to remove whitespace characters from a column.

For example, to remove all tab characters from a column, we can do:

df.product_name.str.replace('\t', '')

In this case, the function replaces every tab character with the specified value, here an empty string. As a result, 'product_2\t' becomes 'product_2' and '\nproduct_4\t' becomes '\nproduct_4'.

To remove space and newline characters:

df.product_name.str.replace('\n', '')  # remove newlines

df.product_name.str.replace(' ', '')  # remove spaces
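Unlike strip(), replace() acts on every occurrence in the string, including whitespace inside a value. As a hedged alternative to chaining several calls, a regular expression can remove all whitespace in one pass. The sketch below uses a small hypothetical Series (not the sample DataFrame) that includes an inner space to make the difference visible.

```python
import pandas as pd

# Hypothetical values: the first one contains a space inside the name.
names = pd.Series([' product 1', 'product_2\t', '\nproduct_3'])

# \s+ matches any run of whitespace; regex=True makes the pattern a regex.
no_whitespace = names.str.replace(r'\s+', '', regex=True)
print(no_whitespace.tolist())
# ['product1', 'product_2', 'product_3']

# strip() only touches the ends, so the inner space survives.
stripped = names.str.strip()
print(stripped.tolist())
# ['product 1', 'product_2', 'product_3']
```

If the goal is only to trim the ends of each value, strip() is usually the safer choice, since replace() can alter values that legitimately contain spaces.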

Conclusion

This article showed how to remove leading and trailing whitespace characters from a Pandas DataFrame column using the lstrip(), rstrip(), strip(), and replace() functions.

About the author

John Otieno

My name is John, and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.