In this tutorial, we will show the pyspark pandas DataFrame add_prefix() and add_suffix() methods, which are used to add prefixes and suffixes to all columns of a DataFrame or to a particular column.
Syntax to import pandas from pyspark:
from pyspark import pandas
After that, we can create or use a DataFrame from the pandas module.
Syntax to create a pandas DataFrame:
pyspark.pandas.DataFrame()
We can pass a dictionary or a list of lists with values.
Let’s create a pandas DataFrame through pyspark with four columns and five rows.
from pyspark import pandas
#create dataframe from pandas pyspark
pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})
print(pyspark_pandas)
Output:
Now, we will go into our tutorial.
It is possible to add prefixes and suffixes to a particular column or all columns using the add_prefix() and add_suffix() methods. Let’s discuss them one by one.
pyspark.pandas.DataFrame.add_prefix()
add_prefix() adds a prefix string to the beginning of every column name in the pyspark pandas dataframe. It can also be applied to a single column by specifying the column name. In that scenario, the prefix is added to the row labels of that column, not to the column name.
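Syntax:
For the entire dataframe:
pyspark_pandas.add_prefix(prefix)
For a particular column:
pyspark_pandas.column.add_prefix(prefix)
Where pyspark_pandas is the pyspark pandas dataframe and column is the column name.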
Parameter:
prefix: the string added at the beginning of each label.
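The difference between the two cases can be modeled in plain Python. This is only a sketch of the labelling rule, not how pyspark pandas implements it: the helper names add_prefix_to_columns and add_prefix_to_rows are hypothetical, and ordinary dictionaries stand in for the dataframe and the column.

```python
# Plain-Python model of the labelling rule (not the library implementation).
# Both helper names are hypothetical and exist only for this illustration.

def add_prefix_to_columns(table, prefix):
    """Model of add_prefix() on a dataframe: rename every column label."""
    return {f"{prefix}{name}": values for name, values in table.items()}


def add_prefix_to_rows(column, prefix):
    """Model of add_prefix() on a single column: rename every row label."""
    return {f"{prefix}{label}": value for label, value in column.items()}


# A dict of column name -> values stands in for the dataframe.
table = {"mark1": [90, 56], "mark2": [100, 67]}
# A dict of row label -> value stands in for the mark1 column.
mark1 = {0: 90, 1: 56}

prefixed_table = add_prefix_to_columns(table, "Linux_Hint")
prefixed_mark1 = add_prefix_to_rows(mark1, "Linux_Hint")

print(list(prefixed_table))  # column labels carry the prefix
print(prefixed_mark1)        # row labels carry the prefix, values are unchanged
```

In both cases only labels change. The stored values stay exactly as they were.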
Example 1
In this example, we add the prefix “Linux_Hint” to all the columns of the pyspark pandas dataframe.
#import pandas from the pyspark module
from pyspark import pandas
#create dataframe from pandas pyspark
pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})
#add the prefix - 'Linux_Hint' to the entire dataframe
print(pyspark_pandas.add_prefix('Linux_Hint'))
Output:
We can see that the prefix is added to all the columns.
Example 2
Add the prefix to the row labels of the mark1 column.
from pyspark import pandas
#create dataframe from pandas pyspark
pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})
#add the prefix - 'Linux_Hint' to the row labels of the mark1 column
print(pyspark_pandas.mark1.add_prefix('Linux_Hint'))
Output:
Linux_Hint0 90
Linux_Hint1 56
Linux_Hint2 78
Linux_Hint3 54
Linux_Hint4 67
Name: mark1, dtype: int64
We can see that the prefix is added to the row labels of the mark1 column. The values themselves are unchanged.
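If the goal is to prefix the name of only one column rather than its row labels, one option is rename() with an explicit mapping. The sketch below uses plain pandas as a stand-in so that it runs without a Spark session. That pyspark pandas behaves the same way here is an assumption, based on it mirroring the pandas API.

```python
# Plain pandas is used as a stand-in here (assumption: pyspark pandas mirrors
# this call; swap the import for "from pyspark import pandas" to try it there).
import pandas as pd

df = pd.DataFrame({
    "student_lastname": ["manasa", "trisha"],
    "mark1": [90, 56],
    "mark2": [100, 67],
})

# Rename only the mark1 column; every other column keeps its name.
renamed = df.rename(columns={"mark1": "Linux_Hint" + "mark1"})

print(list(renamed.columns))
```

rename() returns a new dataframe, so the original column names in df stay as they were.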
pyspark.pandas.DataFrame.add_suffix()
add_suffix() adds a suffix string to the end of every column name in the pyspark pandas dataframe. It can also be applied to a single column by specifying the column name. In that scenario, the suffix is added to the row labels of that column, not to the column name.
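Syntax:
For the entire dataframe:
pyspark_pandas.add_suffix(suffix)
For a particular column:
pyspark_pandas.column.add_suffix(suffix)
Where pyspark_pandas is the pyspark pandas dataframe and column is the column name.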
Parameter:
suffix: the string added at the end of each label.
Example 1
In this example, we add the suffix “Linux_Hint” to all the columns of the pyspark pandas dataframe.
from pyspark import pandas
#create dataframe from pandas pyspark
pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})
#add the suffix - 'Linux_Hint' to the entire dataframe
print(pyspark_pandas.add_suffix('Linux_Hint'))
Output:
We can see that the suffix is added to all the columns.
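Two practical details are worth a quick sketch: the string is attached exactly as given, so any separator has to be part of it, and the call returns a new dataframe instead of changing the original. Plain pandas is used below as a stand-in so the code runs without Spark. Treating pyspark pandas as equivalent here is an assumption, and the labels "exam_" and "_score" are made up for illustration.

```python
# Plain pandas is used as a stand-in here (assumption: pyspark pandas mirrors
# these calls; swap the import for "from pyspark import pandas" to try it there).
import pandas as pd

df = pd.DataFrame({"mark1": [90, 56], "mark2": [100, 67]})

# Include the underscore in the suffix itself; no separator is added for us.
suffixed = df.add_suffix("_Linux_Hint")

# The two methods can be chained; "exam_" and "_score" are illustrative labels.
both = df.add_prefix("exam_").add_suffix("_score")

print(list(suffixed.columns))
print(list(both.columns))
print(list(df.columns))  # the original dataframe keeps its column names
```

To keep the renamed columns, assign the result to a variable, as done with suffixed and both above.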
Example 2
Add the suffix to the row labels of the mark1 column.
from pyspark import pandas
#create dataframe from pandas pyspark
pyspark_pandas=pandas.DataFrame({'student_lastname':['manasa','trisha','lehara','kapila','hyna'],'mark1':[90,56,78,54,67],'mark2':[100,67,96,89,32],'mark3':[91,92,98,97,87]})
#add the suffix - 'Linux_Hint' to the row labels of the mark1 column
print(pyspark_pandas.mark1.add_suffix('Linux_Hint'))
Output:
0Linux_Hint 90
1Linux_Hint 56
2Linux_Hint 78
3Linux_Hint 54
4Linux_Hint 67
Name: mark1, dtype: int64
We can see that the suffix is added to the row labels of the mark1 column. The values themselves are unchanged.
Conclusion
In this pyspark pandas tutorial, we saw how to add a prefix using add_prefix() and a suffix using add_suffix() to the pyspark pandas dataframe. When the methods are applied to the entire dataframe, the prefix or suffix is added to the column names. When they are applied to a particular column, it is added to the row labels of that column.