Apache Spark

PySpark – withColumnRenamed method

In Python, PySpark is the Spark module that provides Spark-like processing through the DataFrame API.

The withColumnRenamed() method in PySpark is used to rename an existing column in a PySpark DataFrame.

Syntax:

dataframe.withColumnRenamed('old_column', 'new_column')

Parameters:

  1. old_column is the old column name
  2. new_column is the new name for the old column
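Note that withColumnRenamed() does not modify the DataFrame in place; like all DataFrame transformations it returns a new DataFrame with the column renamed, so the result has to be assigned back to a variable. A minimal sketch (df, old_column and new_column are placeholder names):

#withColumnRenamed() returns a new DataFrame; assign it back to keep the rename
df = df.withColumnRenamed('old_column', 'new_column')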

Let’s create a PySpark DataFrame with 5 rows and 6 columns and display it using the show() method.

Example:

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

#display the dataframe
df.show()

Output:

(The show() output displays the DataFrame with its 5 rows and 6 columns.)
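Before renaming anything, it can help to confirm the current column names. A quick check (this extra step is not part of the original example) using the DataFrame's columns attribute:

#list the current column names of the dataframe created above
print(df.columns)

This prints the six original columns: address, age, height, name, rollno and weight.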

We will rename the ‘address’ column to ‘students address’, the ‘rollno’ column to ‘students id’, and the ‘name’ column to ‘students name’.

Example:

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

#rename name column to students name
df = df.withColumnRenamed("name", "students name")

#rename address column to students address
df = df.withColumnRenamed("address", "students address")

#rename rollno column to students id
df = df.withColumnRenamed("rollno", "students id")

#let's display the schema
df.printSchema()

Output:

root
 |-- students address: string (nullable = true)
 |-- age: long (nullable = true)
 |-- height: double (nullable = true)
 |-- students name: string (nullable = true)
 |-- students id: string (nullable = true)
 |-- weight: long (nullable = true)
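After the rename, the old column names no longer exist in the DataFrame; the data is accessed through the new names. As a hedged check (not part of the original example), we can select rows using the renamed columns:

#select data through the renamed columns
df.select("students id", "students name", "students address").show()

Also note that withColumnRenamed() is a no-op when the given old column name is not present in the schema, so a misspelled old name fails silently rather than raising an error.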

If we want to rename multiple columns at a time, we have to call the withColumnRenamed() function multiple times, chained with the dot (.) operator.

Syntax:

dataframe.withColumnRenamed('old_column1', 'new_column1')
         .withColumnRenamed('old_column2', 'new_column2')
         ...
         .withColumnRenamed('old_column_n', 'new_column_n')

Example:

In this example, we will change the ‘address’ column name to ‘students address’, the ‘rollno’ column to ‘students id’, and the ‘name’ column to ‘students name’, and finally display the schema.

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

#rename name column to students name,
#address column to students address,
#and rollno column to students id in a single chained call
df = (df.withColumnRenamed("name", "students name")
        .withColumnRenamed("address", "students address")
        .withColumnRenamed("rollno", "students id"))

#let's display the schema
df.printSchema()

Output:

root
 |-- students address: string (nullable = true)
 |-- age: long (nullable = true)
 |-- height: double (nullable = true)
 |-- students name: string (nullable = true)
 |-- students id: string (nullable = true)
 |-- weight: long (nullable = true)
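When many columns have to be renamed, writing each withColumnRenamed() call by hand becomes repetitive. A hedged sketch (the renames dictionary and loop below are an illustration, not part of the original article) that applies withColumnRenamed() once per entry of an old-to-new name mapping:

#map of old column names to new column names (illustrative only)
renames = {
    "name": "students name",
    "address": "students address",
    "rollno": "students id"
}

#apply withColumnRenamed() once per pair; missing old names are silently skipped
for old_name, new_name in renames.items():
    df = df.withColumnRenamed(old_name, new_name)

#display the schema with the renamed columns
df.printSchema()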

Conclusion

In this article, we discussed how to rename column names using the withColumnRenamed() function and saw how to rename multiple columns by chaining the calls.
