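The withColumnRenamed() method in PySpark renames an existing column of a PySpark DataFrame. It returns a new DataFrame and leaves the original one unchanged.
Syntax:
dataframe.withColumnRenamed("old_column", "new_column")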
Parameters:
- old_column is the old column name
- new_column is the new name for the old column
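One behavior worth knowing: withColumnRenamed() does nothing, and raises no error, when old_column is not present in the DataFrame, so a typo in the old name can go unnoticed. The sketch below is a hypothetical helper (rename_strict is not part of PySpark) that checks df.columns first. It assumes df is a PySpark DataFrame and relies only on its columns attribute and its withColumnRenamed() method.

```python
def rename_strict(df, old_column, new_column):
    """Rename old_column to new_column, failing loudly if old_column is missing.

    rename_strict is a hypothetical helper, not a PySpark function.
    df is expected to be a PySpark DataFrame.
    """
    # withColumnRenamed() silently returns the DataFrame unchanged when the
    # old name does not exist, so check explicitly before calling it.
    if old_column not in df.columns:
        raise ValueError(
            f"Column {old_column!r} not found; available columns: {df.columns}"
        )
    # A new DataFrame is returned; df itself is not modified.
    return df.withColumnRenamed(old_column, new_column)
```

With a DataFrame like the one created in the examples of this article, it could be called as df = rename_strict(df, "name", "students name").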
Let’s create a PySpark DataFrame with 5 rows and 6 columns and display it using the show() method.
Example:
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#import the col function
from pyspark.sql.functions import col
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
{'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
{'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
{'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
{'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame( students)
#display the dataframe
df.show()
Output:
We will change the ‘address’ column name to ‘students address’, ‘rollno’ column to ‘students id’ and ‘name’ column to ‘students name’.
Example:
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
{'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
{'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
{'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
{'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame( students)
#rename name column to students name
df=df.withColumnRenamed("name","students name")
#rename address column to students address
df=df.withColumnRenamed("address","students address")
#rename rollno column to students id
df=df.withColumnRenamed("rollno","students id")
#lets display the schema
df.printSchema()
Output:
root
 |-- students address: string (nullable = true)
 |-- age: long (nullable = true)
 |-- height: double (nullable = true)
 |-- students name: string (nullable = true)
 |-- students id: string (nullable = true)
 |-- weight: long (nullable = true)
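Because the new names contain spaces, they need to be wrapped in backticks when they appear inside SQL expression strings, such as those passed to selectExpr(). The following is a minimal sketch of a hypothetical helper (quote_identifier is not a PySpark function) that builds such a quoted name. It assumes the usual Spark SQL rule that a backtick inside a name is escaped by doubling it.

```python
def quote_identifier(name):
    """Wrap a column name in backticks for use in Spark SQL expression strings.

    quote_identifier is a hypothetical helper, not a PySpark function.
    A backtick inside the name is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"

# Possible usage with the renamed DataFrame from the example above:
# df.selectExpr(quote_identifier("students name"),
#               quote_identifier("students id")).show()
```

Plain column access such as df["students name"] does not need the backticks; they only matter inside expression strings.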
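If we want to rename multiple columns at a time, we can call withColumnRenamed() several times, chaining the calls with the dot (.) operator.
Syntax:
dataframe.withColumnRenamed("old_column_1", "new_column_1").withColumnRenamed("old_column_2", "new_column_2") ... .withColumnRenamed("old_column_n", "new_column_n")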
Example:
In this example, we will rename the ‘address’ column to ‘students address’, the ‘rollno’ column to ‘students id’, and the ‘name’ column to ‘students name’, and finally display the schema.
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession
#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()
# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
{'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
{'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
{'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
{'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]
# create the dataframe
df = spark_app.createDataFrame( students)
#rename name column to students name,
#address column to students address
#and rollno column to students id
df = (df.withColumnRenamed("name", "students name")
        .withColumnRenamed("address", "students address")
        .withColumnRenamed("rollno", "students id"))
#lets display the schema
df.printSchema()
Output:
root
 |-- students address: string (nullable = true)
 |-- age: long (nullable = true)
 |-- height: double (nullable = true)
 |-- students name: string (nullable = true)
 |-- students id: string (nullable = true)
 |-- weight: long (nullable = true)
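When there are many columns to rename, the chain of calls can also be generated from a dictionary instead of being written out by hand. The sketch below is one possible way to do that with functools.reduce. rename_columns is a hypothetical helper name, not part of PySpark, and df is assumed to be a PySpark DataFrame.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed() once for each old -> new pair in mapping.

    rename_columns is a hypothetical helper, not a PySpark function.
    mapping is a dict of the form {old_column: new_column}.
    """
    # Start from df and feed the result of each rename into the next one,
    # which is equivalent to chaining the calls with the dot operator.
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )
```

For the DataFrame used in this article, the call would look like df = rename_columns(df, {"name": "students name", "address": "students address", "rollno": "students id"}).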
Conclusion
In this article, we discussed how to rename a DataFrame column using the withColumnRenamed() method and how to rename multiple columns by chaining several withColumnRenamed() calls.