PySpark – show method

PySpark is the Python API for Apache Spark; it offers the same kind of DataFrame-based processing that Spark provides. Its show() method prints the contents of a dataframe in a tabular format.

Syntax:

Dataframe.show(n=20, truncate=True, vertical=False)

Here, Dataframe is the input PySpark dataframe. All three parameters are optional.

Parameters:

1. n is an optional integer that sets how many rows are displayed, counting from the top of the dataframe. By default, show() displays the first 20 rows.

2. vertical takes a Boolean value. When it is set to True, each row is displayed vertically, with one column value per line. When it is set to False, the dataframe is displayed as a horizontal table. The default is False (horizontal format).

3. truncate controls how many characters of each value are displayed. It accepts a Boolean or an integer. With True (the default), strings longer than 20 characters are truncated; with False, every value is displayed in full; with an integer, values are truncated to that many characters.

Example 1:

In this example, we create a PySpark dataframe with 5 rows and 6 columns and display it using the show() method without any parameters.

Because the dataframe has fewer than 20 rows, all of its rows are displayed as a table.

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

# display the dataframe
df.show()

Example 2:

In this example, we create a PySpark dataframe with 5 rows and 6 columns and display it using the show() method with the n parameter. We set n to 4 to display the top 4 rows of the dataframe.

The result is a table that contains only the first 4 rows of the dataframe.

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

# get top 4 rows in the dataframe
df.show(4)

Example 3:

In this example, we create a PySpark dataframe with 5 rows and 6 columns and display it using the show() method with the vertical parameter. We set vertical to False to display the dataframe in horizontal view.

The result is the same horizontal table that show() prints when vertical is not specified.

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

# display the dataframe in horizontal format
df.show(vertical=False)

Example 4:

In this example, we create a PySpark dataframe with 5 rows and 6 columns and display it using the show() method with the vertical parameter. We set vertical to True to display the dataframe in vertical view.

As a result, each row is printed as its own record, with one column name and value per line.

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

# display the dataframe in vertical format
df.show(vertical=True)

Example 5:

In this example, we create a PySpark dataframe with 5 rows and 6 columns and display it using the show() method with the truncate parameter. We set truncate to 1 to display only the first character of each value in the dataframe.

The result is a horizontal table in which every value is cut down to its first character.

#import the pyspark module
import pyspark
#import SparkSession for creating a session
from pyspark.sql import SparkSession

#create an app named linuxhint
spark_app = SparkSession.builder.appName('linuxhint').getOrCreate()

# create student data with 5 rows and 6 attributes
students =[{'rollno':'001','name':'sravan','age':23,'height':5.79,'weight':67,'address':'guntur'},
               {'rollno':'002','name':'ojaswi','age':16,'height':3.79,'weight':34,'address':'hyd'},
               {'rollno':'003','name':'gnanesh chowdary','age':7,'height':2.79,'weight':17,'address':'patna'},
               {'rollno':'004','name':'rohith','age':9,'height':3.69,'weight':28,'address':'hyd'},
               {'rollno':'005','name':'sridevi','age':37,'height':5.59,'weight':54,'address':'hyd'}]

# create the dataframe
df = spark_app.createDataFrame(students)

# display only the first character of each value
df.show(truncate=1)
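
The effect of an integer truncate value can be sketched in plain Python. The helper below, truncate_cell, is a hypothetical function written for this illustration and is not part of PySpark. It assumes the rule Spark appears to follow: a value longer than the limit is cut to the limit, and once the limit is 4 or more, the last three kept characters are replaced by '...'.

```python
def truncate_cell(value, truncate):
    """Hypothetical helper that imitates how show() shortens one value."""
    text = str(value)
    # a limit of 0 or less, or a short enough value, leaves the text unchanged
    if truncate <= 0 or len(text) <= truncate:
        return text
    # limits below 4 leave no room for an ellipsis, so the text is simply cut
    if truncate < 4:
        return text[:truncate]
    # otherwise keep truncate - 3 characters and append '...'
    return text[:truncate - 3] + '...'


# the names used in the student data above
names = ['sravan', 'ojaswi', 'gnanesh chowdary', 'rohith', 'sridevi']

# with truncate=1, only the first character of each name remains
print([truncate_cell(name, 1) for name in names])

# with truncate=10, only the longest name is shortened
print([truncate_cell(name, 10) for name in names])
```

This is only a model of the truncation step; the real method also pads and aligns the columns when it draws the table.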

Conclusion

This article discussed the show() method in PySpark and how it works. We looked at each of its parameters, n, vertical, and truncate, and used them to control how a dataframe is displayed in tabular format.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain