Pandas Scatter Plot

In data science and machine learning, you often need to examine the relationship between two variables. A scatter plot shows that relationship visually, which makes any correlation between the variables easy to spot. In this guide, we will see how to create a scatter plot from an existing Pandas DataFrame using different parameters.

Table of Contents:

  1. Using pandas.DataFrame.plot.scatter
  2. Using matplotlib.pyplot.scatter

Let’s see how to create a scatter plot using pandas.DataFrame.plot.scatter and matplotlib.pyplot.scatter.

Using pandas.DataFrame.plot.scatter

In a scatter plot, the coordinates of each point are defined by two DataFrame columns: one supplies the x values and the other supplies the y values. By default, each point is drawn as a filled circle on a 2D plane.

Syntax:

Let’s see the syntax of this function and parameters that are passed to it:

DataFrame.plot.scatter(x, y, s, c, alpha)

Parameters:

  1. The “x” and “y” take the names of the DataFrame columns that supply the horizontal and vertical coordinates.
  2. The “s” sets the size of the points. To give every point the same size, pass a single number. To give each point its own size, pass a list with one size per point.
  3. The “c” sets the color of the points. To give every point the same color, pass a single color name or color code. To give each point its own color, pass a list with one color per point.
  4. The “alpha” sets the transparency of the points, from 0 (fully transparent) to 1 (fully opaque). It is forwarded to Matplotlib and works with or without the “c” parameter.
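As a minimal sketch of how these parameters fit together, the call below passes all of them at once. It assumes the sample DataFrame that is used throughout this guide. The "Agg" backend line and the title text are assumptions made for illustration only: the backend lets the snippet run without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import pandas

# Sample DataFrame used throughout this guide
day1_Data = pandas.DataFrame({'Product_Quantity': [2, 10, 5, 7, 9, 20, 30, 50, 90, 100],
                              'Purchase': [500, 1200, 700, 900, 1000, 1500, 2000, 3500, 4400, 5000]})

# One call that sets the columns, point size, color, and transparency
ax = day1_Data.plot.scatter(x='Product_Quantity', y='Purchase',
                            s=80, c='green', alpha=0.6)

# The method returns a Matplotlib Axes object, so it can be customized further
ax.set_title('Day 1 purchases')  # hypothetical title, for illustration only
```

Because the return value is a regular Matplotlib Axes, anything that Matplotlib offers for an Axes (titles, limits, grid lines) can be applied after the Pandas call.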

Example 1: “X” and “Y” Parameters

Let’s create a DataFrame (day1_Data) with two columns – “Product_Quantity” and “Purchase”. Plot the scatter by assigning the first column to “x” and the second column to “y”.

import pandas

# Products DataFrame
day1_Data = pandas.DataFrame({'Product_Quantity': [2,10,5,7,9,20,30,50,90,100],
                   'Purchase': [500,1200,700,900,1000,1500,2000,3500,4400,5000]})

# Create scatterplot for day1_Data
day1_Data.plot.scatter(x='Product_Quantity', y='Purchase')

Output:

The plot shows that “Purchase” increases as “Product_Quantity” increases.
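If you also want a number to go with the picture, one possible approach (not part of the plotting functions themselves) is to compute the Pearson correlation of the two columns with Series.corr. The sketch below assumes the same sample DataFrame.

```python
import pandas

# Sample DataFrame used throughout this guide
day1_Data = pandas.DataFrame({'Product_Quantity': [2, 10, 5, 7, 9, 20, 30, 50, 90, 100],
                              'Purchase': [500, 1200, 700, 900, 1000, 1500, 2000, 3500, 4400, 5000]})

# Pearson correlation between the two plotted columns (Series.corr defaults to Pearson)
r = day1_Data['Product_Quantity'].corr(day1_Data['Purchase'])
print(f"Pearson correlation: {r:.3f}")
```

A value close to 1 indicates a strong positive linear relation, which matches the upward trend of the points in the scatter plot.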

Example 2: “S” Parameter

Reuse the previous DataFrame and plot the scatter with a point size of 100. Then, use the same parameter to pass a different size for each of the 10 points.

import pandas

# Products DataFrame
day1_Data = pandas.DataFrame({'Product_Quantity': [2,10,5,7,9,20,30,50,90,100],
                   'Purchase': [500,1200,700,900,1000,1500,2000,3500,4400,5000]})

# Create scatterplot for day1_Data - points with same size
day1_Data.plot.scatter(x='Product_Quantity', y='Purchase',s=100)

# Create scatterplot for day1_Data - points with different size
day1_Data.plot.scatter(x='Product_Quantity', y='Purchase',s=[220,30,150,68,90,23,415,67,91,67])

Output:

All the points in the first scatter plot have the same size, while each point in the second scatter plot has its own size.
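Instead of typing the sizes by hand, the sizes can also be derived from the data. The following sketch scales the “Purchase” column and passes the result to “s”. The divisor of 10 is an arbitrary choice for illustration, and the "Agg" backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import pandas

# Sample DataFrame used throughout this guide
day1_Data = pandas.DataFrame({'Product_Quantity': [2, 10, 5, 7, 9, 20, 30, 50, 90, 100],
                              'Purchase': [500, 1200, 700, 900, 1000, 1500, 2000, 3500, 4400, 5000]})

# Derive one size per point from the Purchase column (the scale factor is arbitrary)
point_sizes = (day1_Data['Purchase'] / 10).tolist()

# Larger purchases are drawn as larger points
ax = day1_Data.plot.scatter(x='Product_Quantity', y='Purchase', s=point_sizes)
```

This turns the point size into a third visual dimension, which is sometimes called a bubble chart.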

Example 3: “C” Parameter

  1. Plot the scatter with all points in red.
  2. Plot the scatter with a separate color for each point.

import pandas

# Products DataFrame
day1_Data = pandas.DataFrame({'Product_Quantity': [2,10,5,7,9,20,30,50,90,100],
                   'Purchase': [500,1200,700,900,1000,1500,2000,3500,4400,5000]})

# All points are red
day1_Data.plot.scatter(x='Product_Quantity', y='Purchase',c='red')

# Each point takes its color from the list
day1_Data.plot.scatter(x='Product_Quantity', y='Purchase',
                       c=['red','green','blue','red','green','blue','black','yellow','purple','pink'])

Output:

All the points in the first scatter plot are red, while the points in the second scatter plot take their colors from the list.
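The “c” parameter can also take the name of a numeric column, in which case Pandas maps the values of that column to colors through a colormap. The sketch below is one way to do this with the sample DataFrame. The choice of the “viridis” colormap is arbitrary, and the "Agg" backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import pandas

# Sample DataFrame used throughout this guide
day1_Data = pandas.DataFrame({'Product_Quantity': [2, 10, 5, 7, 9, 20, 30, 50, 90, 100],
                              'Purchase': [500, 1200, 700, 900, 1000, 1500, 2000, 3500, 4400, 5000]})

# Color each point by its Purchase value; Pandas adds a colorbar by default
ax = day1_Data.plot.scatter(x='Product_Quantity', y='Purchase',
                            c='Purchase', colormap='viridis')
```

Use a list of color names when the colors are categories that you pick yourself, and a column name with a colormap when the color should follow a numeric value.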

Example 4: “Alpha” Parameter

Let’s pass alpha=0.5 to the function to make the points semi-transparent.

import pandas

# Products DataFrame
day1_Data = pandas.DataFrame({'Product_Quantity': [2,10,5,7,9,20,30,50,90,100],
                   'Purchase': [500,1200,700,900,1000,1500,2000,3500,4400,5000]})

# Create scatterplot for day1_Data
day1_Data.plot.scatter(x='Product_Quantity', y='Purchase',c='red',alpha=0.5)

Output:

You will observe the transparency in the points.
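Transparency is most useful when points overlap. As a hedged illustration, the sketch below draws two datasets on the same axes with alpha=0.5, so that overlapping points remain visible. The second DataFrame (day2_Data) and its values are hypothetical and are not part of the original examples. The "Agg" backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import pandas

# Sample DataFrame used throughout this guide
day1_Data = pandas.DataFrame({'Product_Quantity': [2, 10, 5, 7, 9, 20, 30, 50, 90, 100],
                              'Purchase': [500, 1200, 700, 900, 1000, 1500, 2000, 3500, 4400, 5000]})

# Hypothetical second day of data, invented only for this illustration
day2_Data = pandas.DataFrame({'Product_Quantity': [3, 10, 6, 7, 12, 20, 35, 50, 85, 100],
                              'Purchase': [600, 1200, 750, 900, 1100, 1500, 2300, 3500, 4200, 5000]})

# Draw day 1 first and keep the returned Axes
ax = day1_Data.plot.scatter(x='Product_Quantity', y='Purchase', c='red', s=100, alpha=0.5)

# Draw day 2 on the same Axes; overlapping points blend instead of hiding each other
day2_Data.plot.scatter(x='Product_Quantity', y='Purchase', c='blue', s=100, alpha=0.5, ax=ax)
```

Where a red point and a blue point share the same coordinates, the blended color shows that both are present.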

Using matplotlib.pyplot.scatter

In this scenario, we use the matplotlib.pyplot module to create the scatter plot. Instead of column names, the function takes the x and y data directly, for example two DataFrame columns. The “s”, “c”, and “alpha” parameters work as they do in pandas.DataFrame.plot.scatter, and the function accepts additional parameters such as “marker”, “cmap”, “vmin”, and “vmax”.

Syntax:

Let’s see the syntax of this function and the parameters that are passed to it:

matplotlib.pyplot.scatter(DataFrame.column1, DataFrame.column2, s, c, alpha, marker, cmap, vmin, vmax, ...)

Parameters: 

  1. The “marker” sets the shape that is drawn at each point, such as “o” for a circle or “p” for a pentagon.
  2. The “cmap” is a Colormap instance or the name of a registered colormap. It maps the numeric values that are passed to “c” to colors.
  3. The “vmin” and “vmax” are the minimum and maximum data values that define the data range which the colormap covers.
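To make the effect of “vmin” and “vmax” concrete, here is a small sketch under stated assumptions: the sample DataFrame from this guide, an arbitrary color range of 0 to 50, and a square marker. The "Agg" backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas

# Sample DataFrame used throughout this guide
day1_Data = pandas.DataFrame({'Product_Quantity': [2, 10, 5, 7, 9, 20, 30, 50, 90, 100],
                              'Purchase': [500, 1200, 700, 900, 1000, 1500, 2000, 3500, 4400, 5000]})

fig, ax = plt.subplots()

# Map Product_Quantity to colors, but let the colormap span only the range 0 to 50
points = ax.scatter(day1_Data['Product_Quantity'], day1_Data['Purchase'],
                    c=day1_Data['Product_Quantity'], cmap='plasma',
                    vmin=0, vmax=50, marker='s')

# The colorbar shows which color corresponds to which value
fig.colorbar(points, ax=ax, label='Product_Quantity')
```

With these limits, the quantities 90 and 100 lie above “vmax”, so they are drawn in the same color as the top of the colormap. Narrowing the range in this way gives more color contrast to the smaller values.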

Example 1:

Let’s create a DataFrame (day1_Data) with two columns – “Product_Quantity” and “Purchase”. Plot the scatter by assigning the first column to “x” and the second column to “y”.

import pandas
# Import the pyplot submodule explicitly; "import matplotlib" alone does not load it
import matplotlib.pyplot as plt

# Products DataFrame
day1_Data = pandas.DataFrame({'Product_Quantity': [2,10,5,7,9,20,30,50,90,100],
                   'Purchase': [500,1200,700,900,1000,1500,2000,3500,4400,5000]})

# Create scatterplot for day1_Data using matplotlib
plt.scatter(day1_Data.Product_Quantity, day1_Data.Purchase, s=500, color="green")

# Display the plot (needed when running as a script)
plt.show()

Output:

The plot shows the same relation between the two columns, now drawn with large green points.
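Since matplotlib.pyplot.scatter receives only the values, the axis labels are typically set by hand in this usage. The sketch below shows one way to do that with the Axes interface of Matplotlib. The title text is a hypothetical choice, and the "Agg" backend line is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas

# Sample DataFrame used throughout this guide
day1_Data = pandas.DataFrame({'Product_Quantity': [2, 10, 5, 7, 9, 20, 30, 50, 90, 100],
                              'Purchase': [500, 1200, 700, 900, 1000, 1500, 2000, 3500, 4400, 5000]})

fig, ax = plt.subplots()

# Same plot as in the example, drawn through the Axes object
ax.scatter(day1_Data['Product_Quantity'], day1_Data['Purchase'], s=500, color='green')

# Label the axes explicitly, using the column names
ax.set_xlabel('Product_Quantity')
ax.set_ylabel('Purchase')
ax.set_title('Purchase vs. Product_Quantity')  # hypothetical title, for illustration only
```

In a script, call plt.show() afterwards to display the figure, or fig.savefig with a file name to write it to disk.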

Example 2:

Let’s apply the “plasma” colormap by passing the “Product_Quantity” column to “c”. Also, set the marker style to “p” (pentagon).

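import pandas
# Import the pyplot submodule explicitly; "import matplotlib" alone does not load it
import matplotlib.pyplot as plt

# Products DataFrame
day1_Data = pandas.DataFrame({'Product_Quantity': [2,10,5,7,9,20,30,50,90,100],
                   'Purchase': [500,1200,700,900,1000,1500,2000,3500,4400,5000]})

# Color the points by Product_Quantity and draw them as pentagons
plt.scatter(day1_Data.Product_Quantity, day1_Data.Purchase, c=day1_Data.Product_Quantity, cmap="plasma", marker="p")

# Display the plot (needed when running as a script)
plt.show()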

Output:

Each point is colored according to its “Product_Quantity” value, and the marker is a pentagon.

Conclusion

We learned how to create a scatter plot using the pandas.DataFrame.plot.scatter and matplotlib.pyplot.scatter functions. Most of the parameters of each function are discussed with examples, and the same DataFrame is used throughout the guide so that the examples are easy to compare. The advantage of a scatter plot is that it lets you view the relation between two variables (DataFrame columns). This can be extended to machine learning applications.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain