Cut() Function
Use the cut() function when you need to segment and sort data values into bins. It works only on one-dimensional, array-like objects such as a Series or a list. It is handy for statistical analysis of large sets of numeric data because it converts each element of the array into the bin that the element falls into.
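Syntax:
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)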
Parameters:
- x: The one-dimensional array that we want to bin.
- bins: The bin edges that define the segmentation. Pass a sequence of edges, or an integer to create that many equal-width bins.
- right: Bool, True by default. It indicates whether each bin includes its right edge. With right=True, the edges [1, 2, 3] produce the bins (1, 2] and (2, 3].
- labels: Array or False, optional. It specifies the labels for the returned bins. The number of labels must match the number of resulting bins. If it is False, only the integer bin indicators are returned.
- retbins: Bool, False by default. Whether the bin edges are returned along with the result. It is useful when bins is given as an integer, because the edges are then computed for you.
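The effect of the right, labels, and retbins parameters is easier to see on a tiny input. The following is a minimal sketch; the `ages` Series and its values are made up for illustration and are not part of the examples in this tutorial.

```python
import pandas

# Hypothetical sample data, chosen so that 10 and 20 sit exactly on bin edges.
ages = pandas.Series([5, 10, 15, 20])

# right=True (default): the bins are (0, 10] and (10, 20], so 10 lands in the first bin.
right_closed = pandas.cut(ages, bins=[0, 10, 20])

# right=False: the bins are [0, 10) and [10, 20), so 10 lands in the second bin
# and 20 falls outside every bin (it becomes NaN).
left_closed = pandas.cut(ages, bins=[0, 10, 20], right=False)

# labels=False returns the integer position of each bin instead of an interval.
codes = pandas.cut(ages, bins=[0, 10, 20], labels=False)

# retbins=True also returns the bin edges, which helps when bins is an integer.
binned, edges = pandas.cut(ages, bins=2, retbins=True)

print(right_closed.tolist())
print(left_closed.tolist())
print(codes.tolist())
print(edges)
```

With bins=2, pandas computes the edges from the minimum and maximum of the data, so returning them with retbins=True is the simplest way to see where the cut points ended up.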
Example 1: With Bins Parameter
Let’s have a DataFrame that holds 12 integers in the “values1” column. Pass 8 bin edges to create 7 bins, each about 15 wide, and store the bins in the “BINS” column.
import pandas

numeric = pandas.DataFrame({'values1': [12, 34, 56, 44, 45, 34, 45, 32, 67, 89, 100, 34]})
print(numeric)
# Create 7 bins from 8 bin edges
numeric['BINS'] = pandas.cut(numeric['values1'], bins=[1, 15, 30, 45, 60, 75, 90, 105])
print()
print(numeric)
print()
print(numeric['BINS'].unique())
Output:
Explanation:
Each value is assigned to the bin that contains it. We also display the distinct bins that were assigned using the unique() function. Only six distinct bins are assigned because no value falls into (15, 30].
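To check how the values are spread across the bins, including bins that stay empty, one option is to count the values per bin. This is a minimal sketch that reuses the data from the example; the use of value_counts() and sort_index() is an addition here and not part of the original example.

```python
import pandas

numeric = pandas.DataFrame({'values1': [12, 34, 56, 44, 45, 34, 45, 32, 67, 89, 100, 34]})
edges = [1, 15, 30, 45, 60, 75, 90, 105]
binned = pandas.cut(numeric['values1'], bins=edges)

# 8 edges define 7 bins.
print(len(binned.cat.categories))

# Count the values per bin; sort_index() lists the bins in their natural order,
# and bins without any value show up with a count of 0.
counts = binned.value_counts().sort_index()
print(counts)
```

The result of cut() is categorical, so every bin is kept as a category even when no value is assigned to it.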
Example 2: With Labels Parameter
Create 4 bins, each about 10 wide, for a DataFrame with 7 rows. Give each bin a label and store the result in the “BINS” column.
import pandas

numeric = pandas.DataFrame({'values1': [2, 5, 12, 32, 20, 3, 10]})
# Create 4 bins from 5 bin edges and specify a label for each bin.
numeric['BINS'] = pandas.cut(numeric['values1'], bins=[1, 10, 20, 30, 40], labels=['first', 'second', 'third', 'last'])
print()
print(numeric)
Output:
Explanation:
Each value is assigned to a labeled bin. Because right is True by default, each bin excludes its left edge and includes its right edge.
- For the (1, 10] bin, the label is “first”. The values 2, 5, 3, and 10 fall under the first bin.
- For the (10, 20] bin, the label is “second”. The values 12 and 20 fall under the second bin.
- For the (20, 30] bin, the label is “third”. No values are in this range.
- For the (30, 40] bin, the label is “last”. The value 32 falls under this bin.
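Values that sit on the lowest edge or beyond the last edge are not assigned to any bin. The sketch below illustrates this with the same edges and labels; the `values` Series is hypothetical, and include_lowest is a cut() parameter that the parameter list above does not cover.

```python
import pandas

# Hypothetical values: 1 sits on the lowest edge, 50 lies beyond the last edge.
values = pandas.Series([1, 2, 10, 32, 50])
edges = [1, 10, 20, 30, 40]
names = ['first', 'second', 'third', 'last']

# Default behaviour: the first bin is (1, 10], so 1 is excluded and becomes NaN.
# 50 is larger than the last edge, so it becomes NaN as well.
default = pandas.cut(values, bins=edges, labels=names)

# include_lowest=True makes the first bin include its left edge, so 1 is labeled 'first'.
with_lowest = pandas.cut(values, bins=edges, labels=names, include_lowest=True)

print(default.tolist())
print(with_lowest.tolist())
```

If NaN shows up in a binned column, checking whether the outer edges cover the full range of the data is a reasonable first step.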
Qcut() Function
The qcut() function is known as the “Quantile-based discretization function”. Instead of using fixed bin edges, qcut() picks the edges from the quantiles of the data, so that each bin holds roughly the same number of values.
Syntax:
pandas.qcut(x, q, labels=None, retbins=False, precision=3, duplicates='raise')
Parameters:
- x: The one-dimensional array that we want to bin.
- q: The number of quantiles (for example, 4 for quartiles), or an array of quantile values such as [0, 0.25, 0.5, 0.75, 1].
- labels: Array or False, optional. It specifies the labels for the returned bins. The number of labels must match the number of resulting bins. If it is False, only the integer bin indicators are returned.
- retbins: Bool, False by default. Whether the bin edges are returned along with the result. It is useful when q is given as an integer, because the edges are then computed for you.
Unlike cut(), qcut() does not accept a right parameter.
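The parameters above can be tried out on a small input. This is a minimal sketch; the `scores` Series and the 'low'/'high' labels are made up for illustration.

```python
import pandas

# Hypothetical sample data: eight evenly spaced values.
scores = pandas.Series([1, 2, 3, 4, 5, 6, 7, 8])

# q as an integer: 4 quantile bins, each holding 2 of the 8 values.
quartiles = pandas.qcut(scores, q=4)

# q as a list of quantiles: split at the median and label the two halves.
halves = pandas.qcut(scores, q=[0, 0.5, 1], labels=['low', 'high'])

# labels=False returns integer bin indicators; retbins=True also returns the edges.
codes, edges = pandas.qcut(scores, q=4, labels=False, retbins=True)

print(quartiles.value_counts().sort_index())
print(halves.tolist())
print(codes.tolist())
print(edges)
```

Passing q as a list gives control over uneven splits, for example [0, 0.9, 1] to separate the top 10% from the rest.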
Example 1:
Let’s have a DataFrame that holds 12 integers in both “values1” and “values2” columns. Create 2 quantiles for both the columns.
import pandas

numeric=pandas.DataFrame({'values1':[12,34,56,44,45,34,45,32,67,89,100,34],
'values2':[11,22,33,44,55,66,77,88,99,100,12,12]})
print(numeric)
# Create 2 quantiles for values1 column
numeric['BIN values 1'] = pandas.qcut(numeric['values1'], 2)
# Create 2 quantiles for values2 column
numeric['BIN values 2'] = pandas.qcut(numeric['values2'], 2)
print()
print(numeric)
Output:
Explanation:
We created 2 quantiles for each column. Now, you can see that each quantile bin holds an equal number of values.
- In the “values1” column, the bins are (11.999, 44.5] and (44.5, 100.0]. Each bin holds 6 values.
- In the “values2” column, the bins are (10.999, 49.5] and (49.5, 100.0]. Each bin holds 6 values.
Example 2: Qcut() vs Cut()
Let’s have a DataFrame that holds 12 integers in both “values1” and “values2” columns. Now, split the “values2” column in two ways: create 2 quantiles using qcut() and 2 equal-width bins using cut().
import pandas

numeric=pandas.DataFrame({'values1':[12,34,56,44,45,34,45,32,67,89,100,34],
'values2':[11,22,33,44,55,66,77,88,99,100,12,12]})
# Create 2 quantiles for values2 column
numeric['qcut()'] = pandas.qcut(numeric['values2'], q=2)
# Create 2 bins for values2 column
numeric['cut()'] = pandas.cut(numeric['values2'], bins=2)
print(numeric['qcut()'])
print()
print(numeric['cut()'])
Output:
Explanation:
Now, you can see the actual difference:
The qcut() function groups the data into parts with an equal number of values: 6 values fall under (10.999, 49.5] and another 6 under (49.5, 100.0]. The cut() function instead creates bins of equal width, so the counts differ: 7 values fall under (10.911, 55.5] and the other 5 values fall under (55.5, 100.0].
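The counts mentioned in the explanation can be checked directly. This is a minimal sketch that reuses the “values2” data from the example; counting with value_counts() is an addition here and not part of the original example.

```python
import pandas

values2 = pandas.Series([11, 22, 33, 44, 55, 66, 77, 88, 99, 100, 12, 12])

# qcut(): the edge is the median (49.5), so both bins hold the same number of values.
qcut_counts = pandas.qcut(values2, q=2).value_counts().sort_index()

# cut(): the edge is the midpoint of the range (55.5), so the counts depend on the data.
cut_counts = pandas.cut(values2, bins=2).value_counts().sort_index()

print(qcut_counts.tolist())  # [6, 6]
print(cut_counts.tolist())   # [7, 5]
```

As a rule of thumb, cut() fits when the bin boundaries themselves matter, and qcut() fits when each group should contain a similar number of rows.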
Conclusion
We discussed the cut() and qcut() functions, which bin data in Pandas. We covered the syntax of both functions and described their parameters. In the examples of this tutorial, we showed how to segment data into bins, how to label the bins, and how cut(), which uses fixed bin edges, differs from qcut(), which creates bins with an equal number of values. Now, you should be able to bin data on your own using these functions.