In this tutorial, we will learn about the workings of the PARTITION BY clause in SQL and discover how we can use it to partition the data for a more granular subset.
Syntax:
Let us start with the syntax of the PARTITION BY clause. The syntax may depend on the context in which you use it but here is the general syntax:
OVER (PARTITION BY partition_column1, partition_column2, ...)
FROM table_name
The given syntax represents the following elements:
- column1, column2 – This refers to the columns that we wish to include in the result set.
- PARTITION BY columns – This clause defines how we wish to partition or group the data.
Sample Data
Let us create a basic table with a sample data to demonstrate how to use the PARTITION BY clause. For this example, let’s create a basic table that stores the product information.
product_id INT PRIMARY KEY AUTO_INCREMENT,
product_name VARCHAR(255),
category VARCHAR(255),
price DECIMAL(10, 2),
quantity INT,
expiration_date DATE,
barcode BIGINT
);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Chef Hat 25cm',
'bakery',
24.67,
57,
'2023-09-09',
2854509564204);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Quail Eggs - Canned',
'pantry',
17.99,
67,
'2023-09-29',
1708039594250);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Coffee - Egg Nog Capuccino',
'bakery',
92.53,
10,
'2023-09-22',
8704051853058);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Pear - Prickly',
'bakery',
65.29,
48,
'2023-08-23',
5174927442238);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Pasta - Angel Hair',
'pantry',
48.38,
59,
'2023-08-05',
8008123704782);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Wine - Prosecco Valdobiaddene',
'produce',
44.18,
3,
'2023-03-13',
6470981735653);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Pastry - French Mini Assorted',
'pantry',
36.73,
52,
'2023-05-29',
5963886298051);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Orange - Canned, Mandarin',
'produce',
65.0,
1,
'2023-04-20',
6131761721332);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Pork - Shoulder',
'produce',
55.55,
73,
'2023-05-01',
9343592107125);
insert
into
products (product_name,
category,
price,
quantity,
expiration_date,
barcode)
values ('Dc Hikiage Hira Huba',
'produce',
56.29,
53,
'2023-04-14',
3354910667072);
Once we have the sample data setup, we can proceed and use the PARTITION BY clause.
Basic Usage
Suppose we want to calculate the total items for each product category in the previous table. We can use the PARTITION BY to divide the items into unique categories and then determine the total of the quantity in each category.
An example is as follows:
product_name,
category,
quantity,
SUM(quantity) OVER (PARTITION BY category) AS total_items
FROM
products;
Notice that in the given example, we partition the data using the “category” column. We then use the SUM() aggregate function to determine the total items in each category separately. The result shows the total items in each category.
Using the PARTITION BY Clause
To summarize, the most common use case of the PARTITION BY clause is in conjunction with the window functions. The window function is applied to each partition separately.
Some of the common window functions to use with PARTITION BY include the following:
- SUM() – Calculate the sum of a column within each partition.
- AVG() – Compute the average of a column within each partition.
- COUNT() – Count the number of rows within each partition.
- ROW_NUMBER() – Assign a unique row number to each row within each partition.
- RANK() – Assign a rank to each row within each partition.
- DENSE_RANK() – Assign a dense rank to each row within each partition.
- NTILE() – Divide the data into quantiles within each partition.
That’s it!
Conclusion
In this tutorial, we learned how to work with the PARTITION BY clause in SQL to partition the data into various segments and then apply a specific operation to each of the resulting partitions separately.