SQL Standard

SQL PARTITION BY Clause

In SQL, the PARTITION BY clause allows us to divide or partition the result set of a given query into various groups based on one or more columns. The resulting partitions can be quite useful especially when you need to perform the calculations to each partition (individually) or apply the aggregate functions within each group.

In this tutorial, we will learn about the workings of the PARTITION BY clause in SQL and discover how we can use it to partition the data for a more granular subset.

Syntax:

Let us start with the syntax of the PARTITION BY clause. The syntax may depend on the context in which you use it but here is the general syntax:

SELECT column1, column2, ...

OVER (PARTITION BY partition_column1, partition_column2, ...)

FROM table_name

The given syntax represents the following elements:

  1. column1, column2 – This refers to the columns that we wish to include in the result set.
  2. PARTITION BY columns – This clause defines how we wish to partition or group the data.

Sample Data

Let us create a basic table with a sample data to demonstrate how to use the PARTITION BY clause. For this example, let’s create a basic table that stores the product information.

CREATE TABLE products (
    product_id INT PRIMARY KEY AUTO_INCREMENT,
    product_name VARCHAR(255),
    category VARCHAR(255),
    price DECIMAL(10, 2),
    quantity INT,
    expiration_date DATE,
    barcode BIGINT
);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Chef Hat 25cm',
'bakery',
24.67,
57,
'2023-09-09',
2854509564204);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Quail Eggs - Canned',
'pantry',
17.99,
67,
'2023-09-29',
1708039594250);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Coffee - Egg Nog Capuccino',
'bakery',
92.53,
10,
'2023-09-22',
8704051853058);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Pear - Prickly',
'bakery',
65.29,
48,
'2023-08-23',
5174927442238);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Pasta - Angel Hair',
'pantry',
48.38,
59,
'2023-08-05',
8008123704782);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Wine - Prosecco Valdobiaddene',
'produce',
44.18,
3,
'2023-03-13',
6470981735653);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Pastry - French Mini Assorted',
'pantry',
36.73,
52,
'2023-05-29',
5963886298051);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Orange - Canned, Mandarin',
'produce',
65.0,
1,
'2023-04-20',
6131761721332);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Pork - Shoulder',
'produce',
55.55,
73,
'2023-05-01',
9343592107125);

insert
    into
    products (product_name,
    category,
    price,
    quantity,
    expiration_date,
    barcode)
values ('Dc Hikiage Hira Huba',
'produce',
56.29,
53,
'2023-04-14',
3354910667072);

Once we have the sample data setup, we can proceed and use the PARTITION BY clause.

Basic Usage

Suppose we want to calculate the total items for each product category in the previous table. We can use the PARTITION BY to divide the items into unique categories and then determine the total of the quantity in each category.

An example is as follows:

SELECT
    product_name,
    category,
    quantity,
    SUM(quantity) OVER (PARTITION BY category) AS total_items
FROM
    products;

Notice that in the given example, we partition the data using the “category” column. We then use the SUM() aggregate function to determine the total items in each category separately. The result shows the total items in each category.

Using the PARTITION BY Clause

To summarize, the most common use case of the PARTITION BY clause is in conjunction with the window functions. The window function is applied to each partition separately.

Some of the common window functions to use with PARTITION BY include the following:

  • SUM() – Calculate the sum of a column within each partition.
  • AVG() – Compute the average of a column within each partition.
  • COUNT() – Count the number of rows within each partition.
  • ROW_NUMBER() – Assign a unique row number to each row within each partition.
  • RANK() – Assign a rank to each row within each partition.
  • DENSE_RANK() – Assign a dense rank to each row within each partition.
  • NTILE() – Divide the data into quantiles within each partition.

That’s it!

Conclusion

In this tutorial, we learned how to work with the PARTITION BY clause in SQL to partition the data into various segments and then apply a specific operation to each of the resulting partitions separately.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list