PostgreSQL

Postgres Percentile_Cont

Window functions are one of the most powerful features in SQL that allow us to perform the calculations on a set of rows within a defined window or frame. Unlike the regular aggregate functions like SUM or AVG which operate on an entire result set, window functions operate on a subset of rows within the result set based on a specified ordering.

Hence, window functions provide a way to perform the calculations that involve comparison or aggregation from multiple rows without losing the row-level details.

One of the most popular aggregate window functions in PostgreSQL is the percentile_cont() function. This function allows us to calculate the interpolated value that corresponds to a specified percentile in a sorted set of values.

In this tutorial, we will explore how to work with the percentile_cont() function in PostgreSQL to calculate the various percentile values.

PostgreSQL Percentile_Cont() Function

The percentile_cont() function in PostgreSQL returns the interpolated value that falls within the range of the provided dataset.

The syntax of the function is expressed in the following:

percentile_cont(percentile) WITHIN GROUP (ORDER BY expression) [IGNORE NULLS]

The function accepts the following options and parameters:

Percentile – This parameter defines the percentile value that we wish to calculate for the interpolated value. The percentile value should be a decimal value between 0 and 1. For example, the percentile 0.5 refers to the median or 50th percentile.

WITHIN GROUP (ORDER BY expression) – It defines the sorting order of the values before calculating the percentile.

[IGNORE NULLS] – This option tells the percentile_cont function to include or exclude the NULL values in the calculation. By default, the value is set to false. This means that the function ignore the NULL values in the calculation.

Let us look at some basic examples of working with the percentile_cont() function in PostgreSQL.

Create a Sample Table

Let’s start by creating a sample table to work with. We can create a sample sales table as shown in the following query:

CREATE TABLE sales (
    id SERIAL PRIMARY KEY,
    amount NUMERIC(10, 2) NOT NULL,
    region VARCHAR(50) NOT NULL
);

This should create a sample sales table with the id, amount, and region columns.

Next, we can populate the previous table with some sample data as shown in the following insert statements:

INSERT INTO sales (amount, region) VALUES (1000.50, 'North');
INSERT INTO sales (amount, region) VALUES (2000.00, 'North');
INSERT INTO sales (amount, region) VALUES (1500.25, 'South');
INSERT INTO sales (amount, region) VALUES (1800.75, 'South');
INSERT INTO sales (amount, region) VALUES (1200.60, 'North');
INSERT INTO sales (amount, region) VALUES (1900.30, 'South');

This should give us a table with a sample data as shown in the following screenshot:

Example 1: Calculate the 50th Percentile

The following example query demonstrates how we can use the percentile_cont() function to calculate the 50th percentile of the values in the sales table:

SELECT
    percentile_cont(0.5) WITHIN GROUP (ORDER BY amount) AS median
FROM
    sales;

In the given example, we calculate the sales table’s median value or 50th percentile of the amount column. Then, the percentile_cont function is applied within the grouping context that is defined in the WITHIN GROUP clause.

The resulting value is as follows:

Example 2: Calculate Other Percentiles

We can also use the percentile_cont() function to calculate other percentiles by specifying different values for the percentile parameter. For example, to calculate the 75th percentile, we can run the following query:

SELECT
    percentile_cont(0.9) WITHIN GROUP (ORDER BY amount) AS percentile_90
FROM
    sales;

Result:

Example 3: Take the NULL Values Into Account

Using the IGNORE NULL clause, we can also include the NULL values in the calculation.

Example:

SELECT
    percentile_cont(0.9) WITHIN GROUP (ORDER BY amount) IGNORE NULLS AS percentile_90
FROM
    sales;

This should calculate the 90th percentile and return the resulting value.

Conclusion

We explored how to use the percentile_cont() function in PostgreSQL to calculate the various percentiles of a given column or set of values.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list