PostgreSQL

Lag Postgresql

Postgresql Lag function() is used to access the row in the table that is present before the table’s current row. As the name indicates that it lags behind the current row and takes the record of the table. In PostgreSQL, the lag function is important so that it compares the previous rows with the current row.

This function is suitable for knowing about the data you entered previously but are present in the current row. Then you can access the data of the past. Open psql, and after providing a password, you can now apply all the commands on the psql shell.

Syntax

LAG(expression [ offset [value]])

OVER (

    [PARTITION BY expression, ... ]

    ORDER BY expression [ASC | DESC], ...

)

An offset is an integer number set in the query and shows the number of rows present above the current row. It can be a subquery or an expression.

The lag function is applied to the value partition, which is mainly created by a partition clause. At the same time, the order by clause is used to arrange the rows in a specified order.

There are two main features of the Lag function we have described here are:

  • Lag default partition 
  • The Lag function in partitions.

lag default partition

We have a table named “showroom” having the details of the vehicles. The table is created by the “CREATE TABLE” query in psql and by adding the data using an “INSERT” statement.

>> select * from showroom;

We will apply the lag function on the table. The lag function contains two or more “select” statements in it. As we have to select the value before the current row, the current row is selected in the first step and then the row before it.

Example 1

In this example, we will select a column from the table and calculate the sum from the price column. The second select statement will use the lag function. Both these statements use the “order by” clause with “group by” in the aggregate function of select. As no year is similar to the other, so all the rows will be displayed. There will be no grouping. Now we apply the command on the table:

>>  with cte AS ( SELECT year, SUM (price) price FROM showroom GROUP BY year ORDER BY year ) SELECT year, price, LAG (price, 1) OVER (ORDER BY year) previous_year_discount from cte;

To print the value of the row above the current row, we have first access the current row through the select statement and have calculated the sum function. The value obtained is stored in “cte” for further use in the latter query. The resultant value will be ordered by and grouped by the “year” column.

Moving towards the second select statement, the same columns in “cte” are selected from the table. An additional column is created with the name “previous_year_discount” having lag with 1 row. And the resultant column’s rows must be arranged by the year column.

You can see from the table obtained that the first row in the new column is blanked. Because the price is shifted to the next row from the start making the first-row NULL.

Example 2

There is another example; here, we are using three select statements. “Cte” and “cye” are used to store the values that are obtained from the two queries using the “select” statement, and their resultant values will be used in the third select statement. The output of “cte” acts as an output for “cte2”. Similarly, “cte2” will be the input for the third select statement. All the values are displayed in the resultant table combined because one result is a lag function, whereas the second resultant column shows the difference between the two lag values.

Moving towards the first “cte” calculates the same function as the previous example. The sum is calculated from the “price” column.

>> With cte AS ( SELECT year, SUM (price) price FROM showroom GROUP BY year ORDER BY year ),

The second “cte2” is similar to the second select command because we know that the lag function is applied on the sum of the column “price”. The column of that sum “previous_year_Discount” is stored as it is in cte.

>> cte2 AS ( SELECT year, price, LAG (price, 1) OVER (ORDER BY year) previous_year_discount from cte)

The third “select” statement will take the lagged sum of prices and calculate their difference. For example, the price in the third row subtracts the value in the second row. At the same time, the first row is NULL because of the lag function.

>> SELECT year, price, previous_year_Discount, (Previous_year_Discount – price) Difference FROM cte;

The 4th column is named the difference because it uses the minus operator to subtract the values in the 3rd column.

For every row, the value from the 3rd column is subtracted from the previous row. For instance (3rd -2nd ), (4th -3rd ), and so on. And the subtracted value is written in front of the first value (subtracted from the next value). But in the 6th, the last row in the “Difference” column, the answer is written that is obtained by subtracting the 1st row from the 2nd row, the value of the year 2021.

Example 3

Consider another example in which the same concept is applied to the table. But the lag is 2 instead of one. A single select statement displays the prices, not the SUM or any other aggregate function.

>> SELECT year, price, LAG (price, 2) OVER (ORDER BY year) Discount FROM showroom;

The resultant value shows that the price in the discount column is shifted two rows down. It means we have accessed the rows before the current row of the table.

LAg function in the partitions

Via pgAdmin

In pgAdmin, after providing the password, you can now access the database and all the content present in it. Go to the query tool of the tables. Now create new table names “team_members”.

>> CREATE TABLE team_members ( name VARCHAR(100), id INT, year integer, salary integer);

After creating the table, now add values to the table.

>> INSERT INTO team_members( name, id, year, salary)  Values (‘salar’, 1, 2018, 180000);

After the values are inserted, now apply the query.

The LAG () function in this type compares one year’s salary with the last year of each group.

>> select year, salary, id, LAG(salary, 1) over partition by id ( order by year ) Discount FROM team_members;

In this example, ids are grouped collectively. The partition will distribute the rows into the group, as you can see that all the same ids are present together. And the discount column shows the comparison of one-row salary with the second row, having the lag of 1.

Conclusion

“Lag function” is used to fetch the record you have entered before the current row you have selected. Two main categories of lag function are discussed here with examples. Select, order by are used in addition with the “partition by clause”. This function in PostgreSQL can be applied within the table and then apply different operations on them.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.