PostgreSQL

Postgresql group by

Postgresql group by clause is a feature that is used to unite/combine those rows in the table that have the same data. This clause is mainly used to remove duplicate data and to maintain concurrency. Whenever we want to calculate sum, or any other aggregate like AVG, etc., this group by clause is always used as there are many clauses used in PostgreSQL. But there exists a hierarchy between each clause.

FROM > WHERE > “GROUP BY” > HAVING > SELECT > DISTINCT > ORDER BY > LIMIT

The working of PostgreSQL lies within “where” and the “Having” clause.

Syntax

SELECT column  

FROM dumytable  

WHERE [conditions ]    

GROUP BY firstcolumn, secondcolumn..  

ORDER BY firstcolumn, secondcolumn.. ;

Example 1

To understand the concept of the group by function, we use an example here. In the table of passengers, you can see that the last names are the same for some people. Each name that is similar to another forms a group, and their payment is added collectively against each same name. This is elaborated in the below example.

>> select lname, SUM (payment) from passenger GROUP BY lname;

The passenger’s last name is selected with the use of a built-in function ‘SUM’ that takes the column ‘payment’. And add the payment for those people having the same name. For example, the salary of “Javed” and “saad” is added. Whereas for “Malik” and “Shams”, it is individually mentioned.

Similarly, consider a table “hospital”. We want to group the city for the age. In this example, one city exists more than once in a column. Each city is grouped with the same city name. The ages of each group of the city are added and form a single row.

Hospital:

>> select city, SUM (age) from hospital GROUP BY city;

Example 2

Alternatively, if we select the last name with the id from the table passenger, the resultant will be a different table. Because when we group both the columns together, it will display each passenger’s name because the id for every passenger, even having a common last name, is different. The sum is calculated in a separate column, but each passenger’s payment is mentioned in front of his name because the grouping of lname is not done here.

This is an example of the group by clause with multiple columns. Because when multiple columns are selected for grouping, the resultant value is changed as compared to the group by a single table.

>> select id, lname, SUM (payment) from passenger GROUP BY id, lname;

From the output, you will notice one thing that firstly, all those lname that are uncommon are displayed, and then those who are the same are mentioned in the table.

Example 3

This example has a join condition and a group by clause. As ‘join’ is used, it means we have used two tables here. One is “item”, and the other is “orders”.

Items:

Orders:

We have used a concatenation method (used to join two strings) to join the values of two columns of the “items” table with “,” and name the column collectively as “description”. It is optional; you may take them separately. In this query, the keyword “USING” identifies the specific column from the other table. The address from the items table is matched with the address column of the “orders” table. This is done by making a join between two tables. Like previous examples, both the columns will be selected by the GROUP BY clause.

>> select name ||,|| category as Description, address from items inner join Orders USING (address) GROUP BY address, Description;

You can observe that 5 rows will be selected having the address of items matched with the address of Orders. And then, the Description column will be formed correspondence with the address column.

Similarly, there is another example of concatenation with the factor of age in two tables. One is “passenger”, and the other is “worker”. Concatenation is between the first and the last name. Both these names are separated through the space between two names. We have taken a part of the worker table here.

Worker:

The query will work so that the point where the fname of worker is matched with the passenger, the age from the passenger, is displayed in the age column.

>> select fname || ‘ ‘ || passenger.lname as full_name, passenger.age from passenger INNER join worker USING (fname) GROUP BY full_name, passenger.age ORDER BY passenger.age;

One row is formed. The full_name is created by joining two columns with space, and the address is selected where the fname of the passenger matches with the fname of the worker.

Example 4

This example deals with using a count () function to count the ids of the “items” table. This is again group by the id of the table.

>> Select id, COUNT (id) FROM items GROUP BY id;

The rows in the id column are split into groups. Each group is counted in the column that how many times it does appear in the column. In the resultant table, a new column with a “count” name is created, and the values of the count are written here in front of each group.

Via pgAdmin

Now we have applied some examples on the dashboard side of PostgreSQL. These examples are different in some sort because they form a subtotal row from the original column when any one of the items differs from the others, so the value is returned as NULL.

Consider the first example; here, we have created a column name “ALL” that has combined two columns. An address and a category. ‘ALL’ column counts the values in both the columns collectively. The address column is grouped as “add”, and the category column is separately grouped as “cat”. As column names of both tables used can be matched with one another. So each column of the respective table is accessed by a specific object.

The condition applied on the command depends upon the id and the order number. Where both these id and order numbers are the same, the data is fetched. Similarly, an additional condition is also added for the name.

>> SELECT o.address, category, count (*) ASALL, GROUPING(o.address) ASadd, GROUPING (category ) AS “cat” FROM items I, orders o where i.order_no = o.order_id AND i.address= ‘Lahore’ GROUP BY CUBE (o.address, category) ORDER BY 1, 2;

For the city of Lahore, one category is selected. There are 4 possibilities. Sometimes, the toy is present but not the address. And vice versa. But there exists a time where both the category and address are present.

Now, if we change the condition of the name from a table and switch the table with other. “Items.address” is replaced by the “order.address”, then the result is different.

Conclusion

The clause “Postgresql group by” is used to apply any aggregate operator to the collective data. This article uses group by clause with count function, joins, and selection and grouping of multiple columns. I am sure this tutorial will be the best mean of understanding for the readers.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.