SQL Standard

SQL Self-Join

One of the most renowned feature of relational databases is joins. Joins are one of the most powerful and complex features of relational databases. They allow for extensive modularity and very complex relationships between database tables without compromising on the performance.

However, there is a type of join in SQL that tends to go under the radar which is known as a self-join.

In SQL, a self-join is a powerful technique that allows us to combine the rows from the same table based on a related column within the same table.

Unlike the other type of joins in SQL where we take the data from multiple tables, a self-join works on a single table.

You may wonder, why do I need a self-join when I can look up the data from the table itself? Although that may be true, a self-join can help in tasks when you need to perform recursive data queries or find a hierarchical data.

Essentially, a self-join plays a fundamental role where you need to compare the rows of the same table.

Join us in this post as we explore what self-joins are, how they work, and how we can use them in an SQL table.

NOTE: Before we dive in, we will setup a basic table for demonstration purposes. However, you are feel to use any supported dataset.

Sample Data

The following queries creates a basic table containing a product information and inserts the sample records into the table. This allows us to demonstrate how to work with the self-joins in SQL.

CREATE TABLE Products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    parent_id INT
);

Add 10 sample records into the table.

INSERT INTO Products (product_id, product_name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Smartphones', 1),
(3, 'Laptops', 1),
(4, 'iPhone 13', 2),
(5, 'Samsung Galaxy S21', 2),
(6, 'MacBook Pro', 3),
(7, 'Dell XPS 15', 3),
(8, 'Accessories', NULL),
(9, 'Phone Cases', 8),
(10, 'Laptop Bags', 8);

The resulting table is as follows:

Example 1: Basic Self-Join

Let us start with a basic self-join. For example, we can use a simple self-join to retrieve the parent-child relationship of the same table.

For example, we can use it to find all the products and their corresponding parent products. We can run a query as follows:

SELECT c.product_name AS child_product, p.product_name AS parent_product
FROM Products c
LEFT JOIN Products p ON c.parent_id = p.product_id;

In the given example query, we use “c” as an alias for the product table which represents the child products.

We also create the “p” alias for the product table to represent the parent products.

In the next step, we use a basic SQL LEFT JOIN to ensure that the top-level products are included in the result.

Lastly, we use the ON clause to create a relationship between the child and parent records using the “parent_id” and “product_id” columns.

The resulting table is as follows:

You will notice that the parent products do not have a top link as they are at the top of the hierarchy which are also known as the root elements.

Example 2: Fetch a Hierarchical Data

One of the most fundamental roles of a self-join is retrieving a hierarchical data. For example, suppose we want to retrieve all child products where the product is equal to “Accessories”. We can run the query using a self-join as follows:

SELECT c.product_name AS child_product
FROM Products c
JOIN Products p ON c.parent_id = p.product_id
WHERE p.product_name = 'Accessories';

In this case, we use a JOIN to combine the products table with itself and then use the WHERE clause to filter out the results that we want.

Conclusion

In this tutorial, we explored what a self-join is, how it works, and how we can use it in the SQL table to retrieve a hierarchical data or perform recursive queries.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list