SQL Standard

# Boyce–Codd, Fourth and Fifth Normal Forms

“This is part four of the series, The Five Normal Forms. Boyce-Codd Normal Form is abbreviated BCNF. It is also known as 3.5NF. 3.5 is 3½. It comes after the Third Normal Form. The Fourth Normal Form is abbreviated 4NF and comes after the Boyce-Codd Normal Form. The Fifth Normal Form is abbreviated 5NF and comes after the Fourth Normal Form.

This tutorial (article) is the fourth part of the tutorial series, The Five Normal Forms. This tutorial explains BCNF, 4NF, and 5NF.

This tutorial follows a storyline as follows: A father has died. He has left some money for his son. His son has decided to use the money to open (start) a convenience shop. The convenience shop is already stocked and has been operating for some time. The son has some employees, called clerks, in this tutorial series. The son is the proprietor. Before the shop went operational, both the proprietor and the clerks did not know the normal forms.

You, the reader, have completed this Five Normal Form tutorial series and are also a database developer. The son of the late father is your friend. You have visited his shop in the last three days. On the first day you visited them; you trained the proprietor and his employees on how to put a database table in 1NF. On the second day, you trained them on how to move a table from 1NF to 2NF. On the third day, you trained them on how to move a table from 2NF to 3NF.

Today, the fourth day, you visited the shop to train only the proprietor on BCNF, 4NF and 5NF in his office.”

## Boyce-Codd Normal Form

BCNF issue occurs when there is a composite primary key and a non-primary key attribute (column), and the non-primary key functionally depends on the part (e.g., one) of the primary key attributes, while the table is already in the third normal form.

This implies that the non-prime (non-primary key) column and the prime column depend on form an entity and have to be removed as a separate table, where the non-prime column will be the primary key. The other part of the composite primary key will form a new table with the non-prime column, and the non-prime column will not necessarily be part of the primary key. The other part of the composite primary key remains a primary key. The related possible dependees from the parent table appropriately accompany the two children’s tables.

The separation of tables for the Boyce-Codd normal form is not really the same as the separation of tables in 2NF and 3NF.

A table is in Boyce-Codd Normal Form if the following rules are obeyed:

1. The table should already be in the Third Normal Form.
2. No non-prime attribute (column) should depend on the part of the composite primary key.

This second point, as quoted, is simplified.

## Example

In the previous part of this series, the SaleDetails table (with some modification) was given as follows:

SaleDetails(saleID, product, numberSold, unitSellingPrice, discount)

A corresponding table with data is:

The unit selling price is of type, float or number. The primary key in this table is a composite key consisting of saleID and product.

This table is in 3NF. The number of products sold can be considered to depend on the product and not on the saleID. In other words, a non-prime key can depend only on the part of the composite primary key. This should not happen; and so from these three columns, the following two tables could result:

Quantity(numberSold, product)

and

SaleDetails(saleID, numberSold)

For the table, Quantity, the primary key is numberSold. For the new SaleDetails table, the primary key is still saleID.

From the parent SaleDetails table, the only dependee for numberSold is the product. From the parent SaleDetails table, the dependees for saleID are numberSold, unitSellingPrice, discount, and without product. And so the tables should actually be:

Quantity(numberSold, product)

and

SaleDetails(saleID, numberSold, numberSold, unitSellingPrice, discount)

At this point, the proprietor makes the following comment:

“I do not think I will ever want to know the quantity of a particular product sold without wanting to know the saleID, which depends on the trio (buying customer, selling employee and the date of transaction).” You, the database developer, answers as follows:

“In that case, let us allow the SaleDetails and OrderDetails table at 3NF. After all, many commercial databases out there do not go beyond 3NF, and the businesses are comfortable. However, whenever you want to know that for a similar table, do break the parent table into BCNF tables.”

## Fourth Normal Form

1NF, 2NF, 3NF and BCNF rely on functional dependency. 4NF relies on a special kind of functional dependency, which is rather “disturbing”, especially if not well understood. This rather disturbing dependency is called Multivalued Dependency.

Functional dependency is simply called dependency. However, multivalued dependency is not simply called dependency, as that would bring confusion. It is called Multivalued Dependency.

You, the database developer tell the proprietor the following, “It should interest you that for a table to be in 1NF, each cell must have a single value. A somewhat similar problem occurs here, but with a table that is already in BCNF after 3NF. Multivalued dependency is with the composite primary key, and each cell in the whole table already has a single value. In the 1NF issue, the multiple values in a cell do not have to concern a key. With the 4NF issue, there are at least three columns. If there are only three columns, then the three columns form the primary composite key. In this case, the first column of cells may determine the second column of cells, but the second column is independent of the third column. 4NF does not allow this.”

In other words, the issue may be that the second column depends on the first column, and the third column still depends on the first column and has nothing to do in terms of dependence on the second column. 4NF does not allow this.

Before continuing with the explanation of multivalued dependency, how the issue of the fourth normal form can arise should be explained first.

## How Issue of 4NF Can Arise With Convenience Shop

Imagine that after sometime, you have had rivals (other convenience shops) in your neighborhood, and you are not selling as much as you were selling before. This could not be foreseen when you started the convenience shop.

It occurs to you that if you can reduce your prices, not below the cost price, of course, for your customers who buy most, then they will buy more, and your sales and profit would increase from the dropped level. This means you have to know where such customers leave and their addresses (streets) so that you can even deliver products to their houses. Again, this problem was not foreseen at the beginning.

And so you come up with the following table, which you will work through, up to BCNF, before the issue of 4NF can show itself:

Delivery

For you, the proprietor, your interest is the category of the product, the product itself to be delivered, and the address to deliver to. Looking at the whole table, the rest of the row sections beginning from customerName to the right end determine the first three columns. In other words, the first three columns form the primary key for the rest of the columns. That is, the first three columns depend on the rest of the columns by rows. And so, the first three column headings have to be underlined with single lines. With that, this table is now in its first normal form. Each combination of horizontal cells in the first three columns is unique.

This table is also in the second normal form because each combination of horizontal cells in the first three columns is unique, and there is no partial dependency (with repeated groups). However, it is not in the third normal form because the row sections (parts) beginning from customerName to the right end determine customerID (customerID depends on them). So all that has to be removed, leaving you with a copy of the customerID. customerID is now both a foreign key as well as part of the primary key. The table is now in 3NF.

Before you, the database developer and trainer, could continue, the proprietor says, “Instead of working with the category column, product column and address column, I will work with the category column, product column and customerID column, since once the customerID is known, the address can be known, and that would be more convenient, especially for the computer.”

Your reply, “That’s good, proprietor. You are on track. That is what will be done.”

Remember, the Customer table already exists. This was produced in the previous part of this tutorial series. So, the only table to produce now is a table just having the three primary keys.

Category Delivery Permutations, by Product

Unfortunately, the columns of this composite primary key are not in 2NF among themselves. There are repetitions of cell values going downwards, with partial dependency within the composite key. These repetitions are not as constructive as they seem.

Notice that Confectionery is associated with customer 1, and it is also associated with customer 2. Soft Drink is associated with customer 1, and it is also associated with customer 2.

If customer 1 has asked for sweets today, he will ask for chocolate tomorrow (not shown in the table). Confectionery is associated with customer 1 through Sweets on the table, but it can also be associated with customer 1 through Chocolates in deliveries tomorrow. If customer 1 has asked for sprite today, he will ask for coca-cola tomorrow. Soft Drink is associated with customer 1 through Sprite, in the table, but it can also be associated with customer 1 through Coca-cola. The same delivery issue occurs with customer 2. This kind of repetition is called multivalued dependency.

Notice that in the above table, the Product column does not have repetition (at least for now).

To solve this problem, it is best to first put these repetitions of the column values, of the primary key, in the first normal form to result in the following:

Complete Category Delivery Permutations by Product

Permutation means to change the order of an arrangement. In this situation, it means to give all the different possible orders of products in the Product column. Now, there are more repetitions (more redundancy) in this table than in the above. The good news is that these three columns are now alright, in First Normal Form. 2NF and 3NF must be envisaged for these three columns.

Recall that delivery was not foreseen at the very beginning when the shop started (went operational). If it had been foreseen, then the very first transaction table in the first part of this tutorial series would have been like this:

Notice that the multi-values in each cell in this Product column have more factors taken into consideration than what happened in the very first transaction table in the first part of the series. Bringing this table into first normal form would result in the previous table, which is reprinted here, for easy reference to the reader, with some modifications in the Product column:

The two first-normal-form tables are the same because the permutation in the Product column is relative to the customerID column. Do not forget that each customerID here identifies a row in the Customer table.

The definition of multivalued dependency means that if there are three columns, X, Y and Z, and for a particular row of values x, y, and z respectively, a multivalued dependency X ->> Y, signifies that if we choose any x actually occurring in the table (call this choice xc), and compile a list of all the xcyz combinations that can occur in the table, as done above, we will find that xc is associated with the same y entries, regardless of z. So essentially, the presence of z provides no useful information to constrain the possible values of y.

In the above table, xc, for example, is Confectionery, and a corresponding y value is Sweets. The combination of Confectionery and sweets has nothing to do with the customerID column, where the value maybe 1 or 2 in the rows. If x is taken as Soft Drink, a corresponding y value would be Sprite. The combination of Soft Drink and Sprite has nothing to do with the customerID columns, where the value maybe 1 or 2 in the rows.

Seen from a second normal form and functional dependency points of view, the Category column depends on the Product column and also depends on the customerID column but does not depend on both columns when combined. The values in the Product column repeat, determining the values in the Category column; and the values in the customerID column repeat, determining the values in the Category column; but both these columns together do not determine the values in the Category column.

So the table has to be split into two, with the Category and Product columns going one way and the Category and customerID columns going the other way, as follows:

## Category – Customer table

These two children tables are now in 1NF, 2NF, 3NF, BCNF and most importantly, 4NF. The category-Product table has composite primary keys. The Category-Customer table also has composite primary keys. Each of the column keys is either already a primary key in some other table in the database, or it is a foreign key in some other table in the same database.

These two tables replacing the parent table, are not the only tables in the whole database. In fact, they are related to other tables in the database. So there is still some housekeeping work in the database design project to be done concerning the whole database and these two new tables.

Referring to the Category-Product table, remember there is already a Products table in 4NF and a Category table in 4NF in the database. For a table to be in any normal form, it should not violate any of that normal form’s rules. The Products or Product table has productID as its primary key and category or categoryID as a foreign key.

So the Category-Product table produced here, in 4NF, should be abandoned since the following two tables, which are in 4NF, have all its information (and more):

Categories(categoryID, categoryName, description)
Products(productID, categoryID, supplierID, productName, unitPrice, quantityInStock, reorderLevel)

On the other hand, the Category-Customer table in 4NF cannot be abandoned, as it comes with some new relationships on delivery. In fact, the table should be better called, CategoryDelivery. Calling it ProductDelivery would be misleading for database programmers but would not be misleading for the customers, who are illiterates in the database. So call it CategoryDelivery. In table notation form, it is:

CategoryDelivery (Category, customerID)

Remember that each customerID refers to a row in the Customers table. The composite primary key is:   {Category, customerID}. Since there is already a Categories table with categoryID, this table should actually be:

CategoryDelivery (categoryID, customerID, category)

where categoryID depends on category (name) and vice-versa.

So the issue of delivery brings in an extra table in 4NF. The rest of the tables in the database are already in 4NF, as they do not violate the 4NF rules stated below.

The delivery issue was not foreseen before normalization in their different classes started, at the beginning. If it had been foreseen as indicated above, then the CategoryDelivery would have arrived; at or before the Third Normal Form stage without any mention of 4NF.

## Fourth Normal Form Rules

A table is in 4NF if it does not violate the following rules:

1. It is already in Boyce-Codd Normal Form.
2. The table does not have any Multi-valued Dependency.

This means a table can be designed the first time, and it is already in 4NF.

## Fifth Normal Form

5NF situations rarely occur, but when they do occur, Fifth Normal Form must be taken into consideration in order to avoid any known accounting problem. The fifth Normal Form is also known as the Projection Join Normal Form. With the fifth normal form, there are at least three columns. If there are only three columns of interest, then there is one composite primary key, which is made up of the three columns.

Imagine that you, the proprietor, have had up to two convenience shops in the same city and are supplied by the same set of suppliers. Call these shops, shop1 and shop2. The suppliers see these two different shops as two different customers. So here, customer has a different meaning. It means to shop and not the person. The meaning of the product stays the same.

So there is a table with the primary keys: supplier, product and customer. That is:

Table (supplier, product, customer)

The 5NF issue comes up when a given supplier can supply a given product to more than one customer (the two shops). Also, a given customer can receive a given product from more than one supplier. And a given customer can receive from different suppliers different products. That is, any one of the three partners can do the same things to the other two partners.

That is, one supplier can correspond to more than one customer. One customer can correspond to more than one product. And one product can correspond to more than one supplier. This means that the following three binary tables can result:

Supplier-Customer table
Customer-Product table
Product-Supplier table

If a parent table can be broken down into three smaller tables without any loss or addition of information, and if when the tables are joined back, there is still no loss or addition of information, then the parent table should be broken down. The smaller tables are in 5NF. In that case, join dependency has been eliminated. Loss of information means losing row relationships, and addition of information means adding new row relationships.

If this breaking down and putting back together would lead to loss or addition of information, then the table should not be broken down. In that case, the parent table is already in the Fifth Normal Form.

At this point, you, the database developer and trainer, say, “Proprietor, I leave the putting of data into the tables (parent table and three small tables) as an exercise for you. I will check that tomorrow.”

## Fifth Normal Form Rules

A table is in 5NF if it does not violate the following rules:

1. It is already in the Fourth Normal Form.
2. The table does not have Join Dependency.

This means a table can be designed the first time, and it is already in 5NF.

## Conclusion

A table is in Boyce-Codd Normal Form if the following rules are obeyed:

1. The table should already be in the Third Normal Form.
2. No non-prime attribute (column) should depend on the part of the composite primary key.

A table is in 4NF if it does not violate the following rules:

1. It is already in Boyce-Codd Normal Form.
2. The table does not have any Multi-valued Dependency.

A table is in 5NF if it does not violate the following rules:

1. It is already in the Fourth Normal Form.
2. The table does not have Join Dependency.

The proprietor now says, “This calls for a big celebration for both of us.”

You, the database developer, replies, “Why don’t we wait until tomorrow when I will put all the tables together and improve on the Products table?”

The proprietor reacts by smiling, “What would I do without you?”

You, the database developer, add, smiling, “See you tomorrow then, in your office.” and leave.