Second Normal Form: The Five Normal Forms (Part 2)

This tutorial explains the Second Normal Form, related to relational database. It is the second part of the series: The Five Normal Forms. The explanation of these five normal forms follow a story line, which starts as follows:A father died and left some money for his son. The son decided to use the money to open a convenience shop. The son employed a few workers (clerks). Everything in the shop is already stocked, and the staff started selling to some customers. At the start of the functioning of the shop, which is not long ago, the son, who is the proprietor, and his workers, knew nothing about the normal forms.

You are already a friend of the son, the proprietor. You finally completed this tutorial series on normal forms, you know everything about normal forms, and you have become a database developer. You visited your friend’s shop yesterday and you noticed that they had just a transaction table, which did not comply with the First Normal Form. You then taught both the proprietor and his clerks on how to produce the transaction table in the First Normal Form. They understood and the table they now have in the First Normal Form is:

This is the transaction table. There is one column for the primary key, which is TransID (auto-increment).

Since any First Normal Form may have vulnerabilities, you visited them (the shop) today to train them on the Second Normal Form in order to reduce the vulnerabilities.

Before you start to do anything, the proprietor asked:

“Is this current transaction table not too big?

I have heard that the companies that do not use computers record their data in different ledgers.

I think this transaction table actually consists of two smaller transaction tables, which can have new names of Sales Table and Order Table.”

You reply while smiling:

“Yes, the current transaction table is too big.”

You went on to congratulate him for being positively inquisitive, especially concerning his business. You agree with his thoughts. You carry on to say that there are actually three main tables there: The Sales table, the Order Table and the Products table. When the Sales and Order tables are identified, they will replace the Transaction Table. The Sales Table and the Order Table are smaller transactions (halved) tables in themselves. The product table is clearly an entity. It has to be obtained from the previous big transaction table.

However, these three tables obtained from the previous big table would still be in the First Normal Form. The Second Normal Form addresses the problem of repetition (redundancy), and that is not what is to happen here.

Note: The Second Normal Form may not address all the repetition (redundancy) issues.

You, the database developer, and a friend to the proprietor, go on to produce the three tables with their participation as follows:

The Main Entities

The main entities are the three main tables mentioned previously. In the previous column, with the heading, Action has either Sale or Order. Sale means that the product in that row was sold to a customer. Order means that the product in that row was ordered from a supplier. A convenience shop has to order products before selling them.

The columns corresponding to the three tables are as follows:

Sales (saleID, product, customer, employee, sellingPrice)

Order (orderID, product, supplier, employee, costPrice)

Products (productID, product, category)

Each table notation has an underlined primary key. Each of these tables is still in the First Normal Form. And so, each key number is related directly to its row values.

In the Sales table, the saleIDs are not the copies of the TransIDs. In the Order table, the orderIDs are also not the copies of the TransIDs. Each of the saleID and orderID column is auto-increment, beginning from 1. The transaction table and its row IDs are no longer important since all the information in the Transaction table are now in these three tables. The SaleID and OrderID columns replace the Transaction table, but not by copying. The Sales table does not have the supplier column and the Order table does not have the customer column, which were together in the Transaction table. Also, neither table has the Action column since selling and ordering is now separated, and the two values of Sale and Order are no longer needed.

The category column, which was in the transaction column, is neither in the Sales table nor in the Order table. It is in the Products table. This is because the category value in each row is only related to the product value and not to the other cell in the row. The category column has to go out of the Transaction table along with the product column to obtain the Products table. This reasoning has something to do with Transitive Dependency in the production of the Third Normal Form which will be discussed in the next tutorial.

The Products Table obtained from the previous Transaction table is:

Products Table

Note that this table does not have any repetition of product name (downward) that might have been in the Transaction table. Also, each product value is present and each category value is also present.

This table is still in 1NF. The dependence for the values of a row is related to the primary key only, so far. The values in the category column repeat as the column is descended. Confectionery repeats twice. “Soft Drink” repeats four times. “Dairy Product” repeats three times. Repetition is redundant and causes accounting problems. Putting a table in 2NF means removing many repetitions. This removing of repetitions is not done arbitrarily.

The Sales Table obtained from the previous Transaction Table is:

Sales Table

Four new rows are added, and there has been some modification in order to make the explanation better. A date column is introduced for the same reason. Now, there are 15 rows in this Sales table instead of 11 for the Transaction table.

Business Rules

What is a sale? If the same employee sells to a particular customer on the same date at least one product, that is one sale. Even on the same day (date), if a customer comes twice and is served by two different employees, then those are two sales. If a customer comes twice on the same day and is served by the same employee, even if the two sets of products are different, both comings form one sale. In one sale, a customer can buy one or more products. In other words, in one sale, the trio (customer, employee and date) must be the same. Once one of these trio values changes, that is another sale. Different sales are identified by different saleIDs in the previous table. And so, the saleIDs repeat. Different column values repeat in their columns.

In the first sale which has two rows and same saleID, John Smith, a customer, bought one sweets and one sprite from Jacob Jones on the same date.

On the morning of 09/06/22, James Taylor, a customer, came and bought two yogurt and one coca-cola. That is one sale. It takes three rows in the table with same saleID.

On that same day, but in the afternoon, the same James Taylor came and bought pepsi, but from a different employee which is Peter Lewis. The trio now has a change with a different employee. And so, this is a different sale caused by a change in one of the trio. Since this is a different sale, it has a different row in the table with a different saleID.

On 09/08/22, Susan Wright, a customer, came and bought two cheese and one milk from Mary Baker. That is one sale because the trio remains the same (in the three rows). However, it takes three rows in the table. Since the trio remains the same, the saleID also remains the same.

The rest of the rows downwards in the table do not have the same saleID repetition. This table is still in 1NF. The dependence so far for the values of a row is still related only to the primary key of that row. Each column has repeated values. The repetition in a column must not necessarily be in consecutive cells coming down.

See when to consider the price or sellingPrice column in the tutorial section of Handling Repetitions. Putting a table in 2NF solves the problem of common set repetitions (redundancy) across rows.

The Order Table obtained from the previous Transaction Table is:

Order Table

This table is still in the First Normal Form. There is the possibility for any value in any column to repeat below its column. These repetitions are addressed in this tutorial to have the Second Normal Form of the table.

At this point, you, the database developer, agreed with the suggestion of the staff to put the Transaction table into smaller tables. And you put the Transaction table into smaller tables (entities) in a convenient way. The staff, including their proprietor, now believe that they too have the potential to completely understand the Normal Forms and are willing to learn more because their suggestion has materialized.

However, you, the database developer, insists to them that the original Transaction Table no longer exists and has been replaced by the three smaller tables. The Sales and the Order table essentially replace the Transaction Table. The TransID (Transaction ID) is no longer relevant. It is replaced by the saleID and orderID in two different tables.

The main entities which are now smaller tables of the original Transaction Table are: the Products table, the Sales table, and the Order table. You, the database developer, continue to explain and insist to them that these new but smaller tables are still in the First Normal Form. Replacing a table by its main entity tables is not normalization because none of the types of dependence definitions is used in breaking down the big table. You, the database developer, go on to put the three tables in the Second Normal Form as follows:

Handling Repetitions

The Products Table

In the Products table, the values of the category column repeat. All the names (values) of the category column have to be removed from the products table into a Categories table where there would be no or limited repetitions. The Categories table becomes:

Categories Table

The Categories table no longer has any item such as “Soft Drink” that repeats. This table is shorter vertically than its placement in the previous Products table.

Any table needs a primary key. So far, the Categories table consists of one column where all the values are unique and there is no empty cell or null value. This column can be the primary key column for the Categories Table. However, it might be better to have an auto-increment primary key. The following modified Categories table shows this:

Categories Table – 2NF

This is the final categories table. It is now in 1NF and 2NF. What about the original Products Table? A new Products table needs to be created. In the new table, any category name in the original table will be replaced by the corresponding ID in the Categories table. So, the Products Table becomes:

Products Table – 2NF

The category column is replaced by the categoryID column. The information for the category values is still there in the Products table as categoryIDs. The category column is placed just after the productID column instead of after the product column for better presentation. This table is not shorter than the original Products table but it still has an advantage.

“What is the primary key of the Products table?” asked by one of the clerks. Note that in each row, the productID and categoryID are both dependent on the value of the product name (value). If either the productID or the categoryID is changed for any row, the new combination of productID and categoryID would point to a different product name (value). In other words, a product name (value) in a row is related to both the productID and categoryID of that row. Because of this dependency (functional dependency) of the productID and categoryID combination on a particular product name (value), both the productID and categoryID form the primary key.

When a key is a combination of more than one column, the key is called a composite key. The table notation for this new Products table is:

Products(productID, categoryID, product)

The relationship between the productID and categoryID is many-to-one.

Each column name for the primary key (composite key this time) is underlined. The categoryID values in the Products table and the categoryID values in the Categories table make the exact correspondences between the two tables.

The table notation for the Categories table is:

Categories(categoryID, category)

The new tables that replaced the original Products table are: Categories Table and Products Table (same name). These tables are now in the First Normal Form and the Second Normal Form. See the actual rules for the Second Normal Form in the following discussion.

The Sales Table

The Row-sections where the saleID repeats but the column cell values do not repeat, have to be removed from the Sales table and be placed in a new table. In the new table, the identical row-sections (columns) where the cell values repeat along with the saleID in the sales table will not be included. That is, any cell value or row-section such as the trio (customer, employee and date) that must repeat with the same saleID in the Sales table will not be included in the new table even if the repetition is just once. The values of the product column that can change with the same saleID in the Sales table must be in the new table. A new column is introduced that has the number of the same products sold for a particular saleID. That maintains the new table in 1NF, taking it to 2NF. The new table is called SaleDetails Table. If the developer cannot find a new suitable name for the new table, something has gone wrong with his analysis. The SaleDetails table becomes:

SaleDetails – 2NF

The SaleDetails table, removed from the Sales table, is now in the Second Normal Form as well as still in the First Normal Form. The saleID from the original sales table has to be included in the SaleDetails table in order to maintain the relationship between the original Sales table and the new SaleDetails table. Now, there are 13 rows in the SaleDetails table instead of 15 from the original Sales table.

In the original Sales table, any column whose value did not change, while the saleID was not changing, remained in the original Sales table and was not removed. These are essentially the trio (customer, employee and date) in this situation. The product column values changed while the saleID was not changing so it has to be removed. If the price or sellingPrice column values change while the saleID is not changing, it also has to be removed.

Would everybody come to a shop and just buy only one tin of milk? No. For any customer who buys, say 4 tins of milk, the number 4 would just fit well in the numberSold column in the appropriated row.

“And what is the primary key of the new SaleDetails Table?” asked by the proprietor. This is your answer as the database developer:

Note that in each row, the saleID and product are both dependent on the numberSold and sellingPrice (values). If either the salesID or the product-name is changed for any row, the new combination of the saleID and product points to a different row of the numberSold and sellingPrice. In other words, a numberSold and sellingPrice row is related to both the saleID and product values of that row. Because of this dependency (functional dependency) of the salesID and product combination on a particular row, both the salesID and product form the primary key. The product should also be underlined.

When a key is a combination of more than one column, the key is called a composite key. The table notation for this new SaleDetails table is:

SaleDetails(saleID, product, numberSold, sellingPrice)

The relationship between the salesID and product is many-to-many.

“I intend to computerize the database. Since a product table already exist with productID, would it not be better to replace the product as part of the key with productID? Then the computer will use the productID from the SaleDetails table and look for the product-name from the Products table,” the proprietor remarks.

You, the database developer and trainer, smiles while nodding your head. And this is your reply:

“Proprietor, you are doing fine. You are understanding faster than I expected. When a database will be in exercise books (ledgers) only, wherever possible, it will have text names instead of numbered IDs. When a database will be in a computer, wherever possible, it will have numbered IDs instead of text names. The computer will connect the numbered IDs and text names in their tables and print the text names when a query is issued.

Let me have the honor to do the computerization; but for the computerization, you will pay me.” And so, the table notation for the SaleDetails Table becomes:

SaleDetails(saleID, productID, numberSold, sellingPrice)

“What remains of the Sales table?” asked by one of the clerks. The columns whose values did not change while the saleID was not changing remained in the Sales table. The saleID also remains because it “governs” each row in the Sales table. The new Sales table becomes:

Sales Table – Intermediate

The product and sellingPrice columns where there was change in value while the saleID was not changing were removed. Now, there are clearly some duplicate complete rows. Such duplicates have already been counted and recorded in the SaleDetails table in the numberSold column. In the actual Saledetails table, the count is either 2 or 1. So, the duplicates do not have to be taken into consideration in the new Sales table. If the duplicates are allowed, one of the rules of the First Normal Form would be violated. The new Sales table becomes:

Sales Table – 2NF

This new and final sales table is in 2NF as well as still in 1NF. No saleID occurs more than once in this table. There are 10 rows here, and not 15, when compared with the original Sales table. This new Sales table is shorter than the original one by five rows.

This final Sales table has just one primary key column which is the saleID. It is underlined. The saleID values in the Sales table and saleID values in the SaleDetails table make the exact correspondences between the two tables.

The table notation for the Sales table is:

Sales(saleID, customer, employee, date)

And the table notation for the SaleDetails table is:

SaleDetails(saleID, productID, numberSold, sellingPrice)

The new tables that replaced the original Sales table are: SaleDetails Table and Sales Table (same name). These tables are now in the First Normal Form and the Second Normal Form. See the actual rules for the Second Normal Form in the following discussion:

The Order Table

The analysis similar to that for the Sales table can be made for the Order table to have the new replacement tables:

Order(orderID, supplier, employee, date)

and

OrderDetails(orderID, productID, numberBought, costPrice)

At this point, you, the database developer, has just finished illustrating to the staff members including the proprietor on how the 2NF is produced from the 1NF.

The Proprietor now asks, “Is that how we shall form the 2NF table(s) from the 1NF table?” You, the database developer, answers as follows:

“Well, yes. However, whatever way you use to form a 2NF from the 1NF must abide by the rules of the 2NF.” You then go on to explain the 2NF rules.

Rules for the Second Normal Form

For a table to be in the Second Normal Form, it must respect the following two rules:

1) The table must already be in the First Normal Form.

2) There must be no partial dependency.

The Functional Dependency or simply Dependency is explained in the previous part of the series, the First Normal Form. The explanation is briefly be repeated here and then the partial dependency will be explained.

Functional Dependency

In any table in the First Normal Form, once a primary key is known, the rest of the values in the row of the primary key can be fetched. For example, in the very first table in the previous example, the values for the primary key number 10 are: Toothpaste, Toiletries, Cleaning-up Company, Peter Lewis, Order and 4. So, the key number 10 depends on these values. A primary key uniquely identifies all its values.

Partial Dependency

Partial dependency is a situation with composite key where a non-primary key value in a row can have only part of the composite key, e.g. one of its cells depending on it. In the previous tables with composite keys, each non-primary key value in a row has both cells of the primary key depending on it. The second rule for the 2NF says that there must be NO partial dependency. And there is no partial dependency in any of the previous tables.

Both cells of the composite key depend on each value in the row for all the above tables with composite keys. If it were partial dependency, one cell in the composite key would depend on some values in the row, and the other cell of the composite key would depend on the other values of the same row.

Production of 2NF from the Products Table and Sales Table Compared

The Products table has a length (going downwards) limit. The Sales Table does not have a length limit because it is a transaction table. However, this difference is not what necessarily gives both tables their different ways of obtaining the 2NF tables.

In the given 1NF Products table, the categories repeat downwards. The category column is removed to form a new table of limited length and the composite key goes to the parent table, the Products Table.

In the given 1NF Sales table, the saleID and corresponding set of other cells of the same row repeat downwards. The rather non-repeating columns were removed to form a new table and the composite key goes to the child table, the SaleDetails Table. Both the Sales and the SaleDetails tables are of unlimited lengths.

At this point, you, the database developer who trains the staff including their proprietor, asks all of them to verify if all the new tables are truly in the First and Second Normal Forms. They should do that successfully and answer yes.

And then, you conclude.

Conclusion

A table is in the Second Normal Form if it abides by the following rules:

1) The table must already be in the First Normal Form.

2) There must be no partial dependency.

For all the tables with composite primary keys, all the non-primary key values in a row determine each of the primary key value of that row.

Taking a table from 1NF to 2NF would involve handling a major repeating group (set of cells).

Though some vulnerabilities have been eliminated, a table in 2NF still has other vulnerabilities. More of these vulnerabilities will be dealt with in the next tutorial (article) on the Third Normal Form.

From the questions that the staff members are asking and the feedback from them, it shows that they have understood everything that they have been taught so far. You have to congratulate them before leaving and be back again to discuss about the Third Normal Form.

Second Normal Form: The Five Normal Forms (Part 2)

The Main Entities

Products Table

Sales Table

Business Rules

Order Table

Handling Repetitions

The Products Table

Categories Table

Categories Table – 2NF

Products Table – 2NF

The Sales Table

SaleDetails – 2NF

Sales Table – Intermediate

Sales Table – 2NF

The Order Table

Rules for the Second Normal Form

Functional Dependency

Partial Dependency

Production of 2NF from the Products Table and Sales Table Compared

Conclusion

About the author

Chrysanthus Forcha