SQL Standard

The Third Normal Form

This is part three of the series, Five Normal Forms. The titles of the first two parts (tutorials) are First Normal Form, followed by Second Normal Form. In this part of the series, Third Normal Form is explained.

The explanation follows the story line: A father has died and has left some money for his son. The son decided to invest the money in a convenience store. A convenience store, also known as a convenience shop, is a small retail business that receives everyday items from suppliers and sells them to individual customers in the neighborhood.

At this point, the shop is already stocked, and some sales have already been made. The son, who is the proprietor of the business, has some employees, who are called clerks in this tutorial. The proprietor and any employee can receive supplies and make sales after recording the products.

However, before the shop started, neither the proprietor nor the employees knew anything about normal forms. So, they were recording everything as transactions in one table and one exercise book. They did not have a computer.

You, the reader, have completed the five parts of this tutorial series; you are now a database developer. The proprietor of the convenience shop is your friend. You visited the shop two days ago and trained the proprietor and the clerks on producing a table in its first normal form. You also visited the shop yesterday and trained them on how to create a table in the second normal form from the first normal form.

Today, you have just arrived at the shop for a visit to train them on how to produce a table in the third normal form from the second normal form. All the tables they have at present are in the second normal form. The tables (by name and column headings) are:

Products(productID, categoryID, product)
Categories(categoryID, category)

Sales(saleID, customer, employee, date)
SaleDetails(saleID, productID, numberSold, sellingPrice)

Orders(orderID, supplier, employee, date)
OrderDetails(orderID, productID, numberBought, costPrice)

The single or composite keys are underlined.

After summarizing what was taught in the previous two days and before you could do anything, the proprietor asks:

“What about phone numbers, addresses, etc., for customers and employees?

What about quantity in stock, reorder level, etc., for products?
Do they need their own separate tables, or should they be fitted into the present tables?”

You, the database developer, answer:

“Congratulations, Proprietor! You have indirectly introduced the issue of Third Normal Form.”

You continue.

Other Necessary Columns

Other necessary columns are first added to the previous tables, which are in 1NF and 2NF. Some of the previous column names are modified.

As a minimum, the Categories table should have the following columns:

Categories(categoryID, categoryName, description)

The description is a short paragraph that describes the category. This categories table is already in 1NF, 2NF, and 3NF. The 3NF is explained below:

As a minimum, the Products table should have the following columns:

Products(productID, categoryID, supplierID, productName, unitPrice, quantityInStock, reorderLevel)

As each product is being sold, a low level (number) of the products will be reached when the product has to be reordered, so customers should not come to the shop and not have the product. Such absence is not good for business. quantityInStock is the number of a particular product in stock. This includes what is in the store and what is on the shelf.

categoryID and supplierID are foreign keys. That is why they have dash underline instead of single underline. Foreign key is explained below. In the previous part of the series (Second Normal Form), categoryID was part of the primary key with a single underline due to how it was arrived at. However, from the explanation below, it would be clear that the categoryID should be a foreign key (with a dash underline).

This products table is already in 1NF, 2NF, and 3NF. See why it is in 3NF below:

As a minimum, the SaleDetails table should have the following columns:

SaleDetails(saleID, productID, unitSellingPrice, quantity, discount)

The discount value is expected to be zero most of the time. A discount is the discount the shop gives a customer.

As a minimum, the OrderDetails table should have the following columns:

OrderDetails(orderID, productID, unitCostPrice, quantity, discount)

The discount value is expected to be zero most of the time. The discount here is the discount the supplier gives the shop.

As seen below, the Products table may be considered in 2NF or 3NF. The Sales and Order tables have the issue of 3NF. Only the Sales Table will be used to explain the issue and solution. The 3NF for Orders Table and Products Table follow similar reasoning and would just be quoted.

While adding columns, the Sales table would be:

Sales(saleID, dateSold customerName, phone, address, city, region, postalCode, Country, employee)

Seven columns have replaced the customer column in the original table. Since the customers are people in the neighborhood, the cells for the city, region(state), postalCode, and country columns, can be left empty, though they are not left empty in this article.

This Sales table is still in 2NF as both the 1NF and 2NF rules have not been violated. However, it should be realized that in a Sales table row, the customer (name) has been replaced by seven customer row cells.

Note: an address cell has the house number, name of street or road, and name of the town, all separated by commas. A city can be considered as made up of several towns. Though commas separate these particular string components, they form one cell value and not three cell values.

The employee column also has to be replaced by seven such columns. However, that is not done in this tutorial to save teaching time and space. So, a Sales table with data can be:

Sales Table – 2NF – Without customerID

The data type SaleID column is an integer or, better, auto-increment. The data type of the dateSold column is a date and not a number because it has the character “/”, which is not a digit. The data type for the rest of the columns, including the phone column, is string (or text). The phone value has the character “-”, which is not a digit.

Note that for each row, customer (name), as was in the previous part of the series, has been replaced by seven cells, one of which is still customer-name. This means that customer data is an entity. Currently, the customer-name identifies its other six data in a row. If this table is programmed, it will be convenient to identify the customer entity in each row with an integer (not auto-increment). In that case, a customerID column should precede the customerName. The previous table becomes:

Sales Table – 2NF – With customerID

There are three customerIDs: 1, 2, and 3, with 1 occurring five times for John Smith, 2 occurring two times for James Taylor, and 3 occurring once for Susan Wright.

Note that some customerIDs and their dependees repeat.

Rules for Third Normal Form

A table is in Third Normal Form if it abides by the following rules:

  1. It should already be in the Second Normal Form.
  2. And it should not have Transitive Dependency.

Then one of the clerks (employees) asks, “What is a transitive dependency?”. And you, the database developer, reply, “That is a good question!”

Transitive Dependency

It is true that in a row, SaleID identifies all the values in the row; however, customerID identifies its seven data values but does not identify the rest of the values identified by SaleID in that row. Put another way, the SaleID depends on ten cell values in each row. However, the customerID depends on seven cell values in the same row, but the customerID does not depend on the SaleID and the other values that SaleID depends on.

Such dependency for the custumerID is transitive dependency. And customerID is called a foreign key and is dash underlined in this tutorial series, The Five Normal Forms.

Suppose a non-prime attribute (non-primary cell value) depends on other non-prime attributes, and the non-prime attribute in question (e.g., customerID and its dependees) does not depend on the primary key and the rest of the cell values in the row. Then that is transitive dependency.

The previous Sales table with the foreign key and its dependees, would cause accounting problems (anomalies).

Sales Table From 2NF to 3NF

To solve the problem posed by the foreign key and its dependees, remove the foreign key and its dependees, to form a new table without repetitions. However, even if the foreign key does not depend on the primary key, the primary key depends on the foreign key. So, a copy of the foreign key must remain in the parent table. The new sales table, at this point, is 1NF, 2NF, and 3NF compliant; it is a parent table. The new child table from the previous Sales table is also 1NF, 2NF, and 3NF compliant. The name of the child table with foreign key and its dependees, is Customers. If a suitable name cannot be found, then something has gone wrong with the analysis. The new Sales Table in 3NF is:

Final Sales Table in 3NF

This table in 3NF has the same number of rows as the one in 2NF but with fewer columns.

The table notation for this final Sales table in 3NF is:

Sales(saleID, dateSold, customerID, employeeID)

The saleID is the primary key with a single underline. customerID is a foreign key, with a dash underline. employeeID is also a foreign key with a dash underline. Note that the employee situation in the Sales table in 2NF is the same as the customer situation. The employeeID and its own dependees have to be pulled off to form another table; a copy of the employeeID remains.

Note: saleID, customerID, and employeeID do not form a composite key. saleID is dependent on customerID and employeeID.

The relationship between saleID and customerID is many-to-one.

The customer table in 3NF

This table has three rows instead of 9 rows in the 2NF Sales table. In this table, customerID is a primary key. It is the same as the foreign key in the Sales table, but without repetitions. The foreign key in the Sales table and the primary key in the Customer table link both tables.

The repeated rows in the Customer table have been removed in order not to violate 1NF.

As the reader can see, putting a table in 3NF would also solve the problem of repeated rows (redundancy).

The table notation for Customers Table is:

Customers(customerID, customerName, phone, address, city, region, postCode, country)

The Products Table Revisited

The products table given above in notation form is:

Products(productID, categoryID, supplierID, productName, unitPrice, quantityInStock, reorderLevel)

The primary key here is productID. categoryID and supplierID are foreign keys. Similar to the Customer table, there is a Categories table, where categoryID is the primary key, and there is a Supplier table, where supplierID is the primary key.

If the values for the cells for unitPrice, quantityInStock, and reorderLevel will remain fixed, then the Products table, as it is, is truly in 3NF. If these values will be changing, then the Products table, as it is, is in 2NF. In this part of the tutorial series, it is assumed that those values remain fixed over time.

All the Tables

All the tables are now in 3NF. They are shown as:

Employees(employeeID, name, phone, address, city, region, postCode, country, birthDate, hireDate, dateReleased)

Suppliers(supplierID, name, phone, address, city, region, postCode, country)

Products(productID, categoryID, supplierID, productName, unitPrice, quantityInStock, reorderLevel)
Categories(categoryID, categoryName, description)

Sales(saleID, dateSold, customerID, employeeID)
SaleDetails(saleID, productID, numberSold, sellingPrice)
Customers(customerID, customerName, phone, address, city, region, postCode, country)

Orders(orderID, dateSold, supplierID, employeeID)
OrderDetails(orderID, productID, numberBought, costPrice)

Up to nine professional tables have been produced from just one table produced by novices to prevent redundancy and accounting problems (anomalies from the insert, delete and update). The novice table alone would lead to financial losses.

Testing the Staff

At this point, all the employees, including the proprietor, should have understood 1NF, 2NF, and 3NF. However, they have to be tested. All of them, including the proprietor, will sit in different places and complete the test. The test consisting of one question, will take one hour, and it is as follows:

Question: Using rules for 1NF, 2NF and 3NF, prove that all the above nine tables are already in First Normal Form, Second Normal Form, and Third Normal Form. The customers and suppliers do not have to be real entities. Data for tables should back up the table notations.

While they are completing the test, you, as the database developer, go out to have a snack and a beer, to return after one hour.

The Near and Far Future

While you, the database developer, are out, you also consider what advice to give them if they all pass the test.

Also, while you were training them, and now that they are taking the test, customers have been coming and leaving without being served. That is not good for business, and you, the database developer, know that. Some customers may go to the competitor shops and never come back.

You, the database developer, are 30 years old. The proprietor, as your friend, is also 30 years old. The clerks (employees) are between the ages of 18 and 24. All the qualities they needed to work for the proprietor were: to be healthy, to be able to read and write, to be able to add, subtract, multiply, and divide, and to be able to use the computer and the Internet.

When a table is in 3NF, most vulnerabilities have been removed from the database. Many commercial databases do not go beyond 3NF, and the firms or companies are comfortable.

So, if all of them pass the test, you will ask the clerks to go and continue working. You will also advise them to save parts of their salaries so that they can own their convenience shops. You will continue tomorrow to train only the proprietor in 4NF and 5NF. With the knowledge of 4NF and 5NF, all known vulnerabilities are removed.

Evaluation

After one hour, you, the database developer, come back. You mark their scripts. A piece of excellent news! They all, including the proprietor, have 100% each. Hurray! That is excellent!

So congratulations to all of you: the teacher and the students.

There is nothing left to do in this tutorial other than to conclude.

Conclusion

A table is in First Normal Form, if it does not violate any of the following rules:

  1. All the columns in a table should have unique header names.
  2. Each cell must have only a single value.
  3. Values stored in a column should be of the same type.
  4. The rows should be distinct.
  5. The order of the columns or rows does not matter.

A table is in Second Normal Form, if it does not violate any of the following rules:

  1. The table must already be in First Normal Form.
  2. There must be no partial dependency.

A table is in Third Normal Form, if it does not violate any of the following rules:

  1. It must already be in the Second Normal Form.
  2. And it must not have Transitive Dependency.

You, the database developer, tell the clerks that they have learned enough. You provide advice and ask them to return to work and stay at their stations by default.

You set an appointment with the proprietor only, to take place in his office tomorrow for training on 4NF and 5NF.

About the author

Chrysanthus Forcha

Discoverer of mathematics Integration from First Principles and related series. Master’s Degree in Technical Education, specializing in Electronics and Computer Software. BSc Electronics. I also have knowledge and experience at the Master’s level in Computing and Telecommunications. Out of 20,000 writers, I was the 37th best writer at devarticles.com. I have been working in these fields for more than 10 years.