SQL Standard

First Normal Form: The Five Normal Forms (Part 1)

Normalization is a database design approach to systematically decompose the tables in order to remove the data redundancy and prevent anomalies that can accompany the insertion, update, and deletion of data. There is the First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and Fifth Normal Form (5NF). The Sixth Normal Form is still being developed (still in research state) and will not be discussed in this series. Boyce-Codd Normal Form is like the 3½ (3.5) Normal Form. A relational database consists of related tables, which should have been normalized.

This article (tutorial) explains about the First Normal Form. This is the first part of the series, The Five Normal Forms. The example database used is that of a convenience shop. In Britain, a convenience shop is called a corner shop. In some countries, it is called a provision shop.

Un-normalized Transaction Table

A transaction is the buying of products by a customer, from the shop, or receiving products by the shop from the suppliers, for sale. It is assumed in this part of the series (in this tutorial) that the clerk who records the transactions is not trained for such recording. There may be more than one clerk who know nothing about normalization. If there are more than one clerk, all the clerks record in the same exercise book and none of them knows anything about normalization. The proprietor (shop owner) also knows nothing about normalization. His father has just died and left him some money, and he decided to invest in a convenience shop. So, the clerk (or clerks) start(s) with one table to record all the transactions. The table has the following column headings where Trans means Transaction:

    Trans (Product, Category, Customer, Supplier, Employee, Price)

Trans refers to a row in the table. The table is a transaction table. The examples of products sold in the shop are: sweets, chocolates, sprite, coca-cola, fanta, pepsi, milk, yogurt, cheese, toothpastes, tissues, diapers, candy, chips, and nuts. Any of these names can appear in the Product column.

These products fall into categories which are: Confectionery, Soft Drink, Dairy Product, Toiletries, and Snack Food.

The Customer column has the names of the persons who buy from the shop. The Supplier column has the names of the Companies that supply the products to be sold by the shop. The Employee column has the name of the clerk that sells a product or receives a product from a supplier and records it. The Action column indicates whether a sale was made to a customer or an order was made (purchased) from a supplier.

Each Price is the total price of the products sold or bought in one transaction.

And so, the first rows of the table are:

Before continuing, know that table columns in a database table are also called attributes. The rows with values can be referred to as objects.

Now, notice that in the Price column, the currency is not indicated. Whether the currency is dollars or euros or some other currency, it does not have to be indicated. However, it is indicated in some comment area which is not part of the table.

From the table: There are three clerks so far, excluding the proprietor, and one of them has carried out the transaction twice; There are three customers so far, and one of them has bought twice; There are four supplier companies so far, and each is concerned with one transaction.

Redundancy:

Note that “John Smith” appears as value in two rows in the same column. Repetition of the same value in the same column is redundancy which  can lead to accounting problems.

First Normal Form Rules

For a table to be in the First Normal Form, the following rules must be respected. Otherwise, there would be accounting problems:

1) All the columns in a table should have unique header names.

2) Each cell must have only a single value.

3) The values stored in a column should be of the same type.

4) The rows should be distinct.

5) The order of the columns or order of rows do not matter.

You, the reader, who finally completes this series and knows everything about normal forms, have become a database developer. The proprietor of the convenience shop is your friend. You decided to visit your friend’s business site (shop).

You look at the previous table that they produced. You reflect for some time, then shake your head, saying, “hm-hm-hm”. You then tell your friend, the proprietor, and his workers, that not all the rules of the First Normal Form have been respected; and that would lead to accounting errors.

You then set out to teach them the rules of the First Normal Form, examining the given five rules as they are related to the previous table, one-by-one.

Rule 1: All the columns in a table should have unique header names.

Do all the columns of the previous table have unique names? Yes, and that is alright. The members of the staff are not unintelligent, after all. They simply have not been educated on normal forms. Rule 1 has not been violated and that does not pose a problem.

Rule 2: Each cell must have only a single value.

A cell is the intersection of a row and a column. Looking at the last attribute (column) with the heading, Price, there is a number in each cell and there is only one number per cell. That is alright. Each cell there is single valued.

Looking at the employee column, each cell has one name of a person. One name consists of the first and last name, forming one string (text). That is alright. Each cell in the column has one string and so, each cell has one value.

The cells for the Customer and Supplier columns has a single string each and so, a single value. That is alright. In the Customer column, “John Smith” is in two different cells (rows). That is redundancy (repetition) and its problem will be discussed later in the tutorial for 2NF.

In the Category column, each cell should have one category. One category text (one value) is one string. However, in the second cell from the top, there are two categories (two strings): “Soft Drink” and “Dairy Product” separated by a comma. These are two values in one cell. The rest of the cells in the Category column have single values, but this cell has two. The rule that each cell must have only a single value, has been violated here in the second cell. This would lead to accounting problems. Since there are two values in one cell, the solution is to split the row of data into correspondingly two rows, putting the two values in two cells, two rows, in the same column as follows:

The two categories which were in one cell, have been put in two cells in the same column. However, the other cells have been repeated in their columns, introducing more redundancy. In the course of solving one problem, repetitions (redundancy) were added, adding another problem. The solution to repetitions will be discussed in the next tutorial on the Second Normal Form.

Looking at the first column on Product, the first cell has two values: “sweets” and “chocolates” separated by a comma. These are also two products. The solution is to split the row into two, putting the two values in two different cells in the same column and unfortunately also doubling the other cells.

The second cell of the same product column has seven values. Like the first cell, this violates the rule: Each cell must have only a single value. The solution is to replace the row with seven new rows, with each of these values in its own cell in the same column and repetitions of the other cells in their own columns. The new repetitions increase the redundancy.

Out of the seven products, four fall in the Soft Drink category and three fall in the Dairy Product category. The four products (sprite, coca-cola, fanta and pepsi) are each associated with the category, Soft Drink. The products (milk, yogurt and cheese) are each associated with the category, Dairy Product.

The new table becomes:

The rule that each cell must have only a single value is now satisfied. There is no cell with more than one value in the table now. This solution came at a cost. More cells became repeated at their columns, adding more repetitions to the given double (original) repetition of “John Smith”. Putting a table in the second normal form solves the problem of repetition (redundancy) from the First Normal Form.

Note: Repetition in the Price column is not considered as a problem and is not taken into consideration when producing the 2NF. Also note that in this resulting table, some of the given prices have been divided and is considered appropriate.

Rule 3: The values stored in a column should be of the same type.

The last attribute, the Price column has only numbers. So, all the values in that column are of the same type called numbers (precisely, integers). Each of the other columns has values of type string. The string type is called text in some database table codes. And so, the rule that the values stored in a column should be of the same type is satisfied without any change to what the staff had done. They are not unintelligent after all.

Rule 4: The rows should be distinct.

In the resulting previous table, no two rows are the same. Some cells occur in more than one row in the same column, but no two rows have the same cells in the same order. And so, the rule that the rows should be distinct is satisfied without any change to what the staff had done. They are not unintelligent after all.

Rule 5: The order of the columns or order of rows do not matter.

Any column can be the first or last column. Any row can be the first or last row. And so, the rule that the order of the columns or order of the rows do not matter is satisfied without any change to what the staff had done. They are not unintelligent after all.

The table is now in First Normal Form because the five rules are satisfied.

Out of the five rules to have a table in First Normal Form, the staff, including their proprietor, violated only one out of ignorance which is not of their fault. You, the web developer training them, has to congratulate them. However, you have to insist that the rule that each cell must have only a single value is the most important rule to put a table in the First Normal Form.

Well, that is not all. You do not leave the people that you are training just yet. The table has to be given a key (a primary key). In other words, each row has to be identifiable by one or more cell values of that row. If that is not possible, a new column has to be created on the left side of the table to be called the key column.

Primary Key

Is there any column in the previous table where no cell can be empty or have a null value, and all the values will always be unique? There is none. If there was one, then it would be made the primary key. The reader might think that the Product column has unique values but this would not be the case all the time. For example, the customer “John Smith” or a new customer can still come and buy sweets in the future, as a new transaction (row). Then, there would be two string values of sweets in two cells in the Product column.

Are there two or more columns in the resulting previous table whose combination of row cells are unique and cannot have any repetition in the following table? If yes, the group of columns will form the primary key. As it happens, there is no such group; unless all the columns of the table are to be considered as the primary key since no row can repeat. There has to be a primary key and there has to be non-key columns in a normalized table.

Since one or more columns cannot be the primary key of the resulting previous table, a new column has to be added on the left end of the table to be the primary key. In such a primary key column, no cell is empty and no cell has a null value. All the values in the cell have to be unique. The easiest way to have such values for the primary key column is to have numbers from 1 upwards. That is, for the first row, the value of the primary key would be 1. For the second row, the value for the primary key would be 2. For the third row, the value for the primary key would be 3, and so on. When a new transaction row is added at the bottom, the new primary key value is one plus the one just above it. So, the complete table for the First Normal Form is as follows:

The primary key column is called TransID for “Transaction ID”. ID means Identifier. Such a primary key is auto-increment type which is the same as the auto-number. A new row will have a primary ID of 12. This table is a transaction table. It is the only table that the shop has so far.

That completes the table design for the First Normal Form. However, you as the the web developer, do not go away from the people that you are training just yet. You still have to explain to the staff (the trainees) the meaning of Functional Dependency. With that, they will be able to handle the issue of primary key as well as the five rules that you taught them in your absence.

Functional Dependency

The idea of Functional Dependency or simply Dependency is that once the primary key of a row is known, any other value in that row can be fetched. So, the primary key is the dependent value. It depends on the rest of the other values in the row. For example, if the TransID of 3 is given for the previous table, the corresponding product is Sprite. The corresponding category is “Soft Drink”. The corresponding customer is “John Smith”. The corresponding supplier is “Drinking Company”. The corresponding employee (clerk) is “Mary Baker”. The corresponding action is “Sales”. And the amount (price) that the customer paid is $12 ($ is not indicated in the table).

At this point, you as the database developer can leave. However, before you go, you summarize what you taught them.

Conclusion

A table in the First Normal Form must not violate any of the following rules:

1) All the columns in a table should have unique header names.

2) Each cell must have only a single value.

3) The values stored in a column should be of the same type.

4) The rows should be distinct.

5) The order of the columns or rows do not matter.

Any table in any normal form should have a primary key.

The idea of the Functional Dependency or simply Dependency is that once the primary key of a row is known, any other value in that row can be fetched.

At this point, you as the database developer can take your leave. However, you will return to teach them about the Second Normal Form because a table in 1NF may still have some vulnerabilities.

About the author

Chrysanthus Forcha

Discoverer of mathematics Integration from First Principles and related series. Master’s Degree in Technical Education, specializing in Electronics and Computer Software. BSc Electronics. I also have knowledge and experience at the Master’s level in Computing and Telecommunications. Out of 20,000 writers, I was the 37th best writer at devarticles.com. I have been working in these fields for more than 10 years.