Normalization in DBMS: A Detailed Guide

As an SQL Developer, you often work with enormous amounts of data stored in different tables that are present inside multiple databases. It usually becomes strenuous to extract the data if it needs to be correctly organized. Using Normalization, you can solve the problem of data redundancy and organize the data using different forms. This tutorial will help you understand the concept of normalization in DBMS.

Advance Your Career with SQL

SQL Certification CourseExplore Program
Advance Your Career with SQL

What is Normalization in DBMS?

Normalization in a DBMS (database management system) eliminates data redundancy and enhances data integrity in the table. It also helps organize the data in the database. This multi-step process sets the data into tabular form and removes duplicate data from the relational tables.

Normalization organizes the columns and tables of a database to ensure that database integrity constraints properly execute their dependencies. It is a systematic technique of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like Insertion, Update, and Deletion anomalies.

Unlock the power of data with our SQL Certification Course! Learn essential SQL skills and gain a valuable certification to advance your career. Enroll now and take the first step towards becoming a data expert!

In 1970, Edgar F. Codd defined the First Normal Form.

Now, let's understand the types of Normal forms with the help of examples.normalizationinsql_1.

Normal Forms in DBMS

1. First Normal Form (1NF)

  • A table is referred to as being in its First Normal Form if the atomicity of the table is 1.
  • Atomicity states that a single cell cannot hold multiple values; it must hold only a single attribute.
  • The First Normal Form disallows the multi-valued attribute, composite attribute, and their combinations.

Now, you will understand the First Normal Form with the help of an example.

Below is a students’ record table that has information about the student roll number, student name, student course, and age of the student.

normalizationinsql_2.

In the student record table, the course column has two values. However, it does not follow the First Normal Form. You get the table below if you use the First Normal Form in the above table.

normalizationinsql_3.

By applying the First Normal Form, you achieve atomicity; every column has unique values.

Start Your Journey to Data Mastery

SQL Certification CourseExplore Program
Start Your Journey to Data Mastery

Before proceeding with the Second Normal Form, get familiar with Candidate Key and Super Key.

  • Candidate Key
A candidate key is a set of one or more columns that can identify a record uniquely in a table, and YOU can use each candidate key as a Primary Key.

Now, let’s use an example to understand this better.

normalizationinsql_13.

  • Super Key

A super key is a set of more than one key uniquely identifying a table record. The primary Key is a subset of the Super Key.

Let’s understand this with the help of an example.

normalizationinsql_14

2. Second Normal Form (2NF)

The first condition for the table to be in the Second Normal Form is that the table has to be in the First Normal Form. The table should not possess partial dependency. The partial dependency here means the proper subset of the candidate key should give a non-prime attribute.

Now, understand the Second Normal Form with the help of an example.

Consider the table Location:

normalizationinsql_4

The Location table possesses a composite primary key cust_id, storeid. The non-key attribute is store_location. In this case, store_location only depends on storeid, which is a part of the primary key. Hence, this table does not fulfill the second normal form.

To bring the table to the Second Normal Form, you need to split the table into two parts. This will give you the tables below:

normalizationinsql_5

normalizationinsql_6.

As you have removed the partial functional dependency from the location table, the column store_location entirely depends on the primary key of that table, storeid.

Now that you understand the 1st and 2nd Normal forms, you will look at the next part of this Normalization in the SQL tutorial.

3. Third Normal Form (3NF)

  • The first condition for a table to be in the Third Normal Form is that it should be in the Second Normal Form.
  • The second condition is that there should be no transitive dependency for non-prime attributes, which indicates that non-prime attributes (not part of the candidate key) should not depend on other non-prime attributes in a table. Therefore, a transitive dependency is a functional dependency in which A → C (A determines C) indirectly because of A → B and B → C (where it is not the case that B → A).
  • The Third Normal Form ensures the reduction of data duplication. It is also used to achieve data integrity.

Below is a student table that has student ID, student name, subject ID, subject name, and address of the student as its columns.

normalizationinsql_7

In the above student table, stu_id determines subid, and subid determines sub. Therefore, stu_id determines sub via subid. This implies that the table possesses a transitive functional dependency and does not fulfill the third normal form criteria.

Now, to change the table to the third normal form, you need to divide the table as shown below:

normalizationinsql_8.

normalizationinsql_9.

As you can see in both tables, all the non-key attributes are now fully functional, dependent only on the primary key. In the first table, the columns' names, subids, and addresses only depend on stu_id. In the second table, the sub only depends on the subid.

Ready to level up your career in data management? Our SQL Certification Course is your gateway to mastering essential database skills. Enroll today and join thousands of professionals who have elevated their careers with our comprehensive program!

4. Boyce CoddNormal Form (BCNF)

Boyce Codd Normal Form is also known as 3.5 NF. It is the superior version of 3NF and was developed by Raymond F. Boyce and Edgar F. Codd to tackle certain types of anomalies that were not resolved with 3NF.

The first condition for the table to be in Boyce Codd Normal Form is that the table should be in the third normal form. Secondly, every Right-Hand Side (RHS) attribute of the functional dependencies should depend on the super key of that particular table.

For example:
You have a functional dependency X → Y. In the particular functional dependency, X has to be part of the super key of the provided table.

Consider the subject table below:

normalizationinsql_10.

The subject table follows these conditions:

  • Each student can enroll in multiple subjects.
  • Multiple professors can teach a particular subject.
  • For each subject, it assigns a professor to the student.

In the above table, student_id and subject together form the primary key because by using student_id and subject, you can determine all the table columns.

Another important point is that one professor teaches only one subject, but one subject may have two professors.

This exhibits a dependency between the subject and the professor, i.e., the subject depends on the professor's name.

The table is in 1st Normal form as all the column names are unique, all values are atomic, and all the values stored in a particular column are of the same domain.

The table also satisfies the 2nd Normal Form, as there is no Partial Dependency.

There is no Transitive Dependency; hence, the table satisfies the 3rd Normal Form.

This table follows all the Normal forms except the Boyce Codd Normal Form.

As you can see, stuid and subject form the primary key, which means the subject attribute is a prime attribute.

However, there exists yet another dependency - professor → subject.

BCNF does not follow in the table as a subject is a prime attribute, and the professor is a non-prime attribute.

To transform the table into the BCNF, you will divide the table into two parts. One table will hold stuid, which already exists, and the second table will hold a newly created column profid.

normalizationinsql_11

The second table will have profid, subject, and professor columns, which satisfies the BCNF.

normalizationinsql_12.

Discover SQL and Database Management

SQL Certification CourseExplore Program
Discover SQL and Database Management

Importance of Normalization in Database Design

The importance of normalization in DBMS can be attributed to the following reasons:

  • Streamlines Data: It assists in simplifying and making databases concise. It also eases database handling, making navigation, editing, modification, extraction, sharing, and other tasks user-friendly.
  • Consistent Data: Removing disturbances in data leads to consistent and reliable results. It also helps eliminate outdated data, ensuring the database is ready.
  • Data Connectivity: Normalization facilitates clear relationships between database elements by retaining only relevant data. This compatibility allows seamless integration with other tools or software while resolving synchronization and storage challenges. Improved data quality boosts team efficiency.
  • Cost and Time Efficiency: Removing irrelevant data prevents the use of unnecessary storage space, reducing processing time and budget inefficiency.

What is Data Redundancy?

Data redundancy refers to the repeated presence of the same data, which takes up space by being stored in multiple places. The presence of such data in any database leads to clarity and better efficiency in data retrieval. It also leads to consistency, further resulting in reliable information. The presence of redundant data increases processing time, while the excess space requirement harms cost.

Steps Involved in Normalizing a Database

Let’s understand normalization in DBMS with examples. The following table contains multiple redundant data.

StudentID

StudentName

CourseID

CourseName

InstructorID

InstructorName

InstructorOffice

1

John

CS101, CS102

Computer Science, Data Structures

101, 102

Dr. Smith, Dr. Lee

Room 101, Room 102

2

Alice

CS101

Computer Science

101

Dr. Smith

Room 101

3

Bob

CS103

Databases

103

Dr. Brown

Room 103

The normalization shown here will be till 3NF, i.e., Third Normal Form. 

Step 1: First Normal Form (1NF)

This step will ensure: 

  • Presence of indivisible values in each column, i.e., single data
  • Unique value in each record
  • Absence of repeating groups or arrays

Hence, the table in 1NF will be: 

StudentID

StudentName

CourseID

CourseName

InstructorID

InstructorName

InstructorOffice

1

John

CS101

Computer Science

101

Dr. Smith

Room 101

1

John

CS102

Data Structures

102

Dr. Lee

Room 102

2

Alice

CS101

Computer Science

101

Dr. Smith

Room 101

3

Bob

CS103

Databases

103

Dr. Brown

Room 103

Step 2: Second Normal Form (2NF)

To achieve 2NF in DBMS, the table must lack partial dependencies. This means that the table data must depend on the primary key. In the current example, StudentID and CourseID are the primary keys. The columns like InstructorName and InstructorOffice vary according to CourseID and not on other values. So, separating the information accordingly, we get the following table: 

Student Course Table:

StudentID

StudentName

CourseID

CourseName

1

John

CS101

Computer Science

1

John

CS102

Data Structures

2

Alice

CS101

Computer Science

3

Bob

CS103

Databases

Course Instructor Table:

CourseID

InstructorID

InstructorName

InstructorOffice

CS101

101

Dr. Smith

Room 101

CS102

102

Dr. Lee

Room 102

CS103

103

Dr. Brown

Room 103

Step 3: Third Normal Form (3NF)

In 3NF, the table must:

  • Be in 2NF.
  • Remove transitive dependencies, where non-key attributes depend on other non-key attributes.

Here, InstructorOffice depends on InstructorID. To resolve this, we create a separate Instructors Table:

Courses Table:

CourseID

CourseName

CS101

Computer Science

CS102

Data Structures

CS103

Databases

Instructors Table:

InstructorID

InstructorName

InstructorOffice

101

Dr. Smith

Room 101

102

Dr. Lee

Room 102

103

Dr. Brown

Room 103

Course Instructor Table:

CourseID

InstructorID

CS101

101

CS102

102

CS103

103

Obtain Your SQL Certification

SQL Certification CourseExplore Program
Obtain Your SQL Certification

Advanced Normalization Techniques

Some of the advanced normalization techniques include following normal forms in DBMS:

Fourth Normal Form (4NF)

4NF ensures no non-trivial multivalued dependencies other than those involving a candidate key. It follows the preceding Normal forms. The 4NF strictly ensures the presence of only one multivalued dependency. For the database to be in 4NF, the key conditions to be satisfied include:

  • Presence of Boyce-Codd Normal Form (BCNF) in DBMS
  • Absence of multi-valued dependency

Fifth Normal Form (5NF)

This form is also known as the Project Join Normal Form. The database is to be in 5NF; it requires 4NF data without any join dependencies. It should also possess the lossless join property. 5NF eliminates redundancy caused by join dependencies and ensures lossless decomposition of tables into smaller relations.

Sixth Normal Form (6NF)

This form is focused on the independent handling of temporal data variations or time-varying databases. Here, the decomposition occurs up to the steps where no further breakdown is possible.

Benefits of Normalization

The normalization in DBMS offers the following benefits:

Reduces Size

Normalized data includes quantitatively more reliable and qualitatively significant information in the same space occupied by unnormalized data. The presence of quality data makes it easy to use, reduces retrieval processing time, offers up-to-date information for every member, and is cost-efficient.

Streamlines Data Update

With easily accessible data and clear distinctions, the data update process is also seamless. It becomes less time-consuming and less prone to errors as the data needs to be updated in only one place. The possibility of missing out on the key information or repetition of data is removed. It also ensures data accuracy and quick retrieval.

Efficient Database Design

The visually comprehensible data division results in an efficient and simple database design. It reflects user-friendly properties, making it easier to maintain and optimize further per the newer requirements and changes.

Enhanced Cross-Collaboration

The visually interpretable data contributes to smooth functionality among businesses in multiple sectors. It eases cross-collaboration among different teams and departments. Further, integration with different tools also provides effective data analysis.

Easy Scalability 

With the growing business needs, the scalability of data also becomes easier. The reduced data redundancy ensures easy locating of the relevant data and subsequent upgrade of the relevant tables. It goes without impact on other unrelated aspects of the database.

Enhances Security

Well-organized and normalized data also enhances security. In addition to easily identifying data loss, it allows the intentional addition of redundancy. This extra protection allows for effective, secure database management.

Discover SQL and Database Management

SQL Certification CourseExplore Program
Discover SQL and Database Management

Best Practices in Database Normalization

To ensure efficient normalization in DBMS, focus on the following points is necessary:

  • Sequence-wise proceeding during normalization to ensure accurate data division
  • Accurate identification of key attributes
  • Consideration of foreign keys for relating tables
  • Identifying the right requirement of level of normalization without ending up with over and under-normalization

Conclusion

You have seen and understood Normalization in SQL in its different forms in this tutorial. Now, you can organize the data in the database, remove data redundancy and promote data integrity. This tutorial also helps beginners understand the concept of normalization in SQL during their interview processes.

If you want to enhance your skills further, we highly recommend you check Simplilearn's SQL Certification Course. This course can help you hone the right skills and make you career-ready in no time.

FAQs

1. What are the five rules of data normalization?

The normalization in DBMS involves following five rules:  

  • First Normal Form: Involves atomicity and removes duplicate columns
  • Second Normal Form: Eliminates partial dependency
  • Third Normal Form: Removes transitive dependency
  • Boyce-Codd Normal Form: Continuation of the Third Normal Form ensures every determinant is a candidate key
  • Fourth Normal Form: Eliminates multi-valued dependencies

2. What happens if I skip normalization?

The lack of normalization can result in inaccuracy in data retrieval from the database, slow processing speed, excess storage taken up with irrelevant data, challenging data updating tasks and data anomalies.

3. How does normalization affect query performance?

Removing redundant data frees up useless storage space and organizes the available data. This leads to easy retrieval of information with increased speed and accuracy. Thus, normalization has a positive impact on query performance.

4. Is normalization always necessary in every database design?

Normalization ensures a direct data relationship with the primary key and eliminates data duplication, making your database time and cost-effective. These properties make normalization necessary.

5. How do I decide when to denormalize a database?

Data denormalization can be considered if you are dealing with ready-heavy applications or reporting systems, complex queries with multiple joins, and are fine with data redundancy.

About the Author

Haroon Ahamed KitthuHaroon Ahamed Kitthu

Haroon is the Senior Associate Director of Products at Simplilearn. bringing 10 years of expertise in product management and software development. He excels in building customer-focused, scalable products for startups and enterprises. His specialties include CRM, UI/UX and product strategy.

View More
  • Acknowledgement
  • PMP, PMI, PMBOK, CAPM, PgMP, PfMP, ACP, PBA, RMP, SP, OPM3 and the PMI ATP seal are the registered marks of the Project Management Institute, Inc.