Normalization in DBMS

Posted in

Normalization in DBMS
vinaykhatri

Vinay Khatri
Last updated on December 22, 2024

    Normalization is a method or technique applied to databases to divide the data into multiple tables and minimize data redundancy. In general, normalization is applicable to relational databases, such as SQL, because such databases store data in a tabular form. The main objective of normalization is to divide a big table of data into small tables so the repeated data could be reduced to some extent.

    Normalization in DBMS

    Before moving further in normalization, let’s discuss what is data redundancy and why we need to minimize it.

    What is data redundancy?

    Data redundancy refers to the repetition of data at multiple places in the same table. For example, let’s take a table from a database:

    S.No Name Department College
    1 Amber CSE X
    2 Bezos CSE X
    3 Carry CSE X
    4 Deny CSE X
    5 Emily CSE X
    6 Fizz CSE X

    Here the department and college for all the students are the same, and this repeated data is occupying unnecessary space in the memory. So, if we insert any new student data in this table the Department and College for them will also remain the same and this can cause many problems such as extra space, insertion anomaly, deletion anomaly, and update anomaly.

    How can Normalization solve this problem?

    The normalization will divide this table in such a way that we do not miss any data and we also get an optimal database form.

    S.No Name Department College
    1 Amber CSE X
    2 Bezos CSE X
    3 Carry CSE X
    4 Deny CSE X
    5 Emily CSE X
    6 Fizz CSE X

    Normalization

    1. <Student_detail>
    S.No Name Department
    1 Amber CSE
    2 Bezos CSE
    3 Carry CSE
    4 Deny CSE
    5 Emily CSE
    6 Fizz CSE
    2. <Department_Detail>
    Department College
    CSE X

    Here the normalization divides the table into two separate tables and from now we do not have to insert the redundant college data. one question that arises here is that even after the normalization why we are repeating the department data? Well, this is because there should be a link between the two divided tables that act as a bridge so the data could be fetched. That’s how normalization reduces data redundancy. Here we have used a small example that’s why we can only see the normalization can only reduce the college data but if we apply the normalization to a big set of databases, it can reduce the data redundancy at a high level.

    Types of Normalization

    Normalization can be achieved in many ways, and there are 5 types of Normalization that are as follows:

    • First Normal Form
    • Second Normal Form
    • Third Normal Form
    • Boyce-Codd Normal Form (BCNF) or 3.5 Normal Form
    • Fourth Normal Form

    1. First Normal Form

    The first normal form is the first step in the normalization process, and it expects from you to create such a database table so it could be normalized easily. If your table does not follow the condition of the First Normal Form, it would be considered a bad database table design, and no other Normal Forms are applicable to it. Conditions of First Normal Form

    • Each cell of the table must have a single value, which means each cell must have an atomic value.
    • Each column should contain values that are of the same data type.
    • The heading of each column must have a unique name.

    Your database table must follow all the above conditions to fall under the First Normal form, and then only the table would be able to normalize. Example:

    Not in 1 st Normal Form In 1 st Normal Form
    Roll No Name Language
    1 Amber Java, Python
    2 Bezos Java
    3 Carry Python, C++
    There is more than one value in Single-cell.
    Roll No Name Language
    1 Amber Java
    1 Amber Python
    2 Bezos Java
    3 Carry Python
    3 Carry C++
    Every Cell Has an atomic value.

    2. Second Normal Form

    A table to be in the second Normal Form should satisfy the following conditions:

    • The table should follow the conditions of the First Normal Form.
    • There should be no partial dependencies.

    What is Partial Dependency?

    A primary key could be generated from a single column or a composite of two or more columns. If the primary key of the table is formed from the composition of two columns and a non-key attribute of the table is dependent on the value of only one column of the primary key, it is known as partial dependency. Let’s try to understand it with the following example:

    Employee_id Employee_name Employee_project_no Project_location
    E01 Sam 1244 Outside
    E02 Will 1254 Inside
    E03 Jack 6223 Inside
    E04 Hijo 1244 Outside

    Primary key = Employee_id + Employee_project_no Here, we can find the Employee_name using Employe_id, and Employee_project_no has no significance to find the Employee_name, which means the Employee_name is only dependent on Employee_id. Similarly, the Project_location is only dependent on the Employee_project_no. So, to be in the second normal form, the partial dependency should be eliminated. To eliminate the partial dependency, we should divide the table into two parts:

    1. <Employee Detail>

    Employee_id Employee_name Employee_project_no
    E01 Sam 1244
    E02 Will 1254
    E03 Jack 6223
    E04 Hijo 1244

    Now Primary key= Employee_id

    2. <Project Detail>

    Employee_project_no Project_location
    1244 Outside
    1254 Inside
    6223 Inside

    Now Primary key= Employee_project_no

    3. Third Normal Form

    A table is considered in the Third Normal Form if it meets the following conditions:

    • The table should be in the Second Normal Form.
    • It should not have Transitive Dependency.

    What is Transitive Dependency?

    If any non-prime attribute (not a part of the primary key) of the table depends on another non-prime attribute rather than the prime attribute, it will be termed Transitive Dependency. Let’s understand it with an example:

    Student_Rollno Student_name Marks Subjects Total_Marks
    1 Jack 98 Computer 196
    1 Jack 99 Math 198
    2 Sam 78 Math 500

    Here primary key = Student_ID + Student_name Here, Marks represent the marks of one semester and Total_Marks represent the total overall marks scored by the student throughout the year. The Total_marks, which is a non-prime attribute, depends on the Subjects (non-prime attribute) rather than Student_ID or Student_name. This is a case of Transitive Dependency , and to remove this dependency, we need to divide the table into two parts.

    1. <Student Detail>

    Student_Rollno Student_name Marks Subjects
    1 Jack 98 Computer
    1 Jack 99 Math
    2 Sam 78 Math

    Prime-attributes= Student_ID + Student_name

    2. <Exam Table>

    Subjects Total_Marks
    Computer 196
    Math 198
    Math 500

    Prime-attributes= Subjects

    4. Boyce-Codd Normal Form (BCNF)

    It is an extension of the Third Normal Form, and that’s why it also is known as the 3.5 Normal form. For a table to be in BCNF, it should satisfy the following two conditions:

    • The table should be in the Third Normal form.
    • For any dependency, A ---> B, and A should be a super key.

    The main objective of a prime attribute is to let us access the non-prime attributes, but what if using a non-prime attribute, we can access a prime attribute. If this happens, it would violate the second condition of the BCNF.

    5. Fourth Normal Form

    For a table to be in Fourth Normal Form, it should satisfy the following two conditions:

    • The table should follow all the conditions of BCNF.
    • It should not have Multivalued Dependency.

    What is Multivalued Dependency?

    If these three conditions arise in a table, it would be considered a multi-valued dependency:

    • A ----> B, for a single value of A. If we have more than one value of B, where A is a prime-key attribute and B is a non-prime-key attribute.
    • The table should have at least three columns.
    • If the table has 3 columns such as A, B, and C, where A----> B so B and C should be independent.

    If all the above three conditions are true in a table, there is a multivalued dependency in that table and that table does not satisfy the Fourth Normal Form.

    Conclusion

    So this is all about normalization in DBMS, which helps to eliminate or minimize data redundancy. The Normalization forms that we discussed above are used to design the structure for the database so that we could have an optimal database with minimum data redundancy. Though creating a database that follows all these forms is a challenging task for a Database Manager.

    People are also reading:

    Leave a Comment on this Post

    0 Comments