Category: 06. Normalization in Database Designing

https://cdn3d.iconscout.com/3d/premium/thumb/data-trends-3d-icon-png-download-12809862.png

  • Difference Between 4NF and 5NF

    Normalization is an essential part of designing efficient and reliable databases. Most discussions focus on the first three normal forms and Boyce-Codd Normal Form (BCNF), however the journey does not stop there. The Fourth Normal Form (4NF) and the Fifth Normal Form (5NF) take normalization to even higher levels. 4NF and 5NF address specific types of dependencies like multivalued and join dependencies. Read this chapter to understand these advanced normal forms in detail.

    What is Fourth Normal Form (4NF)?

    The fourth normal form helps in eliminating multivalued dependencies from a database table. A multivalued dependency occurs when two attributes are independent of each other but depend on a third attribute. For instance, if one value in a table implies multiple rows, then there may be redundancy caused by multivalued dependencies.

    Rules of 4NF

    For a table to satisfy 4NF −

    • It must already be in Boyce-Codd Normal Form (BCNF).
    • The table must not have more than one multivalued dependency.

    What is Multivalued Dependency?

    If A →→ B (A multi-determines B), this means for a single value of A, there can be multiple values of B. But B is independent of any other attribute.

    Example: Multivalued Dependency in Action

    Consider a table that stores information about a Person, their Mobile Numbers, and their Food Preferences −

    PersonMobileFood_Likes
    Mahesh9893Burger
    Mahesh9424Pizza
    Ramesh9191Pizza

    Here −

    • Person →→ Mobile
    • Person →→ Food_Likes

    Both Mobile and Food_Likes are independent of each other, however they depend on Person. This results in redundancy because the same Person is repeated unnecessarily for each Mobile and Food_Likes.

    Normalizing a Table to 4NF

    To bring this table into 4NF, we can separate the dependencies into two tables −

    Table 1 − Person-Mobile

    PersonMobile
    Mahesh9893
    Mahesh9424
    Ramesh9191

    Table 2 − Person-Food_Likes

    PersonFood_Likes
    MaheshBurger
    MaheshPizza
    RameshPizza

    By decomposing the table, we eliminate redundancy. And each table is now in 4NF.

    What is Fifth Normal Form (5NF)?

    After 4NF, we must understand the concept of Fifth Normal Form, which is also known as the Projected Normal Form (PJNF). It addresses the join dependencies.

    join dependency exists when a table can be split into two or more tables. And the original table can be reconstructed by joining them without any data loss. In 5NF, every join dependency must be implied by the table’s candidate keys.

    Rules of 5NF

    For a table to satisfy 5NF −

    • It must already be in 4NF.
    • It must not contain any join dependency that cannot be implied by its candidate keys.

    What is Join Dependency?

    join dependency occurs when a table can be decomposed into smaller tables, but joining those smaller tables recreates the original table without any data loss or spurious rows.

    Example: Join Dependency in Action

    Let us consider a table that stores the data on Agents, the Companies they work with, and the Products they sell.

    AgentCompanyProduct
    A1PQRNut
    A1PQRBolt
    A1XYZNut
    A1XYZBolt
    A2PQRNut

    Here, if an Agent works with a Company and the Company sells a particular Product, then the Agent is assumed to sell that product.

    Normalizing a Table to 5NF

    To eliminate the join dependency, we decompose the table into three smaller tables −

    Table 1 − Agent-Company

    AgentCompany
    A1PQR
    A1XYZ
    A2PQR

    Table 2 − Company-Product

    CompanyProduct
    PQRNut
    PQRBolt
    XYZNut
    XYZBolt

    Table 3 − Agent-Product

    AgentProduct
    A1Nut
    A1Bolt
    A2Nut

    When these three tables are again joined back using their shared attributes (Company and Product), the original table is reconstructed without any spurious data.

    Comparing 4NF and 5NF

    The following table highlights how 4NF differs from 5NF −

    Aspect4NF5NF
    Dependency TypeMultivalued DependencyJoin Dependency
    PurposeEliminates redundancy from multivalued dependencies.Ensures lossless decomposition of join dependencies.
    Example ScenarioPerson with multiple phones and food preferences.Agent, company, and product relationships.

    Practical Considerations in Using 4NF and 5NF

    Although 4NF and 5NF provide the highest levels of normalization, they are used quite rarely. 4NF and 5NF are not always necessary for every database. because −

    • Complexity − Splitting the tables into smaller parts can make the database structure more complicated.
    • Performance − Highly normalized databases might require more joins, which may slow down the query performance.
    • Use Cases − Applications with simpler data relationships often do not need to go beyond BCNF or 4NF.

    However, when it is paramount to maintain accuracy and avoid any sort of data redundancy (while maintaining financial or scientific databases, for example), it becomes a necessity to normalize the tables to 4NF and 5NF.

  • Boyce-Codd Normal Form (BCNF) in DBMS

    The Third Normal Form (3NF) eliminates many redundancy issues, but still there are cases where 3NF is not strict enough. In such cases, we have to apply the Boyce-Codd Normal Form (BCNF), which is also known as 3.5NF.

    BCNF is a more restrictive version of 3NF that addresses potential anomalies. Read this chapter to understand the basic concepts of BCNF, its rules, and how to apply it in practice.

    What is Boyce-Codd Normal Form?

    The Boyce-Codd Normal Form (BCNF) is a special case of 3NF. As we know in 3NF, it allows some flexibility with non-prime attributes and functional dependencies. The BCNF tightens the rules. It ensures that every functional dependency in a table adheres to stricter conditions.

    Rules of Boyce-Codd Normal Form

    For a table to be in BCNF they must follow the following rules −

    • The table must be in 3NF.
    • For every functional dependency, the left-hand side (LHS) must be a candidate key or super key.

    In 3NF we know that there will be no transitive dependency. Where the RHS can be a prime attribute to satisfy the rule. In BCNF it demands that the LHS always includes a candidate key. This eliminates any chance of redundancy caused by improper dependencies.

    Understanding BCNF with an Example

    Let us consider the following Student table −

    Roll NumberNameVoter IDAge
    1JohnV00120
    2AliceV00222
    3BobV00321

    The table has the following candidate keys and functional dependencies −

    • Candidate Keys − Roll Number and Voter ID (each uniquely identifies a row).
    • Functional Dependencies −
      • Roll Number → Name
      • Roll Number → Voter ID
      • Voter ID → Age
      • Voter ID → Roll Number

    Checking Each Dependency against BCNF

    Now let us analyze each functional dependency step by step and check whether the table is BCNF compliant.

    Roll Number → Name −

    • LHS (Roll Number) is a candidate key.
    • This dependency is valid under BCNF.

    Roll Number → Voter ID −

    • Again, the LHS is a candidate key, so it satisfies BCNF.

    Voter ID → Age −

    • LHS (Voter ID) is also a candidate key.
    • This is valid under BCNF.

    Voter ID → Roll Number −

    • Here, the LHS is a candidate key (Voter ID), making it compliant with BCNF.

    Since all the functional dependencies have a candidate key or super key on the LHS, this table is in BCNF.

    Importance of Boyce-Codd Normal Form

    BCNF eliminates anomalies that might persist even in 3NF. For instance −

    • Redundancy − A non-prime attribute depending on something other than the candidate key can lead to duplicate data.
    • Update Anomalies − Incorrect updates to redundant data might create inconsistencies.
    • Deletion Anomalies − Removing a record could unintentionally delete important relationships.

    BCNF Compared to Other Normal Forms

    Let us compare BCNF with the normal forms that we have discussed so far −

    First Normal Form (1NF)Ensures no multi-valued attributes.Every cell must contain atomic values.
    Second Normal Form (2NF)Builds on 1NF.Eliminates partial dependencies.
    Third Normal Form (3NF)Builds on 2NF.Eliminates transitive dependencies.Allows non-prime attributes on the RHS if the LHS is a candidate key or super key.
    Boyce-Codd Normal Form (BCNF)Builds on 3NF.No exceptions: the LHS of every functional dependency must be a candidate key or super key.

    Relationship between Normal Forms

    The following diagram shows how the different Normal Forms are related −

    Boyce-Codd Normal Form1

    BCNF is the most restrictive form, sitting inside 3NF. 3NF sits inside 2NF and 2NF sits inside 1NF. This hierarchy means every table in BCNF is also in 3NF. However, not every table in 3NF is necessarily in BCNF.

    Another Example: BCNF in Practice

    Consider a relation {A, B, C, D} with these functional dependencies −

    • AB → C
    • C → D

    Step 1: Identify the Candidate Keys

    • From AB → C, we see AB is a candidate key because it determines C.
    • From C → D, we can derive AB → D via transitivity, so AB determines all attributes.

    Step 2: Check Each Dependency

    AB → C −

    • LHS is a candidate key.
    • Satisfies BCNF.

    C → D −

    • LHS (C) is not a candidate key.
    • Violates BCNF because C is a non-prime attribute.

    Step 3: Split the Table

    To satisfy BCNF, we split the table into two −

    Table 1 − {AB → C}

    Boyce-Codd Normal Form2

    Table 2 − {C → D}

    Boyce-Codd Normal Form3

    BCNF – Key Points to Note

    Make a note of the following key points on BCNF –

    • Stricter than 3NF − BCNF removes any dependency where the LHS is not a candidate key.
    • No Exceptions − While 3NF allows non-prime attributes on the RHS, BCNF does not tolerate them unless the LHS is a candidate key.
    • Requires Splitting − Many tables in 3NF need to be split further to achieve BCNF.
  • Third Normal Form (3NF) in DBMS

    Once a table meets the requirements of Second Normal Form (2NF), the next step is to convert it into Third Normal Form (3NF). This is used to eliminate transitive dependency. Read this chapter to get a clear understanding of 3NF and transitive dependency.

    What is Third Normal Form (3NF)?

    In relational database, the third normal form has two certain conditions −

    • The table is already in Second Normal Form (2NF).
    • The table does not have any transitive dependency.

    What is Transitive Dependency?

    A transitive dependency occurs when a non-prime attribute depends on another non-prime attribute rather than directly depending on a candidate key.

    To put it simply, in a transitive dependency, one non-prime attribute is indirectly connected to the candidate key through another non-prime attribute. This will may make unnecessary redundancy and inconsistency in the data.

    To understand this better, let us elaborate some of the important terms −

    • Candidate Key − A minimal set of attributes that uniquely identify each row in the table.
    • Prime Attribute − An attribute that is part of a candidate key.
    • Non-Prime Attribute − An attribute that is not part of any candidate key.

    Example of Transitive Dependency

    Consider the Student table with the following data

    Roll NumberStateCity
    1PunjabMohali
    2PunjabLudhiana
    3KarnatakaBangalore
    4MaharashtraMumbai

    The Functional Dependencies are −

    • Roll Number → State
    • State → City

    In this example, the candidate key is the Roll Number because it uniquely identifies each row. State and City are non-prime attributes.

    Third Normal Form1

    The Roll Number determines the State. And, the State determines the City. It creates a transitive dependency. Here, the Roll Number indirectly determines the City through the State.

    While “Roll Number → State” is valid, this dependency “State → City” violates 3NF because it is a relationship between two non-prime attributes.

    How to Eliminate Transitive Dependency?

    To remove transitive dependency, we need to split the table into smaller tables or decomposition.

    Step 1: Divide the Table

    We create two separate tables: one table links the candidate key to the first non-prime attribute (State), and the other table links the first non-prime attribute (State) to the second non-prime attribute (City).

    Third Normal Form2

    Step 2: Identify the Keys

    In the Student Table, the candidate key remains Roll Number. The State is a non-prime attribute. In the State Table, the primary key is State, but we can see there are duplicate States. So, we can add another key attribute called state id, and that can be used inside the Student table as well.

    By splitting the table, we can ensure that all non-prime attributes directly depend on the candidate key or are part of a separate table where they follow the same rule.

    Another Example of 3NF: Table and Functional Dependencies

    Consider a relation {A, B, C, D} with the following dependencies −

    • AB → C
    • C → D

    Step 1: Identify the Candidate Key

    To find the candidate key −

    • AB → C means AB determines C.
    • C → D means C determines D.

    From AB, we can easily determine all the attributes: {AB → C → D}. So, the candidate key is {AB}.

    Step 2: Classify the Attributes

    Next, let’s classify the attributes −

    • Prime Attributes − A, B (since they form the candidate key).
    • Non-Prime Attributes − C, D.

    Step 3: Check for Transitive Dependency

    • AB → C: This is valid because the candidate key determines a non-prime attribute.
    • C → D: This creates a transitive dependency since:
    • D is a non-prime attribute.
    • D is determined by another non-prime attribute (C).

    Step 4: Eliminate Transitive Dependency

    Split the table into two −

    Third Normal Form3

    Now,

    • In the Main Table, AB is the candidate key and C is a non-prime attribute directly dependent on it.
    • In the Derived Table, C is the candidate key and D depends directly on it.

    How to Check for 3NF Compliance

    To determine if a table is in 3NF, follow the steps give below –

    • Ensure the Table is in 2NF − No partial dependency should exist.
    • Check Functional Dependencies − For each dependency, ensure that either:
      • The left-hand side (LHS) is a candidate key or super key, or
      • The right-hand side (RHS) is a prime attribute.

    A quick rule of thumb − if a non-prime attribute depends on another non-prime attribute, it is a transitive dependency. This is violating 3NF.

    Practical Steps to Achieve 3NF

    Follow the steps given below to turn a table into its Third Normal Form –

    • Find the Candidate Keys − Use closure methods to identify all the candidate keys.
    • Classify the Attributes − Divide the attributes into prime and non-prime categories.
    • Check the Dependencies − If the LHS of a functional dependency is not a candidate key or super key, check if the RHS is a prime attribute.
    • Eliminate the Transitive Dependencies − Split the table into smaller ones if necessary.
  • Second Normal Form (2NF) in DBMS

    Normal Forms ensure that the data in the tables remain structured and efficient. After achieving First Normal Form (1NF), the next step in normalization is the Second Normal Form (2NF) that helps eliminate certain types of redundancy by addressing partial dependency. Read this chapter to learn in detail what is 2NF and how it is applied.

    What is Second Normal Form (2NF)?

    The rules for 2NF are straightforward. According to E. F. Codd, the father of relational databases,

    • The table must already be in First Normal Form (1NF).
    • There should be no partial dependency in the table.

    Let’s break down these rules further −

    • 1NF Requirement − It means the table should not have multi-valued attributes. Each cell should have one and only one value.
    • No Partial Dependency − A partial dependency occurs when a non-prime attribute depends on only a part of a composite candidate key; not only on the entire key.

    To understand this better, let us elaborate some of the important terms −

    • Candidate Key − A minimal set of attributes that can uniquely identify each row.
    • Prime Attributes − Attributes that are part of a candidate key.
    • Non-Prime Attributes − Attributes that are not part of any candidate key.

    What is Partial Dependency?

    partial dependency exists when a non-prime attribute is dependent on just a part of a composite candidate key. It creates redundancy and anomalies in the database.

    Example of Partial Dependency

    Let’s consider the following table −

    Customer IDStore IDLocation
    11Delhi
    21Delhi
    32Bangalore
    43Mumbai

    In this case,

    • Candidate Key − The combination of Customer ID and Store ID uniquely identifies each row.
    • Prime Attributes − Customer ID, Store ID
    • Non-Prime Attribute − Location

    If we look closely, the attribute “Location” is dependent only on “Store ID”. For example, wherever Store ID is 1, the location is Delhi. It violates the second rule of 2NF because Location does not depend on the entire composite key (Customer ID, Store ID).

    How to Convert a Table to 2NF?

    Whenever a table violates 2NF, the solution is to split it into smaller tables. We call this process “Decomposition“. Let us see how it works with the above example.

    Step 1: Divide the Table

    We break the table into two smaller tables −

    • One table stores the composite key along with its prime attributes.
    • Another table stores the partial dependency.
    Second Normal Form1

    Step 2: Identify the Keys

    • In the Customer Table, the candidate key remains Customer ID, Store ID.
    • In the Store Table, the primary key is Store ID.

    Now, the non-prime attribute Location depends fully on the candidate key of its table (Store ID). This is ensuring that both tables are in 2NF.

    General Steps for Achieving 2NF

    Given below are the general steps to make any table comply with 2NF −

    • Identify the Candidate Key − Use functional dependencies to find all possible candidate keys.
    • Classify the Attributes − Prime attributes are part of the candidate key. Non-prime attributes are everything else.
    • Check for Partial Dependencies − If any non-prime attribute depends on only part of a composite candidate key. It is a partial dependency.
    • Split the Table − Create separate tables to resolve partial dependencies.

    Another Example: Functional Dependency

    Let us explore a more complex example with functional dependencies.

    Table and Functional Dependencies

    We have a relation: {A, B, C, D, E, F} with these dependencies −

    • C → F
    • E → A
    • E, C → D
    • A → B

    Step 1: Identify the Candidate Key

    To find the candidate key, we must look at the right-hand side (RHS) of the functional dependencies: {F, A, D, B}. Attributes not on the RHS (E and C) must be part of the candidate key.

    Start with E, C −

    • E → A
    • C → F
    • E, C → D
    • A → B

    Thus, the closure of E, C covers all attributes: {E, C, A, F, D, B}. So, the candidate key is {E, C}.

    Step 2: Prime and Non-Prime Attributes

    • Prime Attributes − E, C.
    • Non-Prime Attributes − A, B, D, F.

    Step 3: Check for Partial Dependencies

    partial dependency occurs when −

    • The left-hand side (LHS) of a functional dependency is a proper subset of the candidate key.
    • The RHS is a non-prime attribute.

    In our example,

    • C → F: C is a proper subset of {E, C}, and F is non-prime. Partial dependency exists.
    • E → A: E is a proper subset of {E, C}, and A is non-prime. Partial dependency exists.

    Step 4: Split the Table

    In this step, to eliminate partial dependencies, we divide the table into smaller tables −

    Second Normal Form2

    The derived tables are given below –

    Second Normal Form3

    Each table satisfies 2NF because all non-prime attributes fully depend on the candidate key of their respective tables.

    Key Concepts in 2NF

    Following are the key concepts in 2NF –

    • Prime Attribute − Part of a candidate key.
    • Non-Prime Attribute − Not part of any candidate key.
    • Partial Dependency − A non-prime attribute depends on part of a composite candidate key.
    • Full Functional Dependency − A non-prime attribute depends on the entire candidate key.

    Conclusion

    In this chapter, we explained in detail the concept of Second Normal Form (2NF) and how to address partial dependency in relational databases. We started with the basic rules of 2NF, highlighting the importance of achieving 1NF first.

    Through detailed examples, we explored the concept of partial dependency and demonstrated how to split tables into smaller ones to remove redundancy. By ensuring 2NF, we make the databases more efficient by reducing the data redundancy and setting the stage for higher levels of normalization.

  • Normalization

    Functional Dependency

    Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional dependency says that if two tuples have same values for attributes A1, A2,…, An, then those two tuples must have to have same values for attributes B1, B2, …, Bn.

    Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.

    Armstrong’s Axioms

    If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong’s Axioms are a set of rules, that when applied repeatedly, generates a closure of functional dependencies.

    • Reflexive rule − If alpha is a set of attributes and beta is_subset_of alpha, then alpha holds beta.
    • Augmentation rule − If a → b holds and y is attribute set, then ay → by also holds. That is adding attributes in dependencies, does not change the basic dependencies.
    • Transitivity rule − Same as transitive rule in algebra, if a → b holds and b → c holds, then a → c also holds. a → b is called as a functionally that determines b.

    Trivial Functional Dependency

    • Trivial − If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.
    • Non-trivial − If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.
    • Completely non-trivial − If an FD X → Y holds, where x intersect Y = Φ, it is said to be a completely non-trivial FD.

    Normalization

    If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.

    • Update anomalies − If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.
    • Deletion anomalies − We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else.
    • Insert anomalies − We tried to insert data in a record that does not exist at all.

    Normalization is a method to remove all these anomalies and bring the database to a consistent state.

    First Normal Form

    First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

    unorganized relation

    We re-arrange the relation (table) as below, to convert it to First Normal Form.

    Relation in 1NF

    Each attribute must contain only a single value from its pre-defined domain.

    Second Normal Form

    Before we learn about the second normal form, we need to understand the following −

    • Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime attribute.
    • Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a non-prime attribute.

    If we follow second normal form, then every non-prime attribute should be fully functionally dependent on prime key attribute. That is, if X → A holds, then there should not be any proper subset Y of X, for which Y → A also holds true.

    Relation not in 2NF

    We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.

    Relation  in 2NF

    We broke the relation in two as depicted in the above picture. So there exists no partial dependency.

    Third Normal Form

    For a relation to be in Third Normal Form, it must be in Second Normal form and the following must satisfy −

    • No non-prime attribute is transitively dependent on prime key attribute.
    • For any non-trivial functional dependency, X → A, then either −
    • A is prime attribute.
    Relation not in 3NF

    We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists transitive dependency.

    To bring this relation into third normal form, we break the relation into two relations as follows −

    Relation in 3NF

    Boyce-Codd Normal Form

    Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF states that −

    • For any non-trivial functional dependency, X → A, X must be a super-key.

    In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,

    Stu_ID → Stu_Name, Zip

    and

    Zip → City

    Which confirms that both the relations are in BCNF.