Database Design Theory: Normalization

Normalization

  •  Normalization is the process of organizing the data in the database.
  • Normalization is used to,
Minimize the redundancy (repetition of data which increases the size of data) from a relation or set of relations. 
Eliminate the undesirable characteristics like Insertion, Updates and Deletion Anomalies.

  • Normalization divides the larger table into the smaller table and links them using relationship.
  • The normal form is used to reduce redundancy from the database table.

Types of Normal Forms

There are four types of normal forms,

  • 1NF
  • 2NF
  • 3 NF
  • BCNF
  • 4 NF
  • 5 NF

First Normal Form (1NF)

  • A relation will be 1NF if it contains an atomic value.
  • It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attribute.
  • First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
A table is in 1 NF iff:

  • There are only Single Valued Attributes.
  • Attribute Domain does not change.
  • There is a unique name for every Attribute/Column.
  • The order in which data is stored does not matter.


Second Normal Form (2NF)

  • A relation must be in first normal form and relation must not contain any partial dependency. 
  • A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.
To remove Partial dependency, we can divide the table, remove the attribute which is causing partial dependency, and move it to some other table where it fits in well.

A relation that is in First Normal Form and every non-primary-key attribute is fully functionally dependent on the primary key, then the relation is in Second Normal Form (2NF).

Note – If the proper subset of candidate key determines non-prime attribute, it is called partial dependency.

The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a new relation along with a copy of their determinant.


Functional Dependency

  • Functional dependency in DBMS, as the name suggests is a relationship between attributes of a table dependent on each other.
  • It helps in preventing data redundancy.
  • Functional Dependency is represented by -> (arrow sign)

For example: 

Suppose we have a student table with attributes: Stu_Id, Stu_Name, Stu_Age
Here Stu_Id attribute uniquely identifies the Stu_Name attribute of student table because if we know the student id we can tell the student name associated with it. 
This is known as a functional dependency and can be written as Stu_Id->Stu_Name or in words, we can say Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely identifies the column B of the same table then it can be represented as
 A->B (Attribute B is functionally dependent on attribute A)

NOTE: Where an attribute in a table depends on only a part of the primary key and not on the whole key is referred to as Partial Dependency.

Third Normal Form (3NF)

A relation will be in 3NF,
  1. It should be in the Second Normal form.
  2. And it should not have Transitive Dependency.
A relation that is in First and Second Normal Form and in which no non-primary-key attribute is transitively dependent on the primary key, then it is in Third Normal Form (3NF).
 

Transitive Dependency

  • When a non-prime attribute depends on other non-prime attributes rather than depending upon the prime attributes or primary key, this is called Transitive Dependency
  • A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies. For e.g.

X -> Z is a transitive dependency if the following three functional dependencies hold true:

  • X->Y
  • Y does not ->X
  • Y->Z

Note: A transitive dependency can only occur in a relation of three or more attributes. This dependency helps us normalizing the database in 3NF (3rd Normal Form).

The advantage of removing transitive dependency is,

  • Amount of data duplication is reduced.
  • Data integrity achieved.

Boyce-Codd Normal Form (BCNF)

For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:

  1. It should be in the Third Normal Form.
  2. And, for any dependency A → B, A should be a super key.
In simple words, it means, that for a dependency A → B, A cannot be a non-prime attribute, if B is a prime attribute.

Fourth Normal Form (4NF)

For a table to satisfy the Fourth Normal Form, it should satisfy the following two conditions:

  1. It should be in the Boyce-Codd Normal Form.
  2. And, the table should not have any Multi-valued Dependency.

Multi-valued Dependency

A table is said to have a multi-valued dependency if the following conditions are true,

  1. For a dependency A → B, if for a single value of A, multiple values of B exists, then the table may have a multi-valued dependency.
  2. Also, a table should have at least 3 columns for it to have a multi-valued dependency.
  3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C should be independent of each other.

If all these conditions are true for any relation(table), it is said to have a multi-valued dependency.


Fifth normal form (5NF)
  • A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.
  • 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
  • 5NF is also known as Project-join normal form (PJNF).

    No comments:

    Post a Comment

    Monk and Inversions

    using System; public class Solution { public static void Main () { int T = Convert . ToInt32 ( Console . ReadLine...