Databases Design: step 2

Structuring the data.

The information contained in the data dictionary must be organized into uniform "sets" (entities, tables) in which the repetition (redundancy) of the information must be minimal.

Attributes that are computed (that have a calculation rule) should not be considered when structuring the data. They are, indeed, a combination of other attributes, and include them in the structuring phase would introduce a form of redundancy and thus the possibilities of inconsistency. In our example, the attribute "Moyenne" falls within this category.

We will say that a set of attributes is homogeneous if all of its constituent attributes have a "direct" link. Redundancy within a set is expected to be minimal because redundancy involves inconsistency. If the same information is repeated more than one time, it means:

  • Information will need to be keypad, several times, which increases the risk of typing mistakes.
  • When information is updated, changes must be made in several places, which implies risks of partial updating.

To structure the data dictionary attributes, we use functional dependency (FD) as the structuring element.

------------------------------------
For a formal explanation of the structuring process presented below, see Normalization in these pages.
------------------------------------

Functional Dependency Concepts (FD)

We say that there is a functional dependency between an attribute A1 and an attribute A2, we note A1 - > A2, if knowing the value of A1 we can associate only one value of A2. We also say that A1 determines A2. A1 is the source of the functional dependency and A2 the goal.

Example of FD

In our example database, knowing the student's number, we know the student's name and, among other things, his date of birth. So we have the following FD:

Numetu - > Nometu, Dtnaiss.

The expression: Numetu - > Note is incorrect.

Knowing a student number one may know the marks, he obtained in each subject. Therefor, the knowledge of the student number therefore makes it impossible to know a particular mark and there is no FD.

Simple FD:
An FD will be considered simple if its source consists of a single attribute. A simple FD characterises an entity (table) whose source is the key and whose properties are the goal of the FD.

Compound FD:
It will be said that an FD is composed, if its source is composed of the collection of more than one attribute. A compound FD describes an association between entities (tables) whose source is the key and whose properties are the goal of the FD. There must be no extraneous attributes in the source of a compound FD. For example, if we have the 2 compound FDs:

1- A1, A2, A3 -> A4
2- A1, A2 -> A4

The FD number 1 is incorrect because A3 is not necessary. The "correct" compound FD is: A1, A2 -> A4.

To represent entities or associations, the following notation must be used:

NAME_OF_ENTITY (Source, Goal). The key is underlined.

For example:

FD Entities
A1 - > A2 , A3 , A4 ENTITY1 (A1 , A2 , A3, A4)
A3 - > A4 ENTITY2 (A3 , A4)
A1 , A5 - > A6 ASSOC1 (A1 , A5 , A6)
Removing redundancy.

The use of FDs permit the structuring of the information contained in the data dictionary but does not directly eliminate the potential redundancy of this information. Thus, in the table above, A4 is redundant because it appear in two entities: ENTITY1 and ENTITY2. It is therefore necessary to remove this redundancy without losing any information: i.e. deleting entities.

In this example, we cannot delete A4 from entity2 as this would involve deleting the entity and therefore losing information. On the other hand, A4 can be removed from ENTITY1, without loss of information, since it can be obtained (inferred) by means of A3, using the transitivity of the FD.

The "Optimal" structure (maximum consistency and minimum redundancy) of the previous FDs schema is thus:

A1 - > A2 , A3
A3 - > A4
A1 , A5 - > A6

So we have the rule:

To remove the redundancies on a set of FDs, it is necessary to eliminate the attributes that can be obtained using FDs transitivity.
Data structuring methodology.

Data structuring is performed in 5 steps:

  1. Determination of list of simple FD's.
  2. Review attributes not classified in Step 1 and determine compound FDs.
  3. Eliminating the transitivity of the FDs schema.
  4. Construct database entities from simple FDs.
  5. Create database associations from compound FDs.

If we use this approach for our example, we obtain the following results:

1-Simple FDs of the STUDENTS database

Numetu - > Nometu, Dtnaiss, Cdsexe, Lbsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens, Nomens, Grade, Ancien

2-Unclassified attributes and compound FDs of the STUDENTS database.

The attribute "notes" was not classified in step 1, it thus intervenes in a compound FD. This FD is the following:

Numetu, Numat - > Note

3-Eliminating the transitivity on the FDs schema

Numetu - > Nometu, Dtnaiss, Cdsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens

4-Creating entities.

ETUDIANT (Numetu, Nometu, Dtnaiss, Cdsexe)
SEXE (Cdsexe, Lbsexe)
ENSEIGNANT (Numens, Nomens, Grade, Ancien)
MATIERE (Numat, Nomat, Coeff, Numens)

5-Creating associations.

NOTE (Numetu, Numat, Note)

Previous stage | Next Stage | DB Design