Databases Design: step 2 - Applications of Databases to Humanities and Social Sciences

Structuring the data.

The information contained in the data dictionary must be organized into uniform "sets" (entities, tables) where the repetition (redundancy) of the information must be minimal.

Attributes that are calculated (which have a calculation rule) should not be considered when structuring data. They are, indeed, a combination of other attributes, and including them in the structuring phase would introduce a form of redundancy and hence the possibilities of inconsistency. In our example, the attribute "Moyenne" belongs to that category.

We will say that a set of attributes is homogeneous if all of its constituent attributes have a "direct" link. Redundancy within a set is expected to be minimal because redundancy involves inconsistency. If the same information is repeated more than one time, it means:

Information will need to be keyboard, several times, which increases the risk of typing mistakes.
When the information is updated, changes must be made in multiple locations (tables), with the risk of partial updates.

To structure data dictionary attributes, we use functional dependency (FD) as a structuring element.

------------------------------------
For a formal explanation of the structuring process presented below, see Normalization in these pages.
------------------------------------

Functional Dependency Concepts (FD)

We say that there is a functional dependency between an attribute A1 and an attribute A2, we note A1 - > A2, if knowing the value of A1 we can associate only one value of A2. We also say that A1 determines A2. A1 is called the source of the functional dependency and A2 the goal.

Example of FD

In our example database, knowing the student's number, we know the student's name and, among other things, his date of birth. So we have the following FD:

Numetu - > Nometu, Dtnaiss.

but the FD: Numetu - > Note is incorrect.

In fact, knowing a student number, we can know the marks he got in each subject. Therefore, knowing the student number makes it impossible to know a particular mark and there is no FD.

Simple FD:
An FD will be considered simple, if it has a single source attribute. A simple FD characterizes an entity (table) whose source is the key and whose properties are the goal of the FD.

Compound FD:
An FD is said to be composed, if its source consists of a collection of multiple attributes. A compound FD describes an association between entities (tables) whose source is the key and whose properties are the goal of the FD. There must be no extraneous attributes in the source of a compound FD. For example, if we have the 2 compound FDs:

1- A1, A2, A3 -> A4
2- A1, A2 -> A4

The FD number 1 is incorrect because A3 is not necessary. The "correct" compound FD is: A1, A2 -> A4.

To represent entities and associations, we use the following notation:

NAME_OF_ENTITY (Source, Goal). The key is underlined.

For example:

FD	Entities
A1 - > A2 , A3 , A4	ENTITY1 (A1 , A2 , A3, A4)
A3 - > A4	ENTITY2 (A3 , A4)
A1 , A5 - > A6	ASSOC1 (A1 , A5 , A6)

Removing redundancy.

The use of FDs allows the structuring of the information contained in the data dictionary, but does not directly eliminate the potential redundancy of this information. Therefore, in the above table, A4 is redundant because it appears, like a property, in two entities: ENTITY1 and ENTITY2. It is thus necessary to remove this redundancy without losing any information: i.e. by deleting the entities.

In this example, we cannot delete A4 from ENTITY2 as this would involve deleting the entity and therefore losing information. On the other hand, A4 can be removed from ENTITY1, without loss of information, since it can be obtained (inferred) by means of A3, using the transitivity of the FD.

The "Optimal" structure (maximum consistency and minimum redundancy) of the previous FDs schema is thus:

A1 - > A2 , A3
A3 - > A4
A1 , A5 - > A6

So we have the rule:

To remove the redundancies on a set of FDs, it is necessary to eliminate the attributes that can be obtained using FDs transitivity.

Data structuring methodology.

Data structuring is performed in 5 steps:

Determining the list of simple FDs.
Examine the attributes not classified in Step 1 and identify the compound FDs.
Elimination of transitivity from FDs schema.
Build database entities using simple FDs.
Build database associations from compound FDs.

If we use this method for our example, we get these results:

1-Simple FDs of the STUDENTS database

Numetu - > Nometu, Dtnaiss, Cdsexe, Lbsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens, Nomens, Grade, Ancien

2-Unclassified attributes and compound FDs of the STUDENTS database.

The attribute "notes" was not classified in step 1, it thus intervenes in a compound FD. This FD is the following:

Numetu, Numat - > Note

3-Eliminating the transitivity on the FDs schema

Numetu - > Nometu, Dtnaiss, Cdsexe
Cdsexe - > Lbsexe
Numens - > Nomens, Grade, Ancien
Numat - > Nomat, Coeff, Numens

4-Creating entities.

ETUDIANT (Numetu, Nometu, Dtnaiss, Cdsexe)
SEXE (Cdsexe, Lbsexe)
ENSEIGNANT (Numens, Nomens, Grade, Ancien)
MATIERE (Numat, Nomat, Coeff, Numens)

5-Creating associations.

NOTE (Numetu, Numat, Note)

Previous stage | Next Stage | DB Design