Document Analysis
Identifying all the representative documents that contain the data you want to model is the first step in building a database. The term "documents" must be taken here in a broad sense. It can be either paper documents or magnetic documents (recording of interviews, disc or computer diskette)... or any other medium that can be used to store information.
Working documents
To illustrate the various stages of database design, we will use, as an example, the management of students taking the various courses of a degree.
The working documents used are as follows:
List of students | |||
Student number | Student name | Date of birth | Sex |
1 | Dupont, Charles | 18-03-1981 | H (Male) |
2 | Dubois, Jules | 02-11-1982 | H (Male) |
3 | Favier, Isabelle | 02-02-1979 | F (Female) |
... | ... | ... | ... |
Marks sheet | |||
Student Number : 1 Dupont, Charles | |||
Subject number | Name | Coefficient | Mark / 20 |
1 | Mathématiques | 3 | 10 |
2 | Informatique | 2 | 9 |
3 | Sociologie | 2 | 12,5 |
4 | Histoire | 1 | 13 |
5 | Géographie | 1 | 7 |
Average mark | 10,3 |
Teacher's List | ||||
Teacher id | Name | Grade | Seniority in rank | Subject taught |
10 | Bertrand, Pierre | ASS | 2 | Sociologie |
11 | Dupont, Auguste | MCF | 3 | Mathématiques |
... | ... | ... | ... | ... |
15 | Simon, Etienne | ASS | 5 | Histoire Géographie |
Management rules:
- A subject is taught by only one teacher.
- A student has only one mark per subject.
Data dictionary
From the documents, the analysis phase consists in extracting the elementary (non-decomposable) information that will constitute the future database.
The collection of all the elementary data, called attributes or fields, constitutes the data dictionary. Each attribute (field) in the data dictionary can be characterized by the following properties:
Property | Meaning |
Mnemonic | An abbreviation for the attribute name. |
Wording | A label containing the precise meaning and role of the attribute. |
Data type | Attribute type: integer, real, string, date... |
Integrity constraints | List of constraints on possible attribute values |
Calculation rule | The rule for calculating (obtaining) the corresponding attribute. |
The data dictionary for the previous documents is as follows:
Mnemonic | Wording | Type | Constraints | Calculation rule |
Ancien | Seniority in rank | Integer | >=0 | |
Cdsexe | Gender code | String(1) | H or F | |
Coeff | Subject coefficient | Integer | >0 and <6 | |
Dtnaiss | Date of birth | Date | ||
Grade | Teacher Grade | String(3) | ASS or MCF or PR | |
Lbsexe | Gender Label | String(7) | Male or Female | |
Moyenne | Average mark for the student | Real | >=0 and <=20 | SUM(Note*Coeff)/SUM(Coeff) |
Nomat | Subject name | String(15) | ||
Nomens | Name of teacher | String(20) | ||
Nometu | Name of student | String(20) | ||
Note | Mark obtained by the student in the subject | Real | >=0 and <=20 | |
Numat | Subject number | Integer | >0 | |
Numens | Teacher number | Integer | >0 | |
Numetu | Student number | Integer | >0 |
Remarks
The data dictionary is independent of the database management system that will be employed to implement the database. To define the data dictionary for a particular DBMS, it will therefore be necessary to translate the "Type", "Constraints" and "Calculation Rule" columns into the specific formalism for the DBMS.
For example, for the "Grade" field, and if you use Firebird, the "Type" column will become Char(3), "Constraints" will become Value in ('ASS','MCF','PR)…