Database Design: Step 1 - Applications of Databases to Humanities and Social Sciences

Document Analysis

Identifying all the representative documents that contain the data you want to model is the first step in building a database. The term "documents" should be understood here in a broad sense: they may be paper documents, magnetic or digital media (interview recordings, discs, computer diskettes), or any other medium that can be used to store information.

Working documents

To illustrate the various stages of database design, we will use as an example the management of students enrolled in the various courses of a degree.

The working documents used are as follows:

List of students
Student number	Student name	Date of birth	Sex
1	Dupont, Charles	18-03-1981	H (Male)
2	Dubois, Jules	02-11-1982	H (Male)
3	Favier, Isabelle	02-02-1979	F (Female)
...	...	...	...

Marks sheet
Student number: 1 Dupont, Charles
Subject number	Name	Coefficient	Mark / 20
1	Mathématiques	3	10
2	Informatique	2	9
3	Sociologie	2	12.5
4	Histoire	1	13
5	Géographie	1	7
Average mark	10.3
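The average mark on this sheet is a coefficient-weighted mean: (10×3 + 9×2 + 12.5×2 + 13×1 + 7×1) / (3 + 2 + 2 + 1 + 1) = 93 / 9 ≈ 10.3. A minimal Python sketch of that arithmetic, with the sheet's values hard-coded (the function name weighted_average is illustrative, not part of the original material):

```python
# Marks from the sheet of student 1 (Dupont, Charles): (subject, coefficient, mark out of 20)
marks = [
    ("Mathématiques", 3, 10.0),
    ("Informatique", 2, 9.0),
    ("Sociologie", 2, 12.5),
    ("Histoire", 1, 13.0),
    ("Géographie", 1, 7.0),
]

def weighted_average(rows):
    """Return SUM(mark * coeff) / SUM(coeff), the coefficient-weighted mean."""
    total_weight = sum(coeff for _, coeff, _ in rows)
    if total_weight == 0:
        raise ValueError("coefficients must not all be zero")
    return sum(mark * coeff for _, coeff, mark in rows) / total_weight

average = weighted_average(marks)
print(round(average, 1))  # 93 / 9 = 10.333..., printed as 10.3
```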

List of teachers
Teacher number	Name	Grade	Seniority in rank	Subject taught
10	Bertrand, Pierre	ASS	2	Sociologie
11	Dupont, Auguste	MCF	3	Mathématiques
...	...	...	...	...
15	Simon, Etienne	ASS	5	Histoire, Géographie

Management rules:

- A subject is taught by only one teacher.
- A student has only one mark per subject.
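Both management rules are uniqueness requirements: a (student, subject) pair may appear only once among the marks, and a subject may appear only once among the teaching assignments (a teacher, however, may teach several subjects). A minimal Python check of these rules over in-memory lists, assuming hypothetical mark records and reading "Histoire Géographie" in the teachers' list as two separate subjects:

```python
from collections import Counter

# Hypothetical (student number, subject number, mark) records, for illustration only
marks = [(1, 1, 10.0), (1, 2, 9.0), (2, 1, 14.0)]
# (teacher number, subject name) pairs from the teachers' list
teaching = [(10, "Sociologie"), (11, "Mathématiques"), (15, "Histoire"), (15, "Géographie")]

def duplicates(keys):
    """Return the keys that occur more than once."""
    return [key for key, count in Counter(keys).items() if count > 1]

# Rule: a student has only one mark per subject -> (student, subject) must be unique
mark_violations = duplicates((student, subject) for student, subject, _ in marks)
# Rule: a subject is taught by only one teacher -> the subject must be unique
teaching_violations = duplicates(subject for _, subject in teaching)

print(mark_violations, teaching_violations)  # [] []
```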

Data dictionary

The analysis phase consists of extracting from these documents the elementary (non-decomposable) items of information that will make up the future database.
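For instance, the "Sex" column of the list of students holds two elementary items at once, a code ("H") and its label ("Male"). A minimal Python sketch that splits one row into elementary values, assuming the row is available as tab-separated text and that the sex column always has the form "CODE (Label)"; the keys reuse the mnemonics of the data dictionary given later in this section, and parse_student_row is a hypothetical helper:

```python
from datetime import datetime

def parse_student_row(line):
    """Split one tab-separated row of the list of students into elementary values."""
    number, name, birth, sex = line.split("\t")
    # "H (Male)" contains two elementary items: a code and its label
    code, label = sex.split(" ", 1)
    return {
        "Numetu": int(number),
        "Nometu": name,
        "Dtnaiss": datetime.strptime(birth, "%d-%m-%Y").date(),  # day-month-year
        "Cdsexe": code,
        "Lbsexe": label.strip("()"),
    }

row = parse_student_row("1\tDupont, Charles\t18-03-1981\tH (Male)")
print(row["Dtnaiss"], row["Cdsexe"], row["Lbsexe"])  # 1981-03-18 H Male
```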

The set of all these elementary data items, called attributes or fields, constitutes the data dictionary. Each attribute (field) in the data dictionary can be characterized by the following properties:

Property	Meaning
Mnemonic	An abbreviation of the attribute name.
Wording	A label stating the precise meaning and role of the attribute.
Data type	The attribute's type: integer, real, string, date, etc.
Integrity constraints	The constraints on the attribute's possible values.
Calculation rule	The rule used to calculate (obtain) the attribute's value.
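One way to make these five properties concrete is to model a dictionary entry as a record with one field per property. A minimal Python sketch, assuming the integrity constraint is expressed as a predicate function and the calculation rule as plain text; the class name Attribute and its method accepts are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Attribute:
    """One entry of the data dictionary (hypothetical representation)."""
    mnemonic: str   # abbreviation of the attribute name
    wording: str    # precise meaning and role of the attribute
    data_type: str  # integer, real, string, date...
    constraint: Optional[Callable[[object], bool]] = None  # integrity constraint as a predicate
    calculation_rule: Optional[str] = None                 # how a derived value is obtained

    def accepts(self, value) -> bool:
        """True if the value satisfies the integrity constraint (or there is none)."""
        return self.constraint is None or self.constraint(value)

coeff = Attribute("Coeff", "Subject coefficient", "Integer", lambda v: 0 < v < 6)
print(coeff.accepts(3), coeff.accepts(6))  # True False
```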

The data dictionary for the documents above is as follows:

Mnemonic	Wording	Type	Constraints	Calculation rule
Ancien	Seniority in rank	Integer	>=0
Cdsexe	Gender code	String(1)	H or F
Coeff	Subject coefficient	Integer	>0 and <6
Dtnaiss	Date of birth	Date
Grade	Teacher grade	String(3)	ASS or MCF or PR
Lbsexe	Gender label	String(7)	Male or Female
Moyenne	Average mark for the student	Real	>=0 and <=20	SUM(Note*Coeff)/SUM(Coeff)
Nomat	Subject name	String(15)
Nomens	Name of teacher	String(20)
Nometu	Name of student	String(20)
Note	Mark obtained by the student in the subject	Real	>=0 and <=20
Numat	Subject number	Integer	>0
Numens	Teacher number	Integer	>0
Numetu	Student number	Integer	>0
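The "Type" and "Constraints" columns can be used directly to validate incoming data. A minimal Python sketch that encodes a subset of the dictionary as (type, predicate) pairs and reports the fields of a record that break them; the DICTIONARY mapping and the violations function are hypothetical names chosen for illustration:

```python
# Subset of the data dictionary: mnemonic -> (expected Python type, constraint)
DICTIONARY = {
    "Numetu": (int, lambda v: v > 0),
    "Numat": (int, lambda v: v > 0),
    "Nomat": (str, lambda v: len(v) <= 15),
    "Coeff": (int, lambda v: 0 < v < 6),
    "Note": (float, lambda v: 0 <= v <= 20),
    "Cdsexe": (str, lambda v: v in ("H", "F")),
    "Grade": (str, lambda v: v in ("ASS", "MCF", "PR")),
}

def violations(record):
    """List the fields of a record that break their type or constraint."""
    errors = []
    for field, value in record.items():
        expected_type, check = DICTIONARY[field]
        # The type is tested first, so the constraint only sees well-typed values
        if not isinstance(value, expected_type) or not check(value):
            errors.append(field)
    return errors

print(violations({"Numetu": 1, "Numat": 3, "Note": 12.5}))  # []
print(violations({"Numetu": 1, "Numat": 3, "Note": 21.0}))  # ['Note']
```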

Remarks
The data dictionary is independent of the database management system (DBMS) that will be used to implement the database. To adapt the data dictionary to a particular DBMS, the "Type", "Constraints" and "Calculation rule" columns must therefore be translated into that DBMS's own formalism.
For example, for the "Grade" field in Firebird, the "Type" column will become Char(3) and the "Constraints" column will become Value in ('ASS','MCF','PR')…
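The same translation can be tried out without a Firebird server. A minimal sketch using SQLite through Python's standard sqlite3 module instead of Firebird (in Firebird the constraint would typically be written in a domain, e.g. CREATE DOMAIN ... AS CHAR(3) CHECK (VALUE IN ('ASS','MCF','PR'))). The table name teacher is hypothetical, and the columns reuse the dictionary mnemonics:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: column names reuse the data dictionary mnemonics
conn.execute("""
    CREATE TABLE teacher (
        numens INTEGER PRIMARY KEY CHECK (numens > 0),
        nomens VARCHAR(20) NOT NULL,
        grade  CHAR(3) CHECK (grade IN ('ASS', 'MCF', 'PR')),
        ancien INTEGER CHECK (ancien >= 0)
    )
""")
conn.execute("INSERT INTO teacher VALUES (10, 'Bertrand, Pierre', 'ASS', 2)")

try:
    # Deliberately wrong grade: the CHECK constraint rejects the row
    conn.execute("INSERT INTO teacher VALUES (11, 'Dupont, Auguste', 'XYZ', 3)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT COUNT(*) FROM teacher").fetchone()[0])  # 1
```

Note that SQLite does not enforce the length declared in CHAR(3) or VARCHAR(20); this is exactly the kind of DBMS-specific difference that makes the translation step necessary.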
