Files and Database concepts

We will study Databases in two class meetings. This week we concentrate on basic DB concepts and design. Next week we will see database implementation in Microsoft Access.

The simplest way to think about data structure is defining each element of the data structure as follows:

character: as a set of bytes (binary digits) to represent any letter, number or symbol,
field or data element: as a set of characters to represent a characteristic of an entity (name, age, sex of a person),
record: as a set of fields to collect information about an entity (a student record, a bank account record),
file or table: as a set of records to collect information about a group of entities of the same type (student file or table, class roster table or file); and
database: as a set of files or tables to collect information about groups of related entities (the human resources data base will have employee, benefits, discounts, etc, tables.

You can see more on DB concepts here.

Database design concepts

A database is better defined as a set of entities and relations as shown in the next figure.

Entities are objects with a physical (car) or conceptual (culture) existence.
Database design is the process to define and represent entities and relations.

See figure 1 for a graphical representation of a data base.

Figure 1: a database representation

Entities have identifiers and attributes:

An identifier or key uniquely singles out one entity from all others of the same type, as for example the social security number of an employee, or the account number of a bank client.

Attributes describe or qualify the entities, as for example name, age, sex of an employee. As we saw previously a record of an entity type contains a key and attributes (fields or data elements) related to that entity type. For example a grade roster record will have the course number, faculty name, student name, id number, grades.

The grade roster as a whole comprises the grade roster file or table (we use the name table in the database environment, instead of file). Figure 2 shows traditional ways to represent flat files (non-database organized) and database designs: (a) graphical and (b)relational notation.

Figure 2: different database representations

I hope you noticed that two types of keys were identified in Figure 2: (a) primary key and (b) foreign key. Both uniquely identify an entity. A foreign key is a primary key of one table used as an attribute in another table.

The S (supplier) table has S# as its primary key. When S# is used in the table O (orders) is becomes an attribute (the supplier of an order), and is know as a foreign key to the O table. Please note that in database format there are no repetitive information in the tables, as there was in the flat file example. This is due to a process called normalization, that we will talk more later in this session.

Enough about entities. Relations can be of three types: one-to-one, one-to-many, and many-to-many. These types are better understood through examples:

One-to-one: a husband can only have one (legal) wife and vice-versa.
One-to-many: a (birth) mother can have many children, but a child can only have one (birth) mother. (I know it is becoming difficult to use these family examples. I will not even try for the many-to-many).
Many-to-many: a student can take many classes and a class can have many students.

Why is the type of the relations important? Because, depending of the type, the relations will be represented differently in the database.

Figure 3: simplified entity-relations diagram

The above figure is a simplified entity-relation diagram. Note that the relations between the three entities are all many-to-many. A product can be used in many projects, and a project can use many producs, etc.

We are now all set to learn some basic principles of database design, including a basic form of normalization required for database software work properly (including Access).

Figure 4: simple normalization

Once more, we can understand better these normalization principles through an example. Refer to Figure 3 and you will see three entities (supplier, product and project) in a many-to-many relation with each other. Now, look back to Figure 2 and lets understand how the normalization principles created, for example the relational notion representation of a database for Figure 3 entities.

The first rule of normalization tell as to create three tables, one for each entity -- in the example the S (supplier), P (product) and J (project) tables.

Rules 2 and 3 tell us to select a primary key to each table (S#, P#,J#) and assign the attributes to each entity to the corresponding table.

The first three simple normalization principles led us to define the entities. What is left to be done is to represent the relations, that we know in the case to be many-to-many. The simple normalization principle number 5 tells us that a many-to-many relation is to be represented by a table.

In the example the table representing the relation is the table O (orders) that have the primary key of the three entities as foreign keys. The only attribute of an order of one product, from one supplier, for one project is the quantity.

Theoretically, S#,P#,J# together would be the primary key to the O table, but in practice this would be a very long string of letters and numbers that could not be easily used. An artificial code, e.g. O# (order number) is created.