Advanced Database Systems (Part 3)

Advanced Database Systems

Overview of RDBMS

Contents

• The Three-Level ANSI-SPARC Architecture

• Relational Data Structure

• Relational Keys

The Three-Level ANSI-SPARC Architecture

• The three-level architecture comprises an External, a Conceptual, and an Internal level.

• The overall description (skeleton structure) of the database is called the database schema.

• At the highest level, we have multiple external schemas (also called subschemas) that correspond to different views of the data.

• At the conceptual level, we have the conceptual schema, which describes all the entities, attributes, and relationships, together with integrity constraints.

• At the lowest level, we have the internal schema, which is a complete description of the internal model, containing the definitions of stored records, the methods of representation, the data fields, and the indexes and storage structures used. There is only one conceptual schema and one internal schema per database.

• The objective of the three-level architecture is to separate each user’s view of the database from the way the database is physically represented. There are several reasons why this separation is desirable:

1. Each user should be able to access the same data, but have a different customized view of the data.

2. Users should not have to deal directly with physical database storage details.

4. The internal structure of the database should be unaffected by changes to the physical aspects of storage, such as the changeover to a new storage device.

5. The Database Administrator (DBA) should be able to change the conceptual and database storage structures without affecting the users’ views.

• A major objective for the three-level architecture is to provide data independence, which means that upper levels are unaffected by changes to lower levels.

• There are two kinds of data independence:

• 1- Logical Data Independence.

• 2- Physical Data Independence.

1- Logical Data Independence:

• Changes to the conceptual schema, such as the addition or removal of entities, attributes, or relationships, should be possible without having to change existing external schemas or rewrite application programs. Clearly, the users for whom the changes have been made need to be aware of them, but what is important is that other users should not need to be.

2- Physical Data Independence:

• Changes to the internal schema, such as using different file storage structures or different storage devices, should be possible without having to change the conceptual or external schemas.
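The two kinds of data independence can be illustrated with a small sketch. It uses Python's built-in sqlite3 module, which is an assumption of this example and not something the slides prescribe. A view stands in for an external schema, the base table for the conceptual schema, and an index for an internal-level storage choice. The table, view, column names, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: a base table (hypothetical schema and data).
conn.execute(
    "CREATE TABLE student (regno TEXT NOT NULL PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("S001", "Ali", "Multan"), ("S002", "Sara", "Lahore")],
)

# External level: one user's customized view of the data.
conn.execute("CREATE VIEW student_names AS SELECT regno, name FROM student")


def read_view():
    """What the user of the external schema sees."""
    return conn.execute(
        "SELECT regno, name FROM student_names ORDER BY regno"
    ).fetchall()


before = read_view()

# Logical data independence: the conceptual schema gains an attribute,
# yet the existing view and the query against it are left untouched.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = read_view()

# Physical data independence: an internal-level change (a new index)
# alters how rows can be located, not what any schema above it exposes.
conn.execute("CREATE INDEX idx_student_city ON student(city)")
after_physical = read_view()

print(before == after_logical == after_physical)
```

In this sketch the user of student_names never needs to learn that an email column or an index was added, which is the separation the three-level architecture aims for.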

Relational Data Structure

Relational Keys

• Keys are used to create relationships among different database tables.

• An entity type may have many instances, from a few to several thousand or even more.

• When we want to pick one particular instance out of many, which we often need to do, a key is the solution.

• For example, think of the whole population of Pakistan, with the data for everyone held in one place, say with NADRA. If at some point we need to identify one particular person in all this data, how can we do that?

• While defining an entity type we generally also define its key.

• A key can be simple, consisting of a single attribute, or composite, consisting of two or more attributes.

Super Key

Candidate Key

• A super key for which no proper subset is a super key is called a candidate key; in other words, a minimal super key is a candidate key.

• This means there are two conditions for a candidate key. First, it identifies the entity instances uniquely, as is required of a super key. Second, it is minimal, that is, no proper subset of the candidate key is itself a super key.

• So if we have a simple super key, that is, one consisting of a single attribute, it is definitely a candidate key.

• A composite super key is also a candidate key if removing any attribute from it leaves a remainder that is no longer a super key, since it is then a minimal super key.

• For example, one of the super keys that we identified for the entity STUDENT is “regno, name”. This super key is not a candidate key. If we remove the regno attribute from this combination, the name attribute alone is not able to identify the entity instances uniquely; but if we remove the name attribute, regno alone still identifies them uniquely. The combination therefore has a proper subset (regno) that is a super key, which violates the minimality condition.
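The two conditions for a candidate key can be sketched as a small check over sample rows. The rows and the function names below are hypothetical, and the check has a built-in limitation: a key is a property of the schema, so sample data can show that a set of attributes is not a key, but can never prove that it is one.

```python
from itertools import combinations

# Hypothetical STUDENT instances; two students share the name "Ali".
students = [
    {"regno": "S001", "name": "Ali", "city": "Multan"},
    {"regno": "S002", "name": "Ali", "city": "Lahore"},
    {"regno": "S003", "name": "Sara", "city": "Multan"},
]


def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """True if attrs is a super key and no proper subset of it is one."""
    if not is_super_key(rows, attrs):
        return False
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_super_key(rows, subset):
                return False  # a smaller super key exists, so not minimal
    return True


print(is_super_key(students, ("regno", "name")))
print(is_candidate_key(students, ("regno", "name")))
print(is_candidate_key(students, ("regno",)))
```

On these rows, “regno, name” passes the uniqueness test but fails the minimality test because regno on its own is already a super key, which mirrors the STUDENT example above.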

Primary Key

• A candidate key chosen by the database designer to act as the key is the primary key.

• An entity type may have more than one candidate key; in that case the database designer has to designate one of them as the primary key, since there is always only a single primary key in an entity type.

• If there is just one candidate key then obviously the same will be declared as primary key. The primary key can also be defined as the successful candidate key.

• The relation that holds between super and candidate keys also holds between candidate and primary keys, that is, every primary key (PK) is a candidate key and every candidate key is a super key.

• A special value that may be associated with any attribute is NULL, which means “not given” or “not defined”.

• A major characteristic of the Primary Key is that it cannot have the NULL value.

Unique Key

• A candidate key that identifies a record uniquely but may store a NULL value is called a unique key.

• For example, the student contact number attribute in the STUDENT table can act as a unique key.
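The difference between a primary key and a unique key can be tried out with a short sketch. It assumes Python's built-in sqlite3 module; the column names and sample values are hypothetical. One SQLite-specific detail: a non-integer PRIMARY KEY column accepts NULL unless NOT NULL is also declared, so the sketch declares it explicitly. SQLite also allows several NULLs in a UNIQUE column; other database systems may differ on that point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        regno   TEXT NOT NULL PRIMARY KEY,  -- primary key: unique, never NULL
        name    TEXT NOT NULL,
        contact TEXT UNIQUE                 -- unique key: unique, NULL allowed
    )
    """
)
conn.execute("INSERT INTO student VALUES ('S001', 'Ali', '0300-1111111')")
conn.execute("INSERT INTO student VALUES ('S002', 'Sara', NULL)")


def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


null_pk = try_insert((None, "Bilal", None))                # NULL primary key
dup_contact = try_insert(("S003", "Bilal", "0300-1111111"))  # repeated contact
null_contact = try_insert(("S003", "Bilal", None))          # missing contact

print(null_pk)
print(dup_contact)
print(null_contact)
```

The first two attempts are expected to be rejected and the third accepted, since a missing contact number does not conflict with the unique key.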

Alternate Key

• Candidate keys which are not chosen as the primary key are known as alternate keys.

• For example, we have two candidate keys of EMPLOYEE in figure 2, regNo and nId Number; if we select regNo as the PK, then nId Number will be the alternate key.

Foreign Key

• Sometimes the information stored in a relation is linked to the information stored in another relation.

• If one of the relations is modified, the other must be checked, and perhaps modified, to keep the data consistent.

• Suppose that in addition to Students, we have a second relation:

• Enrolled (cId: string, sId: string, cGrade: Text)

• The sId field of Enrolled is called a foreign key and refers to Students.

• The foreign key in the referencing relation (Enrolled, in our example) must match the primary key of the referenced relation (Students).

• As the figure shows, there may well be some students who are not referenced from Enrolled (e.g., the student with sId = 50000).

• However, every sId value that appears in the instance of the Enrolled table appears in the primary key column of a row in the Students table.

• If we try to insert the tuple (55555, Art 104, A) into E1, the instance of Enrolled, the rule is violated because there is no tuple in S1, the instance of Students, with the sId 55555; the database system should reject such an insertion.

• Similarly, if we delete the tuple (53666, Jones, jones@cs, 18, 3.4) from S1, we violate the foreign key constraint because the tuple (53666, History 105, B) in E1 contains the sId value 53666, the sId of the deleted Students tuple.

• The DBMS should disallow the deletion or, perhaps, also delete the Enrolled tuple that refers to the deleted Students tuple.
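The insertion and deletion cases above can be reproduced in a minimal sketch, assuming Python's built-in sqlite3 module. The Enrolled columns follow the schema given earlier; the Students column names (name, login, age, gpa) and the details of the student with sId 50000 are assumptions made for illustration. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute(
    """
    CREATE TABLE Students (
        sId TEXT NOT NULL PRIMARY KEY,
        name TEXT, login TEXT, age INTEGER, gpa REAL  -- assumed column names
    )
    """
)
conn.execute(
    """
    CREATE TABLE Enrolled (
        cId TEXT, sId TEXT, cGrade TEXT,
        FOREIGN KEY (sId) REFERENCES Students(sId)
    )
    """
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")
# Illustrative row for the unreferenced student; details are assumed.
conn.execute("INSERT INTO Students VALUES ('50000', 'Dave', 'dave@cs', 19, 3.3)")
conn.execute("INSERT INTO Enrolled VALUES ('History 105', '53666', 'B')")


def attempt(sql):
    """Run a statement and report whether the constraint allowed it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# No student 55555 exists, so the enrollment should be refused.
insert_result = attempt("INSERT INTO Enrolled VALUES ('Art 104', '55555', 'A')")
# Student 53666 is still referenced from Enrolled, so deletion should be refused.
delete_referenced = attempt("DELETE FROM Students WHERE sId = '53666'")
# Student 50000 is not referenced, so deletion should succeed.
delete_unreferenced = attempt("DELETE FROM Students WHERE sId = '50000'")

print(insert_result, delete_referenced, delete_unreferenced)
```

This sketch takes the "disallow the deletion" option. The alternative mentioned above, deleting the referring Enrolled tuples as well, corresponds to declaring the foreign key with ON DELETE CASCADE.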

Secondary Key

• Many times we need to access certain instances of an entity type using the values of one or more attributes other than the PK.

• The difference between accessing instances using the value of a key and of a non-key attribute is that a search on the value of the PK will always return a single instance (if it exists), whereas uniqueness is not guaranteed in the case of a non-key attribute.

• An attribute on which we need to access the instances of an entity type, and which may not necessarily return a unique instance, is called a secondary key.

• For example, if we want to see how many of our students belong to Multan, we will access those instances of the STUDENT entity type that contain “Multan” in their address.

• In this case address will be called the secondary key, since we are accessing instances on the basis of its value, and there is no requirement that we get a single instance.

• Keep in mind that a particular access on the value of a secondary key MAY return a single instance, but that is a matter of chance.

• A secondary key is not required to return a unique instance.

• In the case of super, candidate, primary and alternate keys, however, it is required that they always return a unique instance for a particular value.
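Access by primary key and access by secondary key can be compared in a short sketch, assuming Python's built-in sqlite3 module. The rows are hypothetical, and for simplicity the address column holds only a city name. The non-unique index is one common way to support secondary key access, not something the slides require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (regno TEXT NOT NULL PRIMARY KEY, name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("S001", "Ali", "Multan"),
        ("S002", "Sara", "Lahore"),
        ("S003", "Bilal", "Multan"),
    ],
)

# A plain (non-unique) index: it speeds up lookups on address
# without promising that a value identifies a single row.
conn.execute("CREATE INDEX idx_student_address ON student(address)")

# Access by primary key: at most one instance can come back.
by_pk = conn.execute(
    "SELECT name FROM student WHERE regno = ?", ("S002",)
).fetchall()

# Access by secondary key: any number of instances may come back.
by_multan = conn.execute(
    "SELECT name FROM student WHERE address = ? ORDER BY regno", ("Multan",)
).fetchall()

# A single result here is a matter of chance, not a guarantee.
by_lahore = conn.execute(
    "SELECT name FROM student WHERE address = ?", ("Lahore",)
).fetchall()

print(len(by_pk), len(by_multan), len(by_lahore))
```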

Surrogate Key

• A surrogate key is a column (or set of columns) that can be declared as the primary key in place of a composite primary key of more than two attributes that jointly make a cumbersome (that is, large) key. An example of a cumbersome key and a surrogate key is shown on the next slide.

• Remember that the primary key MUST be unique. This is why treatment date and time are included in the composite primary key.

• But this makes a very cumbersome key.

• It would be better to create a surrogate key like treatmentId in the PATIENT_TREATMENT table.
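A surrogate key for PATIENT_TREATMENT might look like the sketch below, which assumes Python's built-in sqlite3 module. Only treatmentId, the treatment date and time, and the table name come from the slide; patientId, treatmentCode, and the sample values are hypothetical. The former composite key is kept as a UNIQUE constraint so that the original uniqueness rule is not lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE PATIENT_TREATMENT (
        treatmentId   INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
        patientId     TEXT NOT NULL,        -- hypothetical column
        treatmentCode TEXT NOT NULL,        -- hypothetical column
        treatmentDate TEXT NOT NULL,
        treatmentTime TEXT NOT NULL,
        -- the former cumbersome composite key, still enforced as unique
        UNIQUE (patientId, treatmentCode, treatmentDate, treatmentTime)
    )
    """
)

INSERT_SQL = (
    "INSERT INTO PATIENT_TREATMENT "
    "(patientId, treatmentCode, treatmentDate, treatmentTime) "
    "VALUES (?, ?, ?, ?)"
)

# The surrogate value is generated; callers supply only the natural attributes.
first_id = conn.execute(INSERT_SQL, ("P01", "T10", "2024-01-05", "09:30")).lastrowid
second_id = conn.execute(INSERT_SQL, ("P01", "T10", "2024-01-05", "14:00")).lastrowid

# Repeating the same natural key is still refused by the UNIQUE constraint.
try:
    conn.execute(INSERT_SQL, ("P01", "T10", "2024-01-05", "09:30"))
    duplicate = "ok"
except sqlite3.IntegrityError:
    duplicate = "rejected"

print(first_id, second_id, duplicate)
```

Other tables can then refer to a treatment through the single treatmentId column instead of repeating all four attributes.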
