Remember the database modelling process?
› Requirements analysis
› Conceptual (E-R) model
Top-down/bottom-up design
‘Look for the nouns’
› Logical data model
Tables & Normalisation
› Physical data model
Column types; integrity constraints.
CO3041
Databases and the Web lecture 4 — © Kingston University, UK 4
Faculty seminars
› Fortnightly research seminars.
› Rooms are usually booked in advance but the schedule is flexible.
› Potential speakers are contacted, talks requested and (hopefully!) scheduled. There is usually a significant delay between initial contact and scheduling a seminar.
› Speakers have various details, some public, some private.
› Agreed talks have titles, may eventually have abstracts.
› Seminars database is to be administered from the web.
› Seminars are to be advertised primarily on a web page but may be in other forms.
› Publicity may be gained by making past seminar details available but these must obviously be separate from forthcoming talks!
Organisation and system requirements
Flesh out the details:
› A database containing seminar details is required.
Speakers give seminars in particular rooms at particular times that must all be advertised. Seminars are usually of fixed duration.
Speakers may give more than one seminar and it may be possible for seminars to be given by more than one speaker.
Speakers may give a seminar abstract which may be long.
Organisation and system requirements …
› A public web page for advertising seminars is required. This must also include a way of advertising past seminars in an obvious fashion. Since the abstract may or may not exist and may be quite long, it should be on a separate page.
› A method for entering speakers and seminars into the database is required. Administration is not public. Not all speakers contacted give seminars. Not all seminars have abstracts, but all have titles, speakers, date/time, location and duration.
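The requirement to keep past and forthcoming seminars apart can be sketched as two queries on the start time. This is only an illustration: it assumes Python's built-in sqlite3 module as a stand-in for the eventual database, and the table layout, sample rows and fixed `now` value are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seminars (title TEXT NOT NULL, room TEXT NOT NULL, starts TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO seminars VALUES (?, ?, ?)",
    [
        ("Old talk", "Room A", "2004-01-15 14:00:00"),
        ("Next talk", "Room B", "2004-03-10 14:00:00"),
        ("Later talk", "Room A", "2004-03-24 14:00:00"),
    ],
)

# Fixed 'current time' so the example is repeatable (assumed value).
now = "2004-03-01 00:00:00"

def forthcoming(conn, now):
    """Titles of seminars that have not started yet, soonest first."""
    rows = conn.execute(
        "SELECT title FROM seminars WHERE starts >= ? ORDER BY starts", (now,)
    )
    return [r[0] for r in rows]

def past(conn, now):
    """Titles of seminars that have already started, most recent first."""
    rows = conn.execute(
        "SELECT title FROM seminars WHERE starts < ? ORDER BY starts DESC", (now,)
    )
    return [r[0] for r in rows]
```

Storing the start time in ISO order (year first) means a plain comparison is enough to split the two lists.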
Organisation and system requirements …
› At this stage it’s not unreasonable to examine the
suitability of a Web-Database solution to the problem!
› Is it a suitable application?
Any obvious drawbacks?
Obvious advantages?
Does everything require a web interface or is another
‘thin client’ solution suitable?
Conceptual data model: Top-down...
› From the description the nouns suggest the following candidate entities:
seminar, room, speaker, schedule, title, abstract.
Some are weak entities that depend for their existence on other ‘stronger’ entities.
E.g. room: a seminar cannot occur without a room and attributes
of room are irrelevant to this simple DB.
Some are obvious attributes of entities:
E.g. title & abstract are attributes of a seminar
UML notation RDBMS E-R
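[Diagram: the same model in E-R and UML notation]
E-R: Seminar ‘gives’ Speaker, with cardinalities [1,n] and [0,n]
    Speaker: name, address
    Seminar: start, room, duration, title, abstract
UML:
    Seminar: PK room, PK start, duration, title, abstract
    Speaker: PK firstName, PK lastName, PK title, address, phone, fax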
Conceptual data model:
› Speakers ‘give’ 0…n seminars
0 until booked!
n they may come back!
› Seminars must have at least one speaker; may have more than one!
1…n
› Many candidate keys: Identify
speaker by name? Name+title? Name+title+institution? Email?
seminar by title? Title+speaker? Start time+room?
Conceptual web data model:
› This is a similar analysis to the DB conceptual analysis (e.g. ‘looking for nouns’) but this time in the web-page description.
› There are 3 completeness rules that must be obeyed:
1. Entity attribute completeness:
» Add entities and attributes whenever an attribute from the DB conceptual model is added to the WebDB conceptual model.
2. Entity identity completeness:
» Each entity must be accompanied by its primary key.
3. Referential completeness:
» If a relationship (entity) is used then all entities that participate in the relationship must be added.
› These ensure that each page can extract the data it needs from the database and that no necessary keys and/or entities are missing.
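Rule 2 (entity identity completeness) lends itself to a mechanical check. The sketch below is an assumption about how one might do it, not part of the method itself: the dictionary layout, the `missing_keys` helper and the `broken_page` example are hypothetical, while the key attributes follow the composite keys suggested by the conceptual model.

```python
# Hypothetical description of the DB conceptual model:
# entity name -> set of attributes forming its primary key.
PRIMARY_KEYS = {
    "speakers": {"title", "firstName", "lastName"},
    "seminars": {"start", "room"},
}

def missing_keys(page):
    """Return {entity: missing primary-key attributes} for one page.

    `page` maps each entity the page uses to the attributes it references.
    An empty result means the page satisfies entity identity completeness.
    """
    problems = {}
    for entity, attributes in page.items():
        missing = PRIMARY_KEYS[entity] - set(attributes)
        if missing:
            problems[entity] = missing
    return problems

# A page that references both entities together with their full keys.
advert_page = {
    "speakers": ["title", "firstName", "lastName"],
    "seminars": ["start", "duration", "room", "title"],
}

# A deliberately incomplete page: it uses seminar attributes without the key.
broken_page = {"seminars": ["title", "abstract"]}
```

Running `missing_keys` over every page description gives a quick list of pages that could not identify the rows they display.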
Conceptual web data model:
‘Stage 1’ lists 4 web pages:
1. Seminars advert page.
Lists all forthcoming and a selection of past speakers & seminars.
References to the DB conceptual model are:
(speakers) title, firstName, lastName
(seminars) start, duration, room, title
› At this stage an idea for an additional feature crops up: it would also be nice to allow browsers to visit the speakers’ home and/or departmental web pages.
This requires three more attributes in the DB conceptual model: (speakers) institution, instWWW, WWW
These must be added to the conceptual model before proceeding! (Hopefully this kind of thing usually gets identified in ‘Stage 1’)
2. Seminar abstract page.
Lists one seminar that has been specified somehow.
References the same entities/attributes plus (seminars) abstract
Requires the referring page to supply the relevant seminar key value.
Conceptual web data model: contd…
3. Seminar speakers admin page.
– Creates a new entry in the speakers table and allows an old entry to be edited. Uses: (speakers) title, firstName, lastName, address, email, phone, fax, institution, instWWW, WWW
– Links to a separate page for seminars associated with each speaker (just the key): (seminars) {PK} start, room
4. Seminars admin page.
– Creates a new entry in the seminars table and allows an old entry to be edited:
– (seminars) start, room, duration, title, abstract
– (speakers) {PK} title, firstName, lastName
Conceptual web data model: contd…
– Fortunately 2 entities & no relationships make the ‘rules’ (E&R page 291) easy to check
– Each page that refers to an entity does include the entity’s primary key!
– Each attribute comes with its entity/key.
– No relationships!
– Questions arising from the web conceptual model:
Must we worry about referential integrity?
Later! Either the DB will take care of it or, once the ‘Logical Web Data Model’ has been created (RDBMS tables designed), we can identify the problem areas!
There may be pages in the conceptual web model that do not refer to database entities. If an ‘E-R type’ diagram is drawn at this stage then they must be incorporated (e.g. following Eaglestone & Ridley).
Conceptual web data model: contd…
– In our example the web page ‘entities’ do not correspond with distinct
DB entities.
– We can easily map these relationships using an extended E-R type
diagram or a full-on UML diagram where each page becomes a UML
entity with methods and properties
• I.e. the UML diagram also illustrates what the page does.
– For simplicity/brevity we’ll follow Eaglestone & Ridley and draw a new E-R diagram where:
• Unidirectional arrows indicate hyperlink relationships between entities
(which are usually one way).
• Bidirectional arrows denote ‘uses’
» i.e. a page entity uses a DB entity.
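[Diagram: extended E-R model linking web pages to DB entities]
DB entities: Seminar ‘gives’ Speaker
    Seminar: start, room, duration, title, abstract
    Speaker: title, firstName, lastName, address, email, phone, fax, institution, instWWW, WWW
Page entities: Advertising, Abstract, Seminar admin, Speaker admin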
Logical data model:
› Turn the basic UML E-R model into tables.
NB We added three more fields to speaker in the web data analysis stage … these are reflected in new UML…
› Only two tables means this stage is relatively straightforward:
The 0…n relationship should be modelled like a ‘many to many’ relationship: many speakers may be involved in one seminar; many seminars may be related to one speaker.
We create an intermediate table that represents the relationship, e.g. ‘giving a seminar’:
Table could be called ‘gives’
Inverse is ‘given by’
NB: Referential integrity is not violated by the ‘0’.
Logical data model:
It’s also inconvenient to copy the whole (composite) primary keys from the ‘speakers’ and ‘seminars’ tables to the ‘gives’ table; instead we can create a new key column in each table, e.g. ‘speakerId’ and ‘seminarId’.
Q: Looking ahead to step 6: is there a convenient column type in MySQL for this?
A: {SMALL,MEDIUM,BIG}INT UNSIGNED AUTO_INCREMENT
Q: Is this always a good idea?
A: No! If the application requires frequent lookup conversions between the new ‘ID’ key and the ‘real’ key this leads to inefficient queries.
› E.g. a ‘users’ table for a web authentication system would probably use the email address as a unique identifier and need to convert to/from ID number each time … probably quicker to let SQL take care of the email lookup in this case.
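The trade-off can be seen in a small sketch. It assumes Python's built-in sqlite3 module in place of MySQL: SQLite's `INTEGER PRIMARY KEY` plays roughly the role of an `AUTO_INCREMENT` column here, and the helper names and sample addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# speakerID is the generated surrogate key; email is the 'real' identifier.
conn.execute(
    """CREATE TABLE speakers (
           speakerID INTEGER PRIMARY KEY,
           firstName TEXT NOT NULL,
           lastName  TEXT NOT NULL,
           email     TEXT UNIQUE
       )"""
)

def add_speaker(first, last, email):
    """Insert a speaker and return the key the database generated."""
    cur = conn.execute(
        "INSERT INTO speakers (firstName, lastName, email) VALUES (?, ?, ?)",
        (first, last, email),
    )
    return cur.lastrowid

def id_for_email(email):
    """The extra lookup needed whenever only the 'real' key is known."""
    row = conn.execute(
        "SELECT speakerID FROM speakers WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None

first_id = add_speaker("Ada", "Example", "ada@example.org")
second_id = add_speaker("Bob", "Example", "bob@example.org")
```

If most requests arrive carrying an email address, every one of them pays for `id_for_email` before the surrogate key is of any use.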
[Diagram: logical data model]
Seminar: PK seminarId, room, start, duration, title, abstract
Speaker: PK speakerId, firstName, lastName, title, address, phone, fax, institution, instWWW, WWW
gives: PK,FK1 seminarId; PK,FK2 speakerId
• Association entity models the relationship between our two tables.
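A minimal sketch of the association table in use, assuming Python's built-in sqlite3 module as a stand-in database. The tables are cut down to a key and one descriptive column, and the sample rows and the helper functions `speakers_of` and `seminars_of` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE speakers (speakerID INTEGER PRIMARY KEY, lastName TEXT NOT NULL);
    CREATE TABLE seminars (seminarID INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- One row per (seminar, speaker) pairing.
    CREATE TABLE gives (
        seminarID INTEGER NOT NULL,
        speakerID INTEGER NOT NULL,
        PRIMARY KEY (speakerID, seminarID)
    );
    INSERT INTO speakers VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO seminars VALUES (10, 'Joint talk'), (11, 'Solo talk');
    INSERT INTO gives VALUES (10, 1), (10, 2), (11, 1);
    """
)

def speakers_of(seminar_id):
    """'given by': every speaker attached to one seminar."""
    rows = conn.execute(
        "SELECT s.lastName FROM speakers s "
        "JOIN gives g ON g.speakerID = s.speakerID "
        "WHERE g.seminarID = ? ORDER BY s.lastName",
        (seminar_id,),
    )
    return [r[0] for r in rows]

def seminars_of(speaker_id):
    """'gives': every seminar attached to one speaker (possibly none)."""
    rows = conn.execute(
        "SELECT m.title FROM seminars m "
        "JOIN gives g ON g.seminarID = m.seminarID "
        "WHERE g.speakerID = ? ORDER BY m.title",
        (speaker_id,),
    )
    return [r[0] for r in rows]
```

The speaker with no rows in `gives` is the ‘0’ case: contacted, but not yet booked.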
Logical data model:
We should also check normalisation of tables at this point (although tables from E-R are usually normalised enough…)
› 1NF: Repeating groups?
› 2NF: “A relation that is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key.”
› 3NF: “A relation that is in 1NF & 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.”
… but bear in mind that while normalisation is desirable it can occasionally be useful to ‘denormalise’ tables for efficiency
› This depends on the application and implementation.
Normalise first to gain the benefits then denormalise for efficiency.
An exercise!
› Revise/practice yourselves!
Web data analysis: (public pages)
› The seminars advert page utilises fields (entities and attributes) from the database conceptual model as follows:
[Diagram: page schemas for the public pages; four links are labelled URL]
Seminars (advert page): LIST_OF
    room: STRING; start: DATE/TIME; duration: NUMBER
    seminars.title: STRING; speakers.title: STRING
    firstName: STRING; lastName: STRING
    institution: STRING; instWWW: STRING; WWW: STRING
Abstract (page): Seminars
    the same fields as above, plus abstract: STRING
Web data analysis: (private pages)
› The admin pages necessarily utilise most of the DB fields
[Diagram: page schemas for the admin pages]
Speakers Admin:
    title: STRING; firstName: STRING; lastName: STRING; institution: STRING
    address: STRING; email: STRING; telephone: STRING; fax: STRING
    instWWW: STRING; WWW: STRING; speakerID: NUMBER
Seminars Admin:
    title: STRING; abstract: STRING; room: STRING; start: STRING; duration: STRING
    seminarID: NUMBER; speakerID: NUMBER
Web data analysis:
› We can do much more in this part of the analysis (see Eaglestone & Ridley) including
mock-ups of pages
textual page schema
identifying repeating structures
…
› An exercise for the group project!
These leave the ‘abstract’ layers of conceptual and logical
design behind.
‘Physical’ refers to the actual database and middleware to be
used … this might be influenced in the real world by
› personal preference/experience
› availability/cost
› ‘political’ viewpoints (e.g. open source, also c.f. cost)
› running costs
The seminars example must use open source web
server, database and middleware that runs on Sun Solaris.
› One obvious solution is Apache + PHP + MySQL
(there are others)
Physical database design:
› Specifying the speakers table structure
› CREATE TABLE speakers (
    speakerID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(10) NOT NULL,
    firstName VARCHAR(20) NOT NULL,
    lastName VARCHAR(30) NOT NULL,
    institution VARCHAR(50),
    address TINYTEXT,
    email VARCHAR(50),
    telephone VARCHAR(25),
    fax VARCHAR(25),
    instWWW VARCHAR(100),
    WWW VARCHAR(100)
);
Physical database design:
› Specifying the seminars table structure
› CREATE TABLE seminars (
    seminarID SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    abstract TEXT,
    room VARCHAR(20) NOT NULL,  -- in the logical model but absent from the original listing; length is an assumption
    starts DATETIME NOT NULL,
    duration TINYINT UNSIGNED NOT NULL
);
› CREATE TABLE gives (
    seminarID SMALLINT UNSIGNED NOT NULL,
    speakerID SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (speakerID, seminarID)
);
Physical database design: Foreign keys
› MySQL accepts referential integrity rules via the REFERENCES part of a column definition or FOREIGN KEY directives
› CREATE TABLE gives (
    seminarID SMALLINT UNSIGNED NOT NULL REFERENCES seminars…,
    speakerID SMALLINT UNSIGNED NOT NULL REFERENCES speakers…,
    PRIMARY KEY (speakerID, seminarID)
);
› but this only does anything when working with InnoDB tables, not MyISAM tables, and InnoDB itself enforces only the separate FOREIGN KEY form: a REFERENCES clause written inline in a column definition is parsed and then ignored.
In MyISAM tables the REFERENCES rule is syntax-checked but nothing more …
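What an enforced foreign key buys can be sketched with Python's built-in sqlite3 module, used here purely as a convenient stand-in for an InnoDB table. SQLite also needs enforcement switched on explicitly, which mirrors the point above. The cut-down tables and the `try_insert` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE speakers (speakerID INTEGER PRIMARY KEY);
    CREATE TABLE seminars (seminarID INTEGER PRIMARY KEY);
    CREATE TABLE gives (
        seminarID INTEGER NOT NULL,
        speakerID INTEGER NOT NULL,
        PRIMARY KEY (speakerID, seminarID),
        FOREIGN KEY (seminarID) REFERENCES seminars(seminarID),
        FOREIGN KEY (speakerID) REFERENCES speakers(speakerID)
    );
    INSERT INTO speakers VALUES (1);
    INSERT INTO seminars VALUES (10);
    """
)

def try_insert(seminar_id, speaker_id):
    """Return True if the pairing was stored, False if the database refused it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO gives (seminarID, speakerID) VALUES (?, ?)",
                (seminar_id, speaker_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With enforcement on, a pairing that names a seminar which does not exist is rejected by the database rather than left for the application code to catch.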
From the MySQL manual
› CREATE TABLE parent (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=INNODB;
› CREATE TABLE child (
    id INT PRIMARY KEY,
    parent_id INT,
    INDEX par_ind (parent_id),
    FOREIGN KEY (parent_id)
        REFERENCES parent(id)
        ON DELETE CASCADE
) ENGINE=INNODB;
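The effect of ON DELETE CASCADE can be tried out without a MySQL server. The sketch below assumes Python's built-in sqlite3 module and re-creates the parent/child tables in SQLite syntax, so the column types differ slightly from the manual's listing; the sample rows and the `child_ids` helper are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the rule
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER NOT NULL PRIMARY KEY);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,
        FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE CASCADE
    );
    INSERT INTO parent VALUES (1), (2);
    INSERT INTO child VALUES (100, 1), (101, 1), (102, 2);
    """
)

def child_ids():
    """All child keys currently stored, in ascending order."""
    return [r[0] for r in conn.execute("SELECT id FROM child ORDER BY id")]

before = child_ids()
# Deleting a parent row also removes every child row that points at it.
conn.execute("DELETE FROM parent WHERE id = 1")
conn.commit()
after = child_ids()
```

Applied to the seminars example, a cascade on ‘gives’ would remove the pairings for a seminar when that seminar is deleted.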
Physical web data design:
› Specifying the DB connectivity and how pages refer to each other.
› For example:
Links in the advert page must pass a seminarID value to the abstract page.
The admin pages are similarly linked by seminarID and speakerID values.
› We’ll talk about how this may be achieved next week but will see it briefly in the exercise...
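One common way to pass a key between pages is a query-string parameter. The sketch below shows the idea using Python's standard urllib.parse module; it is an assumption about the mechanism, not the approach the module will necessarily use, and the page name `abstract.php` and both helper functions are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def abstract_link(seminar_id):
    """Build the link the advert page would emit for one seminar."""
    # "abstract.php" is a made-up page name for illustration.
    return "abstract.php?" + urlencode({"seminarID": seminar_id})

def seminar_id_from(url):
    """Recover the key on the abstract page, rejecting anything non-numeric."""
    values = parse_qs(urlparse(url).query).get("seminarID", [])
    if len(values) == 1 and values[0].isdecimal():
        return int(values[0])
    return None
```

Validating the value before it goes anywhere near a query matters because anyone can edit the link by hand.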
Physical web data model:
› The ‘finished’ product.
› So why the massively complex procedure?
Stage 1 allows you to specify what you’ll be doing so the client
cannot ‘change the goalposts’ + it’s measurable.
Stage 2/3 ensures consistency between the real world data and your
model of that data.
Stage 4/5 ensures your model is (kind of) optimal
Stage 6/7 separates the programming from the above stages … but
it’s made easier because of the earlier stages!
Without conceptual models entities/connections/
relationships can be missed or incorrectly represented.
E.g. The seminars database example
› Can you spot the problem? Hint: What’s the speaker-seminar
relationship?
› Originally envisioned for single-speaker seminars …
› Without the association entity (table) ‘gives’ it’s very difficult to
associate n speakers sensibly with 1 seminar!
It’s possible to cheat and glue an ‘other speakers’ field into the seminar
table … but that’s not “normal” and is bad design.
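A small sketch of why the glued-on field is awkward, assuming Python's built-in sqlite3 module; the `seminars_flat` table, its rows and the `flat_talks_by` helper are invented for the illustration. With names packed into free text, finding a speaker's seminars degrades to pattern matching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- The 'cheat': extra speakers squeezed into one free-text column.
    CREATE TABLE seminars_flat (title TEXT, speaker TEXT, otherSpeakers TEXT);
    INSERT INTO seminars_flat VALUES ('Joint talk', 'Lee', 'Ann, Bob');
    INSERT INTO seminars_flat VALUES ('Solo talk', 'Joanne', NULL);
    """
)

def flat_talks_by(name):
    """Best available search on the flat table: substring matching."""
    pattern = "%" + name + "%"
    rows = conn.execute(
        "SELECT title FROM seminars_flat "
        "WHERE speaker LIKE ? OR otherSpeakers LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return [r[0] for r in rows]
```

Searching for 'Ann' also returns Joanne's solo talk, because the substring matches inside her name; a key-based join through an association table has no such ambiguity.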
Because the conceptual data model represents
the ‘real world’ it can often be used to ‘repurpose’
the data … I.e. provide a different interface
(perspective)
› E.g. The seminars database lends itself to:
Print format advertising (e.g. PDF or RTF)
XML distribution of seminar data.
RSS dissemination of ‘forthcoming events’.
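As a rough illustration of repurposing, the same rows that feed the web page can be written out as XML. The sketch assumes Python's standard xml.etree.ElementTree module; the element and attribute names, the `seminars_to_xml` helper and the sample row are made up and do not follow any particular schema.

```python
import xml.etree.ElementTree as ET

def seminars_to_xml(rows):
    """Serialise (title, room, starts) tuples as a small XML document."""
    root = ET.Element("seminars")
    for title, room, starts in rows:
        item = ET.SubElement(root, "seminar", {"room": room, "starts": starts})
        ET.SubElement(item, "title").text = title
    return ET.tostring(root, encoding="unicode")

# In practice the rows would come from a query on the seminars table.
xml_text = seminars_to_xml([("Next talk", "Room B", "2004-03-10 14:00:00")])
```

Only the output step changes; the data model underneath is untouched.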