84
Databases http://www.lightenna.com/book/export/s5/101 1 of 84 25-2-07 4:23 pm Databases The Database Training course is structured into five sections. 1. Data models This is the Data models course theme. In this section we introduce concepts for modelling data and language for communicating those models. [ Complete set of notes PDF 870Kb]. 1.1. Introduction Databases are pervasive in modern society. So many of our actions and attributes are logged and stored in organised information repositories, or Databases [ Section notes PDF 227Kb]. 1.1.01. Databases Where do we come into contact with databases? Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories, budgets Department Students, courses, staff, payroll These are all examples of relatively simple databases. All of the information is textual or referential. 1.1.02. New technologies Not just traditional, numeric/textual Research 70s biased Digital media Video servers (atom/bbc/youtube) Multimedia databases Web site, collection of diverse data types Google, AltaVista Stock Exchange Futures, Currency markets, trends Databases comprising not only data, but modelling algortihms Microsoft's WinFS Databases don't have to store just text. Increasingly Database servers are storing, indexing and delivering rich-media content, explicitly images, audio and video.

Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

  • Upload
    others

  • View
    12

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

1 of 84 25-2-07 4:23 pm

DatabasesThe Database Training course is structured into five sections.

1. Data modelsThis is the Data models course theme. In this section we introduce concepts for modelling data andlanguage for communicating those models.[Complete set of notes PDF 870Kb].

1.1. IntroductionDatabases are pervasive in modern society. So many of our actions and attributes are logged and stored inorganised information repositories, or Databases[Section notes PDF 227Kb].

1.1.01. DatabasesWhere do we come into contact with databases?Supermarket inventories/EPOS

Supplies, shopping habits, store locations, accountsFilms (iMDB)

Cast lists, shooting schedules, histories, budgetsDepartment

Students, courses, staff, payroll

These are all examples of relatively simple databases. All of the information is textual or referential.

1.1.02. New technologiesNot just traditional, numeric/textualResearch 70s biasedDigital media

Video servers (atom/bbc/youtube)Multimedia databases

Web site, collection of diverse data typesGoogle, AltaVista

Stock ExchangeFutures, Currency markets, trendsDatabases comprising not only data, but modelling algortihms

Microsoft's WinFS

Databases don't have to store just text. Increasingly Database servers are storing, indexing and deliveringrich-media content, explicitly images, audio and video.

Page 2: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

2 of 84 25-2-07 4:23 pm

Microsoft's next generation File storage system (WinFS) is a relational database. From a user perspective,searching (the process of indexing content by keyword) is already mainstream. Users are moving awayfrom rigid directory structures (files and folders) and towards keyword-tagged content.

1.1.03. VariationsNot only in roleSize

Video server exampleComplexity

Ford Puma™ exampleFord staff organised by production line and care.g. Each staff member answers to a 'part' manager (engines, bodyshell, chassis) and a 'car' manager (Puma, Mondeo, Ka)

ExpenseUser profiles and demands

We've seen that databases are used in a variety of contexts. Those roles imply properties of each of thesystems. An interlinked text-only database (such as Unix/Linux's MAN pages) will require much lessstorage than a video archive.

Some databases are perceptually more complex. Ford's staff management model would be represented as amatrix (in this case 2 dimensional). Computers are very good at organising multi-dimensional space.

1.1.04. DefinitionEmbrace diversityData and semanticDatabase: related data, implicit meaningSample/subset of real world

real world with boundsMiniworld/Universe of Discourse

A single definition of a database is hard to come by. Dictionary.com defines a database as: a comprehensive collection of related data organized for convenient access, generally in a computer. The Wikipedia definition runs for several pages.

1.1.05. AbstractionPrevious lab exercises

Problem: reading data from a fileAbstraction themeBasis of good OOP and further good P

from encapsulation to software component analysisLayering, splitting data from design

Solution: grammar (language guide) and data

In some of your previous lab assignments, or practical experience, you may have been faced with the problem of caching information persistently in a file, later to be reloaded.

Page 3: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

3 of 84 25-2-07 4:23 pm

When writing the data into the file, we are storing more than just that information. We are storingimplicitly a design/grammar for that data. That implicit design is evident when accessing the file with anaive interface. If you try to read the data out in a different order, it fails.

A better solution is to split the way the information is stored from the actually information stored.

1.1.06. Data-design divideLeft-hand/right-hand divideLHS: Catalog

or Meta-dataor Intensionor Schemai.e. the Design of databaseTypes of data, organisation, constraints

RHS: Extensionor SnapshotThe data itselfInformation stored in the databaseTuples

1.1.07. DBMSDataBaseManagementSystemCollection of programs that enable users to:

Define - patterns, boundaries, designConstruct - populate to go liveManipulate - runtime changes

data in a structured, organised store.

1.1.08. Database Management Systems

Page 4: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

4 of 84 25-2-07 4:23 pm

PropertiesData models and independence

RequirementsCategorisation of Database users

DBMS componentsArchitecture

When considering the database systems as a whole, we need to look at all the components, including elements that interact with the DBMS (users, whom we categorise for simplicity).

This course will contain a discussion of the components that make up the system and the way they interact(system architecture).

1.1.09. ModelsData modelsRelational data model (Oracle)Object data model (ObjectStore)Legacy systems

Hierarchical data modelNetwork data model

A data model is an invention. It is a construct that allows us to share an understanding of how the systemworks. As with all good constructs, it's an abstraction; a simplification; a story.

In this course we're going to look at the Relational model, where the database is organised into tables (relationals) and each row (tuple) within that relation is coded (keyed) to allow referencing between the relationals.

The Relational model, inspite of being innovated in the 1970s is still the most popular, underpining mainstream modern databases such as Oracle 10i and MySQL 5.0

As programming languages are becoming increasingly Object orientated, programmers require a means of persistently storing their Objects. Object Orientated Databases (OODBs) exist to fulfil this purpose.OODBs may ultimately replace relational databases, but it's not clear at this stage when.

1.1.10. Other propertiesBeside Data modelNumber of usersNumber of sitesCostTypes of access pathsGenerality, or inversely specificity

MySQL is a highly general database system, in that it supports many different designs. My mobile phoneaddress book is a highly specific database system and as such is not easily extensible.

1.1.11. Independence

Page 5: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

5 of 84 25-2-07 4:23 pm

Based on File processingData definition implicit in

DataApplication program

Example of one specific databaseStructure embedded into access program

Coursework code re-use example

Earlier I made mention of this problem. Databases tie into the wider Software Engineering field. WithinSoftware Engineering, post-development issues of code re-use, maintenance, future evolution etc. necessitate a logical flexible approach to program design. Databases are such an approach. In order tostore information in a database you invest a small amount of time in explicitly structuring it, however you then get things like flexibility (data independence etc.) for free.

1.1.12. Program-data independenceGeneral databases

Separate Data definition and dataCatalog/meta-data & tuples

Data format/structure stored separatelyProgram-data independence (e.g. Y2K)

Changes in data formatAlter data (tuples)Alter grammar (catalog/meta-data)

We actually split program-data independence into:LogicalPhysical

...more in a second

1.1.13. Program-operation independenceIn object-orientated databases

Objects consist of attributes and operationsOperation defined by

Header/Interface/Prototype or SignatureImplementation

Program-operation independenceImplementation change hidden from user

Collectively data abstraction

1.1.14. Logical and physical program-data independence

Three tier data-model diagramMappings, Data independence

Logicalbetween conceptual and externalchanges to conceptual without changing

external schemas or application programs

Page 6: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

6 of 84 25-2-07 4:23 pm

Physicalbetween internal and conceptualchanges to internal without changing

conceptual or external schemas

1.1.15. DBMS RequirementsDataBase Management System

Abstraction (i.e. program-data independence)Conceptual representation (data models)Multiple views and User InterfacesData sharing and transaction processingAccess restrictionRedundancy removal/optimisationPersistent storage (Program objects) & IntegrityRelationship management & InferenceBackup and recovery

Here's a summary of what we need from a DBMS

1.1.16. Database UsersDatabase as 1y resource, DBMS as 2yDatabase administrators (DBA)Database designersEnd users

Naïve - canned transactions e.g. bank/airlineSophisticated - engineers, scientists, query editorsStand-alone (personal databases/MS Access)

1.1.17. Results of useKnock-on effects of database approachEnforcing standardsReduced Development timeAdaptive to change (design changes)

Page 7: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

7 of 84 25-2-07 4:23 pm

Up-to-date information (live database)Economies of scale

centralising commonly required resources

...and this is what you get for free. These are the consequences, largely positive, of adopting a databaseapproach to an information storage problem.

1.1.18. Design sideMeta-data

Database schema, intensionData ModelLeft hand side of database divide

Schema diagramEntity-relationship (ER diagrams)UML diagrams

1.1.19. Data sideData, under the column headingLess easy to look at (volume issue)Fundamentally less interesting (more specific)Variety of tools for looking at it:

HeidiSQL, PhpMyAdmin, SwordHere's what a snapshot looks like:

Page 8: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

8 of 84 25-2-07 4:23 pm

1.1.20. Data modelData model

Structure of the databaseCollection of basic operationsCollection of behaviours/user defined operations

Dependent on level of abstractionTier diagram

External (user views)Conceptual*Internal

1.1.20. General data model termsEntityRelationshipAttributesKeys

What follows here is an introduction to the terms which make up the language that we use to describe datamodels.

1.1.21. EntitySelected from real worldPopulate Miniworld/UoDEntity is an approximationTwo elements: Entity types and setsCollection of attributes

Object similarity, classes as entity typesEntities inter-relate

Page 9: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

9 of 84 25-2-07 4:23 pm

1.1.22. AttributesData type, domainSimple or compositeSingle or multivaluedStored or derivedNull

1.1.23. KeysMechanism for unique identificationUniqueness constraintStrong and weak entitiesKey attributeComposite keysMultiple keys

1.1.27. RelationshipsTypes and SetsParticipation by EntitiesDegree - e.g. binaryCardinality ratios - e.g. 1:1, 1:M

Determine occurrenceTuples example

Foreign keys

Page 10: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

10 of 84 25-2-07 4:23 pm

1.1.28. Database architectureClient-ServerDistributed databases

Fragmentation by attribute/tuple/relationLanguage and description

Storage Definition Language (SDL) DESIGNData Definition Language (DDL) DESIGNView Definition Language (VDL) DESIGNData Manipulation Language (DML) DATA

High level, can be embedded but precompiledProcedural, record-at-a-time, requires high level support

1.1.29. Structured Query

Page 11: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

11 of 84 25-2-07 4:23 pm

Structured Query Language (SQL or SEQUEL)Success of relational databases

Developed for SystemR at IBMANSI standardised

SQL1 or SQL-86, ongoing extensionSQL2 or SQL-92, current versionSQL3 (1999), SQL2003DDL, DML(low-level) and VDL

1.2. Data modelsA data model is a model that describes in an abstract way how data are represented in a business organization, an information system or a database management system - Wikipedia[Section notes PDF 310Kb].

1.2.01. IntroductionRelational modelAbstract operations on relations

Set theoretic operationsRelational-specific operations

Basic algebra operationsUnion, Intersection, DifferenceCross product

1.2.02. Relational modelTable as relationRow as tuple

real world entity or relationshipfact

Column as attributeDomain

The concept of a relation is abstract, therefore we have a number of different ways of visualising it.

Page 12: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

12 of 84 25-2-07 4:23 pm

1.2.03. RelationRelation schema R(A1, A2, A3.. An)

Design sideAssertion/declaration

Relation stateData sideset of n-tuples

each one an ordered list of values1NF: each value is atomic, no composite/multivalue

1.2.04. Abstract operationsDatabase lifecycle

design, populate, evolveInsert

tuple (a1,a2,a3…an)Delete

tuple (a1,a2,a3…an)Update (or modify)

tuple (a1,a2,a3…an)attribute to change, new value

Page 13: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

13 of 84 25-2-07 4:23 pm

All the operations described in the next few sections are abstract. We're going to see how valuable theycan be in processing real world data later.

1.2.05. Basic algebraTwo categories

Set theoretic operationsUnion, Intersection etc.

Relational specificSelect, project and join

At this stage we're talking about set theoretical operators on the Relational model, not SQL instructions which confusingly have identical names and only similar behaviour.

1.2.06. Select operationSELECT a subset of tuples from a relation

Uses selection conditionEvaluate each tuple to true of falseFalse tuples discarded

Sigma (s)output = s(cond)(input_relation)Relation schema: R(output) = R(input_relation)Commutative

1.2.07. Project operationPROJECT a subset of attributes for all tuples from a relation

Page 14: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

14 of 84 25-2-07 4:23 pm

Pi (p)p<attribute list>(R)

If sublist is only non-key attributesmight get duplicates

Removes duplicatesAttribute list:sublist example

The result set of the operation is itself a relational. That output relation will contain the same number ofrows as the input, however it may contain a different number of columns; fewer if a subset of attributes is projected; more if derived or aggregated attributes are included.

1.2.07. Sequences of operationsSelect followed by projectionArea clipping: rows then columnsp<attr list>(s(select_cond)(R))Rename operation (r)

Renames attributes list2 from list1r(new_attr_names)(R)

1.2.08. Rename operationAttribute renaming only

Cannot alter domain, or add/remove attrRename operation (r)

Renames attributes list2 from list1r(new_attr_names)(R)

Implicit renaming

Page 15: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

15 of 84 25-2-07 4:23 pm

Order dictated by relational schema

1.2.08. Set TheoreticBinary operation: two relations

Sets of tuplesUnion compatibility (same attributes)

Union (R u S)Intersection (R n S)

Commutative (R u (S u T) = (R u S) u T)

1.2.08b. Set differenceSet difference

Non-commutative (R-S != S-R)

1.2.09. Cross productCartesian product of two relationsR x S

Page 16: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

16 of 84 25-2-07 4:23 pm

Also known asCross productCross join

Cross product diagramIntroduction to complexity

Computationally explosive

1.2.10. Relational algebra/model notationRelational schema R(A1, A2,…,An)Relation state r or r(R)

Set of unordered tuplesr = {t1, t2,…,tn}

Each n-tuple is an ordered list of valuest = <v1, v2,…,vn>

ith value in t = vi called t[Ai]r(R) subset of (dom(A1) x dom(A2)... x dom(An))

1.2.11. ConstraintsDomain constraint

For all v in t of r(R)vi is an element of dom(Ai)

Entity constraintK = SKmint[K] != null

Key constraintSuperkey SK as identifying subset of attributest1[SK] != t2[SK]

1.2.12. Referential integrityGiven two relations R1 and R2

R1 contains a foreign key (FK) that referencesA primary key (PK) in R2

Page 17: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

17 of 84 25-2-07 4:23 pm

R1 referencing relation, R2 referenced relationShared domains: dom(FK) = dom(PK)Foreign exists: t1 in r(R1), t2 in r(R2)

t1[FK] = t2[PK] || NULL

1.3. JoinsIn this lecture we look at...[Section notes PDF 233Kb].

1.3.01. IntroductionRecap: pulling data out of individual relations

By row, by columnSelect and project

Access across multiple relationsMiniworld approximation

Fragmenting entities by cardinalityTuples as entity fragmentsRelationships within relations

JoinsJoin types (condition and unmatched)

1.3.02. Access across relationsRelational model allows multiple relations to exist within one database schemaRelations can be accessed individually or together (joins).Referential integrity

Relations relatingPulling data out of single relations

Select and projectPulling related data out of

Multiple relations using Join

1.3.03. Miniworld approximationUniverse of Discourse, or MiniworldMiniworld is an incomplete model of the real worldThe relational data model as a model for the miniworldApproximation

Separate and distinct entitiesSingle complex entitiesSeparate related entitiesCardinality of relationships

Each relation made up of attributesValues can be used as references

Page 18: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

18 of 84 25-2-07 4:23 pm

1.3.04. Pointing mechanismRelation has a Primary keyTuple contains Primary key valueForeign keys

Tuples can contain a reference to another relation's Primary keyJust numbers

One number identifies a single tuple in one relation (local), one number identifies a single tuple in anotherrelation (foreign).

1.3.04b. Pointing mechanism example in CC programming languageMemory addresses, or pointers

int a=0;int b=0;a = &b;

a points to b

In databases, typically done with unique identifiers (IDs) rather than memory addresses.

1.3.04c. Pointing mechanism with structuresForeign key importing

typedef struct car{ int ID; char[] make; char[] model; char[] derivative;

Page 19: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

19 of 84 25-2-07 4:23 pm

int optionID;} car;

typedef struct option{ int ID; char[] name; int price;} option;

car c;option o;//...data structure populatingc.optionID = o.ID;

1.3.05. Relational cardinality1:0 relationships

Single entityUniquely indentifiableCandidate keysPrimary Key

1:1 relationshipsTwo entities, A and B1 A relates to 1 B and vice versa

1:N relationshipsM:N relationships

1.3.06. Relationships in the relational modelTwo relations, A and BA side, B side, 1 side, N side1:1 relationships

Key can go on either side

1:N relationshipsKey cannot go on 1 sideHas to go on N side

Page 20: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

20 of 84 25-2-07 4:23 pm

M:N relationshipsNowhere obvious for the key to goCreate new pairing relation

1.3.07. JoinsPhase change, different point in lifecycleJoin operation

Combines related tuples, conditionallyFrom two relationsInto single tuples

Allows processing of relationshipsAmong multiple relations

1.3.08. Joins, canonical algebraic formConditional (on join condition)

Only combines tuples where trueCartesian product (conditionless)

example of conditionless joinall tuples combinedR �true S

�, Binary operatore.g. R �<join_condition> S

1.3.09. Join equivalenceEquivalent to sequence

Cartesian product (X)followed by Selection (s)

ACTUAL_DEPENDENTS =sSSN=ESSN(EMPNAMES X DEPENDENT)orACTUAL_DEPENDENTS =EMPNAMES � SSN=ESSN(DEPENDENT)

1.3.10. Join types (condition)

Page 21: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

21 of 84 25-2-07 4:23 pm

Theta: Ai q Bj(A from R, B from S)

q is comparison operator=,<,>,!=,>=Ai and Bj share the same domain

Equi: Ai = BjTheta join where q is =

Natural: Ai and Bj are the same attributein two separate relations (name and domain)* denotes natural joine.g. EMPNAMES * DEPENDENTS

1.3.11. Join types (inner and outer)Inner joins

not the only joinseliminate tuples without a matching counterparti.e. tuples with a null value for the join attribute are discarded

1.3.12. Outer joinsOuter joins control what's discarded

Keep unmatched tuples in eitherLeft, right, or both relationsLeft, right of full outer join correspondingly

Page 22: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

22 of 84 25-2-07 4:23 pm

1.4. ER diagramsIn this lecture we look at...[Section notes PDF 75Kb].

1.4.01. ER Diagrams and Relational mappingDesign communication techniques

ER diagramsER to relational mapping

Entities to ObjectsType Inheritance

EER diagramsUML

Web DB Integration

1.4.03. Design in the modern contextTeam based developmentDocumentation

Value of design over descriptionDB sketching (left hand side)

Concept more important than perfectionDesign iteration

Mini-world as approximationCategorisation to create entitiesVerb’ing to create actions/relationships

1.4.04. Database Left:right divideDesign

Catalog, Meta-data, Intension, or Database schemaEntity typeRelationship type

StateSet of occurences/instances, Extension, snapshot

Page 23: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

23 of 84 25-2-07 4:23 pm

Entity setRelationship set

1.4.05. Basic ER diagramTypically part of a system(Strong) Entities

ProductCustomerPayment

RelationshipsSale

1.4.06. Mapping ER to Relation DB tablesIntuitive mapping

Entities as tables

Page 24: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

24 of 84 25-2-07 4:23 pm

Attributes as columnsRelationships are more difficultKey sharing mechanism

Foreign key references primary keyWhere to put the foreign key forms the intuitive guide to the rest ofthe mapping

1.4.07. ER to Relational mappingStep-by-step approach

Strong entities1.Create relation including (simplified) attributes

Weak entities2.Create relation inc. attr, foreign/pri key of owner

Binary relationship S:T, 1:13.Choose relation, say S (with total participation) and inc. foreign/prikey of Tinc. relationship attributes

1.4.08. CardinalitySpecifies number of relationship instances a single entity can participate inS:T (1:1)An entity from table S can is related to one, and only one entity from table T1:1, 1:N, N:M

DEPARTMENT : EMPLOYEEEMPLOYEE : EMPLOYEEPROJECT : EMPLOYEE

1.4.09. ER to Relational mappingBinary relationship 1:N5.

Choose relation T (N-side) inc. foreign/pri key of SBinary relationship M:N6.

Create relation, inc. foreign/pri keys of S&TMultivalued7.

For each mv_attr, create new relation, inc. foreign/pri key of parentn-ary relationship8.

Create new relation, inc. all foreign/pri keys of participatingentities

1.4.10. ParticipationParticipation constraintsExistence of an entity dependant upon

being related to another entityvia relationship type (left hand/design)

Page 25: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

25 of 84 25-2-07 4:23 pm

Total (")/Existencedependency (double line)

Every student must be in a facultyFor every entity in the total set of students

Partial ($) (single line)Some students are student_representativesThere exists some entity(s) within the set of all…

1.4.11. Library exampleLibrary exampleEntities: Person, Book, LibrarianTime-perspective implications

1.5. UMLIn this lecture we look at...[Section notes PDF 86Kb].

1.5.01. UML diagramsNot just one type of diagramER Entities -> UML objectsAdds scope to include methods

Page 26: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

26 of 84 25-2-07 4:23 pm

1.5.03. Simple UML relationship diagramClasses

AttributesMethods

RelationshipsParticipationCardinalityRoles

1.5.04. UML inheritanceNotion of inheritance

Java parallel'is a' relationshipsDatabase Student

is a Computer Science Studentis an Engineering Faculty Student

is a Studentis a Person

Inheritance hierarchiesUML diagrams can be used to show inheritance

Page 27: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

27 of 84 25-2-07 4:23 pm

1.5.05. UML diagramClassesRelationshipsInheritance

2. DBMS SystemsThis is the DBMS Systems course theme.[Complete set of notes PDF 482Kb]

2.1. Queries

Page 28: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

28 of 84 25-2-07 4:23 pm

In this lecture we look at...[Section notes PDF 319Kb]

2.1.01 IntroductionMethods of getting data outThe need for queriesQBESQL (design side)

HistorySchemas and relations (CREATE)Data types and domainsDROP and ALTER

2.1.02. Querying interfacesHigh (view) levelQuery-By-Example (QBE)

Alternative to SQLTable driven (visually similar to relations)Rather than script driven, hence intuitive over learnedVisual or text based

User fills in templatesMicrosoft Access approach

2.1.02b. QBE visual exampleRecord advancingQuery designing

Finite domain attributesWeb search parallel

Page 29: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

29 of 84 25-2-07 4:23 pm

2.1.03. QBE text-format exampleP. print, I. insert, D. delete, U. update_VARNAME, copy field value into variable

2.1.05. SQLStructured Query Language

(SQL or SEQUEL)Wikipedia reference

Success of relational databasesDeveloped for SystemR at IBMANSI standardisedSQL-1986 (SQL1), ongoing extensionSQL-1992 (SQL2), current version (Oracle 9i)SQL-1999 (SQL3), regular expression matching, recursive queriesSQL-2003, XML features, auto-generated columns

2.1.05b. SQL command syntaxWhere follows here is a brief summary

Oracle syntaxSimilar but not identical to MySQL/MSSQL

General familiarityQuery writing best learnt by doing itLecture live-exampleCoursework 1 will be SQLOracle (9i) SQL referenceMySQL (5.0) SQL reference

2.1.05c. SQL in application

Page 30: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

30 of 84 25-2-07 4:23 pm

Keyword oriented languageKeywords not congruous with Relational modelLots of different ways to write SQL

Analogous to C/Java formattingif (b==2) { a=1; } else { a=0; }

Recommend using case to differentiate attributes and keywordsSELECT colour, size, shape FROM fruit WHERE weight>22;

Oracle user accounts on Teaching databaseNamespace references, e.g. shared.cars

2.1.06. SQL Create schemaData definition commandsCREATE

SCHEMA <schema_name> AUTHORIZATION <a>or workspace

Beware of namesName collisions produce odd behaviours

SQL Schema embraces Tables (relations), constraints, views, domains, authorizations

2.1.07. SQL Create tableCREATE TABLE

<schema_name>.<relation_name>(

<attribute_definitions><key><constraints>

)CREATE TABLE example (Oracle)Tables can (and should) be indexed by usere.g. <username>.<tablename>Normal login implies usernameNon-local table access

2.1.08. Data types and domains (Oracle)Numeric

ENUMNUMBER, NUMBER(i), NUMBER(i,j)

Formatted numbers, i precision, j scale(number of digits total, after decimal point)

Character-stringCHAR(n) - n is lengthVARCHAR2(n) - n is max

DESCRIBE output exampleMulti-database comparison of DatatypesDatabase legacy: limited storage necessitated efficient storageDoes it need to be efficient anymore?

Page 31: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

31 of 84 25-2-07 4:23 pm

You might consider all SQL types as being conceptually similar to attribute types in the relational model, although in reality the implementation of these types in a DBMS only approximates the mathematical purity of unordered domain sets etc.

2.1.08b. Data types and domains (MySQL)Numeric

TINYINT, INT, INT UNSIGNEDFLOAT, DOUBLE, DECIMALENUMCharacter-string

CHAR(n) - n is lengthVARCHAR(n) - n is maxTINYTEXT, TEXTBeware different default/maximum lengths to Oracle

BLOBMulti-database comparison of Datatypes

2.1.09. Time-based data typesDate and Time

DATETen positions, components YYYY-MM-DD

TIMEEight positions, components HH:MM:SS

TIME(i)Time fractional seconds precisionAdds i+1 positions

TIMESTAMPoptionally WITH TIME ZONE

Very sensitive to syntactical ambiguitiesday/month/year/hour/minute separators

2.1.10. DROPingDROP <object> <obj_name> <flags>DROP SCHEMA <schema_name> CASCADE

drops all workspace tables, domainsDROP TABLE <relation_name> RESTRICT

only drops table ifnot referenced in any constraints/views

Notion of cascadingTable links

2.1.11. ALTERingSchema evolutionDesign side

Page 32: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

32 of 84 25-2-07 4:23 pm

ALTER TABLE <schema_name>.<relation_name> ADD<var_name> <var_type>;Example

ALTER TABLE uni.student ADD hall VARCHAR(32);Upper and lower case syntaxNaming conventions

2.1.12. QueriesHelper interfaces

HeidiSQL/phpMyAdmin/Sword/SQLplusDesign/perform a lot of routine queries for youImportant to learn SQL, reinforcementDesigning select queries is more difficultVisual interfaces still lacking in this area

Select queries in SQLBasic singletsRenamingQueries with JoinsNested queries

2.1.13. SQL QueriesSELECT statementSimilar to relational data model SELECT then PROJECT

SELECT <attribute list>FROM <table list>WHERE <condition>;

Page 33: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

33 of 84 25-2-07 4:23 pm

2.1.14. SQL QueriesSELECT <attr_list>

FROM R,S,TWHERE DNO = 10

equivalent top<attr_list>(sDNO=10 (R X S X T))True-false evaluation tuple by tupleWHERE clause as compound logical statement

2.1.15. SQL QueriesProduces a relation/set of tuplesCan be used to extract a single tuplee.g. SELECT bday, age

FROM studentWHERE fname='Tim' AND lname='Smith'Result = (13-05-80, 20)

Argument quoting (')SQL poisoningNot nullNot numeric values

MySQL Attribute quoting (`)Hypothetical attribute `all`, all, and ALL

SQL poisoning is a vulnerability exposed by inadequate escaping of arguments/variables used to compose SQL queries.

E.g. Tim in previous example, could be Tim'; DELETE FROM student;' SELECT * FROM student WHERE 1

2.1.16. Renaming and referencingAS keyword(Partial) Attribute renaming in projection list

SELECT fname AS firstName, minit, lname AS surname...Role names for relations

SELECT S.FNAME, F.FNAME, S.LNAMEFROM STUDENT AS S, STUDENT AS FWHERE S.LNAME=F.LNAME

(Total) Attribute renaming in FROMSELECT s.firstName, s.surname

FROM student AS s(firstName,surname,DOB,NINO,tutor)Wildcards (SELECT s.* FROM...)

2.1.17. SQL TablesRelations are bags, not sets

Page 34: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

34 of 84 25-2-07 4:23 pm

e.g. projection of non-key attributesSet cannot contain duplicate item/repetitionDuplicates exist in bags and be:

SELECT DISTINCT (eliminated)SELECT ALL (ignored/kept)

2.1.18. Queries and JoinsRelational database allows inter-related dataSQL select FROM gives Cartesian productWHERE clause defines join condition

SELECT proj.pnum, mgr.ssnFROM project AS proj, employee AS mgrWHERE proj.mgrssn = mgr.ssn;

Alternatively, explicitly define join (note type)SELECT project.pnum, employee.ssnFROM project INNER JOIN employee ON project.mgrssn = employee.ssn;

2.1.18b. Outer joinsOuter joins are crucial in the real-worldDatabases often contain NULLs (3VL)Analysis of where the crucial data is across a relationshipPrevious example, only get project data for managed projects

SELECT project.*, employee.*FROM project INNER JOIN employeeON project.mgrssn = employee.ssn;

2.1.18c. Outer joins (cont)Scale of loss isn't always instantly obviousNULLs often used unpredicablyMay want project information, even if no employee attached as manager

Page 35: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

35 of 84 25-2-07 4:23 pm

SELECT project.*, employee.*FROM project LEFT OUTER JOIN employeeON project.mgrssn = employee.ssn;

2.1.19. 2y and 3y joinsQueries can encapsulate any number of relations

Even one relation many times (in different roles)Relationship chainAcross many relations

Tuples as Entities OR Relationshipse.g. Employee -> Works_on -> Project -> Department ->Manager

2.1.20. Recursive closureCan’t be done in SQL2Recursive relationshipsUnknown number of stepsSQL2 can’t generalise in single query

2.1.21. Nested queriesEssential one or more (inner) queries within an (outer) queryInner and outer queryNot to be confused with inner and outer joinsInner query can go in three places

SELECT clause (projection list)Must return a single value, then aliased as attribute in outer result

FROM clause

Page 36: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

36 of 84 25-2-07 4:23 pm

Inner query result used as standard table in FROM cross productWHERE clause

2.1.21b. Nested query exampleUse of query result as comparator for other (outer) query

SELECT DISTINCT courseFROM dept WHERE course IN (

SELECT d.courseFROM dept AS d, faculty AS f, student AS sWHERE d.ownfac=f.id AND s.owndept=d.idAND f.name='Eng' AND s.year='3'

) OR course IN (SELECT courseFROM deptWHERE code LIKE 'COMS3%');

2.1.22. Bridging SQL across 3 tiersThree tier database designChanging role of DBMSIndicesAggregate functions (conceptual)

Over bags and sub-bagsCreating and updating views (ext)SQL embedding

In this subsection we look at the different roles SQL play across the three tiers of database design. Wediscuss the areas in which SQL is lacking and how those difficiencies can be complemented by embedding SQL in other languages.

2.1.25. IndicesLow/Internal levelIndex by one attributeFor queries selecting by that attribute:

Page 37: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

37 of 84 25-2-07 4:23 pm

Faster tuple access (ordered tuples)Reduces database memory load

Small cross product relation, only crosses requisitesAccelerates query resolution time

CREATE INDEX Index_Name ON RELATION(Attribute);

2.1.26. Aggregate functionsRun over groups of tuplesTakes a projected attribute list as an argumentProduce relation with single tupleSUM, MAX, MIN, AVG, COUNTe.g. AggFunc over all tuples

SELECT SUM(SALARY), MAX(SALARY), MIN(SALARY)FROM EMPLOYEE;

Single attribute lists (distinct values)Multi-attribute lists (granularity of distinct values by pairing)

2.1.27. Aggregates over sub-bagsCan run over subsets of tuplesGROUP BY keywordSpecifies the grouping attributesNeed to also appear in projected attr_listShow result along side value for group attre.g. AggFunc over subgroups

SELECT dno, COUNT(*)FROM employeeGROUP BY dno

Quick SQL check, do all attributes in the SELECT projection list appear in the GROUP BY projection list.

2.1.28. Creating viewsViews are partial projectionsVirtual relations, or views of live relationsUpdate synchronised

CREATE VIEW <virtual_relname>AS <real_relation>

Real relation could be a query resultClever bit is the change propagationUPDATEs made to the view dataset are flooded back to relations

INSERT and DELETE behaviour needs to be definedNon-trivial as INSERT into view (virtual relation) may leave holes in real relation

2.1.29. Embedding SQL

Page 38: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

38 of 84 25-2-07 4:23 pm

SQL (alone) can do lots of clever things in one expressionBut can only execute a single expressionCan structure SQL commands into proper programming languagesJava Database Connection (JDBC)

javac, then java VMCOBOL, C or PASCAL

precompiled with PRO*COBOL or PRO*CProcedure Language (PL/SQL)

Oracle/MySQL procedural languageStored procedures can take parameters

2.2. InternalsIn this lecture we look at...[Section notes PDF 180Kb]

2.2.01. IntroductionDatabase internals (base tier)RAID technology

Reliability and performance improvementRecord and field basicsHeaders to hashingIndex structures

2.2.01b. Machine architecture (by distance)Distance from chip determines minimum latencySpeed of light is a constantImpact of bus frequencies

IDE (66,100,133 Hz)PCI, PCI-X (66,100,133 Hz)PCI Express (1Ghz to 12Ghz)

Impact of bus bandwidthsPCI (32/64 bit/cycle, 133MB/s)PCI Express (x16 8.0GB/s)

Page 39: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

39 of 84 25-2-07 4:23 pm

Here's a link from Intel showing a machine architecture with signal bandwidths: Intel diagram

2.2.01c. Machine architecture (by capacity)Capacity increased with distanceStaged architecture as compromiseSpeed, time/distanceAlso cost, heat, usage scale

2.2.02. Database internalsStored as files of records (of data values)

Auxiliary data structures/indices1y and 2y storage

memory hierarchy (pyramid diagram)volatility

Online and offline devicesPrimary file organisation, records on disk

Heap - unorderedSorted - ordered, sequential by sort keyHashed - ordered by hash keyB-trees - more complex

Page 40: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

40 of 84 25-2-07 4:23 pm

2.2.03. Disk fundamentalsDBMS task

linked to backup1y, 2y and 3y

e.g. DLT tapeChanging face of current technology

Impact of inexpensive harddisksFlash memory devices (CF, USB)

Random versus sequential accessLatency (rotational delay) andBandwidth (data transfer rate)

2.2.04. RAID technologyRedundant Array of Independent DisksData striping

Blocks (512 bytes), bits and transparencyReliability (1/n)

Mirroring/shadowingError correction codes/parity

Performance (n)Mirroring (2x read access)Multiple parallel access

2.2.05. RAID levels0 No redundant data1 Disk mirrors (performance gain)2 Hamming codes (also detect)3 Single parity disk4 Block level striping5 and parity/data distribution6 Reed-Soloman codes

2.2.06. Records and fieldsDBMS specific, generallyRecords (tuples) comprise fields (attributes)File is a sequence of recordsVariable length records

Variable length fieldsMulti-valued attributes/repeating fieldsOptional fieldsMixed file of different record types

2.2.07. Fields

Page 41: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

41 of 84 25-2-07 4:23 pm

records -> files -> disksFixed length for efficient accessNetworking issuesDelimit variable length fields (max)Explicit record/field lengthsSeparators (,;,:,$,?,%)Record headers and footersSpanning

block boundaries and redundancy

2.2.08. Primary organisationBias data manipulation to 1y memory

Load record to 1y, write backCache theorem

Data storage investment, rapidity of accessoptimisations based on frequent algorithmic use

Ordering, ordering field/key fieldHashing

2.2.09. Indexes/indicesAuxiliary structures/secondary access pathSingle level indexes (Key, Pointer)File of recordsOrdering key fieldPrimary, Secondard and Clustering

2.2.09b. Primary index examplePrimary index on simple tableOrdering key field (primary key) is IntegerPointers as addressesSparse, not dense

2.2.10. Primary Index file (as pairs list)

Page 42: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

42 of 84 25-2-07 4:23 pm

Two fields <K(i),P(i)>Ordering key field and pointer to blockSecond example, indexing candidate key Surname

K(1)="Barnes",P(1) -> block 1Barnes record is first/anchor entry in block 1

K(2)="Smith",P(2) -> block 6K(3)="Zeta",P(3) -> block 8

Dense (K(i) for every record), or SparseEnforce key constraint

2.2.10b. Clustering index exampleClustering indexOrdering key field (OKF) is non-keyEach entry points to multiple records

2.2.11. Clustering Index (as pairs list)Two fields <K(i),P(i)>Ordering non-key field and pointer to block

Internal structure e.g. linked list of recordsEach block may contain multiple records

K(1)="Barnes",P(1) -> block 1K(2)="Bates",P(2) -> block 2K(3)="Zeta",P(3) -> block 3

K(i) not required to havea distinct value for each recordnon-dense, sparse

2.2.11b. Secondary Index exampleIndependent of primary orderingCan't use block anchorsNeeds to be dense

Page 43: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

43 of 84 25-2-07 4:23 pm

2.2.12. More indicesSingle level index

ordered index filelimited by binary search

Multi level indicesbased on tree data structures (B+/B-trees)

faster reduction of search space (logfobi)

2.2.13. IndicesDatabase architecture

Intension/extensionIndexes separated from data file

Created/disgraded dynamicallyTypically 2y to avoid reordering records on disk

2.2.14. Query optimisationFaster query resolution

improved performancelower loadhardware cost:performance ratio

Moore's lawQuery process chainQuery optimisation

2.2.15. Query processingCompile-track familiarity

Scanner/tokeniser - break into tokensParser - semantic understanding, grammarValidated - check attribute names

Query treeExecution strategy, heuristic

Query optimisationIn (extended relational) canonical algebra form

Page 44: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

44 of 84 25-2-07 4:23 pm

2.2.16. Query optimisationSQL query

SELECT lname, fnameFROM employeeWHERE salary > (

SELECT MAX(salary)FROM employeeWHERE dno=5

);Worst-case

Process inner for each outerBest-baseCanonical algrebraic form

2.2.16b. Query optimisation implementationIndexing accelerates query resolutionClosed comparison (intra-tuple)

all variables/attributes within single tuplee.g. x < 100

Open comparison (inter-tuple)variables span multiple tuples

Essentially a sorting problemInternal sorting covered (pre-requisites)Need external sort for non-cached lists

2.2.17. Query optimisationExternal sorting

Stems from large disk (2y), small memory (1y)Sort-merge strategy

Sort runs (small sets of total data file)Then merge runs back together

Used inSELECT, to accelerate selection (by index)PROJECT, to eliminate duplicatesJOIN, UNION and INTERSECTION

3. Database DesignThis is the Database Design course theme.[Complete set of notes PDF 295Kb].

3.1. Functional DependencyIn this lecture we look at...

Page 45: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

45 of 84 25-2-07 4:23 pm

[Section notes PDF 64Kb]

3.1.01. IntroductionWhat is relational design?

Notion of attribute distributionConceptual-level optimisation

How do we asses the quality of a design?

3.1.02. Value in designAllocated arbitrarily by DBD under ER/EERGoodness at

Internal/storage level (base relations only)Reducing nulls - obvious storage benefits /frequentReducing redundancy - for efficient storage/anomalies

Conceptual levelSemantics of the attributes /single entity:relationNo spurious tuple generation /no match Attr,-PK/FK

3.1.03. Initial stateDatabase designUniversal relation

R = {A1, A2, …, An}Set of functional dependencies F

Decompose R using F toD = {R1, R2, …, Rn}D is a decomposition of R under F

3.1.05. AimsAttribute preservation

Union of all decomposed relations = originalLossless/non-additive join

For every extension, total join of r(Ri) yeilds r(R)no spurious/erroneous tuples

3.1.06. Aims (also)Dependency preservation

Constraints on the databaseX Y in F of R, appearsdirectly in Ri

Attributes X È Y allcontained in Ri

Page 46: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

46 of 84 25-2-07 4:23 pm

Each relation Ri in 3NF

But what’s a dependency?

3.1.07. Functional dependencyConstraint between two sets of attributes

Formal method for grouping attributesDB as one single universal relation/-literal

R = {A1,A2,…,An}Two sets of attributes, X ÍR,Y Í R

Functional dependency (FD or f.d.) X face=Arial> Y

If t1[X] = t2[X], then t1[Y] = t2[Y]Values of the Y attribute depend on value of XX functionally determines Y, not reverse necessarily

3.1.08. Dependency derivationRules of inferencereflexive: if X Ê Y then X face=Symbol>® Yaugment: {X Y} XZ face=Symbol>® YZtransitive: {X Y,Y face=Symbol>® Z} X face=Arial> Z

Armstrong demonstrated completefor closures

3.1.09. Functional dependencyIf X is a key (primary and/or candidate)

All tuples ti have a unique value for XNo two tuples (t1,t2) share a value of X

Therefore X YFor any subset of attributes Y

ExamplesSSN {Fname, Minit, Lname}{Country of issue, Driving license no} face=Arial> SSNMobile area code Mobilenetwork (not anymore)

Page 47: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

47 of 84 25-2-07 4:23 pm

3.1.10. ProcessTypically start with set of f.d., F

determined from semantics of attributesThen use IR1,2,3 to infer additional f.d.sDetermine left hand sides (Xs)

Then determine all attributes dependent on XFor each set of attributes X,

determine X+ :the set of attributes f.d’ed by X on F

3.1.11. AlgorithmCompute the closure of X under F: X+

xplus = x;do

oldxplus = xplus;for (each f.d. Y Z in F)

if (xplus Ê Y) then xplus= xplus È Z;

while (xplus = = oldxplus);

3.1.12. Function dependencyConsider a relation schema R(A,B,C,D) and a set F of functionaldependencies

Aim to find all keys (minimal superkeys),by determining closures of all possible X subsets of R’s attributes,e.g.

A+, B+, C+, D+,AB+, AC+, AD+, BC+, BD+, CD+ABC+, ABD+, BCD+ABCD+

3.1.13. Worked exampleLet R be a relational schema R(A, B, C, D)Simple set of f.d.sAB C, C face=Symbol>® D, D face=Arial> ACalculate singletons

A+, B+, C+, D+,Pairs

Page 48: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

48 of 84 25-2-07 4:23 pm

AB+, AC+,…Triples

and so on

3.1.14. Worked exampleCompute sets of closures

AB C, C face=Symbol>® D, D face=Arial> A

1. SingletonsA+ AB+ BC+ CDAD+ AD

Question: are any singletons superkeys?

3.1.15. F.d. closure example2. Pairs (note commutative)

AB+ ABCDAC+ ACDAD+ ADBC+ ABCDBD+ ABCDCD+ ACD

Superkeys?

3.1.16. F.d. closure example3. Triples

ABC+ ABCDABD+ ABCDBCD+ ABCD

Keys?

4. Quadruples

Page 49: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

49 of 84 25-2-07 4:23 pm

ABCD+ ABCD

3.1.17. F.d. closure summarySuperkeys:

AB, BC, BD, ABC, ABD, BCD, ABCDMinimal superkeys (keys)

AB, BC, BD

3.2. Normal FormsIn this lecture we look at...[Section notes PDF 121Kb]

3.2.01. Orthogonal designInformation Principle:

The entire information content of the database is represented in oneand only one way, namely as explicit values in column positions intables

Implies that two relations cannot have the same meaningunless they explicitly have the same design/attributes (includingname)

3.2.02. NormalizationReduced redundancyOrganised data efficientlyImproves data consistency

Reduces chance of update anomaliesData duplicated, then updated in only one location

Only duplicate primary keyAll non-key data stored only once

Data spread across multiple tables, instead of one Universal relation R

3.2.03. Good or bad?Depends on ApplicationOLTP (Transaction processing)

Lots of small transactionsNeed to execute updates quickly

OLAP (Analytical processing/DSS)Largely Read-onlyRedundant data copies facilitate Business Intellegence applications,e.g. star schema (later)

3NF considered ‘normalised’

Page 50: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

50 of 84 25-2-07 4:23 pm

save special cases

3.2.04. Normal forms (1NF)First Normal form (1NF)

Disallows multivalued attributesPart of the basic relational model

Domain must include only atomic valuessimple, indivisible

Value of attribute-tuple in extension of schemat[Ai] Î dom(Ai)

3.2.05. Normalisation (1NF)Remove fields containing comma separated listsMulti-valued attribute (AMV) of RiCreate new relation (RNEW)

with FK to Ri[PK]RNEW(UID, AMV, FKI)

3.2.06. Normalisation (1NF)On weak entity

On strong entity

3.2.07. Normal forms (2NF)A relation Ri is in 2NF if:

Every nonprime attribute A in Ri isfully functionally dependent on 1y key of R

Page 51: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

51 of 84 25-2-07 4:23 pm

If all keys are singletons, guaranteed

If Ri has composite key areall non-key attributes fully functionally dependenton all attributes of composite key?

3.2.08. Normal forms (2NF)Second normal form (2NF)

Full functional dependency X face=Arial> Y

A Î X, (X - {A}) �Y

If any attribute A is removed from XThen X Y no longer holds

Partial functional dependencyA Î X, (X - {A}) face=Symbol>® Y

3.2.09. Normal forms (2NF)In context

Not 2NF: AB face=Arial> C, A C

AB C is not in 2NF,because B can be removed

Not 2NF: AB CDE, B face=Symbol>® DE

because attributes D&E are dependent on part of the composite key(B of AB), not all of it

3.2.10. Normalisation (2NF)Split attributes not depended on all of the primary key into separate relations

Page 52: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

52 of 84 25-2-07 4:23 pm

3.2.11. Normal forms (BCNF)Boyce-Codd Normal form (BCNF)

Simpler, stricter 3NFBCNF 3NF3NF does not imply BCNF

nontrivial functional dependency X face=Arial> YThen X must be a superkey

3.2.12. Normal forms (3NF)Third Normal form (3NF)Derived/based on transitive dependencyFor all nontrivial functional dependenciesX face=Arial> AEither X must be a superkeyOr A is a prime attribute(member of a key)

3.2.13. Normal forms in contextAB C, C face=Symbol>® D, D face=Arial> AIn context

3NF? YesBecause AB is a superkey andD and A are prime attributes

BCNF? NoBecause C and D are not superkeys(even though AB is)

3.2.14. Normalisation (3NF)CarMakes not in 3NF because:

singleton key Anon-trivial fd B C

B not superkey, C not prime attribute

Page 53: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

53 of 84 25-2-07 4:23 pm

3.3. OODBIn this lecture we look at...[Section notes PDF 34Kb]

3.3.01. IntroductionDatabase architectures, beyondWhy OODBMS?ObjectStoreCORBA object distribution standard

3.3.02. Large DBMSComplex entity fragmentation

across many relationsBreaks the miniworld-realworld dichotomyRequires conceptual abstracting layerDifficult to retrieve all information for xCompounded by version control

3.3.03. Object orientationObject components (icv triples)

Object Identity (OID), Ireplaces primary key

Type constructor, chow the object state is constructed from sub-compe.g. atom, tuple (struct), set, list, bag, array

Object state, vObject behaviour/action

3.3.04. Desirable featuresEncapsulation

Abstract data types

Page 54: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

54 of 84 25-2-07 4:23 pm

Information hidingObject classes and behavior

Defined by operations (methods)Inheritance and hierarchiesStrong typing (no illegal casting)

don’t think about inheritance just yet

3.3.05. Desirable featuresPersistence

Objects exist after terminationNaming and reachability mechanismLate binding in Java

Performanceuser-def functions executed on server, not client

Extension into relational modelDomains of objects, not just valuesDomain hierarchies etc.

3.3.06. Desirable featuresPolymorphism

aka. operator overloadingsame method name/symbolmultiple implementations

Easy link to Programming languagespopular OO language like Java/C++Better than PL/SQL integrationMuch better than PL-JDBC integration!

Highly suitable for multimedia data

3.3.07. Undesirable featuresObject Ids

Artificial Real-Mini, double edged swordExposes inner workings (suspended abstraction)

Lack of integrity constraintsNo concept of normalisation/formsExtreme encapsulation

e.g. creation of many accessor/mutator methodsLack of (better) standardisation

3.3.08. More undesirablesOriginally no mechanism for specifyingrelationships between objectsIn RDBMS – relationships are tuplesIn OODBMS – relationships should be properties of objects

Page 55: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

55 of 84 25-2-07 4:23 pm

3.3.09. ObjectStorePackages supporting Java or C++C++ package uses:

C++ class definition for DDLinsert(e), remove(e) and create collections, for DML

Bidirectional relationship facilityPersistency transparency

Identical pointers to persistent and transient objects

3.3.10. CORBACommon Object Request Broker ArchitectureObject communication (unifying paradigm)

DistributedHeterogeneousNetwork, OS and language transparency

Java implementation org.omg.CORBAAlso C, C++, ADA, SMALLTALK

3.4. Type Inheritance and EER diagramsIn this lecture we look at...[Section notes PDF 117Kb]

3.4.01. IntroductionDesign/schema side (Entity types)Object-orientated concepts

Java, C++ or UMLSub/superclasses and inheritance

EER diagramsEER to Relational mapping

3.4.02. OOInheritance concept

Attributes (and methods)Subtypes and supertypesSpecialisation and GeneralisationER diagrams

show entities/entity setsEER diagrams

show type inheritance

additional 8th step to ER->Relational mapping

Page 56: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

56 of 84 25-2-07 4:23 pm

3.4.03. ObjectsBasic guide to JavaObject, classes as blueprintsObject, collection of methods and attributesMiniworld model of real world thingsObject, entity in database terms

3.4.04. AbstractSimilar objectsCar Park exampleStudent exampleShared properties/attributesGeneralisationReverse, specialisation

3.4.05. RelationshipsUsing English as model‘Is a’ (inheritance)‘Has a’ (containment)Nouns as objectsVerbs as methodsAdjectives as variables (sort of)

3.4.06. ClassesSuperclasses (Student)Subclasses (Engineer, Geographer, Medic)InheritanceSubclass inherits superclass attributes

Union of specific/local and general attributesInheritance chains

Person -> Student -> Engineer -> Computer Scientist

3.4.07. EER Fruit examplePartial participationDisjoint subclassesA fruit may be either a pear or an apple or a banana, or none of them. A fruit may not be a pear and a banana, an apple and a banana, an apple and a pear....

Page 57: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

57 of 84 25-2-07 4:23 pm

3.4.08. EER Wine exampleTotal, disjointEquivalent to Java Abstract classesA Wine has to be either Red, White or Rosé cannot be both more

3.4.09. More extended (EER)Specialisation lattices

and HierarchiesMultiple inheritanceUnion of two superclasses (u in circle)In addition to basic ER notation

3.4.10. EER diagramatic notationSubset symbol to illustratesub/superclass relationshipdirection of relationshipCircle to link super to subclasses

DisjointOverlappingUnion

3.4.11. Disjointness constraintDisjointness (d in circle) – single honours

Page 58: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

58 of 84 25-2-07 4:23 pm

Overlapping (o in circle) – joint honours/sportsMembership condition on same attribute

attribute-defined specialisationdefining attributeimplies disjointness

versus user-definedeach entity type specifically defined by user

3.4.12. Completeness constraintTotal specialisation

Every entity in the superclass must be a member of atleast 1 subclassDouble line (as ER)

Partial specialisationSome entites may belong to atleast 1 subclass, or none at allSingle line

Yields 4 possibilities(Total-Dis, Total-Over, Partial-Dis, Partial-Over)

3.4.13. EER Chip exampleTotal, overlappingA Chip may has to be at least one of FPA Unit, Reg Block, L1 Cache, and may be more than one type

3.4.14. EER Multiple inheritanceType hierarchiesSpecialisation latticesWell, sir, the Supreme Court of the United States has determined that the tomato is for legal and commercial purposes both a fruit and a vegetable. So we can legally refer to tomato juice as 'vegetable' juice.

Page 59: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

59 of 84 25-2-07 4:23 pm

Candice, General Foods

3.4.15. EER to Relational MappingInitially following 7 ER stagesStage 84 different options

Optimal solution based on problemLet C be superclass, S1…m subclasses

3.4.16. Stage 8Create relation for C, and relations for S1..m each with a foreign key to C(primary key)Create relations for S1..m each including all attributes of C and its primary key

3.4.17. Stage 8Create a single relation including all attributes of C u S1..m and a type/discriminating attribute

only for disjoint subclassesCreate a single relation as above, but include a boolean type flag for each subclass

works for overlapping, and also disjoint

3.5. System DesignIn this lecture we look at...[Section notes PDF 64Kb]

3.5.01. Databases in ApplicationWhere’s the data?Programmer driven futureOODBMS limitationsRDBMS longevity

Page 60: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

60 of 84 25-2-07 4:23 pm

System design byData store, delivery, interface

Case study

3.5.02. Where’s the data?Previously covered distance from User to Data (and reason for it)Client-Server data model creates DBMS

P2P alternativeAccountabilityDistribution (BitTorrent, eDonkey)Caching

3.5.03. Where’s the data?Answer: everywhereBut where is it meaningful?Answer: for whom?

Quality paradigmLarge projects require large teams

Team overhead (ref 2nd year)Code responsibilitiesData/data model resp.Object responsibilities

3.5.04. Web application data supportWeb application programmingGoal, dynamically produced XHTMLClient side designer-programmer split

CSS, XHTMLServer side programmer-programmer split

Old school: query design, integratorNew school: MVC (Model-View-Controller)

Controller – user inputModel – modelling of external worldView – visual feedback

Page 61: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

61 of 84 25-2-07 4:23 pm

3.5.05. CMSContent Management Systempart of other coursesCMS is a DBMSZope/Plone and ZODBe107, Drupal and SeagullZend MVC Framework (pre-beta)

3.5.06. OODBMS limitationsFuture unknownRDBMS supports

Application data sharingPhysical/logical data independence/viewsConcurrency controlConstraints

at inception these requirements not knownRDBMS mathematical basis àextensibleCrude Type Inheritance (see EER mapping)OODBMS as construction kit

Page 62: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

62 of 84 25-2-07 4:23 pm

3.5.07. Weaknesses in RDBMSData type supportUnwieldy, created 3VL (nulls)Type Inheritance and RelationshipsTuple:Entity fragmentation

not to be confused with ‘fragmentation’Entity approximation requires joins

3.5.08. System designClient specificationsVariance amongst Mobile devicesRich-media Content deliveryWhere’s the data? (M – media database)Where’s it going? (C – mobile browser)How’s it going to get there? (query design)What’s it going to look like? (V – XHTML)

3.5.09. Muddy bootsThe real world of databasesMassive Excel spreadsheetsAccess MigrationNormalisationUpdate implicationsVisual language of the Internet limitationsFuture of browser components

4. Distributed systemsThis is the Distributed systems course theme.

[Complete set of notes PDF 109Kb]

4.1. TransactionIn this lecture we look at...[Section notes PDF 86Kb]

4.1.01. Distributed DatabasesTransactionsUnpredictable failure

Commit and rollback

Page 63: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

63 of 84 25-2-07 4:23 pm

Stored proceduresBrief PL overview

Cursors

4.1.02. TransactionsReal world database actionsRarely single stepFlight reservation example

Add passenger details to rosterCharge passenger credit cardUpdate seats availableOrder extra vegetarian meal

4.1.03. TransactionsReal world database actionsRarely single stepFlight reservation example

Add passenger details to rosterCharge passenger credit cardUpdate seats availableOrder extra vegetarian meal

4.1.04. Desirable properties of transactionsACID testAtomicity

transaction as smallest unit of processingtransactions complete entirely or not at all

consequences of partial completion in flight example

Consistencycomplete execution preserves database constrained state/integritye.g. Should a transaction create an entity with a foreign key then thereference entity must exist (see 4 constraints)

4.1.05. ACID test continuedIsolation

not interfered with by any other concurrent transactions

Page 64: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

64 of 84 25-2-07 4:23 pm

Durable (permanency)commited changes persist in the database, not vulernable to failure

4.1.06. CommitNotion of Commit (durability)Transaction failures

From flight reservation exampleAdd passenger details to rosterCharge passenger credit cardUpdate seats available: No seats remainingOrder extra vegetarian meal

Rollback

4.1.07. PL/SQL overviewLanguage format

DeclarationsExecutionExceptionsHandling I/OFunctionsCursors

4.1.08. PL/SQLBlocks broken into three parts

DeclarationVariables declared and initialised

ExecutionVariables manipulated/actioned

ExceptionError raised and handled during exec

4.1.09. DeclarationDECLARE

age NUMBER;name VARCHAR(20);surname employee.fname%TYPE;addr student.termAddress%TYPE;

4.1.10. ExecutionBEGIN (not in order)

Page 65: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

65 of 84 25-2-07 4:23 pm

/* sql_statements */UPDATE employee SET salary = salary+1;

/* conditionals */IF (age < 0) THEN

age: = 0;ELSE

age: = age + 1;END IF;

/* transaction processing */COMMIT; ROLLBACK;

/* loops */ /* cursors */[END;] (if no exception handling)

4.1.11. Exception passingBeginnings of PL I/OCREATE TABLE temp (logmessage varchar(80));

Can create transfer/bridge relation outside

Within block (e.g. within exception handler)WHEN invalid_age THEN

INSERT INTO temp VALUES( ‘Cannot have negative ages’);END;

SELECT * FROM temp;To review error messages

4.1.12. Exception handlingDECLARE

invalid_age exception;

BEGINIF (age < 0) THEN

RAISE invalid_ageEND IF;

EXCEPTION

Page 66: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

66 of 84 25-2-07 4:23 pm

WHEN invalid_age THENINSERT INTO temp VALUES( ‘Cannot have negative ages’);

END;

4.1.13. CursorsCursors

Tuple by tuple processing of relationsThree phases (two)

DeclareUseException (as per normal raise)

4.1.14. ImpactPL blocks coherently change database stateNo runtime I/ODifficult to debugSQL tested independently

4.1.15. PL CursorsDECLAREname_attr EMPLOYEE.NAME%TYPE;ssn_attr EMPLOYEE.SSN%TYPE;/* cursor declaration */CURSOR myEmployeeCursor IS

SELECT NAME,SSN FROM EMPLOYEEWHERE DNO=1FOR UPDATE;

emp_tuple myEmployeeCursor%ROWTYPE;

4.1.16. Cursors executionBEGIN/* open cursor */OPEN myEmployeeCursor;/* can pull a tuple attributes into variables */FETCH myEmployeeCursor INTO name_attr,ssn_attr;/* or pull tuple into tuple variable */FETCH myEmployeeCursor INTO emp_tuple;CLOSE myEmployeeCursor;

[LOOP…END LOOP example on handout]

Page 67: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

67 of 84 25-2-07 4:23 pm

4.1.17. IntroductionConcurrent transactionsDistributed databases (DDB)FragmentationDesirable transaction propertiesConcurrency control techniques

LockingTimestamps

4.1.18. NotationLanguage

PL too complex/long-windedSimplified database model

Database as collection of named itemsGranularity, or size of data itemDisk block based, each block X

Basic transaction language (BTL)read_item(X);write_item(X);Basic algebra, X=X+N;

4.1.19. Transaction processingDBMS Multiuser system

Multiple terminals/clientsSingle processor, client side execution

Single centralised databaseMultiprocessor, serverResolving many transactions simultaneously

Concurrency issueCoverage by previous courses (e.g. COMS12100)PL/SQL scripts (Transactions) as processes

Interleaved execution

4.1.20. TransactionsTwo transactions, T1 and T2Overlapping read-sets and write-setsInterleaved executionConcurrency control requiredPL/SQL example

Commit; and rollback;

4.1.21. Concurrency issues

Page 68: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

68 of 84 25-2-07 4:23 pm

Three potential problemsLost updateDirty readIncorrect summary

All exemplified using BTLTransaction diagrams to make clearerC-like syntax for familiarityMany possible examples of each problem

4.1.22. Lost updateT1read_item(X);X=X-N;

write_item(X);read_item(Y);

Y=Y+N;write_item(Y);T2

read_item(X);X=X+M;

write_item(X);

Page 69: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

69 of 84 25-2-07 4:23 pm

4.1.23. Dirty read (or Temporary update)T1read_item(X);X=X-N;write_item(X);

<T1 fails><T1 rollback>

read_item(X);X=X+N;write_item(X);T2

read_item(X);X=X+M;write_item(X);

4.1.24. Incorrect summaryT1

read_item(X);X=X-N;write_item(X);

Page 70: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

70 of 84 25-2-07 4:23 pm

read_item(Y);Y=Y-N;write_item(Y);T2sum=0;read_item(A)sum=sum+A;

read_item(X);sum=sum+X;read_item(Y);sum=sum+Y;

4.1.25. SerializabilitySchedule S is a collection of transactions (Ti)Serial schedule S1

Transactions executed one after the otherPerformed in a serial orderNo interleavingCommit or abort of active transaction (Ti) triggersexecution of the next (Ti+1)If transactions are independent

all serial schedules are correct

4.1.26. SerializabilitySerial schedules/histories

No concurrencyUnfair timeslicing

Non-serial schedule S2 of n transactionsSerializable if

equivalent to some serial schedule of the same n transactionscorrect

Page 71: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

71 of 84 25-2-07 4:23 pm

n! serial schedules, more non-serial

4.1.27. DistributionDDB, collection of

multiple logically interrelated databasesdistributed over a computer networkDDBMS

Multiprocessor environmentsShared memoryShared diskShared nothing

4.1.28. AdvantagesDistribution transparency

Multiple transparency levelsNetworkLocation/dept autonomyNamingReplicationFragmentation

Reliability and availabilityPerformance, data localisationExpansion

4.1.29. FragmentationBreaking the database into

logical unitsfor distribution (DDB design)

Global directory to keep track/abstractFragmentation schema/allocation schema

RelationalHorizontal

Derived (referential), complete (by union)VerticalHybrid

4.1.30. Concurrency control in DDBsMultiple copiesFailure of individual sites (hosts/servers)Failure of network/links

Page 72: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

72 of 84 25-2-07 4:23 pm

Transaction processingDistributed commitDeadlock

Primary/coordinator site - voting

4.1.31. Distributed commitCoordinator electedCoordinator prepares

writes log to disk, open sockets, sends out queriesProcess

Coordinator sends ‘Ready-commit’ messagePeers send back ‘Ready-OK’Coordinator sends ‘Commit’ messagePeers send back ‘Commit-OK’ message

4.1.32. Query processingData transfer costs of query processing

Local biasHigh remote access costVast data quantities to build intermediate relations

DecompositionSubqueries resolved locally

4.1.33. Concurrency controlMust avoid 3+ problems

Lost update, dirty read, incorrect summaryDeadlock/livelock - dining example

Data item granularitySolutions

Protocols, validationLockingTimestamps

4.1.34. Definition of termsBinary (two-state) lockslocked, unlocked associated with item XMutual exclusionFour requirements

Must lock before accessMust unlock after all accessNo relocking of already lockedNo unlocking of already unlocked

Page 73: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

73 of 84 25-2-07 4:23 pm

4.1.35. DefinitionMultiple mode lockingRead/write locksaka. shared/exclusive locksLess restrictive (CREW)read_lock(X), write_lock(X), unlock(X)

e.g. acquire read/write_locknot reading or writing the lock state

4.1.36. Rules of Multimode locksMust hold read/write_lock to readMust hold write_lock to writeMust unlock after all accessCannot upgrade/downgrade locks

Cannot request new lock while holding one

Upgrading permissable (read lock to write)if currently holding sole read access

Downgrading permissable (write lock to read)if currently holding write lock

4.2. ConcprotoIn this lecture we look at...[Section notes PDF 37Kb]

4.2.01. IntroductionConcurrency control protocolsConcurrency techniques

Locks, Protocols, TimestampsMultimode locking with conversion

Guarenteeing serializabilityAssociated costTimestamps and ordering

4.2.02. Guarenteeing serializability

Page 74: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

74 of 84 25-2-07 4:23 pm

Two phase locking protocol (2PL)Growing/expanding

Acquisition of all locksOr upgrading of existing locks

ShrinkingRelease of locksOr downgrading

Guarentees serializabilityequivalency without checking schedules

4.2.03. A typical transaction pairT1

read_lock(Y);read_item(Y);unlock(Y);

write_lock(X);read_item(X);X=X+Y;write_item(X);unlock(X);

4.2.04. 2PL: Guaranteed serializableT1

read_lock(Y);read_item(Y);write_lock(X);unlock(Y);read_item(X);X=X+Y;write_item(X);unlock(X);

4.2.05. Guarantee cost

Page 75: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

75 of 84 25-2-07 4:23 pm

T2 ends up waiting for read access to XEither after T1 finished

T1 cannot release X even though it has finished using itIncorrect phase (still expanding)

Or before T1 has used itT1 has to claim X during expansion, even if it doesn’t useit until later

Cost: limits the amount of concurrency

4.2.06. AlternativesConcurrency control

Locks limit concurrencyBusy waiting

Timestamp ordering (TO)Order transaction execution

for a particular equivalent serial scheduleof transactions ordered by timestamp value

Note: difference to lock serial equivalentNo locks, no deadlock

4.2.07. TimestampsUnique identifier for transaction (T)Assigned in order of submission

Timelinear time, current date/sys clock - one per cycle

Countercounter, finite bitspace, wrap-around issues

Timestamp aka. Transaction start timeTS(T)

4.2.08. TimestampingDBMS associates two TS with each item

Read_TS(X): gets read timestamp of item Xtimestamp of most recent successful read on X= TS(T) where T is youngest read transaction

Write_TS(X): gets write timestamp of item Xas for read timestamp

Page 76: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

76 of 84 25-2-07 4:23 pm

4.2.09. TimestampingTransaction T issues read_item(X)

TO algorithm compares TS(T) with Write_TS(X)Ensures transaction order execution not violated

If successful, Write_TS(X) <=TS(T)

Read_TS(X) = MAXTS(T), current Read_TS(X)If fail, Write_TS(X) > TS(T)

T aborted, rolled-back and resubmitted with new TSCascading rollback

4.2.10. TimestampingTransaction T issues write_item(X)

TO algorithm compares TS(T) with Read_TS(X)and compares TS(T) withWrite_TS(X)

If successful, op_TS(X) <= TS(T)Write_TS(X) = TS(T)

If fail, op_TS(X) > TS(T)T aborted, cascade etc.

All operations focus on not violating the execution order defined by thetimestamp ordering

4.2.11. UpdatesInsertion

2PL: DBMS secures exclusive write-lockTOA: op_TS(X) set to TS(creating transaction)

Deletion2PL: as insertTOA: waits to ensure later transactions don’t access

Phantom problemRecord being inserted matches inclusion conditionsof another transaction(e.g. selection by dno=5)Locking doesn’t guarantee inclusion

(need index locking)

5. Real world

Page 77: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

77 of 84 25-2-07 4:23 pm

This is the Real world course theme.

[Complete set of notes PDF 206Kb]

5.1. WebIn this lecture we look at...[Section notes PDF 133Kb]

5.1.01. Databases for the InternetPath from DB to UserInformation flowData formats (OO)Format transitionsLimitations/channel

right nowThe Future

Page 78: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

78 of 84 25-2-07 4:23 pm

5.1.02. OOObject orientated approach

Consistent/optimised development modelGood approximation of real worldCloser link to mini-world

Java and PHPDB persistenceUML

5.1.03. Java and PHP in contextJava

JSP (server-side)Javascript (client-side)

PHPServer side only

JSON or XMLObject communication

Ideal scenarioJava – load times

5.1.04. In a perfect worldHomogenous data format/data modelDB stores objects insteadObjects transferred

RobustLightweightFastConsistent (more later in Transactions)Caching

5.1.05. In the real worldHeterogenous data modelObject translation/wrappersDifferent languages features at different layersMinimal subset of OO functionality available end-to-endGoing to look at information flow/functionality provided

5.1.06. UserLimitations of being human

short term memorylong term familiarity

language of the Internet

Page 79: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

79 of 84 25-2-07 4:23 pm

hypertext linkingform filling

Advantages of being humanimpatience, no waitingwants instant response

5.1.07. BrowserHttp requestsForms

PostGet

Form fieldsBy name, by IDHidden

Javascript/DOM tree

5.1.08. InternetCommunication mediumGood for transferring dataNot good for transforming datae.g. Light in aire.g. Signal over CAT5e/UTP cable

5.1.09. Web serverStraight HTML pagesDynamic HTML pages

PHP exampleJSP example

As above with RDBMS integrationPHP PDO example

As above with ObjectsPHP DBDO example

5.1.10. load, edit, submit, act timeline

Page 80: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

80 of 84 25-2-07 4:23 pm

5.1.11. href click, versus form postProtocol stackBasic up-downShortcutsBrowser cacheWeb server

assembled page cachephp object cache

DB optimised queries

Page 81: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

81 of 84 25-2-07 4:23 pm

5.1.12. Examples from the webGoogle Maps

linkCar selector and Dealer locator

link

5.2. Decision SupportIn this lecture we look at...[Section notes PDF 85Kb]

5.2.01. IntroductionDecision support systems (DSS)Duplicates of live systems, historical archivingPrimarily read-onlyLoad and refresh operationsIntegrity

Page 82: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

82 of 84 25-2-07 4:23 pm

Assumptions about initial dataLarge, indexed, redundancy

5.2.02. DSS ManagementDesign

LogicalTemporal keys, required to distinquish historical data (since:tocurrent & during:within interval)

Physical (Hash indexes, Bitmap indexes)Controlled Redundancy

Synchronisation/update propogationSynchronous (update driven)Asynchronous (query driven)

5.2.03. Data PreparationExtract

pulling from live database system(s)CleansingTransformation and Consolidation

migrating from live or legacy system design

to DSS designLoad (DSS live/query-able)Refresh (latest update)

5.2.04. QueryingBoolean expression complexity

heavy WHERE clausesJoin complexity

Normalised databases, many tablesFacts distributed across tablesJoins required to answer complex questions

Function and Analytic complexityOften require non-DBMS functionsSmaller queries with interleaved code

5.2.05. Data WarehouseSpecific example of DSSSubject-orientated

e.g. customers/productsNon-volatile

once inserted, items cannot be updated

Page 83: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

83 of 84 25-2-07 4:23 pm

Time variantTemporal keys

Accuracy and granularity issues

5.2.06. DB Company organisationBy example

5.2.07. Dimensional SchemaConsider product, customer, sales dataEach sale represents a specific event

when a product was purchasedwhen a customer bought somethingwhen a sale was recorded

Each can be thought of as an axis

or dimension (3D)Each occurred at a moment in time (4D)

5.2.08. Star schemae and HypercubesData centralised in ‘fact’ tableReferencing creates star patternDimensions as satellite tablesNormalising creates snowflake schema

Page 84: Databases 1. Data models 1.1. Introduction · Supermarket inventories/EPOS Supplies, shopping habits, store locations, accounts Films (iMDB) Cast lists, shooting schedules, histories,

Databases http://www.lightenna.com/book/export/s5/101

84 of 84 25-2-07 4:23 pm

5.2.09. HypercubesHypercube is also a multi-processor topology inspired by a 4D shapeUsed by Intel’s iPSC/2Good at certain database operationse.g. Duplicate removalMIMD