Databases http://www.lightenna.com/book/export/s5/101
1 of 84 25-2-07 4:23 pm
Databases
The Database Training course is structured into five sections.
1. Data models
This is the Data models course theme. In this section we introduce concepts for modelling data and language for communicating those models. [Complete set of notes PDF 870Kb]
1.1. Introduction
Databases are pervasive in modern society. So many of our actions and attributes are logged and stored in organised information repositories, or Databases. [Section notes PDF 227Kb]
1.1.01. Databases
Where do we come into contact with databases?
Supermarket inventories/EPOS
    Supplies, shopping habits, store locations, accounts
Films (IMDb)
    Cast lists, shooting schedules, histories, budgets
Department
    Students, courses, staff, payroll
These are all examples of relatively simple databases. All of the information is textual or referential.
1.1.02. New technologies
Not just traditional, numeric/textual
Research 70s biased
Digital media
    Video servers (atom/bbc/youtube)
Multimedia databases
    Web site, collection of diverse data types
Google, AltaVista
Stock Exchange
    Futures, Currency markets, trends
    Databases comprising not only data, but modelling algorithms
Microsoft's WinFS
Databases don't have to store just text. Increasingly Database servers are storing, indexing and delivering rich-media content, explicitly images, audio and video.
Microsoft's next generation File storage system (WinFS) is a relational database. From a user perspective, searching (the process of indexing content by keyword) is already mainstream. Users are moving away from rigid directory structures (files and folders) and towards keyword-tagged content.
1.1.03. Variations
Not only in role
Size
    Video server example
Complexity
    Ford Puma™ example
    Ford staff organised by production line and car
    e.g. Each staff member answers to a 'part' manager (engines, bodyshell, chassis) and a 'car' manager (Puma, Mondeo, Ka)
Expense
User profiles and demands
We've seen that databases are used in a variety of contexts. Those roles imply properties of each of the systems. An interlinked text-only database (such as Unix/Linux's MAN pages) will require much less storage than a video archive.
Some databases are perceptually more complex. Ford's staff management model would be represented as a matrix (in this case 2 dimensional). Computers are very good at organising multi-dimensional space.
1.1.04. Definition
Embrace diversity
Data and semantic
Database: related data, implicit meaning
Sample/subset of real world
    real world with bounds
Miniworld/Universe of Discourse
A single definition of a database is hard to come by. Dictionary.com defines a database as: a comprehensive collection of related data organized for convenient access, generally in a computer. The Wikipedia definition runs for several pages.
1.1.05. Abstraction
Previous lab exercises
    Problem: reading data from a file
Abstraction theme
Basis of good OOP and further good P
    from encapsulation to software component analysis
Layering, splitting data from design
    Solution: grammar (language guide) and data
In some of your previous lab assignments, or practical experience, you may have been faced with the problem of caching information persistently in a file, later to be reloaded.
When writing the data into the file, we are storing more than just that information. We are implicitly storing a design/grammar for that data. That implicit design is evident when accessing the file with a naive interface. If you try to read the data out in a different order, it fails.
A better solution is to split the way the information is stored from the actual information stored.
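A minimal Python sketch of that split, assuming a hypothetical student record with the fields `name`, `course` and `year` (none of these names come from the notes). The design is written once as a list of field names, the data is stored as bare rows, and the reader consults the stored design rather than relying on a hard-coded field order.

```python
import json

# Hypothetical design (schema): field names chosen for illustration only.
schema = ["name", "course", "year"]
# Hypothetical data (tuples), stored without any knowledge of how it will be read.
rows = [["Tim", "Databases", 3], ["Ann", "Networks", 2]]

# Store the design alongside, but separately from, the data.
stored = json.dumps({"schema": schema, "rows": rows})


def read_field(document: str, field: str) -> list:
    """Return one field from every row, located via the stored schema."""
    loaded = json.loads(document)
    position = loaded["schema"].index(field)
    return [row[position] for row in loaded["rows"]]


# The reader can ask for fields in any order because it consults the schema.
years = read_field(stored, "year")
names = read_field(stored, "name")
```

If the field order in the file changes, only the stored schema changes; `read_field` and its callers stay as they are.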
1.1.06. Data-design divide
Left-hand/right-hand divide
LHS: Catalog
    or Meta-data
    or Intension
    or Schema
    i.e. the Design of database
    Types of data, organisation, constraints
RHS: Extension
    or Snapshot
    The data itself
    Information stored in the database
    Tuples
1.1.07. DBMS
DataBase Management System
Collection of programs that enable users to:
    Define - patterns, boundaries, design
    Construct - populate to go live
    Manipulate - runtime changes
data in a structured, organised store.
1.1.08. Database Management Systems
Properties
    Data models and independence
Requirements
    Categorisation of Database users
DBMS components
    Architecture
When considering the database systems as a whole, we need to look at all the components, including elements that interact with the DBMS (users, whom we categorise for simplicity).
This course will contain a discussion of the components that make up the system and the way they interact (system architecture).
1.1.09. Models
Data models
    Relational data model (Oracle)
    Object data model (ObjectStore)
Legacy systems
    Hierarchical data model
    Network data model
A data model is an invention. It is a construct that allows us to share an understanding of how the system works. As with all good constructs, it's an abstraction; a simplification; a story.
In this course we're going to look at the Relational model, where the database is organised into tables (relations) and each row (tuple) within a relation is coded (keyed) to allow referencing between relations.
The Relational model, in spite of being introduced in the 1970s, is still the most popular, underpinning mainstream modern databases such as Oracle 10g and MySQL 5.0.
As programming languages are becoming increasingly Object orientated, programmers require a means of persistently storing their Objects. Object Orientated Databases (OODBs) exist to fulfil this purpose. OODBs may ultimately replace relational databases, but it's not clear at this stage when.
1.1.10. Other properties
Beside Data model
Number of users
Number of sites
Cost
Types of access paths
Generality, or inversely specificity
MySQL is a highly general database system, in that it supports many different designs. My mobile phone address book is a highly specific database system and as such is not easily extensible.
1.1.11. Independence
Based on File processing
Data definition implicit in
    Data
    Application program
Example of one specific database
    Structure embedded into access program
Coursework code re-use example
Earlier I made mention of this problem. Databases tie into the wider Software Engineering field. Within Software Engineering, post-development issues of code re-use, maintenance, future evolution etc. necessitate a logical, flexible approach to program design. Databases are such an approach. In order to store information in a database you invest a small amount of time in explicitly structuring it; however, you then get things like flexibility (data independence etc.) for free.
1.1.12. Program-data independence
General databases
    Separate Data definition and data
    Catalog/meta-data & tuples
Data format/structure stored separately
Program-data independence (e.g. Y2K)
    Changes in data format
    Alter data (tuples)
    Alter grammar (catalog/meta-data)
We actually split program-data independence into:
    Logical
    Physical
...more in a second
1.1.13. Program-operation independence
In object-orientated databases
    Objects consist of attributes and operations
Operation defined by
    Header/Interface/Prototype or Signature
    Implementation
Program-operation independence
    Implementation change hidden from user
Collectively data abstraction
1.1.14. Logical and physical program-data independence
Three tier data-model diagram
Mappings, Data independence
Logical
    between conceptual and external
    changes to conceptual without changing external schemas or application programs
Physical
    between internal and conceptual
    changes to internal without changing conceptual or external schemas
1.1.15. DBMS Requirements
DataBase Management System
    Abstraction (i.e. program-data independence)
    Conceptual representation (data models)
    Multiple views and User Interfaces
    Data sharing and transaction processing
    Access restriction
    Redundancy removal/optimisation
    Persistent storage (Program objects) & Integrity
    Relationship management & Inference
    Backup and recovery
Here's a summary of what we need from a DBMS.
1.1.16. Database Users
Database as primary (1y) resource, DBMS as secondary (2y)
Database administrators (DBA)
Database designers
End users
    Naïve - canned transactions e.g. bank/airline
    Sophisticated - engineers, scientists, query editors
    Stand-alone (personal databases/MS Access)
1.1.17. Results of use
Knock-on effects of database approach
Enforcing standards
Reduced Development time
Adaptive to change (design changes)
Up-to-date information (live database)
Economies of scale
    centralising commonly required resources
...and this is what you get for free. These are the consequences, largely positive, of adopting a database approach to an information storage problem.
1.1.18. Design side
Meta-data
    Database schema, intension
Data Model
Left hand side of database divide
Schema diagram
    Entity-relationship (ER diagrams)
    UML diagrams
1.1.19. Data side
Data, under the column heading
Less easy to look at (volume issue)
Fundamentally less interesting (more specific)
Variety of tools for looking at it:
    HeidiSQL, phpMyAdmin, Sword
Here's what a snapshot looks like:
1.1.20. Data model
Data model
    Structure of the database
    Collection of basic operations
    Collection of behaviours/user defined operations
Dependent on level of abstraction
Tier diagram
    External (user views)
    Conceptual*
    Internal
1.1.20. General data model terms
Entity
Relationship
Attributes
Keys
What follows here is an introduction to the terms which make up the language that we use to describe data models.
1.1.21. Entity
Selected from real world
Populate Miniworld/UoD
Entity is an approximation
Two elements: Entity types and sets
Collection of attributes
    Object similarity, classes as entity types
Entities inter-relate
1.1.22. Attributes
Data type, domain
Simple or composite
Single or multivalued
Stored or derived
Null
1.1.23. Keys
Mechanism for unique identification
Uniqueness constraint
Strong and weak entities
Key attribute
Composite keys
Multiple keys
1.1.27. Relationships
Types and Sets
Participation by Entities
Degree - e.g. binary
Cardinality ratios - e.g. 1:1, 1:M
    Determine occurrence
Tuples example
    Foreign keys
1.1.28. Database architecture
Client-Server
Distributed databases
    Fragmentation by attribute/tuple/relation
Language and description
    Storage Definition Language (SDL) DESIGN
    Data Definition Language (DDL) DESIGN
    View Definition Language (VDL) DESIGN
    Data Manipulation Language (DML) DATA
        High level, can be embedded but precompiled
        Procedural, record-at-a-time, requires high level support
1.1.29. Structured Query
Structured Query Language (SQL or SEQUEL)
Success of relational databases
    Developed for System R at IBM
ANSI standardised
    SQL1 or SQL-86, ongoing extension
    SQL2 or SQL-92, current version
    SQL3 (1999), SQL2003
DDL, DML (low-level) and VDL
1.2. Data models
A data model is a model that describes in an abstract way how data are represented in a business organization, an information system or a database management system - Wikipedia. [Section notes PDF 310Kb]
1.2.01. Introduction
Relational model
Abstract operations on relations
    Set theoretic operations
    Relational-specific operations
Basic algebra operations
    Union, Intersection, Difference
    Cross product
1.2.02. Relational model
Table as relation
Row as tuple
    real world entity or relationship
    fact
Column as attribute
    Domain
The concept of a relation is abstract, therefore we have a number of different ways of visualising it.
1.2.03. Relation
Relation schema R(A1, A2, A3.. An)
    Design side
    Assertion/declaration
Relation state
    Data side
    set of n-tuples
        each one an ordered list of values
        1NF: each value is atomic, no composite/multivalue
1.2.04. Abstract operations
Database lifecycle
    design, populate, evolve
Insert
    tuple (a1,a2,a3…an)
Delete
    tuple (a1,a2,a3…an)
Update (or modify)
    tuple (a1,a2,a3…an)
    attribute to change, new value
All the operations described in the next few sections are abstract. We're going to see how valuable they can be in processing real world data later.
1.2.05. Basic algebra
Two categories
    Set theoretic operations
        Union, Intersection etc.
    Relational specific
        Select, project and join
At this stage we're talking about set theoretical operators on the Relational model, not SQL instructions which confusingly have identical names and only similar behaviour.
1.2.06. Select operation
SELECT a subset of tuples from a relation
    Uses selection condition
    Evaluate each tuple to true or false
    False tuples discarded
Sigma (σ)
    output = σ(cond)(input_relation)
Relation schema: R(output) = R(input_relation)
Commutative
1.2.07. Project operation
PROJECT a subset of attributes for all tuples from a relation
    Pi (π)
    π<attribute list>(R)
If sublist is only non-key attributes
    might get duplicates
Removes duplicates
Attribute list:sublist example
The result set of the operation is itself a relation. That output relation will contain at most as many rows as the input (duplicates produced by projecting only non-key attributes are removed), and it may contain a different number of columns: fewer if a subset of attributes is projected; more if derived or aggregated attributes are included.
1.2.07. Sequences of operations
Select followed by projection
Area clipping: rows then columns
π<attr list>(σ(select_cond)(R))
Rename operation (ρ)
    Renames attributes list2 from list1
    ρ(new_attr_names)(R)
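A minimal Python sketch of select followed by project, assuming a hypothetical `employee` relation held as a list of dictionaries (the attribute names are illustrative, not from the notes). The functions model the abstract algebra operations rather than SQL, so project removes duplicate tuples.

```python
# Hypothetical relation: each tuple is a dict keyed by attribute name.
employee = [
    {"ssn": 1, "name": "Ann", "dno": 10},
    {"ssn": 2, "name": "Bob", "dno": 10},
    {"ssn": 3, "name": "Cy", "dno": 20},
]


def select(relation, condition):
    """Select: keep only the tuples for which the condition is true."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """Project: keep only the listed attributes, then remove duplicate tuples."""
    result = []
    for t in relation:
        reduced = {a: t[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


# Area clipping: rows first (dno = 10), then columns (dno only).
rows_kept = select(employee, lambda t: t["dno"] == 10)
clipped = project(rows_kept, ["dno"])
```

Two tuples survive the select, but projecting only the non-key attribute `dno` collapses them into one.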
1.2.08. Rename operation
Attribute renaming only
    Cannot alter domain, or add/remove attr
Rename operation (ρ)
    Renames attributes list2 from list1
    ρ(new_attr_names)(R)
Implicit renaming
    Order dictated by relational schema
1.2.08. Set Theoretic
Binary operation: two relations
    Sets of tuples
    Union compatibility (same attributes)
Union (R ∪ S)
Intersection (R ∩ S)
    Commutative (R ∪ S = S ∪ R) and associative (R ∪ (S ∪ T) = (R ∪ S) ∪ T)
1.2.08b. Set difference
Set difference
    Non-commutative (R-S != S-R)
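The set theoretic operations map directly onto Python's built-in `set` type. A small sketch, assuming two hypothetical union-compatible relations of (name, year) tuples:

```python
# Hypothetical union-compatible relations: sets of (name, year) tuples.
R = {("Ann", 1), ("Bob", 2)}
S = {("Bob", 2), ("Cy", 3)}

union = R | S          # tuples in R, in S, or in both
intersection = R & S   # tuples in both R and S
r_minus_s = R - S      # tuples in R but not in S
s_minus_r = S - R      # tuples in S but not in R
```

Swapping the operands leaves the union and intersection unchanged, but gives a different set difference.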
1.2.09. Cross product
Cartesian product of two relations
R x S
    Also known as
    Cross product
    Cross join
Cross product diagram
Introduction to complexity
    Computationally explosive
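A short Python sketch of why the cross product is explosive, using two tiny hypothetical relations. The result holds one tuple for every pairing, so its size is the product of the input sizes: two relations of 1,000 tuples each would yield 1,000,000 combined tuples.

```python
from itertools import product

# Hypothetical relations of single-attribute tuples.
R = [("r1",), ("r2",), ("r3",)]
S = [("s1",), ("s2",)]

# Every tuple of R is concatenated with every tuple of S.
cross = [r + s for r, s in product(R, S)]
```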
1.2.10. Relational algebra/model notation
Relational schema R(A1, A2,…,An)
Relation state r or r(R)
    Set of unordered tuples
    r = {t1, t2,…,tm}
Each n-tuple is an ordered list of values
    t = <v1, v2,…,vn>
    ith value in t = vi called t[Ai]
r(R) subset of (dom(A1) x dom(A2)... x dom(An))
1.2.11. Constraints
Domain constraint
    For all v in t of r(R)
    vi is an element of dom(Ai)
Entity constraint
    K = SKmin
    t[K] != null
Key constraint
    Superkey SK as identifying subset of attributes
    t1[SK] != t2[SK]
1.2.12. Referential integrity
Given two relations R1 and R2
    R1 contains a foreign key (FK) that references
    A primary key (PK) in R2
R1 referencing relation, R2 referenced relation
Shared domains: dom(FK) = dom(PK)
Foreign exists: t1 in r(R1), t2 in r(R2)
    t1[FK] = t2[PK], or t1[FK] is NULL
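The referential integrity rule can be sketched as a check in Python. The relation and attribute names below (`department`, `employee`, `dno`) are assumptions for illustration: a referencing tuple passes if its foreign key is NULL (`None`) or equals some primary key value in the referenced relation.

```python
def violations(referencing, fk, referenced, pk):
    """Return referencing tuples whose FK is neither NULL nor a PK value in the referenced relation."""
    known = {t[pk] for t in referenced}
    return [t for t in referencing if t[fk] is not None and t[fk] not in known]


# Hypothetical relations: R2 (department) is referenced by R1 (employee).
department = [{"dno": 10}, {"dno": 20}]
employee = [
    {"ssn": 1, "dno": 10},    # matches a primary key
    {"ssn": 2, "dno": None},  # NULL foreign key is allowed
    {"ssn": 3, "dno": 99},    # dangling reference
]

broken = violations(employee, "dno", department, "dno")
```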
1.3. Joins
In this lecture we look at... [Section notes PDF 233Kb]
1.3.01. Introduction
Recap: pulling data out of individual relations
    By row, by column
    Select and project
Access across multiple relations
Miniworld approximation
    Fragmenting entities by cardinality
    Tuples as entity fragments
    Relationships within relations
Joins
    Join types (condition and unmatched)
1.3.02. Access across relations
Relational model allows multiple relations to exist within one database schema
Relations can be accessed individually or together (joins).
Referential integrity
    Relations relating
Pulling data out of single relations
    Select and project
Pulling related data out of
    Multiple relations using Join
1.3.03. Miniworld approximation
Universe of Discourse, or Miniworld
Miniworld is an incomplete model of the real world
The relational data model as a model for the miniworld
Approximation
    Separate and distinct entities
    Single complex entities
    Separate related entities
    Cardinality of relationships
Each relation made up of attributes
    Values can be used as references
1.3.04. Pointing mechanism
Relation has a Primary key
Tuple contains Primary key value
Foreign keys
    Tuples can contain a reference to another relation's Primary key
Just numbers
One number identifies a single tuple in one relation (local), one number identifies a single tuple in another relation (foreign).
1.3.04b. Pointing mechanism example in C
C programming language
Memory addresses, or pointers
    int *a;   /* a holds the address of an int */
    int b=0;
    a = &b;   /* store the address of b in a */
a points to b
In databases, typically done with unique identifiers (IDs) rather than memory addresses.
1.3.04c. Pointing mechanism with structures
Foreign key importing
    typedef struct car {
        int ID;           /* primary key */
        char *make;
        char *model;
        char *derivative;
        int optionID;     /* foreign key: holds the ID of an option */
    } car;
    typedef struct option {
        int ID;           /* primary key */
        char *name;
        int price;
    } option;
    car c;
    option o;
    //...data structure populating
    c.optionID = o.ID;
1.3.05. Relational cardinality
1:0 relationships
    Single entity
    Uniquely identifiable
    Candidate keys
    Primary Key
1:1 relationships
    Two entities, A and B
    1 A relates to 1 B and vice versa
1:N relationships
M:N relationships
1.3.06. Relationships in the relational model
Two relations, A and B
A side, B side, 1 side, N side
1:1 relationships
    Key can go on either side
1:N relationships
    Key cannot go on 1 side
    Has to go on N side
M:N relationships
    Nowhere obvious for the key to go
    Create new pairing relation
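A sketch of the key placement rules, run through Python's `sqlite3` module against an in-memory database. The schema is an assumption built for illustration (table and column names such as `works_on` and `dname` are hypothetical): the 1:N foreign key sits on the N side (`employee.dno`), and the M:N relationship gets its own pairing relation holding both keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1:N: one department has many employees, so the key goes on the N side.
    CREATE TABLE department (dno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE employee (
        ssn INTEGER PRIMARY KEY,
        dno INTEGER REFERENCES department(dno)
    );
    -- M:N: employees work on many projects and vice versa,
    -- so a new pairing relation holds both keys.
    CREATE TABLE project (pnum INTEGER PRIMARY KEY);
    CREATE TABLE works_on (
        ssn INTEGER REFERENCES employee(ssn),
        pnum INTEGER REFERENCES project(pnum),
        PRIMARY KEY (ssn, pnum)
    );
    INSERT INTO department VALUES (10, 'Research');
    INSERT INTO employee VALUES (1, 10);
    INSERT INTO employee VALUES (2, 10);
    INSERT INTO project VALUES (100);
    INSERT INTO project VALUES (200);
    INSERT INTO works_on VALUES (1, 100);
    INSERT INTO works_on VALUES (1, 200);
    INSERT INTO works_on VALUES (2, 100);
""")

staff_in_dept = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dno = 10").fetchone()[0]
projects_of_1 = conn.execute(
    "SELECT COUNT(*) FROM works_on WHERE ssn = 1").fetchone()[0]
staff_on_100 = conn.execute(
    "SELECT COUNT(*) FROM works_on WHERE pnum = 100").fetchone()[0]
```

Putting the key on the 1 side instead would force `department` to hold a list of employees in a single value, which breaks 1NF.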
1.3.07. Joins
Phase change, different point in lifecycle
Join operation
    Combines related tuples, conditionally
    From two relations
    Into single tuples
Allows processing of relationships
    Among multiple relations
1.3.08. Joins, canonical algebraic form
Conditional (on join condition)
    Only combines tuples where true
Cartesian product (conditionless)
    example of conditionless join
    all tuples combined
    R ⋈<true> S
⋈, Binary operator
    e.g. R ⋈<join_condition> S
1.3.09. Join equivalence
Equivalent to sequence
    Cartesian product (X)
    followed by Selection (σ)
ACTUAL_DEPENDENTS = σ<SSN=ESSN>(EMPNAMES X DEPENDENT)
or
ACTUAL_DEPENDENTS = EMPNAMES ⋈<SSN=ESSN> DEPENDENT
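The equivalence can be checked with a small Python sketch. The tuple values and the `NAME` and `DEP` attributes are made up for illustration; only `SSN`, `ESSN`, `EMPNAMES` and `DEPENDENT` come from the slide. Both routes produce the same tuples, but the join never materialises the unmatched pairings.

```python
from itertools import product

# Hypothetical states for the EMPNAMES and DEPENDENT relations.
empnames = [{"SSN": 1, "NAME": "Ann"}, {"SSN": 2, "NAME": "Bob"}]
dependent = [{"ESSN": 1, "DEP": "Zoe"}, {"ESSN": 1, "DEP": "Max"}]


def cartesian(r, s):
    """R X S: merge every tuple of r with every tuple of s."""
    return [{**a, **b} for a, b in product(r, s)]


def join(r, s, condition):
    """Join: combine only those pairs for which the condition holds."""
    return [{**a, **b} for a in r for b in s if condition({**a, **b})]


def same_ssn(t):
    """The join condition SSN = ESSN."""
    return t["SSN"] == t["ESSN"]


all_pairs = cartesian(empnames, dependent)
via_product = [t for t in all_pairs if same_ssn(t)]
via_join = join(empnames, dependent, same_ssn)
```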
1.3.10. Join types (condition)
Theta: Ai θ Bj
(A from R, B from S)
    θ is comparison operator
    =,<,>,!=,>=,<=
    Ai and Bj share the same domain
Equi: Ai = Bj
    Theta join where θ is =
Natural: Ai and Bj are the same attribute
    in two separate relations (name and domain)
    * denotes natural join
    e.g. EMPNAMES * DEPENDENTS
1.3.11. Join types (inner and outer)
Inner joins
    not the only joins
    eliminate tuples without a matching counterpart
    i.e. tuples with a null value for the join attribute are discarded
1.3.12. Outer joins
Outer joins control what's discarded
    Keep unmatched tuples in either
    Left, right, or both relations
    Left, right or full outer join correspondingly
1.4. ER diagrams
In this lecture we look at... [Section notes PDF 75Kb]
1.4.01. ER Diagrams and Relational mapping
Design communication techniques
    ER diagrams
    ER to relational mapping
Entities to Objects
Type Inheritance
    EER diagrams
    UML
Web DB Integration
1.4.03. Design in the modern context
Team based development
Documentation
    Value of design over description
DB sketching (left hand side)
    Concept more important than perfection
Design iteration
    Mini-world as approximation
    Categorisation to create entities
    Verb’ing to create actions/relationships
1.4.04. Database Left:right divide
Design
    Catalog, Meta-data, Intension, or Database schema
    Entity type
    Relationship type
State
    Set of occurrences/instances, Extension, snapshot
    Entity set
    Relationship set
1.4.05. Basic ER diagram
Typically part of a system
(Strong) Entities
    Product
    Customer
    Payment
Relationships
    Sale
1.4.06. Mapping ER to Relation DB tables
Intuitive mapping
    Entities as tables
    Attributes as columns
Relationships are more difficult
Key sharing mechanism
    Foreign key references primary key
    Where to put the foreign key forms the intuitive guide to the rest of the mapping
1.4.07. ER to Relational mapping
Step-by-step approach
1. Strong entities
    Create relation including (simplified) attributes
2. Weak entities
    Create relation inc. attr, foreign/pri key of owner
3. Binary relationship S:T, 1:1
    Choose relation, say S (with total participation) and inc. foreign/pri key of T
    inc. relationship attributes
1.4.08. Cardinality
Specifies number of relationship instances a single entity can participate in
S:T (1:1)
    An entity from table S is related to one, and only one, entity from table T
1:1, 1:N, N:M
    DEPARTMENT : EMPLOYEE
    EMPLOYEE : EMPLOYEE
    PROJECT : EMPLOYEE
1.4.09. ER to Relational mapping
5. Binary relationship 1:N
    Choose relation T (N-side) inc. foreign/pri key of S
6. Binary relationship M:N
    Create relation, inc. foreign/pri keys of S&T
7. Multivalued
    For each mv_attr, create new relation, inc. foreign/pri key of parent
8. n-ary relationship
    Create new relation, inc. all foreign/pri keys of participating entities
1.4.10. Participation
Participation constraints
Existence of an entity dependent upon
    being related to another entity
    via relationship type (left hand/design)
Total (∀)/Existence dependency (double line)
    Every student must be in a faculty
    For every entity in the total set of students
Partial (∃) (single line)
    Some students are student_representatives
    There exists some entity(s) within the set of all…
1.4.11. Library example
Library example
Entities: Person, Book, Librarian
Time-perspective implications
1.5. UML
In this lecture we look at... [Section notes PDF 86Kb]
1.5.01. UML diagrams
Not just one type of diagram
ER Entities -> UML objects
Adds scope to include methods
1.5.03. Simple UML relationship diagram
Classes
    Attributes
    Methods
Relationships
    Participation
    Cardinality
    Roles
1.5.04. UML inheritance
Notion of inheritance
    Java parallel
    'is a' relationships
Database Student
    is a Computer Science Student
    is an Engineering Faculty Student
        is a Student
        is a Person
Inheritance hierarchies
UML diagrams can be used to show inheritance
1.5.05. UML diagram
Classes
Relationships
Inheritance
2. DBMS Systems
This is the DBMS Systems course theme. [Complete set of notes PDF 482Kb]
2.1. Queries
In this lecture we look at... [Section notes PDF 319Kb]
2.1.01 Introduction
Methods of getting data out
The need for queries
QBE
SQL (design side)
    History
    Schemas and relations (CREATE)
    Data types and domains
    DROP and ALTER
2.1.02. Querying interfaces
High (view) level
Query-By-Example (QBE)
    Alternative to SQL
    Table driven (visually similar to relations)
    Rather than script driven, hence intuitive over learned
Visual or text based
    User fills in templates
    Microsoft Access approach
2.1.02b. QBE visual example
Record advancing
Query designing
    Finite domain attributes
Web search parallel
2.1.03. QBE text-format example
P. print, I. insert, D. delete, U. update
_VARNAME, copy field value into variable
2.1.05. SQL
Structured Query Language
    (SQL or SEQUEL)
    Wikipedia reference
Success of relational databases
Developed for System R at IBM
ANSI standardised
SQL-1986 (SQL1), ongoing extension
SQL-1992 (SQL2), current version (Oracle 9i)
SQL-1999 (SQL3), regular expression matching, recursive queries
SQL-2003, XML features, auto-generated columns
2.1.05b. SQL command syntax
What follows here is a brief summary
    Oracle syntax
    Similar but not identical to MySQL/MSSQL
General familiarity
Query writing best learnt by doing it
Lecture live-example
Coursework 1 will be SQL
Oracle (9i) SQL reference
MySQL (5.0) SQL reference
2.1.05c. SQL in application
Keyword oriented language
Keywords not congruous with Relational model
Lots of different ways to write SQL
    Analogous to C/Java formatting
    if (b==2) { a=1; } else { a=0; }
Recommend using case to differentiate attributes and keywords
    SELECT colour, size, shape FROM fruit WHERE weight>22;
Oracle user accounts on Teaching database
Namespace references, e.g. shared.cars
2.1.06. SQL Create schema
Data definition commands
CREATE
    SCHEMA <schema_name> AUTHORIZATION <a>
    or workspace
Beware of names
    Name collisions produce odd behaviours
SQL Schema embraces Tables (relations), constraints, views, domains, authorizations
2.1.07. SQL Create table
CREATE TABLE
    <schema_name>.<relation_name>(
        <attribute_definitions>
        <key>
        <constraints>
    )
CREATE TABLE example (Oracle)
Tables can (and should) be indexed by user
    e.g. <username>.<tablename>
Normal login implies username
Non-local table access
2.1.08. Data types and domains (Oracle)
Numeric
    ENUM
    NUMBER, NUMBER(i), NUMBER(i,j)
        Formatted numbers, i precision, j scale
        (number of digits total, after decimal point)
Character-string
    CHAR(n) - n is length
    VARCHAR2(n) - n is max
DESCRIBE output example
Multi-database comparison of Datatypes
Database legacy: limited storage necessitated efficient storage
Does it need to be efficient anymore?
You might consider all SQL types as being conceptually similar to attribute types in the relational model, although in reality the implementation of these types in a DBMS only approximates the mathematical purity of unordered domain sets etc.
2.1.08b. Data types and domains (MySQL)
Numeric
    TINYINT, INT, INT UNSIGNED
    FLOAT, DOUBLE, DECIMAL
ENUM
Character-string
    CHAR(n) - n is length
    VARCHAR(n) - n is max
    TINYTEXT, TEXT
    Beware different default/maximum lengths to Oracle
BLOB
Multi-database comparison of Datatypes
2.1.09. Time-based data types
Date and Time
    DATE
        Ten positions, components YYYY-MM-DD
    TIME
        Eight positions, components HH:MM:SS
    TIME(i)
        Time fractional seconds precision
        Adds i+1 positions
    TIMESTAMP
        optionally WITH TIME ZONE
Very sensitive to syntactical ambiguities
    day/month/year/hour/minute separators
2.1.10. DROPing
DROP <object> <obj_name> <flags>
DROP SCHEMA <schema_name> CASCADE
    drops all workspace tables, domains
DROP TABLE <relation_name> RESTRICT
    only drops table if not referenced in any constraints/views
Notion of cascading
    Table links
2.1.11. ALTERing
Schema evolution
Design side
ALTER TABLE <schema_name>.<relation_name> ADD <var_name> <var_type>;
Example
    ALTER TABLE uni.student ADD hall VARCHAR(32);
Upper and lower case syntax
Naming conventions
2.1.12. Queries
Helper interfaces
    HeidiSQL/phpMyAdmin/Sword/SQLplus
    Design/perform a lot of routine queries for you
    Important to learn SQL, reinforcement
    Designing select queries is more difficult
    Visual interfaces still lacking in this area
Select queries in SQL
    Basic singlets
    Renaming
    Queries with Joins
    Nested queries
2.1.13. SQL Queries
SELECT statement
Similar to relational data model SELECT then PROJECT
    SELECT <attribute list>
    FROM <table list>
    WHERE <condition>;
2.1.14. SQL Queries
SELECT <attr_list>
    FROM R,S,T
    WHERE DNO = 10
equivalent to
π<attr_list>(σ<DNO=10>(R X S X T))
True-false evaluation tuple by tuple
WHERE clause as compound logical statement
2.1.15. SQL Queries
Produces a relation/set of tuples
Can be used to extract a single tuple
e.g. SELECT bday, age
    FROM student
    WHERE fname='Tim' AND lname='Smith'
Result = (13-05-80, 20)
Argument quoting (')
    SQL poisoning
    Not null
    Not numeric values
MySQL Attribute quoting (`)
    Hypothetical attribute `all`, all, and ALL
SQL poisoning (more commonly known as SQL injection) is a vulnerability exposed by inadequate escaping of arguments/variables used to compose SQL queries.
E.g. Tim in the previous example could be Tim'; DELETE FROM student;' SELECT * FROM student WHERE 1
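One common defence is to pass arguments as bound parameters rather than pasting them into the query text. The sketch below uses Python's `sqlite3` module with a hypothetical `student` table. The hostile input is a simpler assumption than the one in the notes: `sqlite3`'s `execute` runs only one statement per call, so this input widens the WHERE clause instead of stacking a DELETE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (fname TEXT, lname TEXT)")
conn.execute("INSERT INTO student VALUES ('Tim', 'Smith')")

# Hostile input that tries to close the quote and widen the WHERE clause.
hostile = "x' OR '1'='1"

# Unsafe: the argument is pasted into the query text, so its quote is parsed as SQL.
unsafe_sql = "SELECT fname FROM student WHERE fname = '" + hostile + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the ? placeholder passes the argument as data, never as SQL.
safe_rows = conn.execute(
    "SELECT fname FROM student WHERE fname = ?", (hostile,)).fetchall()
```

The unsafe query returns every student because the injected `OR '1'='1'` is always true; the parameterised query looks for a student literally named `x' OR '1'='1` and finds none.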
2.1.16. Renaming and referencing
AS keyword
(Partial) Attribute renaming in projection list
    SELECT fname AS firstName, minit, lname AS surname...
Role names for relations
    SELECT S.FNAME, F.FNAME, S.LNAME
    FROM STUDENT AS S, STUDENT AS F
    WHERE S.LNAME=F.LNAME
(Total) Attribute renaming in FROM
    SELECT s.firstName, s.surname
    FROM student AS s(firstName,surname,DOB,NINO,tutor)
Wildcards (SELECT s.* FROM...)
2.1.17. SQL Tables
Relations are bags, not sets
    e.g. projection of non-key attributes
Set cannot contain duplicate item/repetition
Duplicates exist in bags and can be:
    SELECT DISTINCT (eliminated)
    SELECT ALL (ignored/kept)
2.1.18. Queries and Joins
Relational database allows inter-related data
SQL select FROM gives Cartesian product
WHERE clause defines join condition
    SELECT proj.pnum, mgr.ssn
    FROM project AS proj, employee AS mgr
    WHERE proj.mgrssn = mgr.ssn;
Alternatively, explicitly define join (note type)
    SELECT project.pnum, employee.ssn
    FROM project INNER JOIN employee ON project.mgrssn = employee.ssn;
2.1.18b. Outer joins
Outer joins are crucial in the real-world
Databases often contain NULLs (3VL)
Analysis of where the crucial data is across a relationship
Previous example, only get project data for managed projects
    SELECT project.*, employee.*
    FROM project INNER JOIN employee
    ON project.mgrssn = employee.ssn;
2.1.18c. Outer joins (cont)
Scale of loss isn't always instantly obvious
NULLs often used unpredictably
May want project information, even if no employee attached as manager
    SELECT project.*, employee.*
    FROM project LEFT OUTER JOIN employee
    ON project.mgrssn = employee.ssn;
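The difference between the two joins can be seen by running both against a tiny in-memory database with Python's `sqlite3` module. The table contents are made up for illustration; the column names `pnum`, `mgrssn` and `ssn` follow the queries in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY);
    CREATE TABLE project (pnum INTEGER PRIMARY KEY, mgrssn INTEGER);
    INSERT INTO employee VALUES (1);
    INSERT INTO project VALUES (100, 1);     -- managed project
    INSERT INTO project VALUES (200, NULL);  -- no manager attached
""")

inner = conn.execute("""
    SELECT project.pnum, employee.ssn
    FROM project INNER JOIN employee ON project.mgrssn = employee.ssn
    ORDER BY project.pnum
""").fetchall()

left = conn.execute("""
    SELECT project.pnum, employee.ssn
    FROM project LEFT OUTER JOIN employee ON project.mgrssn = employee.ssn
    ORDER BY project.pnum
""").fetchall()
```

The inner join silently drops project 200; the left outer join keeps it and pads the employee side with NULL (`None` in Python).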
2.1.19. 2y and 3y joins
Queries can encapsulate any number of relations
    Even one relation many times (in different roles)
Relationship chain
    Across many relations
Tuples as Entities OR Relationships
    e.g. Employee -> Works_on -> Project -> Department -> Manager
2.1.20. Recursive closure
Can’t be done in SQL2
Recursive relationships
Unknown number of steps
SQL2 can’t generalise in single query
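SQL-1999 added recursive queries, and a sketch of one can be run through Python's `sqlite3` module, which supports `WITH RECURSIVE`. The `employee` table and its `superssn` (supervisor) column are assumptions made for illustration. The query finds everyone below employee 1 without knowing in advance how many steps the chain has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY, superssn INTEGER);
    INSERT INTO employee VALUES (1, NULL);  -- top of the chain
    INSERT INTO employee VALUES (2, 1);
    INSERT INTO employee VALUES (3, 2);
    INSERT INTO employee VALUES (4, 3);
""")

# Start with direct reports of employee 1, then repeatedly add their reports.
reports = conn.execute("""
    WITH RECURSIVE chain(ssn) AS (
        SELECT ssn FROM employee WHERE superssn = 1
        UNION
        SELECT e.ssn FROM employee AS e JOIN chain AS c ON e.superssn = c.ssn
    )
    SELECT ssn FROM chain ORDER BY ssn
""").fetchall()
```

In SQL2 the same result needs one self-join per level, so the query has to be rewritten whenever the chain grows.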
2.1.21. Nested queries
Essentially one or more (inner) queries within an (outer) query
Inner and outer query
Not to be confused with inner and outer joins
Inner query can go in three places
    SELECT clause (projection list)
        Must return a single value, then aliased as attribute in outer result
    FROM clause
        Inner query result used as standard table in FROM cross product
    WHERE clause
2.1.21b. Nested query example
Use of query result as comparator for other (outer) query
    SELECT DISTINCT course
    FROM dept WHERE course IN (
        SELECT d.course
        FROM dept AS d, faculty AS f, student AS s
        WHERE d.ownfac=f.id AND s.owndept=d.id
        AND f.name='Eng' AND s.year='3'
    ) OR course IN (
        SELECT course
        FROM dept
        WHERE code LIKE 'COMS3%'
    );
2.1.22. Bridging SQL across 3 tiers
Three tier database design
Changing role of DBMS
Indices
Aggregate functions (conceptual)
    Over bags and sub-bags
Creating and updating views (ext)
SQL embedding
In this subsection we look at the different roles SQL plays across the three tiers of database design. We discuss the areas in which SQL is lacking and how those deficiencies can be complemented by embedding SQL in other languages.
2.1.25. IndicesLow/Internal levelIndex by one attributeFor queries selecting by that attribute:
Faster tuple access (ordered tuples)Reduces database memory load
Small cross product relation, only crosses requisitesAccelerates query resolution time
CREATE INDEX Index_Name ON RELATION(Attribute);
2.1.26. Aggregate functionsRun over groups of tuplesTakes a projected attribute list as an argumentProduce relation with single tupleSUM, MAX, MIN, AVG, COUNTe.g. AggFunc over all tuples
SELECT SUM(SALARY), MAX(SALARY), MIN(SALARY)FROM EMPLOYEE;
Single attribute lists (distinct values)Multi-attribute lists (granularity of distinct values by pairing)
2.1.27. Aggregates over sub-bagsCan run over subsets of tuplesGROUP BY keywordSpecifies the grouping attributesNeed to also appear in projected attr_listShow result along side value for group attre.g. AggFunc over subgroups
SELECT dno, COUNT(*)
FROM employee
GROUP BY dno;
Quick SQL check: do all non-aggregated attributes in the SELECT projection list also appear in the GROUP BY attribute list?
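To see the difference between an aggregate over the whole bag and one over sub-bags, the following sketch runs both forms against an in-memory SQLite database. The employee rows are invented for illustration; only the query shapes come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT, dno INTEGER, salary INTEGER);
    INSERT INTO employee VALUES ('1', 4, 100);
    INSERT INTO employee VALUES ('2', 5, 200);
    INSERT INTO employee VALUES ('3', 5, 300);
""")

# Aggregate over all tuples: the result relation has a single tuple.
whole = conn.execute(
    "SELECT SUM(salary), MAX(salary), MIN(salary) FROM employee"
).fetchone()

# Aggregate over sub-bags: one result tuple per distinct dno value.
per_dept = conn.execute(
    "SELECT dno, COUNT(*) FROM employee GROUP BY dno ORDER BY dno"
).fetchall()

print(whole)
print(per_dept)
```

Note that dno appears both in the projection list and in GROUP BY, which is what the quick check above asks for.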
2.1.28. Creating viewsViews are partial projectionsVirtual relations, or views of live relationsUpdate synchronised
CREATE VIEW <virtual_relname> AS <real_relation>
Real relation could be a query resultClever bit is the change propagationUPDATEs made to the view dataset are flooded back to relations
INSERT and DELETE behaviour needs to be definedNon-trivial as INSERT into view (virtual relation) may leave holes in real relation
2.1.29. Embedding SQL
SQL (alone) can do lots of clever things in one expression
But can only execute a single expression
Can structure SQL commands into proper programming languages
Java Database Connectivity (JDBC)
javac, then java VMCOBOL, C or PASCAL
precompiled with PRO*COBOL or PRO*C
Procedural Language (PL/SQL)
Oracle/MySQL procedural languageStored procedures can take parameters
2.2. InternalsIn this lecture we look at...[Section notes PDF 180Kb]
2.2.01. IntroductionDatabase internals (base tier)RAID technology
Reliability and performance improvementRecord and field basicsHeaders to hashingIndex structures
2.2.01b. Machine architecture (by distance)Distance from chip determines minimum latencySpeed of light is a constantImpact of bus frequencies
IDE (66, 100, 133 MHz)
PCI, PCI-X (66, 100, 133 MHz)
PCI Express (1 GHz to 12 GHz)
Impact of bus bandwidthsPCI (32/64 bit/cycle, 133MB/s)PCI Express (x16 8.0GB/s)
Here's a link from Intel showing a machine architecture with signal bandwidths: Intel diagram
2.2.01c. Machine architecture (by capacity)Capacity increased with distanceStaged architecture as compromiseSpeed, time/distanceAlso cost, heat, usage scale
2.2.02. Database internalsStored as files of records (of data values)
Auxiliary data structures/indices1y and 2y storage
memory hierarchy (pyramid diagram)volatility
Online and offline devicesPrimary file organisation, records on disk
Heap - unorderedSorted - ordered, sequential by sort keyHashed - ordered by hash keyB-trees - more complex
2.2.03. Disk fundamentalsDBMS task
linked to backup1y, 2y and 3y
e.g. DLT tapeChanging face of current technology
Impact of inexpensive harddisksFlash memory devices (CF, USB)
Random versus sequential accessLatency (rotational delay) andBandwidth (data transfer rate)
2.2.04. RAID technologyRedundant Array of Independent DisksData striping
Blocks (512 bytes), bits and transparencyReliability (1/n)
Mirroring/shadowingError correction codes/parity
Performance (n)Mirroring (2x read access)Multiple parallel access
2.2.05. RAID levels
0 No redundant data
1 Disk mirrors (performance gain)
2 Hamming codes (also detect)
3 Single parity disk
4 Block level striping
5 Block level striping and parity/data distribution
6 Reed-Solomon codes
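The parity idea behind the single-parity levels can be sketched in a few lines. This is a simplified illustration, not a description of any particular controller: byte strings stand in for disk blocks, and the block contents are invented.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, one block each, plus a parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity_block = parity(data)

# Simulate losing disk 1: XOR of the survivors and the parity restores it.
survivors = [data[0], data[2], parity_block]
rebuilt = parity(survivors)
print(rebuilt)
```

Because XOR is its own inverse, any one missing block can be rebuilt from the remaining blocks and the parity, which is why a single parity disk tolerates a single disk failure.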
2.2.06. Records and fieldsDBMS specific, generallyRecords (tuples) comprise fields (attributes)File is a sequence of recordsVariable length records
Variable length fieldsMulti-valued attributes/repeating fieldsOptional fieldsMixed file of different record types
2.2.07. Fields
records -> files -> disksFixed length for efficient accessNetworking issuesDelimit variable length fields (max)Explicit record/field lengthsSeparators (,;,:,$,?,%)Record headers and footersSpanning
block boundaries and redundancy
2.2.08. Primary organisationBias data manipulation to 1y memory
Load record to 1y, write backCache theorem
Data storage investment, rapidity of accessoptimisations based on frequent algorithmic use
Ordering, ordering field/key fieldHashing
2.2.09. Indexes/indices
Auxiliary structures/secondary access path
Single level indexes (Key, Pointer)
File of records
Ordering key field
Primary, Secondary and Clustering
2.2.09b. Primary index examplePrimary index on simple tableOrdering key field (primary key) is IntegerPointers as addressesSparse, not dense
2.2.10. Primary Index file (as pairs list)
Two fields <K(i),P(i)>Ordering key field and pointer to blockSecond example, indexing candidate key Surname
K(1)="Barnes",P(1) -> block 1Barnes record is first/anchor entry in block 1
K(2)="Smith",P(2) -> block 6K(3)="Zeta",P(3) -> block 8
Dense (K(i) for every record), or SparseEnforce key constraint
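A sparse primary index can be searched with a binary search over the anchor keys. The sketch below reuses the three Surname entries from the slide; the function name find_block and the lookup keys are assumptions made for illustration.

```python
from bisect import bisect_right

# <K(i), P(i)> pairs from the slide: anchor surname and block number.
index = [("Barnes", 1), ("Smith", 6), ("Zeta", 8)]
anchor_keys = [key for key, _ in index]

def find_block(surname):
    """Return the block of the last anchor <= surname, or None if before all anchors."""
    position = bisect_right(anchor_keys, surname) - 1
    if position < 0:
        return None  # sorts before the first anchor, so it cannot be in the file
    return index[position][1]

for name in ("Adams", "Jones", "Smith", "Taylor"):
    print(name, find_block(name))
```

The index only narrows the search to a starting block; the data file still has to be read from that block to find the record itself, which is the price of a sparse rather than dense index.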
2.2.10b. Clustering index exampleClustering indexOrdering key field (OKF) is non-keyEach entry points to multiple records
2.2.11. Clustering Index (as pairs list)Two fields <K(i),P(i)>Ordering non-key field and pointer to block
Internal structure e.g. linked list of recordsEach block may contain multiple records
K(1)="Barnes",P(1) -> block 1K(2)="Bates",P(2) -> block 2K(3)="Zeta",P(3) -> block 3
K(i) not required to havea distinct value for each recordnon-dense, sparse
2.2.11b. Secondary Index exampleIndependent of primary orderingCan't use block anchorsNeeds to be dense
2.2.12. More indicesSingle level index
ordered index filelimited by binary search
Multi level indicesbased on tree data structures (B+/B-trees)
faster reduction of search space (log_fo(b_i) block accesses, for fan-out fo and b_i index blocks)
2.2.13. IndicesDatabase architecture
Intension/extensionIndexes separated from data file
Created/discarded dynamically
Typically 2y to avoid reordering records on disk
2.2.14. Query optimisationFaster query resolution
improved performancelower loadhardware cost:performance ratio
Moore's lawQuery process chainQuery optimisation
2.2.15. Query processingCompile-track familiarity
Scanner/tokeniser - break into tokensParser - semantic understanding, grammarValidated - check attribute names
Query treeExecution strategy, heuristic
Query optimisationIn (extended relational) canonical algebra form
2.2.16. Query optimisationSQL query
SELECT lname, fname
FROM employee
WHERE salary > (
    SELECT MAX(salary)
    FROM employee
    WHERE dno=5
);
Worst-case
Process inner for each outer
Best-case
Canonical algebraic form
2.2.16b. Query optimisation implementationIndexing accelerates query resolutionClosed comparison (intra-tuple)
all variables/attributes within single tuplee.g. x < 100
Open comparison (inter-tuple)variables span multiple tuples
Essentially a sorting problemInternal sorting covered (pre-requisites)Need external sort for non-cached lists
2.2.17. Query optimisationExternal sorting
Stems from large disk (2y), small memory (1y)Sort-merge strategy
Sort runs (small sets of total data file)Then merge runs back together
Used inSELECT, to accelerate selection (by index)PROJECT, to eliminate duplicatesJOIN, UNION and INTERSECTION
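The sort-merge strategy can be sketched as follows. In this simplified illustration, in-memory lists stand in for runs written to disk, and run_size stands in for the amount of 1y memory available; both are assumptions for the sake of a short example.

```python
import heapq

def external_sort(records, run_size):
    """Sort records using runs of at most run_size items, then merge the runs."""
    # Phase 1: sort each run independently (each run fits in memory).
    runs = [sorted(records[start:start + run_size])
            for start in range(0, len(records), run_size)]
    # Phase 2: merge the sorted runs, holding only one item per run at a time.
    return list(heapq.merge(*runs))

result = external_sort([5, 3, 8, 1, 9, 2, 7], run_size=3)
print(result)
```

A real implementation would write each run to a file and merge a bounded number of runs per pass, but the two phases are the same.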
3. Database DesignThis is the Database Design course theme.[Complete set of notes PDF 295Kb].
3.1. Functional DependencyIn this lecture we look at...
[Section notes PDF 64Kb]
3.1.01. IntroductionWhat is relational design?
Notion of attribute distributionConceptual-level optimisation
How do we assess the quality of a design?
3.1.02. Value in designAllocated arbitrarily by DBD under ER/EERGoodness at
Internal/storage level (base relations only)Reducing nulls - obvious storage benefits /frequentReducing redundancy - for efficient storage/anomalies
Conceptual levelSemantics of the attributes /single entity:relationNo spurious tuple generation /no match Attr,-PK/FK
3.1.03. Initial stateDatabase designUniversal relation
R = {A1, A2, …, An}Set of functional dependencies F
Decompose R using F toD = {R1, R2, …, Rn}D is a decomposition of R under F
3.1.05. AimsAttribute preservation
Union of all decomposed relations = originalLossless/non-additive join
For every extension, total join of r(Ri) yields r(R)
no spurious/erroneous tuples
3.1.06. Aims (also)Dependency preservation
Constraints on the database
X → Y in F of R appears directly in Ri
Attributes X ∪ Y all contained in Ri
Each relation Ri in 3NF
But what’s a dependency?
3.1.07. Functional dependencyConstraint between two sets of attributes
Formal method for grouping attributesDB as one single universal relation/-literal
R = {A1, A2, …, An}
Two sets of attributes, X ⊆ R, Y ⊆ R
Functional dependency (FD or f.d.) X → Y
If t1[X] = t2[X], then t1[Y] = t2[Y]
Values of the Y attribute depend on value of X
X functionally determines Y, not reverse necessarily
3.1.08. Dependency derivation
Rules of inference
reflexive: if X ⊇ Y then X → Y
augment: {X → Y} ⊨ XZ → YZ
transitive: {X → Y, Y → Z} ⊨ X → Z
Armstrong demonstrated these rules are sound and complete for closures
3.1.09. Functional dependencyIf X is a key (primary and/or candidate)
All tuples ti have a unique value for XNo two tuples (t1,t2) share a value of X
Therefore X → Y
For any subset of attributes Y
Examples
SSN → {Fname, Minit, Lname}
{Country of issue, Driving license no} → SSN
Mobile area code → Mobile network (not anymore)
3.1.10. ProcessTypically start with set of f.d., F
determined from semantics of attributesThen use IR1,2,3 to infer additional f.d.sDetermine left hand sides (Xs)
Then determine all attributes dependent on XFor each set of attributes X,
determine X+ :the set of attributes f.d’ed by X on F
3.1.11. AlgorithmCompute the closure of X under F: X+
xplus = x;
do
    oldxplus = xplus;
    for (each f.d. Y → Z in F)
        if (xplus ⊇ Y) then xplus = xplus ∪ Z;
while (xplus != oldxplus);
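The closure algorithm translates almost directly into Python. This is a minimal sketch: representing each f.d. as a pair of attribute strings is an assumption of this example, and the dependency set F is the one used in the worked example later in these notes.

```python
def closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs under fds."""
    xplus = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains the left side, absorb the right side.
            if set(lhs) <= xplus and not set(rhs) <= xplus:
                xplus |= set(rhs)
                changed = True
    return xplus

# Each pair (lhs, rhs) stands for the f.d. lhs -> rhs.
F = [("AB", "C"), ("C", "D"), ("D", "A")]
print(sorted(closure("C", F)))
print(sorted(closure("AB", F)))
```

The loop stops when a pass leaves X+ unchanged, which is the same termination test as the do/while form above.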
3.1.12. Functional dependency
Consider a relation schema R(A,B,C,D) and a set F of functional dependencies
Aim to find all keys (minimal superkeys), by determining closures of all possible X subsets of R’s attributes, e.g.
A+, B+, C+, D+,
AB+, AC+, AD+, BC+, BD+, CD+,
ABC+, ABD+, ACD+, BCD+,
ABCD+
3.1.13. Worked example
Let R be a relational schema R(A, B, C, D)
Simple set of f.d.s
AB → C, C → D, D → A
Calculate singletons
A+, B+, C+, D+,
Pairs
AB+, AC+,…
Triples
and so on
3.1.14. Worked exampleCompute sets of closures
AB → C, C → D, D → A
1. Singletons
A+ = A
B+ = B
C+ = CDA
D+ = AD
Question: are any singletons superkeys?
3.1.15. F.d. closure example
2. Pairs (note commutative)
AB+ = ABCD
AC+ = ACD
AD+ = AD
BC+ = ABCD
BD+ = ABCD
CD+ = ACD
Superkeys?
3.1.16. F.d. closure example
3. Triples
ABC+ = ABCD
ABD+ = ABCD
ACD+ = ACD
BCD+ = ABCD
Keys?
4. Quadruples
ABCD+ = ABCD
3.1.17. F.d. closure summarySuperkeys:
AB, BC, BD, ABC, ABD, BCD, ABCDMinimal superkeys (keys)
AB, BC, BD
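The summary can be checked mechanically by computing the closure of every attribute subset. The sketch below is one way of doing so; the helper names superkeys and candidate_keys, and the string encoding of the f.d.s, are assumptions of this example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of attrs under the f.d.s in fds."""
    xplus = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= xplus and not set(rhs) <= xplus:
                xplus |= set(rhs)
                changed = True
    return xplus

def superkeys(schema, fds):
    """Every attribute subset whose closure is the whole schema."""
    return ["".join(subset)
            for size in range(1, len(schema) + 1)
            for subset in combinations(schema, size)
            if closure(subset, fds) == set(schema)]

def candidate_keys(schema, fds):
    """Superkeys with no proper subset that is also a superkey."""
    found = superkeys(schema, fds)
    return [k for k in found if not any(set(o) < set(k) for o in found)]

F = [("AB", "C"), ("C", "D"), ("D", "A")]
print(superkeys("ABCD", F))
print(candidate_keys("ABCD", F))
```

Enumerating all subsets is exponential in the number of attributes, so this brute-force approach only suits small schemas like the one in the worked example.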
3.2. Normal FormsIn this lecture we look at...[Section notes PDF 121Kb]
3.2.01. Orthogonal designInformation Principle:
The entire information content of the database is represented in one and only one way, namely as explicit values in column positions in tables
Implies that two relations cannot have the same meaning unless they explicitly have the same design/attributes (including name)
3.2.02. NormalizationReduced redundancyOrganised data efficientlyImproves data consistency
Reduces chance of update anomaliesData duplicated, then updated in only one location
Only duplicate primary keyAll non-key data stored only once
Data spread across multiple tables, instead of one Universal relation R
3.2.03. Good or bad?Depends on ApplicationOLTP (Transaction processing)
Lots of small transactionsNeed to execute updates quickly
OLAP (Analytical processing/DSS)
Largely Read-only
Redundant data copies facilitate Business Intelligence applications, e.g. star schema (later)
3NF considered ‘normalised’
save special cases
3.2.04. Normal forms (1NF)First Normal form (1NF)
Disallows multivalued attributesPart of the basic relational model
Domain must include only atomic valuessimple, indivisible
Value of attribute-tuple in extension of schema
t[Ai] ∈ dom(Ai)
3.2.05. Normalisation (1NF)Remove fields containing comma separated listsMulti-valued attribute (AMV) of RiCreate new relation (RNEW)
with FK to Ri[PK]RNEW(UID, AMV, FKI)
3.2.06. Normalisation (1NF)On weak entity
On strong entity
3.2.07. Normal forms (2NF)A relation Ri is in 2NF if:
Every nonprime attribute A in Ri is fully functionally dependent on the primary key of Ri
If all keys are singletons, guaranteed
If Ri has a composite key, are all non-key attributes fully functionally dependent on all attributes of the composite key?
3.2.08. Normal forms (2NF)Second normal form (2NF)
Full functional dependency X → Y
∀ A ∈ X, (X − {A}) ↛ Y
If any attribute A is removed from X
Then X → Y no longer holds
Partial functional dependency
∃ A ∈ X, (X − {A}) → Y
3.2.09. Normal forms (2NF)In context
Not 2NF: AB → C, A → C
AB → C violates 2NF, because B can be removed (A → C holds on its own)
Not 2NF: AB → CDE, B → DE
because attributes D and E are dependent on part of the composite key (B of AB), not all of it
3.2.10. Normalisation (2NF)
Split attributes not dependent on all of the primary key into separate relations
3.2.11. Normal forms (BCNF)Boyce-Codd Normal form (BCNF)
Simpler, stricter 3NF
BCNF ⇒ 3NF
3NF does not imply BCNF
For every nontrivial functional dependency X → Y
Then X must be a superkey
3.2.12. Normal forms (3NF)
Third Normal form (3NF)
Derived/based on transitive dependency
For all nontrivial functional dependencies X → A
Either X must be a superkey
Or A is a prime attribute (member of a key)
3.2.13. Normal forms in context
AB → C, C → D, D → A
In context
3NF? Yes
Because AB is a superkey and D and A are prime attributes
BCNF? No
Because C and D are not superkeys (even though AB is)
3.2.14. Normalisation (3NF)CarMakes not in 3NF because:
singleton key A
non-trivial fd B → C
B not superkey, C not prime attribute
3.3. OODBIn this lecture we look at...[Section notes PDF 34Kb]
3.3.01. IntroductionDatabase architectures, beyondWhy OODBMS?ObjectStoreCORBA object distribution standard
3.3.02. Large DBMSComplex entity fragmentation
across many relationsBreaks the miniworld-realworld dichotomyRequires conceptual abstracting layerDifficult to retrieve all information for xCompounded by version control
3.3.03. Object orientation
Object components (icv triples)
Object Identity (OID), i
replaces primary key
Type constructor, c
how the object state is constructed from sub-components
e.g. atom, tuple (struct), set, list, bag, array
Object state, v
Object behaviour/action
3.3.04. Desirable featuresEncapsulation
Abstract data types
Information hidingObject classes and behavior
Defined by operations (methods)Inheritance and hierarchiesStrong typing (no illegal casting)
don’t think about inheritance just yet
3.3.05. Desirable featuresPersistence
Objects exist after terminationNaming and reachability mechanismLate binding in Java
Performanceuser-def functions executed on server, not client
Extension into relational modelDomains of objects, not just valuesDomain hierarchies etc.
3.3.06. Desirable featuresPolymorphism
aka. operator overloadingsame method name/symbolmultiple implementations
Easy link to Programming languagespopular OO language like Java/C++Better than PL/SQL integrationMuch better than PL-JDBC integration!
Highly suitable for multimedia data
3.3.07. Undesirable featuresObject Ids
Artificial Real-Mini, double edged swordExposes inner workings (suspended abstraction)
Lack of integrity constraintsNo concept of normalisation/formsExtreme encapsulation
e.g. creation of many accessor/mutator methodsLack of (better) standardisation
3.3.08. More undesirablesOriginally no mechanism for specifyingrelationships between objectsIn RDBMS – relationships are tuplesIn OODBMS – relationships should be properties of objects
3.3.09. ObjectStorePackages supporting Java or C++C++ package uses:
C++ class definition for DDLinsert(e), remove(e) and create collections, for DML
Bidirectional relationship facilityPersistency transparency
Identical pointers to persistent and transient objects
3.3.10. CORBACommon Object Request Broker ArchitectureObject communication (unifying paradigm)
DistributedHeterogeneousNetwork, OS and language transparency
Java implementation org.omg.CORBAAlso C, C++, ADA, SMALLTALK
3.4. Type Inheritance and EER diagramsIn this lecture we look at...[Section notes PDF 117Kb]
3.4.01. IntroductionDesign/schema side (Entity types)Object-orientated concepts
Java, C++ or UMLSub/superclasses and inheritance
EER diagramsEER to Relational mapping
3.4.02. OOInheritance concept
Attributes (and methods)Subtypes and supertypesSpecialisation and GeneralisationER diagrams
show entities/entity setsEER diagrams
show type inheritance
additional 8th step to ER->Relational mapping
3.4.03. ObjectsBasic guide to JavaObject, classes as blueprintsObject, collection of methods and attributesMiniworld model of real world thingsObject, entity in database terms
3.4.04. AbstractSimilar objectsCar Park exampleStudent exampleShared properties/attributesGeneralisationReverse, specialisation
3.4.05. RelationshipsUsing English as model‘Is a’ (inheritance)‘Has a’ (containment)Nouns as objectsVerbs as methodsAdjectives as variables (sort of)
3.4.06. ClassesSuperclasses (Student)Subclasses (Engineer, Geographer, Medic)InheritanceSubclass inherits superclass attributes
Union of specific/local and general attributesInheritance chains
Person -> Student -> Engineer -> Computer Scientist
3.4.07. EER Fruit examplePartial participationDisjoint subclassesA fruit may be either a pear or an apple or a banana, or none of them. A fruit may not be a pear and a banana, an apple and a banana, an apple and a pear....
3.4.08. EER Wine example
Total, disjoint
Equivalent to Java Abstract classes
A Wine has to be exactly one of Red, White or Rosé; it cannot be more than one.
3.4.09. More extended (EER)Specialisation lattices
and HierarchiesMultiple inheritanceUnion of two superclasses (u in circle)In addition to basic ER notation
3.4.10. EER diagrammatic notation
Subset symbol to illustrate sub/superclass relationship
direction of relationship
Circle to link super to subclasses
DisjointOverlappingUnion
3.4.11. Disjointness constraintDisjointness (d in circle) – single honours
Overlapping (o in circle) – joint honours/sportsMembership condition on same attribute
attribute-defined specialisationdefining attributeimplies disjointness
versus user-definedeach entity type specifically defined by user
3.4.12. Completeness constraintTotal specialisation
Every entity in the superclass must be a member of at least 1 subclass
Double line (as ER)
Partial specialisation
Some entities may belong to at least 1 subclass, or none at all
Single line
Yields 4 possibilities(Total-Dis, Total-Over, Partial-Dis, Partial-Over)
3.4.13. EER Chip example
Total, overlapping
A Chip has to be at least one of FPA Unit, Reg Block, L1 Cache, and may be more than one type
3.4.14. EER Multiple inheritanceType hierarchiesSpecialisation latticesWell, sir, the Supreme Court of the United States has determined that the tomato is for legal and commercial purposes both a fruit and a vegetable. So we can legally refer to tomato juice as 'vegetable' juice.
Candice, General Foods
3.4.15. EER to Relational Mapping
Initially following 7 ER stages
Stage 8
4 different options
Optimal solution based on problemLet C be superclass, S1…m subclasses
3.4.16. Stage 8Create relation for C, and relations for S1..m each with a foreign key to C(primary key)Create relations for S1..m each including all attributes of C and its primary key
3.4.17. Stage 8
Create a single relation including all attributes of C ∪ S1..m and a type/discriminating attribute
only for disjoint subclassesCreate a single relation as above, but include a boolean type flag for each subclass
works for overlapping, and also disjoint
3.5. System DesignIn this lecture we look at...[Section notes PDF 64Kb]
3.5.01. Databases in ApplicationWhere’s the data?Programmer driven futureOODBMS limitationsRDBMS longevity
System design byData store, delivery, interface
Case study
3.5.02. Where’s the data?Previously covered distance from User to Data (and reason for it)Client-Server data model creates DBMS
P2P alternativeAccountabilityDistribution (BitTorrent, eDonkey)Caching
3.5.03. Where’s the data?Answer: everywhereBut where is it meaningful?Answer: for whom?
Quality paradigmLarge projects require large teams
Team overhead (ref 2nd year)Code responsibilitiesData/data model resp.Object responsibilities
3.5.04. Web application data supportWeb application programmingGoal, dynamically produced XHTMLClient side designer-programmer split
CSS, XHTMLServer side programmer-programmer split
Old school: query design, integratorNew school: MVC (Model-View-Controller)
Controller – user inputModel – modelling of external worldView – visual feedback
3.5.05. CMSContent Management Systempart of other coursesCMS is a DBMSZope/Plone and ZODBe107, Drupal and SeagullZend MVC Framework (pre-beta)
3.5.06. OODBMS limitationsFuture unknownRDBMS supports
Application data sharingPhysical/logical data independence/viewsConcurrency controlConstraints
at inception these requirements not known
RDBMS mathematical basis → extensible
Crude Type Inheritance (see EER mapping)
OODBMS as construction kit
3.5.07. Weaknesses in RDBMSData type supportUnwieldy, created 3VL (nulls)Type Inheritance and RelationshipsTuple:Entity fragmentation
not to be confused with ‘fragmentation’Entity approximation requires joins
3.5.08. System designClient specificationsVariance amongst Mobile devicesRich-media Content deliveryWhere’s the data? (M – media database)Where’s it going? (C – mobile browser)How’s it going to get there? (query design)What’s it going to look like? (V – XHTML)
3.5.09. Muddy bootsThe real world of databasesMassive Excel spreadsheetsAccess MigrationNormalisationUpdate implicationsVisual language of the Internet limitationsFuture of browser components
4. Distributed systemsThis is the Distributed systems course theme.
[Complete set of notes PDF 109Kb]
4.1. TransactionIn this lecture we look at...[Section notes PDF 86Kb]
4.1.01. Distributed DatabasesTransactionsUnpredictable failure
Commit and rollback
Stored proceduresBrief PL overview
Cursors
4.1.02. TransactionsReal world database actionsRarely single stepFlight reservation example
Add passenger details to rosterCharge passenger credit cardUpdate seats availableOrder extra vegetarian meal
4.1.04. Desirable properties of transactionsACID testAtomicity
transaction as smallest unit of processingtransactions complete entirely or not at all
consequences of partial completion in flight example
Consistency
complete execution preserves database constrained state/integrity
e.g. Should a transaction create an entity with a foreign key then the referenced entity must exist (see 4 constraints)
4.1.05. ACID test continuedIsolation
not interfered with by any other concurrent transactions
Durable (permanency)
committed changes persist in the database, not vulnerable to failure
4.1.06. CommitNotion of Commit (durability)Transaction failures
From flight reservation exampleAdd passenger details to rosterCharge passenger credit cardUpdate seats available: No seats remainingOrder extra vegetarian meal
Rollback
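The failing flight reservation can be sketched with Python's sqlite3 module. This is an illustrative simplification: the table names, the CHECK constraint used to detect "no seats remaining", and the reserve function are all assumptions, and the credit card and meal steps are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roster (passenger TEXT);
    CREATE TABLE flight (id INTEGER PRIMARY KEY,
                         seats INTEGER CHECK (seats >= 0));
    INSERT INTO flight VALUES (1, 0);  -- hypothetical flight, already full
""")
conn.commit()

def reserve(conn, passenger):
    """Run the reservation steps as one transaction: all of them or none."""
    try:
        conn.execute("INSERT INTO roster VALUES (?)", (passenger,))
        conn.execute("UPDATE flight SET seats = seats - 1 WHERE id = 1")
        conn.commit()    # make every step durable together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # undo the roster insert made before the failure
        return False

booked = reserve(conn, "Jones")
roster_size = conn.execute("SELECT COUNT(*) FROM roster").fetchone()[0]
print(booked, roster_size)
```

Without the rollback, the passenger would remain on the roster of a flight with no seat for them, which is exactly the partial completion that atomicity rules out.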
4.1.07. PL/SQL overviewLanguage format
DeclarationsExecutionExceptionsHandling I/OFunctionsCursors
4.1.08. PL/SQLBlocks broken into three parts
DeclarationVariables declared and initialised
ExecutionVariables manipulated/actioned
ExceptionError raised and handled during exec
4.1.09. DeclarationDECLARE
age NUMBER;name VARCHAR(20);surname employee.fname%TYPE;addr student.termAddress%TYPE;
4.1.10. ExecutionBEGIN (not in order)
/* sql_statements */
UPDATE employee SET salary = salary+1;
/* conditionals */
IF (age < 0) THEN
    age := 0;
ELSE
    age := age + 1;
END IF;
/* transaction processing */
COMMIT; ROLLBACK;
/* loops */ /* cursors */
[END;] (if no exception handling)
4.1.11. Exception passingBeginnings of PL I/OCREATE TABLE temp (logmessage varchar(80));
Can create transfer/bridge relation outside
Within block (e.g. within exception handler)
WHEN invalid_age THEN
    INSERT INTO temp VALUES ('Cannot have negative ages');
END;
SELECT * FROM temp;
To review error messages
4.1.12. Exception handling
DECLARE
    invalid_age exception;
BEGIN
    IF (age < 0) THEN
        RAISE invalid_age;
    END IF;
EXCEPTION
    WHEN invalid_age THEN
        INSERT INTO temp VALUES ('Cannot have negative ages');
END;
4.1.13. CursorsCursors
Tuple by tuple processing of relationsThree phases (two)
DeclareUseException (as per normal raise)
4.1.14. ImpactPL blocks coherently change database stateNo runtime I/ODifficult to debugSQL tested independently
4.1.15. PL Cursors
DECLARE
    name_attr EMPLOYEE.NAME%TYPE;
    ssn_attr EMPLOYEE.SSN%TYPE;
    /* cursor declaration */
    CURSOR myEmployeeCursor IS
        SELECT NAME, SSN FROM EMPLOYEE
        WHERE DNO=1
        FOR UPDATE;
    emp_tuple myEmployeeCursor%ROWTYPE;
4.1.16. Cursors execution
BEGIN
    /* open cursor */
    OPEN myEmployeeCursor;
    /* can pull a tuple's attributes into variables */
    FETCH myEmployeeCursor INTO name_attr, ssn_attr;
    /* or pull tuple into tuple variable */
    FETCH myEmployeeCursor INTO emp_tuple;
    CLOSE myEmployeeCursor;
[LOOP…END LOOP example on handout]
4.1.17. IntroductionConcurrent transactionsDistributed databases (DDB)FragmentationDesirable transaction propertiesConcurrency control techniques
LockingTimestamps
4.1.18. NotationLanguage
PL too complex/long-windedSimplified database model
Database as collection of named itemsGranularity, or size of data itemDisk block based, each block X
Basic transaction language (BTL)read_item(X);write_item(X);Basic algebra, X=X+N;
4.1.19. Transaction processingDBMS Multiuser system
Multiple terminals/clientsSingle processor, client side execution
Single centralised databaseMultiprocessor, serverResolving many transactions simultaneously
Concurrency issueCoverage by previous courses (e.g. COMS12100)PL/SQL scripts (Transactions) as processes
Interleaved execution
4.1.20. TransactionsTwo transactions, T1 and T2Overlapping read-sets and write-setsInterleaved executionConcurrency control requiredPL/SQL example
Commit; and rollback;
4.1.21. Concurrency issues
Three potential problemsLost updateDirty readIncorrect summary
All exemplified using BTLTransaction diagrams to make clearerC-like syntax for familiarityMany possible examples of each problem
4.1.22. Lost update
T1
read_item(X); X=X-N;
write_item(X); read_item(Y);
Y=Y+N; write_item(Y);
T2
read_item(X); X=X+M;
write_item(X);
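The effect of a lost update can be reproduced by stepping through one problematic interleaving by hand. The sketch below assumes a particular ordering of the T1 and T2 steps (T2 reads X before T1 writes it, and writes X after T1 does); the starting values and the function name are invented for illustration.

```python
def lost_update(x, y, n, m):
    """Replay one interleaving of T1 and T2 in which T1's update to X is lost."""
    db = {"X": x, "Y": y}
    t1_x = db["X"]       # T1: read_item(X)
    t1_x -= n            # T1: X=X-N
    t2_x = db["X"]       # T2: read_item(X), still sees the old value
    t2_x += m            # T2: X=X+M
    db["X"] = t1_x       # T1: write_item(X)
    t1_y = db["Y"]       # T1: read_item(Y)
    db["X"] = t2_x       # T2: write_item(X), overwrites T1's write
    t1_y += n            # T1: Y=Y+N
    db["Y"] = t1_y       # T1: write_item(Y)
    return db

final = lost_update(x=80, y=10, n=5, m=4)
print(final)
```

Any serial execution would leave X at 80 - 5 + 4 = 79; here X ends at 84 because T2 worked from a copy of X taken before T1's write.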
4.1.23. Dirty read (or Temporary update)
T1
read_item(X); X=X-N; write_item(X);
<T1 fails>
<T1 rollback>
read_item(X); X=X+N; write_item(X);
T2
read_item(X); X=X+M; write_item(X);
4.1.24. Incorrect summaryT1
read_item(X);X=X-N;write_item(X);
read_item(Y); Y=Y-N; write_item(Y);
T2
sum=0; read_item(A); sum=sum+A;
read_item(X); sum=sum+X; read_item(Y); sum=sum+Y;
4.1.25. SerializabilitySchedule S is a collection of transactions (Ti)Serial schedule S1
Transactions executed one after the other
Performed in a serial order
No interleaving
Commit or abort of active transaction (Ti) triggers execution of the next (Ti+1)
If transactions are independent
all serial schedules are correct
4.1.26. SerializabilitySerial schedules/histories
No concurrencyUnfair timeslicing
Non-serial schedule S2 of n transactionsSerializable if
equivalent to some serial schedule of the same n transactions
correct
n! serial schedules, more non-serial
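For two small transactions it is feasible to enumerate every interleaving and compare its outcome with the serial ones. The sketch below does this for transactions shaped like T1 and T2 from the lost update slide. Note the assumptions: comparing final database states for a single starting state is a weaker test than the formal notion of equivalence, and the step encoding, the sample values (N=5, M=4) and the helper names are invented for illustration.

```python
from itertools import combinations

# Steps are (operation, item[, delta]); N=5 and M=4 are arbitrary sample values.
T1 = [("read", "X"), ("add", "X", -5), ("write", "X"),
      ("read", "Y"), ("add", "Y", 5), ("write", "Y")]
T2 = [("read", "X"), ("add", "X", 4), ("write", "X")]

def execute(schedule, initial):
    """Run a schedule of (transaction, step) pairs; return the final database."""
    db, local = dict(initial), {}
    for txn, step in schedule:
        op, item = step[0], step[1]
        if op == "read":
            local[(txn, item)] = db[item]
        elif op == "add":
            local[(txn, item)] += step[2]
        else:  # write
            db[item] = local[(txn, item)]
    return db

def interleavings(a, b):
    """Yield every merge of a (as T1) and b (as T2) that keeps each internal order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        chosen, ia, ib = set(slots), iter(a), iter(b)
        yield [("T1", next(ia)) if i in chosen else ("T2", next(ib))
               for i in range(total)]

initial = {"X": 80, "Y": 10}
serial = [[("T1", s) for s in T1] + [("T2", s) for s in T2],
          [("T2", s) for s in T2] + [("T1", s) for s in T1]]
serial_results = [execute(s, initial) for s in serial]

schedules = list(interleavings(T1, T2))
equivalent = [s for s in schedules if execute(s, initial) in serial_results]
print(len(equivalent), "of", len(schedules), "interleavings match a serial result")
```

Only a minority of the interleavings match a serial outcome, which is why a concurrency control protocol is needed rather than leaving the order to chance.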
4.1.27. DistributionDDB, collection of
multiple logically interrelated databasesdistributed over a computer networkDDBMS
Multiprocessor environmentsShared memoryShared diskShared nothing
4.1.28. AdvantagesDistribution transparency
Multiple transparency levelsNetworkLocation/dept autonomyNamingReplicationFragmentation
Reliability and availabilityPerformance, data localisationExpansion
4.1.29. FragmentationBreaking the database into
logical unitsfor distribution (DDB design)
Global directory to keep track/abstractFragmentation schema/allocation schema
RelationalHorizontal
Derived (referential), complete (by union)VerticalHybrid
4.1.30. Concurrency control in DDBsMultiple copiesFailure of individual sites (hosts/servers)Failure of network/links
Transaction processingDistributed commitDeadlock
Primary/coordinator site - voting
4.1.31. Distributed commitCoordinator electedCoordinator prepares
writes log to disk, open sockets, sends out queriesProcess
Coordinator sends ‘Ready-commit’ messagePeers send back ‘Ready-OK’Coordinator sends ‘Commit’ messagePeers send back ‘Commit-OK’ message
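The message exchange above can be sketched as a small simulation. The slides only list the success path, so the 'Not-ready' and 'Abort' messages used here are assumptions, as are the function name and the idea of modelling each peer as a callable that reports whether it is ready.

```python
def two_phase_commit(peers):
    """Simulate the coordinator's two rounds; peers maps name -> readiness callable."""
    log = []
    votes = {}
    # Round 1: ask every peer whether it is ready to commit.
    for name, is_ready in peers.items():
        log.append(("coordinator", name, "Ready-commit"))
        votes[name] = is_ready()
        # 'Not-ready' is an assumed message name for the failure path.
        log.append((name, "coordinator", "Ready-OK" if votes[name] else "Not-ready"))
    # Round 2: commit only if every peer voted yes ('Abort' is assumed).
    decision = "Commit" if all(votes.values()) else "Abort"
    for name in peers:
        log.append(("coordinator", name, decision))
        log.append((name, "coordinator", decision + "-OK"))
    return decision, log

decision, log = two_phase_commit({"site_a": lambda: True, "site_b": lambda: True})
print(decision)
for sender, receiver, message in log:
    print(sender, "->", receiver, ":", message)
```

A real protocol also has to handle timeouts and crashed peers, which is where the coordinator's on-disk log comes in; this sketch omits both.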
4.1.32. Query processingData transfer costs of query processing
Local biasHigh remote access costVast data quantities to build intermediate relations
DecompositionSubqueries resolved locally
4.1.33. Concurrency controlMust avoid 3+ problems
Lost update, dirty read, incorrect summaryDeadlock/livelock - dining example
Data item granularitySolutions
Protocols, validationLockingTimestamps
4.1.34. Definition of terms
Binary (two-state) locks
  locked, unlocked associated with item X
Mutual exclusion
Four requirements
  Must lock before access
  Must unlock after all access
  No relocking of already locked
  No unlocking of already unlocked
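A minimal sketch of a binary lock table that enforces three of the four requirements at run time; "must unlock after all access" is left to the caller. The class and method names are hypothetical, and a refused lock request simply returns False where a real DBMS would queue the transaction.

```python
class BinaryLockTable:
    """Each item is either unlocked (absent) or locked by exactly one transaction."""

    def __init__(self):
        self.holder = {}  # item -> transaction currently holding the lock

    def lock(self, txn, item):
        if self.holder.get(item) == txn:
            raise RuntimeError("no relocking of an item already locked")
        if item in self.holder:
            return False  # held by another transaction: caller must wait
        self.holder[item] = txn
        return True

    def access(self, txn, item):
        """Stands in for read_item/write_item: only legal while holding the lock."""
        if self.holder.get(item) != txn:
            raise RuntimeError("must lock before access")

    def unlock(self, txn, item):
        if self.holder.get(item) != txn:
            raise RuntimeError("no unlocking of an item not locked by this transaction")
        del self.holder[item]
```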
4.1.35. Definition
Multiple mode locking
Read/write locks
aka. shared/exclusive locks
Less restrictive (CREW: concurrent read, exclusive write)
read_lock(X), write_lock(X), unlock(X)
  e.g. acquire read/write_lock
  not reading or writing the lock state
4.1.36. Rules of Multimode locks
Must hold read/write_lock to read
Must hold write_lock to write
Must unlock after all access
Cannot upgrade/downgrade locks
  Cannot request new lock while holding one
Upgrading permissible (read lock to write)
  if currently holding sole read access
Downgrading permissible (write lock to read)
  if currently holding write lock
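A minimal sketch of a shared/exclusive lock on a single item, using the relaxed rules that permit conversion. The class and its method names are hypothetical; a refused request returns False rather than blocking.

```python
class MultiModeLock:
    """Read/write (shared/exclusive) lock for one data item."""

    def __init__(self):
        self.readers = set()  # transactions holding a read lock
        self.writer = None    # transaction holding the write lock, if any

    def read_lock(self, txn):
        if self.writer is not None:
            return False  # exclusive lock held: readers must wait
        self.readers.add(txn)
        return True

    def write_lock(self, txn):
        if self.writer is not None or self.readers:
            return False  # any existing lock blocks a writer
        self.writer = txn
        return True

    def upgrade(self, txn):
        """Read to write: only if txn currently holds sole read access."""
        if self.readers != {txn}:
            return False
        self.readers.clear()
        self.writer = txn
        return True

    def downgrade(self, txn):
        """Write to read: only if txn currently holds the write lock."""
        if self.writer != txn:
            return False
        self.writer = None
        self.readers.add(txn)
        return True

    def unlock(self, txn):
        self.readers.discard(txn)
        if self.writer == txn:
            self.writer = None
```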
4.2. Concproto
In this lecture we look at... [Section notes PDF 37Kb]
4.2.01. Introduction
Concurrency control protocols
Concurrency techniques
  Locks, Protocols, Timestamps
  Multimode locking with conversion
Guaranteeing serializability
Associated cost
Timestamps and ordering
4.2.02. Guaranteeing serializability
Two phase locking protocol (2PL)
Growing/expanding
  Acquisition of all locks
  Or upgrading of existing locks
Shrinking
  Release of locks
  Or downgrading
Guarantees serializability
  equivalency without checking schedules
4.2.03. A typical transaction pair
T1
  read_lock(Y);
  read_item(Y);
  unlock(Y);
  write_lock(X);
  read_item(X);
  X=X+Y;
  write_item(X);
  unlock(X);
4.2.04. 2PL: Guaranteed serializable
T1
  read_lock(Y);
  read_item(Y);
  write_lock(X);
  unlock(Y);
  read_item(X);
  X=X+Y;
  write_item(X);
  unlock(X);
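The difference between the two versions of T1 can be checked mechanically: under 2PL no lock may be acquired once any lock has been released. The sketch below assumes operations are given as strings in the slide's notation; `is_two_phase` is a hypothetical helper and ignores lock upgrades and downgrades.

```python
def is_two_phase(operations):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False
    for op in operations:
        name = op.split("(")[0]  # e.g. "read_lock(Y)" -> "read_lock"
        if name == "unlock":
            shrinking = True     # the shrinking phase has begun
        elif name in ("read_lock", "write_lock") and shrinking:
            return False         # acquiring during the shrinking phase
    return True

# T1 as in 4.2.03: releases Y before locking X.
t1_basic = ["read_lock(Y)", "read_item(Y)", "unlock(Y)", "write_lock(X)",
            "read_item(X)", "X=X+Y", "write_item(X)", "unlock(X)"]
# T1 as in 4.2.04: locks X before releasing Y.
t1_2pl = ["read_lock(Y)", "read_item(Y)", "write_lock(X)", "unlock(Y)",
          "read_item(X)", "X=X+Y", "write_item(X)", "unlock(X)"]

print(is_two_phase(t1_basic), is_two_phase(t1_2pl))
```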
4.2.05. Guarantee cost
T2 ends up waiting for read access to X
Either after T1 finished
  T1 cannot release X even though it has finished using it
  Incorrect phase (still expanding)
Or before T1 has used it
  T1 has to claim X during expansion, even if it doesn’t use it until later
Cost: limits the amount of concurrency
4.2.06. Alternatives
Concurrency control
  Locks limit concurrency
  Busy waiting
Timestamp ordering (TO)
Order transaction execution
  for a particular equivalent serial schedule
  of transactions ordered by timestamp value
Note: difference to lock serial equivalent
No locks, no deadlock
4.2.07. Timestamps
Unique identifier for transaction (T)
Assigned in order of submission
Time
  linear time, current date/sys clock - one per cycle
Counter
  counter, finite bitspace, wrap-around issues
Timestamp aka. Transaction start time
TS(T)
4.2.08. Timestamping
DBMS associates two TS with each item
Read_TS(X): gets read timestamp of item X
  timestamp of most recent successful read on X
  = TS(T) where T is youngest read transaction
Write_TS(X): gets write timestamp of item X
  as for read timestamp
4.2.09. Timestamping
Transaction T issues read_item(X)
  TO algorithm compares TS(T) with Write_TS(X)
  Ensures transaction order execution not violated
If successful, Write_TS(X) <= TS(T)
  Read_TS(X) = MAX(TS(T), current Read_TS(X))
If fail, Write_TS(X) > TS(T)
  T aborted, rolled-back and resubmitted with new TS
  Cascading rollback
4.2.10. Timestamping
Transaction T issues write_item(X)
  TO algorithm compares TS(T) with Read_TS(X)
  and compares TS(T) with Write_TS(X)
If successful, op_TS(X) <= TS(T)
  Write_TS(X) = TS(T)
If fail, op_TS(X) > TS(T)
  T aborted, cascade etc.
All operations focus on not violating the execution order defined by the timestamp ordering
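The read and write checks of basic timestamp ordering can be sketched as follows. Timestamps are plain integers here, and the `Item` class, the `Abort` exception and both functions are hypothetical names. Rollback, resubmission with a new TS and cascading aborts are not modelled.

```python
class Abort(Exception):
    """Raised when an operation would violate the timestamp order."""

class Item:
    def __init__(self, value=0):
        self.value = value
        self.read_ts = 0   # Read_TS(X): TS of the youngest transaction that read X
        self.write_ts = 0  # Write_TS(X): TS of the youngest transaction that wrote X

def read_item(ts, item):
    # Fail if a younger transaction has already written X.
    if item.write_ts > ts:
        raise Abort("read rejected: Write_TS(X) > TS(T)")
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write_item(ts, item, value):
    # Fail if a younger transaction has already read or written X.
    if item.read_ts > ts or item.write_ts > ts:
        raise Abort("write rejected: op_TS(X) > TS(T)")
    item.value = value
    item.write_ts = ts
```

For example, once a transaction with TS 2 has read X, a write from a transaction with TS 1 is rejected, because accepting it would contradict the serial order T1 then T2.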
4.2.11. Updates
Insertion
  2PL: DBMS secures exclusive write-lock
  TOA: op_TS(X) set to TS(creating transaction)
Deletion
  2PL: as insert
  TOA: waits to ensure later transactions don’t access
Phantom problem
  Record being inserted matches inclusion conditions of another transaction (e.g. selection by dno=5)
  Locking doesn’t guarantee inclusion (need index locking)
5. Real world
This is the Real world course theme.
[Complete set of notes PDF 206Kb]
5.1. Web
In this lecture we look at... [Section notes PDF 133Kb]
5.1.01. Databases for the Internet
Path from DB to User
Information flow
Data formats (OO)
Format transitions
Limitations/channel
  right now
The Future
5.1.02. OO
Object orientated approach
  Consistent/optimised development model
  Good approximation of real world
  Closer link to mini-world
Java and PHP
DB persistence
UML
5.1.03. Java and PHP in context
Java
  JSP (server-side)
  Javascript (client-side)
PHP
  Server side only
JSON or XML
  Object communication
Ideal scenario
  Java – load times
5.1.04. In a perfect world
Homogeneous data format/data model
DB stores objects instead
Objects transferred
  Robust
  Lightweight
  Fast
  Consistent (more later in Transactions)
  Caching
5.1.05. In the real world
Heterogeneous data model
Object translation/wrappers
Different language features at different layers
Minimal subset of OO functionality available end-to-end
Going to look at information flow/functionality provided
5.1.06. User
Limitations of being human
  short term memory
  long term familiarity
language of the Internet
  hypertext linking
  form filling
Advantages of being human
  impatience, no waiting
  wants instant response
5.1.07. Browser
HTTP requests
Forms
  Post
  Get
Form fields
  By name, by ID
  Hidden
Javascript/DOM tree
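The difference between a Get and a Post form submission can be sketched with Python's standard library. The URL and the field names (including the hidden `session` field) are hypothetical, and the requests are only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields by name; 'session' plays the role of a hidden field.
fields = {"q": "star schema", "page": "2", "session": "abc123"}
encoded = urlencode(fields)

# GET: fields travel in the URL query string, so they are visible and bookmarkable.
get_request = Request("http://example.com/search?" + encoded)

# POST: the same encoded fields travel in the request body instead.
post_request = Request("http://example.com/search", data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```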
5.1.08. Internet
Communication medium
Good for transferring data
Not good for transforming data
e.g. Light in air
e.g. Signal over CAT5e/UTP cable
5.1.09. Web server
Straight HTML pages
Dynamic HTML pages
  PHP example
  JSP example
As above with RDBMS integration
  PHP PDO example
As above with Objects
  PHP DBDO example
5.1.10. load, edit, submit, act timeline
5.1.11. href click, versus form post
Protocol stack
Basic up-down
Shortcuts
Browser cache
Web server
  assembled page cache
  php object cache
DB optimised queries
5.1.12. Examples from the web
Google Maps
  link
Car selector and Dealer locator
  link
5.2. Decision Support
In this lecture we look at... [Section notes PDF 85Kb]
5.2.01. Introduction
Decision support systems (DSS)
Duplicates of live systems, historical archiving
Primarily read-only
Load and refresh operations
Integrity
Assumptions about initial data
Large, indexed, redundancy
5.2.02. DSS Management
Design
  Logical
    Temporal keys, required to distinguish historical data (since: to current & during: within interval)
  Physical (Hash indexes, Bitmap indexes)
Controlled Redundancy
  Synchronisation/update propagation
  Synchronous (update driven)
  Asynchronous (query driven)
5.2.03. Data Preparation
Extract
  pulling from live database system(s)
Cleansing
Transformation and Consolidation
  migrating from live or legacy system design
  to DSS design
Load (DSS live/query-able)
Refresh (latest update)
5.2.04. Querying
Boolean expression complexity
  heavy WHERE clauses
Join complexity
  Normalised databases, many tables
  Facts distributed across tables
  Joins required to answer complex questions
Function and Analytic complexity
  Often require non-DBMS functions
  Smaller queries with interleaved code
5.2.05. Data Warehouse
Specific example of DSS
Subject-orientated
  e.g. customers/products
Non-volatile
  once inserted, items cannot be updated
Time variant
  Temporal keys
Accuracy and granularity issues
5.2.06. DB Company organisation
By example
5.2.07. Dimensional Schema
Consider product, customer, sales data
Each sale represents a specific event
  when a product was purchased
  when a customer bought something
  when a sale was recorded
Each can be thought of as an axis
  or dimension (3D)
Each occurred at a moment in time (4D)
5.2.08. Star schemas and Hypercubes
Data centralised in ‘fact’ table
Referencing creates star pattern
Dimensions as satellite tables
Normalising creates snowflake schema
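A minimal star schema sketch using Python's built-in sqlite3 module and an in-memory database. The table names, columns and sample rows are hypothetical: `sale` is the fact table, while `product` and `customer` are dimension (satellite) tables that the fact table references.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
-- Fact table: one row per sale event, referencing each dimension.
CREATE TABLE sale (
    product_id  INTEGER REFERENCES product(product_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    sold_on     TEXT,   -- the time dimension, stored as an ISO date
    amount      REAL
);
INSERT INTO product  VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO sale VALUES (1, 1, '2007-01-10', 100.0),
                        (1, 2, '2007-01-11', 150.0),
                        (2, 1, '2007-02-01', 80.0);
""")

# A typical decision-support query: join the fact table to one dimension and aggregate.
totals = db.execute("""
    SELECT p.name, SUM(s.amount)
    FROM sale s JOIN product p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(totals)
```

Splitting a dimension table further, for example moving product categories into their own table, would turn the star into a snowflake.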
5.2.09. Hypercubes
Hypercube is also a multi-processor topology inspired by a 4D shape
Used by Intel’s iPSC/2
Good at certain database operations
e.g. Duplicate removal
MIMD
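In a hypercube topology, nodes are usually numbered so that two nodes are directly linked when their binary labels differ in exactly one bit. The sketch below assumes that labelling; the function names are hypothetical.

```python
def neighbours(node, dimensions):
    """Nodes directly linked to 'node': flip each of its bits in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]

def hops(a, b):
    """Minimum number of links between two nodes: the count of differing bits."""
    return bin(a ^ b).count("1")

# A 4D hypercube has 2**4 = 16 nodes, each with 4 neighbours.
print(neighbours(0, 4))
print(hops(0, 15))
```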