© 2009 Prof. Dr. Rainer Manthey, Foundations of IM

Database Management Systems

Chapter 5

Foundations of Information Systems (WS 2008/09)

DBMS architecture

Main components of a database management system (DBMS) and their interaction:

[Figure: component diagram with the layers User Interface; Schema Manager, Query Manager, Transaction Manager; Storage Manager; Data Dictionary, Database, Log]

Storage manager basics

[Figure: DBMS component diagram (User Interface; Schema Manager, Query Manager, Transaction Manager; Storage Manager; Data Dictionary, Database, Log)]

Storage hierarchy of a computer

Level              Access time (approx.)
Register           1-10 ns
Cache              10-100 ns
Main memory        100-1000 ns    (this is where queries are executed!)
Disc storage       10 ms          (this is where data are stored!)
Archive storage    sec

Units: 1 ms = 1/1000 sec, 1 μs = 1/1000 ms, 1 ns = 1/1000 μs

"Access gap" between main memory and disc storage: factor 10^5
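The factor 10^5 can be reproduced with a short calculation. This is only a sketch: it assumes that the gap is taken between the disc access time (10 ms) and the fast end of the main-memory range (100 ns); the constant and function names are made up for illustration.

```python
# Approximate access times from the storage hierarchy, in seconds.
MAIN_MEMORY_S = 100e-9   # 100 ns (assumed: fast end of the 100-1000 ns range)
DISC_S = 10e-3           # 10 ms


def access_gap(disc_s: float, memory_s: float) -> float:
    """Ratio of disc access time to main-memory access time."""
    return disc_s / memory_s


gap = access_gap(DISC_S, MAIN_MEMORY_S)
print(f"access gap: factor {gap:.0e}")
```

With the slow end of the main-memory range (1000 ns) the same calculation gives a factor of 10^4, so the figure is an order-of-magnitude estimate.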

Database buffer and storage hierarchy

[Figure: buffered pages in main memory (processor, transient) are exchanged with the background store (disc, persistent) by the operations fetch and replace]

• Operations on data are always performed in main memory only!
• Consequence: Data to be manipulated have to be read into main store before modification ("buffering") and have to be written back to background storage media by the DBMS some time later.
• For the DBMS, the machine's main memory thus serves as a buffer.
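The interplay of fetch and replace can be sketched in a few lines. The class below is a hypothetical illustration, not the buffer manager of any particular DBMS: the least-recently-used (LRU) replacement policy, the class name BufferPool and the dictionary standing in for the disc are all assumptions.

```python
from collections import OrderedDict


class BufferPool:
    """Tiny page buffer; LRU replacement is an assumed policy."""

    def __init__(self, disc: dict, capacity: int):
        self.disc = disc              # page number -> contents (persistent)
        self.capacity = capacity      # maximum number of buffered pages
        self.frames = OrderedDict()   # page number -> [contents, dirty flag]

    def fetch(self, page_no):
        if page_no in self.frames:
            self.frames.move_to_end(page_no)       # most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._replace()
            self.frames[page_no] = [self.disc[page_no], False]
        return self.frames[page_no][0]

    def write(self, page_no, contents):
        self.fetch(page_no)                        # page must be buffered first
        self.frames[page_no] = [contents, True]    # changed in main memory only

    def _replace(self):
        victim, (contents, dirty) = self.frames.popitem(last=False)
        if dirty:
            self.disc[victim] = contents           # write back before eviction


disc = {1: "page one", 2: "page two", 3: "page three"}
pool = BufferPool(disc, capacity=2)
pool.write(1, "page one, modified")
pool.fetch(2)
pool.fetch(3)   # buffer is full: page 1 is replaced and written back to disc
```

Until page 1 is replaced, its modification exists only in the transient buffer; this is why the write-back "some time later" matters for persistence.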

Storing tuples in files on pages

General principle of mapping relations to pages (on disc):

• per relation: A file subdivided into several subsequent pages
• per page: An internal record table containing pointers to all tuples/records on this page
• Addressing individual tuples via a tuple identifier (TID)
• TID consists of two parts:
  • A page number (absolute address)
  • A pointer in the record table pointing to a particular position within the page where the respective tuple starts (relative address)
• Indirection is useful during reorganisation of the page.

[Figure: the TID (4711, 2) refers to entry 2 of the record table (entries 1, 2, 3) on page # 4711, which holds the records "5001 Foundations . . .", "5041 Databases . . ." and "4052 Logics . . ."]
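Why the indirection through the record table helps during reorganisation can be shown with a small sketch. The Page class, its byte layout and the order in which the three sample records are inserted are assumptions made for illustration; only the idea (TID = page number plus record-table entry) is taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    data: bytearray = field(default_factory=bytearray)
    record_table: list = field(default_factory=list)   # entry -> (offset, length)

    def insert(self, record: bytes) -> int:
        self.record_table.append((len(self.data), len(record)))
        self.data += record
        return len(self.record_table)          # entries are numbered from 1

    def read(self, slot: int) -> bytes:
        offset, length = self.record_table[slot - 1]
        return bytes(self.data[offset:offset + length])

    def reorganise(self) -> None:
        """Move all records within the page; only the offsets change."""
        records = [self.read(s) for s in range(1, len(self.record_table) + 1)]
        self.data = bytearray()
        new_table = [None] * len(records)
        for slot in reversed(range(len(records))):   # reverse physical order
            new_table[slot] = (len(self.data), len(records[slot]))
            self.data += records[slot]
        self.record_table = new_table


pages = {4711: Page()}
for rec in (b"5001 Foundations", b"5041 Databases", b"4052 Logics"):
    pages[4711].insert(rec)


def lookup(tid):
    page_no, slot = tid            # absolute part, relative part
    return pages[page_no].read(slot)


tid = (4711, 2)
before = lookup(tid)
pages[4711].reorganise()
after = lookup(tid)                # same TID, same tuple, new position
```

The TID stays valid although every record has moved, because only the record table inside the page had to be updated.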

Access paths and indexes

• DBMS normally make use of various internal auxiliary data structures in order to speed up access to larger tables. Such additional data structures are called access paths.
• The most important such access aids are called indexes, special forms of search trees offering direct access to values of particularly important and frequently accessed attributes of a table (rather than sequentially searching all fields of all rows). Often, at least the primary key attribute(s) of a table are supported by an index for rapid access.
• The notion 'index' is motivated by the concept of index pages of books. Here, an additional part at the end of the book contains keyword-page reference lists, so that certain keywords need not be searched for sequentially.
• SQL offers special DDL commands for creating and deleting indexes on one or several columns of a table:

CREATE INDEX <index-name> ON <table-name> (<list-of-column-names>)

DROP INDEX <index-name>
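The two commands can be tried out with SQLite through Python's standard sqlite3 module. This is a sketch under assumptions: the table Students, its sample rows and the index name idx_students_semester are invented for illustration, and other DBMS may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (MatrNr INTEGER PRIMARY KEY, Name TEXT, Semester INTEGER)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [(1, "A", 3), (2, "B", 5), (3, "C", 3)],   # made-up sample rows
)


def index_names():
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
    return {name for (name,) in rows}


# Secondary index on a non-key column.
conn.execute("CREATE INDEX idx_students_semester ON Students (Semester)")
names_with_index = index_names()

# Ask SQLite how it would evaluate a query on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM Students WHERE Semester = 3"
).fetchall()
for row in plan:
    print(row[-1])   # last column is the human-readable plan step

conn.execute("DROP INDEX idx_students_semester")
names_after_drop = index_names()
```

Whether the printed plan actually mentions the index is the optimizer's decision; the index only makes that access path available.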

Indexes: Overview

• Per relation, one index normally supports the primary key (often created automatically): the primary index.
• In addition, several secondary indexes may exist for other frequently used columns, even if these columns/attributes do not possess the key property.
• For implementing indexes, special data structures are used by the DBMS, the choice of which can be influenced by specially authorized DB administrators in large commercial systems:
  • Search trees:
    • ISAM-files
    • B-trees and variants
    • R-trees and variants (higher-dimensional index structures)
  • Hash-tables (tables associated with special access functions)
• When using an index, there is always a "tradeoff" between resources: A reduction in access time has to be weighed against an increase in administration time for index maintenance and in storage used by the extra index data.

Index-Sequential Access Method (ISAM)

• Simplest form of indexing: ISAM-organisation
• Each entry on the index pages consists of values of the indexed attribute (search key S). Each entry on the data pages is a value of another attribute (data D).
• On the index pages, search key values alternate with pointers to corresponding data pages.
• Search within a page is done sequentially! Modifications on index pages may lead to high administration overhead (rebalancing of neighbour pages, redirection of pointers, creation of new pages, moving of key values).

[Figure: an index page with search keys S1, S2, . . ., Sk, . . ., Sl, . . ., Sm above data pages with data entries D1, D2, . . ., Dw; the pointer between S1 and S2 leads to the data page for keys ≥ S1 and < S2]
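A lookup in such an organisation might be sketched as follows. The key values, the page contents and the function name isam_lookup are assumptions for illustration; both the index page and the selected data page are searched sequentially, as stated above.

```python
# Index page: sorted search keys S1 < S2 < S3. The pointer that follows S_i
# leads to the data page holding the keys >= S_i and < S_(i+1).
index_keys = [10, 40, 70]
data_pages = [
    [(10, "a"), (25, "b")],   # keys in [10, 40)
    [(40, "c"), (55, "d")],   # keys in [40, 70)
    [(70, "e"), (90, "f")],   # keys >= 70
]


def isam_lookup(key):
    """Return the data value stored under key, or None if it is absent."""
    page_no = None
    for i, s in enumerate(index_keys):     # sequential scan of the index page
        if s <= key:
            page_no = i
        else:
            break
    if page_no is None:                    # key is smaller than S1
        return None
    for k, d in data_pages[page_no]:       # sequential scan of the data page
        if k == key:
            return d
    return None


print(isam_lookup(55))
```

Only one data page is read per lookup, which is the gain over scanning the whole file; inserting a new key into a full page is where the administration overhead mentioned above arises.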

B-tree: Idea

Generalization of the ISAM idea to multiple levels of indexing, plus mixing of data and index values: B-tree (Bayer, McCreight 1970)

• Each node corresponds to one page in store.
• The tree is balanced: all branches are equally long.
• At least 50% of each page (except the root) is filled with data.

[Figure: B-tree node with the labels pointer, search key, and an entry that is the data entry for a primary index or a TID for a secondary index]

Storage structures: Summary of main ideas

• A database physically consists of a collection of files accessed solely by the storage manager of the DBMS.
• These files reside on large, persistent storage media, normally on discs. This part of a computer's store is called secondary storage or background store, as opposed to its comparatively small, but fast primary or main store.
• There is no such thing as rows or columns in files, but just sequences of individual symbols (actually just bits!). Thus, the DBMS has to
  • encode relational tuples/fields as bit sequences when storing them away
  • decode them again when fetching them for manipulation
• There are a few more basic principles of the physical structure of a database:
  • The beginning of each tuple (stored as a bit sequence) is marked by a reference called tuple identifier (TID).
  • Tuples are grouped into larger units called pages. When accessing the database, entire pages are fetched rather than individual tuples.
  • Pages in turn are organized in the form of search trees, called indexes.
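Encoding and decoding of tuples can be illustrated with Python's standard struct module. The fixed-length layout (a 4-byte integer, a 20-byte padded string and a 2-byte integer) and the sample tuple are hypothetical; real DBMS use their own, usually more elaborate record formats.

```python
import struct

# Assumed layout: MatrNr (4-byte int), Name (20 bytes, zero-padded),
# Semester (2-byte int); "<" means little-endian without alignment padding.
LAYOUT = struct.Struct("<i20sh")


def encode(matr_nr: int, name: str, semester: int) -> bytes:
    """Turn a tuple into the byte sequence that is stored on a page."""
    return LAYOUT.pack(matr_nr, name.encode("utf-8"), semester)


def decode(raw: bytes):
    """Rebuild the tuple from its stored byte sequence."""
    matr_nr, name, semester = LAYOUT.unpack(raw)
    return matr_nr, name.rstrip(b"\x00").decode("utf-8"), semester


raw = encode(4711, "Example Student", 3)   # made-up tuple
print(len(raw), decode(raw))
```

On disc there is only the byte sequence; the notion of three columns exists solely in the layout description, which the DBMS keeps in its data dictionary.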

Query manager basics

[Figure: DBMS component diagram (User Interface; Schema Manager, Query Manager, Transaction Manager; Storage Manager; Data Dictionary, Database, Log)]

Query processing: Overview

[Figure: processing pipeline. SQL query → 1. Scanning, Parsing, View expansion → RA-expression → 2. Optimization → Execution plan → 3. Code generation / Interpretation → Executable program]

• One of the most important tasks of a DBMS is efficient processing of queries.
• In principle, queries are treated like programs written in a programming language, i.e., they are given to a compiler.
• The most important phase of the compilation process is the attempt to improve the efficiency of the final execution phase, called query optimization.

Query optimization: Phases

• The optimization phase tries to improve efficiency of execution, even though it may not always reach an "optimal" solution (thus the notion is misleading).
• Logical optimization: From an initial internal representation of the query in RA, an equivalent but more efficiently executable algebra expression is generated by equivalence-preserving transformations.
• Physical optimization: Afterwards a suitable implementation variant is chosen for each RA-operator. This is done by exploiting several special parameters of the operand relations such as size, selectivity, sorting, indexing etc.; often cost models are used, too.
• At the end of the optimization phase a query execution plan (short: QEP) is generated, which can be further improved individually ("by hand") by a DB specialist when using a commercial DBMS ("database tuning").
• Strategies and heuristics used by a commercial query optimizer are often very badly documented, as they constitute a "well hidden" secret of the respective vendor.

Example of logical optimization (1)

In which semester are those students enrolled who attend courses read by Professor Sokrates?

in SQL:

SELECT DISTINCT s.Semester
FROM   Students s,
       Attends a,
       Courses c,
       Professors p
WHERE  p.Name = 'Sokrates'
AND    p.PersNr = c.ReadBy
AND    a.CourseNr = c.CourseNr
AND    a.MatrNr = s.MatrNr
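To experiment with this query, it can be run against a tiny in-memory database using Python's sqlite3 module. The table definitions follow the column names used in the query, but the column types, the extra columns and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Professors (PersNr INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Courses    (CourseNr INTEGER PRIMARY KEY, Title TEXT, ReadBy INTEGER);
CREATE TABLE Students   (MatrNr INTEGER PRIMARY KEY, Name TEXT, Semester INTEGER);
CREATE TABLE Attends    (MatrNr INTEGER, CourseNr INTEGER,
                         PRIMARY KEY (MatrNr, CourseNr));

-- made-up sample data
INSERT INTO Professors VALUES (1, 'Sokrates'), (2, 'Kant');
INSERT INTO Courses    VALUES (10, 'Ethics', 1), (20, 'Logic', 1), (30, 'Critique', 2);
INSERT INTO Students   VALUES (100, 'S1', 3), (200, 'S2', 5), (300, 'S3', 3);
INSERT INTO Attends    VALUES (100, 10), (200, 30), (300, 20);
""")

QUERY = """
SELECT DISTINCT s.Semester
FROM   Students s, Attends a, Courses c, Professors p
WHERE  p.Name = 'Sokrates'
AND    p.PersNr = c.ReadBy
AND    a.CourseNr = c.CourseNr
AND    a.MatrNr = s.MatrNr
"""

semesters = sorted(row[0] for row in conn.execute(QUERY))
print(semesters)
```

In this sample data, students 100 and 300 attend courses read by Sokrates and both are in semester 3, so DISTINCT collapses the two matches into one answer row.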

Example of logical optimization (2)

Operator tree in canonical representation of the SQL query:

π_{s.Semester}(σ_{p.Name = 'Sokrates' ∧ p.PersNr = c.ReadBy ∧ a.CourseNr = c.CourseNr ∧ a.MatrNr = s.MatrNr}(((s × a) × c) × p))

Example of logical optimization (3)

Splitting and pushing selections

π_{s.Semester}(σ_{p.Name = 'Sokrates' ∧ p.PersNr = c.ReadBy ∧ a.CourseNr = c.CourseNr ∧ a.MatrNr = s.MatrNr}(((s × a) × c) × p))

Annotation in the figure: "cannot be moved!"

Example of logical optimization (4)

Forming joins from products and selections

π_{s.Semester}(σ_{p.PersNr = c.ReadBy}(σ_{a.CourseNr = c.CourseNr}(σ_{a.MatrNr = s.MatrNr}(s × a) × c) × σ_{p.Name = 'Sokrates'}(p)))

Example of logical optimization (5)

Reordering joins

π_{s.Semester}(((s ⋈_{a.MatrNr = s.MatrNr} a) ⋈_{a.CourseNr = c.CourseNr} c) ⋈_{p.PersNr = c.ReadBy} σ_{p.Name = 'Sokrates'}(p))

Smallest operand: should participate in the 1st join! (The three joins are numbered 1, 2, 3 in the figure.)

Example of logical optimization (6)

π_{s.Semester}(((σ_{p.Name = 'Sokrates'}(p) ⋈_{p.PersNr = c.ReadBy} c) ⋈_{a.CourseNr = c.CourseNr} a) ⋈_{a.MatrNr = s.MatrNr} s)

. . . all other joins are then ordered automatically. (The three joins are numbered 1, 2, 3 in the figure.)

Logical optimization: Theoretical foundations

• Justification for reorderings during logical optimization: In each step equivalence-preserving transformations are applied, e.g.:

  σ_p(R1 × R2) ⇒ (σ_p(R1)) × R2    if attr(p) ⊆ attr(R1) and attr(p) ∩ attr(R2) = ∅

• Always:
  • The answer set is identical over each DB-state before and after the transformation ("equivalence").
  • Evaluation of the transformed expression is often (in many cases, not always) more efficient than before the transformation.
• But:
  • Transformations do not come with any guarantee for efficiency gains!
  • Transformation rules are heuristics ("rules of thumb"), which have to be applied in a "clever" way.
• Therefore:
  • The notion "optimization" is misleading!
  • If at all, only certain improvements of efficiency can be achieved; rarely will a (theoretically imaginable) optimal solution be reached.
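The push-down rule σ_p(R1 × R2) ⇒ σ_p(R1) × R2 can be checked on small relations. The sketch below represents relations as lists of dictionaries; the helper names cross and select as well as the sample tuples are assumptions for illustration.

```python
from itertools import product

# Made-up operand relations with disjoint attribute sets.
R1 = [{"PersNr": 1, "Name": "Sokrates"}, {"PersNr": 2, "Name": "Kant"}]
R2 = [{"CourseNr": 10, "ReadBy": 1}, {"CourseNr": 30, "ReadBy": 2}]


def cross(r, s):
    """Cartesian product: every tuple of r combined with every tuple of s."""
    return [{**x, **y} for x, y in product(r, s)]


def select(pred, r):
    """Selection: keep the tuples of r that satisfy pred."""
    return [t for t in r if pred(t)]


def p(t):
    # attr(p) = {Name}, a subset of attr(R1) and disjoint from attr(R2)
    return t["Name"] == "Sokrates"


before = select(p, cross(R1, R2))   # product first: p is tested on 4 tuples
after = cross(select(p, R1), R2)    # selection first: p is tested on 2 tuples
print(before == after)
```

Both expressions yield the same answer set, but the second one builds a smaller intermediate result, which is the intended efficiency gain of pushing selections.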

Physical optimization: Overview

[Figure: "logical" RA-expression → Logical optimization → improved logical RA-expression → Physical optimization → Evaluation plan (using physical RA-operators)]

Main tasks of physical optimization:

• Choosing suitable algorithms for physically realizing RA-operators: physical operators, physical algebra
• Exploiting existing access paths to storage structures: indexes, B-trees, hash-tables
• Sorting of intermediate results
• Creating temporary indexes and hash-tables
• Planning of intermediate storage usage
• Estimating costs of different implementation variants based on predefined cost models

Physical optimization: Example

Result of logical optimization:

π_{s.Semester}(((σ_{p.Name = 'Sokrates'}(p) ⋈_{p.PersNr = c.ReadBy} c) ⋈_{a.CourseNr = c.CourseNr} a) ⋈_{a.MatrNr = s.MatrNr} s)

QEP after physical optimization:

[Figure: the same tree with physical operators: Project (s.Semester), IndexDup with Hash, IndexJoin (a.MatrNr = s.MatrNr), MergeJoin (a.CourseNr = c.CourseNr) with Sort on c.CourseNr and Sort on a.CourseNr, IndexJoin (p.PersNr = c.ReadBy), Select (p.Name = 'Sokrates')]

Operators of physical algebra

Each operator of RA ("logical algebra") is associated with one or more procedures realizing this operation in physical storage: operators of physical algebra

• Projection without duplicate elimination: Project_a
• Union without duplicate elimination: Union
• Duplicate elimination:
  • "naive" basic version: NestedDup
  • using sorting: SortDup
  • using an index: IndexDup
• Selection:
  • without using an index: Select_c
  • using an index: IndexSelect_c
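The three duplicate-elimination variants can be contrasted in a short sketch. The function names mirror the operator names NestedDup, SortDup and IndexDup, but the bodies are assumptions about what such procedures might look like; in particular, a hash set in main memory stands in for the index.

```python
def nested_dup(rows):
    """Naive version: compare each row with all rows kept so far."""
    out = []
    for r in rows:
        if r not in out:          # linear search, quadratic overall
            out.append(r)
    return out


def sort_dup(rows):
    """Sort first; afterwards duplicates are neighbours."""
    out = []
    for r in sorted(rows):
        if not out or out[-1] != r:
            out.append(r)
    return out


def index_dup(rows):
    """Use an index (here: a hash set) to detect rows seen before."""
    seen, out = set(), []
    for r in rows:
        if r not in seen:
            seen.add(r)
            out.append(r)
    return out


semesters = [3, 5, 3, 7, 5]       # made-up projection result with duplicates
print(nested_dup(semesters), sort_dup(semesters), index_dup(semesters))
```

All three return the same set of rows; they differ in cost and in whether the output is sorted as a side effect, which is the kind of property physical optimization takes into account.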

Operators of physical algebra (2)

• Join (analogously: difference, intersection):
  • "naive" basic version: NestedLoop_c
  • on sorted input relations: MergeJoin_c
  • using an index: IndexJoin_c
  • using a hash-table in main memory: HashJoin_c
• Intermediate storing:
  • with temporary materialisation of result relations in memory: Bucket
  • materialisation with sorting of the result relation: Sort_a
  • materialisation with indexing of the result relation: Hash_a, Tree_a
• Sorting of intermediate result relations: MergeSort
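Three of the join variants can be sketched as follows, for an equality condition on one attribute of each input. Relations are lists of Python tuples and the join attributes are given as positions; these conventions, the function names and the sample data are assumptions. Note that merge_join sorts its inputs itself, i.e. it combines the Sort and MergeJoin operators for brevity.

```python
def nested_loop_join(r, s, key_r, key_s):
    """Naive version: test every pair of tuples."""
    return [(x, y) for x in r for y in s if x[key_r] == y[key_s]]


def hash_join(r, s, key_r, key_s):
    """Build a hash-table on r in main memory, then probe it with s."""
    table = {}
    for x in r:
        table.setdefault(x[key_r], []).append(x)
    return [(x, y) for y in s for x in table.get(y[key_s], [])]


def merge_join(r, s, key_r, key_s):
    """Merge two inputs sorted on the join attribute."""
    r = sorted(r, key=lambda t: t[key_r])
    s = sorted(s, key=lambda t: t[key_s])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            j_end = j                      # group of s-tuples with key a
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            while i < len(r) and r[i][key_r] == a:
                out.extend((r[i], y) for y in s[j:j_end])
                i += 1
            j = j_end
    return out


# Made-up data: Attends(MatrNr, CourseNr) and Courses(CourseNr, Title).
attends = [(100, 10), (300, 20), (200, 30), (100, 20)]
courses = [(10, "Ethics"), (20, "Logic"), (30, "Critique")]

results = [join(attends, courses, 1, 0)
           for join in (nested_loop_join, hash_join, merge_join)]
```

The variants produce the same pairs in different orders; which one is cheapest depends on sizes, existing sort orders and available indexes, the parameters named for physical optimization.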

Discussion of sample evaluation plan

[Figure: the QEP from the physical optimization example, with the following annotations]

• Project (s.Semester) with duplicate elimination by IndexDup over Hash: temporary hash-table in main memory
• IndexJoin (a.MatrNr = s.MatrNr): primary index on s
• MergeJoin (a.CourseNr = c.CourseNr) with Sort on c.CourseNr and on a.CourseNr: only part of the primary key on a ⇒ no IndexJoin
• IndexJoin (p.PersNr = c.ReadBy): secondary index on c
• Select (p.Name = 'Sokrates'): no index on 'Name'


Transaction manager basics

[Figure: DBMS architecture with the components User Interface, Schema Manager, Query Manager, Transaction Manager and Storage Manager, operating on Data Dictionary, Database and Log]


Transactions

• Every change (update) of a database is performed in special units comprising one or more update statements (possibly combined with queries): Transactions

• Transactions in SQL are marked by special keywords:
  • BEGIN_OF_TRANSACTION, for starting a transaction,
  • COMMIT, for successful completion, or
  • ROLLBACK, for aborting a failed transaction explicitly.

• Transactions are "packages" of individual steps treated with "particular care" by the DBMS: Transactions always guarantee some form of logical and physical consistency of the database!

• Individual update statements are implicitly treated like "mini-transactions", even without enclosing them in BEGIN-COMMIT.

• The transaction manager is one of the most important components of a DBMS.
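The three keywords can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under assumptions: the table `account` and its rows are invented for the example, and SQLite spells the start keyword `BEGIN` rather than BEGIN_OF_TRANSACTION.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN,
# so transactions are controlled explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
# A single statement outside BEGIN ... COMMIT acts as its own "mini-transaction".
conn.execute("INSERT INTO account VALUES ('A', 300), ('B', 200)")


def balances():
    """Return the current balances as a dict, e.g. {'A': 300, 'B': 200}."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


# Successful completion: both updates become visible together.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
conn.execute("UPDATE account SET balance = balance + 50 WHERE name = 'B'")
conn.execute("COMMIT")
after_commit = balances()

# Explicit abort: the half-finished transfer is undone.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 50 WHERE name = 'A'")
conn.execute("ROLLBACK")
after_rollback = balances()
```

After the rollback the balances are the same as after the commit, because the second, incomplete transfer leaves no trace.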


Transactions: Motivation

Transactions take place in many areas of real life all the time, e.g. when using a cash machine for withdrawing cash or when transferring money by home banking.

E.g.: Transfer 50 € from account A to account B!

begin_of_transaction; read(A, a); a := a - 50; write(A, a); read(B, b); b := b + 50; write(B, b); commit

(a and b are local variables.)

• Initial state (consistent): A = 300, B = 200
• Intermediate state (inconsistent): A = 250, B = 200
• Final state (consistent): A = 250, B = 250
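The three states of the transfer can be replayed step by step. The sketch below is an illustration only: the dictionary `db` stands in for the database, and the consistency criterion (the total of both accounts stays the same) is an assumption chosen for this example.

```python
def transfer_trace(db, amount):
    """Replay the read/write steps and record the database state after each write."""
    states = [dict(db)]      # initial state
    a = db["A"]              # read(A, a)
    a = a - amount           # a := a - 50
    db["A"] = a              # write(A, a)
    states.append(dict(db))  # intermediate state
    b = db["B"]              # read(B, b)
    b = b + amount           # b := b + 50
    db["B"] = b              # write(B, b)
    states.append(dict(db))  # final state
    return states


states = transfer_trace({"A": 300, "B": 200}, 50)
# Assumed consistency criterion: no money appears or disappears.
totals = [s["A"] + s["B"] for s in states]
```

The totals come out as 500, 450 and 500: only the intermediate state violates the assumed criterion, which is why it must never become visible to other users or survive a crash.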


Transactions: Motivation (2)

Transaction management often takes place in a context where several competing transactions concurrently try to access a central, jointly used database.

E.g.: Book a seat on the next flight from Cologne to Delhi!

Mr. A (T_A): begin_of_transaction; read(free, a); write(occupied, a); commit

Mr. B (T_B): begin_of_transaction; read(free, b); write(occupied, b); commit

What happens if there is only one seat left? Who gets it? What does the other one see?
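One way to explore these questions is to simulate both orders of execution. The sketch below is a deterministic illustration, not a real concurrent program: the seat label "12A", the function names and the chosen interleaving (both transactions read before either writes) are assumptions.

```python
def book_interleaved(free):
    """Unsynchronised schedule: T_A and T_B both read before either writes."""
    free, occupied = list(free), {}
    a = free[0] if free else None      # T_A: read(free, a)
    b = free[0] if free else None      # T_B: read(free, b) sees the same seat
    for passenger, seat in (("Mr. A", a), ("Mr. B", b)):
        if seat is not None:           # write(occupied, seat)
            occupied[seat] = passenger
            if seat in free:
                free.remove(seat)
    return {"Mr. A": a, "Mr. B": b}, occupied


def book_serial(free):
    """Serial schedule: T_A runs to completion before T_B starts."""
    free, occupied, seen = list(free), {}, {}
    for passenger in ("Mr. A", "Mr. B"):
        seat = free.pop(0) if free else None   # read and claim without interruption
        seen[passenger] = seat
        if seat is not None:
            occupied[seat] = passenger
    return seen, occupied


seen_bad, occupied_bad = book_interleaved(["12A"])
seen_ok, occupied_ok = book_serial(["12A"])
```

In the unsynchronised run both passengers believe they hold seat 12A, while the database records only the last writer. In the serial run Mr. B simply finds no free seat.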


Policy of transaction handling

• Sequences of updates are handled reasonably only if they are executed entirely and without observable interruption.

• Transactions expect
  • Consistency of the final state (i.e., satisfaction of all integrity constraints)
  • Persistence of the resulting changes
  • Resistance of the DBMS against machine errors and concurrent access

• If any transaction has reached a state where any of these expectations can no longer be satisfied, it has to be interrupted by the DBMS, and the initial state from which this particular transaction started has to be reconstructed: Rollback

• If a transaction had to be rolled back due to external errors, the DBMS will automatically try to restart it at the next possible moment in time: Recovery

• However, if the transaction has been stopped due to integrity violations it has caused itself, rollback is performed, but no restart is attempted for this transaction.
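The policy above can be sketched in a few lines. Everything named here is hypothetical: the exception classes `ExternalError` and `IntegrityViolation`, the function `run_transaction`, the snapshot-based rollback and the restart limit are assumptions for illustration, not how a real DBMS implements recovery.

```python
import copy


class ExternalError(Exception):
    """Stands for a machine error or a conflict with another transaction."""


class IntegrityViolation(Exception):
    """Raised when the final state breaks an integrity constraint."""


def run_transaction(db, body, constraint, max_restarts=3):
    """Run body(db); roll back on failure, restart only after external errors."""
    for _ in range(1 + max_restarts):
        snapshot = copy.deepcopy(db)        # remember the initial state
        try:
            body(db)
            if not constraint(db):
                raise IntegrityViolation
            return "committed"
        except ExternalError:
            db.clear()
            db.update(snapshot)             # rollback, then loop again (restart)
        except IntegrityViolation:
            db.clear()
            db.update(snapshot)             # rollback, no restart
            return "aborted"
    return "gave up"


calls = {"n": 0}


def flaky_transfer(db):
    """Fails once between its two writes, then succeeds."""
    calls["n"] += 1
    db["A"] -= 50
    if calls["n"] == 1:
        raise ExternalError("simulated crash between the two writes")
    db["B"] += 50


def overdraw(db):
    db["A"] -= 1000
    db["B"] += 1000


no_negative_balance = lambda db: all(v >= 0 for v in db.values())

accounts = {"A": 300, "B": 200}
first = run_transaction(accounts, flaky_transfer, no_negative_balance)
state_after_first = dict(accounts)
second = run_transaction(accounts, overdraw, no_negative_balance)
```

The flaky transfer commits on its second attempt, whereas the overdraft is rolled back once and never retried, since repeating it would violate the same constraint again.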


Characteristics of transactions: The ACID paradigm

Transactions are thus characterised by the following four properties, jointly referred to as the "ACID paradigm":

• Atomicity: Transactions are indivisible, i.e., they are either executed entirely or not at all ("all or nothing" principle).

• Consistency: Transactions lead from consistent initial states to consistent final states (but possibly via inconsistent intermediate states).

• Isolation: Concurrently performed transactions on the same database (in multi-user mode) do not influence each other, even though their individual steps may be interleaved by the DBMS for improving efficiency.

• Durability: The effect of a successfully completed transaction is maintained persistently in the DB, unless explicitly reversed later on by follow-up transactions.


Components of a transaction manager

Resulting from the ACID paradigm: three main components of a transaction manager

• Consistency checker:
  • Checks all affected integrity constraints after each individual step in a transaction (IMMEDIATE) or at the end of a transaction (DEFERRED).
  • Stops the transaction on integrity violations and initiates rollback.

• Scheduler:
  • Determines a proper ("non-interfering") sequence of steps of concurrent transactions ("interleaving") using resources efficiently.
  • Maintains the "illusion" of a sequential execution (serializability).

• Recovery manager:
  • Records individual steps of each transaction in a special protocol storage area, called the DB log ("logging").
  • Manages rollback and restart after system failures according to the log entries.
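The logging idea behind the recovery manager can be illustrated with a small undo log. This is a simplified sketch: the class `LoggedDB`, its methods and the in-memory list standing in for the DB log are assumptions, and real systems also log to stable storage and support redo.

```python
class LoggedDB:
    """Toy database that records the old value of every write in a log."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []                      # entries: (transaction id, item, old value)

    def write(self, txn, item, value):
        # Log first, then change the data; assumes the item already exists.
        self.log.append((txn, item, self.data[item]))
        self.data[item] = value

    def commit(self, txn):
        # The changes stay; the undo entries are no longer needed.
        self.log = [entry for entry in self.log if entry[0] != txn]

    def rollback(self, txn):
        # Undo the transaction's writes in reverse order of their execution.
        for t, item, old in reversed(self.log):
            if t == txn:
                self.data[item] = old
        self.log = [entry for entry in self.log if entry[0] != txn]


db = LoggedDB({"A": 300, "B": 200})
db.write("T1", "A", 250)
db.write("T1", "B", 250)
db.write("T1", "A", 0)        # a second write to the same item
log_length = len(db.log)
db.rollback("T1")
restored = dict(db.data)
```

Undoing in reverse order matters here: item A was written twice, and only the oldest log entry holds the value from before the transaction started.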


Lost update problem

Example of two unsynchronized transactions whose execution causes changes to be "lost":

T1: Transferring 300 € from account A to B
T2: Paying 3% interest to account A

Steps of T1: read(A,a1); a1 := a1 - 300; write(A,a1); read(B,b1); b1 := b1 + 300; write(B,b1)
Steps of T2: read(A,a2); a2 := a2 ∗ 1.03; write(A,a2)

[Table: the nine steps of both transactions interleaved, with columns Step, T1, T2, A, B]

Values of A: 5300 €, 5459 €, 5000 €
Values of B: 4700 €, 5000 €
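The lost update can be reproduced by replaying the steps against a dictionary. The sketch makes two assumptions that are not taken from the slide: the exact interleaving (T1 reads and computes, T2 runs completely, T1 continues) and integer arithmetic for the 3% interest, used to avoid floating-point rounding.

```python
def run(schedule, db):
    """Execute (operation, target, argument) steps; return final state and writes."""
    db, local, writes = dict(db), {}, []
    for op, target, arg in schedule:
        if op == "read":                   # read(item, variable)
            local[arg] = db[target]
        elif op == "calc":                 # variable := function(variable)
            local[target] = arg(local[target])
        elif op == "write":                # write(item, variable)
            db[target] = local[arg]
            writes.append((target, db[target]))
    return db, writes


T1 = [
    ("read", "A", "a1"),
    ("calc", "a1", lambda v: v - 300),
    ("write", "A", "a1"),
    ("read", "B", "b1"),
    ("calc", "b1", lambda v: v + 300),
    ("write", "B", "b1"),
]
T2 = [
    ("read", "A", "a2"),
    ("calc", "a2", lambda v: v * 103 // 100),   # 3% interest in whole euros
    ("write", "A", "a2"),
]

# Assumed interleaving: T2 runs between T1's computation and T1's first write.
lost_update = T1[:2] + T2 + T1[2:]
final, writes = run(lost_update, {"A": 5300, "B": 4700})
```

T2 writes 5459 € to A, but T1 then overwrites it with the value it computed from the old balance, so the interest payment disappears.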


Lost update problem (2)

There would have been two intuitively correct solutions:

Case 1: T1 executed before T2

Initial state: A = 5300 €, B = 4700 €

1. T1: read(A,a1)
2. T1: a1 := a1 - 300
3. T1: write(A,a1) (A = 5000 €)
4. T1: read(B,b1)
5. T1: b1 := b1 + 300
6. T1: write(B,b1) (B = 5000 €)
7. T2: read(A,a2)
8. T2: a2 := a2 ∗ 1.03
9. T2: write(A,a2) (A = 5150 €)


Lost update problem (3)

Another semantically "clean" solution:

Case 2: T2 executed before T1

Initial state: A = 5300 €, B = 4700 €

1. T2: read(A,a2)
2. T2: a2 := a2 ∗ 1.03
3. T2: write(A,a2) (A = 5459 €)
4. T1: read(A,a1)
5. T1: a1 := a1 - 300
6. T1: write(A,a1) (A = 5159 €)
7. T1: read(B,b1)
8. T1: b1 := b1 + 300
9. T1: write(B,b1) (B = 5000 €)


Lost update problem (4)

An "interleaved" execution of both transactions producing a "correct" result:

Steps of T1: read(A,a1); a1 := a1 - 300; write(A,a1); read(B,b1); b1 := b1 + 300; write(B,b1)
Steps of T2: read(A,a2); a2 := a2 ∗ 1.03; write(A,a2)

[Table: the nine steps of both transactions interleaved, with columns Step, T1, T2, A, B]

Values of A: 5300 €, 5459 €, 5159 €
Values of B: 4700 €, 5000 €
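Which interleavings are "correct" in this sense can be checked by brute force. The sketch below enumerates every way of interleaving the six steps of T1 with the three steps of T2 and keeps those whose final state equals the result of one of the two serial executions. This final-state comparison is a simplification chosen for the example, not the scheduler's actual serializability test, and the string encoding of schedules is an assumption.

```python
from itertools import combinations

# Each step acts on the database `db` and the local variables `v`.
T1 = [
    lambda db, v: v.update(a1=db["A"]),              # read(A,a1)
    lambda db, v: v.update(a1=v["a1"] - 300),        # a1 := a1 - 300
    lambda db, v: db.update(A=v["a1"]),              # write(A,a1)
    lambda db, v: v.update(b1=db["B"]),              # read(B,b1)
    lambda db, v: v.update(b1=v["b1"] + 300),        # b1 := b1 + 300
    lambda db, v: db.update(B=v["b1"]),              # write(B,b1)
]
T2 = [
    lambda db, v: v.update(a2=db["A"]),              # read(A,a2)
    lambda db, v: v.update(a2=v["a2"] * 103 // 100), # 3% interest, integer euros
    lambda db, v: db.update(A=v["a2"]),              # write(A,a2)
]


def execute(order):
    """order is a string such as '112221111': which transaction runs each step."""
    db, v = {"A": 5300, "B": 4700}, {}
    steps = {"1": iter(T1), "2": iter(T2)}
    for t in order:
        next(steps[t])(db, v)
    return db["A"], db["B"]


def all_orders():
    """Yield every interleaving that keeps each transaction's own step order."""
    for positions in combinations(range(9), 3):
        yield "".join("2" if i in positions else "1" for i in range(9))


serial_results = {execute("111111222"), execute("222111111")}
orders = list(all_orders())
acceptable = [o for o in orders if execute(o) in serial_results]
```

For instance, the order `111222111` is acceptable because T2 touches A only after T1 has finished with it, whereas `112221111` reproduces the lost update and is rejected.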


Summary

There is much, much more to be learnt about databases and database management, ...

... but: This lecture cannot provide more than just a "taster" of this exciting and very relevant area of information processing.

Time is over, folks!

Next week, we'll do a last round of "exam training". Make sure everybody attends:

I want each of you to pass straightaway!!