21
CS 542 Database Management Systems Controlling Database Integrity and Performance J Singh January 31, 2011

CS 542 Database Index Structures

  • Upload
    j-singh

  • View
    2.426

  • Download
    1

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: CS 542 Database Index Structures

CS 542 Database Management SystemsControlling Database Integrity and Performance

J Singh January 31, 2011

Page 2: CS 542 Database Index Structures

2© J Singh, 2011 2

Today’s Topics

• Database Integrity– Primary Key Constraints – Prevent Duplicates– Foreign Key Constraints – Prevent Dangling References– Attribute Constraints – Prevent Inconsistent Attribute Values– Tuple Constraints – More vigilant checking of attribute values– Assertions – Paranoid integrity checking

• Views

• Performance Topics– Indexes

• Discussion of presentation topic proposals

Page 3: CS 542 Database Index Structures

3© J Singh, 2011 3

Primary Key Constraints

• What are Primary Keys good for?

– Uniquely identify the subject of each tuple

– Ensure that there are no duplicates

– Cannot be null – that would imply a NULL subject.

P1 P2 … Pm A1 A2 … An

Thing1

::

Thingn

• A table may not have more than one primary key

– A Primary Key may consist of one or more columns

– Multiple Unique keys are OK

• For Table R, <P1, P2, …, Pm> together constitute a primary key if for each tuple in R,

– <P1, P2, …, Pm> are unique

– P1, P2, …, Pm are non-null

• <U1, U2, …, Um> together constitute a unique key if for each tuple in R,

– <U1, U2, …, Um> are unique

– But U1, U2, …, Um can be null

Page 4: CS 542 Database Index Structures

4© J Singh, 2011 4

Name CountryCode

Helsinki FIN

City

Foreign Key Constraints (p1)

• Foreign Key– Must point to a Key

ReferenceCREATE TABLE City ( :: CountryCode char(3) REFERENCES Country(Code))

• Key Reference– Must be unique or primary

key– Try: INSERT INTO city(Name, CountryCode) value ('xyzzy',

'XYZ');

– Try: UPDATE cityset CountryCode='XYZ' where

CountryCode='FIN';

– Key reference must already exist before a referencing tuple can be added

Code Name …

FIN Finland

Country

Foreign Key

Key Reference

Name CountryCode

Helsinki FIN

Xyzzy XYZ

City

Main Idea: Prevent Dangling Tuples

Page 5: CS 542 Database Index Structures

5© J Singh, 2011 5

Foreign Key Constraints (p2)

• Alternative methods of defining a foreign key– CREATE TABLE City ( CountryCode char(3) REFERENCES COUNTRY(Code), …)

– CREATE TABLE City ( CountryCode char(3), …, FOREIGN KEY CountryCode [CONSTRAINT [ctyREFcntry]] REFERENCES

COUNTRY(Code))

– CREATE TABLE City ( CountryCode char(3), …) Then, later, ALTER TABLE City ADD [CONSTRAINT [ctyREFcntry]] FOREIGN KEY CountryCode REFERENCES

COUNTRY(Code);

Notation: [] signifies optional

Page 6: CS 542 Database Index Structures

6© J Singh, 2011 6

Foreign Key Constraints (p3)

• Changes to Key References– Try: DELETE FROM country WHERE code=‘FIN’;– Try: UPDATE country SET Code='XYZ' WHERE Code='FIN‘;

• Referential Integrity Options– Restrict (default)

• Reject request– Cascade

• Reflect changes back– Set Null

• Set the foreign key to NULL

Code Name …

FIN Finland

Country

Foreign Key

Key Reference

Name CountryCode

Helsinki FIN

City

Page 7: CS 542 Database Index Structures

7© J Singh, 2011 7

Foreign Key Constraints (p4)

• Chicken and Egg definitionsCREATE TABLE chicken ( cID INT PRIMARY KEY, eID INT REFERENCES egg(eID));

CREATE TABLE egg( eID INT PRIMARY KEY, cID INT REFERENCES

chicken(cID));

• Consistently fails– Can’t define a foreign key

to a table before it has been defined

• Solution– Define the tables w/o

constraintsCREATE TABLE chicken( cID INT PRIMARY KEY, eID INT); CREATE TABLE egg( eID INT PRIMARY KEY, cID INT);

– And then add foreign keysALTER TABLE chicken ADD CONSTRAINT c_e FOREIGN KEY (eID) REFERENCES egg(eID);ALTER TABLE egg ADD CONSTRAINT e_c FOREIGN KEY (cID) REFERENCES chicken(cID);

Page 8: CS 542 Database Index Structures

8© J Singh, 2011 8

Foreign Key Constraints (p5)

• Chicken and Egg insertionINSERT INTO chicken VALUES(1, 1001);INSERT INTO egg VALUES(1001, 1);

• Still consistently fails– Need a way to postpone

constraint checking– How long to postpone?

• Until transaction commit

• Solution– Define the tables with deferred

constraint-checkingALTER TABLE chicken ADD CONSTRAINT c_e FOREIGN KEY (eID) REFERENCES egg(eID) INITIALLY DEFERRED DEFERRABLE;ALTER TABLE egg ADD CONSTRAINT e_c FOREIGN KEY (cID) REFERENCES chicken(cID) INITIALLY DEFERRED DEFERRABLE;

• And thenINSERT INTO chicken VALUES(1, 1001);

INSERT INTO egg VALUES(1001, 1);COMMIT;

Page 9: CS 542 Database Index Structures

9© J Singh, 2011 9

Attribute-Based Constraints

• NOT NULL– The most common

• Reasonability Constraints– Validate incoming data? e.g.,

• Population Density < 30000– Specification:

Population INT(11) NOT NULL CHECK (Population <= 30000 * SurfaceArea),

– The condition in CHECK(cond) can take any value that a condition in WHERE(cond) can take

• Including subqueries– The attribute constraint is checked when assigned

• Can be violated underneath as long as it is not re-evaluated• For example, if we update SurfaceArea, the violation won’t be

flagged

Not implemented in all databases, e.g., MySQL

Page 10: CS 542 Database Index Structures

10© J Singh, 2011 10

Tuple-Based Constraints

• Validate the entire tuple whenever anything in that tuple is updated

– More integrity enforcement than with attribute-based constraints e.g.,

• Population Density <= 30000– Specification:

Population INT(11) NOT NULL,CHECK (Population <= 30000 * SurfaceArea),

– The condition in CHECK(cond) can take any value that a condition in WHERE(cond) can take

• Including subqueries– The attribute constraint is checked when tuple is updated

• If we update SurfaceArea, the violation will be flagged• But the violation of CHECK (Population > ( SELECT SUM(Population) FROM City WHERE City.CountryCode = Code))

which specifies a subquery involving another table, will not be flagged

Not implemented in all databases, e.g., MySQL

Page 11: CS 542 Database Index Structures

11© J Singh, 2011 11

Assertions

• Validate the entire database whenever anything in the database is updated

– Part of the database, not any specific table– Specification: Table-like

CREATE ASSERTION CountryPop CHECK ( NOT EXISTS (SELECT * FROM Country WHERE Population < (SELECT SUM(Population) FROM City WHERE City.CountryCode = Code)))

– Difficult to implement efficiently• Often not implemented• I don’t know of any implementations

• Can be implemented for specific cases using Triggers, see Section 7.5

Page 12: CS 542 Database Index Structures

12© J Singh, 2011 12

Views

• Also called Virtual Views• Don’t actually exist in the database but behave as if they do• Can be subsets of the data or joins – actually, arbitrary queries

• Subset example,CREATE VIEW ct AS SELECT c.Name AS nm, c.countrycode AS

cntry FROM city c WHERE population > 0

• Join exampleCREATE VIEW CityLanguage as SELECT city.name, city.countrycode, lang.language as

Language FROM city, countrylanguage as lang WHERE city.countrycode = lang.countrycode AND lang.isOfficial = ‘T‘;

Page 13: CS 542 Database Index Structures

13© J Singh, 2011 13

Operations on Views (p1)

• SELECT SELECT * FROM CityLanguage WHERE Language='Dutch';

– Shouldn’t ‘temporarily’ create the table and SELECT from it.

– Should use the definition of CityLanguage to make a query, i.e.,

SELECT * FROM (SELECT …blabla… FROM city, countrylanguage as lang WHERE city.countrycode = lang.countrycode AND lang.isOfficial = 'T') WHERE Language='Dutch';

Page 14: CS 542 Database Index Structures

14© J Singh, 2011 14

Operations on Views (p2)

• UPDATE, INSERT not always possible, except– Can sometimes be implemented using INSTEAD OF triggers– Modifications are permitted when the view is derived from a

single table R and• The WHERE clause does not involve R in a Subquery• The FROM clause can only consist of one occurrence of R• The valued of all attributes not specified in the view definition can

be ‘manufactured’ by the database– Example. For the view ct

CREATE VIEW ct AS SELECT c.Name AS nm, c.countrycode AS cntry

FROM city c WHERE population > 0

the queryINSERT INTO ct (nm, cntry) values ('FirSPA', 'FIN')

can be automatically rewritten as INSERT INTO CITY (Name, CountryCode) values ('FirSPA', 'FIN')

Page 15: CS 542 Database Index Structures

15© J Singh, 2011 15

Top-Down Datalog Recursion Revisited

• IDB’s are conceptualized (and implemented) as Views

for IDB predicate p(x,y, …)FOR EACH subgoal of p DO IF subgoal is IDB, recursive call; IF subgoal is EDB, look up

Rule Why unsafe?

isHappy(x) isRich(y)

y could recursively loop back to x

Bachelor(x) NOT isMarried(x)

Negated, may result in infinite recursion

IsCheap(x) x < 10

It’s safe!

Page 16: CS 542 Database Index Structures

16© J Singh, 2011 16

Indexes

• Motivation:– Preventing the need for linear search through a big table– Example query:

SELECT * FROM City WHERE CountryCode = 'FIN';– Another:

SELECT * FROM City WHERE Population > (0.4 * ( SELECT Population FROM Country WHERE CountryCode = Code));

– Expected time for first example: O(n). For the second, O(n2)• Declaration

CREATE INDEX CityIndex ON City(CountryCode);CREATE INDEX CityPopIndex ON City(Population);CREATE INDEX CountryPopIndex ON Country(Population);

Main Idea: Data Structures for Fast Search

Page 17: CS 542 Database Index Structures

17© J Singh, 2011 17

Selection of Indexes (p1)

• Why not create an index for every attribute?

• Useful indexes, and not so useful ones– Primary key?– Unique key?– From previous examples,

• CityIndex?• CityPopIndex?• CountryPopIndex?

Page 18: CS 542 Database Index Structures

18© J Singh, 2011 18

Selection of Indexes (p2)

• The Mantra:– Don’t define indexes too early: know your workload first

• Be as empirical as is practical– The Greedy approach to index selection:

• Start with no indexes• Evaluate candidate indexes, choose the one potentially most

effective• Repeat

– Query execution will take advantage of defined indexes

Page 19: CS 542 Database Index Structures

CS 542 Database Management SystemsReport Proposals

J Singh January 31, 2011

Page 20: CS 542 Database Index Structures

20© J Singh, 2011 20

Report Proposals – General Observations

• Simply Impressive!

• Corrective Themes– When in doubt, prefer depth over breadth– Tilt the balance toward obtaining and working with real

data– Focus on your contributions– Separate the report from the project

• If your intent in the project is to do a significant piece of development, make the report about the design

• Go light on implementation; toy application is good to get your feet wet but leave the heavy lifting for the project

– For big papers, don’t try to swallow it whole. Take a piece and focus on that.

Page 21: CS 542 Database Index Structures

21© J Singh, 2011 21

Next meeting

• February 7

• Index Structures, Chapter 14