B.L. “Tink” Tysor
Bayard Lee Tysor, Inc.
401-965-2688
Utilizing Views, RI and Other Stuff for Performance
New England DB2 User Group (NEDB2UG)
March 25, 2009
Bayard Lee Tysor, Inc.
DB2 SQL, DBA & Data Modeling
Consulting & Education
© Bayard Lee Tysor, Inc. 2009-2010
NEDB2UG March 25, 2010
Reed Meseck is an internationally recognized researcher, consultant and lecturer, specializing in very high volume, highly scalable transaction and data warehouse systems.
Sheryl Larsen is an internationally recognized researcher, consultant and lecturer, specializing in DB2, and is known for her extensive expertise in SQL coding and tuning.
www.SMLSQL.com, USA 630-399-3330
BL "Tink" Tysor is an internationally recognized researcher, consultant and lecturer, specializing in Data Modeling, DB2 SQL and Database Administration.
www.BLTysor.com, USA 401-965-2688
DB2 is a Registered Trademark of IBM Corporation
Outline
• How did we get here?
  – Back to the future
• Data Matters
  – Getting to know your data
• Using Views for Domains
• Layered Views
• Views Applied
• Relations? Who Needs Them?
• Constraints? Who needs them?
• Manual Query Rewrite
HOW DID WE GET HERE?
Back to the Future
Do You Know This Man?
Dr. Edgar Frank “Ted” Codd
Codd’s 12 Rules Abridged
0. For a system to qualify as a RELATIONAL, DATABASE, MANAGEMENT system, that system must use its RELATIONAL facilities (exclusively) to MANAGE the DATABASE.
1. The information rule – Everything is a value in a column in a table.
2. The guaranteed access rule – Every scalar value in the database must be logically addressable using table name, column name and the primary key of the containing row.
3. Systematic treatment of null values – Everything has a value, even nothing (NULL).
4. Active online catalog based on the relational model – The system must eat its own dogfood, i.e. the catalog is relational and accessed via SQL.
5. The comprehensive data sublanguage rule – One comprehensive language (SQL) for expression, definition and implementation.
Codd’s 12 Rules Abridged
6. The view updating rule – All views that are theoretically updatable must be updatable by the system.
7. High-level insert, update, and delete – The system must support set-at-a-time INSERT, UPDATE, and DELETE operators.
8. Physical data independence – Applications are unaffected by changes to storage structures or access methods.
9. Logical data independence – Applications are unaffected by information-preserving changes to the base tables.
10. Integrity independence – Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.
11. Distribution independence – Data access should be transparent regardless of where it lives.
12. The nonsubversion rule – No cheaters, no perversions, no backdoors! If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, e.g. by bypassing a relational security or integrity constraint.
E.F. Codd
• Applications written transparent to physical design
• Applications remain untouched by physical design changes
• Physical design changes are often needed as traffic changes and as the types of stored information grow
“Future users of large data banks must be protected from having to know how the data is organized …...”
E.F. Codd, A Relational Model of Data for Large Shared Data Banks
Highlights
• Users must be protected from having to know how the data is organized
  – Hiding real representation from users is okay and even good
  – It ensures accuracy and proper JOINs
• Most application programs should remain unaffected when the internal representation of data is changed
  – Changes to base tables should break few or no applications
• Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information
  – Change is inevitable
  – Changes in traffic are normal and likely
  – Growth is natural
Most Applications get BIGGER
Evolution of Program Abstraction
Assembly Language
Hardware Programming
3GL Languages
APIs
Object Oriented Programming
Service Oriented Architecture
Evolution of Data Abstraction
File Systems
Hardware Programming
Relational Concept
Views
Layered Views
First Normal Form
Remove Multi-Valued Attributes

Before:
CLAIM
  Key: Policy ID, Occurrence ID, Claim ID
  Attributes: Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments

After:
Payment Type1
  Key: Payment Type Key
  Attributes: Payment Type
Payments1
  Key: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK)
  Attributes: Payment
CLAIM1
  Key: Policy ID, Occurrence ID, Claim ID
  Attributes: Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address
Second Normal Form
Remove Duplicate Data From Key Attributes

Before: the first normal form tables Payment Type1, Payments1 and CLAIM1

After:
Payment Type2
  Key: Payment Type Key
  Attributes: Payment Type
Payments2
  Key: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK)
  Attributes: Payment
CLAIM2
  Key: Occurrence ID, Claim ID
  Attributes: Policy ID, Occurrence Date, Claimant Name, Claimant Address
POLICY2
  Key: Policy ID
  Attributes: Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
Second Normal Form Cont.
Remove Duplicate Data From Key Attributes

Before: the tables Payment Type2, Payments2, CLAIM2 and POLICY2

After:
Payment Type2a
  Key: Payment Type Key
  Attributes: Payment Type
Payments2a
  Key: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK)
  Attributes: Payment
OCCURRENCE2a
  Key: Occurrence ID
  Attributes: Occurrence Date, Policy ID
POLICY2
  Key: Policy ID
  Attributes: Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date
CLAIM2a
  Key: Occurrence ID, Claim ID
  Attributes: Claimant Name, Claimant Address
Third Normal Form Cont.
Remove Duplicate Data From Key Attributes

Before: the second normal form tables Payment Type2a, Payments2a, OCCURRENCE2a, POLICY2 and CLAIM2a

After:
Payment Type3
  Key: Payment Type Key
  Attributes: Payment Type
Payments3
  Key: Policy ID (FK), Occurrence ID (FK), Claim ID (FK), Payment Type Key (FK)
  Attributes: Payment
OCCURRENCE3
  Key: Occurrence ID
  Attributes: Occurrence Date, Policy ID
POLICY3
  Key: Policy ID
  Attributes: Insured Key, Policy Effective Date, Policy Expiration Date
CLAIM3
  Key: Occurrence ID, Claim ID
  Attributes: Claimant Key
CLAIMANT3
  Key: Claimant Key
  Attributes: Claimant Name, Claimant Address
INSURED3
  Key: Insured Key
  Attributes: Insured Name, Insured Address, Insured Customer Rating
Star Schema Example
Key: Payment_Key: INTEGER
  Columns: Policy_ID: INTEGER, Time_Key: SMALLINT, Insured_Key: INTEGER, Claimant_Key: INTEGER, Payment: DECIMAL(,)
Key: Payment_Key: INTEGER
  Columns: Payment_Type: CHAR
Key: Time_Key: INTEGER
  Columns: Date: DATE, Quarter: SMALLINT, Loss_Period: CHAR(), Premium_Period: CHAR()
Key: Insured_Key: INTEGER
  Columns: Insured_Name: CHAR, Insured_Address: CHAR(), Insured_Customer_Rating: CHAR()
Key: Claimant_Key: INTEGER
  Columns: Claimant_Name: CHAR, Claimant_Address: CHAR()
Key: Policy_Key: INTEGER
  Columns: Policy_ID: CHAR, Policy_Effective_Date: DATE, Policy_Expiration_Date: DATE
Key: Fact_Key: INTEGER
  Columns: Policy_ID: INTEGER, Time_Key: SMALLINT, Insured_Key: INTEGER, Claimant_Key: INTEGER, Indemnity_Payments: DECIMAL(,), Medical_Payments: DECIMAL(,), Expense_Payments: DECIMAL(,)
To Reduce Costs
CPU COST
ELAPSED TIME
Attempt to Minimize Both
To Reduce Costs
CPU COST
ELAPSED TIME
Labor Costs
• Development
• Maintenance
• Enhancements
• Fixing Bugs
Performance – How Can We Affect It?
• “Grain” of Performance
  – Large – “Gross” tuning
• Data Design
  – Database design: 3NF, Horizontal Table splits
  – Data placement: Partitioning, load balancing
  – Data organization: UNION in Views
• Subsystem Parameters
Performance – How Can We Affect It?
• “Grain” of Performance
  – Small – “Fine” tuning
• SQL tuning: Rewriting the query
• Index tuning: Altering Index design
• Query plan tuning: Changing the Optimizer’s Mind
UNION ALL Views
[Figure: Daily Report Frequency (scale 0 to 1000) and SIZE by REGION for regions M, I, USA, N and X. The figure contrasts one physical table of all regions with five years of data against one logical table of all regions with five years of data: a UNION ALL View Partitioned By Region.]
DATA MATTERS
Getting to Know Your Data
Parkinson’s Law of Data
Definition: "Data expands to fill the space available for storage";
Buying more memory encourages the use of more memory-intensive techniques.
It has been observed over the last 10 years that the memory usage of evolving systems tends to double roughly once every 18 months.
Fortunately, memory density available for constant dollars also tends to double about once every 12 months (see Moore's Law);
Unfortunately, the laws of physics guarantee that the latter cannot continue indefinitely.
COPYRIGHT © 2000-2003 WEBNOX CORP. HYPERDICTIONARY
What Matters ….
• Size matters!
  – Absolute size
  – Relative size
  – Measures: Rows, Bytes, Etc.
• Where it matters
  – JOINS
  – Schema definition
Cardinality - Partitioning
• What Does Your Data Distribution Look Like?
  – Ideal and Uniform?
  – Less than Ideal and “Clumpy”
• Yesterday’s Partitioning Scheme May Not Work Today!
[Figure: customer partitions by name range: Customers A - F, Customers G - L, Customers M - R, Customers M - R, Customers S - Z, Customers A - F]
Strategies for Performance
[Figure: Daily Report Frequency (scale 0 to 1000) and SIZE by REGION for regions M, I, USA, N and X. The figure contrasts one physical table of all regions with five years of data against one logical table of all regions with five years of data: a UNION ALL View Partitioned By Subsets of USA.]
[Figure: one logical table of all regions with five years of data, built from nested UNION ALL Views. Regions M, I, N and X each cover C-4 through Current. USA is split by year into USA C-5, USA C-4, USA C-3, USA C-2 and USA Current; USA C-2 and USA Current are themselves UNION ALL Views over tables Partitioned by Month.]
Strata
[Figure: data stratified by year: 2001, 2002, 2003, 2004]
Affinity
[Figure: Application Servers and four regional partitions: Part 1 (Southwest), Part 2 (Northwest), Part 3 (Southeast), Part 4 (Northeast)]
Motif / Template Pattern
• Data “Frame” is common
  – Note the commonality
• Variant portion is small
  – “ABC”, “DEF”, “GHI” in the example
• Commonly found in XML documents, Web pages, etc.
Vertical Split
[Figure: Customer - Employee data for the Prior 7 Years, split vertically into Employee and Customer tables]
VIEW definition hides JOIN from APP
Horizontal Split
[Figure: Customer - Employee data split horizontally from Newest to Oldest into Prior 2 Years, Prior 5 Years and Prior 7 Years tables, combined by a UNION ALL VIEW]
Horizontal Split
[Figure: Customer - Employee data split horizontally from Newest to Oldest into Current Period (MQT) for “NOW”, Prior 2 Years, Prior 5 Years and Prior 7 Years, combined using UNION ALL and MQT]
Horizontal & Vertical Splitting
[Figure: Customer - Employee data split horizontally from Newest to Oldest into Current Period (MQT) for “NOW”, Prior 2 Years, Prior 5 Years and Prior 7 Years, with Customer split off vertically, combined using UNION ALL, MQT and Vertical]
Patterns of Denormalization
• Collapsing Tables
• Splitting Tables
  – Horizontal Split
  – Vertical Split
• Adding Redundant Columns
Collapsing Tables
Table A (C1, C2) and Table B (C1, C3) collapse into Table A (C1, C2, C3)
Splitting Tables – Horizontal (Strata)
Table A (C1, C2, C3) splits by rows into Table A1 (C1, C2, C3) and Table A2 (C1, C2, C3)
Splitting Tables – Vertical (Striping)
Table A (C1, C2, C3) splits by columns into Table A (C1, C2) and Table B (C1, C3)
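A vertical split can be hidden from applications behind a view, so programs keep selecting the three original columns. The following is a minimal sketch, not the presenters' own DDL: the names TABLE_A, TABLE_B and V_TABLE_A and all data types are assumptions chosen to match the C1/C2/C3 diagram.

```sql
-- Hypothetical names and data types; only the C1/C2/C3 layout comes from the slide.
CREATE TABLE TABLE_A (
    C1 INTEGER     NOT NULL PRIMARY KEY,   -- shared key
    C2 VARCHAR(30)                         -- frequently read column
);

CREATE TABLE TABLE_B (
    C1 INTEGER      NOT NULL PRIMARY KEY,  -- same key as TABLE_A
    C3 VARCHAR(200),                       -- rarely read column
    CONSTRAINT FK_B_A FOREIGN KEY (C1) REFERENCES TABLE_A (C1) ON DELETE CASCADE
);

-- The view restores the original three-column shape.
-- LEFT OUTER JOIN keeps rows of TABLE_A that have no stripe in TABLE_B.
CREATE VIEW V_TABLE_A (C1, C2, C3) AS
SELECT A.C1, A.C2, B.C3
FROM TABLE_A A
     LEFT OUTER JOIN TABLE_B B ON B.C1 = A.C1;

-- Applications query the view and never see the join.
SELECT C1, C2, C3 FROM V_TABLE_A WHERE C1 = 42;
```

A query that touches only C1 and C2 still goes through the view; whether the optimizer can skip TABLE_B in that case depends on the DB2 version and on the keys and constraints declared.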
USING VIEWS FOR DOMAINS
Relationship of Domains
• Domain of All Customers
  – Domain of Active Customers
    • Domain of Active Customers with Account Balance <> 0
Domains as Views
V_CUSTOMERS_ALL
  SELECT * FROM V_CUSTOMERS
V_CUSTOMERS_ACTIVE
  SELECT * FROM V_CUSTOMERS_ALL WHERE ACTIVE = 'Y'
V_CUSTOMERS_SENDBILL
  SELECT * FROM V_CUSTOMERS_ACTIVE WHERE BALANCE <> 0
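The three domains can be written out as stacked CREATE VIEW statements. This is a sketch under the assumption that V_CUSTOMERS already exists and exposes the ACTIVE and BALANCE columns used on the slide; nothing else about its layout is assumed.

```sql
-- Assumes an existing object V_CUSTOMERS with ACTIVE and BALANCE columns.

-- Domain of all customers.
CREATE VIEW V_CUSTOMERS_ALL AS
SELECT * FROM V_CUSTOMERS;

-- Domain of active customers, defined on top of the first domain.
CREATE VIEW V_CUSTOMERS_ACTIVE AS
SELECT * FROM V_CUSTOMERS_ALL
WHERE ACTIVE = 'Y';

-- Domain of active customers with a non-zero balance.
CREATE VIEW V_CUSTOMERS_SENDBILL AS
SELECT * FROM V_CUSTOMERS_ACTIVE
WHERE BALANCE <> 0;

-- A billing program names only the narrowest domain.
SELECT COUNT(*) AS BILLS_TO_SEND FROM V_CUSTOMERS_SENDBILL;
```

Because each view is defined on the one above it, a change to what "active" means is made once, in V_CUSTOMERS_ACTIVE, and V_CUSTOMERS_SENDBILL inherits it.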
LAYERED VIEWS
Simple Concepts for Simple Minds
Layered Views
[Figure: layers of views over the Base Table: All Customers; then Active Customers and Inactive Customers; then Late and Current]
Views Can Change
[Figure: the same layers over the Base Table (All Customers; Active Customers and Inactive Customers; Late and Current) with a new Balance <> 0 view added]
Base Tables Can Change
[Figure: the same layers of views (All Customers; Active Customers and Inactive Customers; Late and Current; Balance <> 0) now sitting over several Base Tables instead of one]
VIEWS APPLIED
Can you spell “Viewmiester”?
Views
• Views for Easier Programming
  – Reduce complexity for reliability
• Logical data independence
  – Ability to modify the physical layout
  – No program impact
• Views for performance
  – “Skinny” views
  – Views that define domains
    • Result set is primary keys
• Views for Reuse
  – Combine domains using SET operators
Views Can Make Programming Easier
[Figure: the original CLAIM layout (Policy ID, Occurrence ID, Claim ID, Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments) rebuilt as a view over the third normal form tables Policy3, Insured3, Occurance3, Claim3, Claimants3, Payment3 and Payment Type3]
CREATE VIEW CLAIM AS
SELECT P.POLICY_ID
      ,O.OCCURRENCE_ID
      ,C.CLAIM_ID
      ,I.INSURED_NAME
      ,I.INSURED_ADDRESS
      ,I.INSURED_CUSTOMER_RATING
      ,P.POLICY_EFFECTIVE_DATE
      ,P.POLICY_EXPIRATION_DATE
      ,O.OCCURRENCE_DATE
      ,T.CLAIMANT_NAME
      ,T.CLAIMANT_ADDRESS
      ,COALESCE(MED.MEDICAL_PAYMENTS,0) AS MEDICAL_PAYMENTS
      ,COALESCE(IND.INDEMNITY_PAYMENTS,0) AS INDEMNITY_PAYMENTS
      ,COALESCE(EXP.EXPENSE_PAYMENTS,0) AS EXPENSE_PAYMENTS
FROM POLICY3 P
     INNER JOIN INSURED3 I ON P.INSURED_KEY = I.INSURED_KEY
     INNER JOIN OCCURANCE3 O ON P.POLICY_ID = O.POLICY_ID
     INNER JOIN CLAIM3 C ON C.OCCURRENCE_ID = O.OCCURRENCE_ID
     INNER JOIN CLAIMANTS3 T ON C.CLAIMANT_KEY = T.CLAIMANT_KEY
     LEFT OUTER JOIN (SELECT OCCURRENCE_ID, CLAIM_ID, PAYMENT AS MEDICAL_PAYMENTS
                      FROM PAYMENT3
                      WHERE PAYMENT_TYPE_KEY = 'M') AS MED
       ON MED.OCCURRENCE_ID = C.OCCURRENCE_ID AND MED.CLAIM_ID = C.CLAIM_ID
     LEFT OUTER JOIN (SELECT OCCURRENCE_ID, CLAIM_ID, PAYMENT AS INDEMNITY_PAYMENTS
                      FROM PAYMENT3
                      WHERE PAYMENT_TYPE_KEY = 'I') AS IND
       ON IND.OCCURRENCE_ID = C.OCCURRENCE_ID AND IND.CLAIM_ID = C.CLAIM_ID
     LEFT OUTER JOIN (SELECT OCCURRENCE_ID, CLAIM_ID, PAYMENT AS EXPENSE_PAYMENTS
                      FROM PAYMENT3
                      WHERE PAYMENT_TYPE_KEY = 'E') AS EXP
       ON EXP.OCCURRENCE_ID = C.OCCURRENCE_ID AND EXP.CLAIM_ID = C.CLAIM_ID;
Views Can Make Programming Easier
CLAIM
  Key: Policy ID, Occurrence ID, Claim ID
  Attributes: Insured Name, Insured Address, Insured Customer Rating, Policy Effective Date, Policy Expiration Date, Occurrence Date, Claimant Name, Claimant Address, Medical Payments, Indemnity Payments, Expense Payments
SELECT C.INSURED_NAME
      ,SUM(C.MEDICAL_PAYMENTS) AS TOTAL_PAYMENTS
FROM CLAIM C
GROUP BY C.INSURED_NAME
ORDER BY TOTAL_PAYMENTS DESC
FETCH FIRST 10 ROWS ONLY;
[Figure: the CLAIM view over the third normal form tables Claim3, Policy3, Payment Type3, Payment3, Occurance3, Insured3 and Claimants3, which are related through RI]
Using Views to Avoid XML
• Can make an XML document look like a DB2 Column
• Performance could be a problem
Sample XML
Humans can decipher XML, especially if it is formatted by an XML parser such as Internet Explorer
<Courses>
  <Course ID="B1">
    <Title>Basic SQL</Title>
    <Instructor ID="BLT">
      <Name>B.L. "Tink" Tysor</Name>
      <Phone>401-965-2688</Phone>
      <Email>[email protected]</Email>
      <Web_Site>www.BLTysor.com</Web_Site>
    </Instructor>
    <Duration>1 Day</Duration>
    <Labs>3 Labs</Labs>
  </Course>
  <Course ID="I1">
    <Title>Intermediate SQL</Title>
    <Instructor ID="BLT">
      <Name>B.L. "Tink" Tysor</Name>
      <Phone>401-965-2688</Phone>
      <Email>[email protected]</Email>
      <Web_Site>www.BLTysor.com</Web_Site>
    </Instructor>
    <Instructor ID="SML">
      <Name>Sheryl M. Larsen</Name>
      <Phone>630-399-3330</Phone>
      <Email>[email protected]</Email>
      <Web_Site>www.SMLSQL.com</Web_Site>
    </Instructor>
    <Duration>2 Days</Duration>
    <Labs>6 Labs</Labs>
  </Course>
  <Course ID="A2">
    <Title>Tuning DB2 SQL for Performance</Title>
    <Instructor ID="SML">
      <Name>Sheryl M. Larsen</Name>
      <Phone>630-399-3330</Phone>
      <Email>[email protected]</Email>
      <Web_Site>www.SMLSQL.com</Web_Site>
    </Instructor>
    <Duration>1 Day</Duration>
    <Labs>1 Lab</Labs>
  </Course>
  <Course ID="X1">
    <Title>pureXML</Title>
    <Instructor ID="BLT">
      <Name>B.L. "Tink" Tysor</Name>
      <Phone>401-965-2688</Phone>
      <Email>[email protected]</Email>
      <Web_Site>www.SMLSQL.com</Web_Site>
    </Instructor>
    <Duration>2 Days</Duration>
    <Labs>6 Labs</Labs>
  </Course>
</Courses>
Using Views to Avoid XML
CREATE VIEW VCLASSES AS
SELECT AC.CLASS_EFF, XT.ID, XT.TITLE, XT.DURATION, XT.LABS
FROM ALL_CLASSES AC
    ,XMLTABLE('$T/Courses/Course' PASSING AC.CLASSES AS "T"
       COLUMNS "ID"       CHAR(3)     PATH './@ID'
              ,"TITLE"    VARCHAR(30) PATH 'Title'
              ,"DURATION" CHAR(10)    PATH 'Duration'
              ,"LABS"     CHAR(10)    PATH 'Labs'
     ) AS XT
WHERE AC.CLASS_EFF = '12/01/2008';

SELECT * FROM VCLASSES;

CLASS_EFF   ID   TITLE                           DURATION   LABS
2008-12-01  B1   Basic SQL                       1 Day      3 Labs
2008-12-01  I1   Intermediate SQL                2 Days     6 Labs
2008-12-01  A2   Tuning DB2 SQL for Performance  1 Day      1 Lab
2008-12-01  X1   pureXML                         2 Days     6 Labs
Using Views to Optimize XML
• Does not use indexes
…WHERE XMLEXISTS('$a/author[@id = $b/book/authors/author/@id]'
                 PASSING bookinfo AS "b", authorinfo AS "a")…

CREATE INDEX bookAuthorIdx ON books(bookinfo)
  GENERATE KEY USING XMLPATTERN '/book/authors/author/@id' AS SQL DOUBLE;
CREATE INDEX authorIdx ON authors(authorinfo)
  GENERATE KEY USING XMLPATTERN '/author/@id' AS SQL DOUBLE;

• Uses indexes
…WHERE XMLEXISTS('$a/author[@id/xs:double(.) = $b/book/authors/author/@id/xs:double(.)]'
                 PASSING bookinfo AS "b", authorinfo AS "a")…
Exception Based View
[Figure: data model]
Rate: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
Coverage: Coverage ID, Coverage Info
State: State ID, State Info
County: County ID, State Code, County Info
ZIP Code: ZIP Code, State Code
Locality: Locality ID, State Code, Locality Info
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
31 Rows per Coverage (actually >100,000)
Exception Based View cont.
[Figure: the same data model, with the Rate table changed]
Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
Coverage: Coverage ID, Coverage Info
State: State ID, State Info
County: County ID, State Code, County Info
ZIP Code: ZIP Code, State Code
Locality: Locality ID, State Code, Locality Info
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
5 Rows (actually approx 5,000)
Exception Based View cont.
Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
Coverage: Coverage ID, Coverage Info
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
Geo_Pol_Rating: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
Exceptions: the default is chosen according to the following priorities
1. COUNTY/ZIP/LOCALITY; if no row then
2. ZIP/LOCALITY; if no row then
3. ZIP/COUNTY; if no row then
4. LOCALITY; if no row then
5. ZIP; if no row then
6. COUNTY; if no row then
7. STATE
Exception Based View cont.
Rate: Coverage ID, State Code, County ID (NULL), ZIP Code (NULL), Locality ID (NULL), Rate Factor, Priority Key
Coverage: Coverage ID, Coverage Info
County ZIP Locality REL: State Code, County ID, ZIP Code, Locality ID, REL Info
Geo_Pol_Rating: Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
CREATE VIEW GEO_POL_RATING AS
SELECT COVERAGE_ID
      ,GP.STATE_CODE
      ,GP.COUNTY_ID
      ,GP.LOCALITY_ID
      ,GP.ZIP_CODE
      ,RATE_FACTOR
FROM COUNTYZIP_LOCALITY_REL GP
     INNER JOIN RATE R
       ON GP.STATE_CODE = R.STATE_CODE
      AND GP.ZIP_CODE = COALESCE(R.ZIP_CODE,GP.ZIP_CODE)
      AND GP.LOCALITY_ID = COALESCE(R.LOCALITY_ID,GP.LOCALITY_ID)
      AND GP.COUNTY_ID = COALESCE(R.COUNTY_ID,GP.COUNTY_ID)
WHERE R.PRIORITY_KEY =
      -- Correlated Subquery
      (SELECT MIN(R1.PRIORITY_KEY)
       FROM COUNTYZIP_LOCALITY_REL GP1
            INNER JOIN RATE R1
              ON GP1.STATE_CODE = R1.STATE_CODE
             AND GP1.ZIP_CODE = COALESCE(R1.ZIP_CODE,GP1.ZIP_CODE)
             AND GP1.LOCALITY_ID = COALESCE(R1.LOCALITY_ID,GP1.LOCALITY_ID)
             AND GP1.COUNTY_ID = COALESCE(R1.COUNTY_ID,GP1.COUNTY_ID)
       WHERE GP.STATE_CODE = GP1.STATE_CODE
         AND GP.COUNTY_ID = GP1.COUNTY_ID
         AND GP.LOCALITY_ID = GP1.LOCALITY_ID
         AND GP.ZIP_CODE = GP1.ZIP_CODE);
Exception Based View cont.
Geo_Pol_Rating (over Rate): Coverage ID, State Code, County ID, ZIP Code, Locality ID, Rate Factor
SELECT GPR.RATE_FACTOR
FROM GEO_POL_RATING GPR
WHERE GPR.STATE_CODE = 'RI'
  AND GPR.COUNTY_ID = 'PROVIDENCE'
  AND GPR.LOCALITY_ID = 'PROVIDENCE'
  AND GPR.ZIP_CODE = '02906';

RATE_FACTOR
-----------
     6.2700

  1 record(s) selected.
RELATIONS? WHO NEEDS THEM?
Subliminal Requirements
Dumb, Simple SQL Join
SELECT E.EMPNO
FROM EMPLOYEE E, DEPARTMENT D
WHERE E.WORKDEPT = D.DEPTNO
;
Do Constraints Matter?
No Indexes or RI
SELECT E.EMPNO
FROM EMPLOYEE E
    ,DEPARTMENT D
WHERE E.WORKDEPT = D.DEPTNO

What Does a Primary Index Buy Us?
One Primary Index
SELECT E.EMPNO
FROM EMPLOYEE E
    ,DEPARTMENT D
WHERE E.WORKDEPT = D.DEPTNO

What Does A Foreign Key Constraint Buy Us?
One Foreign Key Constraint
SELECT E.EMPNO
FROM EMPLOYEE E
    ,DEPARTMENT D
WHERE E.WORKDEPT = D.DEPTNO

What Does a Secondary Index Buy Us?
One Secondary Index & RI
SELECT E.EMPNO
FROM EMPLOYEE E
    ,DEPARTMENT D
WHERE E.WORKDEPT = D.DEPTNO
Does it Matter with Views?
• Views!
CREATE VIEW JOINVIEW_ED (EMPNO) AS
SELECT E.EMPNO
FROM EMPLOYEE E
    ,DEPARTMENT D
WHERE E.WORKDEPT = D.DEPTNO
;
Same!
No Indexes or RI
CREATE VIEW JOINVIEW_ED (EMPNO) AS
SELECT E.EMPNO
FROM EMPLOYEE E
    ,DEPARTMENT D
WHERE E.WORKDEPT = D.DEPTNO
;
SELECT * FROM JOINVIEW_ED;

Same!
One Primary Index
SELECT * FROM JOINVIEW_ED;

Same!!
One Foreign Key Constraint
SELECT * FROM JOINVIEW_ED;

Same!!!
One Secondary Index & RI
SELECT * FROM JOINVIEW_ED;
Redundant Join Elimination .. Very Powerful
Elimination of redundant joins between tables related through an RI constraint
[Figure: Employee (empno, workdept) is related to Department (deptno) through RI on workdept = deptno]
Original View EmpDeptView
SELECT * FROM Employee E, Department D
WHERE workdept = deptno
SQL
SELECT empno, workdept FROM EmpDeptView
Rewritten SQL
SELECT empno, workdept FROM Employee
WHERE workdept is not null
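The rewrite depends on the foreign key being declared, since that is what tells the optimizer every non-null workdept has exactly one matching Department row. The sketch below shows DDL that would put that RI in place; the table, column and view names follow the slide, while the data types and constraint names are assumptions, and the view lists its columns explicitly instead of using SELECT *.

```sql
-- Data types and constraint names are assumptions; object names follow the slide.
CREATE TABLE Department (
    deptno CHAR(3) NOT NULL,
    CONSTRAINT pk_department PRIMARY KEY (deptno)
);

CREATE TABLE Employee (
    empno    CHAR(6) NOT NULL,
    workdept CHAR(3),                      -- nullable, hence the IS NOT NULL in the rewrite
    CONSTRAINT pk_employee PRIMARY KEY (empno),
    CONSTRAINT fk_emp_dept FOREIGN KEY (workdept)
        REFERENCES Department (deptno)
);

CREATE VIEW EmpDeptView (empno, workdept, deptno) AS
SELECT E.empno, E.workdept, D.deptno
FROM Employee E, Department D
WHERE E.workdept = D.deptno;

-- Only Employee columns are referenced. With fk_emp_dept in place the join to
-- Department can neither add nor remove rows (apart from null workdept values),
-- so the optimizer is free to drop it.
SELECT empno, workdept FROM EmpDeptView;
```

Comparing the access plan for the last query before and after adding fk_emp_dept is a direct way to check whether the join was eliminated on a given DB2 release.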
CONSTRAINTS? WHO NEEDS THEM?
Where to Use Them, Why They Matter!
Constraints
• Primitive constraints
  – data type
  – NOT NULL
  – unique indexes
  – DEFAULT
• Table CHECK Constraints
• Referential Integrity
  – Primary Key Constraints
  – Unique Key Constraints
  – Foreign Key Constraints
• Triggers (maybe)
• Constraints on Views "WITH CHECK OPTION"
UNION ALL Views
• Providing SELECT Transparency
CREATE VIEW LOGICAL_TABLE ….. AS
SELECT FROM
UNION ALL
SELECT FROM
UNION ALL
SELECT FROM
UNION ALL
SELECT FROM
UNION ALL
SELECT FROM

SELECT columns
FROM LOGICAL_TABLE, other tables
WHERE some amazing filters
UNION Query - Rewrite
• Optimizing Access Paths Containing UNION ALL
  – DB2 tries to rewrite the query in this sequence:
    • Distribute qualified predicates
    • Prune the subselects (will also be done for UNIONs)
      – Use BETWEEN, IN or COL op literal for best pruning
    • Distribute the joins
      – If results in more than 225 tables, then no distribution
    • Distribute the aggregations (SUM & COUNT)
      – To calculate accurate averages even if parallel
    • Avoid Materialization
      – Search for index support for each query block
      – Unavoidable for nullable sets of outer joins
      – Unavoidable for > 225 tables after distribution
• Execution – Pruning Continues for :hostvars at execution time!
Constraint Definition (CHECK)
• CHECK Constraints
  – Simple predicates (can use AND / OR but no subqueries)
  – Limited to data in the row
  – Can use deterministic User Defined Functions - very powerful
  – Defined using CREATE TABLE or ALTER TABLE
  – Dropped using ALTER TABLE
• CREATE TABLE students (name varchar(100), age int, CONSTRAINT agelimit CHECK (age >= 5 AND age <= 18));
• ALTER TABLE students DROP CONSTRAINT agelimit ;
Constraint Definition (RI)
• Primary Key Constraints
  – One per table
  – Enforced using unique index on NOT NULL columns
• Unique Key Constraints
  – Can define more than one per table
  – Enforced using unique index on NOT NULL columns
• Foreign Key Constraints
  – One or more columns
  – Associated with PRIMARY KEY or UNIQUE KEY constraint
  – ON DELETE - CASCADE, SET NULL, RESTRICT, NO ACTION
  – ON UPDATE - RESTRICT, NO ACTION
  – Referential Integrity can be self-referencing or cyclic
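The RI options listed above can be seen together in one small schema. This is an illustrative sketch in DB2 for Linux, UNIX and Windows syntax; the tables dept_demo and emp_demo, their columns and the constraint names are all hypothetical and do not come from the presentation.

```sql
-- Hypothetical demo tables; every name here is an assumption.
CREATE TABLE dept_demo (
    deptno   INTEGER NOT NULL,
    deptcode CHAR(4) NOT NULL,
    deptname VARCHAR(20),
    CONSTRAINT pk_dept      PRIMARY KEY (deptno),   -- one primary key per table
    CONSTRAINT uq_dept_code UNIQUE (deptcode)       -- additional unique key
);

CREATE TABLE emp_demo (
    empno  INTEGER NOT NULL,
    name   VARCHAR(20),
    deptno INTEGER,                                  -- nullable so SET NULL is allowed
    mgrno  INTEGER,
    CONSTRAINT pk_emp PRIMARY KEY (empno),
    -- Foreign key to the parent's primary key.
    CONSTRAINT fk_emp_dept FOREIGN KEY (deptno)
        REFERENCES dept_demo (deptno)
        ON DELETE SET NULL
        ON UPDATE RESTRICT,
    -- Self-referencing foreign key: a manager is also an employee.
    CONSTRAINT fk_emp_mgr FOREIGN KEY (mgrno)
        REFERENCES emp_demo (empno)
        ON DELETE NO ACTION
);
```

Deleting a dept_demo row sets deptno to NULL in its employees, while deleting an employee who is still referenced as a manager is rejected. Other DB2 platforms may restrict which delete rules a self-referencing constraint can use.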
Informational Check Constraint
Example 1: Create an employee table where a minimum salary of $25,000 is guaranteed by the application
CREATE TABLE emp (empno INTEGER NOT NULL PRIMARY KEY,
                  name VARCHAR(20),
                  firstname VARCHAR(20),
                  salary INTEGER CONSTRAINT minsalary
                         CHECK (salary >= 25000)
                         NOT ENFORCED
                         ENABLE QUERY OPTIMIZATION);
If later enforcement is desired:
ALTER TABLE emp ALTER CHECK minsalary ENFORCED;
Informational RI Constraint
Example 2: Create a department table where the application ensures the existence of departments to which the employees belong.
CREATE TABLE dept (deptno INTEGER NOT NULL PRIMARY KEY,
                   deptName VARCHAR(20),
                   budget INTEGER);
ALTER TABLE emp ADD COLUMN dept INTEGER NOT NULL
      CONSTRAINT dept_exist REFERENCES dept
      NOT ENFORCED
      ENABLE QUERY OPTIMIZATION;
EXPLOITING CONSTRAINTS FOR QUERY OPTIMIZATION
To Prune or Not to Prune?
UNION ALL branch elimination
Data stored in separate tables for each year
Query needs 4Q/1995 data from UNION ALL View
[Figure: a UNION ALL (U) over one select (S) on each of T94, T95 and T96]
Without Check Constraints
Select * from T94 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
Select * from T95 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
Without Check Constraints
Select * from T96 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
UNION ALL branch elimination
With check constraints we avoid compiling and executing redundant branches of the UNION
[Figure: a UNION ALL (U) over one select (S) on each of T94, T95 and T96]
With Check Constraints
Select * from T94 where tdate >= '10/01/1995' and tdate <= '12/31/1995' and tdate >= '01/01/1994' and tdate <= '12/31/1994'
Select * from T95 where tdate >= '10/01/1995' and tdate <= '12/31/1995'
With Check Constraints
Select * from T96 where tdate >= '10/01/1995' and tdate <= '12/31/1995' and tdate >= '01/01/1996' and tdate <= '12/31/1996'
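For the branches to be pruned, each yearly table needs a CHECK constraint that states its date range. A minimal sketch of such a setup follows; T94, T95, T96 and tdate are taken from the slide, while the amount column, the constraint names and the view name V_ALL_YEARS are assumptions made for illustration.

```sql
-- T94/T95/T96 and tdate come from the slide; everything else is assumed.
CREATE TABLE T94 (
    tdate  DATE NOT NULL,
    amount DECIMAL(11,2),
    CONSTRAINT ck_t94 CHECK (tdate BETWEEN '1994-01-01' AND '1994-12-31')
);

CREATE TABLE T95 (
    tdate  DATE NOT NULL,
    amount DECIMAL(11,2),
    CONSTRAINT ck_t95 CHECK (tdate BETWEEN '1995-01-01' AND '1995-12-31')
);

CREATE TABLE T96 (
    tdate  DATE NOT NULL,
    amount DECIMAL(11,2),
    CONSTRAINT ck_t96 CHECK (tdate BETWEEN '1996-01-01' AND '1996-12-31')
);

-- One logical table over the three yearly tables.
CREATE VIEW V_ALL_YEARS (tdate, amount) AS
SELECT tdate, amount FROM T94
UNION ALL
SELECT tdate, amount FROM T95
UNION ALL
SELECT tdate, amount FROM T96;

-- 4Q/1995 query. Combined with ck_t94 and ck_t96 the predicate is
-- contradictory for T94 and T96, so only the T95 branch needs to run.
SELECT tdate, amount
FROM V_ALL_YEARS
WHERE tdate >= '1995-10-01' AND tdate <= '1995-12-31';
```

The range predicates use literals, which matches the deck's advice that BETWEEN, IN or column-operator-literal predicates prune best.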
82
Exploiting RI for Query Optimization
• Group By Pushdown
• Group By + Truncated Order By Pushdown
• Rewrite of Outer Join to Inner Join
• Better filter factor estimation for multi-column RI joins
  – Traditionally we use an independence assumption
  – Better column correlation information with RI
• Elimination of redundant joins in star-schema views
• Views often include more tables than the query requires
  – RI allows us to prove that the joins are redundant
• RI information is exploited when matching queries to Materialized Query Tables (MQTs)
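As an illustration of redundant join elimination, here is a sketch that assumes a fact table salesF(store_id, sales) whose store_id is a NOT NULL foreign key to the primary key of a dimension table store(store_id, name, city). The view name sales_v is hypothetical.

```sql
-- Hypothetical star-schema view that always joins fact and dimension.
CREATE VIEW sales_v AS
  SELECT f.store_id, f.sales, st.name, st.city
  FROM salesF AS f, store AS st
  WHERE f.store_id = st.store_id;

-- This query references only fact-table columns of the view.
SELECT store_id, SUM(sales) AS sm
FROM sales_v
GROUP BY store_id;

-- With the RI relationship declared, every salesF row matches exactly one
-- store row, so the join neither drops nor duplicates rows and the query
-- above returns the same result as this join-free form:
SELECT store_id, SUM(sales) AS sm
FROM salesF
GROUP BY store_id;
```

Without the foreign key (enforced or informational), the optimizer cannot prove the two forms equivalent and has to keep the join.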
83
Group By Pushdown Through RI Joins
Find top 20 stores in terms of total revenue, and the store name and city information:
select st.store_id, st.name, st.city, sum(f.sales) as sm
from salesF as f, store as st
where f.store_id=st.store_id
group by st.store_id, st.name, st.city
order by sm desc
fetch first 20 rows only;
[Diagram: fact table salesF linked to dimension table store by referential integrity]
84
Group By Pushdown Through RI Joins (Cont.)
[Plan diagram, Group By Pushdown. Operators shown: select (S), sort, group by (GB), join, sort]
salesF supplies store_id, sales (100,000 rows)
store supplies store_id, name, city (2,000 rows)
join: f.store_id=st.store_id /* FK=PK */
GB: sum(sales) as sm, group by store_id, name, city
sort: order by sm desc, fetch first 20 rows only
S: store_id, name, city, sm
85
Group By Pushdown Through RI Joins
[Plan diagram, After Group By Pushdown. Operators shown: select (S), sort, join, group by (GB), sort]
GB on salesF: sum(sales) as sm, group by store_id; output store_id, sm (2,000 rows)
store supplies store_id, name, city
join: f.store_id=st.store_id /* FK=PK */
sort: order by sm desc, fetch first 20 rows only
S: store_id, name, city, sm
86
Fetch First n Rows (Truncated Sort) Pushdown
[Plan diagram, After Group By + Truncated Order By Pushdown. Operators shown: select (S), join, group by (GB), sort, sort]
GB on salesF: sum(sales) as sm, group by store_id
sort: order by sm desc, fetch first 20 rows only; output store_id, sm (20 rows)
store supplies store_id, name, city
join: f.store_id=st.store_id /* FK=PK */
S: store_id, name, city, sm
z/OS - Use the “Separate the Group By Work” method discussed in the Advanced SQL class page 42, use nested table expressions.
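One plausible reading of that manual rewrite, sketched with a nested table expression on the salesF and store tables from the preceding slides. The correlation name g is hypothetical, and the class material itself is not reproduced here.

```sql
-- Do the GROUP BY on the fact table alone inside a nested table
-- expression, then join the 2,000 aggregated rows to the dimension table
-- instead of joining all 100,000 fact rows first.
SELECT st.store_id, st.name, st.city, g.sm
FROM (SELECT f.store_id, SUM(f.sales) AS sm
      FROM salesF AS f
      GROUP BY f.store_id) AS g,
     store AS st
WHERE g.store_id = st.store_id
ORDER BY g.sm DESC
FETCH FIRST 20 ROWS ONLY;
```

This form relies on store_id being unique in store, so name and city are determined by store_id. Moving the ORDER BY and FETCH FIRST inside the nested table expression as well (the truncated sort pushdown) is only safe when RI guarantees that every aggregated store_id finds its row in store.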
87
Exploiting RI When Matching MQTs
• With a summary table created on 5 tables ...

CREATE TABLE dba.PG_SALESSUM AS (
  SELECT l.lineid, pg.pgid, loc.country, loc.state,
         YEAR(pdate) AS year, MONTH(pdate) AS month,
         SUM(ti.amount) AS amount, COUNT(*) AS count
  FROM stars.transitem AS ti, stars.trans AS t,
       stars.loc AS loc, stars.pgroup AS pg, stars.prodline AS l
  WHERE ti.transid = t.transid AND ti.pgid = pg.pgid
    AND pg.lineid = l.lineid AND t.locid = loc.locid
  GROUP BY loc.country, loc.state, YEAR(pdate), MONTH(pdate), l.lineid, pg.pgid
) DATA INITIALLY DEFERRED REFRESH IMMEDIATE;

• ... the query on 3 tables will use the MQT, given appropriate RI between transitem and pgroup and between pgroup and prodline

SELECT YEAR(pdate) AS year, loc.country,
       SUM(ti.amount) AS amount, COUNT(*) AS count
FROM stars.transitem AS ti, stars.trans AS t, stars.loc AS loc
WHERE ti.transid = t.transid AND t.locid = loc.locid
  AND YEAR(pdate) BETWEEN 1990 AND 1999
GROUP BY YEAR(pdate), loc.country
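The RI that this matching depends on could be declared as informational constraints, along the lines of the sketch below. It assumes that pgroup.pgid and prodline.lineid are primary keys and that the foreign key columns are NOT NULL; the constraint names are hypothetical.

```sql
-- Informational foreign keys: not checked by DB2 at insert or update time,
-- but visible to the optimizer when it matches queries to the MQT.
-- Constraint names are illustrative assumptions.
ALTER TABLE stars.transitem
  ADD CONSTRAINT fk_ti_pg FOREIGN KEY (pgid)
  REFERENCES stars.pgroup (pgid)
  NOT ENFORCED ENABLE QUERY OPTIMIZATION;

ALTER TABLE stars.pgroup
  ADD CONSTRAINT fk_pg_line FOREIGN KEY (lineid)
  REFERENCES stars.prodline (lineid)
  NOT ENFORCED ENABLE QUERY OPTIMIZATION;
```

With these in place, the joins to pgroup and prodline in the MQT are known to be lossless, which is what lets a query that omits both tables be answered from dba.PG_SALESSUM. Because the constraints are not enforced, the application must guarantee that the data really satisfies them, or query results can be wrong.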
88
Constraints Summary
• Check and Referential Integrity constraints push application rules down to the database
• The DB2 Optimizer can exploit constraint information for better access plans
• Informational constraints allow us to optimize queries without the overhead of enforcing them
89
MANUAL QUERY REWRITE
Sometimes Necessary on all Platforms
90
A Typical Data Warehouse/BI Query
• Initial cost of 16 million timerons
  – WOULD NOT FINISH!
• Multiple DISTINCT table expressions
• Initial join involved all columns and all rows
• The very wide and very deep set was dragged through many more query steps
91
Before and After
[Diagram: side-by-side skeletons of the query before and after the manual rewrite. Both show nested SELECT DISTINCT table expressions combined with LEFT JOIN and INNER JOIN, closed by GROUP BY ROLLUP.]
92
Conclusion
• Data Matters
  – So Do Constraints
  – So Does RI
  – So Do Views
  – So Do Access Paths
  – So Does Good Index Design
  – So Do MQTs!
93
Bibliography
– E.F. Codd – “A Relational Model of Data for Large Shared Data Banks”
– E.F. Codd – “Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks”
– Richard Snodgrass, et al – “Temporal Databases”
– Robert R. Stoll – “Set Theory and Logic”
– C.J. Date, et al – “Temporal Data & the Relational Model”
– C.J. Date – “The Database Relational Model: A Retrospective Review and Analysis: A Historical Account and Assessment of E. F. Codd's Contribution to the Field of Database Technology”
– C.J. Date – “An Introduction to Database Systems”, Eighth Edition
94
Thank You