Apache Derby Query Optimizer Improvements
CS 4420, Group 13
Presented by: Nufail, Nadeeshani, Amila, Malith

Query Optimizer Improvements for Apache Derby


Possible improvements for the Derby query optimizer are discussed.



Query optimization

• Chooses the least-cost execution plan to be sent to the evaluation engine
• A key factor in DBMS performance
• Cost-based vs. heuristic approaches

Derby Optimizer

• Considers left-deep join trees
• Represents the tables in an array
• Traverses the search space depth-first
• Searches query plans exhaustively
• Prunes the search space based on cost
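The search described above can be sketched in Java. This is a minimal illustration under stated assumptions, not Derby's implementation (Derby drives its search with while loops rather than recursion): the class `LeftDeepSearch`, its cost model (the sum of all intermediate result sizes, with one assumed selectivity for every join) and the example row counts are hypothetical. Because every step adds a non-negative amount to the cost, a partial join order that already costs as much as the best complete plan can be abandoned safely, which is the kind of cost-based pruning the slide refers to.

```java
/**
 * Sketch only, not Derby's code: depth-first enumeration of left-deep join
 * orders over an array of tables, abandoning any partial order whose cost
 * already reaches the cost of the best complete order found so far.
 * Hypothetical cost model: the sum of all intermediate result sizes, with a
 * single assumed selectivity for every join.
 */
class LeftDeepSearch {
    private final double[] rows;       // estimated row count of each table
    private final double selectivity;  // assumed selectivity of every join
    private int[] bestOrder;           // cheapest complete join order so far
    private double bestCost = Double.POSITIVE_INFINITY;

    LeftDeepSearch(double[] rows, double selectivity) {
        this.rows = rows;
        this.selectivity = selectivity;
    }

    void search() {
        int n = rows.length;
        extend(new int[n], new boolean[n], 0, 0.0, 0.0);
    }

    int[] bestOrder() { return bestOrder.clone(); }

    double bestCost() { return bestCost; }

    // order[0..depth) holds the tables placed so far, resultRows is the size
    // of the current intermediate result and costSoFar the accumulated cost.
    private void extend(int[] order, boolean[] used, int depth,
                        double resultRows, double costSoFar) {
        if (costSoFar >= bestCost) return;        // cost-based pruning
        if (depth == order.length) {              // complete left-deep plan
            bestCost = costSoFar;
            bestOrder = order.clone();
            return;
        }
        for (int t = 0; t < rows.length; t++) {   // depth-first over tables
            if (used[t]) continue;
            used[t] = true;
            order[depth] = t;
            // The first table is scanned; each later table is joined to the
            // running intermediate result, which keeps the tree left-deep.
            double next = depth == 0 ? rows[t] : resultRows * rows[t] * selectivity;
            extend(order, used, depth + 1, next, costSoFar + next);
            used[t] = false;
        }
    }
}
```

For three tables with 1000, 10 and 100 rows and a selectivity of 0.01, the sketch picks the order (10-row table, 100-row table, 1000-row table) with a cost of 10 + 10 + 100 = 120. Without pruning, n tables mean up to n! join orders to visit.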

Performance Statistics

Complex join queries over 8 relations, each with 400 records:

| Query | H2 embedded (ms) | PostgreSQL (ms) | Derby (ms) |
|-------|------------------|-----------------|------------|
| 1     | 28               | 43              | 1416       |
| 2     | 438              | 26              | 151733     |
| 3     | 420              | 35              | 147261     |
| 4     | 84256            | 31              | 125356     |
| 5     | 312              | 52              | 2026       |
| 6     | 63456            | 68              | 142458     |

Concurrency

• The main drawback of the Derby optimizer's design is that it executes serially
• Uses two while loops: one iterates through the join orders, the other through the access paths for each order
  o getNextPermutation()
  o getNextDecoratedPermutation()
  o costPermutation()
• An easy approach: use the loop-parallel programming pattern
• Make the iterations independent and execute each one in a new thread
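One way the loop-parallel idea could look, as a hedged Java sketch: each join order becomes an independent task on a fixed-size thread pool (a pool rather than one new thread per iteration, to bound thread creation), and a serial reduction picks the cheapest order. The class name, cost model and inputs are hypothetical, and the sketch does not call Derby's getNextPermutation(), getNextDecoratedPermutation() or costPermutation(). It also leaves out cost-based pruning, because pruning reads and updates a shared "best cost so far" and therefore makes iterations depend on each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch of the loop-parallel pattern applied to join-order costing: every
 * join order is costed as an independent task on a fixed thread pool, and
 * the cheapest order is chosen once all tasks have finished.
 */
class ParallelJoinCosting {

    // Hypothetical cost model: sum of intermediate result sizes, with one
    // assumed selectivity for every join.
    static double cost(int[] order, double[] rows, double selectivity) {
        double result = rows[order[0]];
        double total = result;
        for (int i = 1; i < order.length; i++) {
            result = result * rows[order[i]] * selectivity;
            total += result;
        }
        return total;
    }

    // Every permutation of 0..n-1 (n! of them, so only usable for small n).
    static List<int[]> permutations(int n) {
        List<int[]> out = new ArrayList<>();
        permute(new int[n], new boolean[n], 0, out);
        return out;
    }

    private static void permute(int[] cur, boolean[] used, int depth, List<int[]> out) {
        if (depth == cur.length) {
            out.add(cur.clone());
            return;
        }
        for (int t = 0; t < cur.length; t++) {
            if (used[t]) continue;
            used[t] = true;
            cur[depth] = t;
            permute(cur, used, depth + 1, out);
            used[t] = false;
        }
    }

    static int[] bestOrder(double[] rows, double selectivity, int threads) {
        List<int[]> orders = permutations(rows.length);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> costs = new ArrayList<>();
            for (int[] order : orders) {                 // one task per iteration
                costs.add(pool.submit(() -> cost(order, rows, selectivity)));
            }
            int best = 0;
            for (int i = 1; i < orders.size(); i++) {    // serial reduction step
                if (costs.get(i).get() < costs.get(best).get()) best = i;
            }
            return orders.get(best);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The price of full independence is that no order is skipped; keeping some pruning would need a best-cost bound that is shared safely between threads.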

Bushy Trees

• Left-deep trees and bushy trees
• More flexibility in query plan generation
• A much larger search space
• The best plan may be bushy
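To make the size difference concrete: counting commuted join inputs as distinct plans and allowing cross products, the standard counts are n! left-deep trees and (2n-2)!/(n-1)! bushy trees for n relations. The small Java sketch below evaluates both formulas; the class and method names are hypothetical, and the long arithmetic is only safe up to n = 15.

```java
/**
 * Sizes of the two join-tree search spaces for n relations, counting
 * commuted join inputs as different plans and allowing cross products:
 * left-deep trees: n!; bushy trees: (2n-2)!/(n-1)! = n * (n+1) * ... * (2n-2).
 * long arithmetic overflows for n > 15.
 */
class JoinTreeCounts {
    static long leftDeep(int n) {
        long count = 1;
        for (int i = 2; i <= n; i++) count *= i;          // n!
        return count;
    }

    static long bushy(int n) {
        long count = 1;
        for (int i = n; i <= 2 * n - 2; i++) count *= i;  // (2n-2)!/(n-1)!
        return count;
    }
}
```

For the 8-relation queries in the performance statistics, this gives 40,320 left-deep trees against 17,297,280 bushy trees, 429 times as many.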

Bushy Trees (contd.)

More than half of the queries have better solutions in the bushy-tree solution space (M. Steinbrunn et al., 1997, "Heuristic and randomized optimization for the join ordering problem").

Randomized Algorithms

• Deterministic: start from the base relations and build plans by adding one relation at each step
• Randomized: search the neighborhood of a particular starting point for better solutions
• Trade optimization time for execution time
• No guarantee of finding the best solution
• Useful for joins with a large number of relations
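Iterative improvement is one randomized method of this kind (Steinbrunn et al. study it alongside simulated annealing). The hedged Java sketch below assumes a hypothetical cost model and defines the neighbors of a join order as the orders obtained by swapping two tables. It approximates "no better neighbor exists" with a fixed number of random tries per restart, and keeps the best local minimum across random restarts; the class, method and parameter names are hypothetical.

```java
import java.util.Random;

/**
 * Sketch of iterative improvement, a randomized join-ordering method:
 * start from a random join order, try random swaps of two positions, keep
 * a swap only if it lowers the cost, and restart from fresh random points.
 * The cost model is a hypothetical stand-in for real cost estimates.
 */
class IterativeImprovement {

    // Hypothetical cost model: sum of intermediate result sizes.
    static double cost(int[] order, double[] rows, double selectivity) {
        double result = rows[order[0]];
        double total = result;
        for (int i = 1; i < order.length; i++) {
            result = result * rows[order[i]] * selectivity;
            total += result;
        }
        return total;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // Uniformly random join order via a Fisher-Yates shuffle.
    static int[] randomOrder(int n, Random rnd) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        for (int i = n - 1; i > 0; i--) swap(order, i, rnd.nextInt(i + 1));
        return order;
    }

    // restarts must be at least 1, otherwise no order is returned.
    static int[] optimize(double[] rows, double selectivity,
                          int restarts, int triesPerRestart, long seed) {
        Random rnd = new Random(seed);
        int n = rows.length;
        int[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (int r = 0; r < restarts; r++) {
            int[] cur = randomOrder(n, rnd);              // random starting point
            double curCost = cost(cur, rows, selectivity);
            for (int k = 0; k < triesPerRestart; k++) {
                int i = rnd.nextInt(n);
                int j = rnd.nextInt(n);
                if (i == j) continue;
                swap(cur, i, j);                          // move to a neighbor
                double c = cost(cur, rows, selectivity);
                if (c < curCost) {
                    curCost = c;                          // downhill: keep it
                } else {
                    swap(cur, i, j);                      // not better: undo
                }
            }
            if (curCost < bestCost) {                     // best local minimum
                bestCost = curCost;
                best = cur.clone();
            }
        }
        return best;
    }
}
```

The number of restarts and tries is the knob that trades optimization time against plan quality, and no setting guarantees the optimum.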

Heuristics for Timeout Improvements

• Consequences of a poorly chosen timeout value
• Time wasted generating numerous plans and estimating their costs
• Whether the optimal plan is worth the extra search time compared with a sub-optimal one
• Heuristic-based timeout values
• Improvement over time
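As an illustration of a heuristic-based timeout, here is a hypothetical rule sketched in Java; it is not taken from Derby's source. The search stops once the time already spent optimizing exceeds a budget equal to the larger of a per-table floor and a fraction of the best plan's estimated cost, assuming cost estimates are expressed in milliseconds. Both knobs (`minMsPerTable`, `costFraction`), the class name and the simulated search loop are assumptions.

```java
/**
 * Hypothetical timeout heuristic for a join-order search, not taken from
 * Derby's source: stop once the time already spent optimizing exceeds a
 * budget derived from the query itself. The budget is the larger of a
 * per-table minimum and a fraction of the best plan's estimated cost,
 * assuming cost estimates are expressed in milliseconds.
 */
class TimeoutHeuristic {
    private final double minMsPerTable; // hypothetical knob: floor per table
    private final double costFraction;  // hypothetical knob: share of plan cost

    TimeoutHeuristic(double minMsPerTable, double costFraction) {
        this.minMsPerTable = minMsPerTable;
        this.costFraction = costFraction;
    }

    double budgetMs(double bestCostMs, int tables) {
        return Math.max(minMsPerTable * tables, costFraction * bestCostMs);
    }

    boolean shouldStop(double elapsedMs, double bestCostMs, int tables) {
        if (Double.isInfinite(bestCostMs)) return false; // no complete plan yet
        return elapsedMs > budgetMs(bestCostMs, tables);
    }

    // Simulated search: costing each candidate plan takes msPerPlan; returns
    // how many candidates were costed before the heuristic stopped the search.
    int plansCosted(double[] candidateCosts, double msPerPlan, int tables) {
        double best = Double.POSITIVE_INFINITY;
        double elapsed = 0;
        int costed = 0;
        for (double c : candidateCosts) {
            if (shouldStop(elapsed, best, tables)) break;
            elapsed += msPerPlan;
            best = Math.min(best, c);
            costed++;
        }
        return costed;
    }
}
```

With a floor of 1 ms per table, a fraction of 0.1 and 3 ms to cost each plan, a simulated two-table search over candidate costs 100, 50, 40, 30 and 20 stops after two plans: once the best cost drops to 50 the budget shrinks to 5 ms, and 6 ms have already been spent. The cheaper the best plan found so far, the less further searching can pay off.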

Genetic Algorithms

• Why "genetic"?
• Used by PostgreSQL (its genetic query optimizer, GEQO)
• Evolution:
  o Removal of the least-fit individuals
  o Recombination of individuals with high fitness
• Initial population: query plans with different possible join orders
• Fitness function: minimize cost
• A lower-cost join sequence has higher fitness
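The evolution loop described on this slide can be sketched in Java. This is loosely inspired by the idea behind PostgreSQL's GEQO, not its code: individuals are join orders (permutations of table indices), fitness is the inverse of a hypothetical cost model, each generation removes the least-fit half, and the population is refilled by recombining surviving parents with order crossover (OX), a standard permutation crossover chosen here because it always yields a valid join order. Class, method and parameter names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/**
 * Sketch of a genetic algorithm for join ordering. An individual is a join
 * order (a permutation of table indices); lower estimated cost means higher
 * fitness. Each generation removes the least-fit half of the population and
 * refills it by recombining survivors with order crossover (OX).
 * The cost model is hypothetical.
 */
class GeneticJoinOrder {

    // Hypothetical cost model: sum of intermediate result sizes.
    static double cost(int[] order, double[] rows, double selectivity) {
        double result = rows[order[0]];
        double total = result;
        for (int i = 1; i < order.length; i++) {
            result = result * rows[order[i]] * selectivity;
            total += result;
        }
        return total;
    }

    // Uniformly random join order via a Fisher-Yates shuffle.
    static int[] randomOrder(int n, Random rnd) {
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }
        return order;
    }

    // Order crossover: copy a random slice of parent a, then fill the other
    // positions with the missing tables in the order they appear in parent b.
    static int[] crossover(int[] a, int[] b, Random rnd) {
        int n = a.length;
        int lo = rnd.nextInt(n);
        int hi = rnd.nextInt(n);
        if (lo > hi) { int t = lo; lo = hi; hi = t; }
        int[] child = new int[n];
        boolean[] taken = new boolean[n];
        for (int i = lo; i <= hi; i++) {
            child[i] = a[i];
            taken[a[i]] = true;
        }
        int pos = (hi + 1) % n;
        for (int k = 0; k < n; k++) {
            int table = b[(hi + 1 + k) % n];
            if (taken[table]) continue;
            child[pos] = table;
            taken[table] = true;
            pos = (pos + 1) % n;
        }
        return child;
    }

    // popSize must be at least 2 so that two parents always survive.
    static int[] evolve(double[] rows, double selectivity,
                        int popSize, int generations, long seed) {
        Random rnd = new Random(seed);
        Comparator<int[]> byCost =
                Comparator.comparingDouble(o -> cost(o, rows, selectivity));
        List<int[]> population = new ArrayList<>();
        for (int i = 0; i < popSize; i++) {
            population.add(randomOrder(rows.length, rnd)); // initial population
        }
        for (int g = 0; g < generations; g++) {
            population.sort(byCost);                       // fittest first
            int keep = Math.max(2, popSize / 2);
            List<int[]> parents = new ArrayList<>(population.subList(0, keep));
            List<int[]> next = new ArrayList<>(parents);   // least fit removed
            while (next.size() < popSize) {                // recombination
                int[] a = parents.get(rnd.nextInt(keep));
                int[] b = parents.get(rnd.nextInt(keep));
                next.add(crossover(a, b, rnd));
            }
            population = next;
        }
        population.sort(byCost);
        return population.get(0);
    }
}
```

Practical genetic optimizers usually also apply mutation to keep the population diverse; this sketch keeps only the two operators listed on the slide.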


Questions?


Thank you!