Towards Declarative Queries on Adaptive Data Structures
Simon Zeltser. Based on the article by Nicolas Bruno and Pablo Castro.


Page 1: Adaptive Data Structures

Simon Zeltser

Towards Declarative Queries on Adaptive Data Structures

Based on the article by Nicolas Bruno and Pablo Castro

Page 2: Adaptive Data Structures

Seminar in Database Systems Technion

Contents

1. Introduction
2. LINQ on Rich Data Structures
3. LINQ Query Optimization
4. Conclusions and Discussion

Page 3: Adaptive Data Structures

Introduction

THE PROBLEM
  • An increasing number of applications need to manage data outside the DBMS
  • A solution is needed to simplify the interaction between objects and data sources
  • Current solutions lack a rich declarative query mechanism

THE NEED
  • A unified way to query various data sources

THE SOLUTION
  • LINQ (Language Integrated Query)

Page 4: Adaptive Data Structures

Introduction

LINQ: the Microsoft .NET 3.5 solution
  • Access multiple data sources through the same API
  • Technology integrated into the programming language
  • Supported operations:
      Traversal – grouping, joins
      Filter – which rows
      Projection – which columns

    var graduates = from student in students
                    where student.Degree == "Graduate"
                    orderby student.Name, student.Gender, student.Age
                    select student;

BUT…
  • The default implementation is simplistic
  • It is appropriate for small ad-hoc structures in memory

Page 5: Adaptive Data Structures

Introduction

THE GOAL OF THIS SESSION
  • Introduce LINQ's key principles
  • Show a model for customizing LINQ's execution model on rich data structures
  • Evaluate the results

Page 6: Adaptive Data Structures

LINQ – High-Level Architecture

[Figure: LINQ architecture. Top layer: C# 3.0, Visual Basic, other languages. Middle layer: .NET Language Integrated Query (LINQ). LINQ-enabled data sources: LINQ to Objects, LINQ to DataSets, LINQ to SQL, LINQ to Entities, LINQ to XML. Bottom layer: objects, databases, XML.]

Page 7: Adaptive Data Structures

Compare two approaches

Iteration

    List<string> matches = new List<string>();
    // Find the matches
    foreach (string item in data) {
        if (item.StartsWith("Eric")) {
            matches.Add(item);
        }
    }
    // Sort the matches
    matches.Sort();
    // Print out the matches
    foreach (string item in matches) {
        Console.WriteLine(item);
    }

LINQ

    // Find and sort matches
    var matches = from n in data
                  where n.StartsWith("Eric")
                  orderby n
                  select n;
    // Print out the matches
    foreach (var match in matches) {
        Console.WriteLine(match);
    }

Page 8: Adaptive Data Structures

Language Integration

Function

    int StringLength(String s) { return s.Length; }

Lambda expression

    s => s.Length

Query syntax

    var matches = from n in data
                  where n.StartsWith("Eric")
                  orderby n
                  select n;

Extension methods

    public static IEnumerable<TSource> Where<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> predicate)

    var matches = data
        .Where(n => n.StartsWith("Eric"))
        .OrderBy(n => n)
        .Select(n => n);

Anonymous types

    var name = "Eric";
    var age = 43;
    var person = new { Name = "Eric", Age = 43 };
    var names = new [] { "Eric", "Ryan", "Paul" };
    foreach (var item in names)
        Console.WriteLine(item);

Page 9: Adaptive Data Structures

LINQ – Example

    // Retrieve all CS students with more than 105 points
    var query =
        from stud in students
        where (stud.Faculty == "CS" && stud.Points > 105)
        orderby stud.Points descending
        select new { Details = stud.Name + ":" + stud.Phone };

    // Iterate over results
    foreach (var student in query) {
        Console.WriteLine(student.Details);
    }

Features used: lambda expressions, query syntax, extension methods, anonymous types.

Page 10: Adaptive Data Structures

Customizing LINQ Execution Model

EXPRESSION TREES
  • LINQ represents queries as in-memory abstract syntax trees
  • Query description and implementation are not tied together

THE PROBLEM
  • The default implementation of the operations uses fixed, general-purpose algorithms

SUGGESTED SOLUTION
  • Change how the query is executed without changing how it is expressed
  • Analyze alternative implementations of a given query and dynamically choose the most appropriate version depending on the context

[Figure: expression tree with the operators + and * over the leaves 1, 5 and 7]
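To make the "query as an in-memory syntax tree" idea concrete, here is a minimal sketch using the standard System.Linq.Expressions API. The predicate x < 5 and the class name ExpressionTreeDemo are illustrative assumptions; the sketch only inspects and compiles the tree, whereas a custom provider would rewrite it before execution.

```csharp
using System;
using System.Linq.Expressions;

class ExpressionTreeDemo
{
    static void Main()
    {
        // Assigning a lambda to Expression<...> makes the compiler emit a tree, not a delegate.
        Expression<Func<int, bool>> predicate = x => x < 5;

        // The body is a binary node: a parameter on the left, a constant on the right.
        var body = (BinaryExpression)predicate.Body;
        Console.WriteLine(body.NodeType);   // LessThan
        Console.WriteLine(body.Left);       // x
        Console.WriteLine(body.Right);      // 5

        // An optimizer could rewrite the tree at this point; here we just compile and run it.
        Func<int, bool> compiled = predicate.Compile();
        Console.WriteLine(compiled(3));     // True
        Console.WriteLine(compiled(10));    // False
    }
}
```

Because the description (the tree) is separate from the compiled delegate, the same query text can be executed by different strategies.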

Page 11: Adaptive Data Structures

Customizing LINQ Execution Model (2)

PROBLEM EXAMPLE
  • The WHERE operator is implemented by performing a sequential scan over the input and evaluating the selection predicate on each tuple!

The query:

    int[] A = {1, 2, 3, 10, 20, 30};
    var q = from x in A
            where x < 5
            select 2*x;
    foreach (int i in q)
        Console.WriteLine(i);

The same query with extension methods and lambda expressions:

    var q = A.Where(x => x < 5).Select(x => 2*x);

The same query with named functions and static method calls:

    bool AF1(int x) { return x < 5; }
    int AF2(int x) { return 2*x; }
    IEnumerable<int> q = Enumerable.Select(Enumerable.Where(A, AF1), AF2);

Query implementation:

    List<int> res = new List<int>();
    foreach (int a in A)
        if (AF1(a)) res.Add(AF2(a));
    return res;

Page 12: Adaptive Data Structures

Rich Data Structures – DataSet

  • In-memory cache of data
  • Typically populated from a database
  • Supports indexing of DataColumns via DataViews

[Figure: a DataSet object containing DataTable objects; the diagram labels DataRow, DataColumn, UniqueConstraint and ForeignKeyConstraint]

We will use LINQ on DataSet to demonstrate query optimization techniques.

Page 13: Adaptive Data Structures

LINQ on Rich Data Structures

Enable LINQ to work over DataSets.

EXAMPLE: given R and S, two DataTables:

    from r in R.AsEnumerable()
    join s in S.AsEnumerable()
        on r.Field<int>("x") equals s.Field<int>("y")
    select new { a = r.Field<int>("a"), b = s.Field<int>("b") };

[Figure: compile-time and run-time phases in our prototype implementation. Boxes: LINQ on DataSet, Standard C# Code, Intermediate Language, Expression Tree, Optimized Expression Tree, Intermediate Language, DataSet Self-Tuning State. Phases: Compile Time, Run Time.]

Page 14: Adaptive Data Structures

Expression Tree Optimizer

Our solution will be built according to the following architecture:

[Figure: architecture diagram with the components Query Cost Estimator, Cost Model, Statistics Manager, Query Analyzer, Self-Tuning Organizer, Index Reorganizer and Oscillation Manager]

Page 15: Adaptive Data Structures

Query Cost Estimator


Page 16: Adaptive Data Structures

Query Estimation – Cost Model

Follow the traditional database approach:

    COST: {execution plans} -> [expected execution time]

Relies on:
  • a set of statistics maintained in DataTables for some of their columns
  • formulas to estimate the selectivity of predicates and the cardinality of sub-plans
  • formulas to estimate the expected cost of query execution for every operator

Page 17: Adaptive Data Structures

Cardinality Estimation

  • Returns the approximate number of rows that each operator in a query plan would output
  • To reduce the overhead, we use only these statistical estimators:
      maxVal – maximum value in a column
      minVal – minimum value in a column
      dVal – number of distinct values in a column
  • If statistics are unavailable, rely on "magic numbers" until statistics are created automatically

Page 18: Adaptive Data Structures

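Predicate Selectivity Estimation

Let $\sigma_p(T)$ be an arbitrary expression. Its cardinality is defined as:

    $\mathrm{Card}(\sigma_p(T)) = \mathrm{sel}(p) \cdot \mathrm{Card}(T)$

Under this definition we define the cost of an execution plan as the sum of the costs of its operators:

    $\mathrm{COST}_T(\text{Execution Plan}) = \sum_{p \in \{\text{operators of } T\}} \mathrm{COST}(p)$

EXAMPLE: consider a full table scan of table $T$:

    $\mathrm{COST}(T) = \mathrm{Card}(T) \cdot \mathrm{MEM\_ACCESS\_COST}$

where MEM_ACCESS_COST is the average cost of a memory access.

    Predicate                Selectivity estimation
    $p_1 \wedge p_2$         $\mathrm{sel}(p_1) \cdot \mathrm{sel}(p_2)$
    $p_1 \vee p_2$           $\mathrm{sel}(p_1) + \mathrm{sel}(p_2) - \mathrm{sel}(p_1 \wedge p_2)$
    $c = c_0$                $(\mathrm{dVal}(c))^{-1}$
    $c_0 \le c \le c_1$      $(c_1 - c_0) / (\mathrm{maxVal}(c) - \mathrm{minVal}(c))$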

Page 19: Adaptive Data Structures

Predicate Selectivity Estimation – Cont.

Intuition: we model $\mathrm{sel}(c_0 \le c \le c_1)$ as the probability of getting a value of $c$ in the interval $[c_0, c_1]$ among all possible values of $c$.

[Figure: the interval from c0 to c1 marked on the axis of c values, which runs from minVal(c) to maxVal(c)]
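A short worked example may help here. It assumes that values of c are spread uniformly between minVal(c) and maxVal(c), which is what the intuition above suggests; all numbers (the value range, the predicate bounds and Card(T) = 5000) are hypothetical.

```latex
% Range selectivity under the uniformity assumption
\[
\mathrm{sel}(c_0 \le c \le c_1) \approx
  \frac{c_1 - c_0}{\mathrm{maxVal}(c) - \mathrm{minVal}(c)}
\]
% Hypothetical statistics: minVal(c) = 0, maxVal(c) = 100, Card(T) = 5000
\[
\mathrm{sel}(20 \le c \le 45) \approx \frac{45 - 20}{100 - 0} = 0.25,
\qquad
\mathrm{Card}(\sigma_{20 \le c \le 45}(T)) \approx 0.25 \cdot 5000 = 1250
\]
```

The estimate needs only the two boundary statistics, which keeps the bookkeeping overhead low at the price of accuracy on skewed data.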

Page 20: Adaptive Data Structures

Predicate Selectivity Estimation – Cont.

Consider now a join predicate $T_1 \bowtie_{c_1 = c_2} T_2$:

    $\mathrm{Card}(T_1 \bowtie_{c_1 = c_2} T_2) = \min(\mathrm{dVal}(c_1), \mathrm{dVal}(c_2)) \cdot \frac{\mathrm{Card}(T_1)}{\mathrm{dVal}(c_1)} \cdot \frac{\mathrm{Card}(T_2)}{\mathrm{dVal}(c_2)}$
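As an illustration, the join cardinality estimate can be worked through for a Products-to-Categories join. The table sizes (200,000 and 1,000 rows) match the generated data reported in the experimental evaluation of this deck; the distinct-value counts are assumptions made for the example.

```latex
% Join cardinality estimate: matching distinct values times rows per value on each side
\[
\mathrm{Card}(T_1 \bowtie_{c_1 = c_2} T_2) =
  \min(\mathrm{dVal}(c_1), \mathrm{dVal}(c_2)) \cdot
  \frac{\mathrm{Card}(T_1)}{\mathrm{dVal}(c_1)} \cdot
  \frac{\mathrm{Card}(T_2)}{\mathrm{dVal}(c_2)}
\]
% Assumed: T_1 = Products, c_1 = ca_id with dVal = 1000;
%          T_2 = Categories, c_2 = id with dVal = 1000
\[
\min(1000, 1000) \cdot \frac{200000}{1000} \cdot \frac{1000}{1000}
  = 1000 \cdot 200 \cdot 1 = 200000
\]
```

The result says that every product finds exactly one category, which is what one would expect from a foreign-key join.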

Page 21: Adaptive Data Structures

Query Analyzer


Page 22: Adaptive Data Structures

Execution Alternatives

  • Rely on indexes on DataColumns when possible
  • Example: $\sigma_{a=7 \wedge (b+c)<20}$

Alternative 1: full table scan, evaluating a=7 and b+c < 20 on every row.
Alternative 2: use the index on the "a" column to find the rows with a=7, then evaluate b+c < 20 on those rows only.

      c    b    a
      3    1    7
     76    3    2
     32   34    5
      8   14    7
      9    9    4
     23    4    7
      3    1    3

[Figure: index tree on the "a" column with the keys 5; 3, 7; 2, 4]
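The two alternatives can be sketched against a real DataTable. This is a minimal illustration, not the prototype's code: it loads the seven rows shown on the slide, uses a DataView sorted on "a" as the index, and relies only on the standard System.Data calls AsEnumerable, Field<T> and DataView.FindRows. The class and variable names are assumptions.

```csharp
using System;
using System.Data;
using System.Linq;

class ExecutionAlternatives
{
    static void Main()
    {
        var t = new DataTable("T");
        t.Columns.Add("c", typeof(int));
        t.Columns.Add("b", typeof(int));
        t.Columns.Add("a", typeof(int));
        int[][] rows = {
            new[] {3, 1, 7}, new[] {76, 3, 2}, new[] {32, 34, 5}, new[] {8, 14, 7},
            new[] {9, 9, 4}, new[] {23, 4, 7}, new[] {3, 1, 3}
        };
        foreach (var r in rows) t.Rows.Add(r[0], r[1], r[2]);

        // Alternative 1: full table scan, both predicates evaluated on every row.
        var scan = t.AsEnumerable()
            .Where(r => r.Field<int>("a") == 7
                     && r.Field<int>("b") + r.Field<int>("c") < 20)
            .ToList();

        // Alternative 2: a DataView sorted on "a" acts as the index;
        // FindRows seeks a = 7, and the residual predicate runs on the matches only.
        var indexOnA = new DataView(t) { Sort = "a" };
        var seek = indexOnA.FindRows(7)
            .Select(v => v.Row)
            .Where(r => r.Field<int>("b") + r.Field<int>("c") < 20)
            .ToList();

        Console.WriteLine($"scan: {scan.Count} row(s), seek: {seek.Count} row(s)");
        // prints "scan: 1 row(s), seek: 1 row(s)"
    }
}
```

Both plans return the single row (c=3, b=1, a=7); the seek touches only the three rows with a=7 instead of all seven.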

Page 23: Adaptive Data Structures

Analyzing Execution Plans

Global vs. local execution plan – EXAMPLE:

[Figure: the global execution plan joins Products with the result of a join between Carts and a Filter over Customers; the local execution plan asks, for each Join, which implementation to use: HashJoin? IndexJoin? MergeJoin?]

Page 24: Adaptive Data Structures

Enumeration Architecture

Two phases:
  • First phase: join reordering based on estimated cardinalities
  • Second phase: choose the best physical implementation for each operator

EXAMPLE: suppose we analyze a JOIN operator. We evaluate the following JOIN implementations:
  • Hash Join
  • Merge Join (inputs must be sorted on the join columns)
  • Index Join (an index on the inner join column must be available)
  • Other possible calculation options

Choose the alternative with the smallest cost.
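The second phase can be sketched as "enumerate the applicable implementations, cost each one, keep the cheapest". The sketch below is illustrative only: the Input class, the Cheapest helper and especially the three cost formulas are assumptions chosen for the example, not the cost model of the article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class JoinChooser
{
    // Hypothetical description of a join input (not taken from the prototype).
    sealed class Input
    {
        public double Card;    // estimated number of rows
        public bool Sorted;    // sorted on the join column
        public bool Indexed;   // index available on the join column
    }

    static string Cheapest(Input outer, Input inner)
    {
        // Assumed cost formulas, for illustration only.
        var plans = new List<(string Name, double Cost)>
        {
            // Always applicable: build a hash table on inner, probe with outer.
            ("HashJoin", 2 * inner.Card + outer.Card)
        };
        if (outer.Sorted && inner.Sorted)
            plans.Add(("MergeJoin", outer.Card + inner.Card));
        if (inner.Indexed)
            plans.Add(("IndexJoin", outer.Card * Math.Log(inner.Card + 1, 2)));

        // Keep the alternative with the smallest estimated cost.
        return plans.OrderBy(p => p.Cost).First().Name;
    }

    static void Main()
    {
        var customers = new Input { Card = 10, Sorted = false, Indexed = false };
        var carts = new Input { Card = 5000, Sorted = false, Indexed = true };
        Console.WriteLine(Cheapest(customers, carts)); // IndexJoin

        carts.Indexed = false;
        Console.WriteLine(Cheapest(customers, carts)); // HashJoin
    }
}
```

The applicability checks mirror the preconditions listed above: merge join needs sorted inputs, index join needs an index on the inner join column.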

Page 25: Adaptive Data Structures

Query Analysis


Page 26: Adaptive Data Structures

Self-Tuning Organization

  • We want to reach the smallest query execution time
  • Indexes can be used to speed up query execution

PROBLEM:
  • It is hard to forecast in advance which indexes to build for optimum performance

SOLUTION:
  • A continuous monitoring/tuning component that addresses the challenge of choosing and building adequate indexes and statistics automatically

Page 27: Adaptive Data Structures

Self-Tuning Organization – Example

Consider the following execution plan:

[Figure: execution plan with three sub-plans enclosed in dotted lines]

  • The selection predicate Name="Pam" over the Customers DataTable can be improved if an index on Customers(Name) is built
  • Both hash joins can be improved if indexes I2 and I3 are available, since we can transform a hash join into an index join
  • The three sub-plans enclosed in dotted lines might be improved if suitable indexes were present

Page 28: Adaptive Data Structures

Algorithm for automatic index tuning

Page 29: Adaptive Data Structures

Index Tuning

High-level description:
  • Identify a good set of candidate indexes that would improve performance if they were available.
  • Later, when the optimized queries are evaluated, aggregate the relative benefits of both candidate and existing indexes.
  • Based on this information, periodically trigger index creations or deletions, taking into account storage constraints, the overall utility of the resulting indexes, and the cost of creating and maintaining them.


Page 31: Adaptive Data Structures

Index Tuning Algorithm

Notation:
  • $H$ – a set of candidate indexes to materialize (initially empty)
  • $T$ – task set for query $q_i$
  • $I_i$ – either a candidate or an existing index
  • $\delta_{I_i}$ – amount by which $I_i$ would speed up query $q_i$

[Figure: the task set holds the pairs $(I_1, \delta_{I_1}), (I_2, \delta_{I_2}), \ldots, (I_n, \delta_{I_n})$; $H$ is initially empty]


Page 33: Adaptive Data Structures

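Index Tuning Algorithm

Notation:
  • $\Delta_I$ – value maintained for each index $I$
  • Materialized index – an index that has already been created

Update rules:
  • SELECT query: $\Delta_I = \Delta_I + \delta_I$
  • UPDATE query: $\Delta_I = \Delta_I - \delta_I$

[Figure: the task set $(I_1, \delta_{I_1}), (I_2, \delta_{I_2}), \ldots, (I_n, \delta_{I_n})$ and the candidate set $H$, which now contains $I_1$]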

Page 34: Adaptive Data Structures

Index Tuning Algorithm

The purpose of $\Delta_I$:
  • We maintain $\Delta_I$ on every query evaluation
  • If the potential aggregated benefit of materializing a candidate index exceeds its creation cost, we should create it, since we have gathered enough evidence that the index is useful
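The bookkeeping described here can be sketched in a few lines. This is a simplified illustration, not the article's algorithm: the DeltaTracker class, its method names and the numbers are assumptions, and the creation test compares the accumulated benefit directly with a hypothetical creation cost.

```csharp
using System;
using System.Collections.Generic;

class DeltaTracker
{
    // Accumulated benefit per index name (the Delta value of each index).
    private readonly Dictionary<string, double> delta = new Dictionary<string, double>();

    // A SELECT that the index would speed up by deltaI adds to the index's Delta;
    // an UPDATE that has to maintain the index subtracts from it.
    public void Observe(string index, double deltaI, bool isUpdate)
    {
        delta.TryGetValue(index, out double current);   // current is 0 if unseen
        delta[index] = isUpdate ? current - deltaI : current + deltaI;
    }

    // Simplified creation test: accumulated benefit exceeds the creation cost.
    public bool ShouldCreate(string index, double creationCost) =>
        delta.TryGetValue(index, out double d) && d > creationCost;

    static void Main()
    {
        var tracker = new DeltaTracker();
        const double creationCost = 10.0;               // hypothetical cost of building the index

        tracker.Observe("Customers(name)", 4.0, isUpdate: false);
        tracker.Observe("Customers(name)", 4.0, isUpdate: false);
        Console.WriteLine(tracker.ShouldCreate("Customers(name)", creationCost)); // False (8 <= 10)

        tracker.Observe("Customers(name)", 1.0, isUpdate: true);
        tracker.Observe("Customers(name)", 4.0, isUpdate: false);
        Console.WriteLine(tracker.ShouldCreate("Customers(name)", creationCost)); // True (11 > 10)
    }
}
```

Waiting until the evidence outweighs the creation cost keeps a single expensive query from triggering an index that would never pay for itself.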


Page 36: Adaptive Data Structures

Index Tuning Algorithm

Remove "bad" indexes phase

Notation:
  • $\Delta_{\min}$ – minimum $\Delta$ value for index $I$
  • $\Delta_{\max}$ – maximum $\Delta$ value for index $I$
  • $B_I$ – the cost of creating index $I$

$\mathrm{Residual}(I) = B_I - (\Delta_{\max} - \Delta)$
(the "slack" an index has before being deemed "droppable")

    IF (Residual(I) <= 0) THEN Drop(I)

$\text{Net-Benefit}(I) = (\Delta - \Delta_{\min}) - B_I$
(the benefit from creating the index)

    IF (Net-Benefit(I) >= 0) THEN Add(I)
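A small worked example shows how the two tests behave. All numbers are hypothetical and chosen only to exercise both branches.

```latex
% Existing index: B_I = 10, Delta_max = 25, current Delta = 14
\[
\mathrm{Residual}(I) = B_I - (\Delta_{\max} - \Delta) = 10 - (25 - 14) = -1 \le 0
\;\Rightarrow\; \mathrm{Drop}(I)
\]
% Candidate index: B_I = 10, Delta_min = 2, current Delta = 13
\[
\text{Net-Benefit}(I) = (\Delta - \Delta_{\min}) - B_I = (13 - 2) - 10 = 1 \ge 0
\;\Rightarrow\; \mathrm{Add}(I)
\]
```

In the first case the index has lost more benefit since its peak than it cost to build, so it is dropped; in the second, the benefit gained since the lowest point already covers the creation cost.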


Page 38: Adaptive Data Structures

Index Tuning Algorithm

Notation:
  • ITM – all the indexes from H whose creation is cost-effective
  • ITD – a subset of the existing indexes such that:
      – ITD fits in existing memory
      – it is still cost-effective to create the new index I after possibly dropping members of ITD

  • If creating index I is more effective than maintaining the existing indexes in ITD: DROP(ITD) && CREATE(I)
  • Remove I from H (the set of candidate indexes to materialize)

Page 39: Adaptive Data Structures

Experimental Evaluation

Consider the following schema:

[Figure: database schema]

Generated:
  • 200,000 products
  • 50,000 customers
  • 1,000 categories
  • 5,000 items in the shopping carts

Possible indexes:
    I1  Categories(par_id)
    I2  Products(c_id)
    I3  Carts(cu_id)
    I4  Products(ca_id)
    I5  Customers(name)

    checkCarts($1) =
        from p in Products.AsEnumerable()
        join cart in Carts.AsEnumerable()
            on p.Field<int>("id") equals cart.Field<int>("p_id")
        join c in Customers.AsEnumerable()
            on cart.Field<int>("cu_id") equals c.Field<int>("id")
        where c.name == $1
        select new { cart, p }

    browseProducts($1) =
        from p in Products.AsEnumerable()
        join c in Categories.AsEnumerable()
            on p.Field<int>("ca_id") equals c.Field<int>("id")
        where c.par_id == $1
        select p

Page 40: Adaptive Data Structures

Execution plans for evaluation queries


Page 41: Adaptive Data Structures

Experimental Evaluation – Cont.


Generated schedule when tuning was disabled

Page 42: Adaptive Data Structures

Experimental Evaluation – Cont.


Generated schedule when tuning was enabled

Page 43: Adaptive Data Structures

Summary

We've discussed:
  • LINQ – for declarative query formulation
  • DataSet – a uniform way of representing in-memory data
  • A lightweight optimizer for automatically adjusting query execution strategies

The article's main contribution:
  • NOT a new query processing technique
  • BUT careful engineering of traditional database concepts in a new context

Page 44: Adaptive Data Structures


Simon Zeltser

Page 45: Adaptive Data Structures

LINQ Execution Model

  • Compiler merges LINQ extension methods
      – Adds query operations to IEnumerable<T>
  • Query syntax is converted to function calls and lambda expressions
      – At compile time
  • Lambda expressions are converted to expression trees
      – Expressions are evaluated at run-time
      – Parsed and type-checked at compile-time
  • Compiler infers types produced by queries
      – Datasets are strongly typed
      – Operations on data sets are strongly typed
  • Compiler finds a query pattern
      – Specialized or base
      – Can optimize and rewrite the query
  • Query is executed lazily
      – Expressions and operations can execute remotely
      – At run-time, when results are used
      – We can force evaluation (ToArray())
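Lazy execution and forced evaluation can be seen in a minimal LINQ to Objects sketch. The data values and the class name are illustrative assumptions; the calls used (Where, Select, ToArray) are standard System.Linq operators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LazyExecutionDemo
{
    static void Main()
    {
        var data = new List<int> { 1, 2, 3, 10, 20, 30 };

        // Defining the query does not run it.
        IEnumerable<int> query = data.Where(x => x < 5).Select(x => 2 * x);

        // ToArray forces evaluation now and snapshots the results.
        int[] snapshot = query.ToArray();

        // A later change to the source is seen by the lazy query, not by the snapshot.
        data.Add(4);

        Console.WriteLine(string.Join(",", snapshot)); // 2,4,6
        Console.WriteLine(string.Join(",", query));    // 2,4,6,8
    }
}
```

The query object is re-evaluated each time it is enumerated, which is why the second line reflects the element added after the query was defined.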