auto-tuning database kernels 2.0

prof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/

class 22

a bit more about auto-tuning db kernels

and then a couple of open research topics

storage indexing query

schema load

X X X

be able to query the data immediately & with good performance

raw data

explore data and gain knowledge “immediately”

database cracking

idle time

workload knowledge

external tools

human control

database cracking: auto-tuning database kernels

incremental, adaptive, partial indexing

indexing

initialization, querying

every query is treated as advice on how data should be stored

Q1: select R.A from R where R.A>10 and R.A<14

Database Cracking CIDR 2007

column A: 13 16 4 9 2 12 7 1 19 3 14 11 8 6

after Q1: 4 9 2 7 1 3 8 6 13 12 11 16 19 14

piece1: A<=10

piece2: 10<A<14

piece3: A>=14

gain knowledge on how data is organized

dynamically/on-the-fly within the select-operator

Q2: select R.A from R where R.A>7 and R.A<=16

after Q2: 4 2 1 3 6 7 9 8 13 12 11 14 16 19

piece1: A<=7

piece2: 7<A<=10

piece3: 10<A<14

piece4: 14<=A<=16

piece5: A>16

the more we crack, the more we learn
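The two queries above can be replayed with a small sketch. This is a toy model, not the CIDR 2007 implementation: the class name `CrackerColumn`, the sorted list standing in for the cracker index, and the out-of-place partitioning are all assumptions made for illustration. It also assumes integer values, so Q1 (A>10 and A<14) becomes the half-open range [11, 14) and Q2 (A>7 and A<=16) becomes [8, 17).

```python
from bisect import bisect_left, insort


class CrackerColumn:
    """Toy cracker column: a copy of the data plus a sorted list of cracks.

    A crack (v, pos) records that every value before index pos is < v
    and every value from pos on (within the column) is >= v.
    """

    def __init__(self, values):
        self.data = list(values)
        self.cracks = []  # sorted list of (value, position); hypothetical index

    def crack(self, v):
        """Partition the single piece that may contain v; return the split position."""
        i = bisect_left(self.cracks, (v, -1))
        if i < len(self.cracks) and self.cracks[i][0] == v:
            return self.cracks[i][1]  # this bound was cracked by an earlier query
        lo = self.cracks[i - 1][1] if i > 0 else 0
        hi = self.cracks[i][1] if i < len(self.cracks) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [x for x in piece if x < v]
        larger = [x for x in piece if x >= v]
        self.data[lo:hi] = smaller + larger  # only this piece is reorganized
        pos = lo + len(smaller)
        insort(self.cracks, (v, pos))
        return pos

    def select(self, low, high):
        """Return values with low <= value < high, cracking as a side effect."""
        start = self.crack(low)
        end = self.crack(high)
        return self.data[start:end]


col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
q1 = col.select(11, 14)  # Q1: A>10 and A<14, leaves 3 pieces
q2 = col.select(8, 17)   # Q2: A>7 and A<=16, leaves 5 pieces
```

After Q1 the column layout matches the slide; after Q2 the piece boundaries match, while the order of values inside a piece depends on the partitioning routine and may differ from the slide.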

set-up: 100K random selections, random selectivity, random value ranges, in a 10 million integer column

[figure: Scan and Full Index compared over the query sequence (x1000); x-axis ticks 1 to 100000]

continuous adaptation

almost no initialization overhead

continuous improvement

cracking databases

updates (SIGMOD07)

storage-restrictions (SIGMOD09)

benchmarking (TPCTC10)

concurrency control (PVLDB12)

algorithms (PVLDB11)

basics (CIDR07)

multi-cores (SIGMOD15)

hadoop (Yale/Saarland)

b-trees (HP Labs)

>1 columns (SIGMOD09)

robustness (PVLDB12)

adaptive storage (SIGMOD14)

time-series (SIGMOD14)

encryption (SIGMOD16)

concurrency control

Concurrency Control, PVLDB 12

problem: read queries become write queries! (?)

goal: be able to crack for multiple queries in parallel

traditional indexing | adaptive indexing

write queries | read queries

change index contents and structure | only index structure changes

no need for traditional locks = too heavy

short term latches = fast and release quickly

all or nothing | incremental and optional

[diagram: index entries v1 v2 v3 shown for both schemes, with markers a and b]

impact stable storage | stable storage optional

need to serialize | can execute in any order

select min(C) from R where A<10 & B<20

disk: A B C D

memory: A<10 → IDs → B → B<20 → IDs → C → minC

column lock and release as soon as an operator completes

need to latch only the pieces to be cracked (max 2 per select)

select [a,b]
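A rough sketch of why at most two pieces need a latch: each bound of select [a,b] falls into exactly one existing piece, and only those pieces get reorganized. The function name `pieces_to_latch` and the piece numbering are hypothetical, and inclusive versus exclusive bounds are glossed over here.

```python
from bisect import bisect_right


def pieces_to_latch(crack_values, a, b):
    """Return the indices of the pieces that select [a, b] has to crack.

    crack_values is the sorted list of values at which the column is already
    cracked; piece i holds the values between crack_values[i-1] and
    crack_values[i]. A bound that coincides with an existing crack needs no
    further cracking, so it contributes no piece.
    """
    pieces = set()
    for bound in (a, b):
        if bound not in crack_values:
            pieces.add(bisect_right(crack_values, bound))
    return sorted(pieces)
```

For a column already cracked at 10 and 14, `pieces_to_latch([10, 14], 7, 16)` names the first and last piece, while a select whose bounds both fall in the middle piece latches only that one.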

piece locking

avl-tree, crack column

crack select: wlock

max: rlock

r/wlock

[diagram: crack select at 10 and 90; queries q1,q2,q3...qn]

[figure: throughput over 1024 queries vs. number of concurrent clients (1, 2, 4, 8, 16, 32); series: scan, sort; a second panel shows 1024 queries under sequential execution; legend: concurrency control enabled, disabled]

Rows: 100M; Query: select sum(A) from R where v1 < A1 < v2; Selectivity: 0.01%, random; # of queries: 1024; Clients: 1-32; Machine: 4 cores

adaptive indexing maintains its performance advantage

crack time and wait time

[figure: index refinement and wait time over the query sequence (1 to 1000)]

Query: select sum(A) from R where v1 < A1 < v2; Selectivity: 0.01%, random; # of queries: 1024; Clients: 8; Machine: 4 cores

adaptive indexing maintains the adaptive behavior

stochastic cracking: robustness

Stochastic Cracking, PVLDB 12

adaptive indexing

[diagram: cracks at v4 v5 v3 v2 v1 compared with cracks at v1 v2 v3 v4 v5]

Response time: X

Response time: 1000X

select [15,55]

10 20 30 40 50 60

the amount of work for each query depends on the index state

the state of the index depends on past query patterns

column with 100 unique integers [1,100]

good sequence

q1: <50, q2: <25, q3: <75

work: N, N/2, N/2

bad sequence

q1: <2, q2: <3, q3: <4

work: N, N-1, N-2

blindly adapting to queries is not always a good idea
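The good and bad sequences can be checked with a small cost model. This is a sketch under simple assumptions: the column holds the unique integers 1..n, every query has the form `< bound`, and the work of a query is taken to be the size of the one piece it has to crack. The function name `work_per_query` is made up for this example.

```python
from bisect import bisect_right, insort


def work_per_query(n, bounds):
    """Return the size of the piece each `< bound` query has to crack.

    The column holds the unique integers 1..n. Cracking at `bound` splits the
    piece containing it into values < bound and values >= bound.
    """
    cracks = []  # sorted crack values seen so far
    work = []
    for bound in bounds:
        if bound in cracks:
            work.append(0)  # already cracked here, nothing to reorganize
            continue
        i = bisect_right(cracks, bound)
        lo = cracks[i - 1] if i > 0 else 1            # smallest value in the piece
        hi = cracks[i] - 1 if i < len(cracks) else n  # largest value in the piece
        work.append(hi - lo + 1)
        insort(cracks, bound)
    return work


good = work_per_query(100, [50, 25, 75])  # roughly N, N/2, N/2
bad = work_per_query(100, [2, 3, 4])      # N, N-1, N-2
```

With the good sequence every crack halves the piece the next query touches; with the bad sequence each crack shaves off a single value, so every query still reorganizes almost the whole column.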

Stochastic Cracking, PVLDB 12

to be cracked

q random

q1: <v1

crack + filter <v1

scan + filter <v1

progressive cracking

query driven

to be cracked

q random

scan + filter <v2

crack + filter <v2

q2: <v2

query driven
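One way to read "crack + filter" is sketched below: the piece is cracked at a random pivot instead of at the query bound, and the query answer is collected in the same pass. This is a simplification for illustration only, not the algorithm from the PVLDB 12 paper; the function name, the out-of-place partitioning, and the return shape are assumptions.

```python
import random


def random_crack_and_filter(piece, bound, rng):
    """Crack `piece` at a random pivot and collect values < bound in one pass.

    Returns (reordered_piece, split, pivot, result): values before `split`
    are < pivot, the rest are >= pivot, and `result` holds the query answer.
    """
    pivot = rng.choice(piece)  # crack position is independent of the query
    smaller, larger, result = [], [], []
    for x in piece:
        (smaller if x < pivot else larger).append(x)
        if x < bound:
            result.append(x)  # filter for the query while cracking
    return smaller + larger, len(smaller), pivot, result
```

Because the pivot does not depend on the query, a sequence such as <2, <3, <4 no longer dictates where the column is split, which is the robustness argument made on these slides.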

cracking on Skyserver (4TB) (Sloan Digital Sky Survey, www.sdss.org)

cracking answers 160,000 queries while full indexing is still halfway through creating one index

multi-core utilization

Multi-cores, SIGMOD 15

select [a,b]

core 1, core 2

<a | >=a

core 1, core 2, core 3 … core N

problem: cores may be underutilized

goal: either fully utilize a core or shut it down

Holistic, SIGMOD 15

scheduler

monitoring CPU utilization

thread pool: worker 1, worker 2, worker 3, worker 4, worker 5, worker 6, worker 7, worker 8

when there is an underutilized CPU, pin a thread to it to do a cracking task

index space

partition size - access frequency - hit ratio

random works best

[Figure 6: Improving performance with holistic indexing. 10^8 tuples, 10 attributes, random queries. (a) Performance over the query sequence (1 to 1000) for no indexing, offline indexing, online indexing, adaptive indexing, holistic indexing; (b) Performance Breakdown for adaptive indexing and holistic indexing; (c) Index Partitions for adaptive indexing and holistic indexing; (d) Idle CPU Utilization: holistic worker activation, #workers]

center slice is contiguous, while the remaining n − 1 slices consist of two disjoint halves, each, that are arranged concentrically around the center slice (xi and yi indicate the first and the last element of piece i respectively). n threads crack the n slices independently applying a vectorized, out-of-place cracking algorithm (Figure 5), which was proven in [44] to be the most CPU efficient single-threaded cracking implementation reported so far. Finally, the local data are merged into a big cracked piece (Figure 4(b)). We found that devoting all resources to perform adaptive indexing for user queries in parallel does not lead to the absolute best performance. Specifically, we found that we can improve performance even more by assigning part of the resources to holistic indexing. In this way, some of the CPU resources are assigned to parallel cracking for user queries but the rest of the CPU resources are distributed across several holistic workers for additional index refinements. In the experimental section we show why this approach is better than assigning all available resources to user queries.

5. EXPERIMENTAL ANALYSIS

In this section, we demonstrate that holistic indexing leads to a self-organizing always-on DBMS with substantial benefits in terms of response time; with zero administration or set-up effort holistic indexing improves performance adaptively by exploiting all available CPU resources to the maximum. We present a detailed experimental analysis using both standard benchmarks such as TPC-H and real-life workloads such as SkyServer as well as synthetic microbenchmarks for a fine-grained analysis.

We use a dual-socket machine equipped with two 2.00 GHz Intel(R) Xeon(R) CPU E5-2650 processors and with 256 GB RAM. Each processor has 8 hyper-threading cores resulting in 32 hardware threads in total. The operating system is Fedora 20 (kernel version 3.12.10). All experiments we report are based on an implementation of holistic indexing in MonetDB and assume a main-memory environment.

5.1 Improving over State-of-the-Art Indexing

In our first experiment we demonstrate that holistic indexing has the potential to bring substantial performance improvements over existing state-of-the-art indexing approaches. We test holistic indexing against parallel versions of adaptive indexing (database cracking), offline indexing, online indexing and plain scans.

For plain scans (no indexing), we use a parallel select operator implemented in MonetDB. For offline and online indexing we sort the columns using a highly parallel NUMA-aware sorting algorithm that was introduced in [9] (m-way, 16-byte keys) and is publicly available in [1]. Specifically, in offline indexing we pre-sort all the columns before query processing, while in online indexing we assume that after processing a few queries we understand the workload patterns and then we sort the relevant columns. MonetDB automatically detects that a column is sorted and can use efficient binary search actions during select operations. For adaptive indexing we use the parallel vectorized database cracking algorithm that was introduced in [44] (see Section 4.2).

Here we use a synthetic benchmark. The query workload consists of 10^3 range select queries over a table of 10 attributes; each query touches a single attribute (we will see more complex queries later on). Each attribute consists of 2^30 uniformly distributed integers, while the value range requested by each query (and thus the selectivity) is random. All queries are of the following form.

select A from R where A < v

The tested scenario assumes a dynamic and ad-hoc environment with zero workload knowledge and zero idle time to pre-index the data. Figure 6(a) shows the results. On the x-axis queries are ranked in execution order. The y-axis represents the cumulative response time as the query sequence evolves, i.e., each point (x,y) represents the sum of the execution time y for the first x queries. In this way, the graph shows how the response times evolve as we process more and more queries.

If there is no indexing support (plain scans), the entire column is scanned in parallel by 32 threads for every query. Because of this stable access pattern, the cumulative response time of the query sequence grows linearly as every query has similar cost. With offline indexing, on the other hand, it takes 12 seconds to completely sort each column, assuming a priori workload knowledge. This leads to a 120-second initialization overhead to sort all 10 columns. Since there is no idle time before the first query, the sorting cost is added to the execution time of the first query in Figure 6(a). After the first query, all queries are answered with fast binary search actions which results in a rather flat cumulative curve. With online indexing, the first 100 queries are answered without any index support and thus the cost grows linearly. After the 100th query, assuming enough workload knowledge has been obtained via monitoring, we proceed to sort all 10 columns. The sorting cost is added to the execution cost of the 101st query, since there is no idle time between the 100th and the 101st query. As with offline indexing, all subsequent queries are answered very fast with binary search over the sorted column. Thus, both in offline and in online indexing, the whole query sequence is affected by the sorting costs. On the other hand, adaptive indexing continuously improves performance requiring no workload knowledge and without penalizing individual queries. This improvement is due to the fact that adaptive indexing builds only partial indices which it incrementally refines as queries arrive. However, there is still room for big improvements.
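The shape of the cumulative curves described above can be reproduced with a toy cost model. Only the 12 seconds per column, the 10 columns, and the 100-query warm-up come from the text; the per-query `scan_cost` and `lookup_cost` parameters and the function names are placeholders chosen for illustration.

```python
from itertools import accumulate


def cumulative(times):
    """Point x of the curve is the total time of the first x queries."""
    return list(accumulate(times))


def offline_indexing(n_queries, n_columns, sort_cost, lookup_cost):
    """All columns are sorted up front; the cost lands on the first query."""
    times = [lookup_cost] * n_queries
    times[0] += n_columns * sort_cost
    return cumulative(times)


def online_indexing(n_queries, n_columns, sort_cost, scan_cost, lookup_cost, warmup):
    """Scan for `warmup` queries, then sort; the cost lands on query warmup + 1."""
    times = [scan_cost] * warmup + [lookup_cost] * (n_queries - warmup)
    times[warmup] += n_columns * sort_cost
    return cumulative(times)


# 10 columns at 12 seconds each: the first offline query carries 120 seconds.
offline = offline_indexing(1000, 10, 12, 0)
# Hypothetical 1 second per scan: the jump appears at the 101st query.
online = online_indexing(1000, 10, 12, 1, 0, 100)
```

In both curves the sorting cost appears as a single jump, which is the penalty on an individual query that the text contrasts with adaptive indexing.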

NORMALIZED DATA | DENORMALIZED DATA

normalized: good for updates, storage but we need joins

denormalized: only fast scans but expensive to create, storage & updates

adaptive denormalization

first award in ACM SIGMOD undergrad research competition

normalized data | possible denormalized space

continuously physically reorganize data based on incoming query patterns (joins)

denormalized fragments: queries only need to fast scan

[IBMbigdata]years

[StratosGuess]

llsdata system design,

set-up, tune, use

data systems that are easy to design (storage, data flow, algorithms, tuning, etc)

System Design

Full Implementation

Prototype Implementation

workloads, h/w, expected properties (performance, throughput, energy, budget, …)

application specific workload, h/w & expected properties

Set-up &

4 Auto-tuning during query processing

Design & Development. Roles: Architects & Developers

Set-up and Tuning. Roles: Database Administrators

say we need a system for workload X (data/access patterns): should we strip down a relational system?

should we build up a key-value store or main-memory system? should we build something from scratch?

say the workload (read/write ratio) shifts (e.g., due to app features): should we use a different data layout for base data - diff updates?

should we use different indexing or no indexing?

say we buy new hardware X (flash/memory): should we change the size of b-tree nodes?

should we change the merging strategy in our LSM-tree?

say we want to improve response time: would it be beneficial to buy faster flash disks?

would it be beneficial to buy more memory? or just spend the same budget on improving software?

# of design options?

> 1 QUINTILLION (and counting…)

data systems design (and research) is kind of an art (both in the good & in the bad sense)

but can we keep up?

disk memory flash …

1 easily utilize past concepts/designs

do not miss out on cool ideas and concepts

1996 1999 2002 2005 2008 2011 2014

P. O’Neil, E. Cheng, D. Gawlick, E. O’Neil. The log-structured merge-tree (LSM-tree). Acta Informatica 33 (4): 351–385, 1996

Google publishes BigTable

data+queries+hardware

data system

self-designing data systems

easy to design

adapt to environment

can we design new systems in weeks instead of years?

INTERACTIVE DATA SYSTEM DESIGN/TUNING/TESTING

data system designer

How to model/abstract design decisions?

How much can happen automatically (auto-design)?

move from design based on intuition & experience only to a more formal and systematic way to design systems

row-store

column-store

key-value store

hybrid store

ACM SIGMOD Blog, June’15

1. write/extend modules in a high level language (optimizations)

2. modules = storage/execution/data flow

3. try out >>1 designs (sets of modules)

data systems that are easy to use

dbTouch

show me something interesting

Queriosity

cidr2013/icde2014

bigdata2015

can we design systems where no SQL is needed?

cracking example

dbTouch CIDR 2013

data systems today allow us to answer queries fast

data systems tomorrow should allow us to find fast which queries to ask

db explore

a semester of quizzes and brainstorming

option between systems project (key-value store) & research with DASlab

(only for CS165 students or otherwise advanced students)

the end

/* project deadline Dec 21; OH and labs continue until then */

DASlab productions 2016

learn the concepts, not just the techniques

learn to adapt

it all starts with how we store the data

yes you can do research
