Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Query processing for parallel languages
Brandon Myers, Mark Oskin, Bill Howe [email protected]
DB Day 2015
1
2slide src: Jeff Gardner
How to turn astrophysics simulation output into scientific knowledge
3slide src: Jeff Gardner
How to turn astrophysics simulation output into scientific knowledge
4slide src: Jeff Gardner
5
GASOLINE (a cosmology N-body code)required 10 person-years of development
but code is reused for many years by many people
Data analysis codeless reuse, ad hoc queries
slide src: Jeff Gardner
6
globalList[int]histogram[N]block;forallpinParticles{atomic{ifnothistogram[h(p.z)]{histogram[h(p.z)]=newList[int];
}histogram[h(p.z)].find(p.z)+=1;
}}
valhistogram=selectcount(*),zfromParticlesgroupbyz
Our solution
7
memory
queries
HPC stack
8
memory
queries
PGAS
Our solution
9
Integrate queries into PGAS languages
10
Integrate queries into PGAS languages
Partitioned global address space (PGAS) languages
Global data
DRAM DRAM DRAM DRAM
Core Core Core Core
Global Tasks
Example PGAS program
globalParticleP[N]block;globalinthistogram[N]block;forallpjinP{
atomichistogram[pj.z]+=1;
}
12
[pj.z]
histogram
pjParticle
P0 P1 P2
13
Integrate queries into PGAS languages
14
Integrate queries into PGAS languages
Radish: queries in PGAS
Relational Algebra++
SQL
Grappa Chapel X10
PGAS physical algebra
mapping?
15
PGAS languages
Approaches for query processing in PGAS
demand-driven data flow (iterators)
next!next!next!
16
HashTableJoinProbe[o_custkey=c_custkey]
HashTableAggregateScan
Project[cnt, c_name]
Scan(Order)Scan(Customer)
Select[o_totalprice > 20]
HashTableJoinBuild[c_custkey]
HashTableAggregateUpdate
[o_custkey]
Shuffle[o_custkey]Shuffle[c_custkey]
Project[c_custkey, c_name]
Scan(Customer)
HashTableJoinBuild[c_custkey]
Shuffle[c_custkey]
Project[c_custkey, c_name]
query compilation
codegen
optimize/ compile
binary
forallcinCustomerT1t1t1.id=t0.custkeyt1.c_name=t0.c_nameonpartition(join_table[h(t1.c_custkey)])atomic{join_table[h(t1.c_custkey)].append(t1)}
17
forallcinCustomerT1t1t1.id=t0.custkeyt1.c_name=t0.c_name
Generation of tuple-centric parallel code
Customer
globalTupleCustomerblock;
P0 P1 P2
Scan(Customer)
HashTableJoinBuild[c_custkey]
Shuffle[c_custkey]
Project[c_custkey, c_name]
18
Generation of tuple-centric parallel code
Customer
P0 P1 P2
Scan(Customer)
HashTableJoinBuild[c_custkey]
Shuffle[c_custkey]
Project[c_custkey, c_name]
hashpartitions
…localjoin
…localjoin
…localjoin
19
onpartition(jointable[h(t1.c_custkey)])atomic{jointable[h(t1.c_custkey)].append(t1)}
Generation of tuple-centric parallel code
jointable
Customer
globalTupleCustomerblock;globalCelljointableblock;
P0 P1 P2
Scan(Customer)
HashTableJoinBuild[c_custkey]
Shuffle[c_custkey]
Project[c_custkey, c_name]
forallcinCustomerT1t1t1.id=t0.custkeyt1.c_name=t0.c_name
Generation algorithm
20
classOperator{//getthenexttuplenext(T&):bool}
Runtime interpretation Code generation
classOperator{//emitcodeforproducing//atupleproduce(state&):unit
//emitcodeforconsuming//atupleconsume(state&,inputTuple):string
}
Generation algorithm
21
classOperator{//emitcodeforproducing//atupleproduce(state&):unit
//emitcodeforconsuming//atupleconsume(state&,inputTuple):string
}
Code generation
HashTableJoinProbe[o_custkey=c_custkey]
HashTableAggregateScan
Project[cnt, c_name]
Scan(Order)Scan(Customer)
Select[o_totalprice > 20]
HashTableJoinBuild[c_custkey]
HashTableAggregateUpdate
[o_custkey]
Shuffle[o_custkey]Shuffle[c_custkey]
Project[c_custkey, c_name]
full dependence
pipeline
tuple dependence
P
C
22
Integrate queries into PGAS languages
23
Integrate queries into PGAS languages
24
Experiment #1: conventional approach vs. Radish
0
5
10
15
20
25
30
1 2 3 4 6 8 9 10 11 14 15 16 17 19 20
Speedupoveriterators
Query
Radish Radish-sym
(TPC-H)
0
5
10
15
20
25
30
1 2 3 4 6 8 9 10 11 14 15 16 17 19 20
Speedupoveriterators
Query
Radish Radish-sym25
Experiment #1: conventional approach vs. Radish
(TPC-H)
26
Experiment #2: vs ImpalaExperiment #2: vs Impala
0
20
40
60
80
100
120
140
160
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 20
Runtime(s)
Query
Radish Impala (codegen) Impala (nocodegen)
(TPC-H)
27
Experiment #2: vs Impala
0
20
40
60
80
100
120
140
160
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 20
Runtime(s)
Query
Radish Impala (codegen) Impala (nocodegen)
TPC-H100GB database
(TPC-H)
Compilation time is high
28
0
5
10
15
20
25
30
35
6 1 19 12 17 14 15 4 3 11 16 10 18 20 5 7 9 2 8
Time
Query
compile queryplan
Conclusion
get the code!
RACO+RADISH: github.com/uwescience/raco GRAPPA: grappa.io
29
30
Attribution