April 19, 2023 IDEAS 2011
Extend Core UDF Framework for GPU-Enabled Analytical Query Evaluation
Qiming Chen, Ren Wu, Meichun Hsu, Bin Zhang
HP Labs
Palo Alto, CA, USA
Problems
• Motivated by pushing analytics down to the DB layer for fast data access and reduced data movement
  − which requires integrating analytic computation into the query pipeline using UDFs
• Existing UDFs cannot act as block operators with chunk-wise input; therefore they are
  − unable to deal with application semantics definable on a set of incoming tuples (e.g., a set of tuples representing one object)
  − unable to leverage external computation engines (e.g., GPUs) for efficient batch processing
Why Block UDFs Are Needed
• From a semantic point of view, many applications are definable on a set of tuples
  − e.g., Minimum Spanning Tree (MST) computation is defined on a tuple-set representing a graph and returns a tuple-set representing the MST
• From a performance point of view, data processed by an external engine should be shipped in batches rather than copied back and forth on a per-tuple basis
[Figure: a UDF ships a graph tuple-set from the Graph relation to an external computation node (GPU or SAS server) and writes the resulting MST tuple-set to the MST relation]
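To make the set-in/set-out semantics concrete, here is a minimal Python sketch (not the authors' implementation) of an MST function that consumes a whole edge tuple-set and returns an edge tuple-set. It uses Kruskal's algorithm with union-find; the (u, v, weight) tuple layout and the name mst_siso are assumptions for illustration. The point is that no output can be produced from a single input tuple: the whole set must be available first.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # hypothetical tuple layout: (u, v, weight)


def mst_siso(edges: List[Edge]) -> List[Edge]:
    """Set in, set out: take a graph's edge tuple-set and return the edge
    tuple-set of its minimum spanning tree (a forest if disconnected)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    result: List[Edge] = []
    # Kruskal: scan edges by increasing weight, keep those joining two components.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            result.append((u, v, w))
    return result


graph = [("a", "b", 4.0), ("b", "c", 1.0), ("a", "c", 2.0), ("c", "d", 5.0)]
print(mst_siso(graph))  # [('b', 'c', 1.0), ('a', 'c', 2.0), ('c', 'd', 5.0)]
```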
Solution: Set-In Set-Out (SISO) UDF
• Introduce a new kind of UDF, called Set-In Set-Out (SISO), that acts as a block operator processing input tuples from the query processing pipeline chunk by chunk:
  − pool a chunk of input tuples,
  − dispatch them to GPUs or an analytic engine for batch computation,
  − materialize the computation results, then stream them out tuple by tuple to the query processing pipeline
[Figure: pipelined input → SISO UDF (pooling → GPU → materialized result) → pipelined output]
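The pool / batch-compute / materialize / stream-out cycle described above can be sketched in Python as a generator over an upstream iterator. This is a minimal illustration, not the HP Labs engine code: siso and batch_square are hypothetical names, and batch_compute stands in for the GPU or external-engine dispatch.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def siso(rows: Iterable[T], chunk_size: int,
         batch_compute: Callable[[List[T]], List[R]]) -> Iterator[R]:
    """Block-operator sketch: pool a chunk, compute it in one batch, stream results."""
    it = iter(rows)
    while True:
        # Pool: pull up to chunk_size tuples from the upstream pipeline.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        # Compute: one batch call per chunk (stand-in for a GPU dispatch).
        results = batch_compute(chunk)
        # Stream out: hand the materialized results downstream one at a time.
        yield from results


def batch_square(chunk: List[int]) -> List[int]:
    # Hypothetical batch kernel: square every value of the chunk at once.
    return [x * x for x in chunk]


print(list(siso(range(7), 3, batch_square)))  # [0, 1, 4, 9, 16, 25, 36]
```

With 7 input tuples and a chunk size of 3, batch_compute is invoked three times (on chunks of 3, 3 and 1 tuples) instead of seven times, while the consumer still sees an ordinary tuple-by-tuple stream.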
SISO Example: select vectorize(x,y,10) from point_table
• Build phase: on t1, …, t9 the UDF acts like a scalar function: it does “ETL” (pools the tuple) but returns NULL
• On t10 it acts like a table function:
  − Compute phase (FIRST CALL): run the computation on the 10 pooled tuples
  − Stream-out phase (NORMAL CALLs): return 1 result tuple per call, so the output is pipelined tuple by tuple again
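The hybrid per-call behavior can be traced with a small Python class. The slide does not say what vectorize actually computes, so this sketch assumes a stand-in batch step (centering each chunk on its mean) that needs all n pooled tuples before it can emit anything. VectorizeSISO and on_tuple are hypothetical names, not an engine API.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class VectorizeSISO:
    """Hypothetical per-call model of vectorize(x, y, n): the engine calls
    on_tuple once per input tuple; only every n-th call yields a result set."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.buffer: List[Point] = []  # per-chunk pooling buffer

    def on_tuple(self, x: float, y: float) -> Optional[List[Point]]:
        # Build phase: pool ("ETL") the tuple and, like a scalar UDF, return NULL.
        self.buffer.append((x, y))
        if len(self.buffer) < self.n:
            return None
        # Compute phase on the n-th tuple. The stand-in batch step centers the
        # chunk on its mean, which needs every pooled tuple (an assumption).
        mx = sum(px for px, _ in self.buffer) / self.n
        my = sum(py for _, py in self.buffer) / self.n
        results = [(px - mx, py - my) for px, py in self.buffer]
        self.buffer = []  # reset the buffer for the next chunk
        # Stream-out phase: the engine would return these one row per NORMAL call.
        return results


udf = VectorizeSISO(10)
calls = [udf.on_tuple(float(i), 0.0) for i in range(10)]
print(calls[:9].count(None), len(calls[9]))  # 9 10
print(calls[9][0])  # (-4.5, 0.0)
```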
Comparison with Scalar and Table UDFs
• Scalar UDF
  − 1 tuple in, 1 value/tuple out (a tuple as a composite value)
  − Access to per-function state and per-tuple state
• Table UDF
  − 1 tuple in, N tuples out
  − Access to per-tuple (input) state and per-return state
• SISO UDF
  − N tuples in, M values/tuples out
  − Access to 4 levels of state: per-function, per-chunk, per-tuple (input), per-return
  − Runs chunk by chunk; each chunk contains N tuples; returns nothing for the 1st through (N-1)th tuples and a result set for the Nth tuple
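The input/output cardinalities of the three UDF kinds can be contrasted with three tiny Python drivers. These are illustrative sketches, not engine code: run_scalar, run_table and run_siso are hypothetical names, and flushing a trailing partial chunk at end of input is an assumption the slide does not address.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, ...]


def run_scalar(f: Callable[[Row], int], rows: Iterable[Row]) -> List[int]:
    # Scalar UDF: exactly one output value per input tuple.
    return [f(r) for r in rows]


def run_table(f: Callable[[Row], Iterator[Row]], rows: Iterable[Row]) -> List[Row]:
    # Table UDF: each input tuple may expand into several output tuples.
    return [out for r in rows for out in f(r)]


def run_siso(f: Callable[[List[Row]], List[Row]], rows: Iterable[Row], n: int) -> List[Row]:
    # SISO UDF: each chunk of n input tuples yields one result set of M tuples
    # (M need not equal n). Flushing a trailing partial chunk is an assumption.
    it = iter(rows)
    out: List[Row] = []
    while chunk := list(islice(it, n)):
        out.extend(f(chunk))
    return out


rows = [(1,), (2,), (3,), (4,), (5,)]
print(run_scalar(lambda r: r[0] * 10, rows))         # [10, 20, 30, 40, 50]
print(len(run_table(lambda r: iter([r, r]), rows)))  # 10
print(run_siso(lambda c: [(sum(r[0] for r in c),)], rows, 2))  # [(3,), (7,), (5,)]
```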
Comparison with UDA
• Aggregate operator or UDA
  − No general form of set output (except via GROUP BY)
  − No chunk-wise semantics
• SISO
  − Flexible forms of set output
  − Chunk-wise semantics
Comparison with RVF
• RVF (relation-valued function)
  − Input relation present initially, as static data
  − Input relation loaded entirely rather than chunk by chunk
• SISO
  − Input tuple-set consumed chunk by chunk along query processing
  − Input tuple-set treated as dynamic data
Extending the Query Engine to Support SISO UDFs
• Support SISO as a block operator within the tuple-by-tuple query processing pipeline
  − with hybrid behavior in processing a chunk of N tuples:
    • for input tuples 1, …, N-1, like a scalar function: 1 call per input tuple, returning nothing
    • for tuple N, like a table function: multiple calls for that input tuple, returning a set
• Need to extend the UDF accessible states
• Need to extend the invocation pattern
UDF Memory Context
• A UDF is called multiple times during query processing
  − In the FIRST_CALL, a buffer can be initialized
  − Each subsequent NORMAL_CALL references and updates the buffer, so buffer state persists across calls
  − After the FINAL_CALL, the buffer is discarded
• The multi-call context differs between scalar and table UDFs
  − For a scalar UDF, 1 call per input tuple
  − For a table UDF, N calls per input tuple
• Therefore their memory contexts are different
Extend UDF Accessible States
• SISO UDF: per-function, per-chunk, per-tuple, and per-return state
• Scalar UDF: per-function and per-tuple state
• Table UDF: per-tuple and per-return state
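One way to picture the four SISO state levels is as fields with different reset points. The Python dataclass below is a hypothetical layout (SisoStates, start_chunk and start_tuple are illustrative names, not an engine API); a scalar UDF would keep only the per-function and per-tuple fields, and a table UDF only the per-tuple and per-return fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SisoStates:
    """Hypothetical SISO state layout; comments note when each level resets."""
    per_function: Dict[str, Any] = field(default_factory=dict)  # lives for the whole query
    per_chunk: List[Any] = field(default_factory=list)          # reset at each chunk boundary
    per_tuple: Dict[str, Any] = field(default_factory=dict)     # reset for each input tuple
    per_return: int = 0                                         # cursor into the result set

    def start_chunk(self) -> None:
        self.per_chunk.clear()

    def start_tuple(self, index_in_chunk: int) -> None:
        self.per_tuple = {"index": index_in_chunk}
        self.per_return = 0


states = SisoStates(per_function={"chunks_seen": 0})
for chunk in ([1, 2, 3], [4, 5, 6]):
    states.start_chunk()
    states.per_function["chunks_seen"] += 1
    for i, value in enumerate(chunk):
        states.start_tuple(i)
        states.per_chunk.append(value)  # pool the tuple into the chunk buffer
    # Stream out: the per-return cursor advances once per returned result.
    for _ in states.per_chunk:
        states.per_return += 1
print(states.per_function, states.per_chunk, states.per_tuple, states.per_return)
# {'chunks_seen': 2} [4, 5, 6] {'index': 2} 3
```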
Extend Call Skeleton
• SISO UDF
  − Global First Call
  − For each chunk: Per-chunk First Call; Per-tuple Single Call (no return) for each of tuples 1, …, N-1; Last-tuple First Call; Normal Call (1 return) per result tuple; Last-tuple Final Call; Per-chunk Final Call
• Scalar UDF
  − Global First Call
  − Per-tuple Normal Call (1 return) for each input tuple
  − Final call optional (system specific)
• Table UDF
  − For each input tuple: Per-tuple First Call; Normal Call (1 return) per result tuple; Per-tuple Final Call
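The three call skeletons can be written out as call-sequence generators. This is a sketch under stated assumptions: the call-type labels are illustrative strings, not constants from a real engine, the number of results per chunk or per tuple is passed in as a parameter, and a trailing partial chunk is assumed to be flushed like a full one.

```python
from typing import List


def siso_calls(num_tuples: int, chunk_size: int, results_per_chunk: int) -> List[str]:
    """Call sequence a SISO UDF would see (sketch)."""
    calls = ["GLOBAL_FIRST"]
    for start in range(0, num_tuples, chunk_size):
        k = min(chunk_size, num_tuples - start)  # tuples in this chunk
        calls.append("CHUNK_FIRST")
        calls += ["TUPLE_SINGLE"] * (k - 1)      # tuples 1..k-1: pool, no return
        calls.append("LAST_TUPLE_FIRST")         # tuple k: run batch computation
        calls += ["NORMAL"] * results_per_chunk  # one result tuple per call
        calls.append("LAST_TUPLE_FINAL")
        calls.append("CHUNK_FINAL")
    return calls


def scalar_calls(num_tuples: int) -> List[str]:
    # One normal call per input tuple; the final call is system specific, so omitted.
    return ["GLOBAL_FIRST"] + ["NORMAL"] * num_tuples


def table_calls(num_tuples: int, results_per_tuple: int) -> List[str]:
    # Each input tuple opens and closes its own multi-call context.
    calls: List[str] = []
    for _ in range(num_tuples):
        calls += ["TUPLE_FIRST"] + ["NORMAL"] * results_per_tuple + ["TUPLE_FINAL"]
    return calls


print(siso_calls(5, 3, 2))
```

For 5 tuples in chunks of 3 with 2 results per chunk, the SISO sequence has 16 calls: one global first call, then 8 calls for the full chunk and 7 for the partial one.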
SISO Call Skeleton Explained
• Global First Call: set up the function-call global context for chunk-wise invocation (extended from the function-call node)
• Per-chunk First Call: set up a chunk-based buffer for pooling data
• Per-tuple Single Call (no return), tuples 1, …, N-1: pool the tuple (vectorizing), return null
• Per-tuple First Call on the last tuple: pool the last tuple in the chunk, run the batch analytic computation
• Normal Call (1 return): return the materialized results one tuple at a time
• Per-tuple Final Call: advance the chunk-oriented tuple index, return null
• Per-chunk Final Call: rewind the chunk-oriented tuple index; clean up the buffer
The Per-chunk First/Final Call pair repeats for every chunk.
Integrate Query Processing with GPU Computation using SISO UDF
• Using general-purpose GPUs (GPGPU) to accelerate analytic query processing lets us combine SQL’s analysis power with the GPU’s computation power
• However, their operational patterns differ
  − GPU computation is batch processing with data parallelism
  − Query processing is pipelined tuple by tuple
• We solve this problem by using SISO UDFs in queries
  − to handle batch GPU computation within the query dataflow pipeline
Experiment on Accelerating K-Means Clustering of Very Large Data Sets
• K-Means clustering is an iterative process; in each iteration
  − each point is assigned to the nearest cluster center, becoming a member of that cluster
  − then each center’s coordinates are recalculated as the mean of its member points’ coordinates
• The process is repeated until convergence is achieved.
[Figure: Init. Centers → Assign Center → Calc Centers → Convergence Check → Done (or back to Assign Center if not converged)]
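For reference, the loop in the figure (Lloyd's algorithm) can be written as a plain in-memory Python sketch for 2-D points. It only pins down the semantics; it is not the SQL/SISO or GPU implementation measured in the experiments, and the random-sample initialization, the tolerance tol, and keeping an empty cluster's old center are assumptions.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           tol: float = 1e-9, seed: int = 0) -> List[Point]:
    """Plain Lloyd's K-Means: assign, recompute means, check convergence."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Init. Centers: k input points chosen at random
    for _ in range(max_iter):
        # Assign Center: each point joins the cluster of its nearest center.
        members: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            cid = min(range(k), key=lambda c: math.dist(p, centers[c]))
            members[cid].append(p)
        # Calc Centers: new center = mean of member coordinates (keep old if empty).
        new_centers = [
            (sum(x for x, _ in m) / len(m), sum(y for _, y in m) / len(m)) if m else centers[c]
            for c, m in enumerate(members)
        ]
        # Convergence Check: stop once no center moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans(pts, 2)))  # about [(0.33, 0.33), (10.33, 10.33)]
```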
Single Iteration of K-Means by SQL and SISO UDF
SELECT (p).cid, AVG((p).x) AS cx, AVG((p).y) AS cy FROM (
  SELECT assign_center_siso(x, y, 'SELECT * FROM Centers', N)
  AS p FROM Points ) r
GROUP BY (p).cid;
[Figure: Points (xp, yp) flow chunk-wise into the SISO UDF assign_center(), which loads Centers (cid, xc, yc) initially and emits (cid, xp, yp); AVG with GROUP BY cid then produces the new centers (cid, xc, yc)]
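The single-iteration query can be emulated in plain Python to show which part is chunk-wise and which part is ordinary SQL. assign_center_siso here is a hypothetical stand-in for the UDF (its real signature and GPU kernel are not shown on the slide); the centers are passed in as a list rather than fetched through the query string, and n plays the role of the chunk size N.

```python
import math
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
Center = Tuple[int, float, float]  # (cid, xc, yc)


def assign_center_siso(points: Iterable[Point], centers: List[Center],
                       n: int) -> Iterator[Tuple[int, float, float]]:
    """Hypothetical stand-in for assign_center_siso(x, y, <centers query>, N)."""
    # Centers are loaded once up front; points arrive from the pipeline.
    it = iter(points)
    while chunk := list(islice(it, n)):
        # One batch per chunk of n points: the step a GPU would run data-parallel.
        for x, y in chunk:
            cid = min(centers, key=lambda c: math.dist((x, y), (c[1], c[2])))[0]
            yield (cid, x, y)


def single_iteration(points: Iterable[Point], centers: List[Center],
                     n: int) -> Dict[int, Point]:
    """Emulates the outer SELECT ... GROUP BY (p).cid with AVG((p).x), AVG((p).y)."""
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for cid, x, y in assign_center_siso(points, centers, n):
        acc = sums[cid]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {cid: (sx / cnt, sy / cnt) for cid, (sx, sy, cnt) in sums.items()}


centers = [(1, 0.0, 0.0), (2, 10.0, 10.0)]
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(single_iteration(points, centers, n=4))  # about {1: (0.33, 0.33), 2: (10.33, 10.33)}
```

Because the chunking only changes how points are batched, the result is the same for any chunk size; in the real system the chunk size affects performance, not the answer.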
Experiment Results Comparison
• We compare the performance of
  − a scalar-UDF-wrapped, CPU-based implementation
  − a SISO-UDF-wrapped, CPU-based implementation
  − a SISO-UDF-wrapped, GPU-accelerated implementation
Overall end-to-end query performance (seconds): Scalar UDF/CPU vs. SISO/CPU vs. SISO/GPUs
Query                                        | 10M points | 100M points
Q1: generalized scalar UDF (tuple by tuple)  |     155.45 |     1845.66
Q2: SISO UDF in 1M chunks, computed by CPU    |     145.02 |     1541.41
Q3: SISO UDF in 1M chunks, computed by GPGPU |      27.41 |      345.01
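The speedups implied by the table can be checked directly; this is plain arithmetic on the reported times, not additional measurements. SISO chunking alone on the CPU gives roughly 1.07x (10M points) and 1.20x (100M points) over the scalar UDF, and the GPU-backed SISO gives roughly 5.7x and 5.3x.

```python
from typing import Tuple

# End-to-end times in seconds for (10M points, 100M points), from the table.
times = {
    "Q1 scalar/CPU": (155.45, 1845.66),
    "Q2 SISO/CPU": (145.02, 1541.41),
    "Q3 SISO/GPGPU": (27.41, 345.01),
}


def speedup(baseline: str, other: str) -> Tuple[float, ...]:
    """Ratio baseline_time / other_time for each data-set size, rounded to 2 places."""
    return tuple(round(b / o, 2) for b, o in zip(times[baseline], times[other]))


print(speedup("Q1 scalar/CPU", "Q2 SISO/CPU"))    # (1.07, 1.2)
print(speedup("Q1 scalar/CPU", "Q3 SISO/GPGPU"))  # (5.67, 5.35)
print(speedup("Q2 SISO/CPU", "Q3 SISO/GPGPU"))    # (5.29, 4.47)
```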
Scalar UDF vs. SISO UDF
• Number of clusters: 1000
• Number of data points: 1M to 100M
• Chunk size: fixed at 1M
  − Beyond 1M (1000K), the performance gain diminishes gradually as the chunk size increases further
Conclusions
• In-DB analytics has been extensively investigated but has not yet become a scalable approach
• An important reason is the lack of block UDFs that can deal with application semantics definable on a set of tuples and leverage external computation units such as GPUs for efficient batch processing
• To solve this problem, we developed SISO as a new kind of UDF
• Integrating SISO with parallel DBs is under further investigation