Fast Computation of Database Operations using Graphics Processors Naga K. Govindaraju Univ. of North Carolina Modified By, Mahendra Chavan for CS632

Fast Computation of Database Operations using Graphics Processors

Naga K. Govindaraju

Univ. of North Carolina

Modified By,

Mahendra Chavan for CS632

Goal

• Utilize graphics processors for fast computation of common database operations

Motivation: Fast operations

• Increasing database sizes

• Faster processor speeds but little improvement in query execution time
– Memory stalls
– Branch mispredictions
– Resource stalls, e.g. instruction dependencies

• Utilize the available architectural features and exploit parallel execution possibilities

Graphics Processors

• Present in almost every PC

• Have multiple vertex and pixel processing engines running in parallel

• Can process tens of millions of geometric primitives per second

• Peak GPU performance is increasing at a rate of 2.5-3 times per year

• Programmable: fragment programs are executed on the pixel processing engines

Main Contributions

• Algorithms for predicates, boolean combinations and aggregations

• Utilize SIMD capabilities of pixel processing engines

• The algorithms are applied to selection queries on one or more attributes and to aggregate queries

Related Work

• Hardware acceleration for DB operations
– Vector processors for relational DB operations [Meki and Kambayashi 2000]
– SIMD instructions for relational DB operations [Zhou and Ross 2002]
– GPUs for spatial selections and joins [Sun et al. 2003]

Graphics Processors: Design Issues

• Programming model is limited due to lack of random-access writes
– Design algorithms that avoid data rearrangements

• Programmable pipeline has poor branching
– Design algorithms without branching in the programmable pipeline; evaluate branches using fixed-function tests

Frame Buffer

• Pixels are stored on the graphics card in a frame buffer.

• The frame buffer is conceptually divided into:

• Color Buffer
– Stores the color components of each pixel in the frame buffer

• Depth Buffer
– Stores the depth value associated with each pixel. The depth is used to determine surface visibility

• Stencil Buffer
– Stores a stencil value for each pixel. Called a stencil because it is typically used for enabling/disabling writes to the frame buffer

Graphics Pipeline

[Figure: pipeline diagram. Vertices → Vertex Processing Engine → Setup Engine → Pixel Processing Engine → Alpha Test → Stencil Test → Depth Test]

Graphics Pipeline

• Vertex Processing Engine
– Transforms vertices to points on the screen

• Setup Engine
– Generates information for color, depth, etc. associated with primitive vertices

• Pixel Processing Engines
– Fragment processors; perform a series of tests before writing the fragments to the frame buffer

Pixel processing Engines

• Alpha Test
– Compares a fragment's alpha value to a user-specified reference value

• Stencil Test
– Compares the stencil value of a fragment's pixel to a user-specified reference value

• Depth Test
– Compares the depth value of the fragment to the reference depth value

Operators

• =

• <

• >

• <=

• >=

• Never

• Always

Occlusion Query

• Gives the number of fragments that pass the different tests

Fragment Programs

• Users can supply custom fragment programs that are run on each fragment


Radeon R770 GPU by AMD Graphics Product Group

Data Representation on GPUs

• Textures: 2D arrays that may have multiple channels

• We store data in textures in floating point formats

• To perform computations on the values, render the quadrilateral, generate fragments, run fragment programs and perform tests!

Stencil Tests

• Fragments failing Stencil test are rejected from the rasterization pipeline

• Stencil operations
– KEEP: keep the stencil value in the stencil buffer
– INCR: stencil value++
– DECR: stencil value--
– ZERO: stencil value = 0
– REPLACE: stencil value = reference value
– INVERT: bitwise invert (stencil value)

Stencil and Depth Tests

• The stencil operation routine StencilOp(Op1, Op2, Op3) can be set up as follows

• For each fragment there are three possible outcomes; the stencil operation corresponding to the outcome is executed

• Op1: when a fragment fails stencil test

• Op2: when a fragment passes stencil test but fails depth test

• Op3: when a fragment passes stencil and depth test
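
The three-outcome rule above can be sketched as a small CPU-side simulation. This is only an illustration, not GPU code: the function name apply_stencil_op, the 8-bit stencil width and the clamping behaviour of INCR/DECR are assumptions made for the example.

```python
# Hypothetical CPU-side model of StencilOp(Op1, Op2, Op3).
# Assumption: 8-bit stencil values, INCR/DECR clamp at 255 and 0.
STENCIL_OPS = {
    "KEEP": lambda s, ref: s,
    "INCR": lambda s, ref: min(s + 1, 255),
    "DECR": lambda s, ref: max(s - 1, 0),
    "ZERO": lambda s, ref: 0,
    "REPLACE": lambda s, ref: ref,
    "INVERT": lambda s, ref: ~s & 0xFF,
}


def apply_stencil_op(stencil, ref, stencil_passes, depth_passes, op1, op2, op3):
    """Return the new stencil value for one fragment.

    op1 is used when the stencil test fails,
    op2 when the stencil test passes but the depth test fails,
    op3 when both tests pass.
    """
    if not stencil_passes:
        op = op1
    elif not depth_passes:
        op = op2
    else:
        op = op3
    return STENCIL_OPS[op](stencil, ref)


# Fragment passes both tests, so Op3 (INCR) is applied: 1 becomes 2.
new_value = apply_stencil_op(1, 1, True, True, "KEEP", "KEEP", "INCR")
```

With StencilOp(KEEP, KEEP, INCR), only fragments that pass both tests change the stencil buffer, which is the configuration used later for Boolean combinations.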

Outline

• Database Operations on GPUs

• Implementation & Results

• Analysis

• Conclusions

Overview

• Database operations require comparisons

• Utilize the depth test functionality of GPUs for performing comparisons
– Implements all possible comparisons: <, <=, >=, >, ==, !=, ALWAYS, NEVER

• Utilize stencil test for data validation and storing results of comparison operations

Basic Operations

Basic SQL query

SELECT A
FROM T
WHERE C

A = attributes or aggregations (SUM, COUNT, MAX, etc.)

T = relational table

C = Boolean combination of predicates (using operators AND, OR, NOT)

Outline: Database Operations

• Predicate Evaluation
– (a op constant): depth test and stencil test
– (a op b) = (a - b op 0): can be executed on GPUs

• Boolean Combinations of Predicates
– Express as CNF and repeatedly use stencil tests

• Aggregations
– Occlusion queries

Outline: Database Operations

• Predicate Evaluation

• Boolean Combinations of Predicates

• Aggregations

Basic Operations

• Predicates: a_i op constant or a_i op a_j
– op is one of <, >, <=, >=, !=, =, TRUE, FALSE

• Boolean combinations – Conjunctive Normal Form (CNF) expression evaluation

• Aggregations – COUNT, SUM, MAX, MEDIAN, AVG

Predicate Evaluation

• a_i op constant (d)
– Copy the attribute values a_i into the depth buffer
– Define the comparison operation using the depth test
– Draw a screen-filling quad at depth d
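
A minimal CPU-side sketch of this procedure, assuming the depth buffer is modelled as a plain Python list and the quad's depth d is compared against every stored value. The names DEPTH_FUNCS and evaluate_predicate are hypothetical. In a real graphics API the depth function typically compares the incoming fragment depth against the stored value, so the operator may need to be mirrored; the sketch ignores that detail.

```python
import operator

# Comparison operators available to the (simulated) depth test.
DEPTH_FUNCS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "ALWAYS": lambda a, d: True,
    "NEVER": lambda a, d: False,
}


def evaluate_predicate(depth_buffer, op, d):
    """Evaluate (a_i op d) for every stored attribute value.

    Returns the per-pixel pass mask and the number of passing fragments
    (the number an occlusion query would report).
    """
    compare = DEPTH_FUNCS[op]
    passed = [compare(a, d) for a in depth_buffer]
    return passed, sum(passed)


# Attribute values copied into the simulated depth buffer.
mask, count = evaluate_predicate([10, 24, 37, 99], ">=", 30)
```

Every pixel is tested independently, which is what lets the pixel processing engines evaluate the predicate in parallel.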

[Figure: a screen-filling quad P is drawn at depth d over the stored attribute values. For each pixel: if (a_i op d), pass the fragment; else reject the fragment.]

Predicate Evaluation

• a_i op a_j
– Treat as (a_i - a_j) op 0

• Semi-linear queries
– Defined as a linear combination of attribute values compared against a constant
– The linear combination is computed as a dot product of two vectors
– Utilize the vector processing capabilities of GPUs
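
A short sketch of a semi-linear query as a dot product followed by a comparison. The function name semilinear_query and the row/coefficient layout are assumptions for illustration; on the GPU the dot product would run in a fragment program rather than in a Python loop.

```python
import operator

OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def semilinear_query(rows, coeffs, op, b):
    """Evaluate dot(coeffs, row) op b for every row of attribute values."""
    compare = OPS[op]
    return [
        compare(sum(c * x for c, x in zip(coeffs, row)), b)
        for row in rows
    ]


rows = [(1, 2, 3, 4), (4, 3, 2, 1), (0, 0, 0, 0)]

# a_1 + a_2 - a_3 > 0
result = semilinear_query(rows, (1, 1, -1, 0), ">", 0)

# a_1 op a_2 is the special case with coefficients (1, -1, 0, 0) and constant 0.
a1_less_than_a2 = semilinear_query(rows, (1, -1, 0, 0), "<", 0)
```

The second call shows that the attribute-to-attribute comparison (a_i - a_j) op 0 is just a semi-linear query with coefficients 1 and -1.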

Data Validation

• Performed using stencil test

• Stencil values of valid data values are set to a given value “s”

• Stencil values of data values that fail predicate evaluation are set to zero

Outline: Database Operations

• Predicate Evaluation

• Boolean Combinations of Predicates

• Aggregations

Boolean Combinations

• Expression provided as a CNF

• CNF is of the form (A_1 AND A_2 AND … AND A_k), where A_i = (B_{i,1} OR B_{i,2} OR … OR B_{i,m_i})

• CNF does not have a NOT operator
– If the expression has a NOT operator, invert the comparison operation to eliminate NOT
– E.g. NOT (a_i < d) => (a_i >= d)

Boolean Combination

• We will focus on (A_1 AND A_2)

• All other cases reduce to this one
– A_1 = (TRUE AND A_1)
– If E_i = (A_1 AND A_2 AND … AND A_{i-1} AND A_i), then E_i = (E_{i-1} AND A_i)


• Clear stencil value to 1

• For each A_i, i = 1, …, k do

– If (mod(i, 2)) /* valid stencil value is 1 */
  • Stencil test passes if the stencil value equals 1
  • StencilOp(KEEP, KEEP, INCR)

– Else /* valid stencil value is 2 */
  • Stencil test passes if the stencil value equals 2
  • StencilOp(KEEP, KEEP, DECR)

– Endif

– For each B_{i,j}, j = 1, …, m_i do
  • Perform B_{i,j} using COMPARE /* depth test */

– End for

– If (mod(i, 2)) /* valid stencil value is now 2 */
  • If the stencil value on screen is 1, REPLACE it with 0

– Else /* valid stencil value is now 1 */
  • If the stencil value on screen is 2, REPLACE it with 0

– Endif

• End for
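
The routine above can be simulated on the CPU to check the stencil bookkeeping. This is a sketch under stated assumptions: records are plain Python objects, each predicate is a Python callable standing in for a depth-test pass, and the name eval_cnf is hypothetical.

```python
def eval_cnf(records, clauses):
    """Simulate the stencil-based CNF evaluation.

    clauses is a list of clauses A_i; each clause is a list of predicates
    B_ij (callables taking a record and returning bool).
    Returns one bool per record: True if the record satisfies the CNF.
    """
    stencil = [1] * len(records)  # clear stencil to 1
    for i, clause in enumerate(clauses, start=1):
        valid = 1 if i % 2 == 1 else 2      # value marking "still valid"
        flipped = 2 if valid == 1 else 1    # result of INCR or DECR
        for predicate in clause:
            for p, record in enumerate(records):
                # Stencil test: only valid pixels take part.
                # Depth test: the predicate itself.
                if stencil[p] == valid and predicate(record):
                    stencil[p] = flipped    # Op3: INCR (odd i) or DECR (even i)
        # Pixels still holding the old valid value failed every B_ij.
        for p in range(len(records)):
            if stencil[p] == valid:
                stencil[p] = 0
    final_valid = 2 if len(clauses) % 2 == 1 else 1
    return [s == final_valid for s in stencil]


records = [
    {"a": 5, "b": 1},
    {"a": 12, "b": 3},
    {"a": 15, "b": 10},
    {"a": 20, "b": 25},
]
# (a >= 10) AND (b < 5 OR b > 20)
clauses = [
    [lambda r: r["a"] >= 10],
    [lambda r: r["b"] < 5, lambda r: r["b"] > 20],
]
selected = eval_cnf(records, clauses)
```

Once a pixel has been flipped by one B_ij it no longer passes the stencil test for the remaining predicates of the same clause, which gives the OR semantics without any branching in the fragment program.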

A1 AND A2

[Figure sequence: evaluation of A_1 AND A_2, where A_2 = (B_{2,1} OR B_{2,2} OR B_{2,3}).
1. All stencil values start at 1.
2. After evaluating A_1, the stencil value is 2 inside A_1 and 0 outside it.
3. Evaluating B_{2,1}, B_{2,2} and B_{2,3} sets the stencil value to 1 wherever one of them holds inside A_1.
4. Pixels inside A_1 that satisfy none of the B_{2,j} (stencil value still 2) are set to 0.
5. Final result: stencil value 1 for A_1 AND B_{2,1}, A_1 AND B_{2,2} and A_1 AND B_{2,3}; stencil value 0 elsewhere.]

Range Query

• Compute a_i within [low, high]
– Evaluated as (a_i >= low) AND (a_i <= high)

Outline: Database Operations

• Predicate Evaluation

• Boolean Combinations of Predicates

• Aggregations

Aggregations

• COUNT, MAX, MIN, SUM, AVG

• No data rearrangements

COUNT

• Use occlusion queries to get pixel pass count

• Syntax:
– Begin occlusion query
– Perform database operation
– End occlusion query
– Get count of number of attributes that passed the database operation

• Involves no additional overhead!

MAX, MIN, MEDIAN

• We compute Kth-largest number

• Traditional algorithms require data rearrangements

• We perform no data rearrangements, no frame buffer readbacks

K-th Largest Number

• Say v_k is the k-th largest number

• How do we generate a number m equal to v_k?
– Without knowing v_k's bit representation, using only comparisons

Our algorithm

• b_max = max. no. of bits in the values in tex

• x = 0

• For i = b_max-1 down to 0
– Count = Compare(tex >= x + 2^i)
– If Count > k-1
  • x = x + 2^i

• Return x
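
The same bit-by-bit search, written as a CPU-side sketch. The counting loop stands in for a depth-test pass wrapped in an occlusion query; the function name kth_largest and the use of a Python list for tex are assumptions for illustration.

```python
def kth_largest(values, k, b_max):
    """Return the k-th largest value (k = 1 is the maximum).

    values: non-negative integers that fit in b_max bits.
    Each iteration fixes one bit of the answer, from most to least
    significant, using only a count of values >= a candidate.
    """
    x = 0
    for i in range(b_max - 1, -1, -1):
        candidate = x + (1 << i)
        # Stand-in for Compare(tex >= candidate) plus an occlusion query.
        count = sum(1 for v in values if v >= candidate)
        if count > k - 1:
            x = candidate  # at least k values are >= candidate: keep the bit
    return x


data = [10, 24, 37, 99, 192, 200, 200, 232]
maximum = kth_largest(data, 1, 8)
smallest = kth_largest(data, len(data), 8)
```

The loop runs exactly b_max times regardless of the data, and never moves or rewrites any value, which fits the no-rearrangement constraint.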

K-th Largest Number

• Lemma: Let v_k be the k-th largest number. Let count be the number of values >= m
– If count > (k-1): m <= v_k
– If count <= (k-1): m > v_k

• Apply the earlier algorithm, ensuring that count > (k-1)

Example

• v_k = 11101001; start with m = 00000000

• m = 10000000: m <= v_k, keep the bit

• m = 11000000: m <= v_k, keep the bit

• m = 11100000: m <= v_k, keep the bit

• m = 11110000: m > v_k, make the bit 0, m = 11100000

• m = 11101000: m <= v_k, keep the bit

• m = 11101100: m > v_k, make the bit 0, m = 11101000

• m = 11101010: m > v_k, make the bit 0, m = 11101000

• m = 11101001: m <= v_k, so m = v_k

Example

• Integers ranging from 0 to 255

• Represent them in the depth buffer
– Idea: use depth functions to perform comparisons
– Use NV_occlusion_query to determine the maximum

Example: Parallel Max

• S = {10, 24, 37, 99, 192, 200, 200, 232}

• Step 1: Draw quad at 128. Values that pass: {192, 200, 200, 232}

• Step 2: Draw quad at 192. Values that pass: {192, 200, 200, 232}

• Step 3: Draw quad at 224. Values that pass: {232}

• Step 4: Draw quad at 240. No values pass

• Step 5: Draw quad at 232. Values that pass: {232}

• Steps 6, 7, 8: Draw quads at 236, 234, 233. No values pass

• Max is 232

SUM and AVG

• Mipmaps – multi resolution textures consisting of multiple levels

• Highest level contains average of all values at lowest level

• SUM = AVG * COUNT

• Problems with mipmaps

– If we want the sum of a subset of values, we have to introduce conditions in the fragment programs

– Floating-point representation may introduce precision errors

Accumulator

• Data representation is of the form a_k 2^k + a_{k-1} 2^{k-1} + … + a_0

• Sum = sum(a_k) 2^k + sum(a_{k-1}) 2^{k-1} + … + sum(a_0)

• Current GPUs support no bit-masking operations

• AVG = SUM/COUNT

TestBit

• Read the data value from the texture, say a_i

• F = frac(a_i / 2^(k+1))

• If F >= 0.5, then the k-th bit of a_i is 1

• Set the alpha value to F. The alpha test passes a fragment if its alpha value >= 0.5

Outline

• Database Operations on GPUs

• Implementation & Results

• Analysis

• Conclusions

Implementation

• Dell Precision Workstation with Dual 2.8GHz Xeon Processor

• NVIDIA GeForce FX 5900 Ultra GPU

• 2GB RAM

Implementation

• CPU – Intel compiler 7.1 with hyperthreading, multi-threading, SIMD optimizations

• GPU – NVIDIA Cg Compiler

Benchmarks

• TCP/IP database with 1 million records and four attributes

• Census database with 360K records

Copy Time

Predicate Evaluation (3 times faster)

Range Query (5.5 times faster)

Multi-Attribute Query (2 times)

Semi-linear Query (9 times faster)

COUNT

• Same timings for GPU implementation

Kth-Largest for median (2.5 times)

Kth-Largest

Kth-Largest conditional

Accumulator (20 times slower!)

Outline

• Database Operations on GPUs

• Implementation & Results

• Analysis

• Conclusions

Analysis: Issues

• Precision
– Currently the depth buffer has only 24-bit precision, which is inadequate

• Copy time
– Copy from texture to depth buffer: no mechanism in the GPU

• Integer arithmetic
– Not enough arithmetic instructions in the pixel processing engines

• Depth compare masking
– Useful to have a comparison mask for the depth function

Analysis: Issues

• Memory management
– Current GPUs have 512 MB of video memory; we may use out-of-core techniques and swap

• No random writes
– No data rearrangements possible

Analysis: Performance

• Relative performance gain
– High performance: predicate evaluation, multi-attribute queries, semi-linear queries, count
– Medium performance: Kth-largest number
– Low performance: accumulator

High Performance

• Parallel pixel processing engines

• Pipelining
– Multi-attribute queries benefit

• Early depth culling
– Happens before passing through the pixel processing engine

• Eliminate branch mispredictions

Medium Performance

• Parallelism

• FX 5900 has clock speed 450MHz, 8 pixel processing engines

• Rendering a single 1000x1000 quad takes 0.278 ms

• Rendering 19 such quads takes 5.28 ms; the observed time is 6.6 ms

• 80% efficiency in parallelism
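
The figures above can be reproduced with a few lines of arithmetic. The sketch assumes an ideal rate of one pixel per engine per clock cycle, which is an assumption chosen because it matches the quoted 0.278 ms; the function name ideal_render_ms is hypothetical.

```python
def ideal_render_ms(pixels, engines, clock_hz):
    """Ideal time in milliseconds, assuming one pixel per engine per cycle."""
    return pixels / (engines * clock_hz) * 1e3


# FX 5900: 450 MHz clock, 8 pixel processing engines.
quad_ms = ideal_render_ms(1000 * 1000, engines=8, clock_hz=450e6)
total_ms = 19 * quad_ms          # 19 quads rendered back to back
efficiency = total_ms / 6.6      # ideal time divided by observed time

print(f"one quad: {quad_ms:.3f} ms, 19 quads: {total_ms:.2f} ms, "
      f"efficiency: {efficiency:.0%}")
```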

Low Performance

• No gain over SIMD based CPU implementation

• Two main reasons:
– Lack of integer arithmetic
– Clock rate

Outline

• Database Operations on GPUs

• Implementation & Results

• Analysis

• Conclusions

Conclusions

• Novel algorithms to perform database operations on GPUs

– Evaluation of predicates, boolean combinations of predicates, aggregations

• Algorithms take into account GPU limitations
– No data rearrangements
– No frame buffer readbacks

Conclusions

• Preliminary comparisons with optimized CPU implementations are promising

• Discussed possible improvements on GPUs

• GPU as a useful co-processor

Relational Joins

• Modern GPUs have thread groups

• Each thread group has several threads

• Data-parallel primitives
– Map
– Scatter: scatters the data of a relation with respect to an array L
– Gather: reverse of scatter
– Split: divides the relation into a number of disjoint partitions with a given partitioning function
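
A sequential sketch of the scatter, gather and split primitives. It assumes the common convention that scatter writes element i to position L[i] and gather reads element i from position L[i]; the function names and list-based relations are illustrative, and on a GPU each loop iteration would be handled by a separate thread.

```python
def scatter(r_in, loc):
    """out[loc[i]] = r_in[i]; loc is assumed to be a permutation of indices."""
    out = [None] * len(r_in)
    for i, tup in enumerate(r_in):
        out[loc[i]] = tup
    return out


def gather(r_in, loc):
    """out[i] = r_in[loc[i]]; the reverse of scatter for the same loc."""
    return [r_in[loc[i]] for i in range(len(loc))]


def split(r_in, num_partitions, part_fn):
    """Divide r_in into disjoint partitions chosen by part_fn(tuple)."""
    partitions = [[] for _ in range(num_partitions)]
    for tup in r_in:
        partitions[part_fn(tup)].append(tup)
    return partitions


relation = ["a", "b", "c", "d"]
loc = [2, 0, 3, 1]
scattered = scatter(relation, loc)
restored = gather(scattered, loc)
parts = split([1, 2, 3, 4, 5, 6], 3, lambda x: x % 3)
```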

NINLJ

[Figure: NINLJ (non-indexed nested-loop join). Relations R and S, with the work divided among thread groups 1, …, i, …, j, …, Bp.]

INLJ

• Uses Cache-Sensitive Search trees (CSS-trees) as the index structure

• The inner relation is indexed by the CSS-tree

• Multiple keys are searched in parallel on the tree

Sort Merge join

• Merge step is done in parallel

• 3 steps
– Divide relation S into Q chunks, Q = ||S|| / M
– Find the corresponding matching chunks from R by using the start and end of each chunk of S
– Merge each pair of S and R chunks in parallel, one thread group per pair

Hash join

• Partitioning
– Use the Split primitive to partition both relations

• Matching
– Read the inner relation into memory
– Each tuple from the outer relation uses sequential/binary search on the inner relation
– For binary search, the inner relation is sorted using bitonic sort
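
The two phases can be sketched sequentially as follows. Several things here are assumptions for illustration: tuples are (key, payload) pairs, the partitioning function is Python's hash modulo the partition count, and Python's built-in sorted stands in for bitonic sort. On a GPU each partition pair and each probe would run in parallel.

```python
from bisect import bisect_left


def split(relation, num_partitions, part_fn):
    """Divide a relation into disjoint partitions chosen by part_fn."""
    partitions = [[] for _ in range(num_partitions)]
    for tup in relation:
        partitions[part_fn(tup)].append(tup)
    return partitions


def hash_join(outer, inner, num_partitions=4):
    """Equi-join on the first field of each tuple."""
    def part_fn(tup):
        return hash(tup[0]) % num_partitions

    # Partitioning: split both relations with the same function.
    outer_parts = split(outer, num_partitions, part_fn)
    inner_parts = split(inner, num_partitions, part_fn)

    result = []
    # Matching: probe each inner partition with binary search.
    for outer_part, inner_part in zip(outer_parts, inner_parts):
        sorted_inner = sorted(inner_part, key=lambda t: t[0])  # stand-in for bitonic sort
        keys = [t[0] for t in sorted_inner]
        for o in outer_part:
            pos = bisect_left(keys, o[0])
            while pos < len(keys) and keys[pos] == o[0]:
                result.append((o, sorted_inner[pos]))
                pos += 1
    return result


r = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
s = [(2, "x"), (3, "y"), (3, "z"), (5, "w")]
joined = hash_join(r, s)
```

Because both relations are split with the same function, matching keys always land in the same partition pair, so each pair can be processed independently.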