Upload
talon-delaney
View
33
Download
3
Embed Size (px)
DESCRIPTION
SQL vs GPU. Richard Wilton Johns Hopkins University Department of Physics and Astronomy. Rationale. Terabyte-scale data is best managed in a DBMS Some classes of computation are best implemented using a GPU. How to do GPU computation in a terabyte-scale database. - PowerPoint PPT Presentation
Citation preview
1
SQL vs GPU
Richard Wilton
Johns Hopkins University
Department of Physics and Astronomy
2
Rationale
• Terabyte-scale data is best managed in a DBMS• Some classes of computation are best implemented
using a GPU
3
How to do GPU computation in a terabyte-scale database
• Do the computation “outside” the database (export the data / compute / re-import the data)
• Do the computation “inside” the database• Something in between
IPCSQL code
SQLCLR procedure or function
Special-case functionality
SQL Server Out-of-process server
SQLCLR out-of-process server
Why?• Avoid memory, threading, and permissions restrictions on SQLCLR implementations• Load dynamic-link libraries• Invoke native-code methods• Exploit lower-level APIs (e.g. SqlClient, bulk insert) to facilitate data movement between SQL Server and non-SQL computational resources
The basic concept• Implement computational functionality in a separate process from the SQL Server process• Access that functionality using interprocess communication (IPC)
5
IPCSQL code
SQLCLR procedure CUDA functionality
SQL Server Out-of-process server
declare @sql nvarchar(max) set @sql = N'exec SqA1.dbo.SWGPerf @qOffset=311, @chrNum=7, @dimTile=128' exec dbo.ExecSWG @sqlCmd=@sql, @targetTable='##tmpX08'
SQL
SQLCLR implementation (in C#)• Allocates IPC buffer• Initializes IPC buffer with the result set from a specified SQL query • Signals out-of-process server• Waits for response from out-of-process server• Returns data from the IPC buffer as a SQL result set
Out-of-process server implementation (in C#)
CUDA implementation (in C++)
• Loads the CUDA implementation (dynamic link library)• Invokes a method in the DLL with a reference to the IPC buffer• Waits for completion• Signals the SQLCLR implementation• Stand-alone application or Windows service
• Implements GPU functionality• Uses the IPC buffer for both incoming and outgoing data
SQLCLR out-of-process server implementation
6
A test case: genomic sequence data
• Nucleic acid sequence data• Data is processed in a “loose” workflow by a
sequence of software tools• Example:
– Align short sequences to a long reference sequence– Identify alignment regions of interest– Map regions of interest to annotation data (genome
map, known variations, etc.)
7
A test case: genomic sequence data
• Typical data-management scenario:– All data is maintained in file-system files– Processing software reads and writes files– “Database-like” operations (e.g. joins) are performed
using command-line tools (e.g. Perl scripts)
• Can we use a DBMS as a repository?– All data is maintained in the DBMS– Processing software reads and writes tables– “Database-like” operations (e.g. joins) are performed
using SQL
8
A test case: genomic sequence data
• Nucleic acid sequence data• DBMS: data repository• GPU: implementation of sequence alignment algorithms
– Gapped alignment: Smith Waterman– Non-gapped alignment: bit vector mapping through a hash table
9
A test case: genomic sequence data
• Results: so far, so good …– Sequence data can be managed using a DBMS
• Use low-level (C#, C++) implementations to manipulate string and binary data
– Sequence data from the DBMS can be processed using GPU implementations
• GPU sequence alignment implementations can exploit task parallelism and execute 10x (or more) faster than equivalent CPU implementations
• But …– DBMS data abstractions (tables, columns) must be mapped to
and from native GPU data formats and layouts– Data movement out of and into the DBMS is slow
10
Going forward…
• Compromise: maintain GPU-formatted data as BLOBs in the file system– BLOBs are in database “native” format
– BLOBs correspond to database tables or table partitions
• Much programming and testing remain to be done…