GPU-Accelerated Analytics on your Data Lake.



Data Lake

@blazingdb


Data Swamp


ETL Hell

[Diagram: arrows and binary streams around a box labeled DATA LAKE]

COMMON DATA LAYER

Simplify Data Storage


SCHEMA

METADATA

DATA


SQL Warehouse on Data Lake


BlazingDB – How it works

• Compression/Decompression
• Filtering (Predicate Pushdown)
• Aggregations
• Transformations
• Joins
• Sorting/Ordering

DATA LAKE

• RAM Cache (Hot)
• Disk Cache (Medium)
• HDD
• SSD

Local Disk
HDFS
AWS S3
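The slides do not say how the cache tiers are implemented. As a rough illustration of what "Hot" (RAM cache) and "Medium" (disk cache) could mean relative to a cold read from the data lake, here is a minimal read-through sketch. The class `TieredReader`, the dict-backed tiers, and the object key are all hypothetical and are not BlazingDB's API.

```python
from typing import Callable, Dict, Tuple


class TieredReader:
    """Read-through cache: a RAM tier in front of a disk tier in front of the lake.

    Both tiers are plain dicts in this sketch; a real disk tier would hold files.
    """

    def __init__(self, fetch_from_lake: Callable[[str], bytes]) -> None:
        self._fetch_from_lake = fetch_from_lake
        self.ram: Dict[str, bytes] = {}
        self.disk: Dict[str, bytes] = {}

    def read(self, key: str) -> Tuple[bytes, str]:
        """Return the data together with the name of the tier that served it."""
        if key in self.ram:
            return self.ram[key], "hot"
        if key in self.disk:
            data = self.disk[key]
            self.ram[key] = data  # promote to the RAM tier
            return data, "medium"
        data = self._fetch_from_lake(key)  # slowest path: go to the data lake
        self.disk[key] = data
        self.ram[key] = data
        return data, "cold"


# A made-up object standing in for a file on HDFS, S3, or local disk.
lake = {"lineitem/part-0": b"column chunk bytes"}
reader = TieredReader(lake.__getitem__)

tiers = [reader.read("lineitem/part-0")[1]]      # first touch goes to the lake
tiers.append(reader.read("lineitem/part-0")[1])  # second read is served from RAM
reader.ram.clear()                               # simulate losing the RAM tier
tiers.append(reader.read("lineitem/part-0")[1])  # falls back to the disk cache
print(tiers)
```

The three reads correspond to the three states the benchmark slides distinguish: nothing cached, data cached on disk only, and data cached in RAM.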


BlazingDB Multi-nodal Cluster


Shared Data Architecture

DATA LAKE

The Nays

No Vendor Lock-in
No Consistency Management
No BlazingDB Specific ETL
No Duplication
No Ingest

The Yays

High Concurrency
Data Sharing (Across Clusters And Other Tools)
Multi-Terabyte Queries
Scalable, On Demand Data Warehouse
Incredibly Fast SQL


DEMO

Demo - Architecture

HDFS on Azure

Azure GPU Servers
• NC24 V1
• 4 Servers

Queries: BlazingDB 4 Node Query times (Lower is better)

[Bar chart: query time in SECONDS for Query 1 through Query 5, with three series: Cold, Medium (Disk cache only), and Hot. Data labels in extraction order: 142.1, 281.1, 380.5, 135.5, 46, 73.6, 154.1, 251.8, 73.8, 46.3, 72, 63.1, 14, 12.2, 14.9]

Query 1

[Bar chart: Query 1, SECONDS for Cold, Medium (Disk cache only), and Hot]

select l_returnflag, l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(l_quantity) as count_order
from lineitem
where l_shipdate <= '1995-06-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;

Query 1 Data Points
• 6 billion row table
• Many aggregations/transformations
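To make the shape of Query 1's result concrete, the following sketch runs it against a four-row in-memory SQLite table. The rows are invented for illustration, SQLite stands in for BlazingDB, and the column names are spelled l_extendedprice and l_quantity throughout, with each alias named after what its expression computes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table lineitem (
        l_returnflag text, l_linestatus text,
        l_quantity integer, l_extendedprice real,
        l_discount real, l_tax real, l_shipdate text
    )
""")
# Invented rows; the last one ships after the cutoff and is filtered out.
conn.executemany(
    "insert into lineitem values (?, ?, ?, ?, ?, ?, ?)",
    [
        ("A", "F", 10, 100.0, 0.0, 0.0, "1995-01-01"),
        ("A", "F", 20, 200.0, 0.5, 0.0, "1995-02-01"),
        ("N", "O", 5, 50.0, 0.0, 0.0, "1995-05-01"),
        ("N", "O", 7, 70.0, 0.0, 0.0, "1995-07-01"),
    ],
)

QUERY_1 = """
select l_returnflag, l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(l_quantity) as count_order
from lineitem
where l_shipdate <= '1995-06-01'
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus
"""

rows = conn.execute(QUERY_1).fetchall()
for row in rows:
    print(row)  # one output row per (l_returnflag, l_linestatus) group
```

However many input rows the table holds, the output has one row per distinct flag/status pair, so the work is dominated by scanning and aggregating rather than by returning results.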


Query 2

[Bar chart: Query 2, SECONDS for Cold, Medium (Disk cache only), and Hot]

select lineitem.l_orderkey,
    sum(lineitem.l_extendedprice*(1-lineitem.l_discount)) as revenue,
    orders.o_orderdate, orders.o_shippriority
from customer
inner join orders on customer.c_custkey = orders.o_custkey
inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
where customer.c_mktsegment = 'BUILDING'
    and orders.o_orderdate < '1995-03-15'
    and lineitem.l_shipdate > '1995-03-15'
group by lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
order by revenue desc, orders.o_orderdate;

Query 2 Data Points
• Join 6B rows to 1.5B rows to 150M rows
• Many aggregations/transformations
• Order (sorting)
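The way Query 2's three filters interact with its two joins is easier to see on a tiny data set. The sketch below runs the query in an in-memory SQLite database; the rows are invented, only the columns the query touches are created, and SQLite is used purely as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (c_custkey integer, c_mktsegment text);
    create table orders (o_orderkey integer, o_custkey integer,
                         o_orderdate text, o_shippriority integer);
    create table lineitem (l_orderkey integer, l_extendedprice real,
                           l_discount real, l_shipdate text);
""")
# Invented rows, chosen so that each filter removes something.
conn.executemany("insert into customer values (?, ?)",
                 [(1, "BUILDING"), (2, "MACHINERY")])
conn.executemany("insert into orders values (?, ?, ?, ?)", [
    (10, 1, "1995-03-01", 0),
    (11, 1, "1995-03-20", 0),  # ordered too late
    (12, 2, "1995-03-01", 0),  # wrong market segment
    (13, 1, "1995-02-01", 0),
])
conn.executemany("insert into lineitem values (?, ?, ?, ?)", [
    (10, 100.0, 0.0, "1995-03-20"),
    (10, 50.0, 0.0, "1995-03-25"),
    (10, 30.0, 0.0, "1995-03-10"),  # shipped too early
    (11, 500.0, 0.0, "1995-04-01"),
    (12, 70.0, 0.0, "1995-03-20"),
    (13, 400.0, 0.5, "1995-03-16"),
])

QUERY_2 = """
select lineitem.l_orderkey,
    sum(lineitem.l_extendedprice*(1-lineitem.l_discount)) as revenue,
    orders.o_orderdate, orders.o_shippriority
from customer
inner join orders on customer.c_custkey = orders.o_custkey
inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
where customer.c_mktsegment = 'BUILDING'
    and orders.o_orderdate < '1995-03-15'
    and lineitem.l_shipdate > '1995-03-15'
group by lineitem.l_orderkey, orders.o_orderdate, orders.o_shippriority
order by revenue desc, orders.o_orderdate
"""

rows = conn.execute(QUERY_2).fetchall()
for row in rows:
    print(row)  # (l_orderkey, revenue, o_orderdate, o_shippriority)
```

Only orders 10 and 13 survive all three filters, and the final sort puts the higher-revenue order first.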


Query 3

[Bar chart: Query 3, SECONDS for Cold, Medium (Disk cache only), and Hot]

select nation.name,
    sum(lineitem.l_extendedprice * (1 - lineitem.l_discount)) as revenue
from customer
inner join orders on customer.cust_key = orders.o_custkey
inner join lineitem on lineitem.l_orderkey = orders.o_orderkey
inner join supplier on lineitem.l_suppkey = supplier.s_suppkey
inner join nation on supplier.s_nationkey = nation.nation_key
inner join region on nation.region_key = region.r_regionkey
where supplier.s_nationkey = nation.nation_key
    and region.r_name = 'ASIA'
    and orders.o_orderdate >= '19940101'
    and orders.o_orderdate < '19950101'
group by nation.name
order by revenue desc

Query 3 Data Points
• Join 6B rows to 1.5B rows to 150M rows (and many small joins)
• Multiple aggregations/transformations
• Order (sorting)


Query 4

[Bar chart: Query 4, SECONDS for Cold, Medium (Disk cache only), and Hot]

select sum(l_extendedprice) as sum_exprice,
    sum(l_discount) as sum_discount
from lineitem
where l_shipdate >= '19940101'
    and l_shipdate < '19950101'
    and l_discount >= 0.05 and l_discount <= 0.07
    and l_quantity < 24

Query 4 Data Points
• 6B row table
• Multiple aggregations/transformations
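Query 4 is mostly a filter, which is where the "Filtering (Predicate Pushdown)" item from the "How it works" slide would matter most. The slides do not describe how BlazingDB pushes predicates down. One common, generic form is to keep per-chunk minimum and maximum values and skip chunks that cannot match. The sketch below shows that idea only: the `Chunk` type, the helper names, and the rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (l_shipdate, l_extendedprice, l_discount, l_quantity)
Row = Tuple[str, float, float, int]


@dataclass
class Chunk:
    rows: List[Row]
    min_shipdate: str = field(init=False)
    max_shipdate: str = field(init=False)

    def __post_init__(self) -> None:
        # Statistics a storage format could record when the chunk is written.
        dates = [r[0] for r in self.rows]
        self.min_shipdate, self.max_shipdate = min(dates), max(dates)


def chunks_to_scan(chunks: List[Chunk], lo: str, hi: str) -> List[Chunk]:
    """Keep only chunks whose date range overlaps the half-open range [lo, hi)."""
    return [c for c in chunks if c.max_shipdate >= lo and c.min_shipdate < hi]


def query_4(chunks: List[Chunk]) -> Tuple[float, float]:
    """Apply Query 4's row-level filter to the given chunks and sum two columns."""
    hits = [
        r for c in chunks for r in c.rows
        if "19940101" <= r[0] < "19950101"
        and 0.05 <= r[2] <= 0.07
        and r[3] < 24
    ]
    return sum(r[1] for r in hits), sum(r[2] for r in hits)


# Invented data: one chunk per year.
chunks = [
    Chunk([("19930505", 999.0, 0.06, 1)]),
    Chunk([("19940315", 100.0, 0.06, 10),
           ("19940620", 200.0, 0.10, 10),   # discount out of range
           ("19941101", 300.0, 0.05, 30)]),  # quantity too large
    Chunk([("19950202", 999.0, 0.06, 1)]),
]

scanned = chunks_to_scan(chunks, "19940101", "19950101")
result = query_4(scanned)
print(len(scanned), "of", len(chunks), "chunks scanned; result:", result)
```

Skipping by statistics never changes the answer, because the row-level filter still runs on every chunk that is read. It only reduces how much data has to be decompressed and scanned.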


Query 5

[Bar chart: Query 5, SECONDS for Cold, Medium (Disk cache only), and Hot]

select supplier.s_acctbal, supplier.s_suppkey, nation.name,
    part.p_partkey, part.p_mfgr, supplier.s_address, supplier.s_phone,
    supplier.s_comment
from supplier
inner join partsupp on supplier.s_suppkey = partsupp.ps_suppkey
inner join nation on supplier.s_nationkey = nation.nation_key
inner join region on nation.region_key = region.r_regionkey
inner join part on part.p_partkey = partsupp.ps_partkey
where part.p_size = 15
    and part.p_type in (
        'ECONOMY ANODIZED BRASS', 'ECONOMY BRUSHED BRASS',
        'ECONOMY BURNISHED BRASS', 'ECONOMY PLATED BRASS',
        'ECONOMY POLISHED BRASS',
        'LARGE ANODIZED BRASS', 'LARGE BRUSHED BRASS',
        'LARGE BURNISHED BRASS', 'LARGE PLATED BRASS',
        'LARGE POLISHED BRASS',
        'SMALL ANODIZED BRASS', 'SMALL BRUSHED BRASS',
        'SMALL BURNISHED BRASS', 'SMALL PLATED BRASS',
        'SMALL POLISHED BRASS',
        'STANDARD ANODIZED BRASS', 'STANDARD BRUSHED BRASS',
        'STANDARD BURNISHED BRASS', 'STANDARD PLATED BRASS',
        'STANDARD POLISHED BRASS')
    and region.r_name = 'EUROPE'
order by supplier.s_acctbal desc, supplier.s_suppkey, nation.name,
    part.p_partkey

Query 5 Data Points
• Join multiple tables
• Many aggregations/transformations
• String comparisons


Data Pipeline

GPU Data Frame

Apache Arrow

Common Data Layer

INGEST

STORAGE (Data Lake)

Coming Soon


Questions?
