A Perfect Hybrid: Split Query Processing in Polybase
biaobiaoqi  [email protected]  2013/4/25
Outline
• Background
• Related Work
• PDW
• Polybase
• Performance Evaluation
Background
• Structured data & unstructured data
• RDBMS & Big Data
[Diagram: combining RDBMS and Hadoop data to gain insight]
Related Work
• Sqoop: transfers bulk data between Hadoop and structured data stores such as relational databases
• Teradata & Aster Data
• Greenplum & Vertica: external tables
• Oracle: external tables and OLH (Oracle Loader for Hadoop)
• IBM: a split mechanism that uses MapReduce to access the appliance
• Hadapt (HadoopDB): designed from the outset to support the execution of SQL-like queries across both unstructured and structured data sets
PDW Architecture
• Parallel Data Warehouse
• Shared-nothing system
Components in PDW
• Node
  o A SQL Server instance runs on each node
  o Data are hash-partitioned across the compute nodes
• Control node [runs the PDW Engine]
  o Query parsing
  o Optimization
  o Creating the distributed execution plan (DSQL) for the compute nodes
  o Tracking the execution steps of the plan on the compute nodes
• Compute node
  o Storage
  o Query processing
• DMS: Data Movement Service
  o (1) Repartitions rows of a table among the SQL Server instances on the PDW compute nodes
  o (2) Converts fields of rows being loaded into the appliance into the appropriate ODBC types
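The hash repartitioning that DMS performs can be sketched as follows. This is a minimal illustration, not PDW's actual implementation; the function and parameter names (`target_node`, `repartition`, `num_nodes`) are hypothetical.

```python
import hashlib

def target_node(key: str, num_nodes: int) -> int:
    # Stable hash so every DMS instance routes a given key to the same node.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def repartition(rows, key_index, num_nodes):
    # Assign each row to a compute node by hashing its distribution column.
    partitions = {n: [] for n in range(num_nodes)}
    for row in rows:
        partitions[target_node(str(row[key_index]), num_nodes)].append(row)
    return partitions

rows = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
parts = repartition(rows, key_index=0, num_nodes=2)
# Every row lands on exactly one node.
assert sum(len(v) for v in parts.values()) == len(rows)
```

The key property is determinism: any DMS instance hashing the same key picks the same target node, so rows with equal join keys meet on one node.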
Overview of Polybase
• A new feature in PDW V2
• Uses standard SQL
• Handles both structured and unstructured data (in SQL Server and Hadoop)
• Split query processing paradigm
• Leverages the capabilities of SQL Server PDW, especially its cost-based parallel query optimizer and execution engine
Use case of Polybase
Assumptions in Polybase
• 1. Polybase makes no assumptions about where the HDFS data is stored
• 2. Nor about the OS of the data nodes
• 3. Nor about the format of the HDFS files (i.e. TextFile, RCFile, custom, …)
Core Components
• External tables
• HDFS Bridge in DMS
• Cost-based query optimizer (wrapping the one in V1)
External Table
• Create a cluster instance:
  CREATE HADOOP_CLUSTER GSL_CLUSTER
  WITH (namenode='hadoop-head', namenode_port=9000,
        jobtracker='hadoop-head', jobtracker_port=9010);
• Create a file format:
  CREATE HADOOP_FILEFORMAT TEXT_FORMAT
  WITH (INPUT_FORMAT='polybase.TextInputFormat',
        OUTPUT_FORMAT='polybase.TextOutputFormat',
        ROW_DELIMITER='\n', COLUMN_DELIMITER='|');
• Create the external table:
  CREATE EXTERNAL TABLE hdfsCustomer (
      c_custkey bigint not null,
      c_name varchar(25) not null,
      ……
      c_comment varchar(117) not null)
  WITH (LOCATION='/tpch1gb/customer.tbl',
        FORMAT_OPTIONS (EXTERNAL_CLUSTER=GSL_CLUSTER,
                        EXTERNAL_FILEFORMAT=TEXT_FORMAT));
HDFS Bridge
HDFS Bridge
• The HDFS Bridge is a component of DMS
• Goal: transfer data in parallel between the nodes of the Hadoop and PDW clusters
• HDFS shuffle phase (reading data from Hadoop):
  o 1. Communicate with the namenode to get file metadata
  o 2. Balance the number of bytes read by each DMS instance (based on the HDFS metadata and the DMS instance count)
  o 3. Invoke openRecordReader(); the RecordReader instance communicates directly with the datanodes
  o 4. Read the data and convert it into ODBC types (may be done in a MapReduce job)
  o 5. Apply a hash function to determine the target node for each record
• Writing to Hadoop is almost the same, invoking openRecordWriter()
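Step 2 of the shuffle, balancing bytes across DMS instances, can be sketched like this. This is an illustrative simplification with hypothetical names; the real bridge also considers HDFS block boundaries and locality.

```python
def balanced_splits(file_size: int, num_readers: int):
    # Divide a file's byte range into roughly equal (offset, length) splits,
    # one per DMS reader, so no instance reads much more than the others.
    base, extra = divmod(file_size, num_readers)
    splits, offset = [], 0
    for i in range(num_readers):
        length = base + (1 if i < extra else 0)  # spread the remainder
        splits.append((offset, length))
        offset += length
    return splits

splits = balanced_splits(file_size=1000, num_readers=3)
# → [(0, 334), (334, 333), (667, 333)]
```

Each split would then be handed to one RecordReader instance, which reads its byte range directly from the datanodes.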
Read Process
Optimizer & Compilation
• Parsing
  o Builds a Memo data structure of alternative serial plans
• Parallel optimization [in PDW V1]
  o A bottom-up optimizer inserts data movement operators into the serial plans
• Cost-based query optimizer [decides whether to push work to Hadoop]
  o Based on statistics, the relative sizes of the two clusters, and other factors
• Semantic compatibility
  o Data types
  o SQL semantics
  o Error handling
Statistics
• Define statistics on an external table:
  CREATE STATISTICS hdfsCustomerStats ON hdfsCustomer (c_custkey);
• Steps to obtain statistics on HDFS data:
  o 1. Read block-level sample data via DMS or map jobs
  o 2. Partition the samples across the compute nodes
  o 3. Each node calculates a histogram on its portion
  o 4. Merge all histograms and store the result in the database catalog
• An alternative implementation:
  o In Hadoop V2, let the Hadoop cluster calculate the histograms itself (at higher cost)
  o Makes the best use of the Hadoop cluster's computational resources
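The sample/histogram/merge pipeline above can be sketched as follows. This is a hedged illustration: the equi-width buckets and the `histogram`/`merge` helpers are hypothetical, not PDW's actual catalog format.

```python
def histogram(values, lo, hi, buckets):
    # Build an equi-width histogram over [lo, hi) for one node's sample.
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)  # clamp hi edge
        counts[idx] += 1
    return counts

def merge(histograms):
    # Merge per-node histograms bucket-by-bucket into a global histogram.
    return [sum(col) for col in zip(*histograms)]

node1 = histogram([1, 2, 8], lo=0, hi=10, buckets=2)   # [2, 1]
node2 = histogram([6, 7, 9], lo=0, hi=10, buckets=2)   # [0, 3]
global_hist = merge([node1, node2])                    # [2, 4]
```

Because the merge is a simple bucket-wise sum, each compute node can work on its portion independently, matching steps 3 and 4.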
Semantic Compatibility
• Data types
  o Java primitive types
  o Non-primitive types
  o Third-party types that can be implemented
  o Types that cannot be implemented in Java are marked [and can only be processed in PDW]
• SQL semantics
  o Expression evaluation is implemented in Java
  o Returning null, e.g. A + B becomes (A == null || B == null) ? null : (A + B)
  o Expressions that cannot be implemented in Java are marked [and can only be processed in PDW]
• Error handling
  o Exceptions that would be raised in SQL must also be thrown in Java
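The null-propagation rule for A + B shown above can be expressed compactly; here is a minimal sketch in Python, using None to stand in for SQL NULL (the function name `sql_add` is illustrative).

```python
def sql_add(a, b):
    # SQL three-valued semantics: if either operand is NULL, the sum is NULL,
    # mirroring the Java expression (A == null || B == null) ? null : (A + B).
    if a is None or b is None:
        return None
    return a + b

assert sql_add(3, 4) == 7
assert sql_add(None, 4) is None
```

Generated Java code must apply this check to every nullable operand so that results computed in Hadoop match what SQL Server would produce.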
Example
  SELECT count(*)
  FROM Customer
  WHERE acctbal < 0
  GROUP BY nationkey
Optimized Query Plan #1
Optimized Query Plan #2
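One way the pushed variant of this query's aggregation could be split, sketched under the assumption that the selection and a partial COUNT(*) run as a map-side job on Hadoop while PDW merges the partials (function names are hypothetical):

```python
from collections import Counter

def partial_counts(rows):
    # Hadoop-side step: apply the pushed selection (acctbal < 0) and compute
    # a partial COUNT(*) per nationkey over this task's rows.
    return Counter(nationkey for nationkey, acctbal in rows if acctbal < 0)

def final_merge(partials):
    # PDW-side step: sum the partial counts from all Hadoop tasks.
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)

task1 = partial_counts([(1, -5.0), (1, 2.0), (2, -1.0)])
task2 = partial_counts([(1, -3.0), (2, 4.0)])
result = final_merge([task1, task2])   # {1: 2, 2: 1}
```

Pushing the selection and partial aggregation means only one small row per (task, nationkey) crosses the HDFS Bridge instead of every qualifying customer row.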
MapReduce Join
• Distributed hash join
  o Supports equi-joins
• Implementation:
  o Build side: the side with the smaller amount of data, materialized in HDFS
  o Probe side: the other side of the join
  o The build side is partitioned so each partition fits in memory, to speed up probing
  o The build side may also be replicated
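The build/probe phases described above can be sketched for a single partition; partitioning and replication across nodes are elided, and the helper name `hash_join` is illustrative.

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: load the smaller side into an in-memory hash table
    # keyed on the join column.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Probe phase: stream the larger side past the table, emitting
    # concatenated rows for each equi-join match.
    joined = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            joined.append(match + row)
    return joined

build = [(1, "a"), (2, "b")]            # smaller (build) side
probe = [(10, 1), (20, 1), (30, 3)]     # larger (probe) side
out = hash_join(build, probe, build_key=0, probe_key=1)
# → [(1, 'a', 10, 1), (1, 'a', 20, 1)]
```

Keeping each build partition memory-resident is what makes the probe a constant-time hash lookup per row rather than a scan.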
Performance Evaluation
• Test configurations:
  o C-16/48: 16-node PDW cluster, 48-node Hadoop cluster
  o C-30/30: 30-node PDW cluster, 30-node Hadoop cluster
  o C-60: 60-node PDW cluster and 60-node Hadoop cluster
• Test database:
  o Two identical tables T1 and T2
    • 10 billion rows
    • 13 integer attributes and 3 string attributes (~200 bytes/row)
    • About 2 TB uncompressed
  o One copy of each table in HDFS
    • HDFS block size of 256 MB
    • Stored as a compressed RCFile
    • RCFiles store rows "column-wise" inside a block
  o One copy of each table in PDW
    • Block-wise compression enabled
Selection on HDFS Table
  SELECT u1, u2, u3, str1, str2, str4 FROM T1 (in HDFS) WHERE (u1 % 100) < sf
[Chart: execution time (secs.) vs. selectivity factor (%), breaking each bar into PDW, Import, and MR time for Polybase Phase 1 vs. Polybase Phase 2]
Crossover point: above a selectivity factor of ~80%, PB Phase 2 is slower.
Join HDFS Table with PDW Table
SELECT * from T1 (HDFS), T2 (PDW) where T1.u1 = T2.u2 and (T1.u2 % 100) < sf and (T2.u2 % 100) < 50
[Chart: execution time (secs.) vs. selectivity factor (%), breaking each bar into PDW, Import, and MR time for Polybase Phase 1 vs. Polybase Phase 2]
Join Two HDFS Tables
  SELECT * from T1 (HDFS), T2 (HDFS) where T1.u1 = T2.u2 and (T1.u2 % 100) < SF and (T2.u2 % 100) < 10
[Chart: execution time (secs.) vs. selectivity factor, breaking each bar into PDW, Import-Join, MR-Shuffle-J, MR-Shuffle, Import T2, Import T1, MR-Sel T2, and MR-Sel T1 time]
PB.1 – all operators on PDW
PB.2P – selections on T1 and T2 pushed to Hadoop; join performed on PDW
PB.2H – selections & join on Hadoop
Performance Wrap-up
• Split query processing really works!
• Up to 10X performance improvement!
• A cost-based optimizer is clearly required to decide when an operator should be pushed
• The optimizer must also incorporate relative cluster sizes in its decisions
References
• Split Query Processing in Polybase (SIGMOD '13, June 22-27, 2013, New York, USA), Microsoft Corporation
• Polybase: What, Why, How (slides), Microsoft Corporation
• Query Optimization in Microsoft SQL Server PDW (SIGMOD '12, May 20-24, 2012, Scottsdale, Arizona, USA), Microsoft Corporation
THANKS