135
HAWQ Architecture Alexey Grishchenko

HAWQ Architecture HL++ 2015 Moscow

  • Upload
    hadat

  • View
    218

  • Download
    1

Embed Size (px)

Citation preview

Page 1: HAWQ Architecture HL++ 2015 Moscow

HAWQ Architecture Alexey Grishchenko

Page 2: HAWQ Architecture HL++ 2015 Moscow

Who I am Enterprise Architect @ Pivotal •  7 years in data processing

•  5 years of experience with MPP

•  4 years with Hadoop

•  Using HAWQ since the first internal Beta

•  Responsible for designing most of the EMEA HAWQ and Greenplum implementations

•  Spark contributor

•  http://0x0fff.com

Page 3: HAWQ Architecture HL++ 2015 Moscow

Agenda •  What is HAWQ

Page 4: HAWQ Architecture HL++ 2015 Moscow

Agenda •  What is HAWQ

•  Why you need it

Page 5: HAWQ Architecture HL++ 2015 Moscow

Agenda •  What is HAWQ

•  Why you need it

•  HAWQ Components

Page 6: HAWQ Architecture HL++ 2015 Moscow

Agenda •  What is HAWQ

•  Why you need it

•  HAWQ Components

•  HAWQ Design

Page 7: HAWQ Architecture HL++ 2015 Moscow

Agenda •  What is HAWQ

•  Why you need it

•  HAWQ Components

•  HAWQ Design

•  Query execution example

Page 8: HAWQ Architecture HL++ 2015 Moscow

Agenda •  What is HAWQ

•  Why you need it

•  HAWQ Components

•  HAWQ Design

•  Query execution example

•  Competitive solutions

Page 9: HAWQ Architecture HL++ 2015 Moscow

What is •  Analytical SQL-on-Hadoop engine

Page 10: HAWQ Architecture HL++ 2015 Moscow

What is •  Analytical SQL-on-Hadoop engine

•  HAdoop With Queries

Page 11: HAWQ Architecture HL++ 2015 Moscow

What is •  Analytical SQL-on-Hadoop engine

•  HAdoop With Queries

Postgres Greenplum HAWQ

2005 Fork Postgres 8.0.2

Page 12: HAWQ Architecture HL++ 2015 Moscow

What is •  Analytical SQL-on-Hadoop engine

•  HAdoop With Queries

Postgres HAWQ

2005 Fork Postgres 8.0.2

2009 Rebase Postgres 8.2.15

Greenplum

Page 13: HAWQ Architecture HL++ 2015 Moscow

What is •  Analytical SQL-on-Hadoop engine

•  HAdoop With Queries

Postgres HAWQ

2005 Fork Postgres 8.0.2

2009 Rebase Postgres 8.2.15

2011 Fork GPDB 4.2.0.0

Greenplum

Page 14: HAWQ Architecture HL++ 2015 Moscow

What is •  Analytical SQL-on-Hadoop engine

•  HAdoop With Queries

Postgres HAWQ

2005 Fork Postgres 8.0.2

2009 Rebase Postgres 8.2.15

2011 Fork GPDB 4.2.0.0

2013 HAWQ 1.0.0.0

Greenplum

Page 15: HAWQ Architecture HL++ 2015 Moscow

What is •  Analytical SQL-on-Hadoop engine

•  HAdoop With Queries

Postgres HAWQ

2005 Fork Postgres 8.0.2

2009 Rebase Postgres 8.2.15

2011 Fork GPDB 4.2.0.0

2013 HAWQ 1.0.0.0

HAWQ 2.0.0.0 Open Source 2015

Greenplum

Page 16: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

Page 17: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

–  200’000 of them in headers only

Page 18: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

–  200’000 of them in headers only

•  180’000 Python LOC

Page 19: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

–  200’000 of them in headers only

•  180’000 Python LOC

•  60’000 Java LOC

Page 20: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

–  200’000 of them in headers only

•  180’000 Python LOC

•  60’000 Java LOC

•  23’000 Makefile LOC

Page 21: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

–  200’000 of them in headers only

•  180’000 Python LOC

•  60’000 Java LOC

•  23’000 Makefile LOC

•  7’000 Shell scripts LOC

Page 22: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

–  200’000 of them in headers only

•  180’000 Python LOC

•  60’000 Java LOC

•  23’000 Makefile LOC

•  7’000 Shell scripts LOC

•  More than 50 enterprise customers

Page 23: HAWQ Architecture HL++ 2015 Moscow

HAWQ is … •  1’500’000 C and C++ lines of code

–  200’000 of them in headers only

•  180’000 Python LOC

•  60’000 Java LOC

•  23’000 Makefile LOC

•  7’000 Shell scripts LOC

•  More than 50 enterprise customers –  More than 10 of them in EMEA

Page 24: HAWQ Architecture HL++ 2015 Moscow

Apache HAWQ •  Apache HAWQ (incubating) from 09’2015

–  http://hawq.incubator.apache.org

–  https://github.com/apache/incubator-hawq

•  What’s in Open Source –  Sources of HAWQ 2.0 alpha

–  HAWQ 2.0 beta is planned for 2015’Q4

–  HAWQ 2.0 GA is planned for 2016’Q1

•  Community is yet young – come and join!

Page 25: HAWQ Architecture HL++ 2015 Moscow

Why do we need it?

Page 26: HAWQ Architecture HL++ 2015 Moscow

Why do we need it?

•  SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003

Page 27: HAWQ Architecture HL++ 2015 Moscow

Why do we need it?

•  SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003

–  Example - 5000-line query with a number of window function generated by Cognos

Page 28: HAWQ Architecture HL++ 2015 Moscow

Why do we need it?

•  SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003

–  Example - 5000-line query with a number of window function generated by Cognos

•  Universal tool for ad hoc analytics on top of Hadoop data

Page 29: HAWQ Architecture HL++ 2015 Moscow

Why do we need it?

•  SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003

–  Example - 5000-line query with a number of window function generated by Cognos

•  Universal tool for ad hoc analytics on top of Hadoop data

–  Example - parse URL to extract protocol, host name, port, GET parameters

Page 30: HAWQ Architecture HL++ 2015 Moscow

Why do we need it?

•  SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003

–  Example - 5000-line query with a number of window function generated by Cognos

•  Universal tool for ad hoc analytics on top of Hadoop data

–  Example - parse URL to extract protocol, host name, port, GET parameters

•  Good performance

Page 31: HAWQ Architecture HL++ 2015 Moscow

Why do we need it?

•  SQL-interface for BI solutions to the Hadoop data complaint with ANSI SQL-92, -99, -2003

–  Example - 5000-line query with a number of window function generated by Cognos

•  Universal tool for ad hoc analytics on top of Hadoop data

–  Example - parse URL to extract protocol, host name, port, GET parameters

•  Good performance –  How many times the data would hit the HDD during

a single Hive query?

Page 32: HAWQ Architecture HL++ 2015 Moscow

HAWQ Cluster Server1

SNameNode

Server4

ZK JM

NameNode

Server3

ZK JM

Server2

ZK JM

Server6

Datanode

ServerN

Datanode

Server5

Datanode

interconnect

Page 33: HAWQ Architecture HL++ 2015 Moscow

HAWQ Cluster Server1

SNameNode

Server4

ZK JM

NameNode

Server3

ZK JM

Server2

ZK JM

Server6

Datanode

ServerN

Datanode

Server5

Datanode

YARNNM YARNNM YARNNM

YARNRM YARNAppTimeline

interconnect

Page 34: HAWQ Architecture HL++ 2015 Moscow

HAWQ Cluster

HAWQMaster

Server1

SNameNode

Server4

ZK JM

NameNode

Server3

ZK JM

HAWQStandby

Server2

ZK JM

HAWQSegment

Server6

Datanode

HAWQSegment

ServerN

Datanode

HAWQSegment

Server5

Datanode

YARNNM YARNNM YARNNM

YARNRM YARNAppTimeline

interconnect

Page 35: HAWQ Architecture HL++ 2015 Moscow

Master Servers Server1

SNameNode

Server4

ZK JM

NameNode

Server3

ZK JM

Server2

ZK JM

HAWQSegment

Server6

Datanode

HAWQSegment

ServerN

Datanode

HAWQSegment

Server5

Datanode

YARNNM YARNNM YARNNM

YARNRM YARNAppTimeline

interconnect

HAWQMaster HAWQStandby

Page 36: HAWQ Architecture HL++ 2015 Moscow

Master Servers

HAWQMaster

QueryParser

QueryOpJmizer

GlobalResourceManager

DistributedTransacJonsManager

QueryDispatch

MetadataCatalog

HAWQStandbyMaster

QueryParser

QueryOp,mizer

GlobalResourceManager

DistributedTransac,onsManager

QueryDispatch

MetadataCatalog

WALrepl.

Page 37: HAWQ Architecture HL++ 2015 Moscow

HAWQMaster HAWQStandby

Segments Server1

SNameNode

Server4

ZK JM

NameNode

Server3

ZK JM

Server2

ZK JM

Server6

Datanode

ServerN

Datanode

Server5

Datanode

YARNNM YARNNM YARNNM

YARNRM YARNAppTimeline

interconnect

HAWQSegment HAWQSegmentHAWQSegment …

Page 38: HAWQ Architecture HL++ 2015 Moscow

Segments HAWQSegment

QueryExecutor

libhdfs3

PXF

HDFSDatanode

LocalFilesystemTemporaryData

Directory

Logs

YARNNodeManager

Page 39: HAWQ Architecture HL++ 2015 Moscow

Metadata •  HAWQ metadata structure is similar to

Postgres catalog structure

Page 40: HAWQ Architecture HL++ 2015 Moscow

Metadata •  HAWQ metadata structure is similar to

Postgres catalog structure

•  Statistics

–  Number of rows and pages in the table

Page 41: HAWQ Architecture HL++ 2015 Moscow

Metadata •  HAWQ metadata structure is similar to

Postgres catalog structure

•  Statistics

–  Number of rows and pages in the table

–  Most common values for each field

Page 42: HAWQ Architecture HL++ 2015 Moscow

Metadata •  HAWQ metadata structure is similar to

Postgres catalog structure

•  Statistics

–  Number of rows and pages in the table

–  Most common values for each field

–  Histogram of values distribution for each field

Page 43: HAWQ Architecture HL++ 2015 Moscow

Metadata •  HAWQ metadata structure is similar to

Postgres catalog structure

•  Statistics

–  Number of rows and pages in the table

–  Most common values for each field

–  Histogram of values distribution for each field

–  Number of unique values in the field

Page 44: HAWQ Architecture HL++ 2015 Moscow

Metadata •  HAWQ metadata structure is similar to

Postgres catalog structure

•  Statistics

–  Number of rows and pages in the table

–  Most common values for each field

–  Histogram of values distribution for each field

–  Number of unique values in the field

–  Number of null values in the field

Page 45: HAWQ Architecture HL++ 2015 Moscow

Metadata •  HAWQ metadata structure is similar to

Postgres catalog structure

•  Statistics

–  Number of rows and pages in the table

–  Most common values for each field

–  Histogram of values distribution for each field

–  Number of unique values in the field

–  Number of null values in the field

–  Average width of the field in bytes

Page 46: HAWQ Architecture HL++ 2015 Moscow

Statistics No Statistics How many rows would produce the join of two tables?

Page 47: HAWQ Architecture HL++ 2015 Moscow

Statistics No Statistics How many rows would produce the join of two tables?

ü  From 0 to infinity

Page 48: HAWQ Architecture HL++ 2015 Moscow

Statistics No Statistics

Row Count

How many rows would produce the join of two tables?

ü  From 0 to infinity

How many rows would produce the join of two 1000-row tables?

Page 49: HAWQ Architecture HL++ 2015 Moscow

Statistics No Statistics

Row Count

How many rows would produce the join of two tables?

ü  From 0 to infinity

How many rows would produce the join of two 1000-row tables?

ü  From 0 to 1’000’000

Page 50: HAWQ Architecture HL++ 2015 Moscow

Statistics No Statistics

Row Count

Histograms and MCV

How many rows would produce the join of two tables?

ü  From 0 to infinity

How many rows would produce the join of two 1000-row tables?

ü  From 0 to 1’000’000

How many rows would produce the join of two 1000-row tables, with known field cardinality, values distribution diagram, number of nulls, most common values?

Page 51: HAWQ Architecture HL++ 2015 Moscow

Statistics No Statistics

Row Count

Histograms and MCV

How many rows would produce the join of two tables?

ü  From 0 to infinity

How many rows would produce the join of two 1000-row tables?

ü  From 0 to 1’000’000

How many rows would produce the join of two 1000-row tables, with known field cardinality, values distribution diagram, number of nulls, most common values?

ü  ~ From 500 to 1’500

Page 52: HAWQ Architecture HL++ 2015 Moscow

Metadata •  Table structure information

ID Name Num Price1 Яблоко 10 502 Груша 20 803 Банан 40 404 Апельсин 25 505 Киви 5 1206 Арбуз 20 307 Дыня 40 1008 Ананас 35 90

Page 53: HAWQ Architecture HL++ 2015 Moscow

Metadata •  Table structure information

–  Distribution fields

ID Name Num Price1 Яблоко 10 502 Груша 20 803 Банан 40 404 Апельсин 25 505 Киви 5 1206 Арбуз 20 307 Дыня 40 1008 Ананас 35 90

hash(ID)

Page 54: HAWQ Architecture HL++ 2015 Moscow

Metadata •  Table structure information

–  Distribution fields

–  Number of hash buckets

ID Name Num Price1 Яблоко 10 502 Груша 20 803 Банан 40 404 Апельсин 25 505 Киви 5 1206 Арбуз 20 307 Дыня 40 1008 Ананас 35 90

hash(ID)

ID Name Num Price1 Яблоко 10 50

2 Груша 20 803 Банан 40 40

4 Апельсин 25 505 Киви 5 120

6 Арбуз 20 307 Дыня 40 100

8 Ананас 35 90

Page 55: HAWQ Architecture HL++ 2015 Moscow

Metadata •  Table structure information

–  Distribution fields

–  Number of hash buckets

–  Partitioning (hash, list, range)

ID Name Num Price1 Яблоко 10 502 Груша 20 803 Банан 40 404 Апельсин 25 505 Киви 5 1206 Арбуз 20 307 Дыня 40 1008 Ананас 35 90

hash(ID)

ID Name Num Price1 Яблоко 10 50

2 Груша 20 803 Банан 40 40

4 Апельсин 25 50

5 Киви 5 1206 Арбуз 20 307 Дыня 40 100

8 Ананас 35 90

Page 56: HAWQ Architecture HL++ 2015 Moscow

Metadata •  Table structure information

–  Distribution fields

–  Number of hash buckets

–  Partitioning (hash, list, range)

•  General metadata –  Users and groups

Page 57: HAWQ Architecture HL++ 2015 Moscow

Metadata •  Table structure information

–  Distribution fields

–  Number of hash buckets

–  Partitioning (hash, list, range)

•  General metadata –  Users and groups

–  Access privileges

Page 58: HAWQ Architecture HL++ 2015 Moscow

Metadata •  Table structure information

–  Distribution fields

–  Number of hash buckets

–  Partitioning (hash, list, range)

•  General metadata –  Users and groups

–  Access privileges

•  Stored procedures –  PL/pgSQL, PL/Java, PL/Python, PL/Perl, PL/R

Page 59: HAWQ Architecture HL++ 2015 Moscow

Query Optimizer •  HAWQ uses cost-based query optimizers

Page 60: HAWQ Architecture HL++ 2015 Moscow

Query Optimizer •  HAWQ uses cost-based query optimizers

•  You have two options –  Planner – evolved from the Postgres query

optimizer

–  ORCA (Pivotal Query Optimizer) – developed specifically for HAWQ

Page 61: HAWQ Architecture HL++ 2015 Moscow

Query Optimizer •  HAWQ uses cost-based query optimizers

•  You have two options –  Planner – evolved from the Postgres query

optimizer

–  ORCA (Pivotal Query Optimizer) – developed specifically for HAWQ

•  Optimizer hints work just like in Postgres –  Enable/disable specific operation

–  Change the cost estimations for basic actions

Page 62: HAWQ Architecture HL++ 2015 Moscow

Storage Formats Which storage format is the most optimal?

Page 63: HAWQ Architecture HL++ 2015 Moscow

Storage Formats Which storage format is the most optimal?

ü  It depends on what you mean by “optimal”

Page 64: HAWQ Architecture HL++ 2015 Moscow

Storage Formats Which storage format is the most optimal?

ü  It depends on what you mean by “optimal” –  Minimal CPU usage for reading and writing the data

Page 65: HAWQ Architecture HL++ 2015 Moscow

Storage Formats Which storage format is the most optimal?

ü  It depends on what you mean by “optimal” –  Minimal CPU usage for reading and writing the data

–  Minimal disk space usage

Page 66: HAWQ Architecture HL++ 2015 Moscow

Storage Formats Which storage format is the most optimal?

ü  It depends on what you mean by “optimal” –  Minimal CPU usage for reading and writing the data

–  Minimal disk space usage

–  Minimal time to retrieve record by key

Page 67: HAWQ Architecture HL++ 2015 Moscow

Storage Formats Which storage format is the most optimal?

ü  It depends on what you mean by “optimal” –  Minimal CPU usage for reading and writing the data

–  Minimal disk space usage

–  Minimal time to retrieve record by key

–  Minimal time to retrieve subset of columns

–  etc.

Page 68: HAWQ Architecture HL++ 2015 Moscow

Storage Formats •  Row-based storage format

–  Similar to Postgres heap storage

•  No toast

•  No ctid, xmin, xmax, cmin, cmax

Page 69: HAWQ Architecture HL++ 2015 Moscow

Storage Formats •  Row-based storage format

–  Similar to Postgres heap storage

•  No toast

•  No ctid, xmin, xmax, cmin, cmax

–  Compression

•  No compression

•  Quicklz

•  Zlib levels 1 - 9

Page 70: HAWQ Architecture HL++ 2015 Moscow

Storage Formats •  Apache Parquet

–  Mixed row-columnar table store, the data is split into “row groups” stored in columnar format

Page 71: HAWQ Architecture HL++ 2015 Moscow

Storage Formats •  Apache Parquet

–  Mixed row-columnar table store, the data is split into “row groups” stored in columnar format

–  Compression

•  No compression

•  Snappy

•  Gzip levels 1 – 9

Page 72: HAWQ Architecture HL++ 2015 Moscow

Storage Formats •  Apache Parquet

–  Mixed row-columnar table store, the data is split into “row groups” stored in columnar format

–  Compression

•  No compression

•  Snappy

•  Gzip levels 1 – 9

–  The size of “row group” and page size can be set for each table separately

Page 73: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Two main options

–  Static resource split – HAWQ and YARN does not know about each other

Page 74: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Two main options

–  Static resource split – HAWQ and YARN does not know about each other

–  YARN – HAWQ asks YARN Resource Manager for query execution resources

Page 75: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Two main options

–  Static resource split – HAWQ and YARN does not know about each other

–  YARN – HAWQ asks YARN Resource Manager for query execution resources

•  Flexible cluster utilization –  Query might run on a subset of nodes if it is small

Page 76: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Two main options

–  Static resource split – HAWQ and YARN does not know about each other

–  YARN – HAWQ asks YARN Resource Manager for query execution resources

•  Flexible cluster utilization –  Query might run on a subset of nodes if it is small

–  Query might have many executors on each cluster node to make it run faster

Page 77: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Two main options

–  Static resource split – HAWQ and YARN does not know about each other

–  YARN – HAWQ asks YARN Resource Manager for query execution resources

•  Flexible cluster utilization –  Query might run on a subset of nodes if it is small

–  Query might have many executors on each cluster node to make it run faster

–  You can control the parallelism of each query

Page 78: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Resource Queue can be set with

–  Maximum number of parallel queries

Page 79: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Resource Queue can be set with

–  Maximum number of parallel queries

–  CPU usage priority

Page 80: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Resource Queue can be set with

–  Maximum number of parallel queries

–  CPU usage priority

–  Memory usage limits

Page 81: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Resource Queue can be set with

–  Maximum number of parallel queries

–  CPU usage priority

–  Memory usage limits

–  CPU cores usage limit

Page 82: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Resource Queue can be set with

–  Maximum number of parallel queries

–  CPU usage priority

–  Memory usage limits

–  CPU cores usage limit

–  MIN/MAX number of executors across the system

Page 83: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Resource Queue can be set with

–  Maximum number of parallel queries

–  CPU usage priority

–  Memory usage limits

–  CPU cores usage limit

–  MIN/MAX number of executors across the system

–  MIN/MAX number of executors on each node

Page 84: HAWQ Architecture HL++ 2015 Moscow

Resource Management •  Resource Queue can be set with

–  Maximum number of parallel queries

–  CPU usage priority

–  Memory usage limits

–  CPU cores usage limit

–  MIN/MAX number of executors across the system

–  MIN/MAX number of executors on each node

•  Can be set up for user or group

Page 85: HAWQ Architecture HL++ 2015 Moscow

External Data •  PXF

–  Framework for external data access

–  Easy to extend, many public plugins available

–  Official plugins: CSV, SequenceFile, Avro, Hive, HBase

–  Open Source plugins: JSON, Accumulo, Cassandra, JDBC, Redis, Pipe

Page 86: HAWQ Architecture HL++ 2015 Moscow

External Data •  PXF

–  Framework for external data access

–  Easy to extend, many public plugins available

–  Official plugins: CSV, SequenceFile, Avro, Hive, HBase

–  Open Source plugins: JSON, Accumulo, Cassandra, JDBC, Redis, Pipe

•  HCatalog –  HAWQ can query tables from HCatalog the same

way as HAWQ native tables

Page 87: HAWQ Architecture HL++ 2015 Moscow

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Resource Prepare Execute Result CleanupPlan

Page 88: HAWQ Architecture HL++ 2015 Moscow

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Resource Prepare Execute Result CleanupPlan

QE

Page 89: HAWQ Architecture HL++ 2015 Moscow

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Resource Prepare Execute Result CleanupPlan

QE

Page 90: HAWQ Architecture HL++ 2015 Moscow

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Resource Prepare Execute Result CleanupPlan

QE

Page 91: HAWQ Architecture HL++ 2015 Moscow

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Resource Prepare Execute Result CleanupPlan

QE

Page 92: HAWQ Architecture HL++ 2015 Moscow

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Resource Prepare Execute Result CleanupPlan

QE ScanBarsb

HashJoinb.name = s.bar

ScanSellss Filterb.city = 'San Francisco'

Projects.beer, s.price

MotionGather

MotionRedist(b.name)

Page 93: HAWQ Architecture HL++ 2015 Moscow

Plan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Prepare Execute Result Cleanup

QE

Resource

Page 94: HAWQ Architecture HL++ 2015 Moscow

Plan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Prepare Execute Result Cleanup

QE

Resource

Ineed5containersEachwith1CPUcoreand256MBRAM

Page 95: HAWQ Architecture HL++ 2015 Moscow

Plan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Prepare Execute Result Cleanup

QE

Resource

Ineed5containersEachwith1CPUcoreand256MBRAM

Server1:2containersServer2:1containerServerN:2containers

Page 96: HAWQ Architecture HL++ 2015 Moscow

Plan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Prepare Execute Result Cleanup

QE

Resource

Ineed5containersEachwith1CPUcoreand256MBRAM

Server1:2containersServer2:1containerServerN:2containers

Page 97: HAWQ Architecture HL++ 2015 Moscow

Plan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Prepare Execute Result Cleanup

QE

Resource

Ineed5containersEachwith1CPUcoreand256MBRAM

Server1:2containersServer2:1containerServerN:2containers

Page 98: HAWQ Architecture HL++ 2015 Moscow

Plan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Prepare Execute Result Cleanup

QE

Resource

Ineed5containersEachwith1CPUcoreand256MBRAM

Server1:2containersServer2:1containerServerN:2containers

QE QE QE QE QE

Page 99: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Execute Result Cleanup

QE

QE QE QE QE QE

Prepare

Page 100: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Execute Result Cleanup

QE

QE QE QE QE QE

Prepare

ScanBarsb

HashJoinb.name = s.bar

ScanSellss Filterb.city = 'San Francisco'

Projects.beer, s.price

MotionGather

MotionRedist(b.name)

Page 101: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Execute Result Cleanup

QE

QE QE QE QE QE

Prepare

ScanBarsb

HashJoinb.name = s.bar

ScanSellss Filterb.city = 'San Francisco'

Projects.beer, s.price

MotionGather

MotionRedist(b.name)

Page 102: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Result Cleanup

QE

QE QE QE QE QE

Prepare Execute

ScanBarsb

HashJoinb.name = s.bar

ScanSellss Filterb.city = 'San Francisco'

Projects.beer, s.price

MotionGather

MotionRedist(b.name)

Page 103: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Result Cleanup

QE

QE QE QE QE QE

Prepare Execute

ScanBarsb

HashJoinb.name = s.bar

ScanSellss Filterb.city = 'San Francisco'

Projects.beer, s.price

MotionGather

MotionRedist(b.name)

Page 104: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Result Cleanup

QE

QE QE QE QE QE

Prepare Execute

ScanBarsb

HashJoinb.name = s.bar

ScanSellss Filterb.city = 'San Francisco'

Projects.beer, s.price

MotionGather

MotionRedist(b.name)

Page 105: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Cleanup

QE

QE QE QE QE QE

Prepare Execute Result

Page 106: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Cleanup

QE

QE QE QE QE QE

Prepare Execute Result

Page 107: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Cleanup

QE

QE QE QE QE QE

Prepare Execute Result

Page 108: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

QE

QE QE QE QE QE

Prepare Execute Result Cleanup

Page 109: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

QE

QE QE QE QE QE

Prepare Execute Result Cleanup

FreequeryresourcesServer1:2containersServer2:1containerServerN:2containers

Page 110: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

QE

QE QE QE QE QE

Prepare Execute Result Cleanup

FreequeryresourcesServer1:2containersServer2:1containerServerN:2containers

OK

Page 111: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

QE

QE QE QE QE QE

Prepare Execute Result Cleanup

FreequeryresourcesServer1:2containersServer2:1containerServerN:2containers

OK

Page 112: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

QE

QE QE QE QE QE

Prepare Execute Result Cleanup

FreequeryresourcesServer1:2containersServer2:1containerServerN:2containers

OK

Page 113: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

QE

QE QE QE QE QE

Prepare Execute Result Cleanup

FreequeryresourcesServer1:2containersServer2:1containerServerN:2containers

OK

Page 114: HAWQ Architecture HL++ 2015 Moscow

ResourcePlan

Query Example HAWQMaster

Metadata

TransacJonMgr.

QueryParser QueryOpJmizer

QueryDispatch

ResourceMgr.

NameNode

Server1

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

Server2

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

ServerN

Localdirectory

HAWQSegmentPostmaster

HDFSDatanode

YARNRMPostmaster

Prepare Execute Result Cleanup

Page 115: HAWQ Architecture HL++ 2015 Moscow

Query Performance •  Data does not hit the disk unless this cannot be

avoided

Page 116: HAWQ Architecture HL++ 2015 Moscow

Query Performance •  Data does not hit the disk unless this cannot be

avoided

•  Data is not buffered on the segments unless this cannot be avoided

Page 117: HAWQ Architecture HL++ 2015 Moscow

Query Performance •  Data does not hit the disk unless this cannot be

avoided

•  Data is not buffered on the segments unless this cannot be avoided

•  Data is transferred between the nodes by UDP

Page 118: HAWQ Architecture HL++ 2015 Moscow

Query Performance •  Data does not hit the disk unless this cannot be

avoided

•  Data is not buffered on the segments unless this cannot be avoided

•  Data is transferred between the nodes by UDP

•  HAWQ has a good cost-based query optimizer

Page 119: HAWQ Architecture HL++ 2015 Moscow

Query Performance •  Data does not hit the disk unless this cannot be

avoided

•  Data is not buffered on the segments unless this cannot be avoided

•  Data is transferred between the nodes by UDP

•  HAWQ has a good cost-based query optimizer

•  C/C++ implementation is more efficient than Java implementation of competitive solutions

Page 120: HAWQ Architecture HL++ 2015 Moscow

Query Performance •  Data does not hit the disk unless this cannot be

avoided

•  Data is not buffered on the segments unless this cannot be avoided

•  Data is transferred between the nodes by UDP

•  HAWQ has a good cost-based query optimizer

•  C/C++ implementation is more efficient than Java implementation of competitive solutions

•  Query parallelism can be easily tuned

Page 121: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

Page 122: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

ANSISQL

Page 123: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

ANSISQL

Built-inLanguages

Page 124: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

ANSISQL

Built-inLanguages

DiskIO

Page 125: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

ANSISQL

Built-inLanguages

DiskIO

Parallelism

Page 126: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

ANSISQL

Built-inLanguages

DiskIO

Parallelism

DistribuJons

Page 127: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

ANSISQL

Built-inLanguages

DiskIO

Parallelism

DistribuJons

Stability

Page 128: HAWQ Architecture HL++ 2015 Moscow

Competitive Solutions Hive SparkSQL Impala HAWQ

QueryOpJmizer

ANSISQL

Built-inLanguages

DiskIO

Parallelism

DistribuJons

Stability

Community

Page 129: HAWQ Architecture HL++ 2015 Moscow

Roadmap •  AWS and S3 integration

Page 130: HAWQ Architecture HL++ 2015 Moscow

Roadmap •  AWS and S3 integration

•  Mesos integration

Page 131: HAWQ Architecture HL++ 2015 Moscow

Roadmap •  AWS and S3 integration

•  Mesos integration

•  Better Ambari integration

Page 132: HAWQ Architecture HL++ 2015 Moscow

Roadmap •  AWS and S3 integration

•  Mesos integration

•  Better Ambari integration

•  Cloudera, MapR and IBM Hadoop distributions native support

Page 133: HAWQ Architecture HL++ 2015 Moscow

Roadmap •  AWS and S3 integration

•  Mesos integration

•  Better Ambari integration

•  Cloudera, MapR and IBM Hadoop distributions native support

•  Make the SQL-on-Hadoop engine ever!

Page 134: HAWQ Architecture HL++ 2015 Moscow

Summary

•  Modern SQL-on-Hadoop engine

•  For structured data processing and analysis

•  Combines the best techniques of competitive solutions

•  Just released to the open source

•  Community is very young

Join our community and contribute!

Page 135: HAWQ Architecture HL++ 2015 Moscow

Questions

Apache HAWQ

http://hawq.incubator.apache.org

[email protected]

[email protected]

Reach me on http://0x0fff.com