April 16 Hangout [email protected] MapR Technologies

Apache Drill Hangout Update, April 16



Page 2: Apache Drill Hangout Update, April 16

Key Design Goals (reminder)

• Keep it off-heap for data, on-heap for metadata: heap < 0.5 GB
• Support reasonable JNI interplay as desired
• Specified, compatible wire level formats
• Pipelined vectorized columnar execution
• Nested data and late schema
• Full SQL

Page 3: Apache Drill Hangout Update, April 16

Julian Hyde’s work on SQL parser

• GitHub push soon
• Support for basic scan, project and filter.
  – Includes sub-queries, scalar function pass-through, nested references and the any data type
• Next Up: Group By, Union, Join

Page 4: Apache Drill Hangout Update, April 16

Topics

• Configuration
• In Memory Formats
• Schema Management
• RPC Framework
• Specific RPC protocols
• Cluster Coordination and cache

Page 5: Apache Drill Hangout Update, April 16

Configuration

• Leverage HOCON for modular configuration
  – JSON++ for configurations: allows composite configuration definitions and looser syntax.
• Hierarchical precedence
  – Common module loads drill-default.conf top-level configuration.
  – All other classpath drill-module.conf files loaded to integrate additional classes
  – drill-override.conf provides user-level properties
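The layering can be pictured as a merge in which later layers win. The sketch below is a stdlib-only stand-in, not the HOCON loader: it assumes the precedence order is defaults, then modules, then overrides, and the keys and values (`example.rpc.port` and so on) are invented for illustration. Real HOCON also merges nested objects, which this flat map does not attempt.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of layered configuration; not Drill's actual loader.
final class LayeredConfigSketch {

    /** Merges flat key/value layers in order; a later layer overrides an earlier one. */
    static Map<String, String> merge(List<Map<String, String>> layersLowToHigh) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> layer : layersLowToHigh) {
            merged.putAll(layer);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in for drill-default.conf (keys are made up).
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("example.rpc.port", "31010");
        defaults.put("example.batch.target", "256k");

        // Stand-in for one drill-module.conf found on the classpath.
        Map<String, String> module = new LinkedHashMap<>();
        module.put("example.storage.reader", "com.example.MyReader");

        // Stand-in for drill-override.conf (user-level properties).
        Map<String, String> override = new LinkedHashMap<>();
        override.put("example.rpc.port", "41010");

        Map<String, String> merged = merge(Arrays.asList(defaults, module, override));
        System.out.println(merged);
    }
}
```

In this arrangement a user only has to state the properties that differ from the shipped defaults.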

Page 6: Apache Drill Hangout Update, April 16

Schema Management and RecordBatch

• RecordBatch is the relational operator unit of work
• Targets ~256k in size, designed to fit in single core L2 cache
• Internally manages a set of fields
  – Focused on fields required for completion of the query. Inference provides some type information.
  – Untouched or asterisk fields may be stored in secondary compound inline fields depending on RecordReader implementation
• Each next() call moves forward the set of records
  – Each movement forward informs whether a new schema was found; if so, consumer should reconfigure based on updated schema.
  – Schema can be expanded: from one type to any type. Ultimately may be able to contract as well (e.g. nullable to not-nullable).
  – An incoming schema changing does not necessarily modify an outgoing schema.
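A minimal sketch of the next() contract described above: each call advances to the next batch and tells the consumer whether the schema changed. The names `Outcome`, `Batch` and `BatchSource` are hypothetical and chosen for illustration; the slides do not give the actual interface.

```java
import java.util.List;

// Hypothetical illustration of "next() reports schema changes"; not Drill's API.
final class RecordBatchSketch {

    /** Assumed result of a next() call. */
    enum Outcome { OK, OK_NEW_SCHEMA, NONE }

    /** A batch reduced to its field names and a record count. */
    static final class Batch {
        final List<String> schema;
        final int recordCount;

        Batch(List<String> schema, int recordCount) {
            this.schema = schema;
            this.recordCount = recordCount;
        }
    }

    /** Walks a fixed list of batches, flagging every batch whose schema differs from the previous one. */
    static final class BatchSource {
        private final List<Batch> batches;
        private int position = -1;

        BatchSource(List<Batch> batches) {
            this.batches = batches;
        }

        Outcome next() {
            if (position + 1 >= batches.size()) {
                return Outcome.NONE;
            }
            List<String> previous = position < 0 ? null : batches.get(position).schema;
            position++;
            // The first batch always counts as a new schema because there is no previous one.
            return batches.get(position).schema.equals(previous) ? Outcome.OK : Outcome.OK_NEW_SCHEMA;
        }

        Batch current() {
            return batches.get(position);
        }
    }

    /** Drains the source; returns {number of reconfigurations, total records seen}. */
    static int[] consume(BatchSource source) {
        int reconfigurations = 0;
        int records = 0;
        Outcome outcome;
        while ((outcome = source.next()) != Outcome.NONE) {
            if (outcome == Outcome.OK_NEW_SCHEMA) {
                reconfigurations++; // a real consumer would rebuild its per-field state here
            }
            records += source.current().recordCount;
        }
        return new int[] {reconfigurations, records};
    }
}
```

The consumer only pays the reconfiguration cost on the batches where the schema actually changes, which fits the late-schema design goal.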

Page 7: Apache Drill Hangout Update, April 16

In-memory Formats

• Values are managed in one of three ValueModes: ValueVector, RLE or Dict
• More concrete than some research work such as C-Store, but also allows for simpler implementation with most of the benefits.
• Physical plan describes the ValueMode of the particular fields (a field-level physical property).
• Depending on the particular requirements of a query and operator capabilities, data can be maintained in a compressed value-based structure.
  – Decision occurs at physical plan level prior to scans (requires format foreknowledge)
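To make the two compressed modes concrete, here is a small sketch of run-length and dictionary encoding over plain Java arrays. It illustrates the general techniques only; the class and method names are hypothetical, and the slides do not describe Drill's actual RLE or Dict layouts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic RLE and dictionary encoding; an illustration, not Drill's in-memory layout.
final class ValueModeSketch {

    /** Run-length encodes values into {value, runLength} pairs. */
    static List<int[]> rleEncode(int[] values) {
        List<int[]> runs = new ArrayList<>();
        for (int v : values) {
            int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
            if (last != null && last[0] == v) {
                last[1]++;                     // extend the current run
            } else {
                runs.add(new int[] {v, 1});    // start a new run
            }
        }
        return runs;
    }

    /** Expands {value, runLength} pairs back into the flat array. */
    static int[] rleDecode(List<int[]> runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run[1];
        }
        int[] out = new int[total];
        int i = 0;
        for (int[] run : runs) {
            for (int k = 0; k < run[1]; k++) {
                out[i++] = run[0];
            }
        }
        return out;
    }

    /** Dictionary encodes values; appends distinct values to dictionaryOut in first-seen order. */
    static int[] dictEncode(String[] values, List<String> dictionaryOut) {
        Map<String, Integer> index = new HashMap<>();
        int[] codes = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            Integer code = index.get(values[i]);
            if (code == null) {
                code = dictionaryOut.size();
                index.put(values[i], code);
                dictionaryOut.add(values[i]);
            }
            codes[i] = code;
        }
        return codes;
    }
}
```

An operator that can work on runs or codes directly avoids expanding the data, which is why the choice has to be known when the physical plan is built.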

Page 8: Apache Drill Hangout Update, April 16

ValueVector

• Primary common representation is ValueVector, a vectorized (array) uncompressed structure.

• Off-heap native buffers manually reference counted and fronted by Netty4’s ByteBuf abstraction.

• Support zero-copy transfer semantics when moving between operators.
• Zero data serialization/deserialization allows direct write to and from sockets along with batch level metadata
• Ultimately generate a JNI operator stub so that individual operators or groups of operators can be outside core system
• Designed to leverage shared mmap between StorageEngine record readers and Drillbits to minimize overhead and reduce necessity for storage engine level pushdown.
• Data Type variations include required, nullable and repeated
• First few implementations made such as SInt32, Variable Length bytes, Nullables
• Repeated will support cross field references as for record level and repeated-node boundaries.
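As a rough picture of a nullable fixed-width vector held off-heap, the sketch below stores 32-bit values and a validity bitmap in direct buffers. It swaps in `java.nio.ByteBuffer.allocateDirect` for Netty's ByteBuf so that it runs with the standard library alone, and it omits reference counting. The one-bit-per-slot validity layout and the class name are assumptions, not something the slides specify.

```java
import java.nio.ByteBuffer;

// Hypothetical nullable int32 vector backed by off-heap (direct) buffers.
final class NullableIntVectorSketch {
    private final ByteBuffer values;    // 4 bytes per slot
    private final ByteBuffer validity;  // 1 bit per slot; 0 means null (assumed layout)
    private final int capacity;

    NullableIntVectorSketch(int capacity) {
        this.capacity = capacity;
        // Direct buffers are zero-filled, so every slot starts out null.
        this.values = ByteBuffer.allocateDirect(capacity * 4);
        this.validity = ByteBuffer.allocateDirect((capacity + 7) / 8);
    }

    void set(int index, int value) {
        check(index);
        values.putInt(index * 4, value);
        int byteIndex = index >>> 3;
        validity.put(byteIndex, (byte) (validity.get(byteIndex) | (1 << (index & 7))));
    }

    void setNull(int index) {
        check(index);
        int byteIndex = index >>> 3;
        validity.put(byteIndex, (byte) (validity.get(byteIndex) & ~(1 << (index & 7))));
    }

    boolean isNull(int index) {
        check(index);
        return (validity.get(index >>> 3) & (1 << (index & 7))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("slot " + index + " is null");
        }
        return values.getInt(index * 4);
    }

    boolean isOffHeap() {
        return values.isDirect() && validity.isDirect();
    }

    private void check(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index + ", capacity " + capacity);
        }
    }
}
```

Because both buffers are contiguous native memory, they could in principle be handed to a socket or to native code without re-encoding, which is the property the bullets above rely on.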

Page 9: Apache Drill Hangout Update, April 16

RPC Framework

• Zero-copy byte buffer transfers wrapped in a protobuf envelope.

• Fully symmetric push+pull based protocol
• Top-level envelope utilizes standard protobuf envelope encoding so that any language can interact: CompleteRpcMessage
  – Composed of three parts: RpcHeader, ProtobufBody, RawBody. RawBody is optional (bytes).
  – For Java, we manually encode/decode the top level envelope so that we can keep RawBody off-heap
• Fully asynchronous using futures
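The three-part envelope can be sketched with simple length-prefixed framing. This is a deliberate simplification: the real envelope uses protobuf encoding, whereas the code below writes a 4-byte length before each part so that it needs only the standard library. The class and field names are hypothetical. The decoder copies the raw body into a direct buffer purely to show it staying off-heap; a real implementation would avoid that copy.

```java
import java.nio.ByteBuffer;

// Simplified stand-in for a header / body / optional raw body envelope (not protobuf).
final class RpcEnvelopeSketch {

    /** Decoded message; rawBody is null when the sender attached none. */
    static final class Message {
        final byte[] header;
        final byte[] body;
        final ByteBuffer rawBody;

        Message(byte[] header, byte[] body, ByteBuffer rawBody) {
            this.header = header;
            this.body = body;
            this.rawBody = rawBody;
        }
    }

    /** Frames the three parts as [len][header][len][body][len][raw]. */
    static ByteBuffer encode(byte[] header, byte[] body, ByteBuffer rawBody) {
        int rawLength = rawBody == null ? 0 : rawBody.remaining();
        ByteBuffer out = ByteBuffer.allocate(12 + header.length + body.length + rawLength);
        out.putInt(header.length).put(header);
        out.putInt(body.length).put(body);
        out.putInt(rawLength);
        if (rawBody != null) {
            out.put(rawBody.duplicate()); // duplicate() leaves the caller's position untouched
        }
        out.flip();
        return out;
    }

    /** Reads one framed message; small parts go on-heap, the raw body goes to a direct buffer. */
    static Message decode(ByteBuffer in) {
        byte[] header = new byte[in.getInt()];
        in.get(header);
        byte[] body = new byte[in.getInt()];
        in.get(body);
        int rawLength = in.getInt();
        ByteBuffer raw = null;
        if (rawLength > 0) {
            raw = ByteBuffer.allocateDirect(rawLength);
            ByteBuffer slice = in.slice();
            slice.limit(rawLength);
            raw.put(slice);
            raw.flip();
            in.position(in.position() + rawLength);
        }
        return new Message(header, body, raw);
    }
}
```

Keeping the bulk payload separate from the structured header and body is what lets the large part bypass object deserialization.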

Page 10: Apache Drill Hangout Update, April 16

RPC Protocols: Two Key Types

User to Bit
• UserClient and UserServer
• Supports RunQuery > Handle, RequestResults > QueryResult, CancelQuery > Ack
• Query Results mode can operate in: STREAM_FULL, STREAM_FIRST, QUERY_FOR_STATUS

Bit to Bit
• Each Drillbit can interact with all other Drillbits
• Locations are managed via a cluster cache
• Either Drillbit can act as server or client (bi-directional)
• Managed via BitCom, which maintains server sessions and client connections as necessary
• Supports activity such as fragment announcement, send record batch, node progress, cancel fragment.
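The User to Bit request/response pairs map naturally onto futures. The sketch below fakes the server in memory to show the shape of the three calls; the class name, method signatures and result strings are invented for illustration, and the result modes are left out because the slides do not define their semantics.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the user-facing RPC calls; not Drill's UserServer.
final class UserRpcSketch {

    static final class InMemoryServer {
        private final AtomicLong nextHandle = new AtomicLong(1);
        private final Map<Long, String> running = new ConcurrentHashMap<>();

        /** RunQuery > Handle: registers the query and completes with its handle. */
        CompletableFuture<Long> runQuery(String query) {
            return CompletableFuture.supplyAsync(() -> {
                long handle = nextHandle.getAndIncrement();
                running.put(handle, query);
                return handle;
            });
        }

        /** RequestResults > QueryResult: completes with a placeholder result string. */
        CompletableFuture<String> requestResults(long handle) {
            return CompletableFuture.supplyAsync(() -> {
                String query = running.get(handle);
                if (query == null) {
                    throw new IllegalStateException("unknown or cancelled handle " + handle);
                }
                return "result of: " + query;
            });
        }

        /** CancelQuery > Ack: completes with true if the handle was still registered. */
        CompletableFuture<Boolean> cancelQuery(long handle) {
            return CompletableFuture.supplyAsync(() -> running.remove(handle) != null);
        }
    }

    public static void main(String[] args) {
        InMemoryServer server = new InMemoryServer();
        // Chain the calls without blocking until the final join().
        String result = server.runQuery("select 1")
                .thenCompose(server::requestResults)
                .join();
        System.out.println(result);
    }
}
```

Returning a future from every call lets the caller compose requests, as in `main`, instead of holding a thread per outstanding query.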

Page 11: Apache Drill Hangout Update, April 16

Cluster Coordination and Cache

• Cluster coordination is done through ClusterCoordinator abstraction
  – Manages node-level service registration, currently singular across both RPC types
  – Leverages Netflix’s Curator framework
  – Manages a cache of available Drillbits and associated capabilities per node
  – Used by clients and Drillbits
• DistributedCache implemented through embedded Hazelcast
  – Sets up a distributed topic for queue depth management
  – Will be used for query plan caching, other shared state
  – Expected to be used only by Drillbits, not clients

Page 12: Apache Drill Hangout Update, April 16

Other Discussion

• Timothy: Overview of Supersonic exploration
• David: Ideas around HBase and other work

Page 13: Apache Drill Hangout Update, April 16

Where we need help

• Addition of Values operator to Reference Interpreter for SQL parser
• Modify reference interpreter to avoid modification of existing records
• First-level code reviews
• Physical plan definition and documentation
• More test cases
• TPC-H logical and physical plans
• Simple identity transformer/optimizer (logical > physical)
• Execution fragment format
• First full-execution level storage operator, potentially using mmap shared memory
• Foreman implementation for query processing management
• Review and evaluation of newer file formats and interaction with in-memory formats
• First POP implementations
• Lots of scalar function vector implementations