HiveKa: Hive on Kafka

Szehon Ho, Ashish Singh

2 © 2014 Cloudera, Inc. All rights reserved.

Background
• HiveKa was our Cloudera 2014 Hackathon project, built by:
  • Ashish Singh
  • Gwen Shapira
  • Szehon Ho

Background
• Goal: enable SQL on all of a user's data, including data held in a Kafka cluster
• Implementation: a HiveStorageHandler for Kafka

Apache Kafka
• General-purpose, distributed publish-subscribe framework developed at LinkedIn
  • Addresses the ingest problem: how to get data into Hadoop
  • Standardizes data pipelines, eliminating ad-hoc ones
  • Scalable, resilient, and low-latency

Apache Kafka
• Producer
• Consumer
• Cluster = brokers
  • Message store
  • Message replication
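To make the broker roles above concrete, here is a toy in-memory model of one replicated partition. It is a simplified sketch, not Kafka's implementation: every follower is assumed to stay in sync, and the names `Replica`, `ReplicatedPartition`, `produce`, `replicate`, and `consume` are hypothetical. It illustrates the core idea behind replication: producers write to the partition leader, followers copy the leader's log, and a message becomes visible to consumers only once all in-sync replicas hold it (the high watermark).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    """One broker's copy of a single partition log (toy model)."""
    broker_id: int
    log: List[bytes] = field(default_factory=list)

    @property
    def log_end_offset(self) -> int:
        # Offset the next appended message would receive.
        return len(self.log)


@dataclass
class ReplicatedPartition:
    """A partition with one leader; followers are assumed to be in sync."""
    leader: Replica
    followers: List[Replica]

    def produce(self, message: bytes) -> int:
        """Producers write to the leader only; returns the new message's offset."""
        self.leader.log.append(message)
        return self.leader.log_end_offset - 1

    def replicate(self, follower: Replica) -> None:
        """A follower fetches every message it is missing from the leader."""
        follower.log.extend(self.leader.log[follower.log_end_offset:])

    @property
    def high_watermark(self) -> int:
        # A message is committed once every in-sync replica has it, so the
        # high watermark is the smallest log-end offset across replicas.
        return min(r.log_end_offset for r in [self.leader, *self.followers])

    def consume(self, offset: int) -> List[bytes]:
        """Consumers only see committed messages: [offset, high watermark)."""
        return self.leader.log[offset:self.high_watermark]
```

For example, after two `produce` calls and before any `replicate`, `consume(0)` returns an empty list; once both followers have fetched from the leader, it returns both messages.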

Apache Kafka
• Message
  • Key-value pair
  • Offset: the message's position within its partition
• Topics
  • Partitions
    • Messages are ordered within a partition
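The message model above can be sketched as a toy in-memory topic. This is an illustration under simplifying assumptions: the `Topic` class and its methods are hypothetical, partitions are plain Python lists, and the key-to-partition mapping uses CRC32 rather than the murmur2 hash used by Kafka's default partitioner. What carries over is the structure: each key-value message is appended to exactly one partition, its offset is its index in that partition, and messages with the same key land in the same partition (as long as the partition count does not change), so their relative order is preserved.

```python
import zlib
from typing import List, Tuple


class Topic:
    """Toy in-memory topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions: List[List[Tuple[bytes, bytes]]] = [
            [] for _ in range(num_partitions)
        ]

    def partition_for(self, key: bytes) -> int:
        # Simplified keyed partitioner (CRC32 modulo partition count);
        # Kafka's default partitioner uses a different hash (murmur2).
        return zlib.crc32(key) % len(self.partitions)

    def send(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a key-value message; return (partition, offset)."""
        p = self.partition_for(key)
        log = self.partitions[p]
        log.append((key, value))
        return p, len(log) - 1  # offsets are per partition, starting at 0

    def read(self, partition: int, offset: int) -> List[Tuple[int, bytes, bytes]]:
        """Return (offset, key, value) tuples from `offset` on, in append order."""
        log = self.partitions[partition]
        return [(o, k, v) for o, (k, v) in enumerate(log[offset:], start=offset)]
```

Ordering holds per partition, not across the topic: messages sent with different keys may land in different partitions, and no order is defined between them.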

Existing Solution: Camus
• LinkedIn developed a Kafka → HDFS pipeline called Camus:
  1. Camus's InputFormat pulls the latest messages from Kafka into HDFS
  2. A pluggable MessageDecoder converts Kafka message bytes into Writables
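A rough sketch of the incremental-pull idea behind a Camus-style job follows. It is not Camus's API: `pull_new_messages` and `json_decoder` are hypothetical, Kafka is stood in for by a dict mapping each partition to its message list, and records are plain dicts instead of Hadoop Writables. Each run starts from the offsets committed by the previous run, reads up to the latest offset in every partition, passes each message through the pluggable decoder, and returns the new offsets to commit.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical pluggable decoder: raw Kafka message bytes -> record.
# (Camus decodes into Hadoop Writables; a dict stands in for one here.)
Decoder = Callable[[bytes], dict]


def json_decoder(raw: bytes) -> dict:
    """Example decoder for JSON-encoded messages."""
    return json.loads(raw.decode("utf-8"))


def pull_new_messages(
    logs: Dict[int, List[bytes]],  # partition -> messages currently in Kafka
    committed: Dict[int, int],     # partition -> next offset to read
    decode: Decoder,
) -> Tuple[List[dict], Dict[int, int]]:
    """One batch run: read each partition from its committed offset up to
    its latest offset, decode every message, and return the records along
    with the offsets to commit for the next run."""
    records: List[dict] = []
    new_committed = dict(committed)
    for partition, log in logs.items():
        start = committed.get(partition, 0)
        latest = len(log)  # offset just past the newest message
        records.extend(decode(raw) for raw in log[start:latest])
        new_committed[partition] = latest
    return records, new_committed
```

Running it twice shows the incremental behavior: the second run returns only the messages appended after the first run's committed offsets, and swapping `json_decoder` for another function changes the message format without touching the pull logic.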

HiveKa
• We implemented Hive storage handlers to access Kafka messages directly from Hive
  • ETL: load data directly into Hive, bypassing Camus
  • Analytics: run Hive queries directly on Kafka data
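One way a Kafka storage handler can expose a topic as a Hive table is sketched below: plan one input split per Kafka partition covering an offset range, then have a record reader deserialize each message into a row. This is a hypothetical Python model of that flow (`KafkaSplit`, `plan_splits`, `read_split`, and `scan` are invented names), not HiveKa's code; an actual Hive storage handler supplies Java InputFormat and SerDe classes to Hive.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Optional


@dataclass(frozen=True)
class KafkaSplit:
    """Hypothetical input split: one Kafka partition, offsets [start, end)."""
    topic: str
    partition: int
    start: int
    end: int


def plan_splits(topic: str, logs: Dict[int, List[bytes]],
                start_offsets: Dict[int, int]) -> List[KafkaSplit]:
    # One split per non-empty partition range, so partitions can be read in parallel.
    return [KafkaSplit(topic, p, start_offsets.get(p, 0), len(log))
            for p, log in sorted(logs.items())
            if start_offsets.get(p, 0) < len(log)]


def read_split(split: KafkaSplit, logs: Dict[int, List[bytes]],
               deserialize: Callable[[bytes], dict]) -> Iterator[dict]:
    # Record reader: turn each raw message into a row via a SerDe-like function.
    for raw in logs[split.partition][split.start:split.end]:
        yield deserialize(raw)


def scan(topic: str, logs: Dict[int, List[bytes]],
         deserialize: Callable[[bytes], dict],
         start_offsets: Optional[Dict[int, int]] = None) -> Iterator[dict]:
    """Table scan: every row from every split, i.e. what a query would read."""
    for split in plan_splits(topic, logs, start_offsets or {}):
        yield from read_split(split, logs, deserialize)
```

An analytic query then becomes ordinary iteration over `scan(...)`, for example counting rows whose `page` field is `"/cart"`. Passing per-partition `start_offsets` turns the same scan into an incremental ETL load that reads only new messages.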

KafkaStorageHandler

Demo

HiveKa Design

• Future work:
  • Avro schema support
  • Expose pluggable MessageDecoder/SerDe pairs for different Kafka message formats
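The proposed pluggable MessageDecoder/SerDe pairs could look roughly like the registry below, where a table-level setting names the message format and the handler looks up the matching pair. Everything here is hypothetical: `CodecPair`, `REGISTRY`, and `deserialize` are illustrative names, and only JSON and CSV are wired up because they need nothing beyond the Python standard library; an Avro pair would additionally need a schema and an Avro library.

```python
import csv
import io
import json
from typing import Callable, Dict, List, NamedTuple


class CodecPair(NamedTuple):
    """Hypothetical pairing of a message decoder with a matching SerDe step."""
    decode: Callable[[bytes], object]               # message bytes -> intermediate value
    to_row: Callable[[object, List[str]], list]     # intermediate -> row in column order


def _json_to_row(obj, columns):
    # Map JSON fields to columns by name; missing fields become None.
    return [obj.get(c) for c in columns]


def _csv_decode(raw):
    # Parse a single CSV line into a list of string fields.
    return next(csv.reader(io.StringIO(raw.decode("utf-8"))))


def _csv_to_row(fields, columns):
    # Positional mapping: the first N fields fill the N columns.
    return fields[:len(columns)]


REGISTRY: Dict[str, CodecPair] = {
    "json": CodecPair(json.loads, _json_to_row),
    "csv": CodecPair(_csv_decode, _csv_to_row),
    # An "avro" pair would be registered the same way (schema + Avro library).
}


def deserialize(fmt: str, raw: bytes, columns: List[str]) -> list:
    """Pick the decoder/SerDe pair by format name, e.g. from a table property."""
    try:
        pair = REGISTRY[fmt]
    except KeyError:
        raise ValueError(f"no decoder/SerDe pair registered for {fmt!r}") from None
    return pair.to_row(pair.decode(raw), columns)
```

Keeping the decoder and the row mapping together in one pair means a new message format is supported by registering a single entry, rather than by changing the storage handler itself.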

Conclusion
• Guide to implementing Hive storage handlers: http://szehon3.wordpress.com/2014/11/09/kafkaesque-hive-thoughts-on-storage-handlers/
• Website with source code and examples: http://hiveka.weebly.com/
• Source code: https://github.com/HiveKa/HiveKa
• We will contribute HiveKa back to Hive

Thank you.