Intro to Pinot (2016-01-04)

an introduction to pinot

Jean-François Im <[email protected]> 2016-01-04 Tue

outline

Introduction

When to use Pinot?

An overview of the Pinot architecture

Managing Data in Pinot

Data storage

Realtime data in Pinot

Retention

Conclusion


introduction

what is pinot?

∙ Distributed near-realtime OLAP datastore

∙ Used at LinkedIn for various user-facing (“Who viewed my profile,” publisher analytics, etc.), client-facing (ad campaign creation and tracking) and internal analytics (XLNT, EasyBI, Raptor, etc.)

what is pinot

∙ Offers a SQL query interface on top of a custom-written data store

∙ Offers near-realtime ingestion of events from Kafka (a few seconds of latency at most)

∙ Supports pushing data from Hadoop

∙ Can combine data from Hadoop and Kafka at runtime

∙ Scales horizontally and linearly if data size or query rate increases

∙ Fault tolerant (any component can fail without causing availability issues, no single point of failure)

∙ Automatic data expiration

example of queries

SELECT weeksSinceEpochSunday, distinctCount(viewerId)
FROM mirrorProfileViewEvents
WHERE vieweeId = ... AND
  (viewerPrivacySetting = 'F' OR ... OR viewerPrivacySetting = '') AND
  daysSinceEpoch >= 16624 AND daysSinceEpoch <= 16714
GROUP BY weeksSinceEpochSunday
TOP 20 LIMIT 0
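The daysSinceEpoch bounds in the query are easier to read as calendar dates. A minimal sketch, assuming the column counts whole days since 1970-01-01 (the helper name is hypothetical and not part of Pinot):

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def days_since_epoch_to_date(days: int) -> date:
    """Convert a day count since 1970-01-01 into a calendar date."""
    return EPOCH + timedelta(days=days)

# Bounds taken from the example query above.
start = days_since_epoch_to_date(16624)
end = days_since_epoch_to_date(16714)
print(start, end, (end - start).days)
```

Under that assumption the query covers 2015-07-08 through 2015-10-06, a 90-day window.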



how does “who viewed my profile” work?


usage of pinot at linkedin

∙ Over 50 use cases at LinkedIn

∙ Several thousand queries per second across multiple data centers

∙ Operates 24x7, exposes metrics for production monitoring

∙ The internal de facto solution for scalable data querying


when to use pinot?

design limitations

∙ Pinot is designed for analytical workloads (OLAP), not transactional ones (OLTP)

∙ Data in Pinot is immutable (e.g. no UPDATE statement), though it can be overwritten in bulk

∙ Realtime data is append-only (can only load new rows)

∙ There is no support for JOINs or subselects

∙ There are no UDFs for aggregation (work in progress)


when to use pinot?

∙ When you have an analytics problem (How many of “x” happened?)

∙ When you have many queries per day and require low query latency (otherwise use Hadoop for one-time ad hoc queries)

∙ When you can’t pre-aggregate data to be stored in some other storage system (otherwise use Voldemort or an OLAP cubing solution)


an overview of the pinot architecture

controller, broker and server

∙ There are three components in Pinot: controller, broker and server

∙ Controller: Handles cluster-wide coordination using Apache Helix and Apache Zookeeper

∙ Broker: Handles query fan out and query routing to servers

∙ Server: Responds to query requests originating from the brokers
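The broker's fan-out and merge step can be illustrated with a toy scatter-gather sketch. This is not Pinot code: the Server and Broker classes below are hypothetical in-memory stand-ins, and the controller's coordination role is not modeled.

```python
from collections import Counter

class Server:
    """Toy server: holds some rows and answers a group-by count locally."""

    def __init__(self, rows):
        self.rows = rows

    def count_by(self, column):
        return Counter(row[column] for row in self.rows)

class Broker:
    """Toy broker: fans a query out to every server and merges the partial results."""

    def __init__(self, servers):
        self.servers = servers

    def count_by(self, column):
        total = Counter()
        for server in self.servers:
            # Each server only sees its own rows; the broker sums the partial counts.
            total.update(server.count_by(column))
        return dict(total)

servers = [
    Server([{"country": "US"}, {"country": "CA"}]),
    Server([{"country": "US"}, {"country": "FR"}]),
]
broker = Broker(servers)
result = broker.count_by("country")
print(result)
```

The point of the split is that adding servers spreads both the data and the per-query work, while the broker only handles the small merged results.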



∙ All of these components are redundant, so there is no single point of failure by design

∙ Uses Zookeeper as a coordination mechanism


managing data in pinot

getting data into pinot

∙ Let’s first look at the offline case. We have data in Hadoop that we would like to get into Pinot.



∙ Data in Pinot is packaged into segments, which contain a set of rows

∙ These are then uploaded into Pinot



∙ A segment is a pre-built index over this set of rows

∙ Data in Pinot is stored in columnar format (we’ll get to this later)

∙ Each input Avro file maps to one Pinot segment



∙ The name of each generated segment file contains both the minimum and maximum timestamp found in its data

∙ Each segment file name also has a sequential number appended to the end

∙ mirrorProfileViewEvents_2015-10-04_2015-10-04_0
∙ mirrorProfileViewEvents_2015-10-04_2015-10-04_1
∙ mirrorProfileViewEvents_2015-10-04_2015-10-04_2
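The naming pattern in the examples above can be reproduced with a short sketch. The function name and its parameters are hypothetical; only the resulting name format comes from the examples.

```python
from datetime import date

def segment_name(table: str, min_day: date, max_day: date, sequence: int) -> str:
    """Build a name of the form <table>_<min date>_<max date>_<sequence number>."""
    return f"{table}_{min_day.isoformat()}_{max_day.isoformat()}_{sequence}"

# Three segments generated for the same day, as in the examples above.
day = date(2015, 10, 4)
names = [segment_name("mirrorProfileViewEvents", day, day, i) for i in range(3)]
for name in names:
    print(name)
```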



∙ Data uploaded into Pinot is stored on a segment basis

∙ Uploading a segment with the same name overwrites the data that currently exists in that segment

∙ This is the only way to update data in Pinot
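The overwrite-by-name behavior resembles a map keyed by segment name. A toy sketch, using a hypothetical SegmentStore class rather than any real Pinot API:

```python
class SegmentStore:
    """Toy stand-in for a table's segment storage, keyed by segment name."""

    def __init__(self):
        self._segments = {}

    def upload(self, name, rows):
        """Store a segment; return True if an existing segment was overwritten."""
        replaced = name in self._segments
        self._segments[name] = list(rows)
        return replaced

    def all_rows(self):
        return [row for name in sorted(self._segments) for row in self._segments[name]]

store = SegmentStore()
name = "mirrorProfileViewEvents_2015-10-04_2015-10-04_0"
first = store.upload(name, [{"viewerId": 1}])
# Re-uploading under the same name replaces the whole segment; rows are never edited in place.
second = store.upload(name, [{"viewerId": 1}, {"viewerId": 2}])
print(first, second, store.all_rows())
```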


data storage

data orientation: rows and columns

∙ Most OLTP databases store data in a row-oriented format

∙ Pinot stores its data in a column-oriented format

∙ If you have heard the terms array of structures (AoS) and structure of arrays (SoA), this is the same idea
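The AoS/SoA distinction can be shown with plain Python containers. This is only an illustration of the two layouts; the column names and values are made up for the example.

```python
# Row-oriented (array of structures): one record per row.
rows = [
    {"viewerId": 1, "country": "US", "views": 3},
    {"viewerId": 2, "country": "CA", "views": 5},
    {"viewerId": 3, "country": "US", "views": 2},
]

# Column-oriented (structure of arrays): one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every full record is visited even though only "views" is needed.
total_from_rows = sum(row["views"] for row in rows)

# Column layout: only the "views" column is touched.
total_from_columns = sum(columns["views"])

print(columns["country"], total_from_rows, total_from_columns)
```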



benefits of column-orientation

∙ Queries only read the data they need (columns not used in a query are not read)

∙ Individual row lookups are slower, aggregations are faster

∙ Compression can be a lot more effective, as related data is packed together


a couple of tricks

∙ Pinot uses a couple of techniques to reduce data size

∙ Dictionary encoding allows us to deduplicate repetitive data in a single column (e.g. country, state, gender)

∙ Bit packing allows us to pack multiple values in the same byte/word/dword
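Both techniques can be sketched in a few lines. This illustrates the general idea only, not Pinot's on-disk format: the function names are hypothetical, and a Python integer stands in for the packed bytes.

```python
def dictionary_encode(values):
    """Replace each value by its index in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]

def bits_needed(cardinality):
    """Number of bits required to represent ids 0..cardinality-1 (at least 1)."""
    return max(1, (cardinality - 1).bit_length())

def pack(ids, width):
    """Pack fixed-width ids side by side into one integer."""
    packed = 0
    for position, value in enumerate(ids):
        packed |= value << (position * width)
    return packed

def unpack(packed, width, count):
    """Recover `count` fixed-width ids from a packed integer."""
    mask = (1 << width) - 1
    return [(packed >> (position * width)) & mask for position in range(count)]

countries = ["US", "CA", "US", "FR", "US", "CA"]
dictionary, ids = dictionary_encode(countries)
width = bits_needed(len(dictionary))
packed = pack(ids, width)
decoded = [dictionary[i] for i in unpack(packed, width, len(ids))]
print(dictionary, ids, width, decoded == countries)
```

With three distinct countries, each value needs only 2 bits, so the six values fit in 12 bits instead of six strings.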


realtime data in pinot

tables: offline and realtime

∙ Pinot has two kinds of tables: offline and realtime

∙ An offline table stores data that has been pushed from Hadoop, while a realtime table sources its data from Kafka

∙ These two tables are disjoint and can contain the same data


data ingestion

∙ Realtime data ingestion is done through Kafka

∙ In the open source release, there is a JSON decoder and an Avro decoder for messages

∙ This architecture allows plugging in new data ingestion sources (e.g. other message queuing systems), though at this time there are no other sources implemented


hybrid querying

∙ Since realtime and offline tables are disjoint, how are they queried?

∙ If an offline table and a realtime table have the same name, a broker that receives a query rewrites it into two queries, one for the offline table and one for the realtime table


hybrid querying

∙ Data is partitioned according to a time column, with a preference given to offline data
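One way to picture the time-based split is a boundary on the time column: offline data answers everything up to the boundary and realtime data answers only what is newer. The sketch below assumes the boundary is the latest time value present in the offline table; that choice and the function name are assumptions for illustration, not a description of the broker's actual rule.

```python
def hybrid_count(offline_rows, realtime_rows, time_column):
    """Count rows across both tables without double counting overlapping days."""
    # Assumed boundary: the latest time value present in the offline table.
    boundary = max(row[time_column] for row in offline_rows)
    # Offline data is preferred for everything up to and including the boundary.
    offline_hits = sum(1 for row in offline_rows if row[time_column] <= boundary)
    # Realtime data only fills in what is newer than the boundary.
    realtime_hits = sum(1 for row in realtime_rows if row[time_column] > boundary)
    return offline_hits + realtime_hits

# The two tables overlap on days 16711 and 16712.
offline = [{"daysSinceEpoch": d} for d in (16710, 16711, 16712)]
realtime = [{"daysSinceEpoch": d} for d in (16711, 16712, 16713, 16714)]
total = hybrid_count(offline, realtime, "daysSinceEpoch")
print(total)
```

Here the overlapping days are served from the offline rows only, so the five distinct days are each counted once.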


advantages of combining offline data and realtime data

∙ Since there are two data sources for the same data, if there is an issue with one (e.g. Kafka/Samza issue or Hadoop cluster issue), the other one is used to answer queries

∙ This means that you don’t get called in the middle of the night for data-related issues and there’s a large time window for fixing issues


retention

retention

∙ Tables in Pinot can have a customizable retention period

∙ Segments will be expunged automatically when their last timestamp is past the retention period

∙ This is done by a process called the retention manager
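The expiry rule can be sketched as a comparison between each segment's last timestamp and a cutoff. This is a simplified model with hypothetical names; it assumes timestamps are expressed in days since the epoch and says nothing about how the retention manager is actually scheduled.

```python
def expired_segments(last_day_by_segment, today, retention_days):
    """Return the names of segments whose last timestamp is past the retention period."""
    cutoff = today - retention_days
    return sorted(
        name for name, last_day in last_day_by_segment.items() if last_day < cutoff
    )

# Hypothetical segments, keyed by name, with their last day (days since epoch).
segments = {"seg_a": 16600, "seg_b": 16650, "seg_c": 16714}
to_delete = expired_segments(segments, today=16714, retention_days=90)
print(to_delete)
```

With a 90-day retention and today at day 16714, the cutoff is day 16624, so only seg_a is selected.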


retention

∙ Offline and realtime tables have different retention periods. For example, “who viewed my profile?” has a realtime retention period of seven days and an offline retention period of 90 days.

∙ This means that even if the Hadoop job doesn’t run for a couple of days, data from the realtime flow will answer the query


conclusion

conclusion

∙ Pinot is a realtime distributed analytical data store that can handle interactive analytical queries running on large amounts of data

∙ It’s used for various internal and external use cases at LinkedIn

∙ It’s open source! (github.com/linkedin/pinot)

∙ Ping me if you want to deploy it, I’ll help you out

