an introduction to pinot
Jean-François Im <[email protected]>
2016-01-04 Tue
outline
Introduction
When to use Pinot?
An overview of the Pinot architecture
Managing Data in Pinot
Data storage
Realtime data in Pinot
Retention
Conclusion
introduction
what is pinot?
∙ Distributed near-realtime OLAP datastore
∙ Used at LinkedIn for various user-facing (“Who viewed my profile,” publisher analytics, etc.), client-facing (ad campaign creation and tracking) and internal analytics (XLNT, EasyBI, Raptor, etc.)
∙ Offers a SQL query interface on top of a custom-written data store
∙ Offers near-realtime ingestion of events from Kafka (a few seconds of latency at most)
∙ Supports pushing data from Hadoop
∙ Can combine data from Hadoop and Kafka at runtime
∙ Scales horizontally and linearly if data size or query rate increases
∙ Fault tolerant (any component can fail without causing availability issues, no single point of failure)
∙ Automatic data expiration
example of queries
SELECT
  weeksSinceEpochSunday,
  distinctCount(viewerId)
FROM mirrorProfileViewEvents
WHERE vieweeId = ... AND
  (viewerPrivacySetting = 'F' OR
   ... OR viewerPrivacySetting = '') AND
  daysSinceEpoch >= 16624 AND
  daysSinceEpoch <= 16714
GROUP BY weeksSinceEpochSunday
TOP 20 LIMIT 0
how does “who viewed my profile” work?
usage of pinot at linkedin
∙ Over 50 use cases at LinkedIn
∙ Several thousand queries per second across multiple data centers
∙ Operates 24x7, exposes metrics for production monitoring
∙ The internal de facto solution for scalable data querying
when to use pinot?
design limitations
∙ Pinot is designed for analytical workloads (OLAP), not transactional ones (OLTP)
∙ Data in Pinot is immutable (e.g. no UPDATE statement), though it can be overwritten in bulk
∙ Realtime data is append-only (can only load new rows)
∙ There is no support for JOINs or subselects
∙ There are no UDFs for aggregation (work in progress)
when to use pinot?
∙ When you have an analytics problem (How many of “x” happened?)
∙ When you have many queries per day and require low query latency (otherwise use Hadoop for one-time ad hoc queries)
∙ When you can’t pre-aggregate data to be stored in some other storage system (otherwise use Voldemort or an OLAP cubing solution)
an overview of the pinot architecture
controller, broker and server
∙ There are three components in Pinot: controller, broker and server
∙ Controller: Handles cluster-wide coordination using Apache Helix and Apache Zookeeper
∙ Broker: Handles query fan out and query routing to servers
∙ Server: Responds to query requests originating from the brokers
∙ All of these components are redundant, so there is no single point of failure by design
∙ Uses Zookeeper as a coordination mechanism
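The broker's fan out can be pictured as a scatter-gather step. The sketch below is only an illustration under stated assumptions: the in-memory `SERVERS` table, the function names and the merge step are hypothetical and are not Pinot's actual API. Each "server" computes a partial aggregate over its own rows, and the "broker" merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "servers": each holds some rows of (country, views).
SERVERS = {
    "server-1": [("us", 3), ("fr", 1)],
    "server-2": [("us", 2), ("ca", 4)],
}


def server_execute(rows):
    """Server side: compute a partial sum of views per country."""
    partial = Counter()
    for country, views in rows:
        partial[country] += views
    return partial


def broker_execute(servers):
    """Broker side: fan the query out to every server, then merge the partials."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(server_execute, servers.values()))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


result = broker_execute(SERVERS)
```

Because each server only returns a small partial aggregate, adding servers spreads the scan work without changing what the broker has to merge.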
managing data in pinot
getting data into pinot
∙ Let’s first look at the offline case. We have data in Hadoop that we would like to get into Pinot.
∙ Data in Pinot is packaged into segments, which contain a set of rows
∙ These are then uploaded into Pinot
∙ A segment is a pre-built index over this set of rows
∙ Data in Pinot is stored in columnar format (we’ll get to this later)
∙ Each input Avro file maps to one Pinot segment
∙ The name of each generated segment file contains both the minimum and maximum timestamp of the data it holds
∙ Each segment file name also has a sequential number appended to the end
  ∙ mirrorProfileViewEvents_2015-10-04_2015-10-04_0
  ∙ mirrorProfileViewEvents_2015-10-04_2015-10-04_1
  ∙ mirrorProfileViewEvents_2015-10-04_2015-10-04_2
∙ Data uploaded into Pinot is stored on a segment basis
∙ Uploading a segment with the same name overwrites the data that currently exists in that segment
∙ This is the only way to update data in Pinot
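The naming scheme and the overwrite-by-name behaviour can be sketched in a few lines. This is a minimal illustration, not Pinot code: `segment_name` and `SegmentStore` are hypothetical helpers, and a plain dictionary stands in for the table's segment storage.

```python
from datetime import date


def segment_name(table, min_day, max_day, sequence):
    """Build a name such as mirrorProfileViewEvents_2015-10-04_2015-10-04_0."""
    return f"{table}_{min_day.isoformat()}_{max_day.isoformat()}_{sequence}"


class SegmentStore:
    """Hypothetical stand-in for a table's segments, keyed by segment name."""

    def __init__(self):
        self.segments = {}

    def upload(self, name, rows):
        # Uploading under an existing name replaces the whole segment.
        self.segments[name] = list(rows)


day = date(2015, 10, 4)
store = SegmentStore()
name = segment_name("mirrorProfileViewEvents", day, day, 0)

store.upload(name, [{"viewerId": 1}, {"viewerId": 2}])
store.upload(name, [{"viewerId": 3}])  # same name: replaces both earlier rows
```

Since rows cannot be changed individually, correcting data means regenerating the affected segment and uploading it again under the same name.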
data storage
data orientation: rows and columns
∙ Most OLTP databases store data in a row-oriented format
∙ Pinot stores its data in a column-oriented format
∙ If you have heard the terms array of structures (AoS) and structure of arrays (SoA), this is the same idea
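To make the AoS/SoA comparison concrete, here is a small sketch that pivots row-oriented records into a column-oriented layout. The column names and values are made up for illustration, and plain Python lists stand in for Pinot's actual on-disk format.

```python
# Row-oriented (array of structures): one record per row.
rows = [
    {"viewerId": 1, "country": "us", "views": 3},
    {"viewerId": 2, "country": "fr", "views": 1},
    {"viewerId": 3, "country": "us", "views": 2},
]


def to_columns(rows):
    """Pivot rows into a column-oriented layout (structure of arrays)."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregation touches only the column it needs.
total_views = sum(columns["views"])

# A single-row lookup has to gather one value from every column.
second_row = {name: values[1] for name, values in columns.items()}
```

The aggregation reads one list and ignores the others, while rebuilding a single row means visiting every column.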
benefits of column-orientation
∙ Queries only read the data they need (columns not used in a query are not read)
∙ Individual row lookups are slower, aggregations are faster
∙ Compression can be a lot more effective, as related data is packed together
a couple of tricks
∙ Pinot uses a couple of techniques to reduce data size
∙ Dictionary encoding allows us to deduplicate repetitive data in a single column (e.g. country, state, gender)
∙ Bit packing allows us to pack multiple values in the same byte/word/dword
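Both tricks can be shown on a toy column. The sketch below is an illustration of the general techniques only: the sorted dictionary, the little-endian layout and all function names are assumptions made for the example, not a description of Pinot's segment format.

```python
def dictionary_encode(values):
    """Replace each value by its index in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def bits_needed(cardinality):
    """Bits required to represent ids 0..cardinality-1 (at least one bit)."""
    return max(1, (cardinality - 1).bit_length())


def bit_pack(ids, width):
    """Pack fixed-width ids into bytes, first id in the lowest bits."""
    packed = 0
    for position, value in enumerate(ids):
        packed |= value << (position * width)
    n_bytes = (len(ids) * width + 7) // 8
    return packed.to_bytes(n_bytes, "little")


def bit_unpack(data, width, count):
    """Inverse of bit_pack: recover `count` ids of `width` bits each."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


countries = ["us", "fr", "us", "ca", "us", "fr", "us", "us"]
dictionary, ids = dictionary_encode(countries)
width = bits_needed(len(dictionary))   # 3 distinct values fit in 2 bits
packed = bit_pack(ids, width)          # 8 ids * 2 bits = 2 bytes
decoded = [dictionary[i] for i in bit_unpack(packed, width, len(ids))]
```

Here eight country strings shrink to a three-entry dictionary plus two bytes of packed ids, and decoding returns the original column.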
realtime data in pinot
tables: offline and realtime
∙ Pinot has two kinds of tables: offline and realtime
∙ An offline table stores data that has been pushed from Hadoop, while a realtime table sources its data from Kafka
∙ These two tables are disjoint and can contain the same data
data ingestion
∙ Realtime data ingestion is done through Kafka
∙ In the open source release, there is a JSON decoder and an Avro decoder for messages
∙ This architecture allows plugging in new data ingestion sources (e.g. other message queuing systems), though at this time there are no other sources implemented
hybrid querying
∙ Since realtime and offline tables are disjoint, how are they queried?
∙ If an offline and a realtime table have the same name, a broker that receives a query rewrites it into two queries, one for the offline table and one for the realtime table
∙ Data is partitioned according to a time column, with a preference given to offline data
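One way to picture the time-based split is shown below. It is a sketch under assumptions: the boundary is taken to be the latest time value present in the offline table, the `daysSinceEpoch` values are made up, and `hybrid_count` is a hypothetical helper rather than Pinot's broker logic. Offline rows answer everything up to the boundary, and realtime rows answer only what comes after it, so overlapping data is not counted twice.

```python
def hybrid_count(offline_rows, realtime_rows, time_column="daysSinceEpoch"):
    """Count rows across both tables, preferring offline data where it exists."""
    if not offline_rows:
        # No offline data yet: the realtime table answers the whole query.
        return len(realtime_rows)

    # Assumed time boundary: the latest time value covered by offline data.
    boundary = max(row[time_column] for row in offline_rows)

    # "Offline query": everything up to and including the boundary.
    offline_part = [r for r in offline_rows if r[time_column] <= boundary]
    # "Realtime query": only what is newer than the boundary.
    realtime_part = [r for r in realtime_rows if r[time_column] > boundary]
    return len(offline_part) + len(realtime_part)


offline_rows = [{"daysSinceEpoch": d} for d in (16710, 16711, 16712)]
realtime_rows = [{"daysSinceEpoch": d} for d in (16712, 16713, 16714)]

total = hybrid_count(offline_rows, realtime_rows)
```

Day 16712 exists in both tables but is served from the offline side only, which is the "preference given to offline data" in this sketch.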
advantages of combining offline data and realtime data
∙ Since there are two data sources for the same data, if there is an issue with one (e.g. Kafka/Samza issue or Hadoop cluster issue), the other one is used to answer queries
∙ This means that you don’t get called in the middle of the night for data-related issues and there’s a large time window for fixing issues
retention
∙ Tables in Pinot can have a customizable retention period
∙ Segments will be expunged automatically when their last timestamp is past the retention period
∙ This is done by a process called the retention manager
∙ Offline and realtime tables have different retention periods. For example, “who viewed my profile?” has a realtime retention of seven days and an offline retention period of 90 days.
∙ This means that even if the Hadoop job doesn’t run for a couple of days, data from the realtime flow will answer the query
conclusion
∙ Pinot is a realtime distributed analytical data store that can handle interactive analytical queries running on large amounts of data
∙ It’s used for various internal and external use cases at LinkedIn
∙ It’s open source! (github.com/linkedin/pinot)
∙ Ping me if you want to deploy it, I’ll help you out