Timeline service V2 at the Hadoop Summit SJ 2016

(Big Data)2

How YARN Timeline Service v.2 Unlocks 360-Degree Pla@orm Insights at Scale

Sangjin Lee @sjlee (Twi5er) Li Lu (Hortonworks)

Vrushali Channapa5an @vrushalivc (Twi5er)

Outline• Why v.2?

• Highlights

• Developing for Timeline Service v.2

• SeIng up Timeline Service v.2

• Milestones

• Demo

Why v.2?

• YARN Timeline Service v 1.x

• Gained good adopSon: Tez, HIVE, Pig, etc.

• Keeps improving with v 1.5 APIs and storage implementaSon

• SSll facing some fundamental challenges...

Why v.2?• Scalability and reliability challenges

• Single instance of Timeline Server

• Storage (single local LevelDB instance)

• Usability

• Flow

• Metrics and configuraSon as first-class ciSzens

• Metrics aggregaSon up the enSty hierarchy

Highlightsv.1 v.2

Single writer/reader Timeline Server Distributed writer/collector architecture

Single local LevelDB storage* Scalable storage (HBase)

v.1 enSty model New v.2 enSty model

No aggregaSon Metrics aggregaSon

REST API Richer query REST API

Architecture• SeparaSon of writers (“collectors”) and readers

• Distributed collectors: one collector for each app

• Dedicated RM collector for RM-generated data

• Collector discovery via RM

• Pluggable storage with HBase as default storage

Distributed collectors & readers

!melinereader

!melinereader

Storage

!melinereader

AM !melinecollector

NM

!meline reader pool

app metrics/events

container events/metrics

RM

!meline collector

app/container events

user queries

(worker node running AM)

(worker node running containers)

write flowread flow

Collector discovery

RMAM

app id => address

! start AM container

NM

3melinecollector

" node heartbeat

# allocate response

worker node

3melineclient

New enSty model

• Flows and flow runs as parents of YARN applicaSon enSSes

• First-class configuraSon (key-value pairs)

• First-class metrics (single-value or Sme series)

• Designed to handle mulS-cluster environment out of the box

What is a flow?• A flow is a group of YARN

applicaSons that are launched as parts of a logical app

• Oozie, Scalding, Pig, etc.

• name: “frequent_visitor_stat”

• run id: 1466097809000

• version: “b9b9068”

ConfiguraSon and metrics

• Now explicit top-level a5ributes of enSSes

• Fine-grained updates and queries made possible

• “update metric A to value x”

• “query enMMes where config A = B”

container 1_1

metric: A = 10

metric: B = 100

config: "Foo" = "bar"

ConfiguraSon and metrics

• Now explicit top-level a5ributes of enSSes

• Fine-grained updates and queries made possible

• “update metric A to value x”

• “query enMMes where config A = B”

container 1_1

metric: A = 50

metric: B = 100

config: "Foo" = "bar"

HBase Storage• Scalable backend

• Row Key structure

• efficient range scans

• KeyPrefixRegionSplitPolicy

• Filter pushdown

• Coprocessors for flow aggregaSon (“readless” aggregaSon)

• Cell tags for metadata (applicaSon id, aggregaSon operaSon)

• Cell Smestamps generated during put

• lei shiied with app id added to avoid overwrites

Tables in HBase• flow run

• application

• entity

• flow activity

• app to flow

table: flow run

Row key:

clusterId!userName!flowName!inverted(flowRunId)

most recent flow run stored first

coprocessor enabled

table: applicaSonRow key:

clusterId!userName!flowName!inverted(flowRunId)!AppId

applicaSons within a flow run stored

together

most recent flow run stored first

table: enStyRow key:

userName!clusterId!flowName!inverted(flowRunId)!AppId!entityType!entityId

enSSes within an applicaSon within a flow run stored together per

type

• for example, all containers within a yarn applicaSon will be

stored together

pre-split table

stores information per entity run like info, relatesTo, relatedTo, events, metrics, config

table: flow acSvityRow key:

clusterId!inverted(TopOfTheDay)!userName!flowName

shows the flows that ran on that day

stores informaSon per flow like number of

runs, the run ids, versions

table: appToFlowRow key:

clusterId!appId

- stores mapping of appId to

flowName and flowRunId

Metrics aggregaSon• ApplicaSon level

• Rolls up sub-applicaSon metrics

• Performed in real Sme in the collectors in memory

• Flow run level

• Rolls up app level metrics

• Performed in HBase region servers via coprocessors

• Offline aggregaSon (TBD)

• Rolls up on user, queue, and flow offline periodically

• Phoenix tables

Container 1_1“bytes” : 23




App1“bytes”: 158

App2“bytes”: 50

App3“bytes”: 64

flow1“bytes”: 208

flow2“bytes”: 64

user1“bytes”: 272

queue1“bytes”: 272

App aggregationIn collector

flow aggregationIn hbase

offline aggregation

FlowRun Aggrega:on

via the HBase Coprocessor

App Metrics

Cells in

HBase

FlowRun Metric Sum

App Metrics

Cells in

HBase

FlowRun Metric Sum

FlowRun Aggrega:on

via the HBase Coprocessor

Reader REST API: paths• URLs under /ws/v2/Smeline

• Canonical REST style URLs: /ws/v2/Smeline/clusters/cluster_name/users/user_name/flows/flow_name/runs/run_id

• Path elements may be omi5ed if they can be inferred

• flow context can be inferred by app id

• default cluster is assumed if cluster is omi5ed

Reader REST API: query params• limit, createdTimeStart, createdTimeEnd: constrain the enSSes

• fields (ALL | EVENTS | INFO | CONFIGS | METRICS | RELATES_TO | IS_RELATED_TO): limit the contents to return

• metricsToRetrieve, confsToRetrieve: further limit the contents to return

• metricsLimit: limits the number of values in a Sme series

Reader REST API: query params

• relatesTo, isRelatedTo: filters by associaSon

• *Filters: filters by info, config, metric, event, …

• Supports complex filters including operators

• metricFilter=(((metric1 eq 50) AND (metric2 gt 40)) OR (metric1 lt 20))

Developing: TimelineClientIn your application master:

// create TimelineClient v.2 style TimelineClient client = TimelineClient.createTimelineClient(appId); client.init(conf); client.start();

// bind it to AM/RM client to receive the collector address amRMClient.registerTimelineClient(client);

// create and write timeline entities TimelineEntity entity = new TimelineEntity(); client.putEntities(entity);

// when the app is complete, stop the timeline client client.stop();

Developing: Flow contextIn your app submitter:

ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();

// set the flow context as YARN application tags Set<String> tags = new HashSet<>(); tags.add(TimelineUtils.generateFlowNameTag("distributed grep")); tags.add(TimelineUtils.generateFlowVersionTag( "3df8b0d6100530080d2e0decf9e528e57c42a90a")); tags.add(TimelineUtils.generateFlowRunIdTag(System.currentTimeMillis()));

appContext.setApplicationTags(tags);

SeIng up Timeline Service v.2• Set up the HBase cluster (1.1.x)

• Add the Smeline service jar to HBase

• Install the flow run coprocessor

• Create tables via TimelineSchemaCreator uSlity

• Configure the YARN cluster

• Enable Timeline Service v.2

• Add hbase-site.xml for the Smeline collector and readers

• Start the Smeline reader daemon

Milestone 1 ("Alpha 1")• Merge discussion (YARN-2928) in progress as we speak! ✓ Complete end-to-end read/write

flow

✓ Real Sme applicaSon and flow aggregaSon

✓ New enSty model

✓ HBase Storage

✓ Rich REST API

✓ IntegraSon with Distributed Shell and MapReduce

✓ YARN generic events and system metrics

Milestones - Future• Milestone 2 (“Alpha 2”)

• IntegraSon with new YARN UI

• IntegraSon with more frameworks

• Beta

• Freeze API and storage schema

• Security

• Collectors as containers

• Storage fault tolerance

• ProducSon-ready

• MigraSon-ready

Demo

Contributors• Li Lu, Junping Du, Vinod Kumar Vavilapalli (Hortonworks)

• Varun Saxena, Naganarasimha G. R. (Huawei)

• Sangjin Lee, Vrushali Channapa5an, Joep RoInghuis (Twi5er)

• Zhijie Shen (now at Facebook)

• The HBase and Phoenix community!

Thank you!

Data & Analytics

Timeline service V2 at the Hadoop Summit SJ 2016