Introduction to YARN and Apex as YARN Application
Priyanka Gugale ([email protected]), September 30th 2016

Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)

Apache Apex - Stream Processing

Easily Operable - Exposes an easy API for developing Operators (part of an application) and Applications

Highly Scalable - Scales statically as well as dynamically

Highly Performant - Can reach single digit millisecond end-to-end latency

Fault Tolerant - Automatically recovers from failures - without manual intervention

Stateful - Guarantees that no state will be lost

Apex Malhar library

YARN - Native - Uses Hadoop YARN framework for resource negotiation

Apex Platform Overview

An Apex Application is a DAG (Directed Acyclic Graph)

A DAG is composed of vertices (Operators) and edges (Streams).
A Stream is a sequence of data tuples that connects operators at end-points called Ports.
An Operator takes one or more input streams, performs computations, and emits one or more output streams.

● Each operator is the user's business logic or a built-in operator from our open source library
● An operator may have multiple instances that run in parallel
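The DAG description above can be made concrete with a small model. This is a minimal sketch in plain Python, not the Apex API: the `Dag` class, its method names, and the per-tuple functions are all hypothetical, and ports are collapsed into plain operator-to-operator edges for brevity.

```python
from collections import defaultdict, deque


class Dag:
    """Toy logical DAG: operators are vertices, streams are edges (hypothetical model)."""

    def __init__(self):
        self.operators = {}  # operator name -> function mapping one tuple to a list of tuples
        self.streams = []    # (stream name, source operator, sink operator)

    def add_operator(self, name, fn):
        self.operators[name] = fn

    def add_stream(self, name, source, sink):
        self.streams.append((name, source, sink))

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph contains a cycle."""
        indegree = {op: 0 for op in self.operators}
        downstream = defaultdict(list)
        for _, src, dst in self.streams:
            downstream[src].append(dst)
            indegree[dst] += 1
        ready = deque(op for op, degree in indegree.items() if degree == 0)
        order = []
        while ready:
            op = ready.popleft()
            order.append(op)
            for nxt in downstream[op]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.operators):
            raise ValueError("not a DAG: cycle detected")
        return order

    def run(self, source_tuples):
        """Feed tuples to operators without inputs and push results downstream."""
        inbox = defaultdict(list)
        has_input = {dst for _, _, dst in self.streams}
        emitted_by = {}
        for op in self.topological_order():
            incoming = inbox[op] if op in has_input else list(source_tuples)
            emitted = [out for t in incoming for out in self.operators[op](t)]
            emitted_by[op] = emitted
            for _, src, dst in self.streams:
                if src == op:
                    inbox[dst].extend(emitted)
        return emitted_by


dag = Dag()
dag.add_operator("reader", lambda t: [t])                        # passes tuples through
dag.add_operator("parse", lambda t: [int(t)])                    # one tuple in, one out
dag.add_operator("evens", lambda t: [t] if t % 2 == 0 else [])   # may emit nothing
dag.add_stream("lines", "reader", "parse")
dag.add_stream("numbers", "parse", "evens")

result = dag.run(["1", "2", "3", "4"])
order = dag.topological_order()
```

Each operator here is a per-tuple function, which mirrors the idea that an operator holds business logic while streams only carry tuples between operators.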

DAG Components

• Tuple
  ● Atomic data that flows over a stream
• Operator
  ● Basic compute unit per tuple
• Stream
  ● Connector abstraction between operators
  ● Tuples flow over this

[Diagram: Operator1 and Operator2 connected by a Stream carrying tuple 1, tuple 2 and tuple 3]

How is Apex YARN Native?

Introducing YARN

● YARN - Yet Another Resource Negotiator
● A framework that facilitates writing arbitrary distributed processing frameworks and applications
● YARN applications/frameworks: e.g. MapReduce2, Apache Spark, Apache Giraph, Apache Apex, etc.

Introducing YARN

[Diagram: MapReduce 1 components (Job Tracker, Task Tracker, Map Slot, Reduce Slot) compared with their approximate YARN counterparts (Resource Manager, Application Master, Timeline Server, Node Manager)]

Hadoop beyond Batch

YARN for better resource utilization

More applications than MapReduce

Hadoop v2 (YARN) Architecture

• Resource Manager
  ● Manages and allocates cluster resources
  ● Application scheduling
  ● Applications Manager
• Node Manager
  ● Per-machine agent
  ● Manages life-cycle of container
  ● Monitors resources
• Application Master
  ● Per-application
  ● Manages application scheduling and task execution

[Diagram: Clients, the ResourceManager, and three NodeManagers hosting App Masters and Containers. Arrow types: MapReduce Status, Job Submission, Node Status, Resource Request]

Application Submission workflow

[Diagram: a YarnClient, a Node running the RM (ApplicationsManager + Scheduler), and Nodes running NMs that host the Application Master and Containers; Heartbeats are marked with their own line style]

RM = Resource Manager, NM = Node Manager, AM = Application Master

1) Submit application
2) Launch Application Master
3) AM registers with RM
4) AM negotiates for containers
5) Launch Container (shown twice in the diagram, once per container)
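The five steps can be traced with a toy simulation. This is a sketch in plain Python under stated assumptions, not the Hadoop YARN client API: the class and method names are hypothetical, resources are reduced to a count of free slots per node, and the scheduler simply picks the node with the most free slots.

```python
class NodeManager:
    """Per-machine agent; capacity is simplified to a number of free slots."""

    def __init__(self, name, slots):
        self.name = name
        self.free = slots
        self.running = []

    def reserve(self):
        self.free -= 1

    def start(self, process):
        self.running.append(process)


class ResourceManager:
    """Tracks nodes, launches the AM, and hands out containers (naive scheduler)."""

    def __init__(self, node_managers):
        self.node_managers = node_managers
        self.registered = set()
        self.log = []

    def _allocate_one(self):
        # Assumption: pick the node with the most free slots (first one wins ties).
        node = max(self.node_managers, key=lambda nm: nm.free)
        if node.free <= 0:
            raise RuntimeError("no free capacity in the cluster")
        node.reserve()
        return node

    def submit(self, app_name, containers_needed):
        self.log.append("1) submit application")
        am_node = self._allocate_one()
        am_node.start(f"AM[{app_name}]")
        self.log.append("2) launch application master")
        am = ApplicationMaster(app_name, self)
        am.run(containers_needed)
        return am

    def register(self, app_name):
        self.registered.add(app_name)
        self.log.append("3) AM registers with RM")

    def negotiate(self, app_name, count):
        if app_name not in self.registered:
            raise RuntimeError("AM must register before requesting containers")
        self.log.append("4) AM negotiates for containers")
        return [self._allocate_one() for _ in range(count)]


class ApplicationMaster:
    """Per-application coordinator: registers, negotiates, then launches containers."""

    def __init__(self, app_name, rm):
        self.app_name = app_name
        self.rm = rm

    def run(self, containers_needed):
        self.rm.register(self.app_name)
        nodes = self.rm.negotiate(self.app_name, containers_needed)
        for i, node in enumerate(nodes):
            node.start(f"{self.app_name}-task-{i}")
            self.rm.log.append("5) launch container")


nm1 = NodeManager("node1", 2)
nm2 = NodeManager("node2", 2)
rm = ResourceManager([nm1, nm2])
rm.submit("demo", containers_needed=2)
```

After the run, `rm.log` lists the steps in the order of the slide, with step 5 appearing once per launched container.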

Apex as YARN application

[Diagram: generic YARN roles overlaid with Apex components. YarnClient = Apex CLI (StrAMClient); AppMaster = StrAM, running in a YarnContainer; the other YarnContainers run StrAMChild instances hosting operators O1, O2 and O3. The ResourceManager (AsM + Scheduler) and the NodeManagers (NM) run on cluster Nodes. Protocols shown: ClientRMProtocol (client to ResourceManager), AMRMProtocol (AppMaster to ResourceManager), ContainerManagerProtocol (AppMaster to NodeManagers)]

Apache Apex Meetup

Application Components of Apex - StrAMClient

• Part of the Apex client interface
• Invoked by the “launch” command of Apex
• Tasks:
  ● Copy the required application package files into HDFS
  ● Validate the Logical Plan
  ● Serialize the Logical Plan to HDFS
  ● Launch the Application Master, i.e. StrAM
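The four client tasks can be sketched as one function. This is an illustrative model only, not StrAMClient's code: a local staging directory stands in for HDFS, the plan is a plain dictionary serialized as JSON, and the function names, file name `logical-plan.json`, and returned request are all hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def validate(plan):
    """Reject streams with unknown endpoints and any cycle among operators."""
    ops = set(plan["operators"])
    edges = {op: [] for op in ops}
    for name, (src, dst) in plan["streams"].items():
        if src not in ops or dst not in ops:
            raise ValueError(f"stream {name!r} references an unknown operator")
        edges[src].append(dst)

    state = {}  # operator -> "visiting" or "done"

    def visit(op):
        if state.get(op) == "visiting":
            raise ValueError("logical plan contains a cycle")
        if state.get(op) == "done":
            return
        state[op] = "visiting"
        for nxt in edges[op]:
            visit(nxt)
        state[op] = "done"

    for op in ops:
        visit(op)


def launch(package_files, plan, staging_dir):
    """Copy files, validate, serialize the plan, then describe the AM launch."""
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    for f in package_files:                      # 1. copy package files (HDFS stand-in)
        shutil.copy(f, staging / Path(f).name)
    validate(plan)                               # 2. validate the logical plan
    plan_path = staging / "logical-plan.json"    # 3. serialize the logical plan
    plan_path.write_text(json.dumps(plan))
    # 4. stand-in for asking YARN to start the Application Master
    return {"action": "launch-app-master", "plan": str(plan_path)}


with tempfile.TemporaryDirectory() as tmp:
    jar = Path(tmp) / "app.jar"
    jar.write_bytes(b"stub")
    plan = {
        "operators": ["O1", "O2", "O3"],
        "streams": {"s1": ["O1", "O2"], "s2": ["O2", "O3"]},
    }
    request = launch([jar], plan, Path(tmp) / "staging")
    staged = sorted(p.name for p in (Path(tmp) / "staging").iterdir())
```

Validation runs before the plan is serialized, so a plan with a cycle or a dangling stream never reaches the staging directory.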

Application Components of Apex - StrAM

• Streaming Application Master
• Started by StrAMClient on a YarnContainer
• Tasks:
  ● Convert the logical plan to a physical plan
  ● Serialize operators to HDFS
  ● Request resources from the ResourceManager
  ● Start StrAMChild in YarnContainer(s)
  ● Monitor StrAMChild using the ContainerManager protocol
  ● Generate application statistics
  ● Host results on a WebService (dtManage)
  ● Checkpoint and commit application state
  ● Fault tolerance
  ● Support security
  ● Shut down the application

Application Components of Apex - StrAMChild

• Deployed on a YarnContainer
• Started by the NodeManager as instructed by StrAM
• Instance of StreamingContainer
• Contains Operators (compute-related)
• Contains BufferServer (stream-related)
• Tasks:
  ● Send regular heartbeats to StrAM
  ● Execute commands from StrAM
  ● Shut down or kill itself if instructed
  ● Manage the lifecycle of an Operator
  ● Network communication using BufferServer

Apex as YARN application

[Diagram: Apex components only. The Apex CLI (StrAMClient, a YarnClient) talks to the ResourceManager (AsM + Scheduler) over ClientRMProtocol; StrAM (AppMaster) talks to the ResourceManager over AMRMProtocol and starts StrAMChild YarnContainers, hosting operators O1, O2 and O3, over ContainerManagerProtocol]

Summary – Apex platform

• Enables YARN to be used for Streaming Applications

• Takes care of YARN specific work

• User can focus on business logic defined in Operators

Q&A

Resources

• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups – http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/