Streaming in Big Data world & Buffer Server

Apache Apex Meetup

BufferServer: Communication channel between Operators

Pradeep A.

February 03, 2016

Agenda

➔ Quick recap - Apex Application

➔ BufferServer - Overview

➔ Operator Ports

➔ Stream Locality

➔ Partitioning

➔ BufferServer - Features

Quick Recap - Apex Application

➔ Operator

◆ Smallest logically independent building block

◆ Deployed in StramChild, i.e. a YARN Container

◆ YARN Container: Memory & CPU

➔ Stream

◆ Logical connection between Operators

◆ Operators are deployed in Containers

◆ Containers are distributed across the network

◆ Physical connections are established

[Diagram: Operator 1 sends Tuples 0, 1, 2 … n over a Stream to Operator 2; legend: Operator, Container]

Quick Recap - Apex Application (Contd)

➔ Application

◆ Directed Acyclic Graph (DAG)

◆ Logical flow of Application

[Diagram: DAG made up of InputAdapter, ComputeOperator 1, ComputeOperator 2 and OutputAdapter]

BufferServer - Overview

➔ Operator Ports

➔ Impact of Stream Locality

➔ Impact of Partitioning

Operator Ports

➔ Data/tuple entry & exit points

◆ Data/tuple received on Input Port

◆ Data/tuple transmitted on Output Port

➔ Input Ports

◆ process method

➔ Output Ports

◆ emit method

[Diagram: an Operator with one Input port and three Output ports]
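The port roles above can be modelled in a few lines. This is a minimal Python sketch, not the Apex API (Apex itself is Java): the class names `InputPort`, `OutputPort` and `Doubler` and the `connect` helper are hypothetical. Only the method names `process` and `emit` come from the slide.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class OutputPort(Generic[T]):
    """Exit point: the operator calls emit() to send a tuple downstream."""

    def __init__(self) -> None:
        self._sinks: List[Callable[[T], None]] = []

    def connect(self, sink: Callable[[T], None]) -> None:
        # Hypothetical wiring helper; stands in for a stream to a downstream port.
        self._sinks.append(sink)

    def emit(self, tup: T) -> None:
        for sink in self._sinks:
            sink(tup)


class InputPort(Generic[T]):
    """Entry point: process() is invoked once for every received tuple."""

    def __init__(self, handler: Callable[[T], None]) -> None:
        self._handler = handler

    def process(self, tup: T) -> None:
        self._handler(tup)


class Doubler:
    """Hypothetical operator: receives ints and emits each value doubled."""

    def __init__(self) -> None:
        self.output = OutputPort()
        self.input = InputPort(lambda tup: self.output.emit(tup * 2))


received: List[int] = []
op = Doubler()
op.output.connect(received.append)
for value in (0, 1, 2):
    op.input.process(value)
print(received)  # [0, 2, 4]
```

Keeping the ports as separate objects means the operator logic never needs to know what is attached downstream, which is what lets the platform choose the physical connection later.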

Stream Locality

➔ Thread Local

➔ Container Local

[Diagrams: two Containers, each holding Operator 1 and Operator 2; label: Inline Stream]

Stream Locality (Contd)

➔ Node Local

➔ Rack Local

[Diagram, first layout: Operator 1 (Container 1), BufferServer, Operator 2 (Container 2), all on one Node]

[Diagram, second layout: Operator 1 (Container 1) and BufferServer on Node 1; Operator 2 (Container 2) on Node 2]
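Read together, the locality diagrams suggest a simple rule: when both operators share a container the stream is inline, and when they sit in different containers a BufferServer carries the tuples. A minimal Python sketch of that rule follows. The enum and function names are hypothetical and only model the four localities listed on the slides.

```python
from enum import Enum


class Locality(Enum):
    THREAD_LOCAL = "thread"
    CONTAINER_LOCAL = "container"
    NODE_LOCAL = "node"
    RACK_LOCAL = "rack"


def uses_buffer_server(locality: Locality) -> bool:
    """True when upstream and downstream operators live in different containers."""
    return locality in (Locality.NODE_LOCAL, Locality.RACK_LOCAL)


def transport(locality: Locality) -> str:
    """Describe the physical connection implied by a locality (illustrative only)."""
    return "BufferServer" if uses_buffer_server(locality) else "Inline Stream"


for loc in Locality:
    print(loc.name, transport(loc))
```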

Partitioning

➔ Simple Partitioning (Downstream)

[Diagram: on one Node, Operator 1 (Container 1) and a Buffer Server feed Operator 2 Partition 1 (Container 2) and Operator 2 Partition 2 (Container 3)]

Partitioning (Contd)

➔ Simple Partitioning (Upstream)

[Diagram: Operator 1 Partition 1 (Container 1) and Operator 1 Partition 2 (Container 2), each with a BufferServer, feed Unifier 1 and Operator 2 in Container 3; label: Node 1]

Partitioning (Contd)

➔ Complex Partitioning

[Diagram: two partitions of Operator 1 (Containers 1 and 2, with BufferServer 1 and BufferServer 2) feed three partitions of Operator 2 (Containers 3, 4 and 5), each fronted by a Unifier (Unifier 1, 2 and 3), spread across Nodes 1 to 3]
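The upstream-partitioning layouts rely on a Unifier to merge the output of several partitions into one stream for the downstream operator. Here is a minimal Python sketch of that idea, assuming a word-count workload and a round-robin split. Both are illustrative choices, and `split`, `count_words` and `unify` are hypothetical names rather than Apex calls.

```python
from collections import Counter
from typing import List


def split(tuples: List[str], partitions: int) -> List[List[str]]:
    """Deal tuples out to the partitions in round-robin order (assumed policy)."""
    return [tuples[i::partitions] for i in range(partitions)]


def count_words(words: List[str]) -> Counter:
    """Work done independently inside each partition of the upstream operator."""
    return Counter(words)


def unify(partials: List[Counter]) -> Counter:
    """Unifier: merge the partial results before the downstream operator sees them."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


words = ["a", "b", "a", "c", "b", "a"]
partials = [count_words(part) for part in split(words, 2)]
unified = unify(partials)
print(unified == Counter(words))  # True
```

The unified result matches a single unpartitioned count because addition is associative and commutative. A workload without that property would need a different merge step.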

BufferServer - Features

➔ Authentication

➔ Message Types

➔ Tuple Distribution Policies

➔ Disk Spooling

BufferServer - Authentication

➔ AuthToken

➔ AuthClient

[Diagram: Container 1 (BufferServer, Operator 1) and Container 2 (Operator 1); labels: AuthToken, Stream, Authenticate, StreamContext]

BufferServer - Message Types

➔ Data

◆ Tuple, Object

➔ Request messages

◆ PublishRequest, SubscribeRequest, PurgeRequest, ResetRequest

➔ Control Tuples

◆ ResetWindow, BeginWindow (WindowId), EndWindow (WindowId), EndStream (WindowId)

➔ Checkpointing

◆ Checkpoint, CodecState
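The BeginWindow and EndWindow control tuples carry a WindowId, so a subscriber can tell which window each data tuple belongs to. A minimal Python sketch of that bracketing follows. The dataclasses below are hypothetical stand-ins for the message types named on the slide, not the BufferServer wire format, and the validation rules are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass(frozen=True)
class BeginWindow:
    window_id: int


@dataclass(frozen=True)
class EndWindow:
    window_id: int


@dataclass(frozen=True)
class Data:
    payload: object


Message = Union[BeginWindow, EndWindow, Data]


def group_by_window(stream: List[Message]) -> Dict[int, List[object]]:
    """Collect data payloads under the id of the window that encloses them."""
    windows: Dict[int, List[object]] = {}
    current: Optional[int] = None
    for msg in stream:
        if isinstance(msg, BeginWindow):
            current = msg.window_id
            windows[current] = []
        elif isinstance(msg, EndWindow):
            if msg.window_id != current:
                raise ValueError("EndWindow does not match the open window")
            current = None
        else:
            if current is None:
                raise ValueError("data tuple received outside a window")
            windows[current].append(msg.payload)
    return windows


stream = [
    BeginWindow(1), Data("a"), Data("b"), EndWindow(1),
    BeginWindow(2), Data("c"), EndWindow(2),
]
print(group_by_window(stream))  # {1: ['a', 'b'], 2: ['c']}
```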

BufferServer - Tuple Distribution Policies

➔ GiveAll

➔ Random

➔ Round Robin

➔ Least Busy
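The four policy names suggest how a tuple is assigned to subscribers. The Python sketch below is one plausible reading of each name, not the BufferServer implementation. It assumes `pending[i]` is the number of undelivered tuples queued for subscriber `i`, a hypothetical load measure, and each policy returns the indices of the subscribers that receive the next tuple.

```python
import random
from typing import List, Sequence


def give_all(pending: Sequence[int]) -> List[int]:
    """GiveAll: every subscriber receives the tuple."""
    return list(range(len(pending)))


def random_pick(pending: Sequence[int], rng=random) -> List[int]:
    """Random: one subscriber chosen uniformly at random."""
    return [rng.randrange(len(pending))]


class RoundRobin:
    """Round Robin: subscribers take turns in index order."""

    def __init__(self) -> None:
        self._next = 0

    def choose(self, pending: Sequence[int]) -> List[int]:
        index = self._next % len(pending)
        self._next = index + 1
        return [index]


def least_busy(pending: Sequence[int]) -> List[int]:
    """Least Busy: the subscriber with the shortest backlog (ties go to the lowest index)."""
    return [min(range(len(pending)), key=lambda i: pending[i])]


backlog = [3, 1, 2]
rr = RoundRobin()
print(give_all(backlog))                        # [0, 1, 2]
print([rr.choose(backlog) for _ in range(4)])   # [[0], [1], [2], [0]]
print(least_busy(backlog))                      # [1]
```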

BufferServer - Disk Spooling

➔ BUFFER_SPOOLING

◆ Default: ON

➔ BUFFER_MEMORY_MB

◆ Default: 512 MB

[Diagram: BufferServer spooling to Disk]
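The two settings describe a memory budget and what happens once it is exhausted. The Python sketch below illustrates the general idea of spilling overflow to disk while keeping FIFO order. It is an illustration under stated assumptions, not the BufferServer's algorithm: the limit is counted in items rather than megabytes, `SpoolingBuffer` is a hypothetical class, and raising an error when spooling is off is just one possible behaviour.

```python
import os
import pickle
import tempfile
from collections import deque


class SpoolingBuffer:
    """FIFO buffer that keeps at most max_in_memory items in RAM."""

    def __init__(self, max_in_memory: int, spooling: bool = True) -> None:
        self.max_in_memory = max_in_memory
        self.spooling = spooling
        self.spooled_count = 0  # items currently waiting on disk
        self._memory = deque()
        self._spool = tempfile.TemporaryFile() if spooling else None
        self._read_pos = 0

    def put(self, item) -> None:
        # Memory is only used while nothing is on disk, which preserves ordering.
        if self.spooled_count == 0 and len(self._memory) < self.max_in_memory:
            self._memory.append(item)
        elif self.spooling:
            self._spool.seek(0, os.SEEK_END)
            pickle.dump(item, self._spool)
            self.spooled_count += 1
        else:
            raise RuntimeError("buffer full and spooling is disabled")

    def get(self):
        # Items in memory are always older than items on disk.
        if self._memory:
            return self._memory.popleft()
        if self.spooled_count:
            self._spool.seek(self._read_pos)
            item = pickle.load(self._spool)
            self._read_pos = self._spool.tell()
            self.spooled_count -= 1
            return item
        raise IndexError("buffer is empty")


buf = SpoolingBuffer(max_in_memory=2)
for i in range(5):
    buf.put(i)
print(buf.spooled_count)              # 3
print([buf.get() for _ in range(5)])  # [0, 1, 2, 3, 4]
```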

BufferServer Deep Dive - In Next Session

➔ Component Details

◆ DataList & FastDataList

◆ Publisher & FastPublisher

◆ Subscriber & FastSubscriber

➔ Communication

◆ Socket Connections

◆ Connection Identifiers

➔ Impact of Checkpointing

➔ Backpressure handling