Apache Apex Meetup
BufferServer: Communication channel between Operators
Pradeep A. Dalvi (prad@apache.org)
February 03, 2016
Agenda
➔ Quick recap - Apex Application
➔ BufferServer - Overview
➔ Operator Ports
➔ Stream Locality
➔ Partitioning
➔ BufferServer - Features
Quick Recap - Apex Application
➔ Operator
◆ Smallest logically independent building block
◆ Deployed in StramChild, i.e. a YARN Container
◆ YARN Container: Memory & CPU
➔ Stream
◆ Logical connection between Operators
◆ Operators are deployed in Containers
◆ Containers are distributed across the network
◆ Physical connections are established
[Diagram: Operator 1 sends a Stream of Tuples 0, 1, 2 … n to Operator 2; each Operator runs inside a Container]
Quick Recap - Apex Application (Contd)
➔ Application
◆ Directed Acyclic Graph (DAG)
◆ Logical flow of Application
[Diagram: an example DAG with an Input Adapter, Compute Operator 1, Compute Operator 2 and an Output Adapter]
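To make the "Application is a DAG" point concrete, here is a minimal Python sketch (not Apex code; Apex applications are written in Java). It assumes the four operators from the diagram are wired as one linear chain, which the slide does not state, and uses the standard-library graphlib module to show that a logical flow has a valid upstream-before-downstream order only when it is acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical wiring (an assumption): operator name -> set of upstream operators.
upstreams = {
    "InputAdapter": set(),
    "ComputeOperator1": {"InputAdapter"},
    "ComputeOperator2": {"ComputeOperator1"},
    "OutputAdapter": {"ComputeOperator2"},
}


def logical_order(dag):
    """Return operators so that every upstream precedes its downstream.

    Raises graphlib.CycleError when the graph contains a cycle,
    i.e. when it is not a valid DAG.
    """
    return list(TopologicalSorter(dag).static_order())
```

Calling logical_order(upstreams) lists the input adapter first and the output adapter last; a graph in which two operators feed each other raises CycleError instead.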
BufferServer - Overview
➔ Operator Ports
➔ Impact of Stream Locality
➔ Impact of Partitioning
Operator Ports
➔ Data/tuple entry & exit points
◆ Data/tuple received on Input Port
◆ Data/tuple transmitted on Output Port
➔ Input Ports
◆ process method
➔ Output Ports
◆ emit method
[Diagram: an Operator with one Input port and three Output ports]
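The process/emit pairing can be illustrated with a small Python sketch. The classes InputPort, OutputPort, Doubler and Collector are hypothetical stand-ins written for this example, not the Apex Java API. An output port's emit hands the tuple to the process method of each connected input port, which here is a direct method call.

```python
class InputPort:
    """Entry point of an operator: process() receives one tuple."""

    def __init__(self, handler):
        self._handler = handler

    def process(self, tup):
        self._handler(tup)


class OutputPort:
    """Exit point of an operator: emit() forwards a tuple downstream."""

    def __init__(self):
        self._sinks = []

    def connect(self, input_port):
        self._sinks.append(input_port)

    def emit(self, tup):
        for sink in self._sinks:
            sink.process(tup)


class Doubler:
    """Hypothetical operator: doubles each tuple it receives."""

    def __init__(self):
        self.output = OutputPort()
        self.input = InputPort(lambda tup: self.output.emit(tup * 2))


class Collector:
    """Hypothetical operator: stores every tuple it receives."""

    def __init__(self):
        self.received = []
        self.input = InputPort(self.received.append)
```

Connecting doubler.output to collector.input and calling doubler.input.process(t) for t in 0, 1, 2 leaves 0, 2, 4 in collector.received. When the two operators live in different containers, the direct call would be replaced by a network hop, which is where the BufferServer comes in.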
Stream Locality
➔ Thread Local
➔ Container Local
[Diagram: two cases, each with Operator 1 and Operator 2 inside a single Container, connected by an inline Stream]
Stream Locality (Contd)
➔ Node Local
➔ Rack Local
[Diagram, Node Local: Container 1 (Operator 1 + BufferServer) and Container 2 (Operator 2) on the same Node]
[Diagram, Rack Local: Container 1 (Operator 1 + BufferServer) on Node 1 and Container 2 (Operator 2) on Node 2]
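The locality diagrams suggest a simple rule: operators that share a Container use an inline Stream, while operators in different Containers talk through a BufferServer. The Python sketch below encodes that reading; Placement and stream_transport are hypothetical names for illustration, not part of Apex.

```python
from collections import namedtuple

# Hypothetical description of where an operator has been deployed.
Placement = namedtuple("Placement", ["node", "container"])


def stream_transport(upstream, downstream):
    """Pick the transport for a stream between two deployed operators.

    Same container (Thread Local or Container Local): inline stream.
    Different containers (Node Local or Rack Local): BufferServer.
    """
    if upstream == downstream:
        return "inline"
    return "buffer_server"
```

For example, two operators placed in container 1 on node 1 get an inline stream, while moving the downstream operator to container 2, on the same node or on another one, switches the stream to the BufferServer.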
Partitioning
➔ Simple Partitioning (Downstream)
[Diagram: on one Node, Container 1 (Operator 1 + BufferServer) feeds Operator 2 Partition 1 in Container 2 and Operator 2 Partition 2 in Container 3]
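With downstream partitioning, one upstream operator publishes to several partitions of the same downstream operator. The slide does not say how tuples are split between partitions; the Python sketch below assumes a key-hash split, with route and key_of as hypothetical names, so that tuples with the same key always reach the same partition.

```python
import zlib


def route(tuples, num_partitions, key_of=lambda tup: tup[0]):
    """Assign each tuple to exactly one downstream partition by key hash.

    zlib.crc32 is used instead of the built-in hash() because it is
    stable across processes, which a distributed split would need.
    """
    partitions = [[] for _ in range(num_partitions)]
    for tup in tuples:
        code = zlib.crc32(str(key_of(tup)).encode("utf-8"))
        partitions[code % num_partitions].append(tup)
    return partitions
```

Routing ("a", 1), ("b", 2), ("a", 3) over two partitions places both "a" tuples in the same partition and keeps their relative order.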
Partitioning (Contd)
➔ Simple Partitioning (Upstream)
[Diagram: Operator 1 Partition 1 (Container 1, with its BufferServer, on Node 1) and Operator 1 Partition 2 (Container 2, with its BufferServer) both feed Unifier 1, which feeds Operator 2 (Container 3)]
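When the upstream operator is partitioned, a Unifier merges the partitions' outputs into the single stream the downstream operator expects. The merge logic depends on the application; the Python sketch below assumes each partition emits partial counts per key and that the unifier sums them. The function name unify is hypothetical.

```python
from collections import Counter


def unify(partial_counts):
    """Merge per-partition partial counts into one combined result."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds counts key by key
    return dict(total)
```

Given {"a": 1, "b": 2} from Partition 1 and {"a": 3} from Partition 2, the unified result is {"a": 4, "b": 2}.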
Partitioning (Contd)
➔ Complex Partitioning
[Diagram: Operator 1 Partition 1 (Container 1, BufferServer 1) and Operator 1 Partition 2 (Container 2, BufferServer 2) feed three partitions of Operator 2 in Containers 3, 4 and 5, each with its own Unifier (Unifier 1, 2, 3); the Containers are spread over Nodes 1, 2 and 3]
BufferServer - Features
➔ Authentication
➔ Message Types
➔ Tuple Distribution Policies
➔ Disk Spooling
BufferServer - Authentication
➔ AuthToken
➔ AuthClient
[Diagram: Container 1 with BufferServer, Operator 1 and AuthToken; Container 2 with an Operator and its Stream, which authenticates to the BufferServer via its StreamContext]
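A token check of this kind can be sketched in a few lines of Python. This is an illustration only: TokenGate is a hypothetical class, and the slides do not describe the actual token format or handshake. The sketch assumes the server holds a secret token and accepts a connection only when the client presents the same bytes.

```python
import hmac
import secrets


class TokenGate:
    """Hypothetical server-side check of a client-presented token."""

    def __init__(self, token=None):
        # Generate a random token when none is supplied (assumed length).
        self.token = token if token is not None else secrets.token_bytes(20)

    def authenticate(self, presented):
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self.token, presented)
```

A client that received the token through a trusted channel passes gate.authenticate(token); any other byte string is rejected.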
BufferServer - Message Types
➔ Data
◆ Tuple, Object
➔ Request messages
◆ PublishRequest, SubscribeRequest, PurgeRequest, ResetRequest
➔ Control Tuples
◆ ResetWindow, BeginWindow (WindowId), EndWindow (WindowId), EndStream (WindowId)
➔ Checkpointing
◆ Checkpoint, CodecState
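The control tuples frame the data: payload tuples travel between a BeginWindow and the EndWindow carrying the same WindowId. The Python sketch below shows how a subscriber might group a message sequence into windows. The (kind, value) pair encoding and the name split_into_windows are assumptions for illustration and do not reflect the real wire format.

```python
def split_into_windows(messages):
    """Group payload tuples by the window that brackets them.

    messages is an iterable of (kind, value) pairs where kind is
    "BEGIN_WINDOW", "END_WINDOW" or "PAYLOAD" (hypothetical encoding).
    Other kinds are ignored in this sketch.
    """
    windows = {}
    current = None
    for kind, value in messages:
        if kind == "BEGIN_WINDOW":
            current = value
            windows[current] = []
        elif kind == "END_WINDOW":
            if value != current:
                raise ValueError("END_WINDOW does not match open window")
            current = None
        elif kind == "PAYLOAD":
            if current is None:
                raise ValueError("payload outside of a window")
            windows[current].append(value)
    return windows
```

A sequence BeginWindow(1), "a", "b", EndWindow(1), BeginWindow(2), EndWindow(2) yields window 1 with two tuples and an empty window 2.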
BufferServer - Tuple Distribution Policies
➔ GiveAll
➔ Random
➔ Round Robin
➔ Least Busy
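The slide only names the four policies, so the Python sketch below is one plausible reading of them rather than the BufferServer's implementation. Distributor is a hypothetical class, and "least busy" is assumed to mean the subscriber with the fewest tuples handed to it so far.

```python
import random


class Distributor:
    """Choose which subscribers receive the next tuple."""

    def __init__(self, num_subscribers, policy, seed=0):
        self.pending = [0] * num_subscribers  # tuples handed to each subscriber
        self.policy = policy
        self._next = 0
        self._rng = random.Random(seed)

    def targets(self):
        """Return the indices of the subscribers for the next tuple."""
        n = len(self.pending)
        if self.policy == "give_all":
            chosen = list(range(n))
        elif self.policy == "random":
            chosen = [self._rng.randrange(n)]
        elif self.policy == "round_robin":
            chosen = [self._next]
            self._next = (self._next + 1) % n
        elif self.policy == "least_busy":
            chosen = [min(range(n), key=self.pending.__getitem__)]
        else:
            raise ValueError("unknown policy: " + self.policy)
        for index in chosen:
            self.pending[index] += 1
        return chosen
```

With three subscribers, round_robin cycles through 0, 1, 2, 0, and give_all returns all three indices for every tuple.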
BufferServer - Disk Spooling
➔ BUFFER_SPOOLING
◆ Default: ON
➔ BUFFER_MEMORY_MB
◆ Default: 512 MB
[Diagram: BufferServer spooling to Disk]
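The interplay of the two settings can be sketched as follows: tuples stay in memory until the memory budget is used up, and later tuples are written to disk if spooling is enabled. This Python sketch is an illustration under assumptions. SpoolingBuffer is a hypothetical class, the budget is given in bytes rather than MB to keep the example small, and returning "blocked" when spooling is off is a guess at the behavior, not something the slide states.

```python
import pickle
import tempfile


class SpoolingBuffer:
    """Keep serialized tuples in memory up to a budget, then spill to disk."""

    def __init__(self, memory_limit_bytes, spooling=True):
        self.limit = memory_limit_bytes
        self.spooling = spooling
        self.in_memory = []  # serialized blocks held in RAM
        self.used = 0        # bytes currently held in RAM
        self.spool = tempfile.TemporaryFile() if spooling else None
        self.spooled = 0     # number of blocks written to disk

    def append(self, tup):
        block = pickle.dumps(tup)
        # Once anything has spilled, keep spilling so order is preserved.
        fits = self.spooled == 0 and self.used + len(block) <= self.limit
        if fits:
            self.in_memory.append(block)
            self.used += len(block)
            return "memory"
        if not self.spooling:
            return "blocked"  # assumed: publisher must wait for space
        self.spool.write(len(block).to_bytes(4, "big"))
        self.spool.write(block)
        self.spooled += 1
        return "disk"

    def replay(self):
        """Return all buffered tuples in the order they were appended."""
        out = [pickle.loads(block) for block in self.in_memory]
        if self.spool is not None:
            self.spool.seek(0)
            for _ in range(self.spooled):
                size = int.from_bytes(self.spool.read(4), "big")
                out.append(pickle.loads(self.spool.read(size)))
            self.spool.seek(0, 2)  # back to the end for further appends
        return out
```

With a budget that fits two small tuples, appending four of them keeps the first two in memory and sends the rest to disk, and replay still returns them in their original order.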
BufferServer Deep Dive - In Next Session
➔ Component Details
◆ DataList & FastDataList
◆ Publisher & FastPublisher
◆ Subscriber & FastSubscriber
➔ Communication
◆ Socket Connections
◆ Connection Identifiers
➔ Impact of Checkpointing
➔ Backpressure handling
Resources
● Apache Apex Page
○ http://apex.incubator.apache.org
● Mailing Lists
○ dev@apex.incubator.apache.org
○ users@apex.incubator.apache.org
● Repository
○ https://github.com/apache/incubator-apex-core
○ https://github.com/apache/incubator-apex-malhar
● Issue Tracking
○ https://issues.apache.org/jira/browse/APEXCORE
○ https://issues.apache.org/jira/browse/APEXMALHAR
● @ApacheApex
● /groups/7020520