Apache Apex Introduction with PubMatic

Apache Apex Architecture

Apex Platform Overview

Apache Malhar Library

Native Hadoop Integration

• YARN is the resource manager

• HDFS is used for storing any persistent state

Application Programming Model

Directed Acyclic Graph (DAG)

• A Stream is a sequence of data tuples

• An Operator takes one or more input streams, performs computations and emits one or more output streams

– Each Operator is your custom business logic in Java, or a built-in operator from our open source library

– An Operator has many instances that run in parallel, and each instance is single-threaded

• A Directed Acyclic Graph (DAG) is made up of operators and streams

[Figure: four operators chained by streams. Tuples flow through a filtered stream and an enriched stream to an output stream.]
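The operator and stream vocabulary above can be modelled in a few lines of plain Java. This is only a sketch of the idea, not the Apex API: the `Operator` interface and the `filter`, `enrich`, `stream` and `run` helpers are hypothetical names invented for illustration, and the whole pipeline runs in a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Plain-Java model of a linear DAG: input -> filter -> enrich -> output.
// None of these types are Apex classes; they only mirror the slide's vocabulary.
final class DagSketch {

    // An operator consumes one tuple and emits zero or more tuples downstream.
    interface Operator<I, O> {
        List<O> process(I tuple);
    }

    // Filter operator: forwards a tuple only if it matches the predicate.
    static <T> Operator<T, T> filter(Predicate<T> keep) {
        return tuple -> keep.test(tuple) ? List.of(tuple) : List.of();
    }

    // Enrich operator: maps each tuple to exactly one enriched tuple.
    static <I, O> Operator<I, O> enrich(Function<I, O> fn) {
        return tuple -> List.of(fn.apply(tuple));
    }

    // A stream hands every tuple emitted upstream to the downstream operator.
    static <I, O> List<O> stream(List<I> upstream, Operator<I, O> downstream) {
        List<O> out = new ArrayList<>();
        for (I tuple : upstream) {
            out.addAll(downstream.process(tuple));
        }
        return out;
    }

    // Wire the operators together in DAG order.
    static List<String> run(List<Integer> input) {
        Operator<Integer, Integer> evens = filter(n -> n % 2 == 0);
        Operator<Integer, String> label = enrich(n -> "even:" + n);
        List<Integer> filtered = stream(input, evens);    // filtered stream
        List<String> enriched = stream(filtered, label);  // enriched stream
        return enriched;                                  // output stream
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4))); // prints [even:2, even:4]
    }
}
```

In a real deployment each operator would run as one or more parallel instances; the sketch keeps only the logical wiring.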

Application Specification

Apex Engine

Core Features

Partitioning and Scaling Out

• Operators can be dynamically scaled

• Flexible stream splits

• Parallel partitioning

• MxN partitioning

• Unifiers
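A rough sketch of partitioning followed by a unifier, in plain Java rather than the Apex API. The class and method names (`PartitionSketch`, `partitionFor`, `countPerPartition`, `unify`) are hypothetical, and routing by key hash is an assumption made for the example: each partition counts only the keys routed to it, and the unifier merges the partial counts into one result.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of N partitions of a counting operator plus a unifier.
final class PartitionSketch {

    // Route a key to one of n partitions by hash (floorMod keeps the index non-negative).
    static int partitionFor(String key, int n) {
        return Math.floorMod(key.hashCode(), n);
    }

    // Each partition keeps its own partial count of the keys it receives.
    static List<Map<String, Integer>> countPerPartition(List<String> keys, int n) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            partitions.add(new HashMap<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, n)).merge(key, 1, Integer::sum);
        }
        return partitions;
    }

    // Unifier: merge the partial results of all partitions into a single result.
    static Map<String, Integer> unify(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((key, count) -> total.merge(key, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "a", "c", "a");
        System.out.println(unify(countPerPartition(keys, 3)));
    }
}
```

Because the unifier sums counts per key, it produces the same totals whether a key always lands in one partition or is spread across several.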

Advanced Windowing Support

• Application window

• Sliding window and tumbling window

• Checkpoint window

• No artificial latency
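The difference between tumbling and sliding windows can be sketched as follows. This is a plain-Java illustration, not Apex code: it assumes the input is a list with one aggregate value per base window, and `windowSums`, `size` and `slide` are hypothetical names. A window is tumbling when `slide` equals `size`, and sliding when `slide` is smaller.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of tumbling and sliding aggregation over per-window values.
final class WindowSketch {

    // Sum `size` consecutive values, advancing the start by `slide` each time.
    // slide == size gives tumbling windows; slide < size gives sliding windows.
    static List<Integer> windowSums(List<Integer> perWindow, int size, int slide) {
        List<Integer> sums = new ArrayList<>();
        for (int start = 0; start + size <= perWindow.size(); start += slide) {
            int sum = 0;
            for (int i = start; i < start + size; i++) {
                sum += perWindow.get(i);
            }
            sums.add(sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(windowSums(values, 2, 2)); // tumbling: [3, 7, 11]
        System.out.println(windowSums(values, 3, 1)); // sliding:  [6, 9, 12, 15]
    }
}
```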

Stateful Fault Tolerance

• Supported out of the box

– Application state

– Application master state

– No data loss

• Automatic recovery

• Lunch test

• Buffer server
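Checkpointing and buffer replay can be pictured with a small plain-Java simulation. It is a simplified model under stated assumptions, not the Apex recovery code: `SumOperator`, `runWithFailure`, `interval` and `failAt` are hypothetical names, the checkpoint is an in-memory value rather than a file on HDFS, and the input list stands in for the upstream buffer that still holds tuples sent after the last checkpoint.

```java
import java.util.List;

// Plain-Java model of stateful recovery: restore the last checkpoint, then replay buffered tuples.
final class CheckpointSketch {

    // A stateful operator whose only state is a running sum.
    static final class SumOperator {
        long sum;

        void process(int tuple) { sum += tuple; }

        long checkpoint() { return sum; }          // snapshot of the operator state

        void restore(long saved) { sum = saved; }  // load a previously saved snapshot
    }

    // Checkpoint every `interval` tuples, crash after `failAt` tuples, then recover.
    // Assumes 0 <= failAt <= tuples.size() and interval > 0.
    static long runWithFailure(List<Integer> tuples, int interval, int failAt) {
        SumOperator op = new SumOperator();
        long saved = 0;      // state at the last checkpoint
        int savedIndex = 0;  // number of tuples covered by that checkpoint
        for (int i = 0; i < failAt; i++) {
            op.process(tuples.get(i));
            if ((i + 1) % interval == 0) {
                saved = op.checkpoint();
                savedIndex = i + 1;
            }
        }
        // Crash: the in-memory operator is lost. A fresh instance starts from the checkpoint.
        SumOperator recovered = new SumOperator();
        recovered.restore(saved);
        // Tuples after the checkpoint are replayed from the buffer, so none are lost.
        for (int i = savedIndex; i < tuples.size(); i++) {
            recovered.process(tuples.get(i));
        }
        return recovered.sum;
    }

    public static void main(String[] args) {
        // Checkpoints after 2 and 4 tuples, crash after 5; the recovered sum is still 21.
        System.out.println(runWithFailure(List.of(1, 2, 3, 4, 5, 6), 2, 5));
    }
}
```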

Processing Semantics

• At least once

• At most once

• Exactly once
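What a downstream sink observes under each mode can be sketched as below. This is a deliberately simplified plain-Java model, not how Apex implements the modes: `delivered`, `checkpointAt`, `failAt` and `resumeAt` are hypothetical names. The operator is assumed to crash after emitting `failAt` tuples, its last checkpoint covers the first `checkpointAt` tuples, and in at-most-once mode it resumes at position `resumeAt` of the stream.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of what a sink receives after one failure, per processing mode.
final class SemanticsSketch {

    enum Mode { AT_LEAST_ONCE, AT_MOST_ONCE, EXACTLY_ONCE }

    // Assumes 0 <= checkpointAt <= failAt <= resumeAt <= tuples.size().
    static List<Integer> delivered(List<Integer> tuples, int checkpointAt,
                                   int failAt, int resumeAt, Mode mode) {
        // Everything emitted before the crash has already reached the sink.
        List<Integer> sink = new ArrayList<>(tuples.subList(0, failAt));
        switch (mode) {
            case AT_LEAST_ONCE:
                // Replay from the checkpoint: tuples in [checkpointAt, failAt) arrive twice.
                sink.addAll(tuples.subList(checkpointAt, tuples.size()));
                break;
            case AT_MOST_ONCE:
                // Resume with current data only: tuples in [failAt, resumeAt) are skipped.
                sink.addAll(tuples.subList(resumeAt, tuples.size()));
                break;
            case EXACTLY_ONCE:
                // Replay, but suppress what was already delivered: no gaps, no duplicates.
                sink.addAll(tuples.subList(failAt, tuples.size()));
                break;
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Integer> tuples = List.of(1, 2, 3, 4, 5, 6);
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " " + delivered(tuples, 2, 4, 5, mode));
        }
    }
}
```

With a checkpoint after 2 tuples, a crash after 4 and a resume at 5, the sink sees 3 and 4 twice under at-least-once, never sees 5 under at-most-once, and sees each tuple once under exactly-once.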

Data Locality

• Stream locality for placement of operators

– Rack local: distributed deployment

– Node local: data does not traverse the NIC

– Container local: data doesn't need to be serialized

– Thread local: operators run in the same thread

Dynamic Updates

• Dynamic topology updates

– Properties of operators can be changed

– New operators can be added

Resources

Apache Apex Community Page

Apache Apex LinkedIn Group