How to use Akka to make a PERFECT Streaming system
钟翔 @ Intel
What kind of streaming system are we talking about?
[Diagram: a DAG of Processors A, B, C, D, E; labels: Processor, DAG, Shuffle]
What is a perfect streaming system?
It should be…
High throughput, enough to saturate the network
Low latency, down to milliseconds or microseconds
Scale up to all CPUs
Scale out to all machines
No message loss
No duplication
No single point of failure
No downtime, for failure or upgrade
Flexible for different message sources
• Any time
• Anywhere
• Any size
• Interconnect with other systems
Flexible for different message guarantees
• Exactly-once
• At-least-once
• At-most-once
Flexible for different businesses
• Internet company
• Telecom
• Finance
• IT Service
• Education
• Medical care
• Public sector
Easy to learn
Easy to troubleshoot
Easy to monitor
One technology can meet all of these with simplicity
Do you believe it?
Akka makes all of this unbelievably simple
It is like human society: driven by messages, and it scales to a population of 7 billion!
What is Akka?
• Micro-service (Actor) oriented.
It is a NEW philosophy compared with OO
• Break your application into micro-services instead of objects.
• Throw away locks
• Use immutable, asynchronous messages to exchange information instead of shared objects.
Each micro-service does only ONE simple thing and does it WELL
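The "no locks, only immutable asynchronous messages" idea above can be sketched in plain Scala. This is a minimal illustration and not Akka's API: `CounterActor`, `Add`, `tell`, and `ask` are hypothetical names, and a single-threaded executor stands in for the actor's mailbox.

```scala
import java.util.concurrent.{Callable, Executors}

// Immutable message: safe to hand to another thread without copying or locking.
final case class Add(n: Int)

// Hypothetical stand-in for an actor; this is NOT Akka's API.
final class CounterActor {
  // A single-threaded executor plays the role of the mailbox:
  // messages are processed one at a time, in arrival order.
  private val mailbox = Executors.newSingleThreadExecutor()
  private var total = 0 // only ever touched by the mailbox thread

  // Fire-and-forget send; the caller never touches the actor's state.
  def tell(msg: Add): Unit =
    mailbox.execute(new Runnable { def run(): Unit = total += msg.n })

  // Read the state by queueing a request behind all earlier messages.
  def ask(): Int =
    mailbox.submit(new Callable[Int] { def call(): Int = total }).get()

  def stop(): Unit = mailbox.shutdown()
}

object ActorSketchDemo {
  def run(): Int = {
    val counter = new CounterActor
    // Four sender threads, 1000 messages each, no locks in user code.
    val senders = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to 1000).foreach(_ => counter.tell(Add(1)))
      })
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    val result = counter.ask()
    counter.stop()
    result
  }
}
```

Because the state is mutated only by the mailbox thread, concurrent senders cannot corrupt it; the sender side only ever creates immutable `Add` values.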
Then we can make a PERFECT Big Data Streaming system: Gearpump
What is Gearpump?
• Akka-based, lightweight, real-time data processing platform.
• Apache License, http://gearpump.io
• Akka provides: communication, concurrency, isolation, and fault tolerance
Simple and Powerful
Message-level streaming
Long-running daemons
Have doubts about Akka?
Will Akka impact performance?
SOL shuffle test: 32 tasks -> 32 tasks; 4 nodes, 10GbE, 32-core E5-2680
Akka is efficient at sending messages!
• In a single JVM, it can process 50 million messages per second.
• In a distributed environment, with a simple extension to Akka, Gearpump can process 11 million messages per second.
Stable streaming with Flow Control
Pass back-pressure level-by-level
No need to worry about OOM
[Diagram: stages of Tasks; labels: Back-pressure, Sliding window]
Another option (not used): big-loop-feedback flow control
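A sliding-window flow control of the kind named on this slide can be sketched as follows. This is an illustration under stated assumptions, not Gearpump's actual implementation: `WindowedSender`, `trySend`, and `onAck` are hypothetical names, and the window size of 3 is arbitrary.

```scala
// Hypothetical sketch; names and window size are assumptions, not Gearpump code.
final class WindowedSender(windowSize: Int) {
  private var sent = 0L  // messages sent downstream so far
  private var acked = 0L // messages the downstream task has acknowledged

  // Messages sent but not yet acknowledged.
  def inFlight: Long = sent - acked

  // True when one more message fits in the window.
  def canSend: Boolean = inFlight < windowSize

  // Try to send; returns false when the window is full and the caller must hold the message.
  def trySend(): Boolean =
    if (canSend) { sent += 1; true } else false

  // Downstream reports the total number of messages it has processed.
  def onAck(processedTotal: Long): Unit =
    acked = math.max(acked, math.min(processedTotal, sent))
}

object FlowControlDemo {
  def run(): (Int, Boolean, Boolean) = {
    val sender = new WindowedSender(windowSize = 3)
    val accepted = (1 to 5).count(_ => sender.trySend()) // only 3 fit in the window
    val blocked = !sender.canSend                        // window is now full
    sender.onAck(2)                                      // downstream caught up on 2 messages
    (accepted, blocked, sender.canSend)
  }
}
```

When a task's window is full it stops consuming its own input, so its acknowledgements to the previous stage stall and that stage's window fills in turn. That is one way back-pressure can travel level by level, with memory use bounded by the window size.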
Will Akka introduce large latency?
• We implemented a streaming system with 2 ms latency
• Tests ran on 100 nodes with 3000 tasks
• Gearpump performance scales out:
[Chart: scale-out performance; axis label: 100 nodes]
How does it scale up and scale out?
• To scale up, I can start 1000 tasks on my laptop
What about HA?
Gearpump is built with HA from the start
Use an Actor supervision tree for each application. Different applications are isolated.
HA Design
[Diagram labels: Client (hook in and query state); Master Cluster: Master, Standby Master, Standby Master; State; Gossip; Worker, Worker, Worker; As general service; YARN]
Master HA – no SPOF
• Akka Cluster for a centerless HA system
• Akka Distributed Data to share global state
CRDT Data type example:
Decentralized: No central meta server
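To make the CRDT idea concrete, here is a minimal sketch of a grow-only counter, one of the simplest CRDTs. It is written from scratch for illustration and is not the Akka Distributed Data implementation; the node names `master1` and `master2` are made up.

```scala
// Grow-only counter CRDT: one slot per node; merge takes the per-node maximum.
// Illustrative sketch only, not Akka Distributed Data's GCounter.
final case class GCounter(counts: Map[String, Long] = Map.empty) {
  // Each node only ever increments its own slot.
  def increment(node: String, by: Long = 1L): GCounter =
    copy(counts = counts.updated(node, counts.getOrElse(node, 0L) + by))

  // The counter value is the sum over all nodes.
  def value: Long = counts.values.sum

  // Merge is commutative, associative, and idempotent, so replicas
  // converge regardless of the order in which gossip arrives.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { n =>
      n -> math.max(counts.getOrElse(n, 0L), other.counts.getOrElse(n, 0L))
    }.toMap)
}

object GCounterDemo {
  def run(): (Long, Boolean) = {
    val onMaster1 = GCounter().increment("master1").increment("master1")
    val onMaster2 = GCounter().increment("master2")
    val a = onMaster1.merge(onMaster2)
    val b = onMaster2.merge(onMaster1)
    // Both merge orders agree, and merging a replica with itself changes nothing.
    (a.value, a == b && a.merge(a) == a)
  }
}
```

Because every replica can accept updates and any two replicas can be merged, no central meta server is needed to decide the "true" state.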
Recover from failure instantly
Failure scenario              Recovery time [*]   Comment
Cluster: Master node down     0 s                 Master HA takes effect
Message loss                  ~300 ms             Still optimizing; target is less than 10 ms
Application: AppMaster down   ~10 s               Timeout detection takes a long time
Test environment: 91 worker nodes, 1000 tasks (we use 7 machines to simulate 91 worker nodes).
[*] Recovery time is the interval between a) the failure happening and b) all tasks in the topology resuming data processing.
What about handling message loss?
Use Application Clock to Track Message Loss
[Diagram labels: Replayable Source, DAG, Message with Application Clock, Min-clock low-watermark service, State]
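The min-clock (low watermark) idea above can be sketched as a small tracker. This is a hypothetical illustration, not Gearpump's service: `MinClockService`, `report`, and the task ids are assumed names.

```scala
// Hypothetical sketch of a min-clock (low watermark) tracker.
final class MinClockService {
  private var taskClocks = Map.empty[String, Long]

  // A task reports the smallest application timestamp it has not yet fully processed.
  def report(taskId: String, minPendingTimestamp: Long): Unit =
    taskClocks = taskClocks.updated(taskId, minPendingTimestamp)

  // Global min clock: every message stamped earlier than this has been
  // fully processed by all reporting tasks.
  def minClock: Option[Long] =
    if (taskClocks.isEmpty) None else Some(taskClocks.values.min)
}

object MinClockDemo {
  def run(): (Option[Long], Option[Long]) = {
    val service = new MinClockService
    service.report("split-0", 120L)
    service.report("sum-0", 100L)
    service.report("sum-1", 115L)
    val before = service.minClock // sum-0 is the slowest task
    service.report("sum-0", 130L) // sum-0 catches up
    (before, service.minClock)
  }
}
```

On a failure, a replayable source can rewind to the global min clock and resend from there, so a lost message is redelivered instead of silently dropped.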
Exactly-once with Checkpoint Store
ms latency! No batching!
[Diagram labels: Replayable Source, DAG, Message with Application Clock, Min-clock low-watermark service, State, Checkpoint Store]
Exactly-once means errors will NOT propagate into the future
Exactly-once supports window operations
• Supports window-based statistics
• Supports Monoids, enabling applications like:
– HyperLogLog: unique visitors
– Count-Min Sketch: page views
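A minimal sketch of monoid-based window aggregation follows. The `Monoid` trait and `aggregate` helper are written here for illustration and are not Gearpump's API. An exact set union stands in for HyperLogLog and a plain sum stands in for Count-Min Sketch; the real sketches are approximate structures that are combined in the same associative way.

```scala
// A monoid: an identity element plus an associative combine.
trait Monoid[A] {
  def zero: A
  def combine(x: A, y: A): A
}

object Monoids {
  // Page-view style counting.
  val longSum: Monoid[Long] = new Monoid[Long] {
    val zero: Long = 0L
    def combine(x: Long, y: Long): Long = x + y
  }
  // Exact unique-visitor tracking (stand-in for an approximate sketch).
  def setUnion[T]: Monoid[Set[T]] = new Monoid[Set[T]] {
    def zero: Set[T] = Set.empty
    def combine(x: Set[T], y: Set[T]): Set[T] = x union y
  }
}

object WindowDemo {
  // Fold (timestamp, value) events into fixed-size windows keyed by window start.
  def aggregate[A](events: Seq[(Long, A)], windowMs: Long)(m: Monoid[A]): Map[Long, A] =
    events.foldLeft(Map.empty[Long, A]) { case (acc, (ts, v)) =>
      val start = ts - (ts % windowMs)
      acc.updated(start, m.combine(acc.getOrElse(start, m.zero), v))
    }

  val pageViews: Map[Long, Long] =
    aggregate(Seq((1000L, 1L), (1500L, 1L), (2100L, 1L)), 1000L)(Monoids.longSum)

  val uniqueVisitors: Map[Long, Set[String]] =
    aggregate(Seq((1000L, Set("u1")), (1200L, Set("u1")), (1900L, Set("u2"))), 1000L)(
      Monoids.setUnion[String])
}
```

Because `combine` is associative, a window's partial state can be checkpointed and later combined with the results of replayed messages, which is what lets window statistics fit an exactly-once scheme.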
How flexible can this streaming system be?
• Location-transparent: compute anywhere, from any source, with back-pressure.
[Diagram labels: log, Data Center, DAG on device side]
Connecting with Akka Streams is planned
How flexible can this be? Scale out dynamically
• Change parallelism dynamically at runtime
How flexible can this be? Dynamic DAG
• Dynamic Attach
• Dynamic Remove
• Dynamic Replace
[Diagram labels: Add Sub Graph, Delete]
Each node can have its own jar, with zero interference between nodes
Is this streaming system easy?
YES!
Three steps to use it
1. Download the binary from http://gearpump.io
2. Submit the jar through the UI
3. Monitor status
DAG Graph API Example - WordCount
val context = new ClientContext()
// One processor per stage; the argument is the parallelism
val split = Processor[Split](splitParallism)
val sum = Processor[Sum](sumParallism)
// Wire split ~> sum into a DAG and wrap it as an application
val app = StreamApplication("wordCount", Graph(split ~> sum), UserConfig.empty)
val appId = context.submit(app)
context.close()
class Split(taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
override def onNext(msg : Message) : Unit = { /* split the line */ }
}
class Sum (taskContext : TaskContext, conf: UserConfig) extends Task(taskContext, conf) {
override def onNext(msg : Message) : Unit = {/* do aggregation on word*/}
}
DSL API Example - WordCount
val context = ClientContext()
val app = new StreamApp("dsl", context)
val data = "This is a good start, bingo!! bingo!!"
app.fromCollection(data.lines)
// word => (word, count = 1)
.flatMap(line => line.split("[\\s]+")).map((_, 1))
// (word, count1), (word, count2) => (word, count1 + count2)
.groupByKey().sum.log
val appId = context.submit(app)
context.close()
DAG Visualization: DAG Page
Track the global min-clock of all messages
DAG:
• Node size reflects throughput
• Edge width represents flow rate
• A red node means something went wrong
DAG Visualization: Processor Page
Skew analysis
Task throughput and latency
Executor JVM deployment
Easy to troubleshoot
• Supervision chain
• All errors are handled as normal messages
• When an error happens, we know
– When
– Where
– Why
[Diagram: supervision chain Master -> AppMaster -> Executor -> Task, with a Failure label at each link]
Demo
Summary
Built entirely on Akka technology
Simple: 25K lines of code in total.
Easy to use, easy to troubleshoot
Super flexible
Powerful dashboard.
About us
Web site: http://gearpump.io
Source code: http://github.com/gearpump
Team: 张天伦, 王华峰, 姜伟华, 钟翔, 徐骞