A journey into stream processing with
Reactive Streams and
Akka Streams
Before we get started...
http://scalaupnorth.com/
Scala Up North, September 25 & 26
• Keynote from Bill Venners
• BoldRadius offering Scala training
http://boldradius.com
What to expect
• Core concepts
• What is a stream?
• Common use cases
• The Reactive Streams specification
• A deep-dive into Akka Streams
• Code walkthrough and demo
• Q&A
Disclaimer
• I am not a stream processing expert, but I am passionately curious about an alternative approach to common problems
• This is a deep topic, the contents of this talk are a starting point for further exploration
• Feel free to jump in
Core Concepts
Part 1 of 5
What is a stream?
• Flow of data
• Events, commands, machine data, etc
• Live or at rest
• Bounded or unbounded in size
• Similar to an array laid out in time instead of memory
Appeal of stream processing?
• Scaling business logic
• Processing real-time data (fast data)
• Batch processing of large data sets (big data)
• Monitoring, analytics, complex event processing, etc
Scaling business logic
• Streams can be useful for modelling and breaking apart monolithic apps that primarily transform data
• Async stream processing steps can be scaled individually
Processing real-time data
• Ephemeral
• Unbounded in size
• Potential "flooding" downstream
You cannot step twice into the same stream. For as you are stepping in, other waters are ever flowing on to you. — Heraclitus
Push vs pull
Pull
1. Consumer calls producer
2. Consumer blocks
3. Producer sends data when available
Works best when the producer is faster than the consumer
Push
1. Producer sends data to consumer
Works best when the producer is slower than the consumer
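The pull interaction above can be sketched with a blocking queue from the JVM standard library. This is purely illustrative (the names `runPull`, `producer` are not part of any streaming API): the consumer calls `take()` and blocks until the producer has data.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Pull: the consumer asks (take) and blocks; the producer replies (put) when data is ready.
def runPull(elements: Seq[Int]): Seq[Int] = {
  val queue = new ArrayBlockingQueue[Int](2) // small buffer: producer also blocks when it is full
  val producer = new Thread(() => elements.foreach(queue.put))
  producer.start()
  val received = elements.map(_ => queue.take()) // blocks until the producer has data
  producer.join()
  received
}

println(runPull(1 to 5)) // all five elements arrive, in order
```

Note the symmetry: the small queue also blocks a too-fast producer, which is why pull suits a producer that is faster than its consumer.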
Backpressure
Backpressure?
• We need a way to signal when a consumer is able to process more data
• Propagate backpressure through the entire flow
• Without backpressure, data keeps flowing at full speed
• Leads to OOM errors, crashes, etc
Consumer usually has some kind of buffer.
Fast producers can overwhelm the buffer of a slow consumer.
Option 1: Use bounded buffer and drop messages.
Option 2: Increase buffer size if memory available.
Option 3: Pull-based backpressure.
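Option 1 can be sketched with a plain bounded buffer that drops new messages once full. `BoundedBuffer` is a hypothetical illustration, not an Akka API: it trades message loss for bounded memory.

```scala
import scala.collection.mutable

// Option 1: a bounded buffer that drops messages when the consumer falls behind.
class BoundedBuffer[T](capacity: Int) {
  private val buf = mutable.Queue.empty[T]
  var dropped = 0

  def offer(elem: T): Boolean =
    if (buf.size < capacity) { buf.enqueue(elem); true }
    else { dropped += 1; false } // fast producer: message lost, but memory stays bounded

  def poll(): Option[T] = if (buf.nonEmpty) Some(buf.dequeue()) else None
}

val buffer = new BoundedBuffer[Int](3)
(1 to 5).foreach(buffer.offer) // producer bursts 5 elements into a 3-slot buffer
println(buffer.dropped)        // elements 4 and 5 were dropped
```

Option 2 just grows `capacity` until memory runs out; Option 3 (pull-based backpressure) avoids the trade-off entirely and is what Reactive Streams standardizes.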
Reactive Streams
Part 2 of 5
Reactive Streams
Reactive Streams is a specification and low-level API for library developers.
Compliant RS implementations include the following:
• RxJava (Netflix)
• Reactor (Pivotal)
• Vert.x (RedHat)
• Akka Streams and Slick (Typesafe)
Three main repositories
• Reactive Streams for the JVM
• Reactive Streams for JavaScript
• Reactive Streams IO (for network protocols such as TCP, WebSockets and possibly HTTP/2)
• Early exploration kicked off by Netflix
• 2016 timeframe
Reactive Streams JVM API spec
Only for library builders, not for direct usage.

public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}

public interface Publisher<T> {
    public void subscribe(Subscriber<? super T> s);
}

public interface Subscriber<T> {
    public void onSubscribe(Subscription s);
    public void onNext(T t);
    public void onError(Throwable t);
    public void onComplete();
}

public interface Subscription {
    public void request(long n);
    public void cancel();
}
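To make the contract concrete, here is a minimal synchronous sketch in Scala, with the four interfaces re-declared inline so the snippet is self-contained (real code would depend on the org.reactivestreams artifact; `RangePublisher` and the `-1` completion marker are illustrative only):

```scala
// The Reactive Streams interfaces, re-declared here so the sketch is self-contained.
trait Subscription { def request(n: Long): Unit; def cancel(): Unit }
trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(t: T): Unit
  def onError(t: Throwable): Unit
  def onComplete(): Unit
}
trait Publisher[T] { def subscribe(s: Subscriber[T]): Unit }

// A publisher that emits a fixed range, but only as much as the subscriber requests.
class RangePublisher(from: Int, to: Int) extends Publisher[Int] {
  def subscribe(sub: Subscriber[Int]): Unit = {
    var next = from
    sub.onSubscribe(new Subscription {
      def request(n: Long): Unit = {
        var remaining = n
        while (remaining > 0 && next <= to) { sub.onNext(next); next += 1; remaining -= 1 }
        if (next > to) sub.onComplete()
      }
      def cancel(): Unit = next = to + 1
    })
  }
}

val received = scala.collection.mutable.ListBuffer.empty[Int]
new RangePublisher(1, 3).subscribe(new Subscriber[Int] {
  def onSubscribe(s: Subscription): Unit = s.request(10) // signal demand up front
  def onNext(t: Int): Unit = received += t
  def onError(t: Throwable): Unit = ()
  def onComplete(): Unit = received += -1 // mark completion for the demo
})
println(received.toList) // 1, 2, 3, then the completion marker
```

The key point is that `onNext` is only ever called in response to prior demand signalled via `request(n)`.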
Faster publisher responsibilities
• Not generate elements, if it is able to control their production rate
• Try buffering the elements in a bounded manner until more demand is signalled
• Drop elements until more demand is signalled
• Tear down the stream if unable to apply any of the above strategies
Reactive Streams
Visit the Reactive Streams website for more information.
http://www.reactive-streams.org/
Details:
• TCK (Technology Compatibility Kit)
• API (JVM, JavaScript)
• Specifications
• Early conversation on future spec for IO
Akka Streams
Part 3 of 5
Akka Streams
Akka Streams provides a way to express and run a chain of asynchronous processing steps acting on a sequence of elements.
• DSL for async/non-blocking stream processing
• With "free" backpressure
• Conforms to the Reactive Streams spec for compatibility
Basics
• Source - A processing stage with exactly one output
• Sink - A processing stage with exactly one input
• Flow - A processing stage which has exactly one input and output
• RunnableFlow - A Flow that has both ends "attached" to a Source and Sink
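These pieces compose as in the sketch below, assuming the akka-stream artifact from the 1.0 era (matching the API names used in this talk) is on the classpath:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorFlowMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

implicit val system = ActorSystem("demo")
implicit val mat = ActorFlowMaterializer()

val source = Source(1 to 10)            // exactly one output
val double = Flow[Int].map(_ * 2)       // exactly one input and one output
val sink   = Sink.foreach[Int](println) // exactly one input

// Attaching both ends yields a RunnableFlow; nothing runs until run() is called.
val runnable = source.via(double).to(sink)
runnable.run()
```

The blueprint (`runnable`) is immutable and can be run multiple times, each run materializing a fresh set of actors.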
API design
Goals
• Supremely composable
• Exhaustive model, everything you need for stream processing including error handling
API design
Considerations
• Immutable, reusable stream blueprints
• Explicit materialization step
• No magic at the expense of some extra code
Materialization
• Separate the what from the how
• Declarative Source/Flow/Sink to create a blueprint
• FlowMaterializer turns blueprint into actors
• Involves an extra step, but no magic
Error handling
• The element causing division by zero will be dropped
• Result will be a Future completed with Success(228)

val decider: Supervision.Decider = exc => exc match {
  case _: ArithmeticException => Supervision.Resume
  case _ => Supervision.Stop
}

// ActorFlowMaterializer takes the list of transformations comprising an akka.stream.scaladsl.Flow
// and materializes them in the form of org.reactivestreams.Processor
implicit val mat = ActorFlowMaterializer(
  ActorFlowMaterializerSettings(system).withSupervisionStrategy(decider))

val source = Source(0 to 5).map(100 / _)
val result = source.runWith(Sink.fold(0)(_ + _))
Dynamic push/pull backpressure
• Fast consumer can issue more Request(n) even before more data arrives
• Producer can accumulate demand
• Total demand of elements is safe to publish
• Consumer's buffer will never overflow
• Default is push-based until consumer cannot cope
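The demand-accumulation idea can be sketched with a plain counter (`DemandTracker` is illustrative, not Akka internals): the producer tracks outstanding demand and only pushes while it is positive, so the consumer's buffer cannot overflow.

```scala
// Sketch of demand accumulation: the producer may push while demand > 0,
// otherwise it must wait for the next Request(n) from the consumer.
class DemandTracker {
  private var demand = 0L
  def request(n: Long): Unit = demand += n // consumer signals Request(n), possibly early
  def tryPublish(): Boolean =
    if (demand > 0) { demand -= 1; true }  // safe to push one element
    else false                             // no demand: fall back to waiting (pull)
}

val tracker = new DemandTracker
tracker.request(2)
tracker.request(3) // demand accumulates to 5 before any data is published
val published = (1 to 7).count(_ => tracker.tryPublish())
println(published) // only 5 of 7 elements go out; the rest wait for more demand
```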
Fan out
• Broadcast[T] (1 input, n outputs)
• Signals each output given an input signal
• Balance[T] (1 input, n outputs)
• Signals one of its output ports given an input signal
• FlexiRoute[In] (1 input, n outputs)
• Write custom fan out elements using a simple DSL
Fan in
• Merge[In] (n inputs, 1 output)
• Picks signals randomly from inputs
• Zip[A,B,Out] (2 inputs, 1 output)
• Zipping into an (A,B) tuple stream
• Concat[T] (2 inputs, 1 output)
• Concatenate streams (first, then second)
val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder =>
  import FlowGraph.Implicits._

  val in = Source(1 to 10)
  val out = Sink.ignore

  val bcast = builder.add(Broadcast[Int](2))
  val merge = builder.add(Merge[Int](2))

  val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

  in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
              bcast ~> f4 ~> merge
}
conflate
abstract def conflate[S](seed: (T) ⇒ S, aggregate: (S, T) ⇒ S): Flow[S]
Allows a faster upstream to progress independently of a slower consumer by conflating elements into a summary until the consumer is ready to accept them.
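The semantics can be simulated without a running stream: while the consumer is not ready, incoming elements are folded into a summary via `seed` and `aggregate`. This `conflateSim` function is a sketch of the behaviour, not the Akka implementation.

```scala
// Simulate conflate: elements arriving while the consumer is busy are folded
// into a pending summary; the summary is delivered when the consumer is ready.
def conflateSim[T, S](incoming: Seq[T], consumerReady: Seq[Boolean],
                      seed: T => S, aggregate: (S, T) => S): Seq[S] = {
  var pending: Option[S] = None
  val delivered = Seq.newBuilder[S]
  incoming.zip(consumerReady).foreach { case (elem, ready) =>
    pending = Some(pending.fold(seed(elem))(s => aggregate(s, elem)))
    if (ready) { pending.foreach(delivered += _); pending = None }
  }
  delivered.result()
}

// Consumer only keeps up with every other element: pairs get summed together.
println(conflateSim(Seq(1, 2, 3, 4), Seq(false, true, false, true),
                    (i: Int) => i, (s: Int, i: Int) => s + i)) // sums 1+2 and 3+4
```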
groupedWithin
abstract def groupedWithin(n: Int, d: FiniteDuration): Flow[Seq[T]]
Chunk up this stream into groups of elements received within a time window, or limited by the given number of elements, whatever happens first.
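The grouping rule (count limit or time window, whichever is hit first) can be sketched over pre-timestamped elements; `groupedWithinSim` is an illustrative simulation, not the Akka operator.

```scala
// Simulate groupedWithin(n, d) over (timestampMillis, value) pairs:
// a group closes when it reaches n elements, or when the next element
// falls outside the d-millisecond window, whichever happens first.
def groupedWithinSim[T](events: Seq[(Long, T)], n: Int, d: Long): Seq[Seq[T]] = {
  val groups = Seq.newBuilder[Seq[T]]
  var current = Vector.empty[T]
  var windowStart = 0L
  events.foreach { case (time, elem) =>
    if (current.nonEmpty && time - windowStart > d) { // time window expired first
      groups += current; current = Vector.empty
    }
    if (current.isEmpty) windowStart = time
    current :+= elem
    if (current.size == n) { groups += current; current = Vector.empty } // count hit first
  }
  if (current.nonEmpty) groups += current
  groups.result()
}

val events = Seq(0L -> "a", 10L -> "b", 20L -> "c", 500L -> "d")
println(groupedWithinSim(events, n = 3, d = 100L)) // "a","b","c" grouped by count; "d" alone
```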
Simple streaming from/to Kafka

implicit val actorSystem = ActorSystem("ReactiveKafka")
implicit val materializer = ActorMaterializer()

val kafka = new ReactiveKafka(host = "localhost:9092", zooKeeperHost = "localhost:2181")
val publisher = kafka.consume("lowercaseStrings", "groupName", new StringDecoder())
val subscriber = kafka.publish("uppercaseStrings", "groupName", new StringEncoder())

// consume lowercase strings from kafka and publish them transformed to uppercase
Source(publisher).map(_.toUpperCase).to(Sink(subscriber)).run()
Akka Streams versus other streams
Part 4 of 5
Akka Streams
• Distributed and fault-tolerant
• Sensitive to bidirectional pressure
• Easy to program complex processing flow graphs
Java Streams
• Iterators with a weaker but more parallelism-friendly interface
• Only high-level control (no next/hasNext)
• Transformation, not distribution
• Push or pull chosen statically
RxJava
• Pure push model
• Extensive DSL for transformations
• Only allows blocking backpressure
• Unbounded buffering across async boundary
Code review and demo
Part 5 of 5
Source code available at https://github.com/rocketpages
Thank you!