MANCHESTER LONDON NEW YORK
Data in Motion: Streaming Static Data Efficiently in Akka Persistence (and elsewhere)
Martin Zapletal @zapletal_martin #ScalaDays
@cakesolutions
Databases
Batch processing
Data at scale
● Reactive
● Real time, asynchronous and message driven
● Elastic and scalable
● Resilient and fault tolerant
Streams
Streaming static data
● Turning a database into a stream
Pulling data from source
[Diagram: a client repeatedly polls the source table (rows 0, 5, 10); every poll re-reads the same rows.]

Inserts

[Diagram: a new row 1 is inserted; the poller only observes it on its next query.]

Updates

[Diagram: row 5 is updated to 55; again the change is only visible on the next poll, and intermediate values can be missed entirely.]
Pushing data from source
● Change log, change data capture
[Diagram: the source pushes each change, such as the insert of row 1, to the consumer as it happens instead of waiting to be polled.]
Infinite streams of finite data source
● Consistent snapshot and change log (see the sketch below the diagram)
[Diagram: a consistent snapshot of the table (0, 5, 10) followed by its change log at offsets 0 to 4: Inserted value 0, Inserted value 5, Inserted value 10, Inserted value 1, Inserted value 55. Together they form an infinite stream over finite data.]
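A minimal sketch of the idea in plain Akka Streams (the Change type and the concrete values mirror the diagram; real sources would come from a snapshot query and a change data capture feed): concatenating a consistent snapshot with the change log that follows it turns a finite table into an infinite stream.

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source

final case class Change(key: Int, value: Int)

object SnapshotPlusLog extends App {
  implicit val system = ActorSystem()
  implicit val mat = ActorMaterializer()

  // Finite, consistent snapshot of the table as of some offset.
  val snapshot = Source(List(Change(0, 0), Change(5, 5), Change(10, 10)))
  // Change log from that offset on (infinite in practice).
  val changeLog = Source(List(Change(1, 1), Change(5, 55)))

  // Consumers see the full state once, then every subsequent change, in order.
  snapshot.concat(changeLog).runForeach(println)
}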
Log data structure
Pulling data from a log
[Diagram: consumers pull entries (0, 5, 10, 15) from an append-only log; each consumer keeps its own offset and can re-read the log from any position.]
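A minimal sketch of pulling from a log, assuming a hypothetical Log abstraction (this is not a real client API): each reader owns its offset, so it can resume after a crash or re-read the log from any position, independently of other consumers.

trait Log[A] {
  def read(fromOffset: Long, max: Int): Seq[(Long, A)] // (offset, entry) pairs
}

final class LogReader[A](log: Log[A], startOffset: Long = 0L) {
  private var offset = startOffset // would be persisted in a real implementation

  def poll(max: Int = 100): Seq[A] = {
    val batch = log.read(offset, max)
    batch.lastOption.foreach { case (last, _) => offset = last + 1 } // advance cursor
    batch.map { case (_, entry) => entry }
  }
}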
Akka Persistence

[Diagram: the journal is such a log: persistence_id1, event 1 through event 4, each at its own position.]
Akka Persistence Query
● eventsByPersistenceId, allPersistenceIds, eventsByTag
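A usage sketch against the Akka 2.4-era query API (the persistence id and sequence number bounds are illustrative):

import akka.actor.ActorSystem
import akka.persistence.cassandra.query.scaladsl.CassandraReadJournal
import akka.persistence.query.PersistenceQuery
import akka.stream.ActorMaterializer

object QueryExample extends App {
  implicit val system = ActorSystem("queries")
  implicit val mat = ActorMaterializer()

  val queries = PersistenceQuery(system)
    .readJournalFor[CassandraReadJournal](CassandraReadJournal.Identifier)

  // Live, potentially infinite stream of a single actor's events.
  queries
    .eventsByPersistenceId("persistence_id1", 0L, Long.MaxValue)
    .runForeach(env => println(env.event))
}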
Akka Persistence Query Cassandra
● Purely pull
● Event (log) data

persistence_id  partition_nr  events
0               0             event 0, event 1, event 2, ...
0               1             event 100, event 101, event 102
1               0             event 0, event 1, event 2
Actor publisher

private[query] abstract class QueryActorPublisher[MessageType, State: ClassTag](
    refreshInterval: Option[FiniteDuration]) extends ActorPublisher[MessageType] {

  protected def initialState: Future[State]
  protected def initialQuery(initialState: State): Future[Action]
  protected def requestNext(state: State, resultSet: ResultSet): Future[Action]
  protected def requestNextFinished(state: State, resultSet: ResultSet): Future[Action]
  protected def updateState(state: State, row: Row): (Option[MessageType], State)
  protected def completionCondition(state: State): Boolean

  private[this] def nextBehavior(...): Receive = {
    if (shouldFetchMore(...)) {
      listenableFutureToFuture(resultSet.fetchMoreResults()).map(FetchedResultSet).pipeTo(self)
      awaiting(resultSet, state, finished)
    } else if (shouldIdle(...)) {
      idle(resultSet, state, finished)
    } else if (shouldComplete(...)) {
      onCompleteThenStop()
      Actor.emptyBehavior
    } else if (shouldRequestMore(...)) {
      if (finished) requestNextFinished(state, resultSet).pipeTo(self)
      else requestNext(state, resultSet).pipeTo(self)
      awaiting(resultSet, state, finished)
    } else idle(resultSet, state, finished)
  }
}
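For context, this is how an ActorPublisher is exposed as a stream in the Akka 2.4-era API; SomePublisher is an illustrative stand-in for a concrete publisher such as the one above, whose real Props take the query parameters.

import akka.actor.{ActorRef, Props}
import akka.stream.actor.ActorPublisher
import akka.stream.scaladsl.Source

class SomePublisher extends ActorPublisher[String] {
  // A real publisher calls onNext(...) in response to downstream demand
  // (Request messages) and onCompleteThenStop() when finished.
  def receive = { case _ => }
}

val source: Source[String, ActorRef] =
  Source.actorPublisher[String](Props[SomePublisher])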
[State machine diagram: the publisher starts with initialQuery and moves through initialNewResultSet or initialFinished; on newResultSet, fetchedResultSet, request or continue it evaluates the guards shouldFetchMore, shouldIdle, shouldRequestMore and shouldTerminate to choose the next behaviour; Cancel and SubscriptionTimeout stop it from any state. Red transitions deliver the buffer and update internal state (progress); blue transitions run an asynchronous database query.]
SELECT * FROM ${tableName} WHERE persistence_id = ? AND partition_nr = ? AND sequence_nr >= ? AND sequence_nr <= ?
Events by persistence id
[Diagram: eventsByPersistenceId walks the journal table partition by partition: persistence_id 0, partition_nr 0 (event 0, event 1, event 2, ...), then partition_nr 1 (event 100, event 101, event 102), advancing sequence_nr and partition_nr as it reads.]
private[query] class EventsByPersistenceIdPublisher(...)
  extends QueryActorPublisher[PersistentRepr, EventsByPersistenceIdState](...) {

  override protected def initialState: Future[EventsByPersistenceIdState] = {
    ...
    EventsByPersistenceIdState(initialFromSequenceNr, 0, currentPnr)
  }

  override protected def updateState(
      state: EventsByPersistenceIdState,
      row: Row): (Option[PersistentRepr], EventsByPersistenceIdState) = {
    val event = extractEvent(row)
    val partitionNr = row.getLong("partition_nr") + 1

    (Some(event),
      EventsByPersistenceIdState(event.sequenceNr + 1, state.count + 1, partitionNr))
  }
}
All persistence ids

SELECT DISTINCT persistence_id, partition_nr FROM $tableName
private[query] class AllPersistenceIdsPublisher(...)
  extends QueryActorPublisher[String, AllPersistenceIdsState](...) {

  override protected def initialState: Future[AllPersistenceIdsState] =
    Future.successful(AllPersistenceIdsState(Set.empty))

  override protected def updateState(
      state: AllPersistenceIdsState,
      row: Row): (Option[String], AllPersistenceIdsState) = {
    val event = row.getString("persistence_id")

    if (state.knownPersistenceIds.contains(event)) (None, state)
    else
      (Some(event), state.copy(knownPersistenceIds = state.knownPersistenceIds + event))
  }
}
Events by tag
[Diagram: events carrying tag 1 (persistence_id 0: event 1, event 2, event 100; persistence_id 1: event 2) are scattered across partitions and persistence ids in the journal, and are collected into a single tagged stream.]
Events by tag
[Diagram: a separate events by tag view stores tagged events in time buckets (tag 1, 1/1/2016 and tag 1, 1/2/2016); the query streams Id 0 event 1, Id 0 event 2, Id 1 event 2 and Id 0 event 100 from the buckets in timestamp order.]
SELECT * FROM $eventsByTagViewName$tagId WHERE tag$tagId = ? AND timebucket = ? AND timestamp > ? AND timestamp <= ? ORDER BY timestamp ASC LIMIT ?
[Diagram: while reading the view the query tracks the expected sequence number per persistence_id (persistence_id 0: seq 1, ...). Because the view is eventually consistent, an event can be temporarily missing (persistence_id 0: seq ?), and the gap must be detected rather than silently skipped.]
seqNumbers match {
  case None =>
    replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
    loop(n - 1)

  case Some(s) => s.isNext(pid, seqNr) match {
    case SequenceNumbers.Yes | SequenceNumbers.PossiblyFirst =>
      seqNumbers = Some(s.updated(pid, seqNr))
      replyTo ! UUIDPersistentRepr(offs, toPersistentRepr(row, pid, seqNr))
      loop(n - 1)

    case SequenceNumbers.After =>
      replyTo ! ReplayAborted(seqNumbers, pid, s.get(pid) + 1, seqNr)
      // end loop

    case SequenceNumbers.Before =>
      // duplicate, discard
      if (!backtracking)
        log.debug(s"Discarding duplicate. Got sequence number [$seqNr] for [$pid], " +
          s"but current sequence number is [${s.get(pid)}]")
      loop(n - 1)
  }
}
def replay(): Unit = {
  val backtracking = isBacktracking
  val limit =
    if (backtracking) maxBufferSize
    else maxBufferSize - buf.size
  val toOffs =
    if (backtracking && abortDeadline.isEmpty) highestOffset
    else UUIDs.endOf(System.currentTimeMillis() - eventualConsistencyDelayMillis)
  context.actorOf(EventsByTagFetcher.props(tag, currTimeBucket, currOffset, toOffs, limit,
    backtracking, self, session, preparedSelect, seqNumbers, settings))
  context.become(replaying(limit))
}

def replaying(limit: Int): Receive = {
  case env @ UUIDPersistentRepr(offs, _) => // deliver buffer
  case ReplayDone(count, seqN, highest) => // request more
  case ReplayAborted(seqN, pid, expectedSeqNr, gotSeqNr) =>
    // causality violation: wait and retry. Only applicable if all events
    // for a persistence_id are tagged
  case ReplayFailed(cause) => // failure
  case _: Request => // deliver buffer
  case Continue => // do nothing
  case Cancel => // stop
}
Akka Persistence Cassandra Replay

def asyncReplayMessages(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long,
    max: Long)(replayCallback: (PersistentRepr) => Unit): Future[Unit] = Future {
  new MessageIterator(persistenceId, fromSequenceNr, toSequenceNr, max).foreach(msg => {
    replayCallback(msg)
  })
}

class MessageIterator(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long,
    max: Long) extends Iterator[PersistentRepr] {

  private val initialFromSequenceNr =
    math.max(highestDeletedSequenceNumber(persistenceId) + 1, fromSequenceNr)
  private val iter = new RowIterator(persistenceId, initialFromSequenceNr, toSequenceNr)

  private var mcnt = 0L
  private var c: PersistentRepr = null
  private var n: PersistentRepr = PersistentRepr(Undefined)

  fetch()

  def hasNext: Boolean = ...
  def next(): PersistentRepr = ...
  ...
}
class RowIterator(persistenceId: String, fromSequenceNr: Long, toSequenceNr: Long)
    extends Iterator[Row] {

  var currentPnr = partitionNr(fromSequenceNr)
  var currentSnr = fromSequenceNr

  var fromSnr = fromSequenceNr
  var toSnr = toSequenceNr

  var iter = newIter()

  def newIter() =
    session.execute(preparedSelectMessages.bind(persistenceId, currentPnr, fromSnr, toSnr)).iterator

  final def hasNext: Boolean =
    if (iter.hasNext) true
    else if (!inUse) false
    else {
      // current partition exhausted: advance to the next one and retry
      currentPnr += 1
      fromSnr = currentSnr
      iter = newIter()
      hasNext
    }

  def next(): Row = {
    val row = iter.next()
    currentSnr = row.getLong("sequence_nr")
    row
  }
}
Non-blocking asynchronous replay

private[this] val queries: CassandraReadJournal = new CassandraReadJournal(
  extendedActorSystem,
  context.system.settings.config.getConfig("cassandra-query-journal"))

override def asyncReplayMessages(
    persistenceId: String,
    fromSequenceNr: Long,
    toSequenceNr: Long,
    max: Long)(replayCallback: (PersistentRepr) => Unit): Future[Unit] =
  queries
    .eventsByPersistenceId(
      persistenceId,
      fromSequenceNr,
      toSequenceNr,
      max,
      replayMaxResultSize,
      None,
      "asyncReplayMessages")
    .runForeach(replayCallback)
    .map(_ => ())
Benchmarks
[Charts: REPLAY STRONG SCALING and WEAK SCALING, time (s) against the number of threads and actors (5 000 to 50 000), comparing the blocking and asynchronous replay implementations.]
Alternative architecture
[Diagram: events from each node (node_id 0 and 1: persistence_id 0 events 0 to 3, persistence_id 1 event 0, persistence_id 2 event 0) are written once and denormalised into the per persistence_id tables, the tag 1 view and the allIds view.]
val boundStatements = statementGroup(eventsByPersistenceId, eventsByTag, allPersistenceIds)

Future.sequence(boundStatements).flatMap { stmts =>
  val batch = new BatchStatement().setConsistencyLevel(...).setRetryPolicy(...)
  stmts.foreach(batch.add)
  session.underlying().flatMap(_.executeAsync(batch))
}
val eventsByPersistenceIdStatement = statementGroup(eventsByPersistenceIdStatement)
val boundStatements = statementGroup(eventsByTagStatement, allPersistenceIdsStatement)
...
session.underlying().flatMap { s =>
  val ebpResult = s.executeAsync(eventsByPersistenceIdStatement)
  val batchResult = s.executeAsync(batch)
  ...
}
Event time processing
● Ingestion time, processing time, event time
Ordering
[Diagram: three events with (KEY, TIME, VALUE) of (0, 12:34:56, 0), (1, 12:34:57, 1) and (2, 12:34:58, 2) arrive in a different order than their event times.]
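In code (the Event type and values are illustrative, mirroring the diagram): arrival order is processing order, and event time order has to be recovered explicitly.

final case class Event(key: Int, time: String, value: Int)

// Processing order: the event with key 1 happened second but arrived first.
val arrivalOrder = List(
  Event(1, "12:34:57", 1),
  Event(2, "12:34:58", 2),
  Event(0, "12:34:56", 0))

// Event time order must be reconstructed by sorting (or windowing) on the
// embedded timestamp, not on arrival.
val eventTimeOrder = arrivalOrder.sortBy(_.time) // keys 0, 1, 2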
Distributed causal stream merging
[Diagram: per node streams (node_id 0 and 1) carrying Id 0 events 0 to 3, Id 1 event 0 and Id 2 event 0 are merged into a single causally ordered stream. A per persistence_id sequence number table (persistence_id 0: seq 2, persistence_id 1: seq 0, persistence_id 2: seq 0) records progress, so an event is only emitted once its predecessor has been seen.]
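A minimal sketch of the merge logic, under the assumptions that events carry (persistenceId, seqNr), sequence numbers start at 0 and every event eventually arrives: emit an event only when it is the next expected one for its persistence id, buffer early arrivals, and drop duplicates.

final case class Evt(persistenceId: Int, seqNr: Long)

final class CausalMerge {
  private var expected = Map.empty[Int, Long].withDefaultValue(0L)
  private var buffered = Map.empty[(Int, Long), Evt]

  /** Returns the (possibly empty) run of events that became deliverable. */
  def offer(e: Evt): Vector[Evt] =
    if (e.seqNr < expected(e.persistenceId)) Vector.empty // duplicate, discard
    else if (e.seqNr > expected(e.persistenceId)) {       // gap: buffer and wait
      buffered += ((e.persistenceId, e.seqNr) -> e)
      Vector.empty
    } else {                                              // next expected: deliver,
      var out = Vector(e)                                 // then drain the buffer
      expected += e.persistenceId -> (e.seqNr + 1)
      while (buffered.contains((e.persistenceId, expected(e.persistenceId)))) {
        val n = buffered((e.persistenceId, expected(e.persistenceId)))
        buffered -= ((e.persistenceId, n.seqNr))
        expected += e.persistenceId -> (n.seqNr + 1)
        out :+= n
      }
      out
    }
}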
Replay
[Diagram: replay restores both the per persistence_id sequence numbers (persistence_id 0: seq 2) and the per source stream offsets (stream_id 0: seq 1, stream_id 1: seq 2), so the merge resumes deterministically after a failure.]
Exactly once delivery
[Diagram: every event delivered downstream is acknowledged (ACK); an unacknowledged event (Id 0, event 2) is redelivered after a failure.]
Checkpoint data

[Diagram: a StateBackend stores a consistent checkpoint: source offsets (Source 1: 6791, Source 2: 7252, Source 3: 5589, Source 4: 6843), operator state pointers (State 1: ptr 1, State 1: ptr 2) and sink acknowledgements (Sink 2: ack!).]
class KafkaSource(
    private var offsetManagers: Map[TopicAndPartition, KafkaOffsetManager])
  extends TimeReplayableSource {

  def open(context: TaskContext, startTime: Option[TimeStamp]): Unit = {
    fetch.setStartOffset(topicAndPartition, offsetManager.resolveOffset(time))
    ...
  }

  def read(batchSize: Int): List[Message]
  def close(): Unit
}
class DirectKafkaInputDStream[K, V, U <: Decoder[K]: ClassTag, T <: Decoder[V]: ClassTag, R](
    _ssc: StreamingContext,
    val kafkaParams: Map[String, String],
    val fromOffsets: Map[TopicAndPartition, Long],
    messageHandler: MessageAndMetadata[K, V] => R
  ) extends InputDStream[R](_ssc) with Logging {

  override def compute(validTime: Time): Option[KafkaRDD[K, V, U, T, R]] = {
    val untilOffsets = latestLeaderOffsets(maxRetries)
    ...
  }
}
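A usage sketch of the Spark 1.x direct stream restarted from durable offsets (the broker address, topic name and offset values are illustrative; in practice fromOffsets would be read back from your own store):

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object RestoredStream extends App {
  val conf = new SparkConf().setAppName("replay").setMaster("local[2]")
  val ssc = new StreamingContext(conf, Seconds(1))

  // Offsets recovered from a durable store (values illustrative).
  val fromOffsets = Map(TopicAndPartition("events", 0) -> 42L)

  val stream = KafkaUtils
    .createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc,
      Map("metadata.broker.list" -> "localhost:9092"),
      fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))

  stream.foreachRDD(rdd => rdd.foreach(println))
  ssc.start()
  ssc.awaitTermination()
}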
Exactly once delivery
● Durable offset (see the sketch below the diagram)
[Diagram: a consumer processes the log at offsets 0 to 4 and durably records the offset of the last processed event, resuming from it after a restart.]
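A minimal sketch of the invariant, with a hypothetical commit callback standing in for an atomic write: the processing result and the new offset are committed together, so a restart resumes from the last committed offset and cannot double-count.

final case class Checkpoint(offset: Long, state: Long)

// `commit` stands in for an atomic write of result and offset together.
def replay(log: IndexedSeq[Long], start: Checkpoint)(commit: Checkpoint => Unit): Checkpoint =
  log.drop(start.offset.toInt).foldLeft(start) { (cp, event) =>
    val next = Checkpoint(cp.offset + 1, cp.state + event) // process and advance
    commit(next) // state and offset move together, so replay cannot double-count
    next
  }

val log = IndexedSeq(1L, 2L, 3L, 4L)
val cp1 = replay(log, Checkpoint(0, 0))(_ => ())                  // processes all four
val cp2 = replay(log, Checkpoint(cp1.offset, cp1.state))(_ => ()) // resumes: no-op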
[Diagram: a query plan of select, map and filter stages is distributed across workers that consume from the stream sources.]
Optimisation
[Diagram: the optimiser pushes select and where down to the workers sitting next to each stream source, so data is filtered before it crosses the network.]
val partitioner = partitionerClassName match {
  case "org.apache.cassandra.dht.Murmur3Partitioner" => Murmur3TokenFactory
  case "org.apache.cassandra.dht.RandomPartitioner"  => RandomPartitionerTokenFactory
  case _ => throw new IllegalArgumentException(s"Unsupported partitioner: $partitionerClassName")
}

private def splitToCqlClause(range: TokenRange): Iterable[CqlTokenRange] = {
  if (range.end == tokenFactory.minToken)
    List(CqlTokenRange(s"token($pk) > ?", startToken))
  else if (range.start == tokenFactory.minToken)
    List(CqlTokenRange(s"token($pk) <= ?", endToken))
  else if (!range.isWrapAround)
    List(CqlTokenRange(s"token($pk) > ? AND token($pk) <= ?", startToken, endToken))
  else
    List(
      CqlTokenRange(s"token($pk) > ?", startToken),
      CqlTokenRange(s"token($pk) <= ?", endToken))
}
override def getPreferredLocations(split: Partition): Seq[String] =
  split.asInstanceOf[CassandraPartition].endpoints.flatMap(nodeAddresses.hostNames).toSeq

override def getPartitions: Array[Partition] = {
  val partitioner = CassandraRDDPartitioner(connector, tableDef, splitCount, splitSize)
  val partitions = partitioner.partitions(where)
  partitions
}

override def compute(split: Partition, context: TaskContext): Iterator[R] = {
  val session = connector.openSession()
  val partition = split.asInstanceOf[CassandraPartition]
  val tokenRanges = partition.tokenRanges
  val metricsUpdater = InputMetricsUpdater(context, readConf)

  val rowIterator = tokenRanges.iterator.flatMap(
    fetchTokenRange(session, _, metricsUpdater))

  new CountingIterator(rowIterator, limit)
}
object PushPredicateThroughProject extends Rule[LogicalPlan] with PredicateHelper {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case filter @ Filter(condition, project @ Project(fields, grandChild))
        if fields.forall(_.deterministic) =>

      val aliasMap = AttributeMap(fields.collect {
        case a: Alias => (a.toAttribute, a.child)
      })

      project.copy(child = Filter(replaceAlias(condition, aliasMap), grandChild))
  }
}
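What the rule means for a query (Spark 1.x DataFrame API; the path and column name are made up): Filter(a > 1, Project(a, rel)) is logically rewritten to Project(a, Filter(a > 1, rel)), which is what lets the predicate reach the data source.

import org.apache.spark.sql.SQLContext

def example(sqlContext: SQLContext): Unit = {
  import sqlContext.implicits._
  sqlContext.read.parquet("/data/events")
    .select($"a")
    .filter($"a" > 1)
    .explain(true) // optimized plan shows the Filter below the Project
}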
Table and stream duality

[Diagram: a table (1, 4, 3, 5, 2) and its change stream are two representations of the same data. Applying the stream of events (Id 0, Event 1; Id 0, Event 2; ...) to a state (State X) reproduces the table, and a snapshot for offset N plus the events after N reconstructs the state at any point (Id 0, Offset 123, State X; Id 11, Offset 123, State X). Updates to the source of truth data flow as a continuous stream applying a transformation function into any cache, view, index, replica, system or service derived from the original table.]
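The duality in a few lines of Akka Streams (Evt, State and the fold are illustrative): a table is the fold of its change stream, so any derived view can be maintained as a stream transformation.

import akka.stream.scaladsl.Source

final case class Evt(id: Int, delta: Long)
final case class State(totals: Map[Int, Long]) {
  def updated(e: Evt): State =
    State(totals.updated(e.id, totals.getOrElse(e.id, 0L) + e.delta))
}

val events = Source(List(Evt(0, 1), Evt(0, 2), Evt(1, 5)))

// scan emits the running table state after every event: the stream of states
// and the final table are two views of the same data.
val states = events.scan(State(Map.empty))(_ updated _)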
Infinite streams application
[Diagram: a log centric architecture: internet, services, devices and social sources feed Kafka; stream processing jobs, apps, stream consumers, search, services, databases and batch jobs all consume from and feed back into the log.]
[Diagrams: a system serving user, mobile and system clients through microservices backed by CQRS/ES, relational and NoSQL stores, with serialisation and distributed systems concerns at every boundary; and a distributed model training topology in which clients send updates, model devices process input data, and parameter devices exchange the parameter set P and updates ΔP.]
Challenges
● All the solved problems
○ Exactly once delivery
○ Consistency
○ Availability
○ Fault tolerance
○ Cross service invariants and consistency
○ Transactions
○ Automated deployment and configuration management
○ Serialization, versioning, compatibility
○ Automated elasticity
○ No downtime version upgrades
○ Graceful shutdown of nodes
○ Distributed system verification, logging, tracing, monitoring, debugging
○ Split brains
○ ...
Conclusion
● From request, response, synchronous, mutable state
● To streams, asynchronous messaging
● Production ready distributed systems
Questions
@zapletal_martin @cakesolutions
347 708 1518
We are hiring: http://www.cakesolutions.net/careers