© King.com Ltd 2016 – Commercially confidential
Real-Time Analytics as a Service at King
Gyula Fóra
Data Warehouse Engineer
Apache Flink PMC
About King
We make awesome mobile games
463 million monthly active users
30 billion events per day
And a lot of data…
From a streaming perspective…
[Diagram: 30 billion events/day flow from databases into analytics/processing applications holding terabytes of state, which write back to databases]
This is awesome, but…
End-users are often not Java/Scala developers
Writing streaming applications is pretty hard
Large state and windowing don’t help either
Seems to work in my IDE, what next?
We need a “turnkey” solution
The RBea platform
Powered by Apache Flink
Scripting on the live streams
Window aggregates
Stateful computations
Scalable + fault tolerant
RBea architecture
[Diagram: data scientists use the RBEA web frontend and libraries, which talk to the backend through a REST API; events stream in and output streams out]
RBea backend implementation
One stateful Flink job per game
Both events and scripts arrive as streams
Events are partitioned by user id
Scripts are broadcast to all operator instances
Output/aggregation happens downstream
[Diagram: a CoFlatMap operator receives the event stream on one input and script add/remove messages on the other; for each event it loops over the deployed scripts (S1…S5) and processes it, emitting output based on API calls]
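The two-input operator above can be sketched as follows. This is a minimal, Flink-free sketch of the CoFlatMap pattern; the class and method names (`ScriptRunner`, `Script`) are illustrative, not King's actual API, and the real operator also keeps per-user state.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the RBea CoFlatMap pattern: input 1 carries game events
// (partitioned by user id), input 2 carries broadcast script updates.
class ScriptRunner {

    // A deployed script: processes one event, possibly producing output
    interface Script {
        String process(String event);
    }

    private final List<Script> deployedScripts = new ArrayList<>();

    // flatMap1: called for every event on the keyed event stream;
    // loops over all deployed scripts, as on the slide
    List<String> flatMap1(String event) {
        List<String> out = new ArrayList<>();
        for (Script s : deployedScripts) {
            String result = s.process(event);
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }

    // flatMap2: called for every broadcast script add/remove message
    void flatMap2(Script newScript) {
        deployedScripts.add(newScript);
    }
}
```

Because script updates arrive on a broadcast input, every parallel instance sees the same set of deployed scripts, while events stay partitioned by user.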
Dissecting the DSL
@ProcessEvent(semanticClass = SCPurchase.class)
def process(SCPurchase purchase, Output out, Aggregators agg) {
    long amount = purchase.getAmount()
    String curr = purchase.getCurrency()
    out.writeToKafka("purchases", curr + "\t" + amount)

    Counter numPurchases = agg.getCounter("PurchaseCount", MINUTES_10)
    numPurchases.increment()
}
Processing methods are declared by annotation
The semanticClass acts as an event filter condition
Flexible argument list (Output, Aggregators, …)
Scripts are code-generated into Java classes
=> void processEvent(Event e, Context ctx);
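Conceptually, the generator could emit something like the class below. This is a self-contained sketch, not King's actual generated code: the stub types (`Event`, `SCPurchase`, `Output`) are stand-ins, and the real generated method takes a `Context` rather than the `Output` used here to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the RBea runtime types (illustrative only)
class Event {}

class SCPurchase extends Event {
    long getAmount() { return 100; }
    String getCurrency() { return "USD"; }
}

class Output {
    final List<String> records = new ArrayList<>();
    void writeToKafka(String topic, String value) {
        records.add(topic + ":" + value);
    }
}

// Rough shape of a class the code generator might emit for the script
// above: the @ProcessEvent semanticClass becomes a type check, and the
// script's flexible argument list is bound by the generated wrapper.
class GeneratedPurchaseScript {
    void processEvent(Event e, Output out) {
        if (!(e instanceof SCPurchase)) {
            return; // event filter condition from the annotation
        }
        SCPurchase purchase = (SCPurchase) e;
        out.writeToKafka("purchases",
                purchase.getCurrency() + "\t" + purchase.getAmount());
    }
}
```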
Output calls create Output events
Output(KAFKA, “purchases”, “…” )
These events are filtered downstream and sent to a Kafka sink
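This routing can be sketched as below, assuming a tagged `OutputEvent` record and a simple target filter; the type and field names are ours, not King's API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of downstream output routing: API calls become tagged
// OutputEvent records in the main stream; a filter picks out the
// ones destined for the Kafka sink.
enum Target { KAFKA, MYSQL }

class OutputEvent {
    final Target target;
    final String topic;
    final String payload;
    OutputEvent(Target target, String topic, String payload) {
        this.target = target;
        this.topic = topic;
        this.payload = payload;
    }
}

class OutputRouter {
    // Keep only the events that should go to the Kafka sink
    static List<OutputEvent> toKafka(List<OutputEvent> stream) {
        List<OutputEvent> kafkaOnly = new ArrayList<>();
        for (OutputEvent e : stream) {
            if (e.target == Target.KAFKA) {
                kafkaOnly.add(e);
            }
        }
        return kafkaOnly;
    }
}
```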
Aggregator calls create Aggregate events
Aggr (MYSQL, 60000, “PurchaseCount”, 1)
Flink window operators do the aggregation
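The downstream aggregation amounts to summing the deltas carried by the Aggr events, per aggregator name, within each window. A minimal sketch of that reduce step (illustrative types; the real pipeline also keys by window and writes to the target store):

```java
import java.util.HashMap;
import java.util.Map;

// An Aggr event as in Aggr(MYSQL, 60000, "PurchaseCount", 1),
// reduced here to the fields the sum needs
class AggrEvent {
    final String name;
    final long delta;
    AggrEvent(String name, long delta) {
        this.name = name;
        this.delta = delta;
    }
}

class WindowReduce {
    // Sum deltas per aggregator name, like a Flink window reduce
    static Map<String, Long> sumPerAggregator(Iterable<AggrEvent> window) {
        Map<String, Long> totals = new HashMap<>();
        for (AggrEvent e : window) {
            totals.merge(e.name, e.delta, Long::sum);
        }
        return totals;
    }
}
```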
Aggregators
long size = aggregate.getWindowSize();
long start = timestamp - (timestamp % size);
long end = start + size;
TimeWindow tw = new TimeWindow(start, end);
Event-time windows; one window size per aggregator
Dynamic window assignment
[Diagram: Script1’s NumGames and Revenue aggregators and Script2’s MyAggregator are each assigned to their own windows (W1, W2), each with its own window size]
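The tumbling-window arithmetic above is easy to check with concrete numbers (millisecond timestamps assumed; the helper name is ours, the real code uses Flink's TimeWindow):

```java
// Event-time tumbling window assignment as on the slide: an aggregate
// event with the given timestamp lands in the window [start, start + size)
class WindowAssigner {
    static long[] assign(long timestamp, long size) {
        long start = timestamp - (timestamp % size);
        long end = start + size;
        return new long[] { start, end };
    }
}
```

For example, a timestamp of 125 000 ms with a 60 000 ms window size falls into the window [120 000, 180 000).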
How do we run Flink
Standalone => YARN
A few heavy streaming jobs => more and more jobs
RocksDB state backend
Custom deployment/monitoring tools
King Streaming SDK (sneak preview)
Goal: Bridge the gap between RBea and Flink
Build data pipelines from RBea processors
Strict event format, limited set of operations
Easy “stream joins” and pattern matching
Thin wrapper around Flink
Last<GameStart> lastGS = Last.semanticClass(GameStart.class);

readFromKafka("event.myevents.log", "gyula")
    .keyByCoreUserID()
    .join(lastGS)
    .process((event, context) -> {
        context.getJoined(lastGS).ifPresent(lastGameStart -> {
            context.getAggregators()
                .getCounter("Purchases", MINUTES_10)
                .setDimensions(lastGameStart.getLevel())
                .increment();
        });
    });
Closing
RBea makes streaming accessible to every data scientist at King
We leverage Flink’s stateful and windowed processing capabilities
People love it because it’s simple and powerful