Awesome Banking APIs: Exposing big data and streaming analytics using Hadoop, Cassandra, Akka and Spray


DESCRIPTION

How do you combine comprehensive analysis running on large amounts of data with the responsiveness demanded of today's API services? This talk illustrates one of the recipes we currently use at ING to tackle this problem. Our analytical stack combines machine learning algorithms running on a Hadoop cluster with API services executed by an Akka cluster. Cassandra is used as a 'latency adapter' between the fast and the slow path. Our API services are executed by the Akka/Spray layer. These services consume both live data sources and intermediate results promoted by the Hadoop layer via Cassandra. This approach allows us to provide internal API services which are both complete and responsive.


Page 1: Awesome Banking API's

Awesome Banking APIs: Exposing big data and streaming analytics using Hadoop, Cassandra, Akka and Spray

Page 2: Awesome Banking API's
Page 3: Awesome Banking API's

Humanize Data

Page 4: Awesome Banking API's

The bank statements

Page 5: Awesome Banking API's

The bank statements
How I read the bank bills

Page 6: Awesome Banking API's

The bank statements
How I read the bank bills
What happened those days

Page 7: Awesome Banking API's

data is the fabric of our lives

Page 8: Awesome Banking API's

Personal history:

Long-term interaction:

Real-time events:

Page 9: Awesome Banking API's
Page 10: Awesome Banking API's

>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(iris.data, iris.target)

● Flexible, concise language
● Quick to code and prototype
● Portable; visualization libraries

Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython

Web libraries: flask, tornado, (no)SQL clients

Page 11: Awesome Banking API's

# Multiple Linear Regression Example

fit <- lm(y ~ x1 + x2 + x3, data=mydata)

summary(fit) # show results

● Language for statistics
● Easy to analyze and shape data
● Advanced statistical packages
● Fueled by academia and professionals
● Very clean visualization packages

Packages for machine learning: time series forecasting, clustering, classification, decision trees, neural networks

Remote procedure calls (RPC): from Scala/Java via RProcess and Rserve, as sketched below
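
The RPC route is what lets the Scala/Akka side reuse models built in R. Below is a minimal sketch of calling R from Scala over Rserve; it assumes an Rserve daemon is already running locally and the REngine/Rserve client library is on the classpath, and the model (R's built-in cars dataset) is purely illustrative.

import org.rosuda.REngine.Rserve.RConnection

object RserveSketch extends App {
  // Connect to a local Rserve daemon (assumed to be running on the default port)
  val r = new RConnection()
  try {
    // Fit a small linear model in R and pull the coefficients back into the JVM
    r.eval("fit <- lm(dist ~ speed, data = cars)")
    val coefficients = r.eval("coef(fit)").asDoubles()
    println(coefficients.mkString(", "))
  } finally {
    r.close()
  }
}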

Page 12: Awesome Banking API's

OK, let’s build some banking apps

Page 13: Awesome Banking API's

Bank schematic: core banking systems, fronted by SOAP services and DBs, connected over the system bus to the customer-facing applications and channels.

Page 14: Awesome Banking API's

Challenges

Page 15: Awesome Banking API's

Higher separation!

Bigger and faster

Fewer silos

Interactions with core systems

Page 16: Awesome Banking API's
Page 17: Awesome Banking API's

Reliable

Low cost

Computing Powerhouse

Page 18: Awesome Banking API's

Reliable

Low latency

Tunable CAP

Data model: hashed rows, sorted wide columns

Architecture model: no SPOF, ring of nodes, homogeneous system
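
To illustrate the 'hashed rows, sorted wide columns' data model, here is a minimal sketch using the DataStax Java driver from Scala; the keyspace, table, columns and contact point are assumptions for illustration, not taken from the talk.

import com.datastax.driver.core.Cluster
import scala.collection.JavaConverters._

object WideRowSketch extends App {
  val session = Cluster.builder().addContactPoint("127.0.0.1").build().connect()

  // The partition key (account_id) hashes rows across the ring; the clustering
  // column (ts) keeps the wide row sorted within each partition.
  session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = " +
    "{'class': 'SimpleStrategy', 'replication_factor': 1}")
  session.execute(
    """CREATE TABLE IF NOT EXISTS demo.transactions (
      |  account_id text,
      |  ts timestamp,
      |  amount double,
      |  PRIMARY KEY (account_id, ts)
      |)""".stripMargin)

  // A read within one partition is served from a single replica set, sorted by ts
  val rows = session.execute(
    "SELECT ts, amount FROM demo.transactions WHERE account_id = '123' LIMIT 10")
  rows.all().asScala.foreach(row => println(row.getDouble("amount")))
}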

Page 19: Awesome Banking API's

Actor model diagram: actors A, B and C exchanging asynchronous messages (msg 1 to msg 4).
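
The diagram boils down to actors exchanging asynchronous, immutable messages. A minimal sketch with Akka (classic actors) follows; the actor names and message payloads are illustrative only, not from the talk.

import akka.actor.{Actor, ActorRef, ActorSystem, Props}

case class Msg(id: Int, payload: String)

// Actor B: reacts to a message and replies asynchronously to the sender
class ActorB extends Actor {
  def receive = {
    case Msg(id, payload) => sender() ! Msg(id + 1, s"ack:$payload")
  }
}

// Actor A: kicks off the exchange and prints the replies it receives
class ActorA(b: ActorRef) extends Actor {
  def receive = {
    case "start"          => b ! Msg(1, "hello")
    case Msg(id, payload) => println(s"received msg $id: $payload")
  }
}

object ActorDemo extends App {
  val system = ActorSystem("demo")
  val b = system.actorOf(Props[ActorB], "actor-b")
  val a = system.actorOf(Props(new ActorA(b)), "actor-a")
  a ! "start"   // msg flows A -> B, the ack flows B -> A
}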

Page 20: Awesome Banking API's

Architecture diagram: the API layer (Spray HTTP I/O and the core flow on Akka) performs real-time analytics; it reaches the bank core services (bank transactions) through a SOAP client and Cassandra through a NoSQL client. Batch data science runs on Hadoop and promotes its results to Cassandra, which sits between the batch path and the real-time path; data science plugs in at both levels.
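
The route on the next slide asks an analytics actor for a result. A hypothetical sketch of such an actor, combining a batch score promoted to Cassandra by the Hadoop jobs with a live signal, is given below; the keyspace, table, column names and the scoring logic are assumptions, not the talk's actual implementation.

import akka.actor.Actor
import com.datastax.driver.core.Cluster

class AnalyticsActor extends Actor {
  // Cassandra acts as the 'latency adapter': the slow (batch) path writes here,
  // the fast (API) path reads from here. Keyspace and contact point are assumptions.
  val session = Cluster.builder().addContactPoint("127.0.0.1").build().connect("analytics")

  def receive = {
    case (aid: String, cid: String) =>
      // Slow path: batch score precomputed on Hadoop and stored in Cassandra
      val row = session.execute(
        s"SELECT score FROM batch_scores WHERE aid = '$aid' AND cid = '$cid'").one()
      val batchScore = if (row != null) row.getDouble("score") else 0.0

      // Fast path: blend in a live signal (stubbed here)
      val liveSignal = 1.0
      sender() ! s"""{"aid":"$aid","cid":"$cid","score":${batchScore * liveSignal}}"""
  }
}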

Page 21: Awesome Banking API's

Sprayin'

import akka.actor.Props
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._
import spray.routing.HttpService

trait ApiService extends HttpService {

  implicit val timeout = Timeout(5.seconds)
  implicit def executionContext = actorRefFactory.dispatcher

  // Create Analytics client actor
  val actor = actorRefFactory.actorOf(Props[AnalyticsActor], "analytics-actor")

  // curl -vv -H "Content-Type: application/json" localhost:8888/api/v1/123/567
  val serviceRoute = {
    pathPrefix("api" / "v1") {
      pathPrefix(Segment / Segment) { (aid, cid) =>
        get {
          complete {
            (actor ? (aid, cid)).mapTo[String]  // assuming the actor replies with a String
          }
        }
      }
    }
  }
}

Create an actor for analytics; serve the API path; the message is passed on to the analytics actor.

https://github.com/natalinobusa/wavr
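
For completeness, a hedged sketch of how such a route could be bound to port 8888 with spray-can; the ApiServiceActor wrapper, the actor system name and the Boot object are assumptions for illustration, not taken from the repository above.

import akka.actor.{ActorSystem, Props}
import akka.io.IO
import spray.can.Http
import spray.routing.HttpServiceActor

// Wraps the ApiService trait in an actor and runs its route
class ApiServiceActor extends HttpServiceActor with ApiService {
  def receive = runRoute(serviceRoute)
}

object Boot extends App {
  implicit val system = ActorSystem("awesome-banking-api")
  val service = system.actorOf(Props[ApiServiceActor], "api-service")
  // Bind the HTTP server to localhost:8888, matching the curl example above
  IO(Http) ! Http.Bind(service, interface = "localhost", port = 8888)
}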

Page 22: Awesome Banking API's

Latency tradeoffs

Page 23: Awesome Banking API's

Managing computation

Page 24: Awesome Banking API's

Science & Engineering

Science: statistics, data science (Python, R, visualization)

Engineering: IT infrastructure, big data (Java, Scala, SQL)

Hadoop: big data infrastructure, data science on large datasets

Big Data and Fast Data require different profiles to achieve the best results

Page 25: Awesome Banking API's

Some lessons learned

● Mixing and matching technologies is a good thing
● Harden the design as you go
● Define clear interfaces
● Ease integration among teams
● Hadoop, Cassandra, and Akka: they work!
● Plug in the data science!

Page 26: Awesome Banking API's

Thanks! Any questions?