Awesome Banking APIs


DESCRIPTION

How do you combine comprehensive analysis running on large amounts of data with the responsiveness demanded of today's API services? This talk illustrates one of the recipes we currently use at ING to tackle this problem. Our analytical stack combines machine learning algorithms running on a Hadoop cluster with API services executed by an Akka cluster. Cassandra is used as a 'latency adapter' between the fast and the slow path. Our API services are executed by the Akka/Spray layer. These services consume both live data sources and intermediate results promoted by the Hadoop layer via Cassandra. This approach allows us to provide internal API services which are both complete and responsive.


Awesome Banking APIs
Exposing big data and streaming analytics using Hadoop, Cassandra, Akka and Spray

Humanize Data

The bank statements

How I read the bank bills

What happened those days

data is the fabric of our lives

Personal history:

Long-term interaction:

Real-time events:

>>> from sklearn.datasets import load_iris
>>> from sklearn import tree
>>> iris = load_iris()
>>> clf = tree.DecisionTreeClassifier()
>>> clf = clf.fit(iris.data, iris.target)

● Flexible, concise language
● Quick to code and prototype
● Portable, visualization libraries

Machine learning libraries: scipy, statsmodels, sklearn, matplotlib, ipython

Web libraries: flask, tornado, (no)SQL clients

# Multiple Linear Regression Example

fit <- lm(y ~ x1 + x2 + x3, data=mydata)

summary(fit) # show results

● Language for statistics
● Easy to analyze and shape data
● Advanced statistical packages
● Fueled by academia and professionals
● Very clean visualization packages

Packages for machine learning: time series forecasting, clustering, classification, decision trees, neural networks

Remote procedure calls (RPC) from Scala/Java via RProcess and Rserve
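
To illustrate the Rserve route, here is a minimal sketch (not code from the talk), assuming a local Rserve daemon is running and the REngine/Rserve Java client is on the classpath:

import org.rosuda.REngine.Rserve.RConnection

object RserveExample extends App {
  // Connect to a local Rserve daemon (start it in R with: library(Rserve); Rserve())
  val r = new RConnection()
  try {
    // Fit the same kind of linear model shown above, entirely inside R
    r.eval("df <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100))")
    r.eval("fit <- lm(y ~ x1 + x2, data = df)")

    // Pull the fitted coefficients back into the JVM as doubles
    val coefficients = r.eval("coef(fit)").asDoubles()
    println(coefficients.mkString(", "))
  } finally {
    r.close()
  }
}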

OK, let’s build some banking apps

Bank schematic: core banking systems, SOAP services and DBs, system bus, customer-facing apps, channels

Challenges

Higher separation!

Bigger and Faster

Fewer silos

Interactions with core systems

Reliable

Low cost

Computing Powerhouse

Reliable

Low latency

Tunable CAP

Data model: hashed rows, sorted wide columns

Architecture model: no SPOF, ring of nodes, homogeneous system
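
A minimal sketch of what "hashed rows, sorted wide columns" can look like in practice, assuming the DataStax Java driver and a hypothetical analytics.customer_scores table (keyspace, table and column names are illustrative, not from the talk):

import com.datastax.driver.core.Cluster
import scala.collection.JavaConverters._

object CassandraReadExample extends App {
  // Hypothetical table, partitioned (hashed) by customer_id and
  // sorted within the partition by the metric name (wide columns):
  //   CREATE TABLE analytics.customer_scores (
  //     customer_id text, metric text, score double,
  //     PRIMARY KEY (customer_id, metric));
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("analytics")

  // Single-partition read: low, predictable latency on the fast path
  val rows = session.execute(
    "SELECT metric, score FROM customer_scores WHERE customer_id = '123'")
  rows.asScala.foreach { row =>
    println(s"${row.getString("metric")} = ${row.getDouble("score")}")
  }

  cluster.close()
}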

Akka actor model: actors A, B and C exchanging asynchronous messages (msg 1, msg 2, msg 3, msg 4)
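
A minimal, self-contained Akka sketch of the message passing pictured above (actor and message names are assumptions, not production code):

import akka.actor.{Actor, ActorRef, ActorSystem, Props}

// Illustrative message type
case class Msg(id: Int, payload: String)

class ActorB extends Actor {
  def receive = {
    case Msg(id, payload) =>
      // Reply asynchronously to whoever sent the message
      sender() ! Msg(id + 1, s"processed: $payload")
  }
}

class ActorA(b: ActorRef) extends Actor {
  // Fire-and-forget: the message is queued in B's mailbox
  b ! Msg(1, "hello")

  def receive = {
    case Msg(id, payload) => println(s"A received msg $id: $payload")
  }
}

object ActorExample extends App {
  val system = ActorSystem("demo")
  val b = system.actorOf(Props[ActorB], "actor-b")
  system.actorOf(Props(new ActorA(b)), "actor-a")
}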

Architecture: a core flow of actors connecting HTTP I/O, a NoSQL client (Cassandra), a SOAP client, and batch data science on Hadoop
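
As a sketch of how such a core flow actor could fan out to the SOAP and NoSQL clients and merge their answers (illustrative names and messages, not the actual ING code):

import akka.actor.{Actor, ActorRef}
import akka.pattern.{ask, pipe}
import akka.util.Timeout
import scala.concurrent.duration._

case class Request(accountId: String)
case class Response(live: String, precomputed: String)

// Fans a request out to the SOAP client (live core-bank data) and the
// NoSQL client (batch results promoted to Cassandra), then merges the two.
class CoreFlow(soapClient: ActorRef, noSqlClient: ActorRef) extends Actor {
  implicit val timeout: Timeout = Timeout(2.seconds)
  import context.dispatcher

  def receive = {
    case Request(accountId) =>
      val merged = for {
        live        <- (soapClient ? Request(accountId)).mapTo[String]
        precomputed <- (noSqlClient ? Request(accountId)).mapTo[String]
      } yield Response(live, precomputed)
      merged pipeTo sender()
  }
}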

Real-time Analytics

Bank core services and bank transactions feed the data science stages, which back the API

Sprayin’ trait ApiService extends HttpService {

// Create Analytics client actor

val actor = actorRefFactory.actorOf(Props[AnalyticsActor], "analytics-actor")

//curl -vv -H "Content-Type: application/json" localhost:8888/api/v1/123/567

val serviceRoute = {

pathPrefix("api" / "v1") {

pathPrefix( Segment / Segment ) {

(aid, cid) =>

get {

complete {

actor ? (aid, cid)

Create an actor for analytics

Serve the API path

Message is passed on to the analytics actor

https://github.com/natalinobusa/wavr

Latency tradeoffs

Managing computation

Science & Engineering

Statistics, Data Science: Python, R, visualization

IT Infra, Big Data: Java, Scala, SQL

Hadoop: Big Data Infrastructure, Data Science on large datasets

Big Data and Fast Data require different profiles to achieve the best results
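
The talk only names Hadoop for the batch layer; purely as an illustration of promoting intermediate results to Cassandra, here is a sketch using Spark with the spark-cassandra-connector (a swapped-in choice, with the HDFS path, keyspace, table and column names all assumed):

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object BatchScores {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("batch-customer-scores")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Read raw transactions from HDFS: "customerId,amount" per line (assumed layout)
    val scores = sc.textFile("hdfs:///data/transactions.csv")
      .map(_.split(","))
      .map(fields => (fields(0), fields(1).toDouble))
      .reduceByKey(_ + _)                      // a stand-in for the real ML model
      .map { case (customerId, total) => (customerId, "total_spend", total) }

    // Promote the intermediate results to Cassandra, where the Akka/Spray
    // fast path can read them with low latency
    scores.saveToCassandra("analytics", "customer_scores",
      SomeColumns("customer_id", "metric", "score"))

    sc.stop()
  }
}

The fast path then serves these precomputed rows via the wide-row read sketched earlier, alongside the live data sources.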

Some lessons learned

● Mixing and matching technologies is a good thing
● Harden the design as you go
● Define clear interfaces
● Ease integration among teams
● Hadoop, Cassandra, and Akka: they work!
● Plug in the data science!

Thanks! Any questions?