Big & Fast: A quest for relevant and real-time analytics Natalino Busa @natalinobusa



DESCRIPTION

Now more than ever, the retail banking market demands that we stay close to our customers and carefully understand which services, products, and wishes are relevant to each customer at any given time. This sort of marketing research is often beyond the capacity of traditional BI reporting frameworks. In this talk, we illustrate how we team up data scientists and big data engineers to create and scale distributed analyses on a big data platform.


Page 1: Big and fast a quest for relevant and real-time analytics

Big & Fast: A quest for relevant and real-time analytics

Natalino Busa @natalinobusa

Page 2: Big and fast a quest for relevant and real-time analytics

Parallelism, Mathematics, Programming Languages, Machine Learning, Statistics, Big Data, Algorithms, Cloud Computing

Natalino Busa @natalinobusa

www.natalinobusa.com

Page 3: Big and fast a quest for relevant and real-time analytics

Big and Fast.

Methodology

Architecture

Roles and organization

Page 4: Big and fast a quest for relevant and real-time analytics

Conversion is the ultimate form of permission marketing

Permission marketing is about the honour of being heard.

How do you earn it? Provide the right suggestions at the right time. This is what makes data analysis valuable.

Page 5: Big and fast a quest for relevant and real-time analytics

When do you really know your customer?

When you know about their last unique:

5 songs?

100 songs?

10,000 songs?

Page 6: Big and fast a quest for relevant and real-time analytics

Old & New stuff.

We evolve slowly: our personality, our habits.

But events and trends can affect us at short notice.

How do you combine old with new?
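One way to picture combining old with new is a weighted blend of a slow-changing profile score and a time-decayed score from recent events. The sketch below is only an illustration, not the method used in the talk: the function name, the half-life, and the weight are all hypothetical choices.

```python
import math


def blended_score(long_term, recent_events, now, half_life_s=3600.0, recent_weight=0.5):
    """Blend a slow-changing profile score with time-decayed recent events.

    long_term: score in [0, 1] derived from the customer's history (assumed input).
    recent_events: list of (timestamp_s, strength) pairs for recent activity.
    half_life_s, recent_weight: illustrative tuning knobs, not values from the talk.
    """
    decay = math.log(2) / half_life_s
    # Each event loses half of its influence every half_life_s seconds.
    recent = sum(s * math.exp(-decay * (now - t)) for t, s in recent_events)
    recent = min(recent, 1.0)  # cap so that bursts of events cannot exceed 1
    return (1 - recent_weight) * long_term + recent_weight * recent
```

With no recent events the score falls back to the weighted long-term profile; a fresh event pushes it up and then fades over time.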

Page 7: Big and fast a quest for relevant and real-time analytics

The customer’s context

Complex on many dimensions:

Personal history: amount of transactions ever made

Long-term interaction: how the user’s actions correlate with those of others

Real-time events: trends and recent events

Page 8: Big and fast a quest for relevant and real-time analytics

The customer’s context

context is related to time:

slow changing: the defining characteristic of a person

fast changing: events which influence our lives, trends

These require very different technology solutions!

Page 9: Big and fast a quest for relevant and real-time analytics

Challenges

millions of billions of

Not much time to react: the window of opportunity is sometimes just a few seconds

Loads of information to process: you want to understand the user's history well

Page 10: Big and fast a quest for relevant and real-time analytics

Slow and fast

ranking and preference analysis

segmentation and clustering

short-term trending topics

rule-based recommendations

Tens of terabytes of data. This can take hours ...

Hundreds of events per second. This must be fast ...
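The fast side, short-term trending topics, can be sketched as a sliding-window counter over an event stream. This is a minimal single-process illustration, assuming events arrive in non-decreasing timestamp order. The class name and the window length are hypothetical.

```python
from collections import Counter, deque


class TrendingWindow:
    """Count topics seen during the last window_s seconds of an event stream."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.events = deque()    # (timestamp, topic), oldest first
        self.counts = Counter()  # topic -> occurrences inside the window

    def add(self, timestamp, topic):
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n=3):
        return self.counts.most_common(n)
```

Each update touches only the newest and the expired events, which is what keeps this kind of analysis cheap enough to run per event.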

Page 11: Big and fast a quest for relevant and real-time analytics

Hadoop: Distributed Data OS

Reliable: distributed, replicated file system

Low cost: ↓ cost vs ↑ performance/storage

Computing powerhouse: all the cluster's CPUs work in parallel to run queries
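The idea of many CPUs working on one query in parallel can be illustrated with a map step per data partition followed by a reduce step that merges the partial results. The sketch below uses plain Python and a thread pool as a stand-in. It is not the Hadoop API, and the record layout of (customer, amount) is an assumption made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(records):
    """Map step: count transactions per customer within one partition."""
    return Counter(customer for customer, _amount in records)


def merge(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def count_per_customer(partitions):
    # Each partition is processed independently, as it would be on separate nodes.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge, partials, Counter())
```

Because the map step never looks outside its own partition, adding partitions and workers scales the work without changing the logic.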

Page 12: Big and fast a quest for relevant and real-time analytics

Scala / Akka / Spray: a WEB API reactive framework

[Diagram: Actor A, Actor B, and Actor C exchanging messages msg 1, msg 2, msg 3, and msg 4]

● it scales horizontally (can run in cluster mode)

● maximum use of the available cores/memory

1. processing is non-blocking, threads are re-used

2. can parallelize computing power across many actors

Very fast: 1000s of messages/sec

Very reliable: auto recovery
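The actor idea above (each actor owns an inbox and handles one message at a time without blocking a thread) can be sketched with Python's asyncio. This is an analogy rather than Akka itself: which actor receives which message is a made-up routing for the demo, and there is no supervision or auto recovery here.

```python
import asyncio


async def actor(name, inbox, log):
    """Process messages one at a time; stop on the None sentinel."""
    while True:
        msg = await inbox.get()  # yields control instead of blocking a thread
        if msg is None:
            break
        log.append((name, msg))


async def run_demo():
    log = []
    inboxes = {name: asyncio.Queue() for name in ("A", "B", "C")}
    tasks = [asyncio.create_task(actor(n, q, log)) for n, q in inboxes.items()]
    # Hypothetical routing of the four messages to the three actors.
    for target, msg in [("A", "msg 1"), ("B", "msg 2"), ("C", "msg 3"), ("A", "msg 4")]:
        await inboxes[target].put(msg)
    for q in inboxes.values():
        await q.put(None)  # ask every actor to shut down
    await asyncio.gather(*tasks)
    return log
```

Calling `asyncio.run(run_demo())` returns the list of (actor, message) pairs that were handled. Each actor sees its own messages in the order they were sent.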

Page 13: Big and fast a quest for relevant and real-time analytics

Distributed computing: lambda architecture

Lambda Architecture: Batch + Streaming

[Diagram labels: Data Warehouses, Messaging Buses, Batch Computing, Streaming Computing, In-Memory Distributed Databases, HTTP RESTful API, low-latency Web API services]
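The batch plus streaming split can be sketched as two views that are merged at query time: a batch view recomputed periodically over all history, and a speed view updated per event. This is a toy, in-process illustration. The class and method names are hypothetical, and it assumes each batch load already covers every event seen up to that moment.

```python
from collections import Counter


class LambdaViews:
    """Toy serving layer: batch view plus real-time increments."""

    def __init__(self):
        self.batch_view = Counter()  # recomputed periodically from all history
        self.speed_view = Counter()  # events seen since the last batch load

    def on_event(self, customer):
        # Streaming path: cheap incremental update per incoming event.
        self.speed_view[customer] += 1

    def load_batch(self, recomputed_counts):
        # Batch path: replace the view and drop increments it now covers.
        self.batch_view = Counter(recomputed_counts)
        self.speed_view.clear()

    def query(self, customer):
        # Low-latency read: merge both views.
        return self.batch_view[customer] + self.speed_view[customer]
```

The query stays fast because it only adds two lookups, while the expensive recomputation happens off the request path.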

Page 14: Big and fast a quest for relevant and real-time analytics

Distributed computing: some techs

Hadoop

Cassandra

millions of billions of

λ = conversions

(lambda)

Page 15: Big and fast a quest for relevant and real-time analytics

All Things Distributed

Distributing computing and storage

more machines = more storage/computing

Open Source software solutions

mature enough for pragmatic adopters

Near real-time + big data technologies

Hadoop, Scala, Akka, Spray, Cassandra

Page 16: Big and fast a quest for relevant and real-time analytics

Science & Engineering

Statistics, Data Science

Python, R, Visualization

IT Infra, Big Data

Java, Scala, SQL

Hadoop: Big Data infrastructure, data science on large datasets

Big Data and Fast Data require different profiles to achieve the best results

Page 17: Big and fast a quest for relevant and real-time analytics

Parallelism, Mathematics, Programming Languages, Machine Learning, Statistics, Big Data, Algorithms, Cloud Computing

Natalino Busa @natalinobusa

www.natalinobusa.com

Thanks! Any questions?
