© 2016 MapR Technologies ® Handling the Extremes Scaling and Streaming in Finance

Handling the Extremes: Scaling and Streaming in Finance


Agenda

• History
– Past, present, future
• Messaging platforms
– Defining the extremes
• Use cases
– Email, fraud
• Resources
• Q&A


Legacy Business Platforms

• IT must integrate all the products
• Inability to operationalize the insight rapidly
• Can’t deal with high speed data ingestion and processing
• Scale up architecture leads to high cost

[Diagram: Operational Applications (J2EE AppServer, Relational Database, Message Bus) on Specialized Storage; Analytical Applications (Analytic Database, ETL Tool, BI Tool) on Specialized Storage]


Converged Data Platform

Converged Applications: Complete Access to Real-time and Historical Data in One Platform

• Analytical Applications (Bottom Line Initiatives)
– Analysts Creating BI Reports and KPIs on Data Warehouse
– Historical Data
• Operational Applications (Top Line Initiatives)
– Developers Creating Database and Event Based Applications
– Current Data


Application Development and Deployment

[Diagram: application architecture, 2005 vs. Today]

2005:
• SQL Server
• IIS, ASP.NET
• Desktop Browser (JavaScript, jQuery)
• Microsoft Reporting Service
• Interfaces: SQL; HTML, CSS, JS

Today:
• Oracle, SQL Server, NoSQL, Graph DB
• Events (Kafka), Bulk Load
• Data Lake, Machine Learning, Predictive Modeling, BI / Reporting, Insights DB, Customer Insights
• Microservice (.NET), Microservice (NodeJS), Microservice (Java)
• Backend for Frontend (Java)
• Desktop Browser (JavaScript, 20+ Frameworks), Tablet, Native Android, Native iOS
• Interfaces: JSON; JSON, CSS, HTML, JS


MapR Platform Services: Open API Architecture
Assures Interoperability, Avoids Lock-in

• Web-Scale Storage: MapR-FS (HDFS API, POSIX, NFS)
• Database: MapR-DB (SQL, HBase API, JSON API)
• Event Streaming: MapR Streams (Kafka API)

Platform: Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability


Converged Application Benefits

• Consumers scale horizontally with partitions
– 1:1 mapping between consumer and partition
– Enables predictable scaling as production needs grow
• Data can be seamlessly replicated to another cluster
– Enables HA with zero code changes
• Data is indexed dynamically according to receivers, senders
– Scales beyond the capabilities of Kafka
• Snapshots can be taken to capture state
– Enables faster testing and deployment of applications
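The 1:1 mapping between consumers and partitions can be pictured with a small sketch. This is a simplified illustration, not the Kafka or MapR Streams assignment logic: the class name `PartitionSketch`, the hash-based `partitionFor`, and the one-consumer-per-partition `assign` are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of keyed partitioning and 1:1 consumer assignment.
// Real clients use their own partitioner and group coordination.
class PartitionSketch {

    // Map an event key to one of numPartitions partitions (simple hash, assumed).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Give each partition exactly one consumer; requires one consumer per partition.
    static Map<Integer, String> assign(List<String> consumers, int numPartitions) {
        if (consumers.size() != numPartitions) {
            throw new IllegalArgumentException("need exactly one consumer per partition");
        }
        Map<Integer, String> owner = new LinkedHashMap<>();
        for (int p = 0; p < numPartitions; p++) {
            owner.put(p, consumers.get(p));
        }
        return owner;
    }

    public static void main(String[] args) {
        Map<Integer, String> owner =
                assign(List.of("consumer-a", "consumer-b", "consumer-c"), 3);
        for (String key : List.of("account-17", "account-42", "account-99")) {
            int p = partitionFor(key, 3);
            System.out.println(key + " -> partition " + p + " -> " + owner.get(p));
        }
    }
}
```

Because a key always lands on the same partition, adding capacity in this model means adding partitions and a matching consumer for each, which is what makes the scaling predictable.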


Messaging platforms


What’s a Stream?

A stream is an unbounded sequence of events carried from a set of producers to a set of consumers.

Producers and consumers don’t have to be aware of each other; instead, they participate in shared topics. This is called publish/subscribe.

[Diagram: producers publish to a shared topic; consumers read from it]

/Events:Topic
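As a rough sketch of publish/subscribe, the topic can be modeled as an append-only log with one read position per consumer. `TopicSketch` and its methods are hypothetical names for this illustration; a real stream is distributed, persistent, and partitioned, none of which is modeled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory topic: an append-only log plus one read offset per consumer.
class TopicSketch {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    // Producers only append; they never reference a consumer.
    void publish(String event) {
        log.add(event);
    }

    // Each consumer reads from its own offset, so every consumer sees every event.
    List<String> poll(String consumerId) {
        int from = offsets.getOrDefault(consumerId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        offsets.put(consumerId, log.size());
        return batch;
    }

    public static void main(String[] args) {
        TopicSketch topic = new TopicSketch();
        topic.publish("login");
        topic.publish("transfer");
        System.out.println(topic.poll("fraud-check")); // both events so far
        topic.publish("logout");
        System.out.println(topic.poll("fraud-check")); // only the new event
        System.out.println(topic.poll("dashboard"));   // all three events
    }
}
```

The producer side has no knowledge of `fraud-check` or `dashboard`; a new consumer can be added later and still read the full history.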


Ability to Handle the “Extreme”

Extreme:
• 1+ Trillion Events
– per day
• Millions of Producers
– Billions of events per second
• Multiple Consumers
– Potentially for every event
• Multiple Data Centers
– Plan for success
– Plan for drastic failure

Average Reality:
Think that is crazy? Consider having 100 servers and performing:

Monitoring and Application logs…
– 100 metrics per server
– 60 samples per minute
– 50 metrics per request
– 1,000 log entries per request (abnormally small, depends on level)
– 1 million requests per day

~ 2 billion events per day, for one small (ish) use case
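The "~ 2 billion" figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes the 1 million requests per day is the total across all 100 servers and that each metric is sampled 60 times per minute; the slide does not spell out either point, and `EventVolume` is just an illustrative name.

```java
// Back-of-the-envelope event volume for the 100-server example.
// Assumes 1 million requests per day in total, not per server.
class EventVolume {
    static long eventsPerDay() {
        long servers = 100;
        long metricsPerServer = 100;
        long samplesPerMinute = 60;
        long minutesPerDay = 24 * 60;
        long requestsPerDay = 1_000_000;
        long metricsPerRequest = 50;
        long logEntriesPerRequest = 1_000;

        // Server monitoring: 100 * 100 * 60 * 1440 = 864,000,000
        long serverMetrics = servers * metricsPerServer * samplesPerMinute * minutesPerDay;
        // Per-request metrics: 1,000,000 * 50 = 50,000,000
        long requestMetrics = requestsPerDay * metricsPerRequest;
        // Log entries: 1,000,000 * 1,000 = 1,000,000,000
        long logEntries = requestsPerDay * logEntriesPerRequest;
        return serverMetrics + requestMetrics + logEntries;
    }

    public static void main(String[] args) {
        System.out.println(eventsPerDay()); // prints 1914000000
    }
}
```

Under these assumptions the total is about 1.9 billion events per day, with log entries and server metrics contributing almost all of it.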


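Producing and Consuming is Easy

// Producer. props is assumed to be a java.util.Properties holding the
// client settings (for example the key and value serializers).
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
ProducerRecord<String, String> event = new ProducerRecord<>("/Events:Topic", "MyEvent");
producer.send(event);

// Consumer. Subscribes to the same topic the producer writes to.
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("/Events:Topic"));

while (true) {
    // Wait up to 1000 ms for new events, then print each one.
    ConsumerRecords<String, String> events = consumer.poll(1000);
    Iterator<ConsumerRecord<String, String>> newEvents = events.iterator();
    while (newEvents.hasNext()) {
        System.out.println(newEvents.next().toString());
    }
}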

/Events:Topic


Producers and Consumers

[Diagram: producers publish to /Events:Topic; analytics, consumers, and stream processors read from it]

Producers:
• Social Platforms
• Servers (Logs, Metrics)
• Sensors
• Mobile Apps
• Other Apps & Microservices

Consumers:
• Alerting Systems
• Stream Processing Frameworks
• Databases & Search Engines
• Dashboards
• Other Apps & Microservices


Considering a Messaging Platform

• 50-100k messages per second used to be good
– Not really good to handle decoupled communication between services
• Kafka model is BLAZING fast
– Kafka 0.9 API with message sizes at 200 bytes
– MapR Streams on a 5 node cluster sustained 18 million events / sec
– Throughput of 3.5GB/s and over 1.5 trillion events / day
• Manual sharding is not a “great” solution
– Adding more servers should be easy and fool proof, not painful
– Yes, I have lived through this
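The benchmark figures above can be cross-checked with simple arithmetic. This sketch counts payload bytes only and uses decimal gigabytes; `ThroughputCheck` is an illustrative name, and the exact measurement method behind the quoted 3.5GB/s is not given in the slides.

```java
// Sanity check of the quoted benchmark figures (payload bytes only).
class ThroughputCheck {
    // Bytes moved per second at a given event rate and event size.
    static long bytesPerSecond(long eventsPerSecond, long bytesPerEvent) {
        return eventsPerSecond * bytesPerEvent;
    }

    // Events per day at a sustained per-second rate.
    static long eventsPerDay(long eventsPerSecond) {
        return eventsPerSecond * 24 * 60 * 60;
    }

    public static void main(String[] args) {
        long rate = 18_000_000L; // 18 million events / sec
        System.out.println(bytesPerSecond(rate, 200)); // prints 3600000000
        System.out.println(eventsPerDay(rate));        // prints 1555200000000
    }
}
```

At 200 bytes per event this works out to roughly 3.6 GB/s of payload, in the same range as the quoted 3.5GB/s, and about 1.56 trillion events per day, which matches "over 1.5 trillion".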


Use Cases in Finance


Event-based Data Drives Applications

• Failure Alerts
• Real-time application & network monitoring
• Trending now
• Web Personalized Offers
• Real-time Fraud Detection
• Ad optimization
• Supply Chain Optimization


How E-Mail Works…


Fighting Fraudulent E-Mail

• Phishing attempts
• Malware
• Spam


Prevention Options

• Train people to not click random links in emails
– This will NEVER happen (Honestly!)
• E-mail appliances to prevent users from seeing emails
– Most typically require users to intervene
– Costly


Constructing an E-Mail Management Pipeline

[Diagram components: Postfix Mail Server, MTA, E-Mail Stream, MTA, Postfix Mail Server]

Stream consumers shown:
• Spam Filters
• Phishing Classification
• Internal Affairs
• Legal Archive


Benefits of Approach

• Customizable pipeline
• Can learn and apply new policies
– Spam
– Phishing classification
– Fraud attempts
• Retention policies
– Auditable
– Simple search and discovery
– Litigation hold
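To make the retention and litigation-hold idea concrete, here is one way such a rule might look in code. The rule itself, the class name `RetentionPolicy`, and the `canPurge` method are assumptions for illustration; the slides do not describe how the policies are implemented.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical retention rule; the names and the rule itself are assumptions.
class RetentionPolicy {
    private final Duration retention;

    RetentionPolicy(Duration retention) {
        this.retention = retention;
    }

    // A message may be purged only when it is past retention and not on hold.
    boolean canPurge(Instant receivedAt, boolean litigationHold, Instant now) {
        if (litigationHold) {
            return false; // a hold always overrides the retention period
        }
        return receivedAt.plus(retention).isBefore(now);
    }

    public static void main(String[] args) {
        RetentionPolicy policy = new RetentionPolicy(Duration.ofDays(365));
        Instant now = Instant.parse("2016-06-01T00:00:00Z");
        Instant old = Instant.parse("2015-01-01T00:00:00Z");
        System.out.println(policy.canPurge(old, false, now)); // past retention, no hold
        System.out.println(policy.canPurge(old, true, now));  // past retention, on hold
    }
}
```

Keeping the rule as a small, pure function makes it easy to audit and to change as policies evolve.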


Fighting Fraudulent Web Traffic

[Diagram labels]
• Streams: Activity Stream, Click Stream, Session Alteration Stream
• Inputs: User Activity Profile, Blacklist Activities, Whitelist Activities
• Classifiers: Deviation from Normal, Known Bad Classifier, All OK Classifier
• Output: Notify Security
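One way to read the classifier labels is as a chain: check known-bad activities first, then known-good ones, then compare the rest against the user's normal behavior. The sketch below follows that reading. The ordering, the 3x rate threshold, and every name in it (`SessionClassifier`, `Verdict`, `classify`) are assumptions, not details from the slides.

```java
import java.util.Set;

// Hypothetical sketch of a classifier chain; names and threshold are assumptions.
class SessionClassifier {
    enum Verdict { ALL_OK, KNOWN_BAD, DEVIATION }

    private final Set<String> blacklist;
    private final Set<String> whitelist;

    SessionClassifier(Set<String> blacklist, Set<String> whitelist) {
        this.blacklist = blacklist;
        this.whitelist = whitelist;
    }

    // profileRate: the user's usual actions per minute; observedRate: the current rate.
    Verdict classify(String activity, double profileRate, double observedRate) {
        if (blacklist.contains(activity)) {
            return Verdict.KNOWN_BAD;
        }
        if (whitelist.contains(activity)) {
            return Verdict.ALL_OK;
        }
        // Assumed rule: more than 3x the user's normal rate counts as a deviation.
        return observedRate > 3.0 * profileRate ? Verdict.DEVIATION : Verdict.ALL_OK;
    }

    public static void main(String[] args) {
        SessionClassifier classifier =
                new SessionClassifier(Set.of("bulk-export"), Set.of("view-balance"));
        System.out.println(classifier.classify("bulk-export", 10, 10));
        System.out.println(classifier.classify("transfer", 10, 50));
        System.out.println(classifier.classify("transfer", 10, 12));
    }
}
```

In a streaming setup, a `KNOWN_BAD` or `DEVIATION` verdict would be the point at which an event is published to alter the session or notify security.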


Similarities between Marketing and Fraud?

Customer 360
• Build a user profile
– What type of content do they like
• Build “segmented” profiles
– Company affiliation
• Dynamically alter website
– Show alternate content
• Kick-off external workflows
– Nurture emails

Website Fraud
• Build a user profile
– What are their normal usage patterns
• Build “segmented” profiles
– What do real users normally do
• Dynamically alter website
– Prevent user functionality
• Kick-off external workflows
– Notify security team


Not All Data Platforms are the Same


Learn More about Converged Applications

Check out our Converged Application Blueprint

Visit www.mapr.com/appblueprint


@kingmesal

[email protected]

Engage with us!

kingmesal