Fraud Detection for Israel BigThings Meetup


Real-Time Anomaly Detection: Patterns and Reference Architectures

Gwen Shapira, System Architect

©2014 Cloudera, Inc. All rights reserved.

Overview
• Intro
• Review Problem
• Quick overview of key technology
• High-level architecture
• Deep Dive into NRT Processing
• Completing the Puzzle – Micro-batch, Ingest and Batch


Gwen Shapira
• 15 years of moving data
• Formerly consultant, engineer
• System Architect @ Confluent
• Kafka Committer
• @gwenshap

There’s a Book on That

Founded by creators of Kafka - @jaykreps, @nehanarkhede, @junrao

We help you gather, transport, organize, and analyze all of your stream data

What we offer
• Confluent Platform
  • Kafka plus critical bug fixes not yet applied in the Apache release
  • Kafka ecosystem projects
• Enterprise support
• Training and Professional Services


The Problem


Credit Card Transaction Fraud


Coupon Fraud


Video Game Strategy


Health Insurance Fraud


How Do We React? The Human Brain at Tennis
• Muscle memory
• Reaction thought
• Reflective meditation


Overview of Key Technologies


Kafka


The Basics

• Messages are organized into topics
• Producers push messages
• Consumers pull messages
• Kafka runs in a cluster; nodes are called brokers
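The topic / partition / log model described on the next few slides can be sketched in a few lines. This is an illustrative Python model, not the real Kafka client: a topic is a set of partitions, each an append-only log addressed by offset.

```python
# Illustrative model of Kafka's storage layout: a topic is a set of
# partitions, and each partition is an append-only log addressed by offset.
class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Producer side: push a message onto one partition's log."""
        self.partitions[partition].append(message)
        return len(self.partitions[partition]) - 1  # offset of the new message

    def read(self, partition, offset):
        """Consumer side: pull messages from a given offset onward."""
        return self.partitions[partition][offset:]

topic = Topic("transactions", num_partitions=3)
topic.append(0, "swipe-1")
topic.append(0, "swipe-2")
topic.append(1, "swipe-3")

print(topic.read(0, 0))  # ['swipe-1', 'swipe-2']
print(topic.read(0, 1))  # ['swipe-2']
```

Because a partition is just a log, a consumer can re-read any range simply by asking for an earlier offset.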


Topics, Partitions and Logs


Each partition is a log


Each Broker has many partitions

(Diagram: partitions 0, 1, and 2 spread, with copies, across the brokers in the cluster)


Producers load balance between partitions

(Diagram: a producer client spreading writes across partitions 0, 1, and 2 on the brokers)



Consumers

(Diagram: a Kafka cluster with one topic of partitions A, B, and C, each partition backed by a file; consumer groups X and Y each read every partition, with each group's offsets tracked separately)

• Order is retained within a partition, but not across partitions
• Offsets are kept per consumer group
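The "offsets kept per consumer group" point can be sketched in plain Python (hypothetical names, not the real consumer API): two groups read the same partition log independently because each tracks only its own position.

```python
# One partition's log, shared by every consumer group.
log = ["txn-1", "txn-2", "txn-3", "txn-4"]

# Offsets are kept per consumer group: each group remembers
# how far *it* has read, independently of every other group.
offsets = {"group-X": 0, "group-Y": 0}

def poll(group, max_records=2):
    """Return the next batch for this group and advance only its offset."""
    start = offsets[group]
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)
    return batch

print(poll("group-X"))  # ['txn-1', 'txn-2']
print(poll("group-X"))  # ['txn-3', 'txn-4']
print(poll("group-Y"))  # ['txn-1', 'txn-2'] -- group Y is unaffected by X
```

This is why you can attach a brand-new consumer group (say, a batch re-processor) to a live topic without disturbing the NRT consumers already reading it.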

Consumer-Producer Pattern

Keeping Things Simple
• Consume records from a Kafka topic
• Filter, transform, join, look up, aggregate
• Write to another Kafka topic
• https://github.com/confluentinc/examples/tree/master/specific-avro-consumer
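The consume-filter-produce loop above can be sketched as follows. The two lists stand in for the input and answer Kafka topics, and the fraud rule is a hypothetical example, not anything from the talk:

```python
# Consumer-producer pattern: read from one topic, filter/transform,
# write to another. Lists stand in for the two Kafka topics.
input_topic = [
    {"card": "1234", "amount": 25.0},
    {"card": "5678", "amount": 9400.0},
    {"card": "1234", "amount": 8100.0},
]
output_topic = []

def is_suspicious(txn):
    # Hypothetical rule: flag unusually large transactions.
    return txn["amount"] > 5000

for record in input_topic:                              # consume
    if is_suspicious(record):                           # filter / transform
        output_topic.append({**record, "flag": "review"})  # produce

print(output_topic)
```

The real version replaces the lists with a Kafka consumer and producer, but the processing logic in the middle is exactly this simple.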

Kafka Makes Streams Easy
• Producers partition the data
• Consumers load balance partitions
• Add / remove consumers any way you want
• Will work with any framework (or none!)

Coming Soon to Kafka Near You
• Kafka Connect - export / import for Kafka - 0.9.0 (it's here!)
• KStreams:
  • Consumer-producer client - Processor (0.10.0 - April?)
  • DSLs:
    • KStream (a bit like Spark) - (0.10.0 - April?)
    • SQL - ???

KConnect - It's a Thing
• Easy to add connectors to Kafka
• Existing connectors:
  • JDBC
  • HDFS
  • MySQL * 2
  • ElasticSearch * 4
  • Cassandra
  • S3 * 2
  • MQTT
  • Twitter
• Kafka Connectors:
  • http://www.confluent.io/developers/connectors
  • http://docs.confluent.io/2.0.0/connect/index.html
• KStreams:
  • https://github.com/gwenshap/kafka-examples/blob/master/KafkaStreamsAvg

Spark Streaming


Spark Example

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
val sc = new SparkContext(conf)
val lines = sc.textFile(path, 2)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.collect().foreach(println)  // RDDs have no print(); collect, then print


Spark Streaming Example

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
val words = lines.flatMap(_.split(" "))
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
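What the streaming job does each second can be sketched without Spark. Assuming each micro-batch is simply a list of lines, the per-batch counts mirror flatMap + map + reduceByKey, and the running total mirrors the stateful RDD shown on the next slides:

```python
from collections import Counter

def process_batch(lines):
    """One micro-batch: split lines into words and count them,
    like flatMap + map + reduceByKey in the Spark job."""
    words = [w for line in lines for w in line.split(" ")]
    return Counter(words)

# Two one-second micro-batches arriving on the stream.
batches = [["to be or not"], ["to be"]]

state = Counter()            # the 'stateful RDD' carried between batches
for batch in batches:
    counts = process_batch(batch)
    state.update(counts)     # fold this batch's counts into the state
    print(dict(counts))

print(dict(state))           # running totals across both batches
```

The key point of the diagrams that follow is exactly this split: a stateless single pass per batch, plus an optional stateful accumulator carried from batch to batch.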

Spark Streaming

(Diagram: per micro-batch, a receiver turns source data into an RDD, and a single pass of filter, count, and print runs over it - shown as a sequence of DStreams for the pre-first, first, and second batches)

(Diagram: the stateful variant - each batch's counts are folded into a stateful RDD that carries running state from one batch to the next)

High Level Architecture


Real-Time Event Processing Approach

(Diagram: clients swipe cards and web apps feed Flume agents into Kafka; on Hadoop Cluster I, Spark Streaming adjusts NRT statistics using HBase / memory and a local cache; an HDFS event sink and a SolR sink land the data on Hadoop Cluster II for storage, search, Hive/Impala, MapReduce, and Spark; batch-time adjustments plus automated & manual review of NRT changes and counters feed analytical adjustments, pattern detection, and profile fetching & updating back into the stream layer)

Yarn / Mesos Analytics Layer

(Diagram: the Kafka-centric variant - clients and web apps write transactions to Kafka; KStream processors on Yarn / Mesos, each with a local store backed by a redo-log topic, fetch & update profiles, adjust NRT stats, and emit decisions; connectors move data out to HDFS, NoSQL, SolR, and a DWH, while profile updates, model updates, and batch-time analytical adjustments and pattern detection flow back through Kafka)

NRT Processing


Focus on NRT First

(Diagram: the same architecture as above with the NRT path highlighted - clients through Kafka into the stream processor, which adjusts NRT statistics against HBase / memory and its local cache before the batch layers take over)

Streaming Architecture – NRT Event Processing

(Diagram: Kafka initial-events topic → Kafka consumer → event processing logic, backed by local memory and an HBase client talking to HBase → Kafka producer → Kafka answer topic)

Able to respond within tens of milliseconds
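The low latency comes from answering out of local memory whenever possible. A sketch of that cache-with-fallback lookup, where get_from_hbase is a hypothetical stand-in for the HBase client call:

```python
# Local in-memory cache in front of the profile store. Answering from
# the cache is what keeps response times in the tens of milliseconds;
# only a miss pays for a round trip to HBase.
local_cache = {}

def get_from_hbase(card):
    """Hypothetical stand-in for the HBase client lookup."""
    return {"card": card, "avg_spend": 120.0}

def get_profile(card):
    if card not in local_cache:        # cache miss: go to HBase
        local_cache[card] = get_from_hbase(card)
    return local_cache[card]           # cache hit: local memory only

profile = get_profile("4111-1111")
print(profile["avg_spend"])
print("4111-1111" in local_cache)      # True -- the next lookup is local
```

The catch, addressed on the next slide, is that a plain cache only helps if the same card keeps hitting the same processor.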


Partitioned NRT Event Processing

(Diagram: Kafka initial-events topic with partitions A, B, and C → one Kafka consumer per partition → event processing logic with a local cache and an HBase client talking to HBase → Kafka producer → Kafka answer topic)

Each producer routes events through a custom partitioner, so all events for a given key land in the same partition

Better use of local memory
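The custom partitioner idea can be sketched as follows. Hashing the card number means every event for one card reaches the same partition, hence the same processor and its local cache. The key choice is a hypothetical example; md5 is used here only to get a deterministic hash:

```python
import hashlib

NUM_PARTITIONS = 3

def partition_for(card_number):
    """Route every event for one card to the same partition, so the
    processor that owns that partition can keep the card's profile
    warm in local memory."""
    digest = hashlib.md5(card_number.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

# Repeated swipes of the same card always land together.
p1 = partition_for("4111-1111")
p2 = partition_for("4111-1111")
print(p1 == p2)  # True
```

With this routing in place, a cache hit rate close to 100% is possible for active cards, because no other processor ever sees their events.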


Questions?
• http://confluent.io
• @confluentInc
• @gwenshap
• gwen@confluent.io
