Lecture. Stephen Moody, Executive Manager, Detica NetReveal.
Date/reference/classification © BAE Systems Detica 2012 COMMERCIAL IN CONFIDENCE 1
Advanced techniques for detecting complex fraud schemes in large datasets
Dr SJ Moody, Madrid, July 2013
Contents
• Recent Fraud Trends and Statistics
• Evolution of Fraud Attacks and Protection Technologies
• State of the art detection and prevention
• Social Network Analytics
- Application fraud, bust out fraud and identity theft
• Random Forest Rule Optimization
• Third Party and Open Source Data
- Counterparty and Trade Finance Risk
• Data Sharing
- “Crash for cash” insurance fraud
• Kohonen Maps and Dynamic Time Warping
- Detecting unauthorized (“rogue”) trading
• Visual Analysis
• In-Memory Graph Analytics
• Future Technologies
Fraud figures – Spain, EU and Global
• 1/2: 47% of Spanish companies admit having been a victim of some kind of fraud (2011).
• 50 fraudsters, 15,000 cards, €50mn losses: figures from a single organised scam revealed by Europol in 2011.
• 1 in 7: motor personal injury claims in the UK are thought to be linked to “crash for cash” insurance fraud.
• 2x the budget deficit of the EU members: total loss due to fraud and unacceptable evasion, as estimated by the European Commission.
• $15bn: fines that the largest banks in the US and EU paid to settle regulatory investigations in 2012.
• $1tn: the Association of Certified Fraud Examiners estimates the cost of fraud in U.S. organizations at 7% of annual revenues, or $994 billion.
• 500,000: police officers are involved in fighting fraud in the EU.
• 10%: according to ECA, 10% of the total claims expenditure in the EU is fraudulent.
“...it is used by serious criminals to fund anything from human trafficking to drug dealing...”
James Brokenshire, Minister for Crime and Security
Fraud losses – UK (Source: National Fraud Authority June 2013)
[Chart: total Fraud Losses of £52 Billion, broken down across Private Sector, Financial Services, Individuals, Public Sector and Tax. Segment values shown: £16 Billion, £21 Billion, £9 Billion, £5 Bn, £14 Billion, £5 Billion]
• Cyber: 40% of frauds are Cyber Enabled
• Organised Crime: at least £18 Billion in losses is due to Organised Crime
• Identity Theft: 27% of the adult population has now been a victim of identity theft
• Insider: Insider Enabled fraud increased by 42% from 2012 to 2013
• Cross Border: 462 of 7,503 OCGs in the UK have International Links
Evolution of Fraud Attack Methodologies
Losses
Insurance Claims
Benefit Fraud
Tax Fraud
Internal Fraud
First Party Fraud
Application Fraud
Staged/Induced Accidents
Tax Refund Fraud
Identity Theft
Cyber Attack + Fraud
Phishing
Account Takeover
Automated
Opportunistic
Planned
Organised
Time / Confidence / Sophistication
“After penetrating the
computer network, the
crime ring allegedly made
more than 4,500 ATM
transactions in about 20
countries around the
world” Fox news
"I've poured a pint of
water down the back of
the TV... but, I was told
to do it by the man from
the telly repair shop.”
Woman from Wales
Evolution Of Detection Technologies
Manual Review
Investigator’s Nose
Data Analysis Tools
Automated Rules
Data Mining
Analytics
Machine Learning
Data Matching
Fraud Watchlists
Data Visualisation
Entity Resolution
Data Fusion
Social Network Analytics
Rule Discovery
/ Testing
Identity Manipulation
Identity Theft
Distributed “below
the radar” frauds
Detection Innovations
Criminal Innovations Rapidly mobilised
“cyber enabled” global
attacks
Next generation?
State Of The Art Fraud Detection
Historic data
Real-time data
Federated data
Open source
Rules Engine
Entity
Resolution
Social
Network
Analytics
Watchlist Match
Predictive Analytics
Task
Management
Visual
Analysis and
Search
Case
Management
Global Threat Research
and Model Development
Packaged Domain Threat Model
Feedback
Confirmed Frauds
3rd party data
Application Fraud Example
One of these intends
to commit First Party
Fraud with the aid of
an Insider
One of these is an
identity stolen by an
organised crime
group
One of these is an
innocent high value
customer
But how can we tell the difference?
John Smith 12/04/1976
54 Acacia Rd London
07766543223 [email protected]
Jim Jones 15/08/1981
17 Guildford Rd London
07779876554 [email protected]
Sarah Green 25/01/1979
Flat 3 Woking Rd London
07766554332 [email protected]
Apply Application Rules and Watchlist Match
They all pass the basic checks – shall we
take them all on as customers?
Credit Check
Watchlist Check
Credit Check
Watchlist Check
Credit Check
Watchlist Check
John Smith 12/04/1976
54 Acacia Rd London
07766543223 [email protected]
Jim Jones 15/08/1981
17 Guildford Rd London
07779876554 [email protected]
Sarah Green 25/01/1979
Flat 3 Woking Rd London
07766554332 [email protected]
Automatic Entity Resolution
Holds another account
with a different phone
number and email
address.
No historic data found.
One previous application
declined with a different
phone number and
address.
Failed Credit Check.
John Smith 12/04/1976
54 Acacia Rd London
07766543223 [email protected]
Jim Jones 15/08/1981
17 Guildford Rd London
07779876554 [email protected]
Sarah Green 25/01/1979
Flat 3 Woking Rd London
07766554332 [email protected]
07766545112 [email protected]
91 Stoke Rd London
07779876555
Entity Resolution Example
Javier Bordona de los Santos 2435284F
Javier Bordona 2435284D 89236745
2435284F 89236745
Person
Account
Three separate source
documents are now connected
via the two resolved entities, the
person and the account
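The linking step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the NetReveal method: the field names (national_id, account) and document labels A, B, C are assumptions, and records are grouped only when they share an exact value, whereas a production system would also use fuzzy matching on names and near-miss identifiers.

```python
from collections import defaultdict

# The three source documents from the slide; field names are assumed for illustration.
documents = [
    {"doc": "A", "name": "Javier Bordona de los Santos", "national_id": "2435284F"},
    {"doc": "B", "name": "Javier Bordona", "national_id": "2435284D", "account": "89236745"},
    {"doc": "C", "national_id": "2435284F", "account": "89236745"},
]

def resolve(docs, keys=("national_id", "account")):
    """Group documents that share an exact value on any key (union-find)."""
    parent = {d["doc"]: d["doc"] for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (key, value) -> first document carrying that value
    for d in docs:
        for key in keys:
            if key in d:
                token = (key, d[key])
                if token in first_seen:
                    parent[find(d["doc"])] = find(first_seen[token])
                else:
                    first_seen[token] = d["doc"]

    groups = defaultdict(set)
    for d in docs:
        groups[find(d["doc"])].add(d["doc"])
    return sorted(sorted(g) for g in groups.values())

clusters = resolve(documents)
print(clusters)
```

Document A and C share the identifier, B and C share the account number, so all three end up in one cluster even though A and B have no field in common.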
Entity Resolution Example
This is the start of the network
identification process
Customer
Record
Bank
Account
Transaction
Network Identification Process
Further entities are resolved
and all possible links are
established
Customer
Record
Bank
Account
Transaction
Network Identification Process
In “6 degrees of separation”
everything is connected to
everything
Network Identification Process
But not all links are equal…
We lived at the same address…
But was it at the same time?
We work for the same company?
But are we directors?
We transact with each other
But how frequently?
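One way to make the "same address, but at the same time?" question concrete is to measure how long two residency periods overlap. The sketch below is an assumption about how such a link could be qualified; the function name and the dates are hypothetical.

```python
from datetime import date

def overlap_days(a_start, a_end, b_start, b_end):
    """Days during which both residency periods were active (0 if none)."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return max((end - start).days, 0)

# Hypothetical residency periods of two people at the same address.
same_time = overlap_days(date(2010, 1, 1), date(2011, 1, 1),
                         date(2010, 7, 1), date(2012, 1, 1))
years_apart = overlap_days(date(2005, 1, 1), date(2006, 1, 1),
                           date(2010, 7, 1), date(2012, 1, 1))
print(same_time, years_apart)
```

A link with zero overlap could then be down-weighted or dropped before the network is scored.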
Network Identification Process
Analysing links
Fraud Identification Process
Fraud Profiles
• Behaviours
• Patterns
• Hidden links
Staged Accidents
Unauthorised Trading
Identity Theft
First Party Fraud
Internal Fraud
Trade Finance Fraud
Identifying fraud patterns
Real-time fraud assessment for prevention
New application, claim or transaction
• Check event level model . . . . . .
• Check network level model . . . . Fraud!
No risk found
Fraud
Resolved, fraud risk assessed networks
Automated Social Network Analytics
John Smith
Sarah Green
Existing Customer
High wealth
Property owner
Staff Member
Jim Jones
Shared low
income address
First Party “Bust Out
Fraud” facilitated by an
Insider
Organised Identity Theft
Fraud Network
Good Customer with
evidence of increasing
wealth
Social Network Fraud Analytics
Document for risk
assessment
Document variables
D1, D2…..Dm
Entities connect multiple
documents over time
Derive Scoring Model
Entity Variables
E1, E2 …. En Risk Flow
Entity risk scores propagated
to connecting documents
Network sub clusters
are isolated
Network Variables
N1, N2 …. Np
Network risk scores
propagated to documents
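The risk flow described on this slide can be illustrated with a toy example. Everything here is an assumption made for illustration: the documents, the entity labels, the single risk score per document, and the choice of "maximum" as the propagation rule. The slide does not say which aggregation the product uses.

```python
# Hypothetical toy data: each document lists the entities it mentions.
doc_entities = {
    "claim1": {"person:ann", "phone:111"},
    "claim2": {"person:bob", "phone:111"},
    "claim3": {"person:cat", "addr:9"},
}
# Document-level risk (the D variables collapsed into one assumed score).
doc_risk = {"claim1": 0.9, "claim2": 0.1, "claim3": 0.2}

def entity_scores(doc_entities, doc_risk):
    """Entity score = highest risk among the documents that mention it."""
    scores = {}
    for doc, entities in doc_entities.items():
        for entity in entities:
            scores[entity] = max(scores.get(entity, 0.0), doc_risk[doc])
    return scores

def propagate_to_documents(doc_entities, doc_risk):
    """Push each entity's score back onto every document it connects."""
    scores = entity_scores(doc_entities, doc_risk)
    return {doc: max(scores[e] for e in entities)
            for doc, entities in doc_entities.items()}

propagated = propagate_to_documents(doc_entities, doc_risk)
print(propagated)
```

Here claim2 looks harmless on its own, but inherits the risk of claim1 through the shared phone number; claim3 is unconnected and keeps its own score.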
Random Forest Optimised Fraud Scoring Model
Inputs: Document Variables, Propagated Entity Variables, Propagated Network Variables
Historic outcome data: Sample Data (T samples, M variables), split into Train 1 / Test 1 … Train T / Test T
Train the model: “Forest” of Decision Trees with M randomly selected variables
Test the model: Vote 1, Vote 2 … Vote T on the Test Data; Decision = most popular vote
Example decision tree:
• All data: Fraud 10, Not Fraud 100
• Connected to Fraud? No: Fraud 3, Not Fraud 85
• Connected to Fraud? Yes: Fraud 7, Not Fraud 15
- Time since account opening? t < 6 months: Fraud 6, Not Fraud 1
Decision trees can deal with incomplete data, categorical or numerical data, and generate human understandable rules that can be easily tested for accuracy
The “Random Forest” is an ensemble of decision trees each built on a random
subset of the input variables. The resulting models are highly accurate and
have the added benefit of providing model error estimates and variable
importance ratings.
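The train-and-vote idea can be sketched in plain Python. This is a toy under stated assumptions, not the model from the talk: the twelve training rows are invented, the two variables (connected to fraud, months since account opening) are borrowed from the example tree on the slide, each tree is limited to depth 3, and one randomly chosen variable is considered at each split. Error estimates and variable importance are left out.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, feature):
    """Best threshold on one feature as (impurity, threshold), or None."""
    values = sorted({x[feature] for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in rows if x[feature] <= t]
        right = [y for x, y in rows if x[feature] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[0]:
            best = (score, t)
    return best

def build_tree(rows, rng, depth=0, max_depth=3):
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predicted class
    feature = rng.randrange(len(rows[0][0]))  # one random variable per split
    split = best_split(rows, feature)
    if split is None:  # chosen variable is constant in this node
        return majority
    _, t = split
    left = [r for r in rows if r[0][feature] <= t]
    right = [r for r in rows if r[0][feature] > t]
    return (feature, t,
            build_tree(left, rng, depth + 1, max_depth),
            build_tree(right, rng, depth + 1, max_depth))

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        feature, t, left, right = tree
        tree = left if x[feature] <= t else right
    return tree

def train_forest(rows, n_trees=51, seed=0):
    rng = random.Random(seed)
    # Each tree is trained on a bootstrap sample (drawn with replacement).
    return [build_tree([rng.choice(rows) for _ in rows], rng)
            for _ in range(n_trees)]

def predict_forest(forest, x):
    """Decision = most popular vote across all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Invented rows: ((connected_to_fraud, months_since_opening), is_fraud)
rows = [((1, 1), 1), ((1, 2), 1), ((1, 3), 1), ((1, 4), 1), ((1, 5), 1),
        ((0, 8), 0), ((0, 12), 0), ((0, 24), 0), ((0, 36), 0), ((0, 48), 0),
        ((1, 30), 0), ((0, 3), 0)]
forest = train_forest(rows)
print(predict_forest(forest, (1, 2)), predict_forest(forest, (0, 40)))
```

An odd number of trees avoids tied votes. A library implementation such as a standard machine learning toolkit would normally be used in practice; the hand-written version is only meant to show the mechanics.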
Third Party Data Sources and Open Source
My bank
Counterparty Risk
Diversified risk across
4 counterparties?
Adding third party data such
as the Bureau van Dijk Orbis
data changes the risk profile,
as all 4 counterparties lead to
the same ultimate owner
Negative news indicates this
organisation is in trouble –
identified via automated
sentiment analysis
Trade Finance Fraud
Trade Finance Agreement
Third party data shows that
the entities on both sides of the
transaction are owned by the
same beneficiary. Is this
money laundering?
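The counterparty example can be sketched as a walk up an ownership table. The company names, the table layout and the exposure amounts below are hypothetical; real ownership data from a third party provider would have its own format and would need to handle partial and shared ownership.

```python
# Hypothetical ownership table: company -> direct parent (None = ultimate owner).
parent_of = {
    "Counterparty A": "HoldCo 1",
    "Counterparty B": "HoldCo 1",
    "Counterparty C": "HoldCo 2",
    "Counterparty D": "HoldCo 2",
    "HoldCo 1": "Ultimate Owner Ltd",
    "HoldCo 2": "Ultimate Owner Ltd",
    "Ultimate Owner Ltd": None,
}

def ultimate_owner(company, parent_of):
    """Follow parent links until a company with no parent is reached."""
    seen = set()
    while parent_of.get(company) is not None:
        if company in seen:
            raise ValueError("ownership cycle detected")
        seen.add(company)
        company = parent_of[company]
    return company

def exposure_by_owner(exposures, parent_of):
    """Re-aggregate counterparty exposure by ultimate owner."""
    totals = {}
    for counterparty, amount in exposures.items():
        owner = ultimate_owner(counterparty, parent_of)
        totals[owner] = totals.get(owner, 0) + amount
    return totals

exposures = {"Counterparty A": 25, "Counterparty B": 25,
             "Counterparty C": 25, "Counterparty D": 25}
by_owner = exposure_by_owner(exposures, parent_of)
print(by_owner)
```

What looked like four exposures of 25 each becomes a single exposure of 100 once ownership is taken into account. The same lookup applied to both sides of a trade finance agreement would flag deals where buyer and seller share an owner.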
Data Sharing for Fraud Detection
The Insurance Fraud Bureau in the UK pools over 98% of insurance policies and claims to identify organised fraud rings via automated Social Network Analytics – arrests increased by a factor of 30
A recent bank data sharing fraud proof of concept revealed £100M+ in hidden fraud perpetrated against 3 retail banks
NFIB
IFB
CIFAS
Law
Enforcement
Banks Insurers
Government
Public Commercial
BFB
?
Organised “Staged Accident” network with data from 6 different insurers
• $1.26 million claim value
• 14 claims
• 8 Accidents
• 5 new policies
• 11 total policies
2009 - 5 new policies
2009 October - Accident 1
2009 October - Accident 2
2010 January – Accident 3
2010 July – Accident 4
2010 August – Accident 5
2010 August – Accident 6
2011 December – Accident 7
2011 December – Accident 8
What happens if we don’t
intervene at this point?
A second ring developed in parallel
A third ring developed creating a large network
Large network statistics
• Claims
- 160 total claims
- $3.79 million value of claims
- 50 injuries reported to group insurers
- Claims made by suspect parties and victims
• Policies
- 310 policies in total
- $1.08 million premiums
- Policies held by suspect parties and victims
Network Analytics and Kohonen Cluster Maps to Detect Unauthorised Trading
Derived
Network
Variables
Trader Networks – aggregated data across silos
of alerts. Links between traders, trading books,
sales and trading counterparties.
Input Vector
X1 X2 . . . . . . Xn
Trader
Behavioural Map
Trader profile calculated
each week to look for
changes in behaviour
Week 1
Week 2
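A very small Kohonen map (self-organising map) shows how a weekly profile can be placed on a behavioural map and compared week to week. This is a sketch under several assumptions: the map is one-dimensional with four nodes, the node weights start evenly spaced rather than random, and the three-component profiles are invented values scaled to the range 0 to 1.

```python
import math

def bmu(nodes, x):
    """Index of the best-matching unit (closest node) for vector x."""
    return min(range(len(nodes)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(nodes[i], x)))

def train_som(data, n_nodes=4, epochs=50, lr=0.3, sigma=0.5):
    """Train a 1-D Kohonen map with a Gaussian neighbourhood."""
    dim = len(data[0])
    # Deterministic start: nodes spread evenly along the diagonal of [0, 1]^dim.
    nodes = [[i / (n_nodes - 1)] * dim for i in range(n_nodes)]
    for _ in range(epochs):
        for x in data:
            winner = bmu(nodes, x)
            for i, node in enumerate(nodes):
                # Nodes near the winner on the map move more than distant ones.
                h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    node[d] += lr * h * (x[d] - node[d])
    return nodes

# Invented weekly trader profiles (derived variables scaled to 0..1).
profiles = [[0.1, 0.2, 0.1], [0.15, 0.1, 0.1], [0.1, 0.15, 0.2],
            [0.9, 0.8, 0.9], [0.85, 0.9, 0.8]]
som = train_som(profiles)

week1 = [0.1, 0.15, 0.1]   # typical behaviour
week2 = [0.9, 0.85, 0.9]   # markedly different behaviour
moved = bmu(som, week1) != bmu(som, week2)
print(bmu(som, week1), bmu(som, week2), moved)
```

A trader whose weekly profile jumps to a distant node on the map is a candidate for review; the distance threshold that counts as a "change in behaviour" is a tuning decision not covered here.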
Dynamic Time Warping for Unauthorised Trading
Each Key Risk Indicator alert is counted up each day, creating a time series signature per week per trader.
Time Warping Matrix: Non-Linear Transformation to align time series ignoring small differences in features
Time Warping is used to match “out of phase” signatures or search for “fuzzy” time series patterns, e.g. involving Cancelled or Amended Trades and Dummy Counterparty Trades.
[Figure label: Tolerance]
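The matching of "out of phase" signatures can be illustrated with the textbook dynamic time warping recurrence. The daily alert counts below are invented, and the sketch omits the tolerance band shown on the slide, so any amount of warping is allowed.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric series."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats against b
                                 cost[i][j - 1],      # b[j-1] repeats against a
                                 cost[i - 1][j - 1])  # both series advance
    return cost[n][m]

# Invented daily alert counts for one week.
known_pattern = [0, 0, 3, 5, 1, 0, 0]
same_shape_one_day_early = [0, 3, 5, 1, 0, 0, 0]
flat_week = [2, 2, 2, 2, 2, 2, 2]

pointwise = sum(abs(x - y) for x, y in zip(known_pattern, same_shape_one_day_early))
warped = dtw_distance(known_pattern, same_shape_one_day_early)
unrelated = dtw_distance(known_pattern, flat_week)
print(pointwise, warped, unrelated)
```

A day-by-day comparison treats the shifted week as quite different, while the warped distance recognises it as the same shape; the flat week stays far away under both measures. A tolerance band would restrict how far apart aligned days may be.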
Discontinuous changes in attack method
• Criminals think, plot and plan, and can change their attack method rapidly in
discontinuous ways
• A human adversary requires a human defender in the protection process
Evolution
Invention
Slow change
Rapid change
Predictive analytics
Human analyst +
appropriate tools
Visual Analysis and Investigation
Social Network Database
Link Analysis
Temporal Analysis
Geographic Analysis
Case Management
Simple intuitive
applications for
potentially thousands of
end user investigators
and case processors
In Memory Graph
Analytic Database
“What if” analysis
Graph Analytics
Ad Hoc Analytics
Workflow
Data Analysts / Scientists
Operational
Response
In Memory Graph Analytics
• Calculate Social Network Analytic metrics
• Betweenness, Centrality, Degree, Depth, Span, etc.
• Aggregate data on the fly and add to the graph
• Derive new variables and attach these to entities and documents
• Test new hypotheses
• Query the graph via a Structured Graph Query Language (SGQL)
• Find rings and arbitrary graph patterns that match a newly discovered threat pattern
• Derive new link types on the fly
• Collapse and summarise link paths
“Decorate” the graph via calculations
1) Calculate the “degree”: the number of connections attached to each entity
2) Highlight addresses with more than 8 connections in green and scale icon size with number of connections
3) Highlight vehicles with more than 4 connections in blue and scale icon size with number of connections
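The three decoration steps can be sketched against a plain edge list. The node naming scheme ("type:id"), the toy edges and the style dictionary are assumptions for illustration; the thresholds and colours are the ones given on the slide.

```python
from collections import Counter

def degrees(edges):
    """Number of connections attached to each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def decorate(edges, thresholds):
    """thresholds maps node type -> (connection count to exceed, colour)."""
    styles = {}
    for node, d in degrees(edges).items():
        node_type = node.split(":", 1)[0]
        limit, colour = thresholds.get(node_type, (None, None))
        highlighted = limit is not None and d > limit
        # Icon size scales with the number of connections.
        styles[node] = {"degree": d,
                        "colour": colour if highlighted else None,
                        "size": d}
    return styles

# Hypothetical graph: one busy address, one busy vehicle, one ordinary vehicle.
edges = ([("address:1", f"person:{i}") for i in range(9)]
         + [("vehicle:V1", f"person:{i}") for i in range(5)]
         + [("vehicle:V2", "person:0"), ("vehicle:V2", "person:1")])

styles = decorate(edges, {"address": (8, "green"), "vehicle": (4, "blue")})
print(styles["address:1"], styles["vehicle:V1"], styles["vehicle:V2"])
```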
Defining a “fraud ring” query
find f1, i1, i2, f2 in [family] f1
linkto [motorClaimContacts] m1
linkto incident i1 linkto [motorClaimContacts]
linkto family f2 where f2!=f1 linkto
motorClaimContacts linkto different incident
i2 linkto [motorClaimContacts] linkto f1
Structured Graph Query Language (SGQL)
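Read informally, the query above looks for two different families that are linked to each other through two different incidents. The Python sketch below reproduces that reading on a toy data set. It is not an SGQL implementation: the intermediate motorClaimContacts nodes are collapsed into simple (family, incident) pairs, and the family and incident names are invented.

```python
from collections import defaultdict
from itertools import combinations

def find_rings(contacts):
    """contacts: (family, incident) pairs, one per motor claim contact.

    Returns pairs of families that appear together in at least two
    different incidents, with the incidents that connect them.
    """
    families_in = defaultdict(set)
    for family, incident in contacts:
        families_in[incident].add(family)

    shared = defaultdict(set)
    for incident, families in families_in.items():
        for f1, f2 in combinations(sorted(families), 2):
            shared[(f1, f2)].add(incident)

    return {pair: sorted(incidents)
            for pair, incidents in shared.items() if len(incidents) >= 2}

# Invented claim contacts.
contacts = [("Smith", "I1"), ("Jones", "I1"),
            ("Smith", "I2"), ("Jones", "I2"), ("Green", "I2"),
            ("Green", "I3")]
rings = find_rings(contacts)
print(rings)
```

Smith and Jones meet in two separate incidents and are reported as a ring; Green shares only one incident with each of them and is not.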
Summary
Data Enrichment /
Context
Analytics
Rules
Behaviour/Outliers
Clustering
(Kohonen)
Optimisation
(Random Forest)
Events
Transactions
Entity
Resolution
Social
Network
Analytics
Data
Sharing
Strength of
Defence
Basic
Strong
Industry
Leader
Visualisation
Graph Analytics
Third Party
Data and
Open
Source
Future Technologies
• Latest evolution in fraud attack methods involves globally organised fraud schemes that combine cyber attacks with fraud, e.g.
- DDoS distraction attack
- Hack into account systems and change limits on pre-paid cards
- Account takeover of thousands of identities
- Clone cards
- Withdraw cash from hundreds of ATMs all over the world
• Politically motivated attacks to follow?
• Future Technology Response
- Combine cyber defence and intelligence systems alerts with fraud systems, understanding that data theft or cyber intrusion may be a prelude to fraud
- Move to real-time social network analytics to spot fraud groups that may mobilise in hours rather than weeks or months
- Increased identity security to prevent account takeover. Mobile phones provide some opportunities for added protection, e.g. fingerprint access to phone, phone becomes card, GPS proximity verification of phone and card, etc.
For more information on how Detica NetReveal can
help your organisation, please contact:
Detica NetReveal
BlueFin Building
110 Southwark Street
London
SE1 0TA
United Kingdom
www.deticanetreveal.com
Head Office
Surrey Research Park
Guildford
Surrey
GU2 7Y
Tel: +44 (0)1483 816000
International Offices
Australia
Belgium
Canada
Dubai
France
Germany
Ireland
India
Poland
Singapore
Spain
The Netherlands
UK
USA
© BAE Systems plc 2012. All Rights Reserved.
BAE SYSTEMS, DETICA, NETREVEAL, Detica NetReveal are trade
marks of BAE Systems plc.
Detica Limited is a BAE Systems company registered in England and
Wales under number 1337451. Its registered office is at Surrey
Research Park, Guildford, England, GU2 7YP
Contact Details
If you have any questions regarding this document or would like to
find out more about Detica NetReveal® please contact:
Stephen Moody
Head of Product Management
+44 (0)1483 816572