TOWARDS A HOLISTIC FRAUD MANAGEMENT PRODUCT
Aris Papadopoulos
A data-mining fraud management framework
• A holistic, big-data mining approach to narrow risk and potentially create space for strategic expansion (credit/loan application scoring, marketing/recommendations etc.)
• A cross-layer, cross-channel architecture building models from data on multiple levels: Industry level, processor level, merchant level, down to individual buyers (also include geographical, seasonal patterns).
• A 360-degree view that will predict individual buyers' behavior down to a “segment of one”.
• Intelligent knowledge fusion from multiple agents operating at all levels mentioned above.
• Full value chain management based on transparent, actionable conversion of insights.
Aris Papadopoulos - A fraud management framework - Private & Confidential 2
Fraud examples
• Stolen/lost card
• Counterfeit card
• ID theft
• Account takeover
• Skimming: Stealing card info during legit transaction
• Carding: Producing and trying card information
• Triangulation: Receiving payment on auction sites and sending goods bought with stolen card
• Phishing
• Botnets
• Reshipping: Employed to re-ship goods coming from fraudulent purchases
• Affiliate fraud: Affiliate that drives consumer traffic
• Clean fraud: Transaction info seems legitimate in isolation
• Chargeback fraud (“friendly” fraud): Bogus mail-not-received claim
Goals
• Detect and prevent fraud with the greatest accuracy
• Minimise false positives
• Minimise chargebacks
• Minimise the number and the cost of manual reviews
• Transparent analytics and insightful reporting to discover the appropriate anti-fraud strategy
• Powerful strategy management tool to implement it
• Ultimately, maximise customer’s revenues
Scoring | Manual review | Chargeback management | Reporting/analytics | Strategy management
Key features
• Customizable workflows, queues, priorities, roles etc. during manual review
• Intelligent routing to the most appropriate agent
• Dispute and chargeback management
• Chargeback prediction
• Device fingerprinting
• Device-account-owner dynamics monitoring
• Geolocation and IP monitoring with proxy detection
• Affiliate monitoring
• 3rd party services integration
Key features (cont.)
• Automatic rules creation from data
• Post-authorization retrospective scoring until shipping
• Compliance with local policies, regulations, tax procedures
• Customizable reports
• Real-time and historic monitoring and ad-hoc data analytics
• Events and alerts management
• Anti-fraud strategy management:
  • Easy rules management
  • Models management
• Cross-channel
M-payments
• A strategic opportunity.
• Gartner: From $170bn in 2012 to $600bn by 2016 (other analysts predict even higher volumes).
• In a global population of 7bn, 5bn have mobile phones and only 2bn have bank accounts.
• A collaboration/collision field for: banks, mobile operators, credit card companies, payment gateways, device makers.
• Two models, three broad markets:
  • M-commerce: Replace cash and cards to buy products.
    • Applies to sophisticated (1bn) and developed (4bn) markets.
  • Mobile money transfers: Replace bank accounts.
    • Applies to emerging (2bn) and developed markets.
• Some mobile payments companies:
  • Comviva, Fundamo (VISA), Gemalto, Monitise, Oberthur, Sybase (SAP), Utiba,
M-payments (cont.)
• New models are needed to accommodate:
  • The greater vulnerability of mobile devices to theft, which may result in data theft and account takeover.
  • Increased rates of authentication false positives due to mistyping (due to small size etc.).
  • New device-account patterns.
  • Mobility-sensitive behavior patterns such as IP-velocity, out-of-band authentication etc.
  • Mobile payments innovation such as mobile POS (Square, PayPal Beacon etc.), digital wallets (Google etc.).
  • Possibly different regulations depending on location.
A scoring framework
[Figure: overview of the scoring framework]
• Rules (rules, decision trees): detect known patterns
• Predictive analytics (neural networks, support vector machines, artificial immune systems): detect complex known patterns
• Anomaly detection (distance-based, Gaussian): detect unknown patterns
• Clustering (centroid, hierarchical): detect segments
• Velocity analysis (Markov processes): detect patterns in the time domain
• Multi-agent knowledge fusion: ensemble learning, meta-learning
• Optimization
Rules
• Aim: Decide an action (class) given a state (feature values).
• Pros:
  • Comprehensible
• Cons:
  • Scale poorly
  • Especially with noisy data
• Product feature:
  • Automatic rule learning
• Example algorithm: PRISM, RIPPER (underlying method: Covering aka Separate-and-conquer) with pruning to simplify.
• Fuzzy rules: Writing rules in natural language.
• Examples of rules engines execution algorithms: RETE, TREAT.
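A minimal sketch of the rule idea above: each rule maps a state (feature values) to an action, and a priority resolves conflicts when several rules fire. The feature names, thresholds and actions are hypothetical illustrations, not rules from this deck, and the naive linear scan stands in for a real execution algorithm such as RETE.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Transaction = Dict[str, float]


@dataclass
class Rule:
    name: str
    priority: int                          # higher wins when several rules fire
    condition: Callable[[Transaction], bool]
    action: str                            # e.g. "accept", "review", "reject"


# Hypothetical rules; the thresholds are illustrative only.
RULES: List[Rule] = [
    Rule("high_amount", 10, lambda t: t["amount"] > 1000, "review"),
    Rule(
        "high_amount_and_country_mismatch",
        20,
        lambda t: t["amount"] > 1000 and t["ip_country_matches"] == 0,
        "reject",
    ),
]


def decide(transaction: Transaction, rules: List[Rule] = RULES,
           default: str = "accept") -> str:
    """Return the action of the highest-priority rule that fires."""
    fired = [r for r in rules if r.condition(transaction)]
    if not fired:
        return default
    return max(fired, key=lambda r: r.priority).action
```

Because each rule is an independent chunk of knowledge, adding or removing one does not require touching the others; only the priorities need to stay consistent.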
• Practically: Many false negatives => rules alone are insufficient.
Decision trees
• Closely related to rules. Main differences:
  • Only one output (rules, by contrast, come with priority and conflict-resolution methods).
  • Less comprehensible (you need to look at the entire structure, while rules are independent chunks of knowledge).
• Example algorithms: ID3, C4.5/C5.0 (underlying method: divide-and-conquer to minimize entropy).
• Devise rules indirectly from decision trees: Combine separate-and-conquer with divide-and-conquer.
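The entropy criterion behind divide-and-conquer can be sketched as follows. This is only the split-scoring step (information gain for a categorical feature), not a full ID3 or C4.5 implementation; the function names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting on a categorical feature."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A tree learner would compute this gain for every candidate feature, split on the best one, and recurse on each subset.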
Neural networks
• Predict the probability that a transaction is fraudulent given potentially vast numbers of features (parameters) in the real world’s complex patterns (non-linear classification).
• Cons:
  • A pattern must have been seen before to be recognised.
  • Non-convex optimization: training may get stuck in local minima.
• Contextual challenge: Uneven classes: Fraud rate by order 0.8% (2012) => potentially many false positives.
  • To tackle this, M. Krivko combined a rules-based system with a logistic regression classifier (i.e. a single NN unit) deployed for a bank (“A hybrid model for plastic card fraud detection systems”, 2010).
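A short worked calculation shows why uneven classes produce many false positives. The 0.8% fraud rate is the figure quoted above; the detection rate (95%) and false-alarm rate (2%) are hypothetical values chosen only for illustration.

```python
def precision_at_base_rate(fraud_rate: float,
                           true_positive_rate: float,
                           false_positive_rate: float) -> float:
    """Fraction of flagged transactions that are actually fraudulent."""
    flagged_fraud = fraud_rate * true_positive_rate
    flagged_legit = (1.0 - fraud_rate) * false_positive_rate
    return flagged_fraud / (flagged_fraud + flagged_legit)


# Hypothetical classifier: catches 95% of fraud, wrongly flags 2% of legit orders.
precision = precision_at_base_rate(0.008, 0.95, 0.02)
print(f"precision = {precision:.3f}")  # about 0.277
```

Under these assumptions, fewer than three in ten flagged transactions are fraudulent, even though the classifier looks accurate on each class separately.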
Neural Networks (cont.)
• Learning algorithms:
  • Back Propagation (supervised)
  • Self-Organizing Map (unsupervised)
• Advanced algorithms:
  • Deep learning: Self-taught neural networks that can be trained to recognize hierarchies of higher-level patterns. No manual classification during training.
    • Product feature: Identify patterns in false negatives?
    • Algorithm: Calculate the cost of reconstructing each instance from itself through the NN. After training, perceptrons of the last hidden layer output the probability of the input in the world they “fantasise”.
    • Underlying algorithm: Sparse coding: Learn a basis (“bottleneck”) into which all instances can be decomposed.
  • Dynamic NN architecture: Vary the number of hidden layers and the number of perceptrons at each layer to minimize the error of the maximum likelihood model.
    • Nature-inspired heuristics: Genetic Algorithms, Simulated Annealing etc.
Artificial Immune Systems
• In “Credit Card Fraud Detection with Artificial Immune System” (2008), Gadi et al. showed that a GA-optimized AIS outperformed the corresponding NN.
• Natural: The immune system protects the body by recognizing antigens. When B-cells encounter antigens, they respond with antibodies which attack the antigens.
  • AIS: B-cells respond like pattern matchers. Antigens and B-cells are represented as feature vectors.
• If the affinity of the B-cell to the antigen is high, the B-cell becomes stimulated and produces mutated clones (better fit for the particular antigen).
  • AIS: Affinity: A similarity metric in the feature vector space (see clustering).
• As the new B-cells are added, inactive ones die (survival of the fittest for network diversity).
Artificial Immune Systems (cont.)
• Training an AIS (clustering within classification):
  • Present the new antigen.
  • From a pool of memory B-cells, select the one with the greatest affinity (similarity metric in the feature vector space, see clustering) with the antigen and clone it. The clones may also mutate.
  • B-cell stimulation is proportional to antigen affinity.
  • This is repeated until a certain threshold of average stimulation is reached; the process then stops and the closest B-cell to the antigen is selected as the recognized class (added to the memory pool).
• Classification:
  • K-nearest neighbours to the closest B-cell from the memory pool.
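The training loop above can be sketched roughly as follows. This is a simplified illustration, not the algorithm of the cited paper: the affinity function (inverse Euclidean distance), the mutation scheme, and all parameter names and default values are assumptions made for the example.

```python
import math
import random
from collections import Counter
from typing import List, Tuple

Vector = List[float]
BCell = Tuple[Vector, str]  # (feature vector, class label)


def affinity(a: Vector, b: Vector) -> float:
    """Similarity in (0, 1]; 1 means identical vectors (assumed metric)."""
    return 1.0 / (1.0 + math.dist(a, b))


def train_on_antigen(memory: List[BCell], antigen: Vector, label: str,
                     rng: random.Random, n_clones: int = 10,
                     stimulation_threshold: float = 0.8,
                     max_rounds: int = 50, mutation_scale: float = 0.5) -> None:
    """Clone and mutate the best-matching memory cell, then store the winner."""
    same_class = [cell for cell, cell_label in memory if cell_label == label]
    if not same_class:
        memory.append((list(antigen), label))  # first cell of this class
        return
    best = max(same_class, key=lambda cell: affinity(cell, antigen))
    for _ in range(max_rounds):
        # Assumption: a poorer match mutates more strongly.
        sigma = mutation_scale * (1.0 - affinity(best, antigen))
        clones = [[v + rng.gauss(0.0, sigma) for v in best]
                  for _ in range(n_clones)]
        candidates = clones + [best]
        stimulations = [affinity(c, antigen) for c in candidates]
        best = candidates[stimulations.index(max(stimulations))]
        if sum(stimulations) / len(stimulations) >= stimulation_threshold:
            break
    memory.append((best, label))


def classify(memory: List[BCell], x: Vector, k: int = 1) -> str:
    """Majority vote among the k memory cells with highest affinity to x."""
    nearest = sorted(memory, key=lambda cell: affinity(cell[0], x),
                     reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Because the current best cell is always kept among the candidates, the stored cell is never a worse match for the antigen than the memory cell it started from.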
Support Vector Machines
• SVMs are an extension of logistic regression to non-linear classification.
• They are reported to perform better than NNs under certain circumstances.
• They transform features to the kernel space in order to learn the maximum-margin hyperplane that separates the classes.
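A small sketch of why a transformed feature space helps. The XOR-style data below cannot be separated by a line in the original two features, but becomes linearly separable once a product feature is added. The explicit feature map and the hand-picked hyperplane are illustrative assumptions; a real SVM would use a kernel implicitly and learn the maximum-margin hyperplane from data.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# XOR in +/-1 encoding: class +1 when the two features differ.
XOR_DATA: List[Tuple[Point, int]] = [
    ((-1, -1), -1),
    ((-1, 1), 1),
    ((1, -1), 1),
    ((1, 1), -1),
]


def feature_map(x: Point) -> Tuple[float, float, float]:
    """Explicit map to a 3-D space; stands in for the space a kernel induces."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


def decision(x: Point, w: Sequence[float] = (0.0, 0.0, -1.0),
             b: float = 0.0) -> float:
    """Signed score relative to the hyperplane w.z + b = 0 in the mapped space."""
    z = feature_map(x)
    return sum(wi * zi for wi, zi in zip(w, z)) + b


for point, label in XOR_DATA:
    print(point, label, decision(point))
```

Every point ends up on the correct side of the hand-picked hyperplane, since the sign of the score matches the label in all four cases.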
Optimization
• Use nature-inspired heuristics:
  • Genetic Algorithms
    • A population of random solutions is produced. The next generation of solutions is produced by mutating (random changes) the best solutions of the previous generation or by breeding (combining) the best solutions.
  • Simulated Annealing
    • Gradually decreasing the “temperature” decreases the level of randomization in the search.
• to search for the optimal:
  • Number of hidden layers in a neural network
  • Number of perceptrons at each layer
  • Feature subset out of all data (feature engineering)
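A minimal simulated-annealing sketch under stated assumptions: the cost function is a hypothetical stand-in for the validation error of a network with a given number of hidden units (in practice each evaluation would mean training a model), and the cooling schedule and bounds are arbitrary illustrative choices.

```python
import math
import random
from typing import Callable, Tuple


def simulated_annealing(cost: Callable[[int], float], start: int,
                        neighbour: Callable[[int, random.Random], int],
                        rng: random.Random, t_start: float = 1.0,
                        t_end: float = 1e-3,
                        cooling: float = 0.95) -> Tuple[int, float]:
    """Minimise cost; worse moves are accepted with a probability that shrinks as t cools."""
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= cooling  # lower temperature => less randomness in the search
    return best, best_cost


def hypothetical_validation_error(hidden_units: int) -> float:
    """Stand-in cost with a minimum at 12 units (purely illustrative)."""
    return (hidden_units - 12) ** 2 / 100.0


def step(hidden_units: int, rng: random.Random) -> int:
    """Move to an adjacent unit count, clamped to the range [1, 64]."""
    return min(64, max(1, hidden_units + rng.choice((-1, 1))))


best_units, best_error = simulated_annealing(
    hypothetical_validation_error, start=40, neighbour=step,
    rng=random.Random(0))
```

The same loop could search over the number of layers or a feature subset by changing the solution type and the `neighbour` function.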
Clustering
• Aim: Compute groups according to a similarity metric:
  • Euclidean distance
  • Pearson correlation
  • Cosine similarity
• Product feature:
  • Cluster buyers in different groups (custom input features) and apply the appropriate scorecard.
  • Identify common features inside a group, e.g. discover the common features inside a group of fraudulent transactions (reporting) to adjust strategy.
• Example algorithms:
  • Centroid clustering (e.g. K-means)
  • Fuzzy C-means (membership of more than one cluster is allowed by adding a fuzzifier to the objective function that determines the degree of membership).
  • Hierarchical clustering
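The three similarity metrics listed above can be written in a few lines. This is a plain-Python sketch for illustration; the function names are not from this deck, and inputs are assumed to be equal-length, non-constant numeric vectors.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between the vectors; ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a],
                             [y - mean_b for y in b])
```

Pearson correlation and cosine similarity are insensitive to scale, which matters when buyers differ in spending volume but share the same spending pattern.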
Anomaly Detection
• Aim: Detect outliers, i.e. transaction patterns that have not been seen before.
• Example algorithms:
  • Distance from any cluster beyond a given threshold.
  • Gaussian distribution: Probability of an instance given the Gaussian distribution calculated from previous instances.
    • The Gaussian distribution is used when the actual distribution is unknown, because it is ubiquitous in the social sciences.
  • Scalable outlier detection.
  • Fuzzy-logic combinations of the above.
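A minimal sketch of the Gaussian approach on a single feature: fit a mean and standard deviation to past values, then flag a new value whose density falls below a threshold. The sample amounts and the threshold `epsilon` are hypothetical; in practice the threshold would be tuned on labelled data.

```python
import math
import statistics
from typing import Sequence, Tuple


def fit_gaussian(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and (population) standard deviation from past instances."""
    return statistics.fmean(values), statistics.pstdev(values)


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def is_anomaly(x: float, mu: float, sigma: float, epsilon: float) -> bool:
    """Flag x when its density under the fitted Gaussian is below epsilon."""
    return gaussian_pdf(x, mu, sigma) < epsilon


# Hypothetical past transaction amounts for one buyer.
history = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 26.0, 23.0]
mu, sigma = fit_gaussian(history)
print(is_anomaly(25.0, mu, sigma, epsilon=1e-4))   # typical amount
print(is_anomaly(500.0, mu, sigma, epsilon=1e-4))  # far outside past behaviour
```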
Velocity
• Aim: Compute the posterior probability of a sequence of different transactions (filtering), given a Markov Model consisting of:
  • A transition model
  • An observation model (for a Hidden Markov Model where the latent variables are “legit” or “fraud” transaction).
• Each transaction may be represented by:
  • The amount spent
  • The time elapsed from previous transaction etc.
• Algorithm:
  • Forward algorithm (conditional independence)
  • Scalability note: Heuristics e.g. Gibbs sampling.
• Learn an HMM: Baum-Welch
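Filtering with the forward algorithm can be sketched for a two-state HMM as follows. All probabilities are hypothetical placeholders (in practice they would be learned, e.g. with Baum-Welch), and each transaction is reduced to a single discretised observation, "small" or "large" amount, to keep the example short.

```python
from typing import Dict, List, Sequence

STATES = ("legit", "fraud")

# Hypothetical model parameters, for illustration only.
PRIOR: Dict[str, float] = {"legit": 0.99, "fraud": 0.01}
TRANSITION: Dict[str, Dict[str, float]] = {
    "legit": {"legit": 0.98, "fraud": 0.02},
    "fraud": {"legit": 0.10, "fraud": 0.90},
}
EMISSION: Dict[str, Dict[str, float]] = {
    "legit": {"small": 0.9, "large": 0.1},
    "fraud": {"small": 0.3, "large": 0.7},
}


def forward_filter(observations: Sequence[str]) -> List[Dict[str, float]]:
    """Return P(state_t | observations up to t) for every step t."""
    belief = dict(PRIOR)
    history: List[Dict[str, float]] = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition model.
            belief = {
                s: sum(belief[p] * TRANSITION[p][s] for p in STATES)
                for s in STATES
            }
        # Update: weight by the observation model, then normalise.
        unnormalised = {s: belief[s] * EMISSION[s][obs] for s in STATES}
        total = sum(unnormalised.values())
        belief = {s: unnormalised[s] / total for s in STATES}
        history.append(belief)
    return history


beliefs = forward_filter(["small", "large", "large", "large"])
for step_belief in beliefs:
    print(round(step_belief["fraud"], 3))
```

With these parameters the fraud probability rises with each consecutive large transaction, which is the time-domain pattern a velocity check is meant to catch.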
Multi-agent knowledge fusion
• Product feature:
  • Intelligent fusion of the results from the discussed smart agents.
  • Allowing users to fine-tune, e.g. with “overriding” rules:
    • Enforce a scorecard
    • Exceptions (e.g. airline tickets etc.)
• Ensemble strategies:
  • Same vs different input representation
  • Multi-agent (different agents work in parallel and then results are combined)
  • Multi-stage (classifiers are trained on subsets based on the results of previous classifiers, e.g. classify initial false positives to final true/false positive)
  • Cascading (next classifier used if previous result had low confidence)
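The cascading strategy can be sketched as a loop over stages that stops at the first confident answer. The two stages, their thresholds and their confidence values are hypothetical stand-ins for real agents such as a rule set and a trained model.

```python
from typing import Callable, Dict, List, Tuple

Transaction = Dict[str, float]
Stage = Callable[[Transaction], Tuple[str, float]]  # returns (label, confidence)


def cascade(transaction: Transaction, stages: List[Stage],
            confidence_threshold: float = 0.8) -> Tuple[str, float]:
    """Run stages in order; stop at the first sufficiently confident result."""
    label, confidence = "unknown", 0.0
    for stage in stages:
        label, confidence = stage(transaction)
        if confidence >= confidence_threshold:
            break
    return label, confidence


def cheap_rule(t: Transaction) -> Tuple[str, float]:
    """Hypothetical fast first stage: confident only about small amounts."""
    return ("legit", 0.95) if t["amount"] < 100 else ("fraud", 0.5)


def slow_model(t: Transaction) -> Tuple[str, float]:
    """Hypothetical second stage standing in for a trained classifier."""
    return ("fraud", 0.9) if t["amount"] > 1000 else ("legit", 0.85)
```

The expensive stage only runs on the transactions the cheap stage is unsure about, which keeps average scoring latency low.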
Multi-agent knowledge fusion (cont.)
• Homogeneous classifiers combination techniques:
  • Voting:
    • Max of all predictions, Max over average (for each class) among all classifiers, Max of products, Max of sums, unanimity, accuracy weighted voting, accuracy on training/testing etc.
  • Data manipulation:
    • Boosting: Classifiers trained on instances where previous classifiers failed (e.g. train on false positives).
    • Bagging: Train each classifier on a different portion of the training set chosen stochastically.
    • Correlation reduction: Train each classifier on a different portion of the training set chosen based on patterns (to the limit that each classifier is trained on one class).
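Accuracy-weighted voting, one of the combination techniques listed above, can be sketched in a few lines. The weights are assumed to be each classifier's accuracy on held-out data; the values used in the usage note are made up.

```python
from collections import defaultdict
from typing import Dict, Sequence


def weighted_vote(predictions: Sequence[str],
                  weights: Sequence[float]) -> str:
    """Return the label with the largest total weight across classifiers."""
    totals: Dict[str, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=lambda label: totals[label])
```

For example, `weighted_vote(["fraud", "legit", "legit"], [0.9, 0.3, 0.3])` favours the single accurate classifier, whereas equal weights reduce to a plain majority vote.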
Multi-agent knowledge fusion (cont.)
• Meta-learning (stacking):
  • Used for heterogeneous classifiers (e.g. an NN and an SVM).
  • Replaces voting with a meta-classifier that learns to classify based on the outputs of the previous-level classifiers.
Social integration
• Potential to narrow risk even further, e.g.:
  • A location sign-in may prevent a false positive that would otherwise be signaled due to a location change.
  • Blacklisted individuals in a person's network may serve as an extra feature for that person.
  • Typing patterns may be used to enhance authentication.