If They Cheat, Can We Catch Them With Predictive Modeling
Richard A. Derrig, PhD, CFE
President, Opal Consulting, LLC
Senior Consultant, Insurance Fraud Bureau of Massachusetts




Slide 1
If They Cheat, Can We Catch Them With Predictive Modeling
Richard A. Derrig, PhD, CFE
President, Opal Consulting, LLC
Senior Consultant, Insurance Fraud Bureau of Massachusetts
CAS Predictive Modeling, October 11-12, 2007

Slide 2: Insurance Fraud - The Problem
- ISO/IRC 2001 study: auto and workers compensation fraud rated a big problem by 27% of insurers.
- CAIF: estimates (too large).
- Mass IFB: 1,500 referrals annually for auto, WC, and other P-L lines.

Slide 3: Fraud Definition
PRINCIPLES:
- Clear and willful act
- Proscribed by law
- Obtaining money or value
- Under false pretenses
Abuse, unethical: fails one or more of the principles.

Slide 4: HOW MUCH CLAIM FRAUD? (CRIMINAL or CIVIL?)

Slide 5: 10% Fraud

Slide 6: REAL PROBLEM - CLAIM FRAUD
Classify all claims:
- Identify valid classes: pay the claim, no hassle (the Visa example).
- Identify (possible) fraud: investigation needed.
- Identify gray classes: minimize with learning algorithms.

Slide 7: Company Automation - Data Mining
- Data mining/predictive modeling automates record reviews.
- No data mining without good clean data (90% of the solution).
- Insurance policy and claim data; business and demographic data.
- Data warehouse/data mart; data manipulation.
- Simple first; complex algorithms when needed.

Slide 8: DATA

Slide 9: Computers advance

Slide 10: FRAUD IDENTIFICATION
- Experience and judgment
- Artificial intelligence systems
- Regression and tree models
- Fuzzy clusters
- Neural networks
- Expert systems
- Genetic algorithms
- All of the above

Slide 11: MATHEMATICAL MODELS
- Databases: vector spaces, topological spaces, stochastic processes.
- Scoring: mappings to R, linear functionals.

Slide 12: DM Databases, Scoring Functions, Graded Output
- Non-suspicious claims: routine claims.
- Suspicious claims: complicated claims.

Slide 13: DM Databases, Scoring Functions, Graded Output
- Non-suspicious risks: routine underwriting.
- Suspicious risks: non-routine underwriting.

Slide 14: POTENTIAL VALUE OF A PREDICTIVE MODELING SCORING SYSTEM
- Screening to detect fraud early.
- Auditing of closed claims to measure fraud, both kinds.
- Sorting to select efficiently among Special Investigative Unit referrals.
- Providing evidence to support a denial.
- Protecting against bad faith.

Slide 15: PREDICTIVE MODELING - SOME PUBLIC TECHNIQUES
- Fuzzy logic and controllers.
- Regression scoring systems.
- Unsupervised techniques: Kohonen and PRIDIT.
- EM algorithm (medical bills).
- Tree-based methods.

Slide 16: FUZZY SETS COMPARED WITH PROBABILITY
- Probability: measures randomness; measures whether or not an event occurs; randomness dissipates over time or with further knowledge.
- Fuzziness: measures vagueness in language; measures the extent to which an event occurs; vagueness does not dissipate with time or further knowledge.

Slide 17: Fuzzy Clustering & Detection
- k-Means clustering.
- Fuzzy logic: true, false, uncertain.
- Fuzzy numbers: membership value.
- Fuzzy clusters: partial membership.
- App 1: suspicion of fraud. App 2: town (zip) rating classes.
REF: Derrig-Ostaszewski 1995. (A fuzzy c-means sketch follows Slide 18.)

Slide 18: FUZZY SETS - TOWN RATING CLASSIFICATION
When is one town "near" another for auto insurance rating?
- Geographic proximity (traditional).
- Overall cost index (Massachusetts).
- Geo-smoothing (experimental).
Geographically close towns do not have the same expected losses. Clusters by cost produce border problems: nearby towns land in different rating territories. Fuzzy clusters acknowledge the borders. Are the overall clusters correct for each insurance coverage? Fuzzy clustering on five auto coverage indices is better and demonstrates a weakness in overall crisp clustering.
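Slide 17's fuzzy clusters give each town or claim a partial membership in every cluster rather than one crisp label. Below is a minimal fuzzy c-means sketch in Python (numpy only); the fuzzifier m = 2, the cluster count, and the synthetic town cost indices are illustrative assumptions, not the Derrig-Ostaszewski (1995) data or implementation.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. X: (n, d) array, e.g. town cost indices
    for d coverages. Returns (centers, U) where U[i, k] is point i's
    partial membership in cluster k (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))        # random initial memberships
    for _ in range(n_iter):
        W = U ** m                                    # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                      # guard exact center hits
        ratio = d[:, :, None] / d[:, None, :]         # d_ik / d_ij for all j
        U_new = 1.0 / (ratio ** (2.0 / (m - 1))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

# Hypothetical usage: 90 towns, 5 coverage cost indices, 3 rating groups
rng = np.random.default_rng(1)
towns = np.vstack([rng.normal(loc, 0.1, (30, 5)) for loc in (0.8, 1.0, 1.3)])
centers, U = fuzzy_c_means(towns, c=3)
border_towns = np.where((U > 0.2).sum(axis=1) > 1)[0]  # cf. the 0.2 cut on Slide 20
```

Towns whose membership exceeds the 0.2 cut in more than one cluster are exactly the border cases that crisp clustering forces into a single rating territory.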
Slide 19

Slide 20: Fuzzy Clustering of Fraud Study Claims by Assessment Data
Membership value cut at 0.2.

Slide 21: AIB FRAUD INDICATORS
- Accident characteristics (19): no report by police officer at scene; no witnesses to accident.
- Claimant characteristics (11): retained an attorney very quickly; had a history of previous claims.
- Insured driver characteristics (8): had a history of previous claims; gave address as hotel or P.O. box.

Slide 22: Supervised Models - Regression
- Fraud indicators serve as independent dummy variables.
- Expert evaluation categories serve as the dependent target.
- Regression scoring systems.
REF 1: Weisberg-Derrig, 1998. REF 2: Viaene et al., 2002.

Slide 23: Unsupervised Models - Kohonen Self-Organizing Feature Map
- Fraud indicators serve as independent features.
- Expert evaluation categories can serve as the dependent target in a second phase.
- Self-organizing feature maps: T. Kohonen, 1982-1990 (cybernetics).
- Reference vectors map to the OUTPUT format in a topologically faithful way. Example: map onto a 40x40 two-dimensional square.
- An iterative process adjusts all reference vectors in a neighborhood of the nearest one; the neighborhood size shrinks over the iterations. (A minimal SOM sketch follows Step 8 at the end of the deck.)

Slide 24: MAPPING: PATTERNS-TO-UNITS

Slide 25: KOHONEN FEATURE MAP - SUSPICION LEVELS

Slide 26: FEATURE MAP - SIMILARITY OF A CLAIM

Slide 27: DATA MODELING EXAMPLE: CLUSTERING
- Data on 16,000 Medicaid providers analyzed by an unsupervised neural net.
- The neural network clustered Medicaid providers based on 100+ features.
- Investigators validated a small set of known fraudulent providers.
- A visualization tool displays the clustering, showing known fraud and abuse.
- A subset of 100 providers with similar patterns was investigated: hit rate > 70%.
- Cube size proportional to annual Medicaid revenues.
(c) 1999 Intelligent Technologies Corporation

Slide 28: SELF-ORGANIZING MAP
Binary features; suspicion level = d(m, 0), given c = {features}. "Guilt by association."

Slide 29: PRIDIT - A Unique Embedded Score
- Data: features have no natural metric scale.
- Model: the stochastic process has no parametric form.
- Classification: inverse image of a one-dimensional scoring function and decision rule.
- Feature value: identify which features are important.

Slide 30: PRIDIT METHOD - OVERVIEW
1. DATA: N claims, T features, K_t responses per feature, monotone in fraud.
2. RIDIT score each possible response: proportion below minus proportion above, centered at zero.
3. RESPONSE WEIGHTS: principal component of the claims-by-features matrix with the RIDITs in the cells (SPSS, SAS, or S-Plus software).
4. SCORE: sum of weights times claim RIDIT scores.
5. PARTITION: above and below zero.
(A numeric sketch of steps 2-5 follows Table 1 below.)

Slide 31: PRIDIT METHOD - RESULTS
1. DATA: N claims clustered in varying degrees of fraud.
2. FEATURES: each feature has an importance weight.
3. CLUSTERS: size can change with experience.
4. SCORE: can be checked on a new database.
5. DECISIONS: scores near zero = too little information.

Slide 32: TABLE 1 - Computation of RIDIT Scores

Variable (Label)                                                Prop. "Yes"   B_t1 ("Yes")   B_t2 ("No")
Large # of visits to chiropractor (TRT1)                        44%           -.56           .44
Chiropractor provided 3 or more modalities on most visits (TRT2) 12%          -.88           .12
Large # of visits to a physical therapist (TRT3)                8%            -.92           .08
MRI or CT scan but no inpatient hospital charges (TRT4)         20%           -.80           .20
Use of high-volume medical provider (TRT5)                      31%           -.69           .31
Significant gaps in course of treatment (TRT6)                  9%            -.91           .09
Treatment was unusually prolonged (> 6 months) (TRT7)           24%           -.76           .24
Indep. medical examiner questioned extent of treatment (TRT8)   11%           -.89           .11
Medical audit raised questions about charges (TRT9)             4%            -.96           .04
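Slides 30 and 32 describe the computation concretely enough for a short sketch. The Python version below (numpy only) RIDIT-scores each binary response as in Table 1 ("Yes" scores minus the proportion of "No" answers; "No" scores plus the proportion of "Yes" answers), takes the first principal component of the scored claims-by-features matrix as the weights, and partitions the summed scores at zero. The 10-claim response matrix is simulated, not the AIB data, and the deck used SPSS/SAS/S-Plus rather than numpy.

```python
import numpy as np

def ridit_scores(responses):
    """responses: (N, T) binary matrix, 1 = suspicious ("Yes") answer.
    Returns the RIDIT-scored matrix: proportion below minus proportion
    above for each response, centered at zero (cf. Table 1: for TRT1,
    "Yes" -> -.56 and "No" -> +.44 when 44% answer "Yes")."""
    p_yes = responses.mean(axis=0)
    yes_score, no_score = -(1 - p_yes), p_yes
    return np.where(responses == 1, yes_score, no_score)

def pridit(responses, n_iter=50):
    """Weights = first principal component of the RIDIT matrix;
    score = weighted sum of each claim's RIDIT scores."""
    F = ridit_scores(responses)
    w = np.ones(F.shape[1])
    for _ in range(n_iter):          # power iteration on F'F
        w = F.T @ (F @ w)
        w /= np.linalg.norm(w)
    if w.sum() < 0:
        w = -w                       # fix the arbitrary PCA sign
    scores = F @ w
    classes = np.where(scores < 0, 1, 2)   # partition above/below zero
    return w, scores, classes

# Hypothetical usage: 10 claims, 9 treatment indicators (TRT1..TRT9)
rng = np.random.default_rng(0)
X = (rng.random((10, 9)) < 0.2).astype(int)
w, scores, classes = pridit(X)
```

As on Slide 31, scores near zero signal too little information, and the negative-score partition (class 1) collects the fraud-suspect claims, matching the sign convention of Table 3.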
Slide 33: TABLE 3 - PRIDIT Transformed Indicators, Scores and Classes

Claim   TRT1   TRT2   TRT3   TRT4   TRT5   TRT6   TRT7   TRT8   TRT9   Score   Class
1       0.44   0.12   0.08   0.20   0.31   0.09   0.24   0.11   0.04    .07      2
2       0.44   0.12   0.08   0.20  -0.69   0.09   0.24   0.11   0.04    .07      2
3       0.44  -0.88  -0.92   0.20   0.31  -0.91  -0.76   0.11   0.04   -.25      1
4      -0.56   0.12   0.08   0.20   0.31   0.09   0.24   0.11   0.04    .04      2
5      -0.56  -0.88   0.08   0.20   0.31   0.09   0.24   0.11   0.04    .02      2
6       0.44   0.12   0.08   0.20   0.31   0.09   0.24   0.11   0.04    .07      2
7      -0.56   0.12   0.08   0.20   0.31   0.09  -0.76  -0.89   0.04   -.10      1
8      -0.44   0.12   0.08   0.20  -0.69   0.09   0.24   0.11   0.04    .02      2
9      -0.56  -0.88   0.08  -0.80   0.31   0.09   0.24   0.11  -0.96    .05      2
10     -0.56   0.12   0.08   0.20   0.31   0.09   0.24   0.11   0.04    .04      2

Slide 34: TABLE 2 - Weights for Treatment Variables

Variable   PRIDIT Weight W(∞)   Regression Weight
TRT1        .30                  .32***
TRT2        .19                  .19***
TRT3        .53                  .22***
TRT4        .38                  .07
TRT5        .02                  .08*
TRT6        .70                 -.01
TRT7        .82                  .03
TRT8        .37                  .18***
TRT9       -.13                  .24**
Regression significance shown at the 1% (***), 5% (**), or 10% (*) level.

Slide 35: TABLE 3 - PRIDIT Transformed Indicators, Scores and Classes (the Slide 33 table, repeated).

Slide 36: TABLE 7 - AIB Fraud and Suspicion Score Data: Top 10 Fraud Indicators by Weight

PRIDIT   Adj. Reg. Score   Inv. Reg. Score
ACC3     ACC1              ACC11
ACC4     ACC9              CLT4
ACC15    ACC10             CLT7
CLT11    ACC19             CLT11
INJ1     CLT11             INJ1
INJ2     INS6              INJ3
INJ5     INJ2              INJ8
INJ6     INJ9              INJ11
INS8     TRT1
LW6      TRT9

Slide 37: EM Algorithm - Hidden Exposures: Overview
- Model hidden risk exposures as additional dimension(s) of the loss severity distribution via the EM (Expectation-Maximization) algorithm.
- Consider mixtures of probability distributions as the model for losses affected by hidden exposures, with some parameters of the mixtures treated as missing (i.e., unobservable in practice).
- The approach is feasible due to advances in computer-driven methodologies for partially hidden or incomplete data models.
- Empirical data imputation has become more sophisticated, and the availability of ever faster computing power has made it increasingly possible to solve these problems via iterative algorithms. (A minimal EM sketch follows Slide 40.)

Slide 38: Figure 1 - Overall distribution of the 348 BI medical bill amounts from Appendix B compared with that submitted by provider A. Left panel: frequency histograms (provider A's histogram in filled bars). Right panel: density estimators (provider A's density in dashed line).
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala and Richard A. Derrig, p. 9, 11/18/02.

Slide 39: Figure 2 - EM fit. Left panel: mixture of normal distributions fitted via the EM algorithm to BI data. Right panel: the three normal components of the mixture.
Source: Modeling Hidden Exposures in Claim Severity via the EM Algorithm, Grzegorz A. Rempala and Richard A. Derrig, p. 13, 11/18/02.

Slide 40: Decision Trees
In decision theory (for example, risk management), a decision tree is a graph of decisions and their possible consequences (including resource costs and risks) used to create a plan to reach a goal. Decision trees are constructed to help with making decisions. A decision tree is a special form of tree structure. (www.wikipedia.org)
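Slides 37-39 outline fitting a mixture of normals to bill amounts by EM, with the component labels playing the role of the hidden exposures. Below is a minimal numpy sketch; the three-component count follows Figure 2, but the synthetic bill amounts are an assumption (the 348 BI bills from the Rempala-Derrig paper are not reproduced here).

```python
import numpy as np

def em_normal_mixture(x, k=3, n_iter=200, seed=0):
    """Fit a k-component normal mixture to losses x by EM.
    Returns (weights, means, sds, responsibilities)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    w = np.full(k, 1.0 / k)                    # mixing weights
    mu = rng.choice(x, size=k, replace=False)  # initialize means at data points
    sd = np.full(k, x.std())
    for _ in range(n_iter):
        # E-step: responsibility of component j for observation i
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate the hidden-exposure components
        nj = r.sum(axis=0)
        w = nj / n
        mu = (r * x[:, None]).sum(axis=0) / nj
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nj)
        sd = np.maximum(sd, 1e-6)              # guard against collapse
    return w, mu, sd, r

# Hypothetical usage on synthetic BI-like bill amounts (348 "bills")
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(500, 100, 200),
                    rng.normal(1500, 300, 100),
                    rng.normal(4000, 800, 48)])
w, mu, sd, r = em_normal_mixture(x)
```

The responsibilities r let one ask, as in Figure 1, whether a single provider's bills load disproportionately on one component of the mixture.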
Slide 41: Different Kinds of Decision Trees
- Single trees (CART, CHAID).
- Ensemble trees, a more recent development (TREENET, RANDOM FOREST): a composite or weighted average of many trees (perhaps 100 or more). There are many methods to fit the trees and prevent overfitting.
- Boosting: Iminer Ensemble and TREENET.
- Bagging: Random Forest.

Slide 42: The Methods and Software Evaluated
1) TREENET
2) Iminer Tree
3) SPLUS Tree
4) CART
5) Iminer Ensemble
6) Random Forest
7) Naïve Bayes (baseline)
8) Logistic (baseline)
(An illustrative open-source benchmark sketch follows Slide 51.)

Slide 43: Ensemble Prediction of IME Requested

Slide 44: TREENET ROC Curve - IME
AUROC = 0.701

Slide 45: Ranking of Methods/Software - 1st Two Surrogates

Slide 46: Implementation Outline Included at End

Slide 47: NON-CRIMINAL FRAUD?

Slide 48: Non-Criminal Fraud
- General deterrence: the insurance system, insurance departments + DIA, medical, and other government oversight.
- Specific deterrence: company SIU, auditors, data, predictive modeling for claims and underwriting.

Slide 49: FRAUD INDICATORS - VALIDATION PROCEDURES
- Canadian Coalition Against Insurance Fraud (1997): 305 fraud indicators (45 for vehicle theft).
- No one indicator by itself is necessarily suspicious.
- Problem: how to validate the systematic use of fraud indicators?
- Solution 1: Kohonen self-organizing feature map (Brockett et al., 1998).
- Solution 2: logistic regression (Viaene et al., 2002).
- Solution 3: PRIDIT method (Brockett et al., 2002).
- Solution 4: regression tree methods (Derrig and Francis, 2007).

Slide 50: REFERENCES
Brockett, Patrick L., Derrig, Richard A., Golden, Linda L., Levine, Albert, and Alpert, Mark (2002), Fraud Classification Using Principal Component Analysis of RIDITs, Journal of Risk and Insurance, 69:3, 341-373.
Brockett, Patrick L., Xia, Xiaohua, and Derrig, Richard A. (1998), Using Kohonen's Self-Organizing Feature Map to Uncover Automobile Bodily Injury Claims Fraud, Journal of Risk and Insurance, 65:245-274.
Derrig, Richard A. (2002), Insurance Fraud, Journal of Risk and Insurance, 69:3, 271-289.
Derrig, Richard A., and Ostaszewski, K. (1995), Fuzzy Techniques of Pattern Recognition in Risk and Claim Classification, Journal of Risk and Insurance, 62:3, 147-182.
Francis, Louise, and Derrig, Richard A. (2007), Distinguishing the Forest from the TREES, Working Paper.
Rempala, G.A., and Derrig, Richard A. (2003), Modeling Hidden Exposures in Claim Severity via the EM Algorithm, North American Actuarial Journal, 9(2), 108-128.
Viaene, S., Derrig, Richard A., et al. (2002), A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Claim Fraud Detection, Journal of Risk and Insurance, 69:3, 373-423.
Weisberg, H.I., and Derrig, R.A. (1998), Quantitative Methods for Detecting Fraudulent Automobile Bodily Injury Claims, RISQUES, 35, 75-101, July-September (in French; English available).

Slide 51
</gr-replace>
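The deck reports a TREENET AUROC of 0.701 for predicting IME requests (Slide 44). The Python sketch below uses scikit-learn's GradientBoostingClassifier (boosting, cf. TREENET) and RandomForestClassifier (bagging) as open-source stand-ins for the commercial packages of Slide 42, with logistic regression as the baseline. The binary fraud-indicator data and target are simulated, not the AIB claim data, so the printed AUROCs are illustrative only.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulated red-flag dummies; a few columns carry real signal
rng = np.random.default_rng(0)
X = (rng.random((2000, 20)) < 0.15).astype(int)
signal = X[:, :5].sum(axis=1) * 0.9 - 1.5
y = (rng.random(2000) < 1 / (1 + np.exp(-signal))).astype(int)  # IME requested?
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Boosted trees (cf. TREENET)": GradientBoostingClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic (baseline)": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])  # cf. Slide 44
    print(f"{name}: AUROC = {auc:.3f}")
```

Comparing ensembles against the two baselines on held-out AUROC mirrors the ranking exercise of Slides 42-45, with the ensemble methods expected to match or beat the single-model baselines.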
Slide 52: Claim Fraud Detection Plan
STEP 1 - SAMPLE: Systematic benchmark of a random sample of claims.
STEP 2 - FEATURES: Isolate red flags and other sorting characteristics.
STEP 3 - FEATURE SELECTION: Separate features into objective and subjective; early, middle, and late arriving; acquisition cost levels; and other practical considerations.
STEP 4 - CLUSTER: Apply unsupervised algorithms (Kohonen, PRIDIT, fuzzy) to cluster claims; examine for needed homogeneity. (A minimal Kohonen SOM sketch follows Step 8.)

Slide 53: Claim Fraud Detection Plan (continued)
STEP 5 - ASSESSMENT: Externally classify claims according to objectives for sorting.
STEP 6 - MODEL: Supervised models relating selected features to objectives (logistic regression, Naïve Bayes, neural networks, CART, MARS).
STEP 7 - STATIC TESTING: Model output versus expert assessment; model output versus cluster homogeneity (PRIDIT scores) on one or more samples.
STEP 8 - DYNAMIC TESTING: Real-time operation of an acceptable model; record outcomes; repeat Steps 1-7 as needed to fine-tune the model and parameters. Use PRIDIT to show gain or loss of feature power and changing data patterns; tune investigative proportions to optimize detection and deterrence of fraud and abuse.
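Step 4 calls for unsupervised clustering, and Slide 23 describes the Kohonen algorithm: adjust all reference vectors in a shrinking neighborhood of the winning unit. Below is a minimal numpy sketch of that update rule; the 10x10 grid (the deck's example uses 40x40), the decay schedules, and the binary indicator data are illustrative assumptions, not the Brockett et al. (1998) implementation.

```python
import numpy as np

def train_som(X, grid=10, n_iter=2000, lr0=0.5, seed=0):
    """X: (n, d) binary fraud-indicator matrix. Returns (grid*grid, d)
    reference vectors arranged on a 2-D grid."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((grid * grid, d))                    # reference vectors
    coords = np.array([(i, j) for i in range(grid) for j in range(grid)])
    sigma0 = grid / 2.0                                 # initial neighborhood radius
    for t in range(n_iter):
        x = X[rng.integers(n)]                          # pick a random claim
        winner = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
        frac = t / n_iter
        sigma = sigma0 * (1 - frac) + 1e-2              # neighborhood shrinks
        lr = lr0 * (1 - frac)                           # learning rate decays
        dist2 = ((coords - coords[winner]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2 * sigma ** 2))           # topological neighborhood
        W += lr * h[:, None] * (x - W)                  # pull neighborhood toward claim
    return W

def map_claims(X, W):
    """Assign each claim to its nearest reference vector (unit)."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
```

After training, claims landing in the same or adjacent units share indicator patterns ("guilt by association," Slide 28), and the units can be graded by suspicion level, e.g. by the distance d(m, 0) of each reference vector from the all-clear pattern.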