February 2019

Forecasting Suspicious Account Activity at Large-Scale Online Service Providers

Hassan Halawa (1), Konstantin Beznosov (1), Baris Coskun (2), Meizhu Liu (3), Matei Ripeanu (1)
(1) University of British Columbia, (2) Amazon Web Services, (3) Yahoo! Research
Automated attacks operating at a large scale, exploiting unsafe decisions made by individual users
■ Phishing
■ Phishing □ Online Services
■ Phishing ■ Overview □ Current vs. Proposed

Current Defenses: reactive; based on signatures and anomalies; work by identifying attack/attacker patterns; challenged by evolving attacks and false positives.

Proposed: proactive; forecasting as an early-warning system; works by mining behavioral/usage patterns.
■ Phishing ■ Overview □ Current vs. Proposed □ Highlights

■ Experiment at a Large-Scale Online Service Provider (4 months of production data / 100+ million users / 100+ billion login events)
■ Promising Performance as an Early Warning System (AUROC ~ 0.92 / FPR ~ 0.5% / ACC ~ 99.5% / REC ~ 50.6% / PRE ~ 18.3%, using only a 1-week historical trace and predicting 1 month in advance)
■ Supervised ML Pipeline for Forecasting (predict future suspicious account activity from historical traces)
■ Evaluation Across Varied Classification Exercises (1-week trace → [7, 90]-day forecast / 3-week trace → [21, 34]-day forecast)
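For context, the reported metrics relate to each other as follows; a minimal pure-Python sketch on toy data (the helper names and numbers are illustrative, not from the paper):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Precision, recall, false-positive rate, and accuracy
    from confusion-matrix counts."""
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)                  # a.k.a. true-positive rate
    fpr = fp / (fp + tn)
    acc = (tp + tn) / (tp + fp + tn + fn)
    return pre, rec, fpr, acc

def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive is ranked above a randomly chosen negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that ranks all positives above all negatives scores 1.0.
print(auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # 1.0
```

Note how, on a population as imbalanced as the slide's (suspicious accounts are rare among 100+ million users), near-perfect accuracy can coexist with modest precision.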
■ Phishing ■ Overview ■ Approach □ Account Lifecycle

[Figure: overview of the lifecycle of a compromised account. Along the time axis: Account Registration (legitimate owner, ✔) → Account Compromised (attacker, ✘) → Account Flagged → Account Remediation (back to the legitimate owner).]
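The lifecycle can be sketched as a tiny state machine; the state names mirror the figure, but the transition table itself is an illustrative assumption, not the paper's model:

```python
# Hypothetical transition table mirroring the lifecycle figure.
LIFECYCLE = {
    "registered":  {"compromised"},   # legitimate owner in control
    "compromised": {"flagged"},       # attacker in control
    "flagged":     {"remediated"},    # defense systems detected the compromise
    "remediated":  {"compromised"},   # owner back in control; may be re-compromised
}

def can_transition(src, dst):
    """True if the lifecycle allows moving from state src to state dst."""
    return dst in LIFECYCLE.get(src, set())

print(can_transition("registered", "compromised"))  # True
print(can_transition("registered", "flagged"))      # False
```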
■ Phishing ■ Overview ■ Approach □ Account Lifecycle □ ML Pipeline

Goal: forecast suspicious account activity using supervised machine learning

[Figure: pipeline overview. Data Source (ground truth + user activity, unstructured data) → Data Pre-Processing (structured & labeled data) → Model Selection → Suspicious-Account Classifier → Model Evaluation (metrics: AUROC, BTR, PRE, REC, FPR). The classifier's outputs (suspicious-account population & suspicious-account scores) feed the defense systems.]
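A minimal sketch of the pipeline stages (pre-processing, model fitting, scoring); the feature names and the trivial threshold "model" are stand-ins for the production components, which the slide does not specify:

```python
def preprocess(raw_events):
    """Data pre-processing: turn unstructured (user, success) login
    events into structured per-user feature rows."""
    feats = {}
    for user, success in raw_events:
        row = feats.setdefault(user, {"logins": 0, "failures": 0})
        row["logins"] += 1
        row["failures"] += 0 if success else 1
    return feats

def fit(feats, labels):
    """Model selection: pick the failure-ratio threshold that minimizes
    training error (a toy stand-in for a real supervised learner)."""
    def err(thr):
        return sum(
            (row["failures"] / row["logins"] > thr) != labels[u]
            for u, row in feats.items()
        )
    return min((0.2, 0.4, 0.6, 0.8), key=err)

def score(thr, feats):
    """Suspicious-account classifier: flag accounts above the threshold."""
    return {u: row["failures"] / row["logins"] > thr
            for u, row in feats.items()}

events = [("a", True), ("a", True), ("b", False), ("b", False), ("b", True)]
feats = preprocess(events)
thr = fit(feats, {"a": False, "b": True})
print(score(thr, feats))  # {'a': False, 'b': True}
```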
■ Phishing ■ Overview ■ Approach □ Account Lifecycle □ ML Pipeline □ Classification Exercise

[Figure: overview of a classification exercise. Along the time axis, both the training interval and the testing interval consist of a Data Window (DW, user activity) followed by a Buffer Window (BW) and a Label Window (LW, ground truth). The suspicious-account classifier outputs the suspicious-account population & suspicious-account scores to the defense systems.]
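The windowing can be sketched as follows; the concrete sizes are illustrative, and the assumption that the prediction horizon spans the buffer plus label windows is ours, inferred from the figure:

```python
from datetime import date, timedelta

def exercise_windows(start, dw_days, bw_days, lw_days):
    """Split a trace into a Data Window (features) and a Label Window
    (ground truth), separated by a Buffer Window."""
    dw_end = start + timedelta(days=dw_days)
    lw_start = dw_end + timedelta(days=bw_days)
    lw_end = lw_start + timedelta(days=lw_days)
    return (start, dw_end), (lw_start, lw_end)

# One exercise: a 1-week DW forecasting labels roughly one month out.
dw, lw = exercise_windows(date(2019, 2, 1), dw_days=7, bw_days=23, lw_days=7)
print(dw[1], lw[0])  # 2019-02-08 2019-03-03
```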
■ Phishing ■ Overview ■ Approach ■ Evaluation □ Classification Exercises

Notation: DW = Data Window, BW = Buffer Window, LW = Label Window, H = Prediction Horizon

[Figure: timeline of the classification exercises: Hyperparameter Optimization, Overfit Check, Preprocessing Impact, Performance for Wider Windows.]
■ Phishing ■ Overview ■ Approach ■ Evaluation □ Classification Exercises □ AUROC
■ Phishing ■ Overview ■ Approach ■ Evaluation □ Classification Exercises □ AUROC □ PRE/REC vs. Horizon
■ Phishing ■ Overview ■ Approach ■ Evaluation ■ Recap

■ Experiment at a Large-Scale Online Service Provider (4 months of production data / 100+ million users / 100+ billion login events)
■ Promising Performance as an Early Warning System (AUROC ~ 0.92 / FPR ~ 0.5% / ACC ~ 99.5% / REC ~ 50.6% / PRE ~ 18.3%, using only a 1-week historical trace and predicting 1 month in advance)
■ Supervised ML Pipeline for Forecasting (predict future suspicious account activity from historical traces)
■ Evaluation Across Varied Classification Exercises (1-week trace → [7, 90]-day forecast / 3-week trace → [21, 34]-day forecast)
Backup/Discussion Slides
■ Discussion □ Account Suspiciousness vs. Vulnerability

[Figure: timeline contrasting accounts that are vulnerable at present with accounts forecast to be suspicious in the future; both are inferred by mining historical behavioral/usage patterns.]
■ Discussion □ Current vs. Proposed

[Figure: stages of a phishing attack and the filters applied at each stage. (1) Attack Launched (phishing emails) → Operator Filter; (2) System Infiltrated (email in inbox) → User Filter; (3) User Victimized (credentials stolen) → Remediation Filter → Compromise Detected.]

Current defenses (feedback based on identifying attack patterns): (1) email classification, anomaly detection; (2) HTTPS browser lock, two-factor auth.; (3) incident response, user reports.

Proposed defenses (feedback based on identifying vulnerable users): (1) throttled outbox, delayed inbox; (2) personalized controls, targeted education; (3) efficient compromise-detection campaigns. Distinguishing vulnerable vs. robust users enables honeypots, differential defenses, and prioritization.
■ Discussion □ Proposed Defense Mechanisms

Design of new defense mechanisms across the three filter stages ((1) Operator Filters, (2) User Filters, (3) Remediation Filters): defense resource prioritization; targeted user education; efficient inspection; effective exercises; throttling during emergencies; captive portals; mitigating adversarial learning; personalised control & advice; inferring attack origin; identifying new attacks.
■ Discussion □ Evaluation of Mechanisms

[Figure: evaluation of the proposed defense mechanisms. The ML pipeline's vulnerability classifier outputs the vulnerable population & vulnerability scores to the proposed defenses. Evaluation approaches: simulation (S → I → R attack-propagation models over vulnerable/robust populations; inputs: attack propagation, population distribution, defense parameters), analytical models, and practical experiments (A/B tests of defense application such as targeted education, and defense evaluation via security exercises). Output metrics: cost, effectiveness.]
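The S → I → R simulation mentioned above can be sketched as a discrete-time compartmental model; the rates below are made-up illustrative values, not fitted parameters:

```python
def sir_step(s, i, r, beta, gamma):
    """One time step: susceptible users become infected (compromised)
    at rate beta; infected accounts are remediated at rate gamma."""
    new_inf = beta * s * i
    new_rec = gamma * i
    return s - new_inf, i + new_inf - new_rec, r + new_rec

def simulate(s, i, r, beta, gamma, steps):
    for _ in range(steps):
        s, i, r = sir_step(s, i, r, beta, gamma)
    return s, i, r

# Fractions of the user population; a stronger defense (larger gamma)
# should leave more users never compromised.
weak = simulate(0.99, 0.01, 0.0, beta=0.5, gamma=0.1, steps=200)
strong = simulate(0.99, 0.01, 0.0, beta=0.5, gamma=0.3, steps=200)
print(strong[0] > weak[0])  # True
```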
■ Discussion □ Context-Specific Defenses

[Figure: long-term vulnerability scores (vulnerable vs. robust) are refined into context-specific vulnerability scores, which drive the proposed defenses.]
■ Discussion □ Results Presented as Lower Bounds

■ Limited Access to User Data
■ Limited Computational Resources
■ Imperfect Ground Truth
■ Aggressive Pruning Heuristics
■ Discussion □ Buffer Window Sizing
■ Discussion □ Social Engineering

"Social engineering, in the context of information security, refers to psychological manipulation of people into performing actions or divulging confidential information. A type of confidence trick for the purpose of information gathering, fraud, or system access, it differs from a traditional 'con' in that it is often one of many steps in a more complex fraud scheme."
■ Discussion □ Focusing on the Vulnerable Population as a Key Defense Element

■ Cost of attack
■ Multi-stage attacks
■ Similar dynamics to epidemics
■ Discussion □ Advantages of Proposed Paradigm

■ Targeted
■ Efficient
■ Proactive
■ Robust
■ Discussion □ Robustness
■ Discussion □ Reactive Defenses

Focus on identifying attacks/attackers:
■ Current defenses are attack/attacker centric
■ Based on attacker-controlled behavior/features
■ Attackers can employ adversarial strategies
[Figure: the reactive arms-race cycle: Begin Attack → Initial Detection → Defender Responds → Attacker Detects Defense → Mutate Attack, and the loop repeats. The attacker controls the attack and mutation steps; the defender controls detection and response. [SNS'11]]

[SNS'11] Tao Stein, Erdong Chen, and Karan Mangla. 2011. Facebook Immune System. In Proceedings of the 4th Workshop on Social Network Systems (SNS '11). ACM, New York, NY, USA, Article 8.
■ Discussion □ User Education

■ First line of defense
■ Direct cost (attack) vs. indirect cost (effort)
■ Distribute cost proportional to user vulnerability

■ Discussion □ Legal/Ethical Considerations

■ Paternalism
■ Fairness (service discrimination)
■ Discussion □ Adoption Challenges

■ Feasibility of developing a vulnerability classifier
■ Inaccuracies in predicting the vulnerable population
■ Some defense mechanisms may violate user expectations
■ Targeted protection may be confusing / complex
■ Discussion □ Related Work

■ Offline Worlds
■ Online Worlds
■ Our Experience
■ Discussion □ Our Experience (Integro)

■ Large-scale social-bot infiltration is feasible
■ Defense system leveraging the proposed paradigm
■ Deployed at Telefonica's OSN Tuenti (50+ million users)
■ Discussion □ Integro

[ECS'16] Boshmaf, Y., Logothetis, D., Siganos, G., Lería, J., Lorenzo, J., Ripeanu, M., Beznosov, K., and Halawa, H. 2016. Íntegro: Leveraging Victim Prediction for Robust Fake Account Detection in Large Scale OSNs. Elsevier Computers & Security, 61:142-168.