Scalable Anomaly Ranking of Attributed Neighborhoods

SDM, May 5th, 2016

Bryan Perozzi, Leman Akoglu, Stony Brook University

SDM 2016: Best Paper Runner-up Award!

Bryan Perozzi Scalable Anomaly Ranking of Attributed Neighborhoods

What’s an Anomaly, Anyhow?

Given an attributed subgraph, how to quantify its quality?

Structure Only
Internal Measures: Average Degree
Boundary: Cut Edges
Internal + Boundary: Conductance

[Figure: example subgraph with regions labeled Internal and External]

Structure + Attributes
SODA [Gupta+, 14]
Attributed Weighted Normalized Cut [Günnemann+, 13]
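The three structure-only measures named on this slide can be computed in a few lines. The sketch below is illustrative only: it assumes an undirected graph stored as adjacency sets, takes average degree to mean the average internal degree (twice the internal edges divided by the subgraph size), and uses the common definition of conductance as cut edges over the smaller of the two volumes. The function name and the toy graph are hypothetical.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def structural_measures(graph: Graph, subgraph: Set[str]) -> Dict[str, float]:
    """Average internal degree, cut edges, and conductance of `subgraph`."""
    if not subgraph:
        raise ValueError("subgraph must contain at least one node")
    internal_endpoints = 0  # every internal edge is seen from both ends
    cut_edges = 0           # edges with exactly one endpoint inside
    for node in subgraph:
        for neighbor in graph[node]:
            if neighbor in subgraph:
                internal_endpoints += 1
            else:
                cut_edges += 1
    volume_in = internal_endpoints + cut_edges           # sum of degrees inside
    volume_total = sum(len(nbrs) for nbrs in graph.values())
    smaller_volume = min(volume_in, volume_total - volume_in)
    return {
        "average_degree": internal_endpoints / len(subgraph),
        "cut_edges": cut_edges,
        "conductance": cut_edges / smaller_volume if smaller_volume else 0.0,
    }


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
example_graph: Graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
measures = structural_measures(example_graph, {"a", "b", "c"})
```

For the triangle (a, b, c), one edge crosses the boundary, so the subgraph looks well separated by all three measures. None of them consult node attributes, which is the gap the Structure + Attributes line points at.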

Outline
Problem of Anomaly Ranking
Metric: Normality
Optimizing Normality
Experimental Results
Understanding Graphs with Normality

Normality (intuition)

Given an attributed subgraph, how to quantify quality?

Internal: structural density AND attribute coherence (neighborhood “focus”)
Boundary: structural sparsity, OR external separation (“exoneration”)

[Figure: example neighborhoods with high and low normality; attribute labels: chess, biking]

Normality (intuition)

Motivation:
no good cuts in real-world graphs [Leskovec+ ‘08]
social circles overlap [McAuley+ ‘14]

“Exoneration”: by (a) null model, (b) attributes
(a) hub effect: edges expected, not surprising
(b) neighborhood overlap: separable by different “focus”
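The hub-effect argument can be illustrated with a degree-based null model. The sketch below assumes the expected number of edges between nodes i and j is k_i * k_j / (2m), capped at 1, where k is node degree and m is the number of edges in the graph. The slide text does not spell out this formula, so both the expression and the function name are assumptions made for illustration.

```python
def edge_surprise(degree_i: int, degree_j: int, num_edges: int) -> float:
    """How surprising an observed edge is under a degree-based null model.

    Assumed null model: expected edges = k_i * k_j / (2m), capped at 1.
    A value near 0 means the edge was expected (the boundary edge is
    "exonerated"); a value near 1 means it was unlikely by chance.
    """
    expected = degree_i * degree_j / (2 * num_edges)
    return 1.0 - min(1.0, expected)


# In a graph with 1000 edges, an edge to a hub of degree 500 is expected,
# while an edge between two low-degree nodes is surprising.
hub_surprise = edge_surprise(500, 4, 1000)
low_degree_surprise = edge_surprise(3, 4, 1000)
```

Under this assumption, boundary edges that lead to hubs contribute little to a penalty, since almost any node is expected to touch a hub.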

The measure of Normality

[Equation 1: normality]

Internal consistency: null model; dot-product, or Kronecker’s delta; “focus” vector (figure labels: chess, biking)
External separability
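The slide's equation did not survive extraction, so the following is only a sketch of how the annotated pieces might fit together, not the authors' exact definition. It assumes that internal consistency sums, over pairs of community members, the observed edge minus a null-model expectation k_i * k_j / (2m), weighted by a focus-weighted dot-product similarity. It also assumes that external separability subtracts a similarity-weighted surprise for each boundary edge. All names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]
Attributes = Dict[str, List[float]]


def weighted_similarity(x: List[float], y: List[float], focus: List[float]) -> float:
    """Dot product of two attribute vectors, weighted by the focus vector."""
    return sum(w * a * b for w, a, b in zip(focus, x, y))


def normality_sketch(graph: Graph, attrs: Attributes,
                     community: Set[str], focus: List[float]) -> Dict[str, float]:
    """Assumed form: internal consistency plus external separability."""
    two_m = sum(len(nbrs) for nbrs in graph.values())  # twice the edge count

    def expected(i: str, j: str) -> float:
        return len(graph[i]) * len(graph[j]) / two_m   # assumed null model

    internal = 0.0
    for i, j in combinations(sorted(community), 2):
        observed = 1.0 if j in graph[i] else 0.0
        internal += (observed - expected(i, j)) * weighted_similarity(
            attrs[i], attrs[j], focus)

    external = 0.0
    for i in community:
        for b in graph[i] - community:                 # boundary edges only
            surprise = 1.0 - min(1.0, expected(i, b))
            external -= surprise * weighted_similarity(attrs[i], attrs[b], focus)

    return {"internal": internal, "external": external,
            "normality": internal + external}


# Two triangles joined by c-d; attributes are [chess, biking].
example_graph: Graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
example_attrs: Attributes = {
    "a": [1.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 0.0],
    "d": [0.0, 1.0], "e": [0.0, 1.0], "f": [0.0, 1.0],
}
result = normality_sketch(example_graph, example_attrs, {"a", "b", "c"}, [1.0, 0.0])
```

With the focus entirely on chess, the boundary edge c-d costs nothing because d shares no focused attribute with c. This is the attribute-based exoneration described on the earlier slide.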

Anomaly Mining of Entity Neighborhoods (AMEN)

Given a community, can we find the weights that maximize its normality?

[Equations 1 and 2; annotation: latent]

Optimizing Normality

[Equations 1, 2, and 3]

Two cases for the maximizing weights:
one attribute f with largest x
all f with positive x

Normality becomes [equation]
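The two cases on this slide are what one would get if normality were linear in the weights, w · x, with non-negative weights of unit norm. That setup is an assumption here, since the constraint symbols were lost in extraction. Under a unit L1 norm all weight goes to the single largest x, and under a unit L2 norm the weights are proportional to the positive part of x. The sketch below works through both, with hypothetical function names.

```python
import math
from typing import List


def best_weights_l1(x: List[float]) -> List[float]:
    """Non-negative w with sum(w) == 1 maximizing w . x: one attribute only."""
    best = max(range(len(x)), key=lambda f: x[f])
    return [1.0 if f == best else 0.0 for f in range(len(x))]


def best_weights_l2(x: List[float]) -> List[float]:
    """Non-negative w with unit Euclidean norm maximizing w . x."""
    positive = [max(v, 0.0) for v in x]           # keep only positive x
    norm = math.sqrt(sum(v * v for v in positive))
    if norm == 0.0:                                # no positive entry at all
        return best_weights_l1(x)
    return [v / norm for v in positive]


def score(w: List[float], x: List[float]) -> float:
    """Value of the assumed linear objective w . x."""
    return sum(a * b for a, b in zip(w, x))


x = [3.0, -1.0, 4.0]
w1 = best_weights_l1(x)   # all weight on the attribute with x = 4
w2 = best_weights_l2(x)   # weight on both positive attributes
```

In the L2 case the optimal score equals the Euclidean norm of the positive part of x (here 5), which gives a closed form for the maximized objective under these assumptions.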

Size Invariant Scoring

So far: Normality of a community grows with size
Bad for comparisons

Need to “Normalize” normality for ranking

Illustrative examples

[Figure: two example neighborhoods, “split-radix FFT” and “telescopic op-amps”, with attribute keywords including telescopic, cascode, multidecade, reciprocal, split, reserve, …]

Example neighborhoods

Synthetic Anomaly Detection

Normality vs Conductance, DBLP

Normality vs Conductance, Google+

Normality Distribution

Feature distribution, DBLP

Feature distribution, LastFM

Thanks! Any questions?

Bryan Perozzi

Papers, Code, Contact Info:

www.perozzi.net
