CMU SCS
Mining Large Graphs: Patterns, Anomalies, and Fraud Detection
Christos Faloutsos, CMU
Thank you!
• Alex Smola
• Aditi Chaudhuri
• Tiffany Kuan
Amazon'16 (c) 2016, C. Faloutsos 2
Foils at: http://www.cs.cmu.edu/~christos/TALKS/16-09-AMAZON/
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
• Conclusions
Graphs - why should we care?
• Recommendation systems
• App-stores + reviews (Yelp, ....)
Graphs - why should we care?
• 'YahooWeb graph': ~1B nodes (web sites), ~6B edges (http links)
Graphs - why should we care?
• >$10B; ~1B users
Graphs - why should we care?
• Internet Map [lumeta.com]
• Food Web [Martinez '91]
Graphs - why should we care?
• web-log ('blog') news propagation
• computer network security: email/IP traffic and anomaly detection
• ....
• Many-to-many db relationship -> graph
Motivating problems
• P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs / tensors
[Figure: tensor with 'destination' and 'time' axes]
• Patterns; anomalies*
* Robust Random Cut Forest Based Anomaly Detection on Streams. Sudipto Guha, Nina Mishra, Gourav Roy, Okke Schrijvers. ICML'16
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns & fraud detection
• Part#2: time-evolving graphs; tensors
• Conclusions
Part 1: Patterns, & fraud detection
Laws and patterns
• Q1: Are real graphs random?
Laws and patterns
• Q1: Are real graphs random?
• A1: NO!!
– Diameter ('6 degrees'; 'Kevin Bacon')
– in- and out-degree distributions
– other (surprising) patterns
• So, let's look at the data
Solution# S.1
• Power law in the degree distribution [Faloutsos x 3, SIGCOMM99]
[Plot: log(degree) vs. log(rank) for internet domains; marked points: att.com, ibm.com]
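A rank-degree plot like the one on this slide is easy to reproduce. Below is a minimal sketch (my own illustration, not code from the talk) that estimates the power-law slope by linear regression in log-log space; the Zipf-like toy degree sequence is hypothetical:

```python
import numpy as np

def rank_degree_slope(degrees):
    """Fit a line to log(degree) vs. log(rank): sort degrees
    descending, so rank 1 is the highest-degree node, then
    regress in log-log space. The slope is the power-law exponent."""
    d = np.sort(np.asarray(degrees, dtype=float))[::-1]
    d = d[d > 0]                       # log is undefined at 0
    ranks = np.arange(1, len(d) + 1)
    slope, intercept = np.polyfit(np.log(ranks), np.log(d), 1)
    return slope

# Toy Zipf-like degrees: degree ~ rank^(-0.8)
ranks = np.arange(1, 10001)
degrees = np.round(1e5 * ranks ** -0.8)
print(rank_degree_slope(degrees))   # close to -0.8
```

On a real graph, feeding the actual degree sequence through the same two lines of numpy yields a slide-style exponent such as the -0.82 shown next.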
• Slope of the log-log fit: -0.82
• Q: So what?
• A1: # of two-step-away pairs = friends of friends (F.O.F.)
• A1: # of two-step-away pairs: 100^2 * N = 10 Trillion (using the average degree, ~100)
• But the degree distribution is skewed: # of two-step-away pairs is O(d_max^2) ~ 10M^2, i.e. ~0.8PB -> a data center(!) [photo: DCO @ CMU]
• 'Gaussian trap': the average degree is very misleading here
Solution# S.2: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix ('eig()'): A x = λ x
[Plot: eigenvalue vs. rank of decreasing eigenvalue; exponent = slope, E = -0.48]
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns: Degree; Triangles
– Anomaly/fraud detection
– Graph understanding
• Part#2: time-evolving graphs; tensors
• Conclusions
Solution# S.3: Triangle 'Laws'
• Real social networks have a lot of triangles
– Friends of friends are friends
• Any patterns?
– 2x the friends, 2x the triangles?
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
[Plots: degree (x-axis) vs. mean # triangles (y-axis); datasets: SN, Reuters, Epinions]
• n friends -> ~n^1.6 triangles
Triangle Law: Computations [Tsourakakis ICDM 2008]
• But: triangles are expensive to compute (3-way join; several approx. algos) – O(d_max^2)
• Q: Can we do that quickly?
Triangle Law: Computations [Tsourakakis ICDM 2008]
• Q: Can we do that quickly? A: Yes!
• #triangles = 1/6 * Sum(λ_i^3)   (where A x = λ x)
• (and, because of skewness (S.2), we only need the top few eigenvalues! – O(E))
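The eigenvalue formula is easy to try in numpy. The sketch below (my own toy example) counts triangles exactly via trace(A^3)/6 and approximately via the slide's 1/6 * Sum(λ_i^3) rule, keeping only the top-k eigenvalues:

```python
import numpy as np

def triangles_exact(A):
    """Exact count: trace(A^3)/6 equals the number of triangles
    in an undirected simple graph with adjacency matrix A."""
    return int(round(np.trace(A @ A @ A) / 6))

def triangles_spectral(A, k=3):
    """Approximate count via 1/6 * sum(lambda_i^3), keeping only
    the top-k eigenvalues by magnitude (enough on skewed spectra,
    per solution S.2)."""
    lam = np.linalg.eigvalsh(A)                 # A is symmetric
    top = lam[np.argsort(-np.abs(lam))[:k]]
    return (top ** 3).sum() / 6

# Toy graph: a 4-clique (4 triangles) plus one dangling edge
A = np.zeros((5, 5))
for i in range(4):
    for j in range(i + 1, 4):
        A[i, j] = A[j, i] = 1
A[3, 4] = A[4, 3] = 1
print(triangles_exact(A))            # 4
print(round(triangles_spectral(A)))  # ~4
```

Cubing damps the small eigenvalues, so on power-law spectra the top few already dominate the sum; that is exactly why the approximation is cheap.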
Triangle counting for large graphs?
• Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD'11]
MORE Graph Patterns
• RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD'09.
MORE Graph Patterns
• Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In "Social Network Data Analytics" (Ed.: Charu Aggarwal)
• Deepayan Chakrabarti and Christos Faloutsos. Graph Mining: Laws, Tools, and Case Studies. Oct. 2012, Morgan Claypool.
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods ('fBox')
• Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions
Fraud
• Given: who 'likes' what page, and when
• Find: suspicious users and suspicious products
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks. Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos. WWW, 2013.
[Figure: bipartite 'Likes' graph between users (A, B, C, D, E, ..., Z) and pages (1, 2, 3, 4, ..., 40)]
Our intuition
▪ Lockstep behavior: Same Likes, same time
[Figure: bipartite 'Likes' graph, users x pages, with each Like annotated by its time (0-90)]
[Figure: after reordering users and pages, the same Likes form a contiguous block with nearly identical Like times]
▪ This reordered dense block is flagged as suspicious lockstep behavior.
MapReduce Overview
▪ Use Hadoop to search for many clusters in parallel:
1. Start with a random seed
2. Update the set of Pages and the center Like times for each cluster
3. Repeat until convergence
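The three steps above can be sketched as a single-machine alternating update. This is a simplification for illustration only, not the actual Hadoop implementation (which also re-selects the page set and comes with guarantees in the paper); all data, names, and thresholds here are hypothetical:

```python
import numpy as np

def lockstep_cluster(like_time, seed_user, pages, dt=2.0, iters=10):
    """Simplified CopyCatch-style search for ONE lockstep cluster.
    like_time[u, p] = time at which user u Liked page p (NaN = no Like).
    Step 1: seed the per-page time centers from a seed user;
    Step 2: alternate between (a) catching users whose Likes on all
            pages fall within dt of the centers and (b) re-centering;
    Step 3: repeat for a fixed number of rounds (convergence)."""
    centers = like_time[seed_user, pages].copy()
    users = np.array([seed_user])
    for _ in range(iters):
        close = np.abs(like_time[:, pages] - centers) <= dt
        caught = np.where(close.all(axis=1))[0]
        if len(caught) == 0:
            break
        users = caught
        centers = like_time[np.ix_(users, pages)].mean(axis=0)
    return users, centers

# Toy data: users 0-2 Like pages 0-2 in lockstep (~t=10); users 3-5 don't.
rng = np.random.default_rng(0)
T = np.full((6, 4), np.nan)
T[:3, :3] = 10 + rng.normal(0, 0.3, size=(3, 3))   # the lockstep group
T[3:, :] = rng.uniform(30, 100, size=(3, 4))       # unrelated activity
users, centers = lockstep_cluster(T, seed_user=0, pages=[0, 1, 2])
print(users)   # the lockstep group
```

In the real system, each iteration of steps 2a/2b is one MapReduce round, and many seeds are explored in parallel.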
Deployment at Facebook
▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team)
[Plot: number of users caught per CopyCatch run, over 3 months @ Facebook (08/25-12/01)]
Deployment at Facebook
▪ Manually labeled 22 randomly selected clusters from February 2013:
– Fake accounts: 23%
– Malicious browser extensions: 58%
– OS malware: 5%
– Credential stealing: 9%
– Social engineering: 5%
▪ Most clusters (77%) come from real but compromised users; only 23% are fake accounts.
Problem: Social Network Link Fraud
• Target: find "stealthy" attackers missed by other algorithms
• Attack structures: clique; bipartite core
• Graph: 41.7M nodes, 1.5B edges
Problem: Social Network Link Fraud
• Neil Shah, Alex Beutel, Brian Gallagher and Christos Faloutsos. Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective. ICDM 2014, Shenzhen, China.
• Takeaway: use the reconstruction error between the true and the latent (low-rank) representation!
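The takeaway can be illustrated in a few lines of numpy. This is my own sketch, not the actual fBox algorithm: compute a rank-k SVD of the adjacency matrix and score each node by how badly the low-rank model reconstructs its row; rows the spectral model cannot explain are the "stealthy" candidates.

```python
import numpy as np

def reconstruction_scores(A, k=2):
    """Per-node reconstruction error of the adjacency matrix under a
    rank-k SVD. Rows that the low-rank model explains poorly are
    candidates for 'stealthy' behavior (illustration of the takeaway,
    not the actual fBox algorithm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]          # rank-k reconstruction
    return np.linalg.norm(A - A_k, axis=1)     # error per row (node)

# Toy who-follows-whom graph: two dense 4-node blocks, plus node 8,
# which scatters its links instead of joining either block.
A = np.zeros((9, 9))
A[:4, :4] = 1
A[4:8, 4:8] = 1
A[8, 0] = A[8, 5] = 1
scores = reconstruction_scores(A, k=2)
print(np.argmax(scores))   # node 8: worst reconstructed
```

Dense, coordinated behavior is exactly what the top singular vectors capture; an attacker who avoids it leaves a large residual instead.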
Suspicious Patterns in Event Data
• A General Suspiciousness Metric for Dense Blocks in Multimodal Data. Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. ICDM, 2015.
• Question: how to score dense blocks, from 2-mode (matrices) to n-mode (tensors)?
Suspicious Patterns in Event Data
• Which is more suspicious?
– 20,000 users, retweeting the same 20 tweets, 6 times each, all in 10 hours
vs.
– 225 users, retweeting the same 1 tweet, 15 times each, all in 3 hours, all from 2 IP addresses
• Answer: volume * D_KL(p || p_background)   [ICDM 2015]
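A back-of-the-envelope version of this score (a simplified reading of the CrossSpot metric; the background figures below are hypothetical) treats each cell of a block as Poisson and computes volume times the KL divergence between the block's density and the background density:

```python
import math

def suspiciousness(mass, volume, total_mass, total_volume):
    """volume * D_KL(Pois(rho) || Pois(rho0)): the slide's
    'volume * KL(block density || background density)' idea
    (a simplified reading of the CrossSpot metric).
    mass = # events inside the block; volume = # cells in the block;
    total_mass / total_volume = background density rho0."""
    rho, rho0 = mass / volume, total_mass / total_volume
    # KL divergence between two Poisson rates (always >= 0):
    return volume * (rho * math.log(rho / rho0) + rho0 - rho)

# Same mass, different concentration (background: 1e6 events in 1e9 cells):
dense = suspiciousness(mass=3375, volume=225 * 1 * 3,
                       total_mass=1e6, total_volume=1e9)
spread = suspiciousness(mass=3375, volume=20000 * 20 * 10,
                        total_mass=1e6, total_volume=1e9)
print(dense > spread)   # the concentrated block scores higher
```

The same events score far higher when squeezed into a small block, which matches the intuition of the slide's two examples.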
A General Suspiciousness Metric for Dense Blocks in Multimodal Data
Abstract—Which seems more suspicious: 5,000 tweets from 200 users on 5 IP addresses, or 10,000 tweets from 500 users on 500 IP addresses but all with the same hashtag and all in 10 minutes? The literature has many methods that try to find dense blocks in matrices, and, recently, tensors, but no method gives a principled way to score the suspiciousness of dense blocks with different numbers of modes and rank them to draw human attention accordingly. Dense blocks are worth inspecting, typically indicating fraud, emerging trends, or some other noteworthy deviation from the usual. Our main contribution is that we show how to unify these methods and how to give a principled answer to questions like the above. Specifically, (a) we give a list of axioms that any metric of suspiciousness should satisfy; (b) we propose an intuitive, principled metric that satisfies the axioms, and is fast to compute; (c) we propose CROSSSPOT, an algorithm to spot dense regions, and sort them in importance ("suspiciousness") order. Finally, we apply CROSSSPOT to real data, where it improves the F1 score over previous techniques by 68% and finds retweet-boosting and hashtag-hijacking in social datasets spanning 0.3 billion posts.
I. INTRODUCTION
Imagine your job at Twitter is to detect when fraudsters are trying to manipulate the most popular tweets for a given trending topic. Given time pressure, which is more worthy of your investigation: 2,000 Twitter users, all retweeting the same 20 tweets, 4 to 6 times each; or 225 Twitter users, retweeting the same 1 tweet, 10 to 15 times each? Now, what if the latter batch of activity happened within 3 hours, while the former spanned 10 hours? What if all 225 users of the latter group used the same 2 IP addresses?
Figure 1 shows an example of these patterns from Tencent Weibo, one of the largest microblogging platforms in China; our method CROSSSPOT detected a block of 225 users, using 2 IP addresses ("blue circle" and "red cross"), retweeting the same tweet 27K times, within 200 minutes. Further, manual inspection shows that several of these users get activated every 5 minutes. This type of lockstep behavior is suspicious (say, due to automated scripts), and it leads to dense blocks, as in Figure 1. These blocks may span several modes (user-id, timestamp, hashtag, etc.). Although our main motivation is fraud detection in a Twitter-like setting, our proposed approach is suitable for numerous other settings, like distributed-denial-of-service (DDoS) attacks, link fraud, click fraud, even health-insurance fraud, as we discuss next.
Thus, the core question we ask in this paper is: what is the right way to compare the severity/suspiciousness/surprise of two dense blocks, that span 2 or more modes? Informally, the problem is:
Fig. 1. Density in multiple modes is suspicious. Left: a dense block of 225 users on Tencent Weibo (Chinese Twitter) retweeting one tweet 27,313 times from 2 IP addresses over 200 minutes. Right: magnification of a subset of this block; the two symbols indicate the two IP addresses used. Notice how synchronized the behavior is, across several modes (IP-address, user-id, timestamp).
Informal Problem 1 (Suspiciousness score): Given a K-mode dataset (tensor) X, with counts of events (non-negative integer values), and two subtensors Y1 and Y2, which is more suspicious and worthy of further investigation?
Why multimodal data (tensors): Graphs and social networks have attracted huge interest - and they are perfectly modeled as K=2 mode datasets, that is, matrices. With K=2 modes we can model Twitter's "who-follows-whom" network [1][2], Facebook's "who-friends-whom" and "who-Likes-what" graphs [3], eBay's "who-buys-from-whom" graph [4], financial activities of "who-trades-what-stocks", and scientific relations of "who-cites-whom." Several high-impact datasets make use of higher-mode relations. With K=3 modes, we can consider how all of the above graphs change over time, or what words are used in product reviews on eBay or Amazon. With K=4 modes, we can analyze network traces for intrusion detection and distributed denial of service (DDoS) attacks by looking for patterns in the source IP, destination IP, destination port, and timestamp [5][6]. Health-insurance fraud detection is another example, with (patient-id, doctor-id, prescription-id, timestamp), where corrupt doctors prescribe fake, expensive medicine to senile or corrupt patients [7][8].
Why are dense regions worth inspecting: Dense regions are surprising in all of the examples above. Past work has repeatedly found that dense regions in these tensors correspond to suspicious, lockstep behavior: purchased Page Likes on Facebook result in a few users "Liking" the same Pages, always at the same time (when the order for the Page Likes is placed) [3]. Spammers paid to write deceptively high (or low) reviews for restaurants or hotels will reuse the same accounts and often even the same text [9][10]. Zombie followers, botnets who are set up to build social links, will inflate the number of followers to make their customers seem more popular than they actually are [1][11]. This high-density outcome has a reason: spammers have constrained resources (users, IP
(Retweeted message in Fig. 1: "Galaxy Note Dream Project: Happy Happy Life Traveling the World")
Fig. 3. Finding dense blocks: CROSSSPOT outperforms baselines (HOSVD with rank r = 20, 10, 5; MAF) in finding 3-mode blocks. (a) Precision-recall performance; (b) recall per injected 30x30x30 block of mass 512 down to 16: seeding CROSSSPOT with HOSVD directly improves the recall on top of HOSVD.
• SVD has small recall but high precision. However, the SVD can hardly catch small, sparse injected blocks such as 30x30 submatrices of mass 16 and 32, even though they are denser than the background. Higher decomposition rank brings higher classification accuracy.
Finding dense high-order blocks in multimodal data: We generate random tensor data with parameters: (1) the number of modes k=3, (2) the size of data N1=1,000, N2=1,000 and N3=1,000, and (3) the mass of data C=10,000. We inject b=6 blocks of k'=3 modes into the random data, so I = {1, 2, 3}. Each block has size 30x30x30 and mass c in {16, 32, 64, 128, 256, 512}. The task is again to classify the tensor entries into suspicious and normal classes. Figure 3(a) reports the performance of CROSSSPOT and baselines. We observe that, in order to find all six 3-mode injected blocks, our proposed CROSSSPOT has better precision and recall than the baselines. The best F1 score CROSSSPOT gives is 0.891, which is 46.0% higher than the F1 score given by the best of HOSVD (0.610). If we use the results of HOSVD as seeds for CROSSSPOT, the best F1 score of CROSSSPOT reaches 0.979. Figure 3(b) gives the recall value of every injected block. We observe that CROSSSPOT improves the recall over HOSVD, especially on slightly sparser blocks.
Finding dense low-order blocks in multimodal data: We generate random tensor data with parameters: (1) the number of modes k=3, (2) the size of data N1=1,000, N2=1,000 and N3=1,000, and (3) the mass of data C=10,000. We inject b=4 blocks into the random data:
• Block #1: k'1=3 modes, I1={1,2,3}; size 30x30x30, mass c1=512.
• Block #2: k'2=2 modes, I2={1,2}; size 30x30x1,000, mass c2=512.
• Block #3: k'3=2 modes, I3={1,3}; size 30x1,000x30, mass c3=512.
• Block #4: k'4=2 modes, I4={2,3}; size 1,000x30x30, mass c4=512.
Note, blocks 2-4 are dense in only 2 modes and random in the third mode. Table V reports the classification performance of CROSSSPOT and baselines. We show the overall evaluations (precision, recall and F1 score) and the recall value of every block. We observe that CROSSSPOT has 100% recall in catching the 3-mode block #1, while the baselines have 85-95% recall. More impressively, CROSSSPOT successfully catches the 2-mode blocks, where HOSVD has difficulty and low recall. The F1 score of the overall evaluation is as large as 0.972, a 68.8% improvement.
Testing robustness of the random seed number: We test how the performance of CROSSSPOT improves when we use more seed blocks in the low-order block detection experiments.
Fig. 4. (a) Robustness: CROSSSPOT is robust to the number of random seeds; in detecting the 4 low-order blocks, 41 seeds already report the final result of as many as 1,000 seeds. (b) Convergence: CROSSSPOT converges very fast; the average number of iterations is 2.87.
TABLE VI. Big dense blocks with top metric values discovered in the retweeting dataset.

Method    | # | User x tweet x IP x minute | Mass c | Suspiciousness
CROSSSPOT | 1 | 14 x 1 x 2 x 1,114         | 41,396 | 1,239,865
CROSSSPOT | 2 | 225 x 1 x 2 x 200          | 27,313 | 777,781
CROSSSPOT | 3 | 8 x 2 x 4 x 1,872          | 17,701 | 491,323
HOSVD     | 1 | 24 x 6 x 11 x 439          | 3,582  | 131,113
HOSVD     | 2 | 18 x 4 x 5 x 223           | 1,942  | 74,087
HOSVD     | 3 | 14 x 2 x 1 x 265           | 9,061  | 381,211
Figure 4(a) shows the best F1 score for different numbers of random seeds. We find that when we use 41 random seeds, the best F1 score is close to the results when we use as many as 1,000 random seeds. Thus, once we exceed a moderate number of random seeds, the performance is fairly robust to the number of random seeds.
Efficiency analysis: CROSSSPOT can be parallelized onto multiple machines to search dense blocks with different sets of random seeds. The time cost of every iteration is linear in the number of non-zero entries in the multimodal data, as we have discussed in Section V. Figure 4(b) reports the counts of iterations in the procedure of 1,000 random seeds. We observe that usually CROSSSPOT takes 2 or 3 iterations to finish the local search. Each iteration takes only 5.6 seconds. Tensor decompositions such as HOSVD and PARAFAC used in MAF often take more time. On the same machine, HOSVD of rank r=5, 10 and 20 takes 280, 1,750 and 34,510 seconds, respectively. From Table V and Figure 4(a), even without parallelization, we know that CROSSSPOT takes only 230 seconds to reach the best F1 score of 0.972, while HOSVD needs more time (280 seconds if r=5) to reach a much smaller F1 score of 0.324.
D. Retweeting Boosting
Table VI shows big, dense block patterns in the Tencent Weibo retweeting dataset. CROSSSPOT reports blocks of high mass and high density. For example, we spot that 14 users retweet the same content 41,396 times on 2 IP addresses in 19 hours. Their coordinated, suspicious behaviors result in a few tweets that seem extremely popular. We observe that CROSSSPOT catches more suspicious (bigger and denser) blocks than HOSVD does: HOSVD evaluates the number of retweets per user, item, IP, or minute, but does not consider the block's density, mass, nor the background.
Table VII shows an example of retweeting boosting from the big, dense 225x1x2x200 block reported by our proposed CROSSSPOT. A group of users (e.g., A, B, C) retweet the same message "Galaxy note dream project: Happy happy life
ICDM 2015
E-bay Fraud detection
• w/ Polo Chau & Shashank Pandit, CMU [WWW'07]
E-bay Fraud detection - NetProbe
Popular press
• And less desirable attention: e-mail from 'Belgium police' ('copy of your code?')
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods ('fBox')
• Belief Propagation; fast computation & unification
• Part#2: time-evolving graphs; tensors
• Conclusions
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms
Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos
ECML PKDD, 5-9 September 2011, Athens, Greece
Problem Definition: GBA techniques
• Given: graph; & few labeled nodes
• Find: labels of rest (assuming network effects)
Are they related?
• RWR (Random Walk with Restarts)
– google's pageRank ('if my friends are important, I'm important, too')
• SSL (Semi-supervised learning)
– minimize the differences among neighbors
• BP (Belief propagation)
– send messages to neighbors, on what you believe about them
• A: YES!
Correspondence of Methods

Method | Matrix           | Unknown | Known
RWR    | [I - c A D^-1]   | x       | (1-c) y
SSL    | [I + a (D - A)]  | x       | y
FABP   | [I + a D - c' A] | bh      | φh

(A: adjacency matrix; D: diagonal degree matrix (d1, d2, d3, ...); x: final labels/beliefs; y: prior labels/beliefs)
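One practical consequence of the table: each method boils down to one sparse linear system. Below is a minimal sketch of the FABP line (my own toy graph; the parameter values a and c are illustrative, not the tuned values from the paper):

```python
import numpy as np

def fabp(A, priors, a=0.1, c=0.2):
    """Solve the FABP system from the table: [I + a*D - c*A] bh = phi_h,
    where D = diag(degrees), phi_h = prior beliefs (centered at 0:
    positive = fraud-leaning, negative = honest-leaning), and bh = final
    beliefs. (a, c are illustrative parameter values.)"""
    D = np.diag(A.sum(axis=1))
    M = np.eye(len(A)) + a * D - c * A
    return np.linalg.solve(M, priors)

# Two triangle communities joined by one edge; label one node in each.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
phi = np.zeros(6)
phi[0], phi[5] = +1.0, -1.0        # known: one fraud, one honest node
b = fabp(A, phi)
print(np.sign(b))                  # each triangle inherits its label
```

The guilt-by-association effect is visible in the signs: nodes 1-2 follow node 0, nodes 3-4 follow node 5, with no message-passing loop needed, just one linear solve.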
Results: Scalability
• FABP is linear on the number of edges.
[Plot: runtime (min) vs. # of edges (Kronecker graphs)]
Summary of Part#1
• *many* patterns in real graphs
– Power-laws everywhere
– Gaussian trap: Avg << Max
– Long (and growing) list of tools for anomaly/fraud detection
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: time-evolving graphs
– [P2.2: with side information ('coupled' M.T.F.) – Speed]
• Conclusions
Part 2: Time-evolving graphs; tensors
Graphs over time -> tensors!
• Problem #2.1:
– Given who calls whom, and when
– Find patterns / anomalies
[Figure: one caller x callee adjacency matrix per day (Mon, Tue, ...) stacks into a caller x callee x time tensor]
• Problem #2.1': given author - keyword - date: find patterns / anomalies (author x keyword x date tensor)
• MANY more settings, with >2 'modes'
• Problem #2.1'': given subject - verb - object facts: find patterns / anomalies (subject x object x ... tensor)
• Problem #2.1''': given <triplets>: find patterns / anomalies (mode1 x mode2 x ...)
• MANY more settings, with >2 'modes' (and 4, 5, etc. modes)
Graphs & side info [SKIP]
• Problem #2.2: coupled (e.g., side info)
– Given subject - verb - object facts
• and voxel-activity for each subject-word (fMRI voxel activity for 'apple tastes sweet')
– Find patterns / anomalies
Answer to both: tensor factorization
• Recall: (SVD) matrix factorization finds blocks
[Figure: an N-users x M-products matrix ~ sum of rank-1 blocks: 'meat-eaters' x 'steaks' + 'vegetarians' x 'plants' + 'kids' x 'cookies']
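The "SVD finds blocks" claim is easy to see on a toy matrix (my own example; the group names echo the slide and are hypothetical): each leading singular triplet lights up one user group and its product group.

```python
import numpy as np

# Toy users x products matrix with two rank-1 'blocks'
# ('meat-eaters'/'steaks' and 'vegetarians'/'plants'):
A = np.zeros((5, 6))
A[:3, :3] = 1      # users 0-2 buy products 0-2
A[3:, 3:] = 1      # users 3-4 buy products 3-5

U, s, Vt = np.linalg.svd(A)
# Each leading singular triplet (u_i, s_i, v_i) is one rank-1 block:
for i in range(2):
    users = np.where(np.abs(U[:, i]) > 0.1)[0]
    products = np.where(np.abs(Vt[i]) > 0.1)[0]
    print(f"block {i}: users {users}, products {products}")
```

On this matrix the first singular vectors are (up to sign) indicators of the first user/product group and the second pair picks out the other, which is the picture the slide draws.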
Answer to both: tensor factorization
• PARAFAC decomposition
[Figure: a subject x object x ... tensor ~ sum of rank-1 terms, e.g. politicians + artists + athletes]
Answer: tensor factorization
• PARAFAC decomposition
• Results for who-calls-whom-when
– 4M x 15 days
[Figure: caller x callee x time tensor ~ sum of rank-1 components, each a candidate community ('??')]
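PARAFAC generalizes the rank-1 sum from matrices to tensors. Below is a minimal alternating-least-squares sketch for a 3-mode tensor (an illustration only, not the tool used for the phone-call analysis; real work would use a tensor library):

```python
import numpy as np

def cp_als(X, rank, iters=100):
    """Minimal PARAFAC via alternating least squares on a 3-mode
    tensor X. Returns factor matrices A, B, C such that
    X[i,j,k] ~ sum_f A[i,f] * B[j,f] * C[k,f]."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-3 unfolding
    kr = lambda P, Q: (P[:, None, :] * Q[None, :, :]).reshape(-1, rank)
    for _ in range(iters):                       # alternating LS updates
        A = X1 @ np.linalg.pinv(kr(B, C).T)
        B = X2 @ np.linalg.pinv(kr(A, C).T)
        C = X3 @ np.linalg.pinv(kr(A, B).T)
    return A, B, C

# Recover a planted rank-2 tensor:
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)
A, B, C = cp_als(X, rank=2)
Xhat = np.einsum('if,jf,kf->ijk', A, B, C)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))  # small residual
```

For the phone-call setting, each recovered triplet (caller factor, callee factor, time factor) is one candidate community with its days of activity, which is exactly what the next slides inspect.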
Anomaly detection in time-evolving graphs
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
– ~200 calls to EACH receiver on EACH day!
– 1 caller, 5 receivers, 4 days of activity
• Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: Discoveries @ phonecall network
– [P2.2: Discoveries in neuro-semantics – Speed]
• Conclusions
Coupled Matrix-Tensor Factorization (CMTF)
[Figure: a 3-mode tensor X (user x product x ...) and a side-information matrix Y (user x user-attributes) are factorized jointly as sums of rank-1 terms; both share the same 'user' factors a1 ... aF, with X also using b1 ... bF and c1 ... cF, and Y using d1 ... dF]
Neuro-semantics
Figure S1. Presentation and set of exemplars used in the experiment. Participants were presented 60 distinct word-picture pairs describing common concrete nouns. These consisted of 5 exemplars from each of 12 categories, as shown above. A slow event-related paradigm was employed, in which the stimulus was presented for 3s, followed by a 7s fixation period during which an X was presented in the center of the screen. Images were presented as white lines and characters on a dark background, but are inverted here to improve readability. The entire set of 60 exemplars was presented six times, randomly permuting the sequence on each presentation.
To fully specify a model within this computational modeling framework, one must first define a set of intermediate semantic features f1(w), f2(w), ..., fn(w) to be extracted from the text corpus. In this paper, each intermediate semantic feature is defined in terms of the co-occurrence statistics of the input stimulus word w with a particular other word (e.g., "taste") or set of words (e.g., "taste," "tastes," or "tasted") within the text corpus. The model is trained by the application of multiple regression to these features fi(w) and the observed fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi (26). Once trained, the computational model can be evaluated by giving it words outside the training set and comparing its predicted fMRI images for these words with observed fMRI data.
This computational modeling framework is based on two key theoretical assumptions. First, it assumes the semantic features that distinguish the meanings of arbitrary concrete nouns are reflected in the statistics of their use within a very large text corpus. This assumption is drawn from the field of computational linguistics, where statistical word distributions are frequently used to approximate the meaning of documents and words (14-17). Second, it assumes that the brain activity observed when thinking about any concrete noun can be derived as a weighted linear sum of contributions from each of its semantic features. Although the correctness of this linearity assumption is debatable, it is consistent with the widespread use of linear models in fMRI analysis (27) and with the assumption that fMRI activation often reflects a linear superposition of contributions from different sources. Our theoretical framework does not take a position on whether the neural activation encoding meaning is localized in particular cortical regions. Instead, it considers all cortical voxels and allows the training data to determine which locations are systematically modulated by which aspects of word meanings.
Results. We evaluated this computational mod-el using fMRI data from nine healthy, college-ageparticipants who viewed 60 different word-picturepairs presented six times each. Anatomically de-fined regions of interest were automatically labeledaccording to the methodology in (28). The 60 ran-domly ordered stimuli included five items fromeach of 12 semantic categories (animals, body parts,buildings, building parts, clothing, furniture, insects,kitchen items, tools, vegetables, vehicles, and otherman-made items). A representative fMRI image foreach stimulus was created by computing the meanfMRI response over its six presentations, and themean of all 60 of these representative images wasthen subtracted from each [for details, see (26)].
To instantiate our modeling framework, we firstchose a set of intermediate semantic features. To beeffective, the intermediate semantic features mustsimultaneously encode thewide variety of semanticcontent of the input stimulus words and factor theobserved fMRI activation intomore primitive com-
Predicted“celery” = 0.84
“celery” “airplane”
Predicted:
Observed:
A B
+.. .
high
average
belowaverage
Predicted “celery”:
+ 0.35 + 0.32
“eat” “taste” “fill”
Fig. 2. Predicting fMRI imagesfor given stimulus words. (A)Forming a prediction for par-ticipant P1 for the stimulusword “celery” after training on58 other words. Learned cvi co-efficients for 3 of the 25 se-mantic features (“eat,” “taste,”and “fill”) are depicted by thevoxel colors in the three imagesat the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” isshown to the left of their respective images [e.g., the value for “eat (celery)” is0.84]. The predicted activation for the stimulus word [shown at the bottom of(A)] is a linear combination of the 25 semantic fMRI signatures, weighted bytheir co-occurrence values. This figure shows just one horizontal slice [z =
–12 mm in Montreal Neurological Institute (MNI) space] of the predictedthree-dimensional image. (B) Predicted and observed fMRI images for“celery” and “airplane” after training that uses 58 other words. The two longred and blue vertical streaks near the top (posterior region) of the predictedand observed images are the left and right fusiform gyri.
A
B
C
Mean over
participants
Participant P
5
Fig. 3. Locations ofmost accurately pre-dicted voxels. Surface(A) and glass brain (B)rendering of the correla-tion between predictedand actual voxel activa-tions for words outsidethe training set for par-
ticipant P5. These panels show clusters containing at least 10 contiguous voxels, each of whosepredicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout thecortex and located in the left and right occipital and parietal lobes; left and right fusiform,postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anteriorcingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nineparticipants. This panel represents clusters containing at least 10 contiguous voxels, each withaverage correlation of at least 0.14.
30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org1192
RESEARCH ARTICLES
on
May
30,
200
8 w
ww
.sci
ence
mag
.org
Dow
nloa
ded
from
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
Amazon'16 114 (c) 2016, C. Faloutsos
*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
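The Mitchell et al. model referenced above is, at its core, a linear map from semantic features to voxel activations, trained by regression and tested on held-out nouns. A minimal synthetic sketch of that train/predict loop (all names and sizes here are illustrative, not the paper's actual data):

```python
import numpy as np

# Illustrative setup: each noun has a semantic feature vector (in the
# paper, corpus co-occurrence with verbs such as "eat" or "taste");
# a linear model maps features to per-voxel fMRI activation.
rng = np.random.default_rng(0)
n_nouns, n_feat, n_vox = 60, 25, 500
feats = rng.standard_normal((n_nouns, n_feat))    # semantic features f(w)
sig = rng.standard_normal((n_feat, n_vox))        # true voxel signatures
images = feats @ sig + 0.01 * rng.standard_normal((n_nouns, n_vox))

# Leave-2-out evaluation: train the regression on 58 nouns,
# then predict the fMRI images of the 2 held-out nouns.
train_idx, test_idx = np.arange(2, 60), np.array([0, 1])
coef, *_ = np.linalg.lstsq(feats[train_idx], images[train_idx], rcond=None)
pred = feats[test_idx] @ coef                     # predicted fMRI images
```

Each predicted image is a weighted sum of the learned per-feature voxel signatures, mirroring the "predicted 'celery' = 0.84·eat + 0.35·taste + 0.32·fill" composition shown in the paper's Fig. 2.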
SKIP
CMU SCS
Neuro-semantics
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
Amazon'16 115 (c) 2016, C. Faloutsos
Patterns?
SKIP
CMU SCS
Neuro-semantics
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
[3-mode tensor: nouns (airplane, dog, …) × questions × voxels]

Amazon'16 (c) 2016, C. Faloutsos 116

Patterns?
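The brain-scan data on this slide can be arranged as a 3-mode tensor (nouns × questions × voxels), and "patterns" then means low-rank structure. A minimal numpy sketch under made-up data: 60 nouns and 218 questions match the slide, but the voxel count and the planted rank-1 component are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 60 nouns and 218 questions match the slide;
# the voxel count here is a made-up stand-in.
n_nouns, n_questions, n_voxels = 60, 218, 500

# Plant one rank-1 "pattern" plus noise, so there is something to find.
a = rng.normal(size=n_nouns)
b = rng.normal(size=n_questions)
c = rng.normal(size=n_voxels)
X = 5.0 * np.einsum("i,j,k->ijk", a, b, c) \
    + rng.normal(size=(n_nouns, n_questions, n_voxels))

# Mode-1 unfolding: one row per noun, one column per (question, voxel) pair.
X1 = X.reshape(n_nouns, n_questions * n_voxels)

# The singular-value spectrum of the unfolding exposes the low-rank pattern.
s = np.linalg.svd(X1, compute_uv=False)
print(s[0] / s[1])  # the planted component dominates the noise floor
```

Full tensor decompositions (e.g., PARAFAC) generalize this idea to all three modes at once; the SVD of one unfolding is just the simplest way to see a pattern.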
To fully specify a model within this computational modeling framework, one must first define a set of intermediate semantic features f1(w), f2(w), …, fn(w) to be extracted from the text corpus. In this paper, each intermediate semantic feature is defined in terms of the co-occurrence statistics of the input stimulus word w with a particular other word (e.g., “taste”) or set of words (e.g., “taste,” “tastes,” or “tasted”) within the text corpus. The model is trained by the application of multiple regression to these features fi(w) and the observed fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi (26). Once trained, the computational model can be evaluated by giving it words outside the training set and comparing its predicted fMRI images for these words with observed fMRI data.
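The two steps just described (co-occurrence features fi(w) from a text corpus, then multiple regression to estimate the cvi) can be sketched in a few lines. Everything below is a toy: the corpus, the four feature words (stand-ins for the paper's 25), the window size, and the random "voxel" data are all invented for illustration, not the paper's actual setup.

```python
import numpy as np

# --- Step 1: intermediate semantic features from co-occurrence counts ---
corpus = ("you can eat celery and taste celery . "
          "an airplane can fly . a dog can run and eat .").split()
stimuli = ["celery", "airplane", "dog"]
feature_words = ["eat", "taste", "fly", "run"]  # stand-ins for the 25 verbs

def cooccurrence(word, feature, window=4):
    """Count how often `feature` appears within `window` tokens of `word`."""
    hits = 0
    for i, tok in enumerate(corpus):
        if tok == word:
            lo, hi = max(0, i - window), i + window + 1
            hits += corpus[lo:hi].count(feature)
    return hits

F = np.array([[cooccurrence(w, f) for f in feature_words] for w in stimuli],
             dtype=float)
F /= np.maximum(F.sum(axis=1, keepdims=True), 1)  # normalize per word

# --- Step 2: multiple regression, one linear model per voxel ---
rng = np.random.default_rng(1)
n_voxels = 10
Y = rng.normal(size=(len(stimuli), n_voxels))  # observed fMRI images (fake)
C, *_ = np.linalg.lstsq(F, Y, rcond=None)      # cvi: least squares, which is
                                               # ML under Gaussian noise

predicted = F @ C  # one predicted image per stimulus word
print(predicted.shape)
```

Evaluation on held-out words, as in the paper, would simply fit C on a subset of the rows of F and Y and apply F_test @ C to the rest.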
This computational modeling framework is based on two key theoretical assumptions. First, it assumes the semantic features that distinguish the meanings of arbitrary concrete nouns are reflected in the statistics of their use within a very large text corpus. This assumption is drawn from the field of computational linguistics, where statistical word distributions are frequently used to approximate the meaning of documents and words (14–17). Second, it assumes that the brain activity observed when thinking about any concrete noun can be derived as a weighted linear sum of contributions from each of its semantic features. Although the correctness of this linearity assumption is debatable, it is consistent with the widespread use of linear models in fMRI analysis (27) and with the assumption that fMRI activation often reflects a linear superposition of contributions from different sources. Our theoretical framework does not take a position on whether the neural activation encoding meaning is localized in particular cortical regions. Instead, it considers all cortical voxels and allows the training data to determine which locations are systematically modulated by which aspects of word meanings.
Results. We evaluated this computational model using fMRI data from nine healthy, college-age participants who viewed 60 different word-picture pairs presented six times each. Anatomically defined regions of interest were automatically labeled according to the methodology in (28). The 60 randomly ordered stimuli included five items from each of 12 semantic categories (animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen items, tools, vegetables, vehicles, and other man-made items). A representative fMRI image for each stimulus was created by computing the mean fMRI response over its six presentations, and the mean of all 60 of these representative images was then subtracted from each [for details, see (26)].
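The preprocessing just described (one representative image per stimulus as the mean over its six presentations, then subtracting the grand mean of the 60 representative images) is easy to misread, so here is a small numpy sketch with fake data; the voxel count is a made-up placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
n_stimuli, n_presentations, n_voxels = 60, 6, 1000  # voxel count is made up

# Raw responses: one fMRI image per presentation of each stimulus.
raw = rng.normal(size=(n_stimuli, n_presentations, n_voxels))

# Representative image per stimulus: mean over its six presentations.
rep = raw.mean(axis=1)                 # shape (60, n_voxels)

# Subtract the grand mean of all 60 representative images from each.
rep_centered = rep - rep.mean(axis=0)  # broadcasts over stimuli

print(np.allclose(rep_centered.mean(axis=0), 0))  # True: voxel-wise means are zero
```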
To instantiate our modeling framework, we first chose a set of intermediate semantic features. To be effective, the intermediate semantic features must simultaneously encode the wide variety of semantic content of the input stimulus words and factor the observed fMRI activation into more primitive com-
[Fig. 2 panel labels: (A) Predicted “celery” = 0.84 × “eat” + 0.35 × “taste” + 0.32 × “fill” + …; (B) Predicted vs. Observed images for “celery” and “airplane”; color scale: high / average / below average]
Fig. 2. Predicting fMRI images for given stimulus words. (A) Forming a prediction for participant P1 for the stimulus word “celery” after training on 58 other words. Learned cvi coefficients for 3 of the 25 semantic features (“eat,” “taste,” and “fill”) are depicted by the voxel colors in the three images at the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” is shown to the left of their respective images [e.g., the value for “eat (celery)” is 0.84]. The predicted activation for the stimulus word [shown at the bottom of (A)] is a linear combination of the 25 semantic fMRI signatures, weighted by their co-occurrence values. This figure shows just one horizontal slice [z = –12 mm in Montreal Neurological Institute (MNI) space] of the predicted three-dimensional image. (B) Predicted and observed fMRI images for “celery” and “airplane” after training that uses 58 other words. The two long red and blue vertical streaks near the top (posterior region) of the predicted and observed images are the left and right fusiform gyri.
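The bottom row of Fig. 2A is just a co-occurrence-weighted sum of learned signature images. A toy numpy version: the weights 0.84 / 0.35 / 0.32 come from the caption, while the signature images and the voxel count are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n_voxels = 1000  # one flattened slice; the size is made up

# Learned signature images for three of the features, as in Fig. 2A.
signatures = {"eat": rng.normal(size=n_voxels),
              "taste": rng.normal(size=n_voxels),
              "fill": rng.normal(size=n_voxels)}

# Co-occurrence values of "celery" with each feature (from the caption).
weights = {"eat": 0.84, "taste": 0.35, "fill": 0.32}

# Predicted image = weighted linear combination of the signatures.
predicted = sum(weights[f] * signatures[f] for f in signatures)
print(predicted.shape)
```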
To fully specify a model within this com-putational modeling framework, one must firstdefine a set of intermediate semantic featuresf1(w) f2(w)…fn(w) to be extracted from the textcorpus. In this paper, each intermediate semanticfeature is defined in terms of the co-occurrencestatistics of the input stimulus word w with aparticular other word (e.g., “taste”) or set of words(e.g., “taste,” “tastes,” or “tasted”) within the textcorpus. The model is trained by the application ofmultiple regression to these features fi(w) and theobserved fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi(26). Once trained, the computational model can beevaluated by giving it words outside the trainingset and comparing its predicted fMRI images forthese words with observed fMRI data.
This computational modeling framework isbased on two key theoretical assumptions. First, itassumes the semantic features that distinguish themeanings of arbitrary concrete nouns are reflected
in the statistics of their use within a very large textcorpus. This assumption is drawn from the field ofcomputational linguistics, where statistical worddistributions are frequently used to approximatethe meaning of documents and words (14–17).Second, it assumes that the brain activity observedwhen thinking about any concrete noun can bederived as a weighted linear sum of contributionsfrom each of its semantic features. Although thecorrectness of this linearity assumption is debat-able, it is consistent with the widespread use oflinear models in fMRI analysis (27) and with theassumption that fMRI activation often reflects alinear superposition of contributions from differentsources. Our theoretical framework does not take aposition on whether the neural activation encodingmeaning is localized in particular cortical re-gions. Instead, it considers all cortical voxels andallows the training data to determine which loca-tions are systematically modulated by which as-pects of word meanings.
Results. We evaluated this computational mod-el using fMRI data from nine healthy, college-ageparticipants who viewed 60 different word-picturepairs presented six times each. Anatomically de-fined regions of interest were automatically labeledaccording to the methodology in (28). The 60 ran-domly ordered stimuli included five items fromeach of 12 semantic categories (animals, body parts,buildings, building parts, clothing, furniture, insects,kitchen items, tools, vegetables, vehicles, and otherman-made items). A representative fMRI image foreach stimulus was created by computing the meanfMRI response over its six presentations, and themean of all 60 of these representative images wasthen subtracted from each [for details, see (26)].
To instantiate our modeling framework, we firstchose a set of intermediate semantic features. To beeffective, the intermediate semantic features mustsimultaneously encode thewide variety of semanticcontent of the input stimulus words and factor theobserved fMRI activation intomore primitive com-
Predicted“celery” = 0.84
“celery” “airplane”
Predicted:
Observed:
A B
+.. .
high
average
belowaverage
Predicted “celery”:
+ 0.35 + 0.32
“eat” “taste” “fill”
Fig. 2. Predicting fMRI imagesfor given stimulus words. (A)Forming a prediction for par-ticipant P1 for the stimulusword “celery” after training on58 other words. Learned cvi co-efficients for 3 of the 25 se-mantic features (“eat,” “taste,”and “fill”) are depicted by thevoxel colors in the three imagesat the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” isshown to the left of their respective images [e.g., the value for “eat (celery)” is0.84]. The predicted activation for the stimulus word [shown at the bottom of(A)] is a linear combination of the 25 semantic fMRI signatures, weighted bytheir co-occurrence values. This figure shows just one horizontal slice [z =
–12 mm in Montreal Neurological Institute (MNI) space] of the predictedthree-dimensional image. (B) Predicted and observed fMRI images for“celery” and “airplane” after training that uses 58 other words. The two longred and blue vertical streaks near the top (posterior region) of the predictedand observed images are the left and right fusiform gyri.
A
B
C
Mean over
participants
Participant P
5
Fig. 3. Locations ofmost accurately pre-dicted voxels. Surface(A) and glass brain (B)rendering of the correla-tion between predictedand actual voxel activa-tions for words outsidethe training set for par-
ticipant P5. These panels show clusters containing at least 10 contiguous voxels, each of whosepredicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout thecortex and located in the left and right occipital and parietal lobes; left and right fusiform,postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anteriorcingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nineparticipants. This panel represents clusters containing at least 10 contiguous voxels, each withaverage correlation of at least 0.14.
30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org1192
RESEARCH ARTICLES
on
May
30,
200
8 w
ww
.sci
ence
mag
.org
Dow
nloa
ded
from
To fully specify a model within this com-putational modeling framework, one must firstdefine a set of intermediate semantic featuresf1(w) f2(w)…fn(w) to be extracted from the textcorpus. In this paper, each intermediate semanticfeature is defined in terms of the co-occurrencestatistics of the input stimulus word w with aparticular other word (e.g., “taste”) or set of words(e.g., “taste,” “tastes,” or “tasted”) within the textcorpus. The model is trained by the application ofmultiple regression to these features fi(w) and theobserved fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi(26). Once trained, the computational model can beevaluated by giving it words outside the trainingset and comparing its predicted fMRI images forthese words with observed fMRI data.
This computational modeling framework isbased on two key theoretical assumptions. First, itassumes the semantic features that distinguish themeanings of arbitrary concrete nouns are reflected
in the statistics of their use within a very large textcorpus. This assumption is drawn from the field ofcomputational linguistics, where statistical worddistributions are frequently used to approximatethe meaning of documents and words (14–17).Second, it assumes that the brain activity observedwhen thinking about any concrete noun can bederived as a weighted linear sum of contributionsfrom each of its semantic features. Although thecorrectness of this linearity assumption is debat-able, it is consistent with the widespread use oflinear models in fMRI analysis (27) and with theassumption that fMRI activation often reflects alinear superposition of contributions from differentsources. Our theoretical framework does not take aposition on whether the neural activation encodingmeaning is localized in particular cortical re-gions. Instead, it considers all cortical voxels andallows the training data to determine which loca-tions are systematically modulated by which as-pects of word meanings.
Results. We evaluated this computational model using fMRI data from nine healthy, college-age participants who viewed 60 different word-picture pairs presented six times each. Anatomically defined regions of interest were automatically labeled according to the methodology in (28). The 60 randomly ordered stimuli included five items from each of 12 semantic categories (animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen items, tools, vegetables, vehicles, and other man-made items). A representative fMRI image for each stimulus was created by computing the mean fMRI response over its six presentations, and the mean of all 60 of these representative images was then subtracted from each [for details, see (26)].
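The averaging and mean-centering steps above can be sketched in a few lines of NumPy (shapes are illustrative only; the voxel count is arbitrary and the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 60 stimuli x 6 presentations x 5000 voxels
raw = rng.standard_normal((60, 6, 5000))

# Representative image per stimulus: mean over its six presentations
rep = raw.mean(axis=1)                 # shape (60, 5000)

# Subtract the grand-mean image (mean of all 60 representatives)
rep_centered = rep - rep.mean(axis=0)  # shape (60, 5000)
```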
To instantiate our modeling framework, we first chose a set of intermediate semantic features. To be effective, the intermediate semantic features must simultaneously encode the wide variety of semantic content of the input stimulus words and factor the observed fMRI activation into more primitive components.
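A minimal sketch of the two-step model described above (not the authors' code; all names and shapes are hypothetical): per-voxel coefficients are fit by least squares on the training words, and a new word's image is the feature-weighted linear sum of the learned signatures.

```python
import numpy as np

def train_model(F, Y):
    """Fit per-voxel coefficients c (n_feat x n_vox) by least squares,
    i.e., the maximum-likelihood estimate under Gaussian noise.
    F: (n_words, n_feat) co-occurrence features of the training words.
    Y: (n_words, n_vox) observed (mean-centered) fMRI images."""
    c, *_ = np.linalg.lstsq(F, Y, rcond=None)
    return c

def predict_image(f_w, c):
    """Predicted image for a new word w: a weighted linear sum of the
    learned per-feature voxel signatures, weighted by f(w)."""
    return f_w @ c
```

With 58 training words and 25 features, as in the paper's leave-two-out setup, `F` would be 58 x 25 and `c` would be 25 x n_vox.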
[Figure 2 image: panels (A) and (B); panel annotations: Predicted “celery” = 0.84 · “eat” + 0.35 · “taste” + 0.32 · “fill” + …; color scale: high / average / below average.]

Fig. 2. Predicting fMRI images for given stimulus words. (A) Forming a prediction for participant P1 for the stimulus word “celery” after training on 58 other words. Learned cvi coefficients for 3 of the 25 semantic features (“eat,” “taste,” and “fill”) are depicted by the voxel colors in the three images at the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” is shown to the left of their respective images [e.g., the value for “eat (celery)” is 0.84]. The predicted activation for the stimulus word [shown at the bottom of (A)] is a linear combination of the 25 semantic fMRI signatures, weighted by their co-occurrence values. This figure shows just one horizontal slice [z = –12 mm in Montreal Neurological Institute (MNI) space] of the predicted three-dimensional image. (B) Predicted and observed fMRI images for “celery” and “airplane” after training that uses 58 other words. The two long red and blue vertical streaks near the top (posterior region) of the predicted and observed images are the left and right fusiform gyri.
[Figure 3 image: panels (A)-(C); (A) and (B) show participant P5, (C) shows the mean over participants.]

Fig. 3. Locations of most accurately predicted voxels. Surface (A) and glass brain (B) rendering of the correlation between predicted and actual voxel activations for words outside the training set for participant P5. These panels show clusters containing at least 10 contiguous voxels, each of whose predicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout the cortex and located in the left and right occipital and parietal lobes; left and right fusiform, postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anterior cingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nine participants. This panel represents clusters containing at least 10 contiguous voxels, each with average correlation of at least 0.14.
30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org 1192 (RESEARCH ARTICLES)
SKIP
CMU SCS Neuro-semantics
[Figure 4 image: four brain-activation heatmaps, one per group (Group 1, Group 2, Group 3, Group 4); the Group 3 panel highlights the Premotor Cortex. Each panel lists its top nouns and questions.]

Nouns and questions per group:
• Group 1: beetle (can it cause you pain?), pants (do you see it daily?), bee (is it conscious?)
• Group 2: bear (does it grow?), cow (is it alive?), coat (was it ever alive?)
• Group 3: glass (can you pick it up?), tomato (can you hold it in one hand?), bell (is it smaller than a golfball?)
• Group 4: bed (does it use electricity?), house (can you sit on it?), car (does it cast a shadow?)
Figure 4: Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and its activation pattern is very different from that of Group 1, which mostly refers to insects, such as bee or beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.
Given two brain-image vectors v1 and v2 which were withheld from the training data, the leave-two-out scheme measures prediction accuracy by the ability to choose which of the observed brain images corresponds to which of the two words. After mean-centering the vectors, this classification decision is made according to the following rule:

‖v1 − v̂1‖2 + ‖v2 − v̂2‖2 < ‖v1 − v̂2‖2 + ‖v2 − v̂1‖2
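The decision rule above can be sketched directly (a minimal illustration, not the authors' code):

```python
import numpy as np

def leave_two_out_correct(v1, v2, v1_hat, v2_hat):
    """True if the predicted images (v1_hat, v2_hat) match the two
    held-out observed images (v1, v2) better than the swapped
    assignment, after mean-centering each vector."""
    v1, v2, v1_hat, v2_hat = (v - v.mean() for v in (v1, v2, v1_hat, v2_hat))
    d = np.linalg.norm
    return d(v1 - v1_hat) + d(v2 - v2_hat) < d(v1 - v2_hat) + d(v2 - v1_hat)
```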
Although our approach is not designed to make predictions, preliminary results are very encouraging: using only F=2 components, for the noun pair closet/watch we obtained mean accuracy of about 0.82 for 5 out of the 9 human subjects. Similarly, for the pair knife/beetle, we achieved accuracy of about 0.8 for a somewhat different group of 5 subjects. For the rest of the human subjects, the accuracy is considerably lower; however, it may be the case that brain-activity predictability varies between subjects, a fact that requires further investigation.
5 Experiments
We implemented Turbo-SMT in Matlab. Our implementation of the code is publicly available.[5] For the parallelization of the algorithm, we used Matlab's Parallel Computing Toolbox. For tensor manipulation, we used the Tensor Toolbox for Matlab [7], which is optimized especially for sparse tensors (but works very well for dense ones too). We use the ALS and CMTF-OPT [5] algorithms as baselines, i.e., we compare Turbo-SMT, when using one of those algorithms as its core CMTF implementation, against the plain execution of those algorithms. We implemented our own version of the ALS algorithm, and we used the CMTF Toolbox[6] implementation of CMTF-OPT. We use CMTF-OPT for higher ranks, since that particular algorithm is more accurate than ALS and is the state of the art. All experiments were carried out on a machine with 4 Intel Xeon E7-4850 2.00GHz CPUs and 512GB of RAM. Whenever we conducted multiple iterations of an experiment (due to the randomized nature of Turbo-SMT), we report error bars along the plots. For all the following experiments we used either portions of the BrainQ dataset or the whole dataset.

[5] http://www.cs.cmu.edu/~epapalex/src/turbo_smt.zip

5.1 Speedup. As we have already discussed in the Introduction and shown in Fig. 1, Turbo-SMT achieves a speedup of 50-200 on the BrainQ dataset. For all cases, the approximation cost is either the same as the baselines' or larger by a small factor, indicating that Turbo-SMT is both fast and accurate. Key facts that

[6] http://www.models.life.ku.dk/joda/CMTF_Toolbox
117 Amazon'16 (c) 2016, C. Faloutsos
= SKIP
CMU SCS Neuro-semantics
118 Amazon'16 (c) 2016, C. Faloutsos
Small items -> Premotor cortex
= SKIP
CMU SCS Neuro-semantics
119 Amazon'16 (c) 2016, C. Faloutsos
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Small items -> Premotor cortex
SKIP
CMU SCS
Part 2: Conclusions
• Time-evolving / heterogeneous graphs -> tensors
• PARAFAC finds patterns • (GigaTensor/HaTen2 -> fast & scalable)
Amazon'16 120 (c) 2016, C. Faloutsos
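As a concrete illustration of the PARAFAC idea (a minimal NumPy sketch, not GigaTensor/HaTen2): rank-F CP decomposition of a 3-way tensor via alternating least squares, where each factor triple (A[:,f], B[:,f], C[:,f]) is one pattern, e.g., a group of sources, destinations, and time ticks that act together.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I, F) x (J, F) -> (I*J, F)."""
    F = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, F)

def cp_als(X, F, n_iter=300, seed=0):
    """Rank-F CP/PARAFAC decomposition of a 3-way tensor X via ALS.
    Returns factor matrices A (IxF), B (JxF), C (KxF) with
    X[i,j,k] ~ sum_f A[i,f] * B[j,f] * C[k,f]."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, F))
    B = rng.standard_normal((J, F))
    C = rng.standard_normal((K, F))
    # Mode-n unfoldings (C-order reshapes, consistent with khatri_rao)
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

Plain ALS like this is fine for toy sizes; the scalability point of GigaTensor/HaTen2 is precisely that the Khatri-Rao products above become the bottleneck on billion-scale sparse tensors.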
=
CMU SCS
(c) 2016, C. Faloutsos 121
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Acknowledgements and Conclusions
Amazon'16
CMU SCS
(c) 2016, C. Faloutsos 122
Thanks
Amazon'16
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
CMU SCS
(c) 2016, C. Faloutsos 124
Cast
Akoglu, Leman
Chau, Polo
Kang, U
Hooi, Bryan
Amazon'16
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis
Shah, Neil
Araujo, Miguel
Song, Hyun Ah
Eswaran, Dhivya
Shin, Kijung
CMU SCS
(c) 2016, C. Faloutsos 125
CONCLUSION#1 – Big data • Patterns Anomalies
• Large datasets reveal patterns/outliers that are invisible otherwise
Amazon'16
CMU SCS
(c) 2016, C. Faloutsos 126
CONCLUSION#2 – tensors
• powerful tool
Amazon'16
=
1 caller 5 receivers 4 days of activity
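The "1 caller, 5 receivers, 4 days of activity" pattern is exactly a rank-1 tensor component: an outer product of three indicator vectors (toy sizes assumed here):

```python
import numpy as np

# Toy phone-call tensor: one rank-1 component captures a dense block
# of "1 caller x 5 receivers x 4 days" of repeated activity.
a = np.zeros(10); a[3] = 1.0              # caller indicator (1 caller)
b = np.zeros(10); b[[0, 1, 2, 3, 4]] = 1.0  # receiver indicator (5 receivers)
c = np.zeros(7);  c[[2, 3, 4, 5]] = 1.0   # day indicator (4 active days)

X = np.einsum('i,j,k->ijk', a, b, c)      # outer product a o b o c
assert X.sum() == 20                      # 1 * 5 * 4 calls in the block
```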
CMU SCS
(c) 2016, C. Faloutsos 127
References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012
• http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006
Amazon'16
CMU SCS
TAKE HOME MESSAGE: Cross-disciplinarity
Amazon'16 (c) 2016, C. Faloutsos 128
=
CMU SCS
My goals wrt sabbatical @ amzn • Win-win setting: • For Amazon: state-of-the-art methods for
anomaly detection • For my students: focus on real-world
research problems
Amazon'16 (c) 2016, C. Faloutsos 129