CMU SCS Mining Large Graphs Christos Faloutsos CMU

Mining Large Graphs - cs.cmu.edu

Page 1: Mining Large Graphs - cs.cmu.edu

CMU SCS

Mining Large Graphs

Christos Faloutsos CMU

Page 2: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 2

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns in graphs •  Part#2: time-evolving graphs •  Conclusions

CMU SCS 15649

Page 3: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 3

Graphs - why should we care?

CMU SCS 15649

~1B nodes (web sites) ~6B edges (http links) ‘YahooWeb graph’

Page 4: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 4

Graphs - why should we care?

CMU SCS 15649

U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale Graphs PAKDD 2014, Tainan, Taiwan.

~1B nodes (web sites) ~6B edges (http links) ‘YahooWeb graph’

Page 5: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 5

Graphs - why should we care?

>$10B; ~1B users

CMU SCS 15649

Page 6: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 6

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

CMU SCS 15649

Page 7: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 7

Graphs - why should we care?
•  web-log (‘blog’) news propagation
•  computer network security: email/IP traffic and anomaly detection
•  Recommendation systems
•  ....
•  Many-to-many db relationship -> graph

CMU SCS 15649

Page 8: Mining Large Graphs - cs.cmu.edu

CMU SCS

Motivating problems •  P1: patterns? Fraud detection?

•  P2: patterns in time-evolving graphs / tensors

CMU SCS 15649 (c) 2014, C. Faloutsos 8

[Figure: traffic tensor; modes shown: destination, time]

Page 9: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 9

Roadmap

•  Introduction – Motivation – Why study (big) graphs?

•  Part#1: Patterns in graphs •  Part#2: time-evolving graphs •  Conclusions

CMU SCS 15649

Page 10: Mining Large Graphs - cs.cmu.edu

CMU SCS

CMU SCS 15649 (c) 2014, C. Faloutsos 10

Part 1: Patterns, &

fraud detection

Page 11: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 11

Laws and patterns •  Q1: Are real graphs random?

CMU SCS 15649

Page 12: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 12

Laws and patterns •  Q1: Are real graphs random? •  A1: NO!!

– Diameter (‘6 degrees’; ‘Kevin Bacon’) –  in- and out- degree distributions –  other (surprising) patterns

•  So, let’s look at the data

CMU SCS 15649

Page 13: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 13

Solution# S.1
•  Power law in the degree distribution [FFF, SIGCOMM99]

[Plot: log(degree) vs. log(rank) for internet domains; att.com and ibm.com marked]

CMU SCS 15649

Page 14: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 14

Solution# S.1
•  Power law in the degree distribution [FFF, SIGCOMM99]

[Plot: log(degree) vs. log(rank) for internet domains, slope -0.82; att.com and ibm.com marked]

CMU SCS 15649
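The rank-degree power law on this slide (slope about -0.82) is easy to check in code. Below is a minimal sketch (mine, not from the talk), assuming the graph is available as an in-memory edge list of (src, dst) pairs; it sorts node degrees in decreasing order and fits the log-log slope by least squares.

import numpy as np
from collections import Counter

def rank_degree_slope(edges):
    # edges: iterable of (src, dst) pairs of an undirected graph (assumption)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    degrees = np.array(sorted(deg.values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(degrees) + 1, dtype=float)
    # least-squares line through (log rank, log degree); the slope is the
    # power-law exponent (about -0.82 for the internet-domain graph above)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(degrees), 1)
    return slope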

Page 15: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 15

Solution# S.1 •  Q: So what?

[Plot: same log-log degree distribution, slope -0.82]

CMU SCS 15649

Page 16: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 16

Solution# S.1 •  Q: So what? •  A1: # of two-step-away pairs:

[Plot: same log-log degree distribution, slope -0.82]

CMU SCS 15649

= friends of friends (F.O.F.)

Page 17: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 17

Solution# S.1 •  Q: So what? •  A1: # of two-step-away pairs: 100^2 * N= 10 Trillion

[Plot: same log-log degree distribution, slope -0.82]

CMU SCS 15649

= friends of friends (F.O.F.)

Page 18: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 18

Solution# S.1 •  Q: So what? •  A1: # of two-step-away pairs: 100^2 * N= 10 Trillion

[Plot: same log-log degree distribution, slope -0.82]

CMU SCS 15649

= friends of friends (F.O.F.)

Page 19: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 19

Solution# S.1 •  Q: So what? •  A1: # of two-step-away pairs: O(d_max^2) ~ (10M)^2

[Plot: same log-log degree distribution, slope -0.82]

CMU SCS 15649

~0.8PB -> a data center(!) (DCO @ CMU)

Gaussian trap

= friends of friends (F.O.F.)
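The blow-up on these slides is simple arithmetic; the sketch below (mine, not part of the talk) redoes it, taking an average degree of ~100 and a maximum degree of ~10M as the values the slides imply, and 8 bytes per stored pair as an assumption.

N = 1_000_000_000          # ~1B nodes
avg_deg = 100              # 'Gaussian' intuition: everybody has ~100 friends
d_max = 10_000_000         # power-law reality: a hub can have ~10M neighbors

fof_avg = avg_deg ** 2 * N     # 1e13 = 10 trillion two-step pairs
fof_hub = d_max ** 2           # 1e14 pairs from a single 10M-degree hub alone
print(f"{fof_avg:.0e} pairs, {fof_hub:.0e} pairs")
print(f"{fof_hub * 8 / 1e15:.1f} PB")   # ~0.8 PB at 8 bytes/pair -> a data center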

Page 20: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 20

Solution# S.1 •  Q: So what? •  A1: # of two-step-away pairs: O(d_max^2) ~ (10M)^2

[Plot: same log-log degree distribution, slope -0.82]

CMU SCS 15649

~0.8PB -> a data center(!)

Gaussian trap

Page 21: Mining Large Graphs - cs.cmu.edu

CMU SCS

Observation – big-data: •  O(N^2) algorithms are ~intractable - N = 1B

•  N^2 seconds = 31B years (>2x the age of the universe)

CMU SCS 15649 (c) 2014, C. Faloutsos 21

[Figure: 1B x 1B adjacency matrix]
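The '31B years' figure is a quick check; the two lines below are my verification, not the slide's:

N = 1_000_000_000
years = N ** 2 / (3600 * 24 * 365)   # N^2 seconds, expressed in years
print(f"{years:.1e}")                # ~3.2e10, i.e. ~31B years (>2x the universe's ~13.8B)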

Page 22: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 22

Solution# S.2: Eigen Exponent E

•  A2: power law in the eigenvalues of the adjacency matrix (‘eig()’)

[Plot: eigenvalue vs. rank of decreasing eigenvalue (log-log); exponent = slope, E = -0.48; May 2001]

CMU SCS 15649

A x = λ x

Page 23: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 23

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs

– Patterns: Degree; Triangles – Anomaly/fraud detection – Graph understanding

•  Part#2: time-evolving graphs; tensors •  Part#3: Cascades and immunization •  Conclusions

CMU SCS 15649

Page 24: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 24

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

CMU SCS 15649

Page 25: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 25

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns? –  2x the friends, 2x the triangles ?

CMU SCS 15649

Page 26: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 26

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

[Plots for the SN, Reuters, and Epinions datasets; X-axis: degree, Y-axis: mean # triangles]

n friends -> ~n^1.6 triangles

CMU SCS 15649

Page 27: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 27

Triangle Law: Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute (3-way join; several approx. algos) – O(d_max^2)
Q: Can we do that quickly?
A:

details

CMU SCS 15649

Page 28: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 28

Triangle Law: Computations [Tsourakakis ICDM 2008]

But: triangles are expensive to compute (3-way join; several approx. algos) – O(d_max^2)
Q: Can we do that quickly?
A: Yes!

#triangles = 1/6 Σ_i λ_i^3

(and, because of the skewness (S.2), we only need the top few eigenvalues! - O(E))

CMU SCS 15649

A x = λ x

details
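A compact way to exploit the identity on this slide: #triangles = 1/6 Σ λ_i^3, and the skewed spectrum means a handful of top eigenvalues already give a good estimate. The sketch below is my illustration (not the paper's code), assuming the graph is an undirected scipy.sparse adjacency matrix.

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def estimate_triangles(A, k=10):
    # A: symmetric 0/1 sparse adjacency matrix, no self-loops (assumption)
    # top-k eigenvalues by magnitude; cubing makes the small ones negligible
    lam = eigsh(A.asfptype(), k=k, which='LM', return_eigenvectors=False)
    return float(np.sum(lam ** 3) / 6.0)

# toy check: a triangle (0-1-2) plus a pendant node 3 -> exactly 1 triangle
rows = [0, 1, 1, 2, 0, 2, 2, 3]
cols = [1, 0, 2, 1, 2, 0, 3, 2]
A = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(4, 4))
print(estimate_triangles(A, k=3))   # ~1.0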

Page 29: Mining Large Graphs - cs.cmu.edu

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

29 CMU SCS 15649 29 (c) 2014, C. Faloutsos

[Scatter plot: several anomalous nodes marked with ‘?’]

Page 30: Mining Large Graphs - cs.cmu.edu

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

30 CMU SCS 15649 30 (c) 2014, C. Faloutsos

Page 31: Mining Large Graphs - cs.cmu.edu

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

31 CMU SCS 15649 31 (c) 2014, C. Faloutsos

Page 32: Mining Large Graphs - cs.cmu.edu

CMU SCS

Triangle counting for large graphs?

Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]

32 CMU SCS 15649 32 (c) 2014, C. Faloutsos

Page 33: Mining Large Graphs - cs.cmu.edu

CMU SCS

MORE Graph Patterns

CMU SCS 15649 (c) 2014, C. Faloutsos 33

✔ ✔ ✔

RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD’09.

Page 34: Mining Large Graphs - cs.cmu.edu

CMU SCS

MORE Graph Patterns

CMU SCS 15649 (c) 2014, C. Faloutsos 34

•  Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal)

•  Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool.

Page 35: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 35

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs

– Patterns – Anomaly / fraud detection – Graph understanding

•  Part#2: time-evolving graphs; tensors •  Part#3: Cascades and immunization •  Conclusions

CMU SCS 15649

Page 36: Mining Large Graphs - cs.cmu.edu

CMU SCS

Fraud
•  Given
   – Who ‘likes’ what page, and when
•  Find
   – Suspicious users and suspicious products

CMU SCS 15649 (c) 2014, C. Faloutsos 36

CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos WWW, 2013.

Page 37: Mining Large Graphs - cs.cmu.edu

CMU SCS

Fraud
•  Given
   – Who ‘likes’ what page, and when
•  Find
   – Suspicious users and suspicious products

CMU SCS 15649 (c) 2014, C. Faloutsos 37

CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos WWW, 2013.

[Figure: bipartite ‘Likes’ graph between users A, B, C, D, E, ..., Z and pages 1, 2, 3, 4, ..., 40]

Page 38: Mining Large Graphs - cs.cmu.edu

CMU SCS

Our intuition ▪  Lockstep behavior: Same Likes, same time

Graph Patterns and Lockstep Behavior

[Figure: users x pages ‘Likes’ matrix, with Like times (0-90)]

CMU SCS 15649 38 (c) 2014, C. Faloutsos

Page 39: Mining Large Graphs - cs.cmu.edu

CMU SCS

Our intuition ▪  Lockstep behavior: Same Likes, same time

Graph Patterns and Lockstep Behavior

[Figure: users x pages ‘Likes’ matrix and Like times, before and after reordering users and pages; reordering reveals a dense block of near-simultaneous Likes]

CMU SCS 15649 39 (c) 2014, C. Faloutsos

Page 40: Mining Large Graphs - cs.cmu.edu

CMU SCS

Our intuition ▪  Lockstep behavior: Same Likes, same time

Graph Patterns and Lockstep Behavior

[Figure: same reordered ‘Likes’ matrices; the dense block of same-time Likes is labeled ‘Suspicious Lockstep Behavior’]

CMU SCS 15649 40 (c) 2014, C. Faloutsos

Page 41: Mining Large Graphs - cs.cmu.edu

CMU SCS

MapReduce Overview
▪  Use Hadoop to search for many clusters in parallel:

1.  Start with random seeds
2.  Update the set of Pages and the center Like times for each cluster
3.  Repeat until convergence

(a rough single-machine sketch of this loop follows below)

[Figure: bipartite ‘Likes’ graph between users and pages]

CMU SCS 15649 41 (c) 2014, C. Faloutsos
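A very rough, single-machine sketch of the loop above (seed, update Pages and center Like times, repeat). This is my illustration, not the CopyCatch/Hadoop implementation; the data layout (likes as a dict mapping (user, page) to a Unix like-time) and the dt / min_pages thresholds are hypothetical.

from collections import defaultdict

def lockstep_cluster(likes, seed_pages, seed_times, n_iter=10, dt=3600, min_pages=3):
    # current cluster: page -> center like-time
    pages = dict(zip(seed_pages, seed_times))
    all_users = {u for (u, _p) in likes}
    users = []
    for _ in range(n_iter):
        # users who liked at least min_pages of the cluster's pages
        # within dt seconds of each page's center time
        users = [u for u in all_users
                 if sum((u, p) in likes and abs(likes[(u, p)] - t0) <= dt
                        for p, t0 in pages.items()) >= min_pages]
        # re-center every page's like-time on the users just caught
        hits = defaultdict(list)
        for u in users:
            for p, t0 in pages.items():
                t = likes.get((u, p))
                if t is not None and abs(t - t0) <= dt:
                    hits[p].append(t)
        pages = {p: sum(ts) / len(ts) for p, ts in hits.items()}
    return users, pages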

Page 42: Mining Large Graphs - cs.cmu.edu

CMU SCS

Deployment at Facebook
▪  CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team)

[Plot: 3 months of CopyCatch @ Facebook; X-axis: date of CopyCatch run (08/25 - 12/01), Y-axis: number of users caught]

CMU SCS 15649 42 (c) 2014, C. Faloutsos

Page 43: Mining Large Graphs - cs.cmu.edu

CMU SCS

Deployment at Facebook

[Pie chart over 5 categories: Fake Accounts (23%), Malicious Browser Extensions, OS Malware, Credential Stealing, Social Engineering (remaining shares 58%, 9%, 5%, 5%)]

Manually labeled 22 randomly selected clusters from February 2013

Most clusters (77%) come from real but compromised users

CMU SCS 15649 43 (c) 2014, C. Faloutsos

Page 44: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 44

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs

– Patterns – Anomaly / fraud detection – Graph understanding

•  Part#2: time-evolving graphs; •  Conclusions

CMU SCS 15649

Page 45: Mining Large Graphs - cs.cmu.edu

CMU SCS

Wikipedia - editors

CMU SCS 15649 (c) 2014, C. Faloutsos 45

•  Nodes: editors
•  Edge A->B: ‘A’ changed ‘B’

Any pattern?

Page 46: Mining Large Graphs - cs.cmu.edu

CMU SCS

VoG: Summarizing and Understanding Large Graphs

Danai Koutra, U Kang, Jilles Vreeken, Christos Faloutsos.

SDM 2014, Philadelphia, PA, April 2014.

CMU SCS 15649 (c) 2014, C. Faloutsos 46

Code: www.cs.cmu.edu/~dkoutra/CODE/vog.tar

Page 47: Mining Large Graphs - cs.cmu.edu

CMU SCS Wiki Controversial Article

(c) 2014, C. Faloutsos 47

Stars: admins, bots, heavy users

Bipartite cores: edit wars (Kiev vs. Kyiv; vandals vs. admins)

Danai Koutra (CMU) 47 CMU SCS 15649

Page 48: Mining Large Graphs - cs.cmu.edu

CMU SCS VoG: Summarizing Graphs using Rich Vocabularies

Main Ideas:
(1) Use a `vocabulary' of subgraph types
(2) Use Minimum Description Length (MDL) with the above vocabulary to summarize the graph

CMU SCS 15649 48 (c) 2014, C. Faloutsos

Page 49: Mining Large Graphs - cs.cmu.edu

CMU SCS

Summary of Part#1 •  *many* patterns in real graphs

– Power-laws everywhere – Gaussian trap

•  Avg << Max

CMU SCS 15649 (c) 2014, C. Faloutsos 49

Page 50: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 50

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors

– P2.1: time-evolving graphs – P2.2: with side information (‘coupled’ M.T.F.) – Speed

•  Part#3: Cascades and immunization •  Conclusions

CMU SCS 15649

Page 51: Mining Large Graphs - cs.cmu.edu

CMU SCS

CMU SCS 15649 (c) 2014, C. Faloutsos 51

Part 2: Time evolving

graphs; tensors

Page 52: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs over time -> tensors! •  Problem #2.1:

– Given who calls whom, and when – Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 52

[Figure: who-calls-whom graph (e.g., node ‘smith’)]

Page 53: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs over time -> tensors! •  Problem #2.1:

– Given who calls whom, and when – Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 53

Page 54: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs over time -> tensors! •  Problem #2.1:

– Given who calls whom, and when – Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 54

[Figure: daily graph snapshots (Mon, Tue, ...)]

Page 55: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs over time -> tensors! •  Problem #2.1:

– Given who calls whom, and when – Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 55

[Figure: caller x callee x time tensor]

Page 56: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs over time -> tensors! •  Problem #2.1’:

– Given author-keyword-date – Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 56

[Figure: author x keyword x date tensor]

MANY more settings, with >2 ‘modes’

Page 57: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs over time -> tensors! •  Problem #2.1’’:

– Given subject – verb – object facts – Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 57

[Figure: subject x object x verb tensor]

MANY more settings, with >2 ‘modes’

Page 58: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs over time -> tensors! •  Problem #2.1’’’:

– Given <triplets> – Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 58

[Figure: generic 3-mode tensor (mode1 x mode2 x mode3)]

MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes)

Page 59: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs & side info •  Problem #2.2: coupled (eg., side info)

– Given subject – verb – object facts •  And voxel-activity for each subject-word

– Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 59

[Figure: subject x object x verb tensor coupled with fMRI voxel activity, e.g., `apple tastes sweet’]

Page 60: Mining Large Graphs - cs.cmu.edu

CMU SCS

Graphs & side info •  Problem #2.2: coupled (eg., side info)

– Given subject – verb – object facts •  And voxel-activity for each subject-word

– Find patterns / anomalies

CMU SCS 15649 (c) 2014, C. Faloutsos 60

[Figure: the (‘apple’, ‘sweet’) tensor entry coupled with the fMRI voxel activity for ‘apple’ (`apple tastes sweet’)]

Page 61: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 61

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors

– P2.1: time-evolving graphs – P2.2: with side information (‘coupled’ M.T.F.) – Speed

•  Part#3: Cascades and immunization •  Conclusions

CMU SCS 15649

Page 62: Mining Large Graphs - cs.cmu.edu

CMU SCS Answer to both: tensor factorization
•  Recall: (SVD) matrix factorization: finds blocks

CMU SCS 15649 (c) 2014, C. Faloutsos 62

[Figure: N users x M products matrix ~ sum of rank-1 blocks: ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’]
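To make the 'SVD finds blocks' point concrete, here is a toy sketch (mine, not from the slides): a small users x products matrix with three planted groups is recovered almost exactly by three rank-1 terms.

import numpy as np

rng = np.random.default_rng(0)
X = np.zeros((9, 6))
X[0:3, 0:2] = 1      # 'meat-eaters' x 'steaks'
X[3:6, 2:4] = 1      # 'vegetarians' x 'plants'
X[6:9, 4:6] = 1      # 'kids' x 'cookies'
X += 0.05 * rng.standard_normal(X.shape)   # a little noise

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X3 = (U[:, :3] * s[:3]) @ Vt[:3, :]        # sum of the top-3 rank-1 blocks
print(np.round(s, 2))                      # 3 large singular values, rest ~noise
print(np.abs(X - X3).max())                # small reconstruction error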

Page 63: Mining Large Graphs - cs.cmu.edu

CMU SCS Answer to both: tensor factorization
•  PARAFAC decomposition

CMU SCS 15649 (c) 2014, C. Faloutsos 63

[Figure: subject x object x verb tensor ~ sum of rank-1 components: ‘politicians’ + ‘artists’ + ‘athletes’]

Page 64: Mining Large Graphs - cs.cmu.edu

CMU SCS

Answer: tensor factorization
•  PARAFAC decomposition
•  Results for who-calls-whom-when
   – 4M x 15 days

CMU SCS 15649 (c) 2014, C. Faloutsos 64

[Figure: caller x callee x day tensor ~ sum of rank-1 components (??, ??, ??)]
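PARAFAC generalizes the rank-1 sum to three modes. The sketch below is a plain-numpy alternating-least-squares CP decomposition on a toy caller x callee x day tensor; it is my illustration of the technique (libraries such as tensorly offer tested implementations) and is not the code used on the 4M-subscriber dataset.

import numpy as np

def khatri_rao(A, B):
    # column-wise Kronecker product of A (I x R) and B (J x R): (I*J) x R
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(X, rank, n_iter=50, seed=0):
    # X: dense 3-way numpy array; returns factor matrices A, B, C with
    # X[i, j, k] ~ sum_r A[i, r] * B[j, r] * C[k, r]
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                      # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-3 unfolding
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C

# toy caller x callee x day tensor with two planted calling communities
X = np.zeros((20, 20, 7))
X[:5, :5, :3] = 1            # community 1, active on days 0-2
X[10:15, 10:15, 4:] = 1      # community 2, active on days 4-6
A, B, C = cp_als(X, rank=2)  # each column of (A, B, C) picks out one community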

Page 65: Mining Large Graphs - cs.cmu.edu

CMU SCS Anomaly detection in time-evolving graphs

•  Anomalous communities in phone call data:
   – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!

1 caller, 5 receivers, 4 days of activity

CMU SCS 15649 65 (c) 2014, C. Faloutsos

Page 66: Mining Large Graphs - cs.cmu.edu

CMU SCS Anomaly detection in time-evolving graphs

•  Anomalous communities in phone call data:
   – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!

1 caller, 5 receivers, 4 days of activity

CMU SCS 15649 66 (c) 2014, C. Faloutsos

Page 67: Mining Large Graphs - cs.cmu.edu

CMU SCS Anomaly detection in time-evolving graphs

•  Anomalous communities in phone call data:
   – European country, 4M clients, data over 2 weeks

~200 calls to EACH receiver on EACH day!

CMU SCS 15649 67 (c) 2014, C. Faloutsos

Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.

Page 68: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 68

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors

– P2.1: Discoveries @ phonecall network – P2.2: Discoveries in neuro-semantics – Speed

•  Part#3: Cascades and immunization •  Conclusions

CMU SCS 15649

Page 69: Mining Large Graphs - cs.cmu.edu

CMU SCS Coupled Matrix-Tensor Factorization (CMTF)

[Figure: 3-mode tensor X with factors (a1..aF, b1..bF, c1..cF), coupled with matrix Y that shares the a-factors and has its own factors d1..dF]

CMU SCS 15649 69 (c) 2014, C. Faloutsos
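One standard way to write the coupled objective that this figure depicts, with the tensor X and the matrix Y sharing the a-factors (this is the usual CMTF formulation, not text copied from the slide):

\min_{A,B,C,D}\;
\Big\| \mathcal{X} - \sum_{f=1}^{F} a_f \circ b_f \circ c_f \Big\|_F^2
\;+\;
\big\| Y - A\,D^{\top} \big\|_F^2,
\qquad
A = [a_1 \dots a_F],\; B = [b_1 \dots b_F],\; C = [c_1 \dots c_F],\; D = [d_1 \dots d_F].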

Page 70: Mining Large Graphs - cs.cmu.edu

CMU SCS

Neuro-semantics

!"

!" #$$%&%'()* +%,-./0 )($ */,/($0

!"#$%& '() *%&+&,-.-"/, .,0 +&- /1 &2&345.%+ $+&0 ", -6& &24&%"3&,-)!"#$%&'&(#)%*!+,$,!($,*,)%,-!./!-&*%&)'%!+0$-1(&'%2$,!(#&$*!-,*'$&3&)4!'0550)!'0)'$,%,!)02)*6!!78,*,!'0)*&*%,-!09!:!,;,5(<#$*!9$05!,#'8!09!=>!'#%,40$&,*?!#*!*80+)!#30@,6!!A!*<0+!,@,)%1$,<#%,-!(#$#-&45!+#*!,5(<0B,-?!&)!+8&'8!%8,!*%&52<2*!+#*!($,*,)%,-!90$!C*?!90<<0+,-!3B!#!D*!9&;#%&0)!(,$&0-!-2$&)4!+8&'8!#)!E!+#*!($,*,)%,-!&)!%8,!',)%,$!09!%8,!*'$,,)6!!F5#4,*!+,$,!($,*,)%,-!#*!+8&%,!<&),*!#)-!'8#$#'%,$*!0)!#!-#$G!3#'G4$02)-?!32%!#$,!&)@,$%,-!8,$,!%0!&5($0@,!$,#-#3&<&%B6!!78,!,)%&$,!*,%!09!./!,;,5(<#$*!+#*!($,*,)%,-!*&;!%&5,*?!$#)-05<B!(,$52%&)4!%8,!*,H2,)',!0)!,#'8!($,*,)%#%&0)6!

!

To fully specify a model within this com-putational modeling framework, one must firstdefine a set of intermediate semantic featuresf1(w) f2(w)…fn(w) to be extracted from the textcorpus. In this paper, each intermediate semanticfeature is defined in terms of the co-occurrencestatistics of the input stimulus word w with aparticular other word (e.g., “taste”) or set of words(e.g., “taste,” “tastes,” or “tasted”) within the textcorpus. The model is trained by the application ofmultiple regression to these features fi(w) and theobserved fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi(26). Once trained, the computational model can beevaluated by giving it words outside the trainingset and comparing its predicted fMRI images forthese words with observed fMRI data.

This computational modeling framework isbased on two key theoretical assumptions. First, itassumes the semantic features that distinguish themeanings of arbitrary concrete nouns are reflected

in the statistics of their use within a very large textcorpus. This assumption is drawn from the field ofcomputational linguistics, where statistical worddistributions are frequently used to approximatethe meaning of documents and words (14–17).Second, it assumes that the brain activity observedwhen thinking about any concrete noun can bederived as a weighted linear sum of contributionsfrom each of its semantic features. Although thecorrectness of this linearity assumption is debat-able, it is consistent with the widespread use oflinear models in fMRI analysis (27) and with theassumption that fMRI activation often reflects alinear superposition of contributions from differentsources. Our theoretical framework does not take aposition on whether the neural activation encodingmeaning is localized in particular cortical re-gions. Instead, it considers all cortical voxels andallows the training data to determine which loca-tions are systematically modulated by which as-pects of word meanings.

Results. We evaluated this computational mod-el using fMRI data from nine healthy, college-ageparticipants who viewed 60 different word-picturepairs presented six times each. Anatomically de-fined regions of interest were automatically labeledaccording to the methodology in (28). The 60 ran-domly ordered stimuli included five items fromeach of 12 semantic categories (animals, body parts,buildings, building parts, clothing, furniture, insects,kitchen items, tools, vegetables, vehicles, and otherman-made items). A representative fMRI image foreach stimulus was created by computing the meanfMRI response over its six presentations, and themean of all 60 of these representative images wasthen subtracted from each [for details, see (26)].

To instantiate our modeling framework, we firstchose a set of intermediate semantic features. To beeffective, the intermediate semantic features mustsimultaneously encode thewide variety of semanticcontent of the input stimulus words and factor theobserved fMRI activation intomore primitive com-

Predicted“celery” = 0.84

“celery” “airplane”

Predicted:

Observed:

A B

+.. .

high

average

belowaverage

Predicted “celery”:

+ 0.35 + 0.32

“eat” “taste” “fill”

Fig. 2. Predicting fMRI imagesfor given stimulus words. (A)Forming a prediction for par-ticipant P1 for the stimulusword “celery” after training on58 other words. Learned cvi co-efficients for 3 of the 25 se-mantic features (“eat,” “taste,”and “fill”) are depicted by thevoxel colors in the three imagesat the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” isshown to the left of their respective images [e.g., the value for “eat (celery)” is0.84]. The predicted activation for the stimulus word [shown at the bottom of(A)] is a linear combination of the 25 semantic fMRI signatures, weighted bytheir co-occurrence values. This figure shows just one horizontal slice [z =

–12 mm in Montreal Neurological Institute (MNI) space] of the predictedthree-dimensional image. (B) Predicted and observed fMRI images for“celery” and “airplane” after training that uses 58 other words. The two longred and blue vertical streaks near the top (posterior region) of the predictedand observed images are the left and right fusiform gyri.

A

B

C

Mean over

participants

Participant P

5

Fig. 3. Locations ofmost accurately pre-dicted voxels. Surface(A) and glass brain (B)rendering of the correla-tion between predictedand actual voxel activa-tions for words outsidethe training set for par-

ticipant P5. These panels show clusters containing at least 10 contiguous voxels, each of whosepredicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout thecortex and located in the left and right occipital and parietal lobes; left and right fusiform,postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anteriorcingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nineparticipants. This panel represents clusters containing at least 10 contiguous voxels, each withaverage correlation of at least 0.14.

30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org1192

RESEARCH ARTICLES

on

May

30,

200

8 w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

To fully specify a model within this com-putational modeling framework, one must firstdefine a set of intermediate semantic featuresf1(w) f2(w)…fn(w) to be extracted from the textcorpus. In this paper, each intermediate semanticfeature is defined in terms of the co-occurrencestatistics of the input stimulus word w with aparticular other word (e.g., “taste”) or set of words(e.g., “taste,” “tastes,” or “tasted”) within the textcorpus. The model is trained by the application ofmultiple regression to these features fi(w) and theobserved fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi(26). Once trained, the computational model can beevaluated by giving it words outside the trainingset and comparing its predicted fMRI images forthese words with observed fMRI data.

This computational modeling framework isbased on two key theoretical assumptions. First, itassumes the semantic features that distinguish themeanings of arbitrary concrete nouns are reflected

in the statistics of their use within a very large textcorpus. This assumption is drawn from the field ofcomputational linguistics, where statistical worddistributions are frequently used to approximatethe meaning of documents and words (14–17).Second, it assumes that the brain activity observedwhen thinking about any concrete noun can bederived as a weighted linear sum of contributionsfrom each of its semantic features. Although thecorrectness of this linearity assumption is debat-able, it is consistent with the widespread use oflinear models in fMRI analysis (27) and with theassumption that fMRI activation often reflects alinear superposition of contributions from differentsources. Our theoretical framework does not take aposition on whether the neural activation encodingmeaning is localized in particular cortical re-gions. Instead, it considers all cortical voxels andallows the training data to determine which loca-tions are systematically modulated by which as-pects of word meanings.

Results. We evaluated this computational mod-el using fMRI data from nine healthy, college-ageparticipants who viewed 60 different word-picturepairs presented six times each. Anatomically de-fined regions of interest were automatically labeledaccording to the methodology in (28). The 60 ran-domly ordered stimuli included five items fromeach of 12 semantic categories (animals, body parts,buildings, building parts, clothing, furniture, insects,kitchen items, tools, vegetables, vehicles, and otherman-made items). A representative fMRI image foreach stimulus was created by computing the meanfMRI response over its six presentations, and themean of all 60 of these representative images wasthen subtracted from each [for details, see (26)].

To instantiate our modeling framework, we firstchose a set of intermediate semantic features. To beeffective, the intermediate semantic features mustsimultaneously encode thewide variety of semanticcontent of the input stimulus words and factor theobserved fMRI activation intomore primitive com-

Predicted“celery” = 0.84

“celery” “airplane”

Predicted:

Observed:

A B

+.. .

high

average

belowaverage

Predicted “celery”:

+ 0.35 + 0.32

“eat” “taste” “fill”

Fig. 2. Predicting fMRI imagesfor given stimulus words. (A)Forming a prediction for par-ticipant P1 for the stimulusword “celery” after training on58 other words. Learned cvi co-efficients for 3 of the 25 se-mantic features (“eat,” “taste,”and “fill”) are depicted by thevoxel colors in the three imagesat the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” isshown to the left of their respective images [e.g., the value for “eat (celery)” is0.84]. The predicted activation for the stimulus word [shown at the bottom of(A)] is a linear combination of the 25 semantic fMRI signatures, weighted bytheir co-occurrence values. This figure shows just one horizontal slice [z =

–12 mm in Montreal Neurological Institute (MNI) space] of the predictedthree-dimensional image. (B) Predicted and observed fMRI images for“celery” and “airplane” after training that uses 58 other words. The two longred and blue vertical streaks near the top (posterior region) of the predictedand observed images are the left and right fusiform gyri.

A

B

C

Mean over

participants

Participant P

5

Fig. 3. Locations ofmost accurately pre-dicted voxels. Surface(A) and glass brain (B)rendering of the correla-tion between predictedand actual voxel activa-tions for words outsidethe training set for par-

ticipant P5. These panels show clusters containing at least 10 contiguous voxels, each of whosepredicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout thecortex and located in the left and right occipital and parietal lobes; left and right fusiform,postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anteriorcingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nineparticipants. This panel represents clusters containing at least 10 contiguous voxels, each withaverage correlation of at least 0.14.

30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org1192

RESEARCH ARTICLES

on

May

30,

200

8 w

ww

.sci

ence

mag

.org

Dow

nloa

ded

from

•  Brain Scan Data*
   – 9 persons
   – 60 nouns
•  Questions
   – 218 questions
   – ‘is it alive?’, ‘can you eat it?’


CMU SCS 15649 70 (c) 2014, C. Faloutsos

*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science,2008. Data@ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html

Page 71: Mining Large Graphs - cs.cmu.edu

CMU SCS

Neuro-semantics

!"

!" #$$%&%'()* +%,-./0 )($ */,/($0

!"#$%& '() *%&+&,-.-"/, .,0 +&- /1 &2&345.%+ $+&0 ", -6& &24&%"3&,-)!"#$%&'&(#)%*!+,$,!($,*,)%,-!./!-&*%&)'%!+0$-1(&'%2$,!(#&$*!-,*'$&3&)4!'0550)!'0)'$,%,!)02)*6!!78,*,!'0)*&*%,-!09!:!,;,5(<#$*!9$05!,#'8!09!=>!'#%,40$&,*?!#*!*80+)!#30@,6!!A!*<0+!,@,)%1$,<#%,-!(#$#-&45!+#*!,5(<0B,-?!&)!+8&'8!%8,!*%&52<2*!+#*!($,*,)%,-!90$!C*?!90<<0+,-!3B!#!D*!9&;#%&0)!(,$&0-!-2$&)4!+8&'8!#)!E!+#*!($,*,)%,-!&)!%8,!',)%,$!09!%8,!*'$,,)6!!F5#4,*!+,$,!($,*,)%,-!#*!+8&%,!<&),*!#)-!'8#$#'%,$*!0)!#!-#$G!3#'G4$02)-?!32%!#$,!&)@,$%,-!8,$,!%0!&5($0@,!$,#-#3&<&%B6!!78,!,)%&$,!*,%!09!./!,;,5(<#$*!+#*!($,*,)%,-!*&;!%&5,*?!$#)-05<B!(,$52%&)4!%8,!*,H2,)',!0)!,#'8!($,*,)%#%&0)6!

!

•  Brain Scan Data*
   – 9 persons
   – 60 nouns
•  Questions
   – 218 questions
   – ‘is it alive?’, ‘can you eat it?’

CMU SCS 15649 71 (c) 2014, C. Faloutsos

Patterns?


Page 72: Mining Large Graphs - cs.cmu.edu

CMU SCS

Neuro-semantics

!"

!" #$$%&%'()* +%,-./0 )($ */,/($0

!"#$%& '() *%&+&,-.-"/, .,0 +&- /1 &2&345.%+ $+&0 ", -6& &24&%"3&,-)!"#$%&'&(#)%*!+,$,!($,*,)%,-!./!-&*%&)'%!+0$-1(&'%2$,!(#&$*!-,*'$&3&)4!'0550)!'0)'$,%,!)02)*6!!78,*,!'0)*&*%,-!09!:!,;,5(<#$*!9$05!,#'8!09!=>!'#%,40$&,*?!#*!*80+)!#30@,6!!A!*<0+!,@,)%1$,<#%,-!(#$#-&45!+#*!,5(<0B,-?!&)!+8&'8!%8,!*%&52<2*!+#*!($,*,)%,-!90$!C*?!90<<0+,-!3B!#!D*!9&;#%&0)!(,$&0-!-2$&)4!+8&'8!#)!E!+#*!($,*,)%,-!&)!%8,!',)%,$!09!%8,!*'$,,)6!!F5#4,*!+,$,!($,*,)%,-!#*!+8&%,!<&),*!#)-!'8#$#'%,$*!0)!#!-#$G!3#'G4$02)-?!32%!#$,!&)@,$%,-!8,$,!%0!&5($0@,!$,#-#3&<&%B6!!78,!,)%&$,!*,%!09!./!,;,5(<#$*!+#*!($,*,)%,-!*&;!%&5,*?!$#)-05<B!(,$52%&)4!%8,!*,H2,)',!0)!,#'8!($,*,)%#%&0)6!

!

•  Brain Scan Data*
   – 9 persons
   – 60 nouns
•  Questions
   – 218 questions
   – ‘is it alive?’, ‘can you eat it?’


nouns (airplane, dog, …) – questions – voxels
CMU SCS 15649 72 (c) 2014, C. Faloutsos

Patterns?

[Embedded article page (Science, vol. 320, 30 May 2008, p. 1192, Research Articles), shown as a thumbnail on this slide: the computational-model description and Results, with Fig. 2 (“Predicting fMRI images for given stimulus words”) and Fig. 3 (“Locations of most accurately predicted voxels”).]

Page 73: Mining Large Graphs - cs.cmu.edu

CMU SCS Neuro-semantics

[Figure 4 panels: brain-activation maps for Group 1 – Group 4; one panel is annotated “Premotor Cortex”.]

Nouns – Questions (three pairs per group):
•  beetle – can it cause you pain?;  pants – do you see it daily?;  bee – is it conscious?
•  bear – does it grow?;  cow – is it alive?;  coat – was it ever alive?
•  glass – can you pick it up?;  tomato – can you hold it in one hand?;  bell – is it smaller than a golfball?
•  bed – does it use electricity?;  house – can you sit on it?;  car – does it cast a shadow?

Figure 4: Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated, as obtained using Turbo-SMT. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and the activation pattern is very different from the one of Group 1, which mostly refers to insects, such as bee or beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.

v1 and v2 which were withheld from the training data, the leave-two-out scheme measures prediction accuracy by the ability to choose which of the observed brain images corresponds to which of the two words. After mean-centering the vectors, this classification decision is made according to the following rule:

$\|v_1 - \hat{v}_1\|^2 + \|v_2 - \hat{v}_2\|^2 < \|v_1 - \hat{v}_2\|^2 + \|v_2 - \hat{v}_1\|^2$

Although our approach is not designed to make predictions, preliminary results are very encouraging: Using only F=2 components, for the noun pair closet/watch we obtained mean accuracy of about 0.82 for 5 out of the 9 human subjects. Similarly, for the pair knife/beetle, we achieved accuracy of about 0.8 for a somewhat different group of 5 subjects. For the rest of the human subjects, the accuracy is considerably lower; however, it may be the case that brain activity predictability varies between subjects, a fact that requires further investigation.
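A minimal sketch of that leave-two-out decision rule (Python/NumPy; the vectors below are hypothetical stand-ins for the observed images v1, v2 and the predicted images v̂1, v̂2, not the paper's actual data or code):

```python
import numpy as np

def leave_two_out_correct(v1, v2, v1_hat, v2_hat):
    """True if the correct pairing (v1<->v1_hat, v2<->v2_hat) beats the
    swapped pairing, using squared Euclidean distance after mean-centering."""
    center = lambda v: v - v.mean()
    v1, v2, v1_hat, v2_hat = map(center, (v1, v2, v1_hat, v2_hat))
    correct = np.sum((v1 - v1_hat) ** 2) + np.sum((v2 - v2_hat) ** 2)
    swapped = np.sum((v1 - v2_hat) ** 2) + np.sum((v2 - v1_hat) ** 2)
    return correct < swapped

# Toy example with hypothetical 5-voxel "images": predictions close to the right images.
rng = np.random.default_rng(1)
v1, v2 = rng.standard_normal(5), rng.standard_normal(5)
print(leave_two_out_correct(v1, v2, v1 + 0.1, v2 - 0.1))  # True
```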

5 Experiments

We implemented Turbo-SMT in Matlab. Our implementation of the code is publicly available.5 For the parallelization of the algorithm, we used Matlab's Parallel Computing Toolbox. For tensor manipulation, we used the Tensor Toolbox for Matlab [7], which is optimized especially for sparse tensors (but works very well for dense ones too). We use the ALS and the CMTF-OPT [5] algorithms as baselines, i.e. we compare Turbo-SMT, when using one of those algorithms as its core CMTF implementation, against the plain execution of those algorithms. We implemented our version of the ALS algorithm, and we used the CMTF Toolbox6 implementation of CMTF-OPT. We use CMTF-OPT for higher ranks, since that particular algorithm is more accurate than ALS, and is the state of the art. All experiments were carried out on a machine with 4 Intel Xeon E74850 2.00GHz, and 512Gb of RAM. Whenever we conducted multiple iterations of an experiment (due to the randomized nature of Turbo-SMT), we report error-bars along the plots. For all the following experiments we used either portions of the BrainQ dataset, or the whole dataset.

5. http://www.cs.cmu.edu/~epapalex/src/turbo_smt.zip
6. http://www.models.life.ku.dk/joda/CMTF_Toolbox

5.1 Speedup. As we have already discussed in the Introduction and shown in Fig. 1, Turbo-SMT achieves a speedup of 50-200x on the BrainQ dataset; for all cases, the approximation cost is either the same as the baselines, or is larger by a small factor, indicating that Turbo-SMT is both fast and accurate. Key facts that

73 CMU SCS 15649 (c) 2014, C. Faloutsos


Page 74: Mining Large Graphs - cs.cmu.edu

CMU SCS Neuro-semantics

74 CMU SCS 15649 (c) 2014, C. Faloutsos


Small items -> Premotor cortex


Page 75: Mining Large Graphs - cs.cmu.edu

CMU SCS Neuro-semantics

75 CMU SCS 15649 (c) 2014, C. Faloutsos


Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014

Small items -> Premotor cortex

Page 76: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 76

Roadmap

•  Introduction – Motivation •  Part#1: Patterns in graphs •  Part#2: time-evolving graphs; tensors

– P2.1: Discoveries @ phonecall network – P2.2: Discoveries in neuro-semantics – Speed

•  Part#3: Cascades and immunization •  Conclusions

CMU SCS 15649

Page 77: Mining Large Graphs - cs.cmu.edu

CMU SCS

Speed of tensor/CMTF analysis •  Q1: Can we make it fast? •  Q2: Does it work for large, disk-based data?

CMU SCS 15649 (c) 2014, C. Faloutsos 77

Page 78: Mining Large Graphs - cs.cmu.edu

CMU SCS

A1: Turbo-SMT
…a time-evolving social network with side information, discovering anomalies.

[Figure 1 plot: relative cost vs. relative run time, with parameter s taking values in [2 5 10 20]; curves: CMTF−OPT (F=10, 8 cores), CMTF−OPT (F=10, 4 cores), ALS (F=2, 8 cores), ALS (F=10, 1 core); annotations: ~200x speedup, Ideal, Baseline.]

Figure 1: Turbo-SMT is up to 200x faster, for comparable accuracy. Relative execution time vs relative cost (lower is better), for various settings (see Sec. 3). Turbo-SMT boosting either ALS or CMTF-OPT [5] (i.e. the baselines) vs. plain execution of the baselines (on a single core), on the BrainQ dataset (see Section 4). For more details, see Section 5.

Disclaimer: An earlier version of this work was uploaded on Arxiv.org [16] for quick dissemination thereof and early feedback. However, the Arxiv.org version has not been officially published in any conference proceedings or journal, and has evolved into this present work.

2 Preliminaries

Symbol – Description
CMTF – Coupled Matrix-Tensor Factorization
ALS – Alternating Least Squares
CMTF-OPT – Algorithm introduced in [5]
x, x, X, X – scalar, vector, matrix, tensor (respectively)
‖A‖_F – Frobenius norm of A
(i) as superscript – indicates the i-th iteration
Ai1, ai1 – series of matrices or vectors, indexed by i
I – set of indices
x(I) – spanning indices I of x
nucleus – a biased random sample of the data tensor
Table 1: Table of symbols

2.1 Introduction to Tensors. Matrices record dyadic properties, like “people recommending products”. Tensors are the n-mode generalizations, capturing 3- and higher-way relationships. For example, “subject-verb-object” relationships, such as the ones recorded by the Read the Web - NELL project [1] (and have been recently used in this context [9] [15]), naturally lead to a 3-mode tensor. In this work, our working example of a tensor has three modes. The first mode contains a number of nouns; the second mode corresponds to the brain activity, as recorded by an fMRI machine; and the third mode identifies the human subject corresponding to a particular brain activity measurement.

[Figure 2 diagram: brain-activity tensor X (words × voxels × persons) expressed as a sum of rank-one components a1∘b1∘c1 (Concept 1) + a2∘b2∘c2 (Concept 2) + … + aR∘bR∘cR (Concept R).]

Figure 2: PARAFAC decomposition of a three-way brain activity tensor as a sum of F outer products (rank-one tensors), reminiscent of the rank-F singular value decomposition of a matrix. Each component corresponds to a latent concept of, e.g., “insects”, “tools” and so on: a set of brain regions that are most active for that particular set of words, as well as groups of persons.
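As a sketch of what Figure 2 depicts (an illustration, not the authors' code), a rank-F PARAFAC model rebuilds the three-way tensor as a sum of F outer products of the factor-matrix columns; in NumPy this is a single einsum:

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """Rebuild a 3-way tensor from factor matrices A (I x F), B (J x F), C (K x F):
    X[i, j, k] = sum_f A[i, f] * B[j, f] * C[k, f]."""
    return np.einsum('if,jf,kf->ijk', A, B, C)

# Toy sizes standing in for words x voxels x persons, with F = 2 latent concepts.
I, J, K, F = 4, 5, 3, 2
rng = np.random.default_rng(2)
A, B, C = rng.random((I, F)), rng.random((J, F)), rng.random((K, F))

X = parafac_reconstruct(A, B, C)
print(X.shape)  # (4, 5, 3)
```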

In [15], the authors introduced a scalable and parallelizable tensor decomposition which uses mode sampling. In this work, we focus on a more general and expressive framework, that of Coupled Matrix-Tensor Factorizations.

2.2 Coupled Matrix-Tensor Factorization. Oftentimes, two tensors, or a matrix and a tensor, may have one mode in common; consider the example that we mentioned earlier, where we have a word by brain activity by human subject tensor; we also have a semantic matrix that provides additional information for the same set of words. In this case, we say that the matrix and the tensor are coupled in the ’word’ mode.

[Figure 3 diagram: tensor X (words × voxels × persons) coupled with matrix Y1 (words × questions) in the shared ‘word’ mode.]

Figure 3: Coupled Matrix - Tensor example: Tensors often share one or more modes (with thick, wavy line): X is the brain activity tensor and Y is the semantic matrix. As the wavy line indicates, these two datasets are coupled in the ’word’ dimension.
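To make the coupling concrete, here is a hedged sketch (an illustration under assumed toy shapes, not the paper's implementation) of the squared-error objective that CMTF-style methods minimize: the tensor X is fit by PARAFAC factors (A, B, C), and the side matrix Y reuses the shared word-mode factor A together with an extra factor D.

```python
import numpy as np

def cmtf_loss(X, Y, A, B, C, D):
    """Squared-error objective of a coupled matrix-tensor factorization:
    ||X - [[A, B, C]]||_F^2 + ||Y - A @ D.T||_F^2,
    where X (I x J x K) and Y (I x M) share the first ("word") mode."""
    X_hat = np.einsum('if,jf,kf->ijk', A, B, C)  # PARAFAC reconstruction
    Y_hat = A @ D.T                              # coupled matrix reconstruction
    return np.sum((X - X_hat) ** 2) + np.sum((Y - Y_hat) ** 2)

# Toy example: 6 words x 5 voxels x 3 persons tensor, 6 words x 4 questions matrix, rank 2.
rng = np.random.default_rng(3)
I, J, K, M, F = 6, 5, 3, 4, 2
A, B, C, D = (rng.random((n, F)) for n in (I, J, K, M))
X = np.einsum('if,jf,kf->ijk', A, B, C)
Y = A @ D.T
print(cmtf_loss(X, Y, A, B, C, D))  # ~0 for exactly factorizable data
```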

In this work we focus on three mode tensors,

78 CMU SCS 15649 (c) 2014, C. Faloutsos

Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014

200x faster


Page 79: Mining Large Graphs - cs.cmu.edu

CMU SCS A1: Turbo-SMT


79 CMU SCS 15649 (c) 2014, C. Faloutsos

200x faster

Exploiting sparsity


Page 80: Mining Large Graphs - cs.cmu.edu

CMU SCS A1: Turbo-SMT

a time-evolving social network with side information,discovering anomalies.

0 0.2 0.4 0.6 0.8 101

5

10

15

20

25

30

40

50

60

70

Relative run time

Rela

tive c

ost

Parameter s taking values in [2 5 10 20]

CMTF−OPT, F=10, 8 coresCMTF−OPT, F=10, 4 coresALS, F=2, 8 coresALS, F=10, 1 core

~200x speedup

Ideal

Baseline

Figure 1: Turbo-SMT is up to 200x faster, for com-parable accuracy. Relative execution time vs relativecost (lower is better), for various settings (see Sec. 3.Turbo-SMT boosting either ALS or CMTF-OPT [5])(i.e. the baselines) vs. plain execution of the baselines(on a single core), on the BrainQ dataset (see Section4.). For more details, see Section 5.

Disclaimer: An earlier version of this work was up-loaded on Arxiv.org [16] for quick dissemination thereofand early feedback. However, the Arxiv.org version hasnot been officially published in any conference proceed-ings or journal, and has evolved into this present work.

2 Preliminaries

Symbol DescriptionCMTF Coupled Matrix-Tensor Factorization

ALS Alternating Least Squares

CMTF-OPT Algorithm introduced in [5]

x,x,X,X scalar, vector, matrix, tensor (respectively)

�A�F Frobenius norm of A.

(i) as superscript Indicates the i-th iteration

Ai1, a

i1 series of matrices or vectors, indexed by i.

I Set of indices.

x(I) Spanning indices I of x.nucleus a biased random sample of the data tensor.

Table 1: Table of symbols

2.1 Introduction to Tensors Matrices recorddyadic properties, like “people recommending prod-ucts”. Tensors are the n-mode generalizations, cap-turing 3- and higher-way relationships. For example“subject-verb-object” relationships, such as the onesrecorded by the Read the Web - NELL project [1] (and

[Figure 2 panels: the brain-activity tensor X (words x voxels x persons) shown as the sum of rank-one components a1 ∘ b1 ∘ c1 ("Concept 1"), a2 ∘ b2 ∘ c2 ("Concept 2"), ..., aR ∘ bR ∘ cR ("Concept R")]

Figure 2: PARAFAC decomposition of a three-way brain activity tensor as a sum of F outer products (rank-one tensors), reminiscent of the rank-F singular value decomposition of a matrix. Each component corresponds to a latent concept of, e.g., "insects", "tools" and so on: a set of brain regions that are most active for that particular set of words, as well as groups of persons.

have been recently used in this context [9], [15]) naturally lead to a 3-mode tensor. In this work, our working example of a tensor has three modes. The first mode contains a number of nouns; the second mode corresponds to the brain activity, as recorded by an fMRI machine; and the third mode identifies the human subject corresponding to a particular brain activity measurement.
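Written out in standard PARAFAC/CP notation (added here for concreteness, not quoted from the paper), the F-component model of Figure 2 approximates every entry of the nouns-by-voxels-by-persons tensor X by

\[
\boldsymbol{\mathcal{X}}(i,j,k) \;\approx\; \sum_{r=1}^{F} \mathbf{a}_r(i)\,\mathbf{b}_r(j)\,\mathbf{c}_r(k),
\qquad\text{i.e.}\qquad
\boldsymbol{\mathcal{X}} \;\approx\; \sum_{r=1}^{F} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r ,
\]

where the r-th columns a_r, b_r, c_r of the factor matrices give the word, voxel, and person memberships of the r-th latent concept.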

In [15], the authors introduced a scalable and parallelizable tensor decomposition which uses mode sampling. In this work, we focus on a more general and expressive framework, that of Coupled Matrix-Tensor Factorizations.

2.2 Coupled Matrix-Tensor Factorization  Oftentimes, two tensors, or a matrix and a tensor, may have one mode in common; consider the example that we mentioned earlier, where we have a word by brain activity by human subject tensor, and we also have a semantic matrix that provides additional information for the same set of words. In this case, we say that the matrix and the tensor are coupled in the 'word' mode.
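In symbols (the standard coupled objective, as in CMTF-OPT [5]; shown here for orientation rather than copied from the text), with the tensor X coupled to the matrix Y in the word mode one fits

\[
\min_{\mathbf{A},\,\mathbf{B},\,\mathbf{C},\,\mathbf{D}}\;
\Big\|\boldsymbol{\mathcal{X}} - \sum_{r=1}^{F}\mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r\Big\|_F^2
\;+\;
\big\|\mathbf{Y} - \mathbf{A}\mathbf{D}^{\top}\big\|_F^2 ,
\]

so the word-mode factor A is shared between the tensor and the matrix, while D is the factor of the semantic (question) matrix.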

[Figure 3 sketch: tensor X (words x voxels x persons) alongside matrix Y1 (words x questions), sharing the word mode]

Figure 3: Coupled Matrix - Tensor example: Tensors often share one or more modes (with thick, wavy line): X is the brain activity tensor and Y is the semantic matrix. As the wavy line indicates, these two datasets are coupled in the 'word' dimension.

In this work we focus on three mode tensors,

80 CMU SCS 15649 (c) 2014, C. Faloutsos

[Plot repeated: relative cost vs. relative run time; ~200x faster than the baseline; code: www.cs.cmu.edu/~epapalex/src/turbo_smt.zip]

Page 81: Mining Large Graphs - cs.cmu.edu

CMU SCS Q2: spilling to the disk?

Reminder: tensor (e.g., subject-verb-object), 144M non-zeros

NELL (Never Ending Language Learner)

@CMU

CMU SCS 15649 81 (c) 2014, C. Faloutsos

[Tensor dimensions: 26M x 48M x 26M]
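A back-of-the-envelope check (my numbers, not from the slides) of why the question matters: storing this tensor densely is hopeless, while its non-zeros fit in main memory as coordinate triples of four 8-byte numbers (i, j, k, value):

\[
26{\times}10^{6}\cdot 48{\times}10^{6}\cdot 26{\times}10^{6}\approx 3.2\times 10^{22}\ \text{dense cells},
\qquad
144{\times}10^{6}\ \text{non-zeros}\times 32\ \text{bytes}\approx 4.6\ \text{GB}.
\]

The usual danger is therefore not the raw data but the intermediate results of naive ALS (the materialized Khatri-Rao products), which is what the GigaTensor work on the next slides is designed to avoid.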

Page 82: Mining Large Graphs - cs.cmu.edu

CMU SCS A2: GigaTensor

Reminder: tensor (e.g., subject-verb-object), 26M x 48M x 26M, 144M non-zeros

NELL (Never Ending Language Learner)

@CMU

CMU SCS 15649 82 (c) 2014, C. Faloutsos

U Kang, Evangelos E. Papalexakis, Abhay Harpale, Christos Faloutsos, GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries, KDD’12

Page 83: Mining Large Graphs - cs.cmu.edu

CMU SCS

A2: GigaTensor

•  GigaTensor solves 100x larger problem

[Plot: GigaTensor vs. baseline as the tensor dimensions (I), (J), (K) grow, with number of non-zeros = I/50; the baseline runs Out of Memory, while GigaTensor handles ~100x larger problems]
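The reason sparsity helps, as a single-machine numpy sketch of the idea (this is only an analogue of what GigaTensor does with MapReduce, not its actual distributed algorithm; function and variable names are mine): the bottleneck of ALS is the matricized-tensor-times-Khatri-Rao product (MTTKRP), and it can be accumulated directly from the non-zeros without ever materializing the enormous Khatri-Rao matrix.

```python
import numpy as np

def mttkrp_mode0(i_idx, j_idx, k_idx, vals, B, C, I):
    # out[i, r] = sum over non-zeros (i, j, k, v) of v * B[j, r] * C[k, r],
    # i.e. X_(0) (B (*) C) computed from the sparse COO representation alone.
    rank = B.shape[1]
    out = np.zeros((I, rank))
    np.add.at(out, i_idx, vals[:, None] * B[j_idx] * C[k_idx])
    return out

# Hypothetical toy tensor with 3 non-zeros, stored as (i, j, k, value) triples.
i_idx = np.array([0, 2, 2])
j_idx = np.array([1, 0, 3])
k_idx = np.array([4, 4, 1])
vals  = np.array([1.0, 2.0, 0.5])
B = np.random.default_rng(0).random((5, 2))   # 5 "verbs",   rank 2
C = np.random.default_rng(1).random((6, 2))   # 6 "objects", rank 2
M = mttkrp_mode0(i_idx, j_idx, k_idx, vals, B, C, I=3)   # 3 "subjects"
print(M.shape)   # (3, 2): cost scales with the non-zeros, not with 26M*48M*26M cells
```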

CMU SCS 15649 83 (c) 2014, C. Faloutsos

Page 84: Mining Large Graphs - cs.cmu.edu

CMU SCS

Part 2: Conclusions

•  Time-evolving / heterogeneous graphs -> tensors

•  PARAFAC finds patterns (a minimal sketch follows below)
•  Turbo-SMT; GigaTensor -> fast & scalable
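To make the "graphs -> tensors -> PARAFAC" pipeline concrete, here is a minimal, dense CP/PARAFAC-ALS sketch in numpy. It is an illustration only, under the assumption of a small in-memory tensor; it is not the Turbo-SMT or GigaTensor code, and billion-scale tensors need the sparse, distributed machinery those methods provide.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n matricization: bring `mode` to the front, flatten the rest (C order).
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # Column-wise Kronecker product: row (i*J + j) holds A[i, :] * B[j, :].
    R = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, R)

def cp_als(T, rank, n_iter=50, seed=0):
    # Alternating least squares for T[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r].
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        # Solve for each factor in turn, keeping the other two fixed.
        A = unfold(T, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(T, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(T, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Toy check: a synthetic rank-2 "source x destination x time" tensor is recovered.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.random((n, 2)) for n in (30, 30, 10))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cp_als(T, rank=2)
err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
print(f"relative reconstruction error: {err:.2e}")
```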

CMU SCS 15649 84 (c) 2014, C. Faloutsos

[Figure 4 panels: four brain-activity maps labeled Group 1, Group 2, Group 4, Group 3 (the Premotor Cortex is marked), each paired with example (noun, question) rows. The recoverable examples, as they appear:
- beetle - can it cause you pain?; pants - do you see it daily?; bee - is it conscious? (Group 1 per the caption: mostly insects)
- bear - does it grow?; cow - is it alive?; coat - was it ever alive?
- glass - can you pick it up?; tomato - can you hold it in one hand?; bell - is it smaller than a golfball? (Group 3 per the caption: small hand-held items)
- bed - does it use electricity?; house - can you sit on it?; car - does it cast a shadow?]

Figure 4: Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated, as obtained using Turbo-SMT. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and the activation pattern is very different from the one of Group 1, which mostly refers to insects, such as bee or beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.

v1 and v2 which were withheld from the training data, the leave-two-out scheme measures prediction accuracy by the ability to choose which of the observed brain images corresponds to which of the two words. After mean-centering the vectors, this classification decision is made according to the following rule:

\[
\|v_1 - \hat{v}_1\|_2 + \|v_2 - \hat{v}_2\|_2 \;<\; \|v_1 - \hat{v}_2\|_2 + \|v_2 - \hat{v}_1\|_2
\]

Although our approach is not designed to make predictions, preliminary results are very encouraging: using only F=2 components, for the noun pair closet/watch we obtained mean accuracy of about 0.82 for 5 out of the 9 human subjects. Similarly, for the pair knife/beetle, we achieved accuracy of about 0.8 for a somewhat different group of 5 subjects. For the rest of the human subjects, the accuracy is considerably lower; however, it may be the case that brain activity predictability varies between subjects, a fact that requires further investigation.
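A minimal numpy rendering of the decision rule above (an illustration only, assuming mean-centered observed vectors v1, v2 and their model reconstructions v1_hat, v2_hat; not the authors' code):

```python
import numpy as np

def leave_two_out_correct(v1, v2, v1_hat, v2_hat):
    # True if the straight pairing (v1 <-> v1_hat, v2 <-> v2_hat)
    # is closer than the swapped pairing, per the rule above.
    straight = np.linalg.norm(v1 - v1_hat) + np.linalg.norm(v2 - v2_hat)
    swapped  = np.linalg.norm(v1 - v2_hat) + np.linalg.norm(v2 - v1_hat)
    return straight < swapped
```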

5 Experiments

We implemented Turbo-SMT in Matlab. Our implementation of the code is publicly available.5 For the parallelization of the algorithm, we used Matlab's Parallel Computing Toolbox. For tensor manipulation, we used

5http://www.cs.cmu.edu/~epapalex/src/turbo_smt.zip

the Tensor Toolbox for Matlab [7], which is optimized especially for sparse tensors (but works very well for dense ones too). We use the ALS and the CMTF-OPT [5] algorithms as baselines, i.e. we compare Turbo-SMT when using one of those algorithms as its core CMTF implementation, against the plain execution of those algorithms. We implemented our version of the ALS algorithm, and we used the CMTF Toolbox6 implementation of CMTF-OPT. We use CMTF-OPT for higher ranks, since that particular algorithm is more accurate than ALS, and is the state of the art. All experiments were carried out on a machine with 4 Intel Xeon E7-4850 2.00GHz processors and 512GB of RAM. Whenever we conducted multiple iterations of an experiment (due to the randomized nature of Turbo-SMT), we report error-bars along the plots. For all the following experiments we used either portions of the BrainQ dataset, or the whole dataset.

5.1 Speedup  As we have already discussed in the Introduction and shown in Fig. 1, Turbo-SMT achieves a speedup of 50-200 on the BrainQ dataset; for all cases, the approximation cost is either the same as the baselines, or larger by a small factor, indicating that Turbo-SMT is both fast and accurate. Key facts that

6http://www.models.life.ku.dk/joda/CMTF_Toolbox


Page 85: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 85

Roadmap

•  Introduction – Motivation
•  Part#1: Patterns in graphs
•  Part#2: time-evolving graphs; tensors
•  Conclusions

CMU SCS 15649

Page 86: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 86

Project info: PEGASUS

CMU SCS 15649

www.cs.cmu.edu/~pegasus
•  Results on large graphs: with Pegasus + hadoop + M45
•  Apache license
•  Code, papers, manual, video

Prof. U Kang Prof. Polo Chau

Page 87: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 87

Cast

Akoglu, Leman

Chau, Polo

Kang, U

Prakash, Aditya

CMU SCS 15649

Koutra, Danai

Beutel, Alex

Papalexakis, Vagelis

Shah, Neil

Lee, Jay Yoon

Araujo, Miguel

Page 88: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 88

CONCLUSION#1 – Big data

•  Large datasets reveal patterns/outliers that are invisible otherwise

CMU SCS 15649

Page 89: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 89

CONCLUSION#2 – tensors

•  powerful tool

CMU SCS 15649


Page 90: Mining Large Graphs - cs.cmu.edu

CMU SCS

(c) 2014, C. Faloutsos 90

References
•  D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012
•  http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006

CMU SCS 15649

Page 91: Mining Large Graphs - cs.cmu.edu

CMU SCS

TAKE HOME MESSAGE: Cross-disciplinarity

CMU SCS 15649 (c) 2014, C. Faloutsos 91


Page 92: Mining Large Graphs - cs.cmu.edu

CMU SCS

TAKE HOME MESSAGE: Cross-disciplinarity

CMU SCS 15649 (c) 2014, C. Faloutsos 92


Questions?