CMU SCS
Mining Large Graphs: Patterns, Anomalies, and Fraud Detection
Christos Faloutsos, CMU
Thank you!
• Alex Smola
• Aditi Chaudhuri
• Tiffany Kuan
Amazon'16 (c) 2016, C. Faloutsos 2
Foils at: http://www.cs.cmu.edu/~christos/TALKS/16-09-AMAZON/
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
• Conclusions
Graphs - why should we care?
• Recommendation systems
• App-stores + reviews (Yelp, ....)
Graphs - why should we care?
• 'YahooWeb graph': ~1B nodes (web sites), ~6B edges (http links)
Graphs - why should we care?
• >$10B; ~1B users
Graphs - why should we care?
• Internet Map [lumeta.com]
• Food Web [Martinez '91]
Graphs - why should we care?
• web-log ('blog') news propagation
• computer network security: email/IP traffic and anomaly detection
• ....
• Many-to-many db relationship -> graph
Motivating problems
• P1: patterns? Fraud detection?
• P2: patterns in time-evolving graphs / tensors
[Figure: tensor with 'destination' and 'time' axes]
• Patterns; anomalies*
* Robust Random Cut Forest Based Anomaly Detection on Streams. Sudipto Guha, Nina Mishra, Gourav Roy, Okke Schrijvers. ICML'16
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns & fraud detection
• Part#2: time-evolving graphs; tensors
• Conclusions
Part 1: Patterns, & fraud detection
Laws and patterns
• Q1: Are real graphs random?
Laws and patterns
• Q1: Are real graphs random?
• A1: NO!!
– Diameter ('6 degrees'; 'Kevin Bacon')
– in- and out-degree distributions
– other (surprising) patterns
• So, let's look at the data
Solution# S.1
• Power law in the degree distribution [Faloutsos x 3, SIGCOMM99]
[Plot: log(degree) vs. log(rank) for internet domains; marked points: att.com, ibm.com]
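A rank-degree plot like the one on this slide is easy to reproduce. Below is a minimal sketch (my own illustration, not code from the talk) that estimates the power-law slope by linear regression in log-log space; the Zipf-like toy degree sequence is hypothetical:

```python
import numpy as np

def rank_degree_slope(degrees):
    """Fit a line to log(degree) vs. log(rank): sort degrees
    descending, so rank 1 is the highest-degree node, then
    regress in log-log space. The slope is the power-law exponent."""
    d = np.sort(np.asarray(degrees, dtype=float))[::-1]
    d = d[d > 0]                       # log is undefined at 0
    ranks = np.arange(1, len(d) + 1)
    slope, intercept = np.polyfit(np.log(ranks), np.log(d), 1)
    return slope

# Toy Zipf-like degrees: degree ~ rank^(-0.8)
ranks = np.arange(1, 10001)
degrees = np.round(1e5 * ranks ** -0.8)
print(rank_degree_slope(degrees))   # close to -0.8
```

On a real graph, feeding the actual degree sequence through the same two lines of numpy yields a slide-style exponent such as the -0.82 shown next.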
• Slope of the log-log fit: -0.82
• Q: So what?
• A1: # of two-step-away pairs = friends of friends (F.O.F.)
• A1: # of two-step-away pairs: 100^2 * N = 10 Trillion (using the average degree, ~100)
• But the degree distribution is skewed: # of two-step-away pairs is O(d_max^2) ~ 10M^2, i.e. ~0.8PB -> a data center(!) [photo: DCO @ CMU]
• 'Gaussian trap': the average degree is very misleading here
Solution# S.2: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix ('eig()'): A x = λ x
[Plot: eigenvalue vs. rank of decreasing eigenvalue; exponent = slope, E = -0.48]
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns: Degree; Triangles
– Anomaly/fraud detection
– Graph understanding
• Part#2: time-evolving graphs; tensors
• Conclusions
Solution# S.3: Triangle 'Laws'
• Real social networks have a lot of triangles
– Friends of friends are friends
• Any patterns?
– 2x the friends, 2x the triangles?
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
[Plots: degree (x-axis) vs. mean # triangles (y-axis); datasets: SN, Reuters, Epinions]
• n friends -> ~n^1.6 triangles
Triangle Law: Computations [Tsourakakis ICDM 2008]
• But: triangles are expensive to compute (3-way join; several approx. algos) – O(d_max^2)
• Q: Can we do that quickly?
Triangle Law: Computations [Tsourakakis ICDM 2008]
• Q: Can we do that quickly? A: Yes!
• #triangles = 1/6 * Sum(λ_i^3)   (where A x = λ x)
• (and, because of skewness (S.2), we only need the top few eigenvalues! – O(E))
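The eigenvalue formula is easy to try in numpy. The sketch below (my own toy example) counts triangles exactly via trace(A^3)/6 and approximately via the slide's 1/6 * Sum(λ_i^3) rule, keeping only the top-k eigenvalues:

```python
import numpy as np

def triangles_exact(A):
    """Exact count: trace(A^3)/6 equals the number of triangles
    in an undirected simple graph with adjacency matrix A."""
    return int(round(np.trace(A @ A @ A) / 6))

def triangles_spectral(A, k=3):
    """Approximate count via 1/6 * sum(lambda_i^3), keeping only
    the top-k eigenvalues by magnitude (enough on skewed spectra,
    per solution S.2)."""
    lam = np.linalg.eigvalsh(A)                 # A is symmetric
    top = lam[np.argsort(-np.abs(lam))[:k]]
    return (top ** 3).sum() / 6

# Toy graph: a 4-clique (4 triangles) plus one dangling edge
A = np.zeros((5, 5))
for i in range(4):
    for j in range(i + 1, 4):
        A[i, j] = A[j, i] = 1
A[3, 4] = A[4, 3] = 1
print(triangles_exact(A))            # 4
print(round(triangles_spectral(A)))  # ~4
```

Cubing damps the small eigenvalues, so on power-law spectra the top few already dominate the sum; that is exactly why the approximation is cheap.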
Triangle counting for large graphs?
• Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD'11]
MORE Graph Patterns
• RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD'09.
MORE Graph Patterns
• Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In "Social Network Data Analytics" (Ed.: Charu Aggarwal)
• Deepayan Chakrabarti and Christos Faloutsos. Graph Mining: Laws, Tools, and Case Studies. Oct. 2012, Morgan Claypool.
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods ('fBox')
• Belief Propagation
• Part#2: time-evolving graphs; tensors
• Conclusions
Fraud
• Given: who 'likes' what page, and when
• Find: suspicious users and suspicious products
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks. Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos. WWW, 2013.
[Figure: bipartite 'Likes' graph between users (A, B, C, D, E, ..., Z) and pages (1, 2, 3, 4, ..., 40)]
Our intuition
▪ Lockstep behavior: Same Likes, same time
[Figure: bipartite 'Likes' graph, users x pages, with each Like annotated by its time (0-90)]
[Figure: after reordering users and pages, the same Likes form a contiguous block with nearly identical Like times]
▪ This reordered dense block is flagged as suspicious lockstep behavior.
MapReduce Overview
▪ Use Hadoop to search for many clusters in parallel:
1. Start with a random seed
2. Update the set of Pages and the center Like times for each cluster
3. Repeat until convergence
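The three steps above can be sketched as a single-machine alternating update. This is a simplification for illustration only, not the actual Hadoop implementation (which also re-selects the page set and comes with guarantees in the paper); all data, names, and thresholds here are hypothetical:

```python
import numpy as np

def lockstep_cluster(like_time, seed_user, pages, dt=2.0, iters=10):
    """Simplified CopyCatch-style search for ONE lockstep cluster.
    like_time[u, p] = time at which user u Liked page p (NaN = no Like).
    Step 1: seed the per-page time centers from a seed user;
    Step 2: alternate between (a) catching users whose Likes on all
            pages fall within dt of the centers and (b) re-centering;
    Step 3: repeat for a fixed number of rounds (convergence)."""
    centers = like_time[seed_user, pages].copy()
    users = np.array([seed_user])
    for _ in range(iters):
        close = np.abs(like_time[:, pages] - centers) <= dt
        caught = np.where(close.all(axis=1))[0]
        if len(caught) == 0:
            break
        users = caught
        centers = like_time[np.ix_(users, pages)].mean(axis=0)
    return users, centers

# Toy data: users 0-2 Like pages 0-2 in lockstep (~t=10); users 3-5 don't.
rng = np.random.default_rng(0)
T = np.full((6, 4), np.nan)
T[:3, :3] = 10 + rng.normal(0, 0.3, size=(3, 3))   # the lockstep group
T[3:, :] = rng.uniform(30, 100, size=(3, 4))       # unrelated activity
users, centers = lockstep_cluster(T, seed_user=0, pages=[0, 1, 2])
print(users)   # the lockstep group
```

In the real system, each iteration of steps 2a/2b is one MapReduce round, and many seeds are explored in parallel.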
Deployment at Facebook
▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team)
[Plot: number of users caught per CopyCatch run, over 3 months @ Facebook (08/25-12/01)]
Deployment at Facebook
▪ Manually labeled 22 randomly selected clusters from February 2013:
– Fake accounts: 23%
– Malicious browser extensions: 58%
– OS malware: 5%
– Credential stealing: 9%
– Social engineering: 5%
▪ Most clusters (77%) come from real but compromised users; only 23% are fake accounts.
Problem: Social Network Link Fraud
• Target: find "stealthy" attackers missed by other algorithms
• Attack structures: clique; bipartite core
• Graph: 41.7M nodes, 1.5B edges
Problem: Social Network Link Fraud
• Neil Shah, Alex Beutel, Brian Gallagher and Christos Faloutsos. Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective. ICDM 2014, Shenzhen, China.
• Takeaway: use the reconstruction error between the true and the latent (low-rank) representation!
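The takeaway can be illustrated in a few lines of numpy. This is my own sketch, not the actual fBox algorithm: compute a rank-k SVD of the adjacency matrix and score each node by how badly the low-rank model reconstructs its row; rows the spectral model cannot explain are the "stealthy" candidates.

```python
import numpy as np

def reconstruction_scores(A, k=2):
    """Per-node reconstruction error of the adjacency matrix under a
    rank-k SVD. Rows that the low-rank model explains poorly are
    candidates for 'stealthy' behavior (illustration of the takeaway,
    not the actual fBox algorithm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k]          # rank-k reconstruction
    return np.linalg.norm(A - A_k, axis=1)     # error per row (node)

# Toy who-follows-whom graph: two dense 4-node blocks, plus node 8,
# which scatters its links instead of joining either block.
A = np.zeros((9, 9))
A[:4, :4] = 1
A[4:8, 4:8] = 1
A[8, 0] = A[8, 5] = 1
scores = reconstruction_scores(A, k=2)
print(np.argmax(scores))   # node 8: worst reconstructed
```

Dense, coordinated behavior is exactly what the top singular vectors capture; an attacker who avoids it leaves a large residual instead.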
Suspicious Patterns in Event Data
• A General Suspiciousness Metric for Dense Blocks in Multimodal Data. Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. ICDM, 2015.
• Question: how to score dense blocks, from 2-mode (matrices) to n-mode (tensors)?
Suspicious Patterns in Event Data
• Which is more suspicious?
– 20,000 users, retweeting the same 20 tweets, 6 times each, all in 10 hours
vs.
– 225 users, retweeting the same 1 tweet, 15 times each, all in 3 hours, all from 2 IP addresses
• Answer: volume * D_KL(p || p_background)   [ICDM 2015]
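A back-of-the-envelope version of this score (a simplified reading of the CrossSpot metric; the background figures below are hypothetical) treats each cell of a block as Poisson and computes volume times the KL divergence between the block's density and the background density:

```python
import math

def suspiciousness(mass, volume, total_mass, total_volume):
    """volume * D_KL(Pois(rho) || Pois(rho0)): the slide's
    'volume * KL(block density || background density)' idea
    (a simplified reading of the CrossSpot metric).
    mass = # events inside the block; volume = # cells in the block;
    total_mass / total_volume = background density rho0."""
    rho, rho0 = mass / volume, total_mass / total_volume
    # KL divergence between two Poisson rates (always >= 0):
    return volume * (rho * math.log(rho / rho0) + rho0 - rho)

# Same mass, different concentration (background: 1e6 events in 1e9 cells):
dense = suspiciousness(mass=3375, volume=225 * 1 * 3,
                       total_mass=1e6, total_volume=1e9)
spread = suspiciousness(mass=3375, volume=20000 * 20 * 10,
                        total_mass=1e6, total_volume=1e9)
print(dense > spread)   # the concentrated block scores higher
```

The same events score far higher when squeezed into a small block, which matches the intuition of the slide's two examples.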
A General Suspiciousness Metric for Dense Blocks in Multimodal Data
Abstract—Which seems more suspicious: 5,000 tweets from 200 users on 5 IP addresses, or 10,000 tweets from 500 users on 500 IP addresses but all with the same hashtag and all in 10 minutes? The literature has many methods that try to find dense blocks in matrices, and, recently, tensors, but no method gives a principled way to score the suspiciousness of dense blocks with different numbers of modes and rank them to draw human attention accordingly. Dense blocks are worth inspecting, typically indicating fraud, emerging trends, or some other noteworthy deviation from the usual. Our main contribution is that we show how to unify these methods and how to give a principled answer to questions like the above. Specifically, (a) we give a list of axioms that any metric of suspiciousness should satisfy; (b) we propose an intuitive, principled metric that satisfies the axioms, and is fast to compute; (c) we propose CROSSSPOT, an algorithm to spot dense regions, and sort them in importance ("suspiciousness") order. Finally, we apply CROSSSPOT to real data, where it improves the F1 score over previous techniques by 68% and finds retweet-boosting and hashtag-hijacking in social datasets spanning 0.3 billion posts.
I. INTRODUCTION
Imagine your job at Twitter is to detect when fraudsters are trying to manipulate the most popular tweets for a given trending topic. Given time pressure, which is more worthy of your investigation: 2,000 Twitter users, all retweeting the same 20 tweets, 4 to 6 times each; or 225 Twitter users, retweeting the same 1 tweet, 10 to 15 times each? Now, what if the latter batch of activity happened within 3 hours, while the former spanned 10 hours? What if all 225 users of the latter group used the same 2 IP addresses?
Figure 1 shows an example of these patterns from Tencent Weibo, one of the largest microblogging platforms in China; our method CROSSSPOT detected a block of 225 users, using 2 IP addresses ("blue circle" and "red cross"), retweeting the same tweet 27K times, within 200 minutes. Further, manual inspection shows that several of these users get activated every 5 minutes. This type of lockstep behavior is suspicious (say, due to automated scripts), and it leads to dense blocks, as in Figure 1. These blocks may span several modes (user-id, timestamp, hashtag, etc.). Although our main motivation is fraud detection in a Twitter-like setting, our proposed approach is suitable for numerous other settings, like distributed-denial-of-service (DDoS) attacks, link fraud, click fraud, even health-insurance fraud, as we discuss next.
Thus, the core question we ask in this paper is: what is the right way to compare the severity/suspiciousness/surprise of two dense blocks, that span 2 or more modes? Informally, the problem is:
Fig. 1. Density in multiple modes is suspicious. Left: a dense block of 225 users on Tencent Weibo (Chinese Twitter) retweeting one tweet 27,313 times from 2 IP addresses over 200 minutes. Right: magnification of a subset of this block; the two symbols indicate the two IP addresses used. Notice how synchronized the behavior is, across several modes (IP-address, user-id, timestamp).
Informal Problem 1 (Suspiciousness score): Given a K-mode dataset (tensor) X, with counts of events (non-negative integer values), and two subtensors Y1 and Y2, which is more suspicious and worthy of further investigation?
Why multimodal data (tensors): Graphs and social networks have attracted huge interest - and they are perfectly modeled as K=2 mode datasets, that is, matrices. With K=2 modes we can model Twitter's "who-follows-whom" network [1][2], Facebook's "who-friends-whom" and "who-Likes-what" graphs [3], eBay's "who-buys-from-whom" graph [4], financial activities of "who-trades-what-stocks", and scientific relations of "who-cites-whom." Several high-impact datasets make use of higher-mode relations. With K=3 modes, we can consider how all of the above graphs change over time, or what words are used in product reviews on eBay or Amazon. With K=4 modes, we can analyze network traces for intrusion detection and distributed denial of service (DDoS) attacks by looking for patterns in the source IP, destination IP, destination port, and timestamp [5][6]. Health-insurance fraud detection is another example, with (patient-id, doctor-id, prescription-id, timestamp), where corrupt doctors prescribe fake, expensive medicine to senile or corrupt patients [7][8].
Why are dense regions worth inspecting: Dense regions are surprising in all of the examples above. Past work has repeatedly found that dense regions in these tensors correspond to suspicious, lockstep behavior: purchased Page Likes on Facebook result in a few users "Liking" the same Pages, always at the same time (when the order for the Page Likes is placed) [3]. Spammers paid to write deceptively high (or low) reviews for restaurants or hotels will reuse the same accounts and often even the same text [9][10]. Zombie followers, botnets who are set up to build social links, will inflate the number of followers to make their customers seem more popular than they actually are [1][11]. This high-density outcome has a reason: spammers have constrained resources (users, IP
(Retweeted message in Fig. 1: "Galaxy Note Dream Project: Happy Happy Life Traveling the World")
Fig. 3. Finding dense blocks: CROSSSPOT outperforms baselines (HOSVD with rank r = 20, 10, 5; MAF) in finding 3-mode blocks. (a) Precision-recall performance; (b) recall per injected 30x30x30 block of mass 512 down to 16: seeding CROSSSPOT with HOSVD directly improves the recall on top of HOSVD.
• SVD has small recall but high precision. However, the SVD can hardly catch small, sparse injected blocks such as 30x30 submatrices of mass 16 and 32, even though they are denser than the background. Higher decomposition rank brings higher classification accuracy.
Finding dense high-order blocks in multimodal data: We generate random tensor data with parameters: (1) the number of modes k=3, (2) the size of data N1=1,000, N2=1,000 and N3=1,000, and (3) the mass of data C=10,000. We inject b=6 blocks of k'=3 modes into the random data, so I = {1, 2, 3}. Each block has size 30x30x30 and mass c in {16, 32, 64, 128, 256, 512}. The task is again to classify the tensor entries into suspicious and normal classes. Figure 3(a) reports the performance of CROSSSPOT and baselines. We observe that, in order to find all six 3-mode injected blocks, our proposed CROSSSPOT has better precision and recall than the baselines. The best F1 score CROSSSPOT gives is 0.891, which is 46.0% higher than the F1 score given by the best of HOSVD (0.610). If we use the results of HOSVD as seeds for CROSSSPOT, the best F1 score of CROSSSPOT reaches 0.979. Figure 3(b) gives the recall value of every injected block. We observe that CROSSSPOT improves the recall over HOSVD, especially on slightly sparser blocks.
Finding dense low-order blocks in multimodal data: We generate random tensor data with parameters: (1) the number of modes k=3, (2) the size of data N1=1,000, N2=1,000 and N3=1,000, and (3) the mass of data C=10,000. We inject b=4 blocks into the random data:
• Block #1: k'1=3 modes, I1={1,2,3}; size 30x30x30, mass c1=512.
• Block #2: k'2=2 modes, I2={1,2}; size 30x30x1,000, mass c2=512.
• Block #3: k'3=2 modes, I3={1,3}; size 30x1,000x30, mass c3=512.
• Block #4: k'4=2 modes, I4={2,3}; size 1,000x30x30, mass c4=512.
Note, blocks 2-4 are dense in only 2 modes and random in the third mode. Table V reports the classification performance of CROSSSPOT and baselines. We show the overall evaluations (precision, recall and F1 score) and the recall value of every block. We observe that CROSSSPOT has 100% recall in catching the 3-mode block #1, while the baselines have 85-95% recall. More impressively, CROSSSPOT successfully catches the 2-mode blocks, where HOSVD has difficulty and low recall. The F1 score of the overall evaluation is as large as 0.972, a 68.8% improvement.
Testing robustness of the random seed number: We test how the performance of CROSSSPOT improves when we use more seed blocks in the low-order block detection experiments.
Fig. 4. (a) Robustness: CROSSSPOT is robust to the number of random seeds; in detecting the 4 low-order blocks, 41 seeds already report the final result of as many as 1,000 seeds. (b) Convergence: CROSSSPOT converges very fast; the average number of iterations is 2.87.
TABLE VI. Big dense blocks with top metric values discovered in the retweeting dataset.

Method    | # | User x tweet x IP x minute | Mass c | Suspiciousness
CROSSSPOT | 1 | 14 x 1 x 2 x 1,114         | 41,396 | 1,239,865
CROSSSPOT | 2 | 225 x 1 x 2 x 200          | 27,313 | 777,781
CROSSSPOT | 3 | 8 x 2 x 4 x 1,872          | 17,701 | 491,323
HOSVD     | 1 | 24 x 6 x 11 x 439          | 3,582  | 131,113
HOSVD     | 2 | 18 x 4 x 5 x 223           | 1,942  | 74,087
HOSVD     | 3 | 14 x 2 x 1 x 265           | 9,061  | 381,211
Figure 4(a) shows the best F1 score for different numbers of random seeds. We find that when we use 41 random seeds, the best F1 score is close to the results when we use as many as 1,000 random seeds. Thus, once we exceed a moderate number of random seeds, the performance is fairly robust to the number of random seeds.
Efficiency analysis: CROSSSPOT can be parallelized onto multiple machines to search dense blocks with different sets of random seeds. The time cost of every iteration is linear in the number of non-zero entries in the multimodal data, as we have discussed in Section V. Figure 4(b) reports the counts of iterations in the procedure of 1,000 random seeds. We observe that usually CROSSSPOT takes 2 or 3 iterations to finish the local search. Each iteration takes only 5.6 seconds. Tensor decompositions such as HOSVD and PARAFAC used in MAF often take more time. On the same machine, HOSVD of rank r=5, 10 and 20 takes 280, 1,750 and 34,510 seconds, respectively. From Table V and Figure 4(a), even without parallelization, we know that CROSSSPOT takes only 230 seconds to reach the best F1 score of 0.972, while HOSVD needs more time (280 seconds if r=5) to reach a much smaller F1 score of 0.324.
D. Retweeting Boosting
Table VI shows big, dense block patterns in the Tencent Weibo retweeting dataset. CROSSSPOT reports blocks of high mass and high density. For example, we spot that 14 users retweet the same content 41,396 times on 2 IP addresses in 19 hours. Their coordinated, suspicious behaviors result in a few tweets that seem extremely popular. We observe that CROSSSPOT catches more suspicious (bigger and denser) blocks than HOSVD does: HOSVD evaluates the number of retweets per user, item, IP, or minute, but does not consider the block's density, mass, nor the background.
Table VII shows an example of retweeting boosting from the big, dense 225x1x2x200 block reported by our proposed CROSSSPOT. A group of users (e.g., A, B, C) retweet the same message "Galaxy note dream project: Happy happy life
ICDM 2015
E-bay Fraud detection
• w/ Polo Chau & Shashank Pandit, CMU [WWW'07]
E-bay Fraud detection - NetProbe
Popular press
• And less desirable attention: e-mail from 'Belgium police' ('copy of your code?')
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Patterns
– Anomaly / fraud detection
• CopyCatch
• Spectral methods ('fBox')
• Belief Propagation; fast computation & unification
• Part#2: time-evolving graphs; tensors
• Conclusions
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms
Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos
ECML PKDD, 5-9 September 2011, Athens, Greece
Problem Definition: GBA techniques
• Given: graph; & few labeled nodes
• Find: labels of rest (assuming network effects)
Are they related?
• RWR (Random Walk with Restarts)
– google's pageRank ('if my friends are important, I'm important, too')
• SSL (Semi-supervised learning)
– minimize the differences among neighbors
• BP (Belief propagation)
– send messages to neighbors, on what you believe about them
• A: YES!
Correspondence of Methods

Method | Matrix           | Unknown | Known
RWR    | [I - c A D^-1]   | x       | (1-c) y
SSL    | [I + a (D - A)]  | x       | y
FABP   | [I + a D - c' A] | bh      | φh

(A: adjacency matrix; D: diagonal degree matrix (d1, d2, d3, ...); x: final labels/beliefs; y: prior labels/beliefs)
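One practical consequence of the table: each method boils down to one sparse linear system. Below is a minimal sketch of the FABP line (my own toy graph; the parameter values a and c are illustrative, not the tuned values from the paper):

```python
import numpy as np

def fabp(A, priors, a=0.1, c=0.2):
    """Solve the FABP system from the table: [I + a*D - c*A] bh = phi_h,
    where D = diag(degrees), phi_h = prior beliefs (centered at 0:
    positive = fraud-leaning, negative = honest-leaning), and bh = final
    beliefs. (a, c are illustrative parameter values.)"""
    D = np.diag(A.sum(axis=1))
    M = np.eye(len(A)) + a * D - c * A
    return np.linalg.solve(M, priors)

# Two triangle communities joined by one edge; label one node in each.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
phi = np.zeros(6)
phi[0], phi[5] = +1.0, -1.0        # known: one fraud, one honest node
b = fabp(A, phi)
print(np.sign(b))                  # each triangle inherits its label
```

The guilt-by-association effect is visible in the signs: nodes 1-2 follow node 0, nodes 3-4 follow node 5, with no message-passing loop needed, just one linear solve.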
Results: Scalability
• FABP is linear on the number of edges.
[Plot: runtime (min) vs. # of edges (Kronecker graphs)]
Summary of Part#1
• *many* patterns in real graphs
– Power-laws everywhere
– Gaussian trap: Avg << Max
– Long (and growing) list of tools for anomaly/fraud detection
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: time-evolving graphs
– [P2.2: with side information ('coupled' M.T.F.) – Speed]
• Conclusions
Part 2: Time-evolving graphs; tensors
Graphs over time -> tensors!
• Problem #2.1:
– Given who calls whom, and when
– Find patterns / anomalies
[Figure: one caller x callee adjacency matrix per day (Mon, Tue, ...) stacks into a caller x callee x time tensor]
• Problem #2.1': given author - keyword - date: find patterns / anomalies (author x keyword x date tensor)
• MANY more settings, with >2 'modes'
• Problem #2.1'': given subject - verb - object facts: find patterns / anomalies (subject x object x ... tensor)
• Problem #2.1''': given <triplets>: find patterns / anomalies (mode1 x mode2 x ...)
• MANY more settings, with >2 'modes' (and 4, 5, etc. modes)
Graphs & side info [SKIP]
• Problem #2.2: coupled (e.g., side info)
– Given subject - verb - object facts
• and voxel-activity for each subject-word (fMRI voxel activity for 'apple tastes sweet')
– Find patterns / anomalies
Answer to both: tensor factorization
• Recall: (SVD) matrix factorization finds blocks
[Figure: an N-users x M-products matrix ~ sum of rank-1 blocks: 'meat-eaters' x 'steaks' + 'vegetarians' x 'plants' + 'kids' x 'cookies']
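The "SVD finds blocks" claim is easy to see on a toy matrix (my own example; the group names echo the slide and are hypothetical): each leading singular triplet lights up one user group and its product group.

```python
import numpy as np

# Toy users x products matrix with two rank-1 'blocks'
# ('meat-eaters'/'steaks' and 'vegetarians'/'plants'):
A = np.zeros((5, 6))
A[:3, :3] = 1      # users 0-2 buy products 0-2
A[3:, 3:] = 1      # users 3-4 buy products 3-5

U, s, Vt = np.linalg.svd(A)
# Each leading singular triplet (u_i, s_i, v_i) is one rank-1 block:
for i in range(2):
    users = np.where(np.abs(U[:, i]) > 0.1)[0]
    products = np.where(np.abs(Vt[i]) > 0.1)[0]
    print(f"block {i}: users {users}, products {products}")
```

On this matrix the first singular vectors are (up to sign) indicators of the first user/product group and the second pair picks out the other, which is the picture the slide draws.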
Answer to both: tensor factorization
• PARAFAC decomposition
[Figure: a subject x object x ... tensor ~ sum of rank-1 terms, e.g. politicians + artists + athletes]
Answer: tensor factorization
• PARAFAC decomposition
• Results for who-calls-whom-when
– 4M x 15 days
[Figure: caller x callee x time tensor ~ sum of rank-1 components, each a candidate community ('??')]
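PARAFAC generalizes the rank-1 sum from matrices to tensors. Below is a minimal alternating-least-squares sketch for a 3-mode tensor (an illustration only, not the tool used for the phone-call analysis; real work would use a tensor library):

```python
import numpy as np

def cp_als(X, rank, iters=100):
    """Minimal PARAFAC via alternating least squares on a 3-mode
    tensor X. Returns factor matrices A, B, C such that
    X[i,j,k] ~ sum_f A[i,f] * B[j,f] * C[k,f]."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-3 unfolding
    kr = lambda P, Q: (P[:, None, :] * Q[None, :, :]).reshape(-1, rank)
    for _ in range(iters):                       # alternating LS updates
        A = X1 @ np.linalg.pinv(kr(B, C).T)
        B = X2 @ np.linalg.pinv(kr(A, C).T)
        C = X3 @ np.linalg.pinv(kr(A, B).T)
    return A, B, C

# Recover a planted rank-2 tensor:
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)
A, B, C = cp_als(X, rank=2)
Xhat = np.einsum('if,jf,kf->ijk', A, B, C)
print(np.linalg.norm(X - Xhat) / np.linalg.norm(X))  # small residual
```

For the phone-call setting, each recovered triplet (caller factor, callee factor, time factor) is one candidate community with its days of activity, which is exactly what the next slides inspect.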
Anomaly detection in time-evolving graphs
• Anomalous communities in phone call data:
– European country, 4M clients, data over 2 weeks
– ~200 calls to EACH receiver on EACH day!
– 1 caller, 5 receivers, 4 days of activity
• Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: time-evolving graphs; tensors
– P2.1: Discoveries @ phonecall network
– [P2.2: Discoveries in neuro-semantics – Speed]
• Conclusions
Coupled Matrix-Tensor Factorization (CMTF)
[Figure: a 3-mode tensor X (user x product x ...) and a side-information matrix Y (user x user-attributes) are factorized jointly as sums of rank-1 terms; both share the same 'user' factors a1 ... aF, with X also using b1 ... bF and c1 ... cF, and Y using d1 ... dF]
Neuro-semantics
Figure S1. Presentation and set of exemplars used in the experiment. Participants were presented 60 distinct word-picture pairs describing common concrete nouns. These consisted of 5 exemplars from each of 12 categories, as shown above. A slow event-related paradigm was employed, in which the stimulus was presented for 3s, followed by a 7s fixation period during which an X was presented in the center of the screen. Images were presented as white lines and characters on a dark background, but are inverted here to improve readability. The entire set of 60 exemplars was presented six times, randomly permuting the sequence on each presentation.
To fully specify a model within this computational modeling framework, one must first define a set of intermediate semantic features f1(w), f2(w), ..., fn(w) to be extracted from the text corpus. In this paper, each intermediate semantic feature is defined in terms of the co-occurrence statistics of the input stimulus word w with a particular other word (e.g., "taste") or set of words (e.g., "taste," "tastes," or "tasted") within the text corpus. The model is trained by the application of multiple regression to these features fi(w) and the observed fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi (26). Once trained, the computational model can be evaluated by giving it words outside the training set and comparing its predicted fMRI images for these words with observed fMRI data.
This computational modeling framework is based on two key theoretical assumptions. First, it assumes the semantic features that distinguish the meanings of arbitrary concrete nouns are reflected in the statistics of their use within a very large text corpus. This assumption is drawn from the field of computational linguistics, where statistical word distributions are frequently used to approximate the meaning of documents and words (14-17). Second, it assumes that the brain activity observed when thinking about any concrete noun can be derived as a weighted linear sum of contributions from each of its semantic features. Although the correctness of this linearity assumption is debatable, it is consistent with the widespread use of linear models in fMRI analysis (27) and with the assumption that fMRI activation often reflects a linear superposition of contributions from different sources. Our theoretical framework does not take a position on whether the neural activation encoding meaning is localized in particular cortical regions. Instead, it considers all cortical voxels and allows the training data to determine which locations are systematically modulated by which aspects of word meanings.
Results. We evaluated this computational mod-el using fMRI data from nine healthy, college-ageparticipants who viewed 60 different word-picturepairs presented six times each. Anatomically de-fined regions of interest were automatically labeledaccording to the methodology in (28). The 60 ran-domly ordered stimuli included five items fromeach of 12 semantic categories (animals, body parts,buildings, building parts, clothing, furniture, insects,kitchen items, tools, vegetables, vehicles, and otherman-made items). A representative fMRI image foreach stimulus was created by computing the meanfMRI response over its six presentations, and themean of all 60 of these representative images wasthen subtracted from each [for details, see (26)].
To instantiate our modeling framework, we firstchose a set of intermediate semantic features. To beeffective, the intermediate semantic features mustsimultaneously encode thewide variety of semanticcontent of the input stimulus words and factor theobserved fMRI activation intomore primitive com-
Predicted“celery” = 0.84
“celery” “airplane”
Predicted:
Observed:
A B
+.. .
high
average
belowaverage
Predicted “celery”:
+ 0.35 + 0.32
“eat” “taste” “fill”
Fig. 2. Predicting fMRI imagesfor given stimulus words. (A)Forming a prediction for par-ticipant P1 for the stimulusword “celery” after training on58 other words. Learned cvi co-efficients for 3 of the 25 se-mantic features (“eat,” “taste,”and “fill”) are depicted by thevoxel colors in the three imagesat the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” isshown to the left of their respective images [e.g., the value for “eat (celery)” is0.84]. The predicted activation for the stimulus word [shown at the bottom of(A)] is a linear combination of the 25 semantic fMRI signatures, weighted bytheir co-occurrence values. This figure shows just one horizontal slice [z =
–12 mm in Montreal Neurological Institute (MNI) space] of the predictedthree-dimensional image. (B) Predicted and observed fMRI images for“celery” and “airplane” after training that uses 58 other words. The two longred and blue vertical streaks near the top (posterior region) of the predictedand observed images are the left and right fusiform gyri.
A
B
C
Mean over
participants
Participant P
5
Fig. 3. Locations ofmost accurately pre-dicted voxels. Surface(A) and glass brain (B)rendering of the correla-tion between predictedand actual voxel activa-tions for words outsidethe training set for par-
ticipant P5. These panels show clusters containing at least 10 contiguous voxels, each of whosepredicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout thecortex and located in the left and right occipital and parietal lobes; left and right fusiform,postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anteriorcingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nineparticipants. This panel represents clusters containing at least 10 contiguous voxels, each withaverage correlation of at least 0.14.
30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org1192
RESEARCH ARTICLES
on
May
30,
200
8 w
ww
.sci
ence
mag
.org
Dow
nloa
ded
from
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
Amazon'16 114 (c) 2016, C. Faloutsos
*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
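The Mitchell et al. model referenced above is, at its core, a linear map from semantic features to voxel activations, trained by regression and tested on held-out nouns. A minimal synthetic sketch of that train/predict loop (all names and sizes here are illustrative, not the paper's actual data):

```python
import numpy as np

# Illustrative setup: each noun has a semantic feature vector (in the
# paper, corpus co-occurrence with verbs such as "eat" or "taste");
# a linear model maps features to per-voxel fMRI activation.
rng = np.random.default_rng(0)
n_nouns, n_feat, n_vox = 60, 25, 500
feats = rng.standard_normal((n_nouns, n_feat))    # semantic features f(w)
sig = rng.standard_normal((n_feat, n_vox))        # true voxel signatures
images = feats @ sig + 0.01 * rng.standard_normal((n_nouns, n_vox))

# Leave-2-out evaluation: train the regression on 58 nouns,
# then predict the fMRI images of the 2 held-out nouns.
train_idx, test_idx = np.arange(2, 60), np.array([0, 1])
coef, *_ = np.linalg.lstsq(feats[train_idx], images[train_idx], rcond=None)
pred = feats[test_idx] @ coef                     # predicted fMRI images
```

Each predicted image is a weighted sum of the learned per-feature voxel signatures, mirroring the "predicted 'celery' = 0.84·eat + 0.35·taste + 0.32·fill" composition shown in the paper's Fig. 2.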
SKIP
CMU SCS
Neuro-semantics
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
Amazon'16 115 (c) 2016, C. Faloutsos
Patterns?
SKIP
CMU SCS
Neuro-semantics
• Brain Scan Data*
  – 9 persons
  – 60 nouns
• Questions
  – 218 questions
  – ‘is it alive?’, ‘can you eat it?’
[3-mode tensor: nouns (airplane, dog, …) × questions × voxels]

Amazon'16 (c) 2016, C. Faloutsos 116

Patterns?
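The brain-scan data on this slide can be arranged as a 3-mode tensor (nouns × questions × voxels), and "patterns" then means low-rank structure. A minimal numpy sketch under made-up data: 60 nouns and 218 questions match the slide, but the voxel count and the planted rank-1 component are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 60 nouns and 218 questions match the slide;
# the voxel count here is a made-up stand-in.
n_nouns, n_questions, n_voxels = 60, 218, 500

# Plant one rank-1 "pattern" plus noise, so there is something to find.
a = rng.normal(size=n_nouns)
b = rng.normal(size=n_questions)
c = rng.normal(size=n_voxels)
X = 5.0 * np.einsum("i,j,k->ijk", a, b, c) \
    + rng.normal(size=(n_nouns, n_questions, n_voxels))

# Mode-1 unfolding: one row per noun, one column per (question, voxel) pair.
X1 = X.reshape(n_nouns, n_questions * n_voxels)

# The singular-value spectrum of the unfolding exposes the low-rank pattern.
s = np.linalg.svd(X1, compute_uv=False)
print(s[0] / s[1])  # the planted component dominates the noise floor
```

Full tensor decompositions (e.g., PARAFAC) generalize this idea to all three modes at once; the SVD of one unfolding is just the simplest way to see a pattern.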
To fully specify a model within this computational modeling framework, one must first define a set of intermediate semantic features f1(w), f2(w), …, fn(w) to be extracted from the text corpus. In this paper, each intermediate semantic feature is defined in terms of the co-occurrence statistics of the input stimulus word w with a particular other word (e.g., “taste”) or set of words (e.g., “taste,” “tastes,” or “tasted”) within the text corpus. The model is trained by the application of multiple regression to these features fi(w) and the observed fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi (26). Once trained, the computational model can be evaluated by giving it words outside the training set and comparing its predicted fMRI images for these words with observed fMRI data.
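The two steps just described (co-occurrence features fi(w) from a text corpus, then multiple regression to estimate the cvi) can be sketched in a few lines. Everything below is a toy: the corpus, the four feature words (stand-ins for the paper's 25), the window size, and the random "voxel" data are all invented for illustration, not the paper's actual setup.

```python
import numpy as np

# --- Step 1: intermediate semantic features from co-occurrence counts ---
corpus = ("you can eat celery and taste celery . "
          "an airplane can fly . a dog can run and eat .").split()
stimuli = ["celery", "airplane", "dog"]
feature_words = ["eat", "taste", "fly", "run"]  # stand-ins for the 25 verbs

def cooccurrence(word, feature, window=4):
    """Count how often `feature` appears within `window` tokens of `word`."""
    hits = 0
    for i, tok in enumerate(corpus):
        if tok == word:
            lo, hi = max(0, i - window), i + window + 1
            hits += corpus[lo:hi].count(feature)
    return hits

F = np.array([[cooccurrence(w, f) for f in feature_words] for w in stimuli],
             dtype=float)
F /= np.maximum(F.sum(axis=1, keepdims=True), 1)  # normalize per word

# --- Step 2: multiple regression, one linear model per voxel ---
rng = np.random.default_rng(1)
n_voxels = 10
Y = rng.normal(size=(len(stimuli), n_voxels))  # observed fMRI images (fake)
C, *_ = np.linalg.lstsq(F, Y, rcond=None)      # cvi: least squares, which is
                                               # ML under Gaussian noise

predicted = F @ C  # one predicted image per stimulus word
print(predicted.shape)
```

Evaluation on held-out words, as in the paper, would simply fit C on a subset of the rows of F and Y and apply F_test @ C to the rest.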
This computational modeling framework is based on two key theoretical assumptions. First, it assumes the semantic features that distinguish the meanings of arbitrary concrete nouns are reflected in the statistics of their use within a very large text corpus. This assumption is drawn from the field of computational linguistics, where statistical word distributions are frequently used to approximate the meaning of documents and words (14–17). Second, it assumes that the brain activity observed when thinking about any concrete noun can be derived as a weighted linear sum of contributions from each of its semantic features. Although the correctness of this linearity assumption is debatable, it is consistent with the widespread use of linear models in fMRI analysis (27) and with the assumption that fMRI activation often reflects a linear superposition of contributions from different sources. Our theoretical framework does not take a position on whether the neural activation encoding meaning is localized in particular cortical regions. Instead, it considers all cortical voxels and allows the training data to determine which locations are systematically modulated by which aspects of word meanings.
Results. We evaluated this computational model using fMRI data from nine healthy, college-age participants who viewed 60 different word-picture pairs presented six times each. Anatomically defined regions of interest were automatically labeled according to the methodology in (28). The 60 randomly ordered stimuli included five items from each of 12 semantic categories (animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen items, tools, vegetables, vehicles, and other man-made items). A representative fMRI image for each stimulus was created by computing the mean fMRI response over its six presentations, and the mean of all 60 of these representative images was then subtracted from each [for details, see (26)].
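The preprocessing just described (one representative image per stimulus as the mean over its six presentations, then subtracting the grand mean of the 60 representative images) is easy to misread, so here is a small numpy sketch with fake data; the voxel count is a made-up placeholder.

```python
import numpy as np

rng = np.random.default_rng(2)
n_stimuli, n_presentations, n_voxels = 60, 6, 1000  # voxel count is made up

# Raw responses: one fMRI image per presentation of each stimulus.
raw = rng.normal(size=(n_stimuli, n_presentations, n_voxels))

# Representative image per stimulus: mean over its six presentations.
rep = raw.mean(axis=1)                 # shape (60, n_voxels)

# Subtract the grand mean of all 60 representative images from each.
rep_centered = rep - rep.mean(axis=0)  # broadcasts over stimuli

print(np.allclose(rep_centered.mean(axis=0), 0))  # True: voxel-wise means are zero
```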
To instantiate our modeling framework, we first chose a set of intermediate semantic features. To be effective, the intermediate semantic features must simultaneously encode the wide variety of semantic content of the input stimulus words and factor the observed fMRI activation into more primitive com-
[Fig. 2 panel labels: (A) Predicted “celery” = 0.84 × “eat” + 0.35 × “taste” + 0.32 × “fill” + …; (B) Predicted vs. Observed images for “celery” and “airplane”; color scale: high / average / below average]
Fig. 2. Predicting fMRI images for given stimulus words. (A) Forming a prediction for participant P1 for the stimulus word “celery” after training on 58 other words. Learned cvi coefficients for 3 of the 25 semantic features (“eat,” “taste,” and “fill”) are depicted by the voxel colors in the three images at the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” is shown to the left of their respective images [e.g., the value for “eat (celery)” is 0.84]. The predicted activation for the stimulus word [shown at the bottom of (A)] is a linear combination of the 25 semantic fMRI signatures, weighted by their co-occurrence values. This figure shows just one horizontal slice [z = –12 mm in Montreal Neurological Institute (MNI) space] of the predicted three-dimensional image. (B) Predicted and observed fMRI images for “celery” and “airplane” after training that uses 58 other words. The two long red and blue vertical streaks near the top (posterior region) of the predicted and observed images are the left and right fusiform gyri.
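The bottom row of Fig. 2A is just a co-occurrence-weighted sum of learned signature images. A toy numpy version: the weights 0.84 / 0.35 / 0.32 come from the caption, while the signature images and the voxel count are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
n_voxels = 1000  # one flattened slice; the size is made up

# Learned signature images for three of the features, as in Fig. 2A.
signatures = {"eat": rng.normal(size=n_voxels),
              "taste": rng.normal(size=n_voxels),
              "fill": rng.normal(size=n_voxels)}

# Co-occurrence values of "celery" with each feature (from the caption).
weights = {"eat": 0.84, "taste": 0.35, "fill": 0.32}

# Predicted image = weighted linear combination of the signatures.
predicted = sum(weights[f] * signatures[f] for f in signatures)
print(predicted.shape)
```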
To fully specify a model within this com-putational modeling framework, one must firstdefine a set of intermediate semantic featuresf1(w) f2(w)…fn(w) to be extracted from the textcorpus. In this paper, each intermediate semanticfeature is defined in terms of the co-occurrencestatistics of the input stimulus word w with aparticular other word (e.g., “taste”) or set of words(e.g., “taste,” “tastes,” or “tasted”) within the textcorpus. The model is trained by the application ofmultiple regression to these features fi(w) and theobserved fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi(26). Once trained, the computational model can beevaluated by giving it words outside the trainingset and comparing its predicted fMRI images forthese words with observed fMRI data.
This computational modeling framework isbased on two key theoretical assumptions. First, itassumes the semantic features that distinguish themeanings of arbitrary concrete nouns are reflected
in the statistics of their use within a very large textcorpus. This assumption is drawn from the field ofcomputational linguistics, where statistical worddistributions are frequently used to approximatethe meaning of documents and words (14–17).Second, it assumes that the brain activity observedwhen thinking about any concrete noun can bederived as a weighted linear sum of contributionsfrom each of its semantic features. Although thecorrectness of this linearity assumption is debat-able, it is consistent with the widespread use oflinear models in fMRI analysis (27) and with theassumption that fMRI activation often reflects alinear superposition of contributions from differentsources. Our theoretical framework does not take aposition on whether the neural activation encodingmeaning is localized in particular cortical re-gions. Instead, it considers all cortical voxels andallows the training data to determine which loca-tions are systematically modulated by which as-pects of word meanings.
Results. We evaluated this computational mod-el using fMRI data from nine healthy, college-ageparticipants who viewed 60 different word-picturepairs presented six times each. Anatomically de-fined regions of interest were automatically labeledaccording to the methodology in (28). The 60 ran-domly ordered stimuli included five items fromeach of 12 semantic categories (animals, body parts,buildings, building parts, clothing, furniture, insects,kitchen items, tools, vegetables, vehicles, and otherman-made items). A representative fMRI image foreach stimulus was created by computing the meanfMRI response over its six presentations, and themean of all 60 of these representative images wasthen subtracted from each [for details, see (26)].
To instantiate our modeling framework, we firstchose a set of intermediate semantic features. To beeffective, the intermediate semantic features mustsimultaneously encode thewide variety of semanticcontent of the input stimulus words and factor theobserved fMRI activation intomore primitive com-
Predicted“celery” = 0.84
“celery” “airplane”
Predicted:
Observed:
A B
+.. .
high
average
belowaverage
Predicted “celery”:
+ 0.35 + 0.32
“eat” “taste” “fill”
Fig. 2. Predicting fMRI imagesfor given stimulus words. (A)Forming a prediction for par-ticipant P1 for the stimulusword “celery” after training on58 other words. Learned cvi co-efficients for 3 of the 25 se-mantic features (“eat,” “taste,”and “fill”) are depicted by thevoxel colors in the three imagesat the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” isshown to the left of their respective images [e.g., the value for “eat (celery)” is0.84]. The predicted activation for the stimulus word [shown at the bottom of(A)] is a linear combination of the 25 semantic fMRI signatures, weighted bytheir co-occurrence values. This figure shows just one horizontal slice [z =
–12 mm in Montreal Neurological Institute (MNI) space] of the predictedthree-dimensional image. (B) Predicted and observed fMRI images for“celery” and “airplane” after training that uses 58 other words. The two longred and blue vertical streaks near the top (posterior region) of the predictedand observed images are the left and right fusiform gyri.
A
B
C
Mean over
participants
Participant P
5
Fig. 3. Locations ofmost accurately pre-dicted voxels. Surface(A) and glass brain (B)rendering of the correla-tion between predictedand actual voxel activa-tions for words outsidethe training set for par-
ticipant P5. These panels show clusters containing at least 10 contiguous voxels, each of whosepredicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout thecortex and located in the left and right occipital and parietal lobes; left and right fusiform,postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anteriorcingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nineparticipants. This panel represents clusters containing at least 10 contiguous voxels, each withaverage correlation of at least 0.14.
30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org1192
RESEARCH ARTICLES
on
May
30,
200
8 w
ww
.sci
ence
mag
.org
Dow
nloa
ded
from
To fully specify a model within this com-putational modeling framework, one must firstdefine a set of intermediate semantic featuresf1(w) f2(w)…fn(w) to be extracted from the textcorpus. In this paper, each intermediate semanticfeature is defined in terms of the co-occurrencestatistics of the input stimulus word w with aparticular other word (e.g., “taste”) or set of words(e.g., “taste,” “tastes,” or “tasted”) within the textcorpus. The model is trained by the application ofmultiple regression to these features fi(w) and theobserved fMRI images, so as to obtain maximum-likelihood estimates for the model parameters cvi(26). Once trained, the computational model can beevaluated by giving it words outside the trainingset and comparing its predicted fMRI images forthese words with observed fMRI data.
This computational modeling framework isbased on two key theoretical assumptions. First, itassumes the semantic features that distinguish themeanings of arbitrary concrete nouns are reflected
in the statistics of their use within a very large textcorpus. This assumption is drawn from the field ofcomputational linguistics, where statistical worddistributions are frequently used to approximatethe meaning of documents and words (14–17).Second, it assumes that the brain activity observedwhen thinking about any concrete noun can bederived as a weighted linear sum of contributionsfrom each of its semantic features. Although thecorrectness of this linearity assumption is debat-able, it is consistent with the widespread use oflinear models in fMRI analysis (27) and with theassumption that fMRI activation often reflects alinear superposition of contributions from differentsources. Our theoretical framework does not take aposition on whether the neural activation encodingmeaning is localized in particular cortical re-gions. Instead, it considers all cortical voxels andallows the training data to determine which loca-tions are systematically modulated by which as-pects of word meanings.
Results. We evaluated this computational model using fMRI data from nine healthy, college-age participants who viewed 60 different word-picture pairs presented six times each. Anatomically defined regions of interest were automatically labeled according to the methodology in (28). The 60 randomly ordered stimuli included five items from each of 12 semantic categories (animals, body parts, buildings, building parts, clothing, furniture, insects, kitchen items, tools, vegetables, vehicles, and other man-made items). A representative fMRI image for each stimulus was created by computing the mean fMRI response over its six presentations, and the mean of all 60 of these representative images was then subtracted from each [for details, see (26)].
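The averaging and mean-centering steps above can be sketched in a few lines of NumPy (shapes are illustrative only; the voxel count is arbitrary and the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 60 stimuli x 6 presentations x 5000 voxels
raw = rng.standard_normal((60, 6, 5000))

# Representative image per stimulus: mean over its six presentations
rep = raw.mean(axis=1)                 # shape (60, 5000)

# Subtract the grand-mean image (mean of all 60 representatives)
rep_centered = rep - rep.mean(axis=0)  # shape (60, 5000)
```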
To instantiate our modeling framework, we first chose a set of intermediate semantic features. To be effective, the intermediate semantic features must simultaneously encode the wide variety of semantic content of the input stimulus words and factor the observed fMRI activation into more primitive components.
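A minimal sketch of the two-step model described above (not the authors' code; all names and shapes are hypothetical): per-voxel coefficients are fit by least squares on the training words, and a new word's image is the feature-weighted linear sum of the learned signatures.

```python
import numpy as np

def train_model(F, Y):
    """Fit per-voxel coefficients c (n_feat x n_vox) by least squares,
    i.e., the maximum-likelihood estimate under Gaussian noise.
    F: (n_words, n_feat) co-occurrence features of the training words.
    Y: (n_words, n_vox) observed (mean-centered) fMRI images."""
    c, *_ = np.linalg.lstsq(F, Y, rcond=None)
    return c

def predict_image(f_w, c):
    """Predicted image for a new word w: a weighted linear sum of the
    learned per-feature voxel signatures, weighted by f(w)."""
    return f_w @ c
```

With 58 training words and 25 features, as in the paper's leave-two-out setup, `F` would be 58 x 25 and `c` would be 25 x n_vox.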
[Figure 2 image: panels (A) and (B); panel annotations: Predicted “celery” = 0.84 · “eat” + 0.35 · “taste” + 0.32 · “fill” + …; color scale: high / average / below average.]

Fig. 2. Predicting fMRI images for given stimulus words. (A) Forming a prediction for participant P1 for the stimulus word “celery” after training on 58 other words. Learned cvi coefficients for 3 of the 25 semantic features (“eat,” “taste,” and “fill”) are depicted by the voxel colors in the three images at the top of the panel. The co-occurrence value for each of these features for the stimulus word “celery” is shown to the left of their respective images [e.g., the value for “eat (celery)” is 0.84]. The predicted activation for the stimulus word [shown at the bottom of (A)] is a linear combination of the 25 semantic fMRI signatures, weighted by their co-occurrence values. This figure shows just one horizontal slice [z = –12 mm in Montreal Neurological Institute (MNI) space] of the predicted three-dimensional image. (B) Predicted and observed fMRI images for “celery” and “airplane” after training that uses 58 other words. The two long red and blue vertical streaks near the top (posterior region) of the predicted and observed images are the left and right fusiform gyri.
[Figure 3 image: panels (A)-(C); (A) and (B) show participant P5, (C) shows the mean over participants.]

Fig. 3. Locations of most accurately predicted voxels. Surface (A) and glass brain (B) rendering of the correlation between predicted and actual voxel activations for words outside the training set for participant P5. These panels show clusters containing at least 10 contiguous voxels, each of whose predicted-actual correlation is at least 0.28. These voxel clusters are distributed throughout the cortex and located in the left and right occipital and parietal lobes; left and right fusiform, postcentral, and middle frontal gyri; left inferior frontal gyrus; medial frontal gyrus; and anterior cingulate. (C) Surface rendering of the predicted-actual correlation averaged over all nine participants. This panel represents clusters containing at least 10 contiguous voxels, each with average correlation of at least 0.14.
30 MAY 2008 VOL 320 SCIENCE www.sciencemag.org 1192 (RESEARCH ARTICLES)
SKIP
CMU SCS Neuro-semantics
[Figure 4 image: four brain-activation heatmaps, one per group (Group 1, Group 2, Group 3, Group 4); the Group 3 panel highlights the Premotor Cortex. Each panel lists its top nouns and questions.]

Nouns and questions per group:
• Group 1: beetle (can it cause you pain?), pants (do you see it daily?), bee (is it conscious?)
• Group 2: bear (does it grow?), cow (is it alive?), coat (was it ever alive?)
• Group 3: glass (can you pick it up?), tomato (can you hold it in one hand?), bell (is it smaller than a golfball?)
• Group 4: bed (does it use electricity?), house (can you sit on it?), car (does it cast a shadow?)
Figure 4: Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and its activation pattern is very different from that of Group 1, which mostly refers to insects, such as bee or beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.
Given two brain-image vectors v1 and v2 which were withheld from the training data, the leave-two-out scheme measures prediction accuracy by the ability to choose which of the observed brain images corresponds to which of the two words. After mean-centering the vectors, this classification decision is made according to the following rule:

‖v1 − v̂1‖2 + ‖v2 − v̂2‖2 < ‖v1 − v̂2‖2 + ‖v2 − v̂1‖2
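The decision rule above can be sketched directly (a minimal illustration, not the authors' code):

```python
import numpy as np

def leave_two_out_correct(v1, v2, v1_hat, v2_hat):
    """True if the predicted images (v1_hat, v2_hat) match the two
    held-out observed images (v1, v2) better than the swapped
    assignment, after mean-centering each vector."""
    v1, v2, v1_hat, v2_hat = (v - v.mean() for v in (v1, v2, v1_hat, v2_hat))
    d = np.linalg.norm
    return d(v1 - v1_hat) + d(v2 - v2_hat) < d(v1 - v2_hat) + d(v2 - v1_hat)
```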
Although our approach is not designed to make predictions, preliminary results are very encouraging: using only F=2 components, for the noun pair closet/watch we obtained mean accuracy of about 0.82 for 5 out of the 9 human subjects. Similarly, for the pair knife/beetle, we achieved accuracy of about 0.8 for a somewhat different group of 5 subjects. For the rest of the human subjects, the accuracy is considerably lower; however, it may be the case that brain-activity predictability varies between subjects, a fact that requires further investigation.
5 Experiments
We implemented Turbo-SMT in Matlab. Our implementation of the code is publicly available.[5] For the parallelization of the algorithm, we used Matlab's Parallel Computing Toolbox. For tensor manipulation, we used the Tensor Toolbox for Matlab [7], which is optimized especially for sparse tensors (but works very well for dense ones too). We use the ALS and CMTF-OPT [5] algorithms as baselines, i.e., we compare Turbo-SMT, when using one of those algorithms as its core CMTF implementation, against the plain execution of those algorithms. We implemented our own version of the ALS algorithm, and we used the CMTF Toolbox[6] implementation of CMTF-OPT. We use CMTF-OPT for higher ranks, since that particular algorithm is more accurate than ALS and is the state of the art. All experiments were carried out on a machine with 4 Intel Xeon E7-4850 2.00GHz CPUs and 512GB of RAM. Whenever we conducted multiple iterations of an experiment (due to the randomized nature of Turbo-SMT), we report error bars along the plots. For all the following experiments we used either portions of the BrainQ dataset or the whole dataset.

[5] http://www.cs.cmu.edu/~epapalex/src/turbo_smt.zip

5.1 Speedup. As we have already discussed in the Introduction and shown in Fig. 1, Turbo-SMT achieves a speedup of 50-200 on the BrainQ dataset. For all cases, the approximation cost is either the same as the baselines' or larger by a small factor, indicating that Turbo-SMT is both fast and accurate. Key facts that

[6] http://www.models.life.ku.dk/joda/CMTF_Toolbox
117 Amazon'16 (c) 2016, C. Faloutsos
= SKIP
CMU SCS Neuro-semantics
118 Amazon'16 (c) 2016, C. Faloutsos
Small items -> Premotor cortex
= SKIP
CMU SCS Neuro-semantics
119 Amazon'16 (c) 2016, C. Faloutsos
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Small items -> Premotor cortex
SKIP
CMU SCS
Part 2: Conclusions
• Time-evolving / heterogeneous graphs -> tensors
• PARAFAC finds patterns • (GigaTensor/HaTen2 -> fast & scalable)
Amazon'16 120 (c) 2016, C. Faloutsos
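As a concrete illustration of the PARAFAC idea (a minimal NumPy sketch, not GigaTensor/HaTen2): rank-F CP decomposition of a 3-way tensor via alternating least squares, where each factor triple (A[:,f], B[:,f], C[:,f]) is one pattern, e.g., a group of sources, destinations, and time ticks that act together.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (I, F) x (J, F) -> (I*J, F)."""
    F = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, F)

def cp_als(X, F, n_iter=300, seed=0):
    """Rank-F CP/PARAFAC decomposition of a 3-way tensor X via ALS.
    Returns factor matrices A (IxF), B (JxF), C (KxF) with
    X[i,j,k] ~ sum_f A[i,f] * B[j,f] * C[k,f]."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, F))
    B = rng.standard_normal((J, F))
    C = rng.standard_normal((K, F))
    # Mode-n unfoldings (C-order reshapes, consistent with khatri_rao)
    X1 = X.reshape(I, J * K)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

Plain ALS like this is fine for toy sizes; the scalability point of GigaTensor/HaTen2 is precisely that the Khatri-Rao products above become the bottleneck on billion-scale sparse tensors.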
=
CMU SCS
(c) 2016, C. Faloutsos 121
Roadmap
• Introduction – Motivation – Why study (big) graphs?
• Part#1: Patterns in graphs • Part#2: time-evolving graphs; tensors • Acknowledgements and Conclusions
Amazon'16
CMU SCS
(c) 2016, C. Faloutsos 122
Thanks
Amazon'16
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
CMU SCS
(c) 2016, C. Faloutsos 124
Cast
Akoglu, Leman
Chau, Polo
Kang, U
Hooi, Bryan
Amazon'16
Koutra, Danai
Beutel, Alex
Papalexakis, Vagelis
Shah, Neil
Araujo, Miguel
Song, Hyun Ah
Eswaran, Dhivya
Shin, Kijung
CMU SCS
(c) 2016, C. Faloutsos 125
CONCLUSION#1 – Big data • Patterns Anomalies
• Large datasets reveal patterns/outliers that are invisible otherwise
Amazon'16
CMU SCS
(c) 2016, C. Faloutsos 126
CONCLUSION#2 – tensors
• powerful tool
Amazon'16
=
1 caller 5 receivers 4 days of activity
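The "1 caller, 5 receivers, 4 days of activity" pattern is exactly a rank-1 tensor component: an outer product of three indicator vectors (toy sizes assumed here):

```python
import numpy as np

# Toy phone-call tensor: one rank-1 component captures a dense block
# of "1 caller x 5 receivers x 4 days" of repeated activity.
a = np.zeros(10); a[3] = 1.0              # caller indicator (1 caller)
b = np.zeros(10); b[[0, 1, 2, 3, 4]] = 1.0  # receiver indicator (5 receivers)
c = np.zeros(7);  c[[2, 3, 4, 5]] = 1.0   # day indicator (4 active days)

X = np.einsum('i,j,k->ijk', a, b, c)      # outer product a o b o c
assert X.sum() == 20                      # 1 * 5 * 4 calls in the block
```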
CMU SCS
(c) 2016, C. Faloutsos 127
References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012
• http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006
Amazon'16
CMU SCS
TAKE HOME MESSAGE: Cross-disciplinarity
Amazon'16 (c) 2016, C. Faloutsos 128
=
CMU SCS
My goals wrt sabbatical @ amzn • Win-win setting: • For Amazon: state-of-the-art methods for
anomaly detection • For my students: focus on real-world
research problems
Amazon'16 (c) 2016, C. Faloutsos 129