104
Predictive Social Network Analysis with Multi-Modal Data Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Wei Li, Xuerui Wang, Andres Corrada, Chris Pal, Natasha Mohanty, David Mimno, Gideon Mann.

Predictive Social Network Analysis with Multi-Modal Data

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Predictive Social Network Analysis with Multi-Modal Data

PredictiveSocial Network Analysis

with Multi-Modal Data

Andrew McCallum

Computer Science DepartmentUniversity of Massachusetts Amherst

Joint work with Wei Li, Xuerui Wang, Andres Corrada,Chris Pal, Natasha Mohanty, David Mimno, Gideon Mann.

Page 2: Predictive Social Network Analysis with Multi-Modal Data

Social Network in an Email Dataset

Page 3: Predictive Social Network Analysis with Multi-Modal Data

Social Network in an Email Dataset

From: [email protected]: NIPS and ....Date: June 14, 2004 2:27:41 PM EDTTo: [email protected]

There is pertinent stuff on the first yellow folder that iscompleted either travel or other things, so please sign thatfirst folder anyway. Then, here is the reminder of the thingsI'm still waiting for:

NIPS registration receipt.CALO registration receipt.

Thanks,Kate

Page 4: Predictive Social Network Analysis with Multi-Modal Data

Social Network Analysis(Pattern Discovery in Networks)

Network Connectivity Network Connectivity + Many Attributes

Descriptive Predictive & Prescriptive

Text, Timestamps, Authors,...

Generative, Latent Variable Models

Direct Measures, Agglomerative & Spectral Clustering,...

P(edge|...) P(attribute|...), P(group|...)P(collaborator|...)

Small World, Betweeness Centrality,...

Data:

Objective:

Methodology:

Page 5: Predictive Social Network Analysis with Multi-Modal Data

A Probabilistic Approach

• Define a probabilistic generativemodel for documents.

• Learn the parameters of thismodel by fitting them to the dataand a prior.

Page 6: Predictive Social Network Analysis with Multi-Modal Data

Clustering words into topics withLatent Dirichlet Allocation

[Blei, Ng, Jordan 2003]

Sample a distributionover topics, θ

For each document:

Sample a topic, z

For each word in doc

Sample a wordfrom the topic, w

Example:

70% Iraq war30% US election

Iraq war

“bombing”

GenerativeProcess:

Page 7: Predictive Social Network Analysis with Multi-Modal Data

STORYSTORIESTELL

CHARACTERCHARACTERSAUTHORREADTOLDSETTINGTALESPLOT

TELLINGSHORTFICTIONACTIONTRUEEVENTSTELLSTALENOVEL

MINDWORLDDREAMDREAMSTHOUGHT

IMAGINATIONMOMENTTHOUGHTSOWNREALLIFE

IMAGINESENSE

CONSCIOUSNESSSTRANGEFEELINGWHOLEBEINGMIGHTHOPE

WATERFISHSEASWIM

SWIMMINGPOOLLIKESHELLSHARKTANKSHELLSSHARKSDIVINGDOLPHINSSWAMLONGSEALDIVE

DOLPHINUNDERWATER

DISEASEBACTERIADISEASESGERMSFEVERCAUSECAUSEDSPREADVIRUSESINFECTIONVIRUS

MICROORGANISMSPERSON

INFECTIOUSCOMMONCAUSINGSMALLPOXBODY

INFECTIONSCERTAIN

Example topicsinduced from a large collection of text

FIELDMAGNETICMAGNETWIRENEEDLECURRENTCOILPOLESIRON

COMPASSLINESCORE

ELECTRICDIRECTIONFORCE

MAGNETSBE

MAGNETISMPOLE

INDUCED

SCIENCESTUDY

SCIENTISTSSCIENTIFICKNOWLEDGE

WORKRESEARCHCHEMISTRYTECHNOLOGY

MANYMATHEMATICSBIOLOGYFIELDPHYSICS

LABORATORYSTUDIESWORLDSCIENTISTSTUDYINGSCIENCES

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERSPLAYFIELDPLAYER

BASKETBALLCOACHPLAYEDPLAYINGHIT

TENNISTEAMSGAMESSPORTSBATTERRY

JOBWORKJOBS

CAREEREXPERIENCEEMPLOYMENTOPPORTUNITIESWORKINGTRAININGSKILLSCAREERSPOSITIONSFIND

POSITIONFIELD

OCCUPATIONSREQUIRE

OPPORTUNITYEARNABLE

[Tennenbaum et al]

Page 8: Predictive Social Network Analysis with Multi-Modal Data

STORYSTORIESTELL

CHARACTERCHARACTERSAUTHORREADTOLDSETTINGTALESPLOT

TELLINGSHORTFICTIONACTIONTRUEEVENTSTELLSTALENOVEL

MINDWORLDDREAMDREAMSTHOUGHT

IMAGINATIONMOMENTTHOUGHTSOWNREALLIFE

IMAGINESENSE

CONSCIOUSNESSSTRANGEFEELINGWHOLEBEINGMIGHTHOPE

WATERFISHSEASWIM

SWIMMINGPOOLLIKESHELLSHARKTANKSHELLSSHARKSDIVINGDOLPHINSSWAMLONGSEALDIVE

DOLPHINUNDERWATER

DISEASEBACTERIADISEASESGERMSFEVERCAUSECAUSEDSPREADVIRUSESINFECTIONVIRUS

MICROORGANISMSPERSON

INFECTIOUSCOMMONCAUSINGSMALLPOXBODY

INFECTIONSCERTAIN

FIELDMAGNETICMAGNETWIRENEEDLECURRENTCOILPOLESIRON

COMPASSLINESCORE

ELECTRICDIRECTIONFORCE

MAGNETSBE

MAGNETISMPOLE

INDUCED

SCIENCESTUDY

SCIENTISTSSCIENTIFICKNOWLEDGE

WORKRESEARCHCHEMISTRYTECHNOLOGY

MANYMATHEMATICSBIOLOGYFIELDPHYSICS

LABORATORYSTUDIESWORLDSCIENTISTSTUDYINGSCIENCES

BALLGAMETEAM

FOOTBALLBASEBALLPLAYERSPLAYFIELDPLAYER

BASKETBALLCOACHPLAYEDPLAYINGHIT

TENNISTEAMSGAMESSPORTSBATTERRY

JOBWORKJOBS

CAREEREXPERIENCEEMPLOYMENTOPPORTUNITIESWORKINGTRAININGSKILLSCAREERSPOSITIONSFIND

POSITIONFIELD

OCCUPATIONSREQUIRE

OPPORTUNITYEARNABLE

Example topicsinduced from a large collection of text

[Tennenbaum et al]

Page 9: Predictive Social Network Analysis with Multi-Modal Data

Outline

• Social Network Analysis

– Roles (Author-Recipient-Topic Model)

– Groups (Group-Topic Model)

– Trends over time (Topics-over-Time Model, TOT)

– Preferential Attachment (Community-Author-Topic, CAT)

• Undirected Graphical Models– Flexible Objective Functions (Multi-Conditional Learning, MCL)

– Topics for Prediction (Multinomial-Components-Analysis, MCA)

• Demo: Rexa, a Web portal for researchers– Topical Impact Measures (Diversity,...)

Page 10: Predictive Social Network Analysis with Multi-Modal Data

From LDA to Author-Recipient-Topic(ART)

Page 11: Predictive Social Network Analysis with Multi-Modal Data

Inference and Estimation

Gibbs Sampling:- Easy to implement- Reasonably fast

r r

Page 12: Predictive Social Network Analysis with Multi-Modal Data

Enron Email Corpus

• 250k email messages• 23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)From: [email protected]: [email protected]: Enron/TransAltaContract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested anelectronic copy of our final draft? Are you OK with this? Ifso, the only version I have is the original draft withoutrevisions.

DP

Debra PerlingiereEnron North America Corp.Legal Department1400 Smith Street, EB 3885Houston, Texas [email protected]

Page 13: Predictive Social Network Analysis with Multi-Modal Data

Topics, and prominent senders / receiversdiscovered by ARTTopic names,

by hand

Page 14: Predictive Social Network Analysis with Multi-Modal Data

Topics, and prominent senders / receiversdiscovered by ART

Beck = “Chief Operations Officer”Dasovich = “Government Relations Executive”Shapiro = “Vice President of Regulatory Affairs”Steffes = “Vice President of Government Affairs”

Page 15: Predictive Social Network Analysis with Multi-Modal Data

Comparing Role Discovery

connection strength (A,B) =

distribution overauthored topics

Traditional SNA

distribution overrecipients

distribution overauthored topics

Author-TopicART

Page 16: Predictive Social Network Analysis with Multi-Modal Data

Comparing Role Discovery Tracy Geaconne ⇔ Dan McCarty

Traditional SNA Author-TopicART

Similar roles Different rolesDifferent roles

Geaconne = “Secretary”McCarty = “Vice President”

Page 17: Predictive Social Network Analysis with Multi-Modal Data

Traditional SNA Author-TopicART

Different roles Very differentVery similar

Blair = “Gas pipeline logistics”Watson = “Pipeline facilities planning”

Comparing Role Discovery Lynn Blair ⇔ Kimberly Watson

Page 18: Predictive Social Network Analysis with Multi-Modal Data

McCallum Email Corpus 2004

• January - October 2004• 23k email messages• 825 people

From: [email protected]: NIPS and ....Date: June 14, 2004 2:27:41 PM EDTTo: [email protected]

There is pertinent stuff on the first yellow folder that iscompleted either travel or other things, so please sign thatfirst folder anyway. Then, here is the reminder of the thingsI'm still waiting for:

NIPS registration receipt.CALO registration receipt.

Thanks,Kate

Page 19: Predictive Social Network Analysis with Multi-Modal Data

McCallum Email Blockstructure

Page 20: Predictive Social Network Analysis with Multi-Modal Data

Four most prominent topicsin discussions with ____?

Page 21: Predictive Social Network Analysis with Multi-Modal Data
Page 22: Predictive Social Network Analysis with Multi-Modal Data

Two most prominent topicsin discussions with ____?

Words Prob

love 0.030514

house 0.015402

0.013659

time 0.012351

great 0.011334

hope 0.011043

dinner 0.00959

saturday 0.009154

left 0.009154

ll 0.009009

0.008282

visit 0.008137

evening 0.008137

stay 0.007847

bring 0.007701

weekend 0.007411

road 0.00712

sunday 0.006829

kids 0.006539

flight 0.006539

Words Prob

today 0.051152

tomorrow 0.045393

time 0.041289

ll 0.039145

meeting 0.033877

week 0.025484

talk 0.024626

meet 0.023279

morning 0.022789

monday 0.020767

back 0.019358

call 0.016418

free 0.015621

home 0.013967

won 0.013783

day 0.01311

hope 0.012987

leave 0.012987

office 0.012742

tuesday 0.012558

Topic 1 Topic 2

Page 23: Predictive Social Network Analysis with Multi-Modal Data

Role-Author-Recipient-Topic Models

Page 24: Predictive Social Network Analysis with Multi-Modal Data

Results with RART:People in “Role #3” in Academic Email

• olc lead Linux sysadmin• gauthier sysadmin for CIIR group• irsystem mailing list CIIR sysadmins• system mailing list for dept. sysadmins• allan Prof., chair of “computing committee”• valerie second Linux sysadmin• tech mailing list for dept. hardware• steve head of dept. I.T. support

Page 25: Predictive Social Network Analysis with Multi-Modal Data

Roles for allan (James Allan)

• Role #3 I.T. support• Role #2 Natural Language researcher

Roles for pereira (Fernando Pereira)

•Role #2 Natural Language researcher•Role #4 SRI CALO project participant•Role #6 Grant proposal writer•Role #10 Grant proposal coordinator•Role #8 Guests at McCallum’s house

Page 26: Predictive Social Network Analysis with Multi-Modal Data

Traditional SNA Author-TopicART

Block structured NotNot

ART: Roles but not Groups

Enron TransWestern Division

Page 27: Predictive Social Network Analysis with Multi-Modal Data

Outline• Social Network Analysis

– Roles (Author-Recipient-Topic Model)

– Groups (Group-Topic Model)

– Trends over time (Topics-over-Time Model, TOT)

– Preferential Attachment (Community-Author-Topic, CAT)

• Undirected Graphical Models– Flexible Objective Functions (Multi-Conditional Learning, MCL)

– Topics for Prediction (Multinomial-Components-Analysis, MCA)

• Demo: Rexa, a Web portal for researchers– Topical Impact Measures (Diversity,...)

Page 28: Predictive Social Network Analysis with Multi-Modal Data

Groups and Topics

• Input:– Observed relations between people– Attributes on those relations (text, or categorical)

• Output:– Attributes clustered into “topics”– Groups of people---varying depending on topic

Page 29: Predictive Social Network Analysis with Multi-Modal Data

Adjacency Matrix Representing Relations

FEDCBA

FEDCBAFEDCBA

G3G3G2G1G2G1

G3G3G2G1G2G1

FEDCBA

FEDBCA

G3G3G2G2G1G1

G3G3G2G2G1G1

FEDBCA

Student Roster

AdamsBennettCarterDavisEdwardsFrederking

Academic Admiration

Acad(A, B) Acad(C, B)Acad(A, D) Acad(C, D)Acad(B, E) Acad(D, E)Acad(B, F) Acad(D, F)Acad(E, A) Acad(F, A)Acad(E, C) Acad(F, C)

Page 30: Predictive Social Network Analysis with Multi-Modal Data

Group Model:Partitioning Entities into Groups

2S

v

!

2G

! ! !

Stochastic Blockstructures for Relations[Nowicki, Snijders 2001]

S: number of entities

G: number of groups

Enhanced with arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]

BetaDirichlet

Binomial

S

gMultinomial

Page 31: Predictive Social Network Analysis with Multi-Modal Data

Two Relations with Different Attributes

FEDBCA

G3G3G2G2G1G1

G3G3G2G2G1G1FDBECA

G2G2G2G1G1G1

G2G2G2G1G1G1

FDBECA

Student Roster

AdamsBennettCarterDavisEdwardsFrederking

Academic Admiration

Acad(A, B) Acad(C, B)Acad(A, D) Acad(C, D)Acad(B, E) Acad(D, E)Acad(B, F) Acad(D, F)Acad(E, A) Acad(F, A)Acad(E, C) Acad(F, C)

Social Admiration

Soci(A, B) Soci(A, D) Soci(A, F)Soci(B, A) Soci(B, C) Soci(B, E)Soci(C, B) Soci(C, D) Soci(C, F)Soci(D, A) Soci(D, C) Soci(D, E)Soci(E, B) Soci(E, D) Soci(E, F)Soci(F, A) Soci(F, C) Soci(F, E)

FEDBCA

Page 32: Predictive Social Network Analysis with Multi-Modal Data

The Group-Topic Model:Discovering Groups and Topics Simultaneously

bN

w

t

B

T

!

!

DirichletMultinomial

Uniform

2S

v

!

2G

! ! !

BetaDirichlet

Binomial

S

gMultinomial

T

Page 33: Predictive Social Network Analysis with Multi-Modal Data

Dataset #1:U.S. Senate

• 16 years of voting records in the US Senate (1989 – 2005)

• a Senator may respond Yea or Nay to a resolution

• 3423 resolutions with text attributes (index terms)

• 191 Senators in total across 16 yearsS.543Title: An Act to reform Federal deposit insurance, protect the deposit insurancefunds, recapitalize the Bank Insurance Fund, improve supervision and regulationof insured depository institutions, and for other purposes.Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991) Cosponsors (2)Latest Major Action: 12/19/1991 Became Public Law No: 102-242.Index terms: Banks and banking Accounting Administrative fees Cost controlCredit Deposit insurance Depressed areas and other 110 terms

Adams (D-WA), Nay Akaka (D-HI), Yea Bentsen (D-TX), Yea Biden (D-DE), YeaBond (R-MO), Yea Bradley (D-NJ), Nay Conrad (D-ND), Nay ……

Page 34: Predictive Social Network Analysis with Multi-Modal Data

Topics Discovered (U.S. Senate)

carepolicypollutionpreventionemployeelawresearchelementarybusinessaidpetrolstudentstaxcongressgasdrugaidtaxnuclearchildren

insuranceforeignwateraidlabormilitarypowerschoolfederalgovernmentenergyeducation

EconomicMilitaryMisc.EnergyEducation

Mixture of Unigrams

Group-Topic Model

assistancebusinessdiseasesresearchdisabilitywagecommunicableenergymedicareminimumdrugstaxcareincomecongressgovernmentmedicalcongresstariffaidinsurancetaxchemicalsfederalsecurityinsurancetradeschoolsociallaborforeigneducation

Social Security+ Medicare

EconomicForeignEducation+ Domestic

Page 35: Predictive Social Network Analysis with Multi-Modal Data

Groups Discovered (US Senate)

Groups from topic Education + Domestic

Page 36: Predictive Social Network Analysis with Multi-Modal Data

Senators Who Change Coalition the mostDependent on Topic

e.g. Senator Shelby (D-AL) votes with the Republicans on Economicwith the Democrats on Education + Domesticwith a small group of maverick Republicans on Social Security + Medicaid

Page 37: Predictive Social Network Analysis with Multi-Modal Data

Dataset #2:The UN General Assembly

• Voting records of the UN General Assembly (1990 - 2003)

• A country may choose to vote Yes, No or Abstain

• 931 resolutions with text attributes (titles)

• 192 countries in total

• Also experiments later with resolutions from 1960-2003

Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting

The draft resolution on permanent sovereignty of the Palestinian people in theoccupied Palestinian territory, including Jerusalem, and of the Arab population inthe occupied Syrian Golan over their natural resources (document A/54/591)was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions:

In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France,Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama,Russian Federation, South Africa, Spain, Turkey, and other 126 countries.Against: Israel, Marshall Islands, United States.Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.

Page 38: Predictive Social Network Analysis with Multi-Modal Data

Topics Discovered (UN)

callsisraelcountriessecuritysituationimplementationsyriapalestineuseisraelhumanweaponsoccupiedrightsnuclear

Securityin Middle East

Human RightsEverythingNuclear

Mixture ofUnigrams

Group-TopicModel

israelspacenationsoccupiedraceweaponspalestinepreventionunitedhumanarmsstatesrightsnuclearnuclear

Human RightsNuclear ArmsRace

NuclearNon-proliferation

Page 39: Predictive Social Network Analysis with Multi-Modal Data

GroupsDiscovered(UN)The countries list for eachgroup are ordered by their2005 GDP (PPP) and only 5countries are shown ingroups that have more than5 members.

Page 40: Predictive Social Network Analysis with Multi-Modal Data

Groups and Topics, Trends over Time (UN)

Page 41: Predictive Social Network Analysis with Multi-Modal Data

Outline• Social Network Analysis

– Roles (Author-Recipient-Topic Model)

– Groups (Group-Topic Model)

– Trends over time (Topics-over-Time Model, TOT)

– Path Analysis (Topical Sequence Model, TSM)

– Preferential Attachment (Community-Author-Topic, CAT)

• Undirected Graphical Models– Flexible Objective Functions (Multi-Conditional Learning, MCL)

– Topics for Prediction (Multinomial-Components-Analysis, MCA)

• Demo: Rexa, a Web portal for researchers– Topical Impact Measures (Diversity,...)

Page 42: Predictive Social Network Analysis with Multi-Modal Data

Groups and Topics, Trends over Time (UN)

Page 43: Predictive Social Network Analysis with Multi-Modal Data

Want to Model Trends over Time

• Pattern appears only briefly– Capture its statistics in focused way– Don’t confuse it with patterns elsewhere in time

• Is prevalence of topic growing or waning?

• How do roles, groups, influence shift over time?

Page 44: Predictive Social Network Analysis with Multi-Modal Data

Topics over Time (TOT)

θ

w t

α

Nd

z

D

φ

TBetaover time

Multinomialover words

β γ

Dirichlet

multinomialover topics

topicindex

wordtime

stamp

Dirichletprior

Uniformprior

[Wang, McCallum, KDD 2006]

Page 45: Predictive Social Network Analysis with Multi-Modal Data

State of the Union Address208 Addresses delivered between January 8, 1790 and January 29, 2002.To increase the number of documents, we split the addresses into paragraphsand treated them as ‘documents’. One-line paragraphs were excluded.Stopping was applied.

• 17156 ‘documents’

• 21534 words

• 669,425 tokens

Our scheme of taxation, by means of which this needless surplus is takenfrom the people and put into the public Treasury, consists of a tariff orduty levied upon importations from abroad and internal-revenue taxes leviedupon the consumption of tobacco and spirituous and malt liquors. It must beconceded that none of the things subjected to internal-revenue taxationare, strictly speaking, necessaries. There appears to be no just complaintof this taxation by the consumers of these articles, and there seems to benothing so well able to bear the burden without hardship to any portion ofthe people.

1910

Page 46: Predictive Social Network Analysis with Multi-Modal Data

Comparing

TOT

against

LDA

Page 47: Predictive Social Network Analysis with Multi-Modal Data

TOT

versus

LDA

on myemail

Page 48: Predictive Social Network Analysis with Multi-Modal Data

Topic Distributions Conditioned on Time

time

topi

c m

ass

(in v

ertic

al h

eigh

t)in N

IPS conference papers

Page 49: Predictive Social Network Analysis with Multi-Modal Data

Outline• Social Network Analysis

– Roles (Author-Recipient-Topic Model)

– Groups (Group-Topic Model)

– Trends over time (Topics-over-Time Model, TOT)

– Preferential Attachment (Community-Author-Topic, CAT)

• Undirected Graphical Models– Flexible Objective Functions (Multi-Conditional Learning, MCL)

– Topics for Prediction (Multinomial-Components-Analysis, MCA)

• Demo: Rexa, a Web portal for researchers– Topical Impact Measures (Diversity,...)

Page 50: Predictive Social Network Analysis with Multi-Modal Data

How do new links form in social networks?

1) Randomly (Poisson graph)2) Pick someone popular (Preferential attachment)3) Pick someone with mutual friends

(Adamic & Adar, Liben-Nowell & Kleinberg)

4) Pick someone from one of your “communities”(Mimno, Wallach & McCallum 2007)Can we find communities that help predict links?

Page 51: Predictive Social Network Analysis with Multi-Modal Data

A Community-based Generative Model forText and Co-authorships

1) To generate a document,we first pick a community.

2) The community thendetermines the choice ofauthors and topics.

3) From topics, we pickwords.

Community

Authors

Topics

Words

Page 52: Predictive Social Network Analysis with Multi-Modal Data

A Community-based Generative Model forText and Co-authorships

Graphical Model can answervarious queries!

P(author3 | author1, author2)P(author3 | author1, author2, text)

P(community | authors)P (authors | community)P (text | community)P (text | authors)

Community

Authors

Topics

Words

Page 53: Predictive Social Network Analysis with Multi-Modal Data

(Preferential attachment is much worse, at -40,121.)

Link PredictionProbability of NIPS 2004-6 Co-authorships

Page 54: Predictive Social Network Analysis with Multi-Modal Data

Community-Author Viewfeatures, feature, markov, sequence, models, conditional, label, function, setnumber, results, paper, based, function, previous, resulting, introduction, generalpolicy, learning, action, states, function, reward, actions, optimal, mdpcontrol, controller, model, helicopter, system, neural, forward, learning, systemsmodel, models, press, shows, figure, related, journal, underlying, correspondpresent, effect, figure, references, important, increase, similar, addition, increasedlearning, control, reinforcement, sutton, action, space, task, trajectory, methods

Ng_AKoller_DParr_RAbbeel_PJordan_MMerzenich_MMel_B

propagation, belief, tree, nodes, node, approximation, variational, networks, boundnumber, results, paper, based, function, previous, resulting, introduction, generaltheorem, case, proof, function, assume, set, section, algorithm, boundfield, boltzmann, approximations, exact, jordan, parameters, set, step, networklog, models, inference, variables, model, distribution, variational, parameters, matrixproblem, algorithm, optimization, methods, solution, method, problems, proposed, optimalclustering, spectral, graph, matrix, cut, data, clusters, eigenvectors, normalized

Jordan_MJaakkola_TSaul_LBach_F_RSingh_SWainwright_MNguyen_X

Page 55: Predictive Social Network Analysis with Multi-Modal Data

Community-Author-Topic Viewwords, model, word, documents, document, text, topic, distribution, mixturesuffix, algorithm, feature, adaptor, space, model, kernels, strings, naturallearning, category, naive, definition, estimation, single, figure, applied, obtainset, labels, analysis, adclus, pmm, function, evaluation, problem, alphabetnumber, results, paper, based, function, previous, resulting, introduction, generalprior, posterior, distribution, bayesian, likelihood, data, models, probability, modeltarget, task, visual, figure, contrast, attention, search, orientation, discrimination

Griffiths_T_LSinger_YBlei_DGoldwater_SJordan_MJohnson_MCampbell_W

propagation, belief, tree, nodes, node, approximation, variational, networks, boundfield, boltzmann, approximations, exact, jordan, parameters, set, step, networklog, models, inference, variables, model, distribution, variational, parameters, matrixnetwork, variables, node, inference, distribution, nodes, algorithm, message, treenumber, results, paper, based, function, previous, resulting, introduction, generaltheorem, case, proof, function, assume, set, section, algorithm, boundmixture, data, gaussian, density, likelihood, parameters, distribution, model, function

Jordan_MWillsky_AJaakkola_TSaul_LWiegerinck_WKappen_HWainwright_M

control, motor, learning, arm, model, movement, feedback, movements, handeye, vor, visual, desired, field, controller, force, cerebellum, vestibularneural, data, activity, figure, firing, movement, motor, speech, dynamicspresent, effect, figure, references, important, increase, similar, addition, increasedfinger, data, learning, shift, rbfs, pattern, manipulated, scaling, modulesvisual, corrective, performance, generalization, neural, figure, neurons, network, learningmodel, models, press, shows, figure, related, journal, underlying, correspond

Kawato_MJordan_MBarto_AVatikiotisShadmehr_RHirayama_MWolpert_D

Page 56: Predictive Social Network Analysis with Multi-Modal Data

Outline• Social Network Analysis

– Roles (Author-Recipient-Topic Model)

– Groups (Group-Topic Model)

– Trends over time (Topics-over-Time Model, TOT)

– Preferential Attachment (Community-Author-Topic, CAT)

• Undirected Graphical Models– Flexible Objective Functions (Multi-Conditional Learning, MCL)

– Topics for Prediction (Multinomial-Components-Analysis, MCA)

• Demo: Rexa, a Web portal for researchers– Topical Impact Measures (Diversity,...)

Page 57: Predictive Social Network Analysis with Multi-Modal Data

Want a “topic model” withthe advantages of

Conditional Random Fields• Use arbitrary, overlapping features of the input.

• Undirected graphical model, so we don’t have tothink about avoiding cycles.

• Integrate naturally with our other CRFcomponents.

• Train “discriminatively”

• Natural semi-supervised training

What does this mean?

Topic models are unsupervised!

Page 58: Predictive Social Network Analysis with Multi-Modal Data

“Multi-Conditional Mixtures”Latent Variable Models fit by Multi-way Conditional Probability

• For clustering structured data,ala Latent Dirichlet Allocation & its successors

• But an undirected model,like the Harmonium [Welling, Rosen-Zvi, Hinton, 2005]

• But trained by a “multi-conditional” objective: O = P(A|B,C) P(B|A,C) P(C|A,B)e.g. A,B,C are different modalities

[McCallum, Wang, Pal, 2005], [McCallum, Pal, Wang, 2006]

Page 59: Predictive Social Network Analysis with Multi-Modal Data

“Multi-Conditional Learning” (Regularization)[McCallum, Pal, Wang, 2006]

Page 60: Predictive Social Network Analysis with Multi-Modal Data

Multi-Conditional Mixtures

Page 61: Predictive Social Network Analysis with Multi-Modal Data

Predictive Random Fieldsmixture of Gaussians on synthetic data

Data, classify by color Generatively trained

Conditionally-trained [Jebara 1998]

Multi-Conditional

[McCallum, Wang, Pal, 2005]

Page 62: Predictive Social Network Analysis with Multi-Modal Data

Multi-Conditional Mixturesvs. Harmoniun

on document retrieval task

Harmonium, joint withwords, no labels

Harmonium, joint,with class labelsand words

Conditionally-trained,to predict class labels

Multi-Conditional,multi-way conditionallytrained

[McCallum, Wang, Pal, 2005]

Page 63: Predictive Social Network Analysis with Multi-Modal Data

Multiple Topics per Document:“Multinomial Components Analysis” (MCA)

tN

L

wN

w

1t

2w L

wNw

1w

ttNt

Plate Notation

Nw words

Nt topics

dN

• Undirected (Random Field) Topic Model• Conditionally log-Normal Topics• Conditionally Multinomial Words

dN

T

z

!

w

!

!!

wN

Contrastw/ LDA

Page 64: Predictive Social Network Analysis with Multi-Modal Data

Further Contrast - MCA, PCA, RAP

tN L

wN

w

1z

2x L

xNx

1x

t

zNz

Nx observed, Gaussianvariables, fixed dimension

Nb binary topics

dN

• Multinomial Component Analysis (MCA)• Principal Component Analysis (PCA)• Rate Adapting Poisson (RAP) Model

L1b

2v L

vNv

1v

bNb

Nv Poisson counts foreach word in vocabulary

Nz unobserved,Gaussian variables

MCA PCA RAP

Nw draws from a discretedistribution, (words in doc)

Page 65: Predictive Social Network Analysis with Multi-Modal Data

Our Model (MCA) vs. TFIDF vs. RAP

• Precision vs. Recall on 20 Newsgroups, 100 word vocabulary• 20 dimensional hidden topic space• Cosine Distance Comparisons (.9, .1 – Train, Test Split)• Compared with TFIDF and Rate Adapting Poisson (RAP) Model

MRR Method

.45 Our Model

.37 TFIDF

.33 RAP

Page 66: Predictive Social Network Analysis with Multi-Modal Data

NIPS Topics

Page 67: Predictive Social Network Analysis with Multi-Modal Data

Richer Model with Multiple Modalities

L

L

1t

aNa L

wNwy

1a

1w

Nw authors, year, Nw words

tNt

Page 68: Predictive Social Network Analysis with Multi-Modal Data

Predicting NIPS Authors

• Comparing Models, Mean Reciprocal Rank (MRR)• Cosine Distance Comparisons (.9, .1 – Train, Test Split)

MRR Method

.88 Discriminative

.46 Joint

.25 Joint, Words only

Page 69: Predictive Social Network Analysis with Multi-Modal Data

Outline• Social Network Analysis

– Roles (Author-Recipient-Topic Model)

– Groups (Group-Topic Model)

– Trends over time (Topics-over-Time Model, TOT)

– Preferential Attachment (Community-Author-Topic, CAT)

• Undirected Graphical Models– Flexible Objective Functions (Multi-Conditional Learning, MCL)

– Topics for Prediction (Multinomial-Components-Analysis, MCA)

• Demo: Rexa, a Web portal for researchers– Topical Impact Measures (Diversity,...)

Page 70: Predictive Social Network Analysis with Multi-Modal Data

Social Networks in Research Literature

• Better understand structure of our ownresearch area.

• Structure helps us learn a new field.• Aid collaboration• Map how ideas travel through social networks

of researchers.

• Aids for hiring and finding reviewers!• Measure impact of papers or people.

Page 71: Predictive Social Network Analysis with Multi-Modal Data

Our Data

• Over 1.6 million research papers,gathered as part of Rexa.info portal.

• Cross linked references / citations.

Page 72: Predictive Social Network Analysis with Multi-Modal Data

Previous Systems

Page 73: Predictive Social Network Analysis with Multi-Modal Data
Page 74: Predictive Social Network Analysis with Multi-Modal Data

ResearchPaper

Cites

Previous Systems

Page 75: Predictive Social Network Analysis with Multi-Modal Data

ResearchPaper

Cites

Person

UniversityVenue

Grant

Groups

Expertise

More Entities and Relations

Page 76: Predictive Social Network Analysis with Multi-Modal Data
Page 77: Predictive Social Network Analysis with Multi-Modal Data
Page 78: Predictive Social Network Analysis with Multi-Modal Data
Page 79: Predictive Social Network Analysis with Multi-Modal Data
Page 80: Predictive Social Network Analysis with Multi-Modal Data
Page 81: Predictive Social Network Analysis with Multi-Modal Data
Page 82: Predictive Social Network Analysis with Multi-Modal Data
Page 83: Predictive Social Network Analysis with Multi-Modal Data
Page 84: Predictive Social Network Analysis with Multi-Modal Data
Page 85: Predictive Social Network Analysis with Multi-Modal Data
Page 86: Predictive Social Network Analysis with Multi-Modal Data
Page 87: Predictive Social Network Analysis with Multi-Modal Data
Page 88: Predictive Social Network Analysis with Multi-Modal Data
Page 89: Predictive Social Network Analysis with Multi-Modal Data

Topical TransferCitation counts from one topic to another.

Map “producers and consumers”

Page 90: Predictive Social Network Analysis with Multi-Modal Data

Topical Bibliometric Impact Measures

• Topical Citation Counts

• Topical Impact Factors

• Topical Longevity

• Topical Precedence

• Topical Diversity

• Topical Transfer

[Mann, Mimno, McCallum, 2006]

Page 91: Predictive Social Network Analysis with Multi-Modal Data

Topical Transfer

Transfer from Digital Libraries to other topics

WebBase: a repository of Web pages11Web Pages

Trawling the Web for Emerging Cyber-Communities

12Graphs

Lessons learned from the creation anddeployment of a terabyte digital video libr..

12Video

On being ‘Undigital’ with digital cameras:extending the dynamic...

14Computer Vision

Trawling the Web for Emerging Cyber-Communities, Kumar, Raghavan,... 1999.

31Web Pages

Paper TitleCit’sOther topic

Page 92: Predictive Social Network Analysis with Multi-Modal Data

Topical Diversity

Papers that had the most influence across many other fields...

Page 93: Predictive Social Network Analysis with Multi-Modal Data

Topical DiversityEntropy of the topic distribution among

papers that cite this paper (this topic).

HighDiversity

LowDiversity

Page 94: Predictive Social Network Analysis with Multi-Modal Data

Topical Bibliometric Impact Measures

• Topical Citation Counts

• Topical Impact Factors

• Topical Longevity

• Topical Precedence

• Topical Diversity

• Topical Transfer

[Mann, Mimno, McCallum, 2006]

Page 95: Predictive Social Network Analysis with Multi-Modal Data

Topical PrecedenceWithin a topic, what are the earliest papers that received more than n citations?

“Early-ness”

Speech Recognition:

Some experiments on the recognition of speech, with one and two ears,E. Colin Cherry (1953)

Spectrographic study of vowel reduction,B. Lindblom (1963)

Automatic Lipreading to enhance speech recognition, Eric D. Petajan (1965)

Effectiveness of linear prediction characteristics of the speech wave for...,B. Atal (1974)

Automatic Recognition of Speakers from Their Voices,B. Atal (1976)

Page 96: Predictive Social Network Analysis with Multi-Modal Data

Topical PrecedenceWithin a topic, what are the earliest papers that received more than n citations?

“Early-ness”

Information Retrieval:

On Relevance, Probabilistic Indexing and Information Retrieval,Kuhns and Maron (1960)

Expected Search Length: A Single Measure of Retrieval Effectiveness Basedon the Weak Ordering Action of Retrieval Systems,

Cooper (1968)Relevance feedback in information retrieval,

Rocchio (1971)Relevance feedback and the optimization of retrieval effectiveness,

Salton (1971)New experiments in relevance feedback,

Ide (1971)Automatic Indexing of a Sound Database Using Self-organizing Neural Nets,

Feiten and Gunzel (1982)

Page 97: Predictive Social Network Analysis with Multi-Modal Data

Topical Transfer Through Time

• Can we predict which research topicswill be “hot” at ICML next year?

• ...based on– the hot topics in “neighboring” venues last year– learned “neighborhood” distances for venue pairs

Page 98: Predictive Social Network Analysis with Multi-Modal Data

How do Ideas Progress ThroughSocial Networks?

COLT

“ADA Boost”

ICML

ACL(NLP)

ICCV(Vision)

SIGIR(Info. Retrieval)

Hypothetical Example:

Page 99: Predictive Social Network Analysis with Multi-Modal Data

How do Ideas Progress ThroughSocial Networks?

COLT

“ADA Boost”

ICML

ACL(NLP)

ICCV(Vision)

SIGIR(Info. Retrieval)

Hypothetical Example:

Page 100: Predictive Social Network Analysis with Multi-Modal Data

How do Ideas Progress ThroughSocial Networks?

COLT

“ADA Boost”

ICML

ACL(NLP)

ICCV(Vision)

SIGIR(Info. Retrieval)

Hypothetical Example:

Page 101: Predictive Social Network Analysis with Multi-Modal Data

Topic Prediction Models

Static Model

Transfer Model

Linear Regression and Ridge RegressionUsed for Coefficient Training.

Page 102: Predictive Social Network Analysis with Multi-Modal Data

Preliminary Results

MeanSquaredPredictionError

# Venues used for prediction

Transfer Model with Ridge Regression is a good Predictor

(SmallerIs better) Transfer

Model

Page 103: Predictive Social Network Analysis with Multi-Modal Data

Outline• Social Network Analysis

– Roles (Author-Recipient-Topic Model)

– Groups (Group-Topic Model)

– Trends over time (Topics-over-Time Model, TOT)

– Preferential Attachment (Community-Author-Topic, CAT)

• Undirected Graphical Models– Flexible Objective Functions (Multi-Conditional Learning, MCL)

– Topics for Prediction (Multinomial-Components-Analysis, MCA)

• Demo: Rexa, a Web portal for researchers– Topical Impact Measures (Diversity,...)

Page 104: Predictive Social Network Analysis with Multi-Modal Data

Social Network Analysis(Pattern Discovery in Networks)

Network Connectivity Network Connectivity + Many Attributes

Descriptive Predictive & Prescriptive

Text, Timestamps, Authors,...

Generative, Latent Variable Models

Direct Measures, Agglomerative & Spectral Clustering,...

P(edge|...) P(attribute|...), P(group|...)P(collaborator|...)

Small World, Betweeness Centrality,...

Data:

Objective:

Methodology: