Research Introspection “ICML does ICML” Andrew McCallum Computer Science Department University of Massachusetts Amherst


Page 1: Research Introspection “ICML does ICML”

Research Introspection “ICML does ICML”

Andrew McCallum

Computer Science Department

University of Massachusetts Amherst

Page 2: Research Introspection “ICML does ICML”

Relational Modeling of the Research Literature & other Entities

• Better understand structure of our own research area.
• Tools to help us learn a new sub-field.
• Aid collaboration
• Map how ideas travel through social networks of researchers.
• Aids for hiring and finding reviewers!
• Many opportunities for rich relational learning
• ... in a domain we understand well.

Page 3: Research Introspection “ICML does ICML”

Previous Systems

Page 4: Research Introspection “ICML does ICML”


Page 5: Research Introspection “ICML does ICML”

Previous Systems

• Entities: Research Paper
• Relations: Cites

Page 6: Research Introspection “ICML does ICML”

More Entities and Relations

• Entities: Research Paper, Person, University, Venue, Grant, Groups, Expertise
• Relations: Cites

Page 7: Research Introspection “ICML does ICML”

Rexa System Overview

• WWW
• Spider Web for PDFs: home-grown Java+MySQL (~1m PDF/day)
• Convert to text (with layout & format): enhanced ps2text (better word stitching, plus layout in XML)
• Extract metadata (title, authors, abstract, venue, citations; 14 fields in total): Conditional Random Fields (99% word accuracy)
• Reference resolution (of papers, authors & grants): discriminatively trained graph partitioning (competition-winning accuracy)
• NSF grant DB
• Topic Analysis & other Data Mining
• Browsable Web Interface

Page 8: Research Introspection “ICML does ICML”
Page 9: Research Introspection “ICML does ICML”
Page 10: Research Introspection “ICML does ICML”
Page 11: Research Introspection “ICML does ICML”
Page 12: Research Introspection “ICML does ICML”
Page 13: Research Introspection “ICML does ICML”
Page 14: Research Introspection “ICML does ICML”
Page 15: Research Introspection “ICML does ICML”
Page 16: Research Introspection “ICML does ICML”
Page 17: Research Introspection “ICML does ICML”
Page 18: Research Introspection “ICML does ICML”
Page 19: Research Introspection “ICML does ICML”
Page 20: Research Introspection “ICML does ICML”
Page 21: Research Introspection “ICML does ICML”
Page 22: Research Introspection “ICML does ICML”

From Text to Actionable Knowledge

Spider → Filter → Document collection → IE → Database → Data Mining → Actionable knowledge

• IE: Segment, Classify, Associate, Cluster
• Data Mining: Discover patterns (entity types, links / relations, events)
• Actionable knowledge: Prediction, Outlier detection, Decision support

Page 23: Research Introspection “ICML does ICML”

Spider → Filter → Document collection → IE → Database → Data Mining → Actionable knowledge

• IE: Segment, Classify, Associate, Cluster
• Data Mining: Discover patterns (entity types, links / relations, events)
• Actionable knowledge: Prediction, Outlier detection, Decision support
• Joint Inference: Uncertainty Info, Emerging Patterns

Page 24: Research Introspection “ICML does ICML”

Spider → Filter → Document collection → IE → Probabilistic Model → Data Mining → Actionable knowledge

• IE: Segment, Classify, Associate, Cluster
• Data Mining: Discover patterns (entity types, links / relations, events)
• Actionable knowledge: Prediction, Outlier detection, Decision support

Unified Model

Discriminatively-trained undirected graphical models
• Conditional Random Fields [Lafferty, McCallum, Pereira]
• Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]

Complex Inference and Learning
Just what we researchers like to sink our teeth into!

Page 25: Research Introspection “ICML does ICML”

Information Extraction

Markov dependencies

...and long-range & KB dependencies?

Page 26: Research Introspection “ICML does ICML”

IE from Research Papers [McCallum et al ‘99]

@article{ kaelbling96reinforcement, author = "Leslie Pack Kaelbling and Michael L. Littman and Andrew P. Moore", title = "Reinforcement Learning: A Survey", journal = "Journal of Artificial Intelligence Research", volume = "4", pages = "237-285", year = "1996",

Page 27: Research Introspection “ICML does ICML”

(Linear Chain) Conditional Random Fields

Finite state model / Graphical model

• FSM states (output seq): ..., $y_{t-1}$, $y_t$, $y_{t+1}$, $y_{t+2}$, $y_{t+3}$, ...
• Observations (input seq): ..., $x_{t-1}$, $x_t$, $x_{t+1}$, $x_{t+2}$, $x_{t+3}$, ...

Example:
input seq: said Jones a Microsoft VP …
output seq: OTHER PERSON OTHER ORG TITLE …

Undirected graphical model, trained to maximize conditional probability of output sequence given input sequence.

Wide-spread interest, positive experimental results in many applications:

• Noun phrase, Named entity [HLT’03], [CoNLL’03]
• Protein structure prediction [ICML’04]
• IE from Bioinformatics text [Bioinformatics ‘04], …
• Asian word segmentation [COLING’04], [ACL’04]
• IE from Research papers [HLT’04]
• Object classification in images [CVPR ‘04]

[Lafferty, McCallum, Pereira 2001]

$$p(y \mid x) = \frac{1}{Z_x} \prod_t \Phi(y_t, y_{t-1}, x, t), \quad \text{where} \quad \Phi(y_t, y_{t-1}, x, t) = \exp\left( \sum_k \lambda_k f_k(y_t, y_{t-1}, x, t) \right)$$
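As an illustration of the linear-chain CRF formula, the following minimal sketch computes the conditional probability of a label sequence, with the normalizer Z_x obtained by the forward algorithm. It assumes the weighted feature sums have already been collected into an array `log_phi[t, i, j]` (previous label `i`, current label `j`); that array layout, the function names, and the random demo values are assumptions made here, not part of the slides.

```python
import itertools
import numpy as np

def sequence_score(log_phi, y):
    """Sum over t of log Phi(y_t, y_{t-1}, x, t) for one label sequence y.

    log_phi[t, i, j] holds sum_k lambda_k f_k(y_t=j, y_{t-1}=i, x, t).
    At t=0 there is no previous label, so row 0 is used by convention.
    """
    score = log_phi[0, 0, y[0]]
    for t in range(1, len(y)):
        score += log_phi[t, y[t - 1], y[t]]
    return score

def log_partition(log_phi):
    """log Z_x via the forward algorithm (log-sum-exp over all label paths)."""
    alpha = log_phi[0, 0, :]
    for t in range(1, log_phi.shape[0]):
        # scores[i, j] = alpha[i] + log_phi[t, i, j]; reduce over previous label i
        scores = alpha[:, None] + log_phi[t]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

def conditional_prob(log_phi, y):
    """p(y | x) = exp(score(y)) / Z_x."""
    return float(np.exp(sequence_score(log_phi, y) - log_partition(log_phi)))

# Hypothetical demo: 4 positions, 3 labels, random log-potentials.
rng = np.random.default_rng(0)
T, L = 4, 3
log_phi = rng.normal(size=(T, L, L))
total = sum(conditional_prob(log_phi, y)
            for y in itertools.product(range(L), repeat=T))
```

Summing `conditional_prob` over every possible label sequence gives 1 up to floating-point error, which is a quick check that the forward recursion matches the product form of the model.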

Page 28: Research Introspection “ICML does ICML”

Entity Resolution

Joint inference among all pairwise coref

...models of entities, attributes, first-order...

Page 29: Research Introspection “ICML does ICML”

Joint Co-reference Decisions, Discriminative Model

• People mentions: “Stuart Russell”, “Stuart Russell”, “S. Russel”
• One Y/N co-reference decision per pair of mentions (three in total)

[Culotta & McCallum 2005]

Page 30: Research Introspection “ICML does ICML”

Co-reference for Multiple Entity Types

• People mentions: “Stuart Russell”, “Stuart Russell”, “S. Russel”
• Organization mentions: “University of California at Berkeley”, “Berkeley”, “Berkeley”
• One Y/N co-reference decision per pair of mentions (six in total)

[Culotta & McCallum 2005]

Page 31: Research Introspection “ICML does ICML”

Joint Co-reference of Multiple Entity Types

• People mentions: “Stuart Russell”, “Stuart Russell”, “S. Russel”
• Organization mentions: “University of California at Berkeley”, “Berkeley”, “Berkeley”
• One Y/N co-reference decision per pair of mentions (six in total)

[Culotta & McCallum 2005]

Reduces error by 22%
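The idea of deciding co-reference jointly rather than pair by pair can be sketched as partitioning a graph of mentions. The example below is an illustrative approximation only: a hand-written token-matching score stands in for a discriminatively trained pairwise model, and greedy agglomerative merging stands in for the actual inference procedure. It also does not model interactions between people and organizations. The functions `pair_score` and `partition` are hypothetical names introduced here.

```python
from itertools import combinations

def tokens(mention):
    """Lowercased words with trailing punctuation stripped."""
    return [t.strip(".,").lower() for t in mention.split()]

def token_match(a, b):
    """Equal tokens, or one is a prefix of the other ("s" vs "stuart")."""
    return a.startswith(b) or b.startswith(a)

def pair_score(m1, m2):
    """Stand-in for a learned pairwise model.

    Positive means "probably the same entity", negative means "probably distinct".
    """
    short, long_ = sorted((tokens(m1), tokens(m2)), key=len)
    matched = sum(any(token_match(s, l) for l in long_) for s in short)
    return matched / len(short) - 0.5

def partition(mentions):
    """Greedily merge the two clusters with the highest total cross-pair score."""
    clusters = [[m] for m in mentions]
    while True:
        best, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            s = sum(pair_score(a, b) for a in clusters[i] for b in clusters[j])
            if s > best:
                best, best_pair = s, (i, j)
        if best_pair is None:
            return clusters  # no merge improves the partition score
        i, j = best_pair
        merged = clusters.pop(j)
        clusters[i].extend(merged)

mentions = [
    "Stuart Russell", "Stuart Russell", "S. Russel",
    "University of California at Berkeley", "Berkeley", "Berkeley",
]
clusters = partition(mentions)
```

Because each merge is scored over all cross-cluster pairs, a single weak pairwise match cannot by itself pull two clusters together, which is the intuition behind making the Y/N decisions jointly.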

Page 32: Research Introspection “ICML does ICML”

Structured Topic Models

Discovering latent structure in jointly modeling words, time, relations...

Page 33: Research Introspection “ICML does ICML”

Topical N-gram Model

[Figure: plate diagram with node sequences z1 … z4, w1 … w4 and y1 … y4, and plates labeled T and D.]

[Wang, McCallum 2005]

Page 34: Research Introspection “ICML does ICML”

Finding Topics with TNG

Traditional unigram LDA run on 1.6 million titles / abstracts (200 topics)

...select ~300k papers on ML, NLP, robotics, vision...

Find 200 TNG topics among those papers.

Page 35: Research Introspection “ICML does ICML”

Topical Transfer

Citation counts from one topic to another.

Map “producers and consumers”
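A minimal sketch of counting citations from one topic to another, assuming each paper has been assigned a single dominant topic (a simplification, since a topic model gives each paper a mixture). The paper IDs, topic names, and the "net production" measure are hypothetical illustrations, not data or definitions from the talk.

```python
from collections import Counter

def topic_transfer_counts(paper_topic, citations):
    """Count citations from papers of one topic to papers of another.

    paper_topic: dict mapping paper id -> topic label
    citations:   iterable of (citing_paper, cited_paper) pairs
    """
    counts = Counter()
    for citing, cited in citations:
        counts[(paper_topic[citing], paper_topic[cited])] += 1
    return counts

def net_production(counts):
    """Citations received from other topics minus citations given to them.

    Under this reading, a positive value marks a "producer" topic and a
    negative value a "consumer" topic.
    """
    net = Counter()
    for (src, dst), n in counts.items():
        if src != dst:
            net[dst] += n
            net[src] -= n
    return net

# Hypothetical toy data.
paper_topic = {"p1": "boosting", "p2": "boosting",
               "p3": "parsing", "p4": "parsing", "p5": "vision"}
citations = [("p3", "p1"), ("p4", "p1"), ("p4", "p2"),
             ("p5", "p1"), ("p2", "p1"), ("p1", "p3")]

counts = topic_transfer_counts(paper_topic, citations)
net = net_production(counts)
```

In this toy data, "boosting" is cited by both other topics more often than it cites them, so it comes out as a producer.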

Page 36: Research Introspection “ICML does ICML”

Trends in 17 years of NIPS proceedings

Page 37: Research Introspection “ICML does ICML”

Topic Distributions Conditioned on Time

[Figure: x-axis is time; topic mass is shown as vertical height.]

Page 38: Research Introspection “ICML does ICML”

Topical Transfer Through Time

• Can we predict which research topics will be “hot” at ICML next year?
• ...based on
  – the hot topics in “neighboring” venues last year
  – learned “neighborhood” distances for venue pairs

Page 39: Research Introspection “ICML does ICML”

How do Ideas Progress Through Social Networks?

Hypothetical Example: “AdaBoost”

Venues: COLT, ICML, ACL (NLP), ICCV (Vision), SIGIR (Info. Retrieval)

Page 42: Research Introspection “ICML does ICML”

Preliminary Results

Transfer Model with Ridge Regression is a good Predictor

[Figure: Mean Squared Prediction Error (smaller is better) vs. # Venues used for prediction; plotted series: Transfer Model.]
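To make the prediction setup concrete, here is a minimal sketch of ridge regression in closed form, assuming each row of `X` holds last year's proportion of a topic in several "neighboring" venues and `y` holds this year's proportion of the same topic at the target venue. The synthetic data, the weights, and the regularization strength are all made up for illustration; the slides do not specify the actual features or settings.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def mean_squared_error(X, y, w):
    """Mean squared prediction error of the linear model w on (X, y)."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic stand-in data: 50 topics, 3 neighboring venues.
rng = np.random.default_rng(0)
X = rng.random((50, 3))              # topic proportions in neighbor venues last year
true_w = np.array([0.5, 0.3, 0.0])   # hypothetical transfer weights
y_clean = X @ true_w                 # noise-free target, used for a sanity check
y_noisy = y_clean + rng.normal(scale=0.01, size=50)

w_check = ridge_fit(X, y_clean, alpha=1e-8)  # near-zero penalty recovers true_w
w = ridge_fit(X, y_noisy, alpha=0.1)         # penalized fit on noisy targets
mse = mean_squared_error(X, y_noisy, w)
```

The penalty `alpha` shrinks the learned weights toward zero, which helps when venue topic proportions are highly correlated with one another.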

Page 43: Research Introspection “ICML does ICML”

Other Relational Opportunities

• Categorizing citations.
• Map transfer of ideas through science.
• Rank CS departments by various criteria.
• What 10 papers tell the story of ASR research?
• Predicting when a student will graduate.
• Help me find the right postdoc.
• Suggest best collaborative opportunities.
• Who should chair the next ICML?