Metro Maps of Dafna Shahaf Carlos Guestrin Eric Horvitz

Page 1: Metro Maps of

Metro Maps of

Dafna Shahaf

Carlos Guestrin

Eric Horvitz

Page 2: Metro Maps of

“The abundance of books is a distraction.”

Lucius Annaeus Seneca, 4 BC – 65 AD

Page 3: Metro Maps of

… and it does not get any better

• 129,864,880 Books (Google estimate)

• Research:
  – PubMed: 19 million papers (one paper added per minute!)
  – Scopus: 40 million papers

Page 4: Metro Maps of

Papers

Innovative Papers

Page 5: Metro Maps of

So, you want to understand a research topic…

Now what?

Page 6: Metro Maps of

Search Engines are Great

• But do not show how it all fits together

Page 7: Metro Maps of

Timeline Systems

Page 8: Metro Maps of

Research is not Linear

Page 9: Metro Maps of

Metro Map

• A map is a set of lines of articles
• Each line follows a coherent narrative thread
• Temporal Dynamics + Structure

[Figure: example metro map with stops labeled austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

Page 10: Metro Maps of

Map Definition

• A map M is a pair (G, P) where
  – G = (V, E) is a directed graph
  – P is a set of paths in G (metro lines)
  – Each e ∈ E must belong to at least one metro line

[Figure: the same example map, with stops labeled austerity, bailout, junk status, protests, strike, Germany, labor unions, Merkel]
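The definition above can be sketched as a small data structure. This is only an illustration: the class name `MetroMap`, its field names, and the particular edges between the example stops are assumptions made for the sketch, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class MetroMap:
    """A map M = (G, P): a directed graph plus a set of paths (metro lines)."""
    vertices: set
    edges: set   # directed edges as (u, v) pairs
    lines: list  # each line is a list of vertices forming a path in G

    def is_valid(self):
        """Check that every line is a path in G and every edge lies on some line."""
        covered = set()
        for line in self.lines:
            for u, v in zip(line, line[1:]):
                if (u, v) not in self.edges:
                    return False  # the line uses an edge that is not in G
                covered.add((u, v))
        return covered == self.edges


# Hypothetical layout that reuses stop names from the figure.
good = MetroMap(
    vertices={"bailout", "austerity", "protests", "strike"},
    edges={("bailout", "austerity"), ("austerity", "protests"), ("austerity", "strike")},
    lines=[["bailout", "austerity", "protests"], ["austerity", "strike"]],
)

# Same graph, but one edge is not on any line, so the last condition fails.
bad = MetroMap(
    vertices=good.vertices,
    edges=good.edges,
    lines=[["bailout", "austerity", "protests"]],
)
```

Here `good.is_valid()` holds, while `bad.is_valid()` does not because the edge (austerity, strike) belongs to no metro line.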

Page 11: Metro Maps of

Game Plan

Objective / Algorithm / Does it work?

Page 12: Metro Maps of

Properties of a Good Map

1. Coherence

???

Page 13: Metro Maps of

Coherence: Main Idea

Connecting the Dots [S, Guestrin, KDD’10]

Coherence is not a property of local interactions:

[Figure: chain of articles 1 2 3 4 5, with the words Greece, Europe, Italy, Republican, Protest, Debt default]

Incoherent: Each pair shares different words

Page 14: Metro Maps of

Coherence: Main Idea

Connecting the Dots [S, Guestrin, KDD’10]

A more-coherent chain:

[Figure: chain of articles 1 2 3 4 5, with the words Greece, Austerity, Italy, Republican, Protest, Debt default]

Coherent: a small number of words captures the story
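The contrast between the two chains can be illustrated with a toy score. This is a simplified sketch, not the formulation from the cited KDD'10 paper: documents are treated as plain word sets, and the function names and the word budget `k` are assumptions introduced here.

```python
from itertools import combinations


def pairwise_coherence(chain):
    """Local view: the weakest link, counted as words shared by adjacent articles."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def budgeted_coherence(chain, k):
    """Global view: pick at most k words for the whole chain, then score the
    weakest link using only those words (brute force over word subsets)."""
    vocab = sorted(set().union(*chain))
    best = 0
    for words in combinations(vocab, min(k, len(vocab))):
        chosen = set(words)
        score = min(len(chosen & a & b) for a, b in zip(chain, chain[1:]))
        best = max(best, score)
    return best


# Each adjacent pair shares a different word.
incoherent = [
    {"greece", "europe"},
    {"europe", "italy"},
    {"italy", "republican"},
    {"republican", "protest"},
]

# One word runs through the whole chain.
coherent = [
    {"greece", "debt"},
    {"greece", "austerity"},
    {"greece", "protest"},
]
```

Both chains look equally good locally (`pairwise_coherence` is 1 for each), but with a budget of one word only the second chain keeps a non-zero score, which matches the point that coherence is not a property of local interactions.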

Page 15: Metro Maps of

Words are too Simple

[Figure: chain of papers 1 2 3 (Sensor networks, Bayesian networks, Social networks) sharing the words Probability, Network, Cost]

Page 16: Metro Maps of

Using the Citation Graph

• Create a graph per word
  – All papers mentioning the word
  – Edge weight = strength of influence [El-Arini, Guestrin KDD‘11]

[Figure: citation graph for the word “Network”, with papers numbered 1 to 9]

Where did paper 8 get the idea?

Do papers 8 and 9 mean the same thing?
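A minimal sketch of the per-word graph, assuming the influence weights are already available as input. The slides attribute the weights to [El-Arini, Guestrin KDD‘11] and do not show how they are computed, so the numbers, paper ids, and the function name `word_graph` below are hypothetical.

```python
def word_graph(word, paper_words, influence):
    """Restrict the weighted citation graph to papers that mention `word`.

    paper_words: dict mapping paper id -> set of words in the paper
    influence:   dict mapping (citing, cited) -> influence weight in [0, 1]
    """
    nodes = {p for p, words in paper_words.items() if word in words}
    edges = {
        (u, v): w
        for (u, v), w in influence.items()
        if u in nodes and v in nodes
    }
    return nodes, edges


# Toy corpus with made-up weights.
paper_words = {
    1: {"network", "sensor"},
    2: {"network", "bayesian"},
    8: {"network", "social"},
    9: {"probability"},
}
influence = {(8, 1): 0.7, (8, 2): 0.1, (9, 2): 0.4}

nodes, edges = word_graph("network", paper_words, influence)
```

In this toy graph, paper 8 is linked much more strongly to paper 1 than to paper 2, which is the kind of evidence the question "Where did paper 8 get the idea?" asks for. Paper 9 drops out because it never mentions the word.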

Page 17: Metro Maps of

Words are too Simple

[Figure: chain of papers 1 2 3 (Sensor networks, Bayesian networks, Social networks) sharing the words Probability, Network, Cost]

Incoherent

Page 18: Metro Maps of

Properties of a Good Map

1. Coherence

Is it enough?

Page 19: Metro Maps of

Max-coherence Map

Query: Reinforcement Learning

Page 20: Metro Maps of

Properties of a Good Map

1. Coherence

2. Coverage

Should cover diverse topics important to the user

Page 21: Metro Maps of

Coverage: What to Cover?

• Perhaps words?
• Not enough:
  1. SVM in Oracle Database 10g, Milenova et al., VLDB '05
  2. Support Vector Machines in Relational Databases, Ruping, SVM '02

Page 22: Metro Maps of

Similar Content

1 2

Page 23: Metro Maps of

Different Impact

Citing venues and authors:

• Affected more authors/venues
• Very little intersection

1 2

Page 24: Metro Maps of

What to Cover?

• Instead of words…
• Cover papers
• A paper covers papers that it had an impact on
• High-coverage map: impact on a lot of the corpus
• Why descendants?
• Soft notion: [0,1]

Page 25: Metro Maps of

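p has High Impact on q if…

• Many paths (especially short)
• The paths are coherent

Formalize with coherent random walks

[Figure: papers p, q, r linked by citations, annotated with the citing sentences “Note that our protocol is different from previous work…” and “We use the algorithm of…”]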
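To make the "many short paths" intuition concrete, here is a sketch that uses an ordinary random walk over citations. It is not the coherent random walk from the talk, which the slides do not specify. The stop probability, the edge weights, and the function name `impact` are assumptions, and the citation graph is assumed to be acyclic.

```python
def impact(cites, p, q, stop=0.2):
    """Probability that a walk starting at q and following citations reaches p.

    cites: dict mapping a paper to {cited paper: edge weight}; assumed acyclic.
    At every step the walk halts with probability `stop`, so long paths
    contribute less than short ones.
    """
    memo = {}

    def hit(node):
        if node == p:
            return 1.0
        if node in memo:
            return memo[node]
        out = cites.get(node, {})
        total = sum(out.values())
        if total == 0:
            memo[node] = 0.0
            return 0.0
        prob = (1 - stop) * sum(w / total * hit(nxt) for nxt, w in out.items())
        memo[node] = prob
        return prob

    return hit(q)


# q cites p directly and also through r: two paths, one of them short.
two_paths = {"q": {"p": 1.0, "r": 1.0}, "r": {"p": 1.0}}
# q reaches p only through r: a single, longer path.
one_long_path = {"q": {"r": 1.0}, "r": {"p": 1.0}}

direct_and_indirect = impact(two_paths, "p", "q")
indirect_only = impact(one_long_path, "p", "q")
```

The score stays in [0, 1], in line with the soft notion of coverage, and the first graph yields the larger value because it adds a short path from q to p.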

Page 26: Metro Maps of

Map Coverage

• Documents cover pieces of the corpus:

[Figure: corpus with the regions covered by the map’s documents; labels: Corpus, Coverage]
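One way to combine per-document coverage into a map-level score is a probabilistic "at least one document covers it" rule. The slides do not state the formula, so this combination rule, the weights, and the function name `map_coverage` are assumptions for illustration only.

```python
def map_coverage(map_docs, cover, weights):
    """Weighted soft coverage of the corpus by the documents on the map.

    cover[d][q] in [0, 1]: how much map document d covers corpus paper q.
    weights[q]: importance of corpus paper q.
    A paper counts as covered unless every map document fails to cover it.
    """
    total = 0.0
    for q, w in weights.items():
        miss = 1.0
        for d in map_docs:
            miss *= 1.0 - cover.get(d, {}).get(q, 0.0)
        total += w * (1.0 - miss)
    return total


# Made-up coverage values for a three-paper corpus.
cover = {"a": {"x": 0.5, "y": 1.0}, "b": {"x": 0.5}}
weights = {"x": 1.0, "y": 1.0, "z": 1.0}

only_a = map_coverage(["a"], cover, weights)
a_and_b = map_coverage(["a", "b"], cover, weights)
only_b = map_coverage(["b"], cover, weights)
```

Adding document b to an empty map gains 0.5, but adding it to a map that already contains a gains only 0.25. This diminishing-returns behaviour is what rewards diverse maps.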

Page 27: Metro Maps of

High-coverage, Coherent Map

Page 28: Metro Maps of

Properties of a Good Map

1. Coherence

2. Coverage

3. Connectivity

Page 29: Metro Maps of

Definition: Connectivity

• Experimented with formulations
• Users do not care about connection type
• Encourage connections between pairs of lines

Page 30: Metro Maps of

Lines with No Intersection

Solution: Reward lines that had impact on each other

[Figure: two lines that share no paper. Line 1: Perceptrons, SVM, Optimizing Kernels for SVM. Line 2: Face Detection, SVM for Facial Recognition]
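A sketch of a connectivity count along these lines: a pair of lines counts as connected if the lines share a paper, or if some paper on one line had impact on a paper on the other. The `impact` callback, the 0.5 threshold, and the toy impact value are hypothetical choices, not values from the talk.

```python
from itertools import combinations


def connectivity(lines, impact=None, threshold=0.5):
    """Count pairs of metro lines that are connected.

    lines:  list of lines, each a list of paper ids
    impact: optional function (p, q) -> score in [0, 1]; a hypothetical hook
    """
    count = 0
    for a, b in combinations(lines, 2):
        if set(a) & set(b):
            count += 1  # the lines intersect at a shared paper
        elif impact is not None and any(
            max(impact(p, q), impact(q, p)) >= threshold for p in a for q in b
        ):
            count += 1  # no intersection, but one line influenced the other
    return count


svm_line = ["Perceptrons", "SVM", "Optimizing Kernels for SVM"]
face_line = ["Face Detection", "SVM for Facial Recognition"]


def toy_impact(p, q):
    # Made-up score: the SVM paper influenced the facial-recognition paper.
    return 0.9 if (p, q) == ("SVM", "SVM for Facial Recognition") else 0.0


without_impact = connectivity([svm_line, face_line])
with_impact = connectivity([svm_line, face_line], impact=toy_impact)
```

Without the impact term the two lines in the figure contribute nothing to connectivity; with it, the pair is rewarded even though the lines never intersect.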

Page 31: Metro Maps of

Tying it all Together: Map Objective

• Coherence
  – Either coherent or not: Constraint
• Coverage
  – Must have!
• Connectivity
  – Nice to have

Consider all coherent maps with maximum possible coverage. Find the most connected one.
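The priority order above can be sketched as a selection over a set of already-scored candidate maps. The dictionary fields, the scores, and the tolerance are hypothetical. Enumerating every candidate is only for illustration and is not how the talk finds maps.

```python
def pick_map(candidates, tol=1e-9):
    """Coherence is a constraint, coverage comes first, connectivity breaks ties."""
    coherent = [m for m in candidates if m["coherent"]]
    if not coherent:
        return None
    best_cov = max(m["coverage"] for m in coherent)
    top = [m for m in coherent if m["coverage"] >= best_cov - tol]
    return max(top, key=lambda m: m["connectivity"])


# Made-up candidates.
candidates = [
    {"name": "A", "coherent": False, "coverage": 0.95, "connectivity": 9},
    {"name": "B", "coherent": True, "coverage": 0.80, "connectivity": 1},
    {"name": "C", "coherent": True, "coverage": 0.80, "connectivity": 3},
    {"name": "D", "coherent": True, "coverage": 0.60, "connectivity": 7},
]

chosen = pick_map(candidates)
```

Map A is rejected despite its coverage because it is incoherent, D loses on coverage despite its connectivity, and C beats B only on the tie-breaker.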

Page 32: Metro Maps of

Game Plan

Objective / Algorithm / Does it work?

Page 33: Metro Maps of

Approach Overview

Documents D

1. Coherence graph G
2. Coverage function f: f( ) = ?
3. Increase Connectivity

Page 34: Metro Maps of

Coherence Graph: Main Idea

• Vertices correspond to short coherent chains
• Directed edges between chains which can be conjoined and remain coherent

[Figure: short chains (1 2 3), (4 5 6), (5 8 9); conjoined chain 1 2 3 5 8 9]
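The construction can be sketched as follows, assuming a coherence test is supplied from outside. The predicate used here (article ids must strictly increase) is only a stand-in so the example runs. The real coherence test is not given in the slides, and conjoining by plain concatenation is likewise an assumption.

```python
def build_coherence_graph(chains, is_coherent):
    """Vertices are short chains; add an edge a -> b when a followed by b
    still passes the coherence test."""
    chains = [tuple(c) for c in chains]
    edges = set()
    for a in chains:
        for b in chains:
            if a != b and is_coherent(a + b):
                edges.add((a, b))
    return edges


def increasing_ids(chain):
    # Stand-in predicate: pretend a chain is coherent when ids strictly increase.
    return all(x < y for x, y in zip(chain, chain[1:]))


chains = [(1, 2, 3), (4, 5, 6), (5, 8, 9)]
graph_edges = build_coherence_graph(chains, increasing_ids)
```

With this stand-in, (1 2 3) links to both other chains, so the conjoined chain 1 2 3 5 8 9 from the figure appears as a two-vertex path in the graph.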

Page 35: Metro Maps of

Finding High-Coverage Chains

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles

[Figure: short chains (1 2 3), (4 5 6), (5 8 9)]

Cover(1 2 3 4 5 6) > Cover(1 2 3 5 8 9)?

Page 36: Metro Maps of

Reformulation

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles
• Submodular orienteering
  – [Chekuri and Pal, 2005]
  – Quasipolynomial time recursive greedy
  – O(log OPT) approximation

Orienteering: a function of the nodes visited
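To show the problem being solved, here is an exhaustive search on a tiny coherence graph. It is deliberately not the recursive greedy algorithm of [Chekuri and Pal, 2005]: brute force is only feasible for toy inputs. The topic sets, the coverage function, and the choice to count K in vertices are assumptions made for the sketch.

```python
def best_path(graph, k, f):
    """Try every simple path with k vertices and return the one maximizing f."""
    best, best_val = None, float("-inf")

    def extend(path):
        nonlocal best, best_val
        if len(path) == k:
            val = f(path)
            if val > best_val:
                best, best_val = list(path), val
            return
        for nxt in graph.get(path[-1], []):
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    for start in graph:
        extend([start])
    return best, best_val


# Vertices are short chains of article ids, as in the coherence graph.
A, B, C = (1, 2, 3), (4, 5, 6), (5, 8, 9)
graph = {A: [B, C], B: [], C: []}

# Made-up topics covered by each article.
topics = {1: {"a"}, 2: {"a", "b"}, 3: {"b"}, 4: {"a"},
          5: {"b", "c"}, 6: {"c"}, 8: {"d"}, 9: {"e"}}


def cover(path):
    """Number of distinct topics covered by the articles on the path (submodular)."""
    covered = set()
    for chain in path:
        for article in chain:
            covered |= topics[article]
    return len(covered)


path, value = best_path(graph, 2, cover)
```

In this toy instance the path through (5 8 9) wins because its articles add new topics, while (4 5 6) mostly repeats what (1 2 3) already covers.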

Page 37: Metro Maps of

Approach Overview: Recap

Documents D

1. Coherence graph G
   – Encodes all coherent chains as graph paths
2. Coverage function f: f( ) = ?
   – Submodular orienteering [Chekuri & Pal, 2005]
   – Quasipoly time recursive greedy
   – O(log OPT) approximation
3. Increase Connectivity

Page 38: Metro Maps of

Example Map: Reinforcement Learning

Line keywords:

• multi-agent cooperative joint team
• mdp states pomdp transition option
• control motor robot skills arm
• bandit regret dilemma exploration arm
• q-learning bound optimal rmax mdp

Page 39: Metro Maps of

Example Map Detail: SVM

Page 40: Metro Maps of

Game Plan

Objective / Algorithm / Does it work?

Page 41: Metro Maps of

User Study

• Tricky!
  – No double-blind, no within-subject
  – Domain: understandable yet unfamiliar
  – Reinforcement Learning (RL)

Page 42: Metro Maps of

User Study

• 30 participants
• First-year grad student, Reinforcement Learning project
• Update a survey paper from 1996
• Identify research directions + relevant papers
  – Google Scholar
  – Map and Google Scholar
  – Baselines: Map, Wikipedia

Page 43: Metro Maps of

Results (in a nutshell)

[Figure: two bar charts, each comparing Google with Us; vertical axis: Better]

Map users find better papers, and cover more important areas

Page 44: Metro Maps of

User Comments

Helpful

noticed directions I didn't know about

great starting point

… get a basic idea of what science is up to

why don't you draw words on edges?

Legend is confusing

hard to get an idea from paper title alone

Page 45: Metro Maps of

Conclusions

• Formulated metrics characterizing good maps for the scientific domain
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization

Thank you!