Trains of Thought: Generating Information Maps Dafna Shahaf, Carlos Guestrin and Eric Horvitz



"The abundance of books is a distraction."

Lucius Annaeus Seneca, 4 BC – 65 AD


So, you want to understand a complex topic…

Now what?


Search Engines are Great

• But do not show how it all fits together


Timeline Systems


Real Stories are not Linear

Metro Map

• A set of lines
• Each line follows a coherent narrative thread
• Structure + multiple aspects

[Figure: example metro map with stops labeled austerity, bailout, junk status, Germany, protests, strike, labor unions, Merkel]

Map Definition

• A map M is a pair (G, P) where
– G = (V, E) is a directed graph
– P is a set of paths in G (metro lines)
– Each e ∈ E must belong to at least one metro line

[Figure: the example metro map again, with the same stop labels]
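The map definition can be written down as a small data structure. The sketch below is only an illustration: the class name `MetroMap`, its field names, and the ordering of stops along the two demo lines are assumptions, not taken from the slides. `is_valid` checks the two stated conditions: every line is a path in G, and every edge lies on at least one line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetroMap:
    """Hypothetical container for a map M = (G, P)."""
    vertices: frozenset  # V
    edges: frozenset     # E, stored as (u, v) pairs
    lines: tuple         # P, each metro line is a tuple of vertices

    def is_valid(self) -> bool:
        used = set()
        for line in self.lines:
            for edge in zip(line, line[1:]):
                if edge not in self.edges:
                    return False  # a metro line must be a path in G
                used.add(edge)
        endpoints_ok = all(
            u in self.vertices and v in self.vertices for u, v in self.edges
        )
        # Every edge of G must belong to at least one metro line.
        return endpoints_ok and used == set(self.edges)


# Illustrative lines; the order of the stops is made up for the demo.
greece = ("austerity", "strike", "protests")
germany = ("Merkel", "bailout", "protests")
edges = frozenset(zip(greece, greece[1:])) | frozenset(zip(germany, germany[1:]))
demo = MetroMap(frozenset(greece + germany), edges, (greece, germany))
```

Adding an edge that no line uses, for example ("austerity", "bailout"), makes `is_valid()` return False.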

Game Plan

• Objective
• Algorithm
• Does it work?


Properties of a Good Map

1. Coherence

???

Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]

Coherence is not a property of local interactions:

[Figure: a chain of five articles (1 to 5) against the words Greece, Europe, Italy, Republican, Protest, Debt default]

Incoherent: Each pair shares different words

Coherence: Main Idea
Connecting the Dots [S, Guestrin, KDD’10]

A more-coherent chain:

[Figure: a chain of five articles (1 to 5) against the words Greece, Austerity, Italy, Republican, Protest, Debt default]

Coherent: a small number of words captures the story
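A toy way to see the difference between the two chains: count the smallest set of words that links every consecutive pair of articles. This is a simplified proxy chosen for illustration, not the formulation of the KDD’10 work; the function name `min_linking_words` and the word sets are assumptions.

```python
from itertools import combinations


def min_linking_words(chain):
    """Smallest word set such that every consecutive pair shares one of its words.

    `chain` is a list of word sets, one per article. Returns None when some
    pair of neighbours shares no word at all. Brute force, toy inputs only.
    """
    shared = [a & b for a, b in zip(chain, chain[1:])]
    if not shared:
        return set()
    if any(not s for s in shared):
        return None
    vocab = sorted(set().union(*shared))
    for k in range(1, len(vocab) + 1):
        for words in combinations(vocab, k):
            if all(s & set(words) for s in shared):
                return set(words)


# Each neighbouring pair shares a different word: locally linked, globally incoherent.
incoherent = [
    {"greece", "europe"},
    {"europe", "italy"},
    {"italy", "republican"},
    {"republican", "protest"},
    {"protest", "debt"},
]
# One word runs through the whole chain.
coherent = [
    {"greece", "debt"},
    {"greece", "austerity", "debt"},
    {"greece", "austerity"},
    {"greece", "austerity", "protest"},
    {"greece", "protest"},
]
```

Both chains have a shared word at every transition, but the first needs four different words to explain its links while the second needs only one.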


Properties of a Good Map

1. Coherence

Is it enough?

Max-coherence Map
Query: Clinton

• Clinton visits Belfast
• Clinton set for Dublin
• High hopes for Clinton visit
• Clinton, Religious Leaders Share Thoughts
• Church Leaders Praise Clinton's 'Spirituality'
• Religion Leaders Divided on Clinton Moral Issue
• Clinton Should Resign, 2 Religious Leaders Say


Properties of a Good Map

1. Coherence

2. Coverage

Should cover diverse topics important to the user

Coverage

• Select a small set of diverse articles that covers the most important stories

January 17, 2009

Turning Down the Noise [El-Arini, Veda, S, Guestrin, KDD’09]

Coverage: The Idea

• Documents cover concepts:

[Figure: documents and the concepts they cover; labels: Corpus, Coverage]
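The slides do not spell out the coverage formula. One common way to make "documents cover concepts" concrete is a weighted, probabilistic set coverage: a concept counts as covered if at least one selected document covers it, and concepts are weighted by importance. The sketch below assumes that form; the name `set_coverage`, the per-document scores, and the weights are all illustrative.

```python
def set_coverage(docs, weights):
    """Weighted coverage of a set of documents (assumed formulation).

    docs:    list of dicts mapping concept -> degree of coverage in [0, 1]
    weights: dict mapping concept -> importance weight
    """
    total = 0.0
    for concept, weight in weights.items():
        miss = 1.0  # probability that no selected document covers the concept
        for doc in docs:
            miss *= 1.0 - doc.get(concept, 0.0)
        total += weight * (1.0 - miss)
    return total


weights = {"strike": 1.0, "imf": 1.0}
a = {"strike": 0.5}
b = {"strike": 0.5, "imf": 1.0}

gain_alone = set_coverage([a], weights) - set_coverage([], weights)
gain_after_b = set_coverage([a, b], weights) - set_coverage([b], weights)
```

Under this assumed form, adding document `a` helps less once `b` is already selected (`gain_after_b < gain_alone`), which is the diminishing-returns behaviour that rewards diverse selections.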

High-coverage, Coherent Map

• Greek Civil Servants Strike over Austerity Measures
• Greece Paralyzed by New Strike
• Greeks Take to the Streets, but Lacking Earlier Zeal
• Infighting Adds to Merkel’s Woes
• It’s Germany that Matters
• UK Backs Germany’s Effort
• Germany says the IMF should Rescue Greece
• IMF more Likely to Lead Efforts
• IMF is Urged to Move Forward


Properties of a Good Map

1. Coherence

2. Coverage

3. Connectivity

Definition: Connectivity

• Experimented with formulations
• Users do not care about connection type
• Encourage connections between pairs of lines
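One simple reading of "connections between pairs of lines" is to count the pairs of metro lines that share at least one article. The sketch below assumes that reading; the function name and the sample lines are illustrative, and the connection type is deliberately ignored.

```python
from itertools import combinations


def connectivity(lines):
    """Number of pairs of lines that intersect in at least one article."""
    return sum(1 for a, b in combinations(lines, 2) if set(a) & set(b))


lines = [
    ("strike", "austerity", "bailout"),
    ("merkel", "bailout", "imf"),
    ("haiti",),
]
score = connectivity(lines)
```

Here only the first two lines meet (at "bailout"), so one pair out of three is connected.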

Tying it all Together: Map Objective

• Coherence
– Either coherent or not: Constraint
• Coverage
– Must have!
• Connectivity
– Nice to have

Consider all coherent maps with maximum possible coverage.

Find the most connected one.
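Read literally, the objective is a lexicographic choice: coherence is a hard filter, coverage is maximized first, and connectivity breaks ties. A minimal sketch over an explicit list of candidate maps, assuming the three scores have already been computed (the dictionary keys and candidate names are hypothetical):

```python
def pick_map(candidates):
    """Choose among candidate maps given as dicts with precomputed scores.

    Each candidate has 'coherent' (bool), 'coverage' (float) and
    'connectivity' (int). Returns None if no candidate is coherent.
    """
    coherent = [c for c in candidates if c["coherent"]]
    if not coherent:
        return None
    best_coverage = max(c["coverage"] for c in coherent)
    # Exact equality is fine for a toy example with hand-written scores.
    top = [c for c in coherent if c["coverage"] == best_coverage]
    return max(top, key=lambda c: c["connectivity"])


candidates = [
    {"name": "A", "coherent": True, "coverage": 0.9, "connectivity": 1},
    {"name": "B", "coherent": True, "coverage": 0.9, "connectivity": 3},
    {"name": "C", "coherent": True, "coverage": 0.7, "connectivity": 5},
    {"name": "D", "coherent": False, "coverage": 1.0, "connectivity": 9},
]
chosen = pick_map(candidates)
```

Candidate D is discarded for incoherence, C loses on coverage despite its connectivity, and B beats A on the tie-break. Enumerating all maps is not feasible in practice; this only illustrates the ordering of the three criteria.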

Game Plan

• Objective
• Algorithm
• Does it work?

Approach Overview

Documents D

1. Coherence graph G
2. Coverage function f

f( ) = ?

3. Increase Connectivity

Coherence Graph: Main Idea

• Vertices correspond to short coherent chains
• Directed edges between chains which can be conjoined and remain coherent

[Figure: coherence graph with chain vertices 1 2 3, 4 5 6 and 5 8 9, and the conjoined chain 1 2 3 5 8 9]

Finding Vertices

• Vertices are short, coherent chains
• Can use [KDD’10]
– Expensive
– Solving many LPs
• Take advantage of simplicity of short stories
– No topic drift
– Sampling-based (fast) algorithm

Finding Edges

• Problem: Combining several strong chains may result in a much-weaker chain

Discontinuity: Change of focus

m-Coherence

• Control discontinuity points:

A chain is m-coherent if each sub-chain (d_i, …, d_{i+m}) is coherent.

• m: size of user's ‘history window’
– m = length(chain): standard coherence
– m = 1: optimize transitions without context
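The definition translates into a sliding-window check. The sketch below takes each window to contain m consecutive articles; that is an assumption about the indexing (it is the reading that fits the three-article chains used as graph vertices elsewhere in the deck), and the toy `shares_a_word` test merely stands in for a real coherence measure.

```python
def is_m_coherent(chain, m, is_coherent):
    """True if every window of m consecutive articles passes `is_coherent`.

    Assumption: a window holds m articles. Chains shorter than m are
    checked as a whole.
    """
    if len(chain) <= m:
        return is_coherent(tuple(chain))
    return all(
        is_coherent(tuple(chain[i:i + m])) for i in range(len(chain) - m + 1)
    )


def shares_a_word(window):
    """Toy stand-in for coherence: all articles in the window share a word."""
    return bool(set.intersection(*(set(doc) for doc in window)))


# The focus drifts from "strike" to "imf" in the middle article.
chain = [{"strike"}, {"strike", "imf"}, {"imf"}]
short_window_ok = is_m_coherent(chain, 2, shares_a_word)
long_window_ok = is_m_coherent(chain, 3, shares_a_word)
```

With a window of two the chain passes, because each transition is fine on its own; with a window of three it fails, because no word spans the whole chain. A larger m tolerates fewer changes of focus.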

Observation

• If two chains are m-coherent and have m-1 overlap (the last m-1 articles of one are the first m-1 of the other), the conjoined chain is m-coherent.

Using the Observation

• If two chains are m-coherent and have m-1 overlap, the conjoined chain is m-coherent:

[Figure: chains 1 2 3, 2 3 4 and 2 3 5; conjoined chain 1 2 3 5]

• Useful for divide and conquer:
– Add edge if m-1 overlap
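The edge rule can be sketched directly, using the chains from the figure with m = 3. The helper names are hypothetical, and "m-1 overlap" is taken to mean that the last m-1 articles of the first chain equal the first m-1 articles of the second.

```python
def overlaps(a, b, m):
    """True if the last m-1 articles of chain a are the first m-1 of chain b."""
    k = m - 1
    return k > 0 and len(a) >= k and len(b) >= k and a[-k:] == b[:k]


def conjoin(a, b, m):
    """Join two overlapping chains, keeping the shared articles once."""
    return a + b[m - 1:]


def coherence_edges(chains, m):
    """Directed edges (a, b) between distinct chains with an m-1 overlap."""
    return {(a, b) for a in chains for b in chains if a != b and overlaps(a, b, m)}


chains = [(1, 2, 3), (2, 3, 4), (2, 3, 5)]
edges = coherence_edges(chains, 3)
joined = conjoin((1, 2, 3), (2, 3, 5), 3)
```

The graph gets two edges, both leaving (1, 2, 3), and following the edge to (2, 3, 5) yields the chain (1, 2, 3, 5). No coherence computation is needed when adding edges; the overlap test alone suffices, which is what makes the divide-and-conquer step cheap.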

Approach Overview

Documents D

1. Coherence graph G
2. Coverage function f

f( ) = ?

3. Increase Connectivity

Finding High-Coverage Chains

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles

[Figure: coherence graph with vertices 1 2 3, 2 3 4 and 2 3 5, comparing the coverage of the chains 1 2 3 5 and 1 2 3 4: Cover( ) > Cover( ) ?]

Reformulation

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles
• Submodular orienteering
– [Chekuri and Pal, 2005]
– Quasipolynomial time recursive greedy
– O(log OPT) approximation

Callouts: Orienteering; a function of the nodes visited
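To make the problem statement concrete, here is an exhaustive search over paths with K vertices on a tiny coherence graph. It is a brute-force baseline for illustration only, not the recursive greedy algorithm of Chekuri and Pal; the graph, the article topics, and the coverage function (number of distinct topics touched) are made-up assumptions.

```python
def best_path(successors, k, cover):
    """Return (score, path) maximizing `cover` over simple paths with k vertices.

    successors: dict mapping each vertex to a list of its out-neighbours.
    Exhaustive and exponential in general; suitable for toy graphs only.
    """
    best = (float("-inf"), None)

    def extend(path):
        nonlocal best
        if len(path) == k:
            score = cover(path)
            if score > best[0]:
                best = (score, tuple(path))
            return
        for nxt in successors.get(path[-1], ()):
            if nxt not in path:
                extend(path + [nxt])

    for start in successors:
        extend([start])
    return best


# Vertices are short chains of article ids, as in the coherence graph.
graph = {(1, 2, 3): [(2, 3, 4), (2, 3, 5)], (2, 3, 4): [], (2, 3, 5): []}
topics = {1: {"strike"}, 2: {"strike"}, 3: {"austerity"}, 4: {"austerity"}, 5: {"imf"}}


def topic_cover(path):
    """Toy coverage: distinct topics among the articles underlying the path."""
    articles = {article for chain in path for article in chain}
    return len(set().union(*(topics[a] for a in articles)))


score, path = best_path(graph, 2, topic_cover)
```

The path through (2, 3, 5) wins because article 5 brings a new topic while article 4 repeats one already covered. The score depends on the set of nodes visited rather than on a sum of per-edge rewards, which is why the orienteering variant is needed.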

Approach Overview: Recap

Documents D

1. Coherence graph G
– Encodes all m-coherent chains as graph paths
2. Coverage function f

f( ) = ?

– Submodular orienteering [Chekuri & Pal, 2005]
– Quasipoly time recursive greedy
– O(log OPT) approximation
3. Increase Connectivity


Example Map: Greece Debt

Game Plan

• Objective
• Algorithm
• Does it work?

Evaluation

• User study
– Document selection: capturing important content?
– Micro-knowledge: question-answering
– Macro-knowledge: high-level summaries
– Effect of structure
• New York Times (2008-2010)
– 18K+ articles
– Chile, Haiti, Greece

Document Selection

• Experts compose a list of important events
• Subtopic recall (% of events in the map):

[Figure: subtopic recall plotted against # lines]
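Subtopic recall, as described, is the share of expert-listed events that appear in the map. A minimal sketch, assuming events can be matched by label; how an event is judged to be "in the map" is not specified in the slides, and the event names below are invented.

```python
def subtopic_recall(expert_events, map_events):
    """Fraction of the expert-listed events that also appear in the map."""
    expert = set(expert_events)
    if not expert:
        raise ValueError("expert list is empty")
    return len(expert & set(map_events)) / len(expert)


expert_list = ["bailout", "strike", "downgrade", "election"]
in_map = ["bailout", "strike", "protest"]
recall = subtopic_recall(expert_list, in_map)
```

Two of the four expert events are found in the map, so recall is one half; events in the map that the experts did not list do not affect the score.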

Micro-Knowledge (Question Answering)

• Mechanical Turk
• Competitors:
– Google News
– Event threading (TDT) [Nallapati et al, 04]
– Structureless maps
• Results: minor gains
– map structure helps

Question 2: How many miners were trapped?

Macro-Knowledge (High-Level Summaries)

• Summarize complex story in a paragraph
– Maps vs. Google News
– ~15 paragraphs per task
• MTurk to evaluate paragraphs:
– Which paragraph provided a more complete and coherent picture of the story?
– Justification: Paragraph A is more…
– ~300 evaluations per task

Macro-Knowledge: Results

• Greece: 72% prefer maps
– Justifications:
• Haiti: 59% prefer maps
– Map users mostly summarized one story line

[Figure legend: Maps, Google News]

Bottom line: maps are more useful as high-level tools for stories without a single dominant storyline

Conclusions

• Formulated metrics characterizing good maps
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization

Thank you!

Finding Coherent Chains

• Goal: represent all coherent chains
• Problem: intractable
• Divide and conquer:
– Find short coherent chains
– Concatenate to form longer coherent chains


Website