
Multimodal Alignment of Scholarly Documents and Their Presentations


Page 1: Multimodal Alignment of Scholarly Documents and Their Presentations

Multimodal Alignment of Scholarly Documents and Their Presentations

Bamdad Bahrani and Min-Yen Kan

Slides Available: http://bit.ly/1bMSJee

Page 2: Multimodal Alignment of Scholarly Documents and Their Presentations

24 Jul 2013, JCDL 2013, Indianapolis, USA

We read papers, lots of papers!

How do we make sense of this knowledge?

By reading the proceedings?

Photo Credits: Mike Dory @ Flickr


Page 3: Multimodal Alignment of Scholarly Documents and Their Presentations


Photo Credits: Xeeliz @ Flickr

We attend conferences in part to help learn from each other.

A key artifact is the slide presentation, which often summarizes the work in an accessible manner.

But they:
• Are not detailed enough
• Miss important technical details

Idea: Use both together

Page 4: Multimodal Alignment of Scholarly Documents and Their Presentations

ALIGNING PAPERS TO THEIR PRESENTATIONS

Better to juxtapose both media together in a fine-grained manner.

Output: an alignment map

Page 5: Multimodal Alignment of Scholarly Documents and Their Presentations

PROBLEM STATEMENT

• Generate an alignment map for a pair:
  – Paper, containing m (sub)sections, and
  – Presentation, containing n slides
• A slide-centric alignment: each slide is either
  – aligned to a section of the paper, or
  – unaligned (termed nil alignment)

Page 6: Multimodal Alignment of Scholarly Documents and Their Presentations

OUTLINE

• Motivation and Problem Statement
• Baseline Analysis on an Existing Dataset
• Methodology – Multimodal Alignment
• Experimental Results

Page 7: Multimodal Alignment of Scholarly Documents and Their Presentations

RELATED WORK

How can we improve on past work?

[Comparison table. Rows: Hayama et al. 2005; Ephraim 2006; Kan 2007; Beamer & Girju 2009; Our Work – Multimodal Alignment. Columns: Text similarity; Monotonic alignment; Nil identification; Visual content. Three cells are marked "(Suggested)".]

We note that none of the past work considered visual content.

Page 8: Multimodal Alignment of Scholarly Documents and Their Presentations

ANALYSIS OF A BASELINE

Use the public dataset from (Ephraim, 2006).
• 20 Presentation–Paper pairs
  – Papers in .PDF, source DBLP
    • Sections / Subsections
  – Presentations in .PPT, verified to have been constructed by same author
    • Slides

Page 9: Multimodal Alignment of Scholarly Documents and Their Presentations

ANALYSIS OF A BASELINE

Statistics of the (Ephraim, 2006) dataset:
Total number of sections: 515
Average number of sections per paper: 25.75
Total number of slides: 751
Average number of slides per presentation: 37.5

Page 10: Multimodal Alignment of Scholarly Documents and Their Presentations

DEMOGRAPHICS

Page 11: Multimodal Alignment of Scholarly Documents and Their Presentations

BASELINE ERROR ANALYSIS

Slide Type | Common reason | % Incorrectly Aligned by Baseline
Nil | Doesn’t know where to align; aligns to best fit | 64%
Outline | Names of some sections in it; aligns to longest one | 36%
Image | Very little text available | 81%
Drawing | Noisy data: lots of shapes and text boxes | 53%
Table | Little text, noisy data | 50%
Text | Approximately 70% of these errors belong to “Evaluation” or “Results” slides | 24%

Page 12: Multimodal Alignment of Scholarly Documents and Their Presentations

MONOTONIC ALIGNMENT

We observed that the alignment between slides and sections is largely monotonic.

[Figure: alignment plot. Axes: Sections (1-26) and Slides (1-37).]

Why 26 sections and 37 slides? These are the average numbers of each across the pairs in the dataset.

New work! Not in the paper.

Page 13: Multimodal Alignment of Scholarly Documents and Their Presentations

EVIDENCE FOR ALIGNMENT

1. Text Similarity (Baseline)
   – Between each slide and each section
2. Linear Ordering
   – Slides and sections are often monotonically aligned with respect to the previous aligned pair
3. Visual Content
   – Represented by a slide image classifier

Page 14: Multimodal Alignment of Scholarly Documents and Their Presentations

COMBINING EVIDENCE

Represent each of the three sources as a probability distribution or preference:
1. Text Similarity
2. Linear Ordering
3. Visual Content

Handle obvious exceptions.
Weight distributions together to find the most likely point as the alignment.

Page 15: Multimodal Alignment of Scholarly Documents and Their Presentations

SYSTEM ARCHITECTURE

[Figure: system architecture. Inputs: Presentation, Document. Components: Pre-processing; Text Alignment; Linear Ordering Alignment; nil; Slide Image Classifier (1. Text, 2. Outline, 3. Drawing, 4. Results); Multimodal Alignment. Output: Alignment map.]

Current architecture. Slightly different from published paper.

Page 16: Multimodal Alignment of Scholarly Documents and Their Presentations

PRE-PROCESSING

TEXT EXTRACTION

• Presentation: MS PowerPoint → VB compiler → Slides (1. Slide Text, 2. Slide Number)
• Paper: PDF → xPDF Parser (via Python) → XML → Section Text

Page 17: Multimodal Alignment of Scholarly Documents and Their Presentations

PRE-PROCESSING

STEMMING AND TAGGING

• Stemming: to conflate semantically similar words
  – For both the presentation and paper text
  – Replace each word with its stem, e.g., “Tagging” → “Tag”
• Part of Speech (POS) Tagging: to reduce noise
  – For the paper text
  – Tag all words, retaining only important tags: Noun, Verb, Adjective, Adverb and Conjunction

Page 18: Multimodal Alignment of Scholarly Documents and Their Presentations

ALIGNMENT MODALITY

1. TEXT SIMILARITY

• tf.idf cosine-based similarity measure
  – Previous works have all used textual evidence
  – We use it as baseline
  – Primary alignment component
• For each slide s, computes similarity for all sections
  – Probability distribution
  – Outputs a text alignment vector (VTs)
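As a rough illustration of the text similarity step, the sketch below scores one slide against every section with tf.idf cosine similarity and normalizes the scores into a distribution. It is not the authors' implementation: the function names, the smoothed idf formula, and the uniform fallback for a slide with no overlapping terms are all assumptions, and inputs are taken to be already stemmed token lists.

```python
import math
from collections import Counter


def make_idf(section_tokens):
    """Build a smoothed idf function from the sections (smoothing is an assumption)."""
    n = len(section_tokens)
    df = Counter(term for sec in section_tokens for term in set(sec))
    return lambda term: math.log((1 + n) / (1 + df[term])) + 1.0


def tfidf(tokens, idf):
    """Sparse tf.idf vector as a {term: weight} dict."""
    return {term: count * idf(term) for term, count in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def text_alignment_vector(slide_tokens, section_tokens):
    """Similarity of one slide to each section, normalized to sum to 1."""
    idf = make_idf(section_tokens)
    slide_vec = tfidf(slide_tokens, idf)
    sims = [cosine(slide_vec, tfidf(sec, idf)) for sec in section_tokens]
    total = sum(sims)
    if total == 0:
        # No shared terms at all: fall back to a uniform distribution (assumption).
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


# Hypothetical toy input: three stemmed sections and one slide.
sections = [["tag", "model", "train"], ["result", "accuraci", "evalu"], ["relat", "work"]]
vt = text_alignment_vector(["result", "evalu", "tabl"], sections)
```

A slide with very little text, such as an image slide, yields a flat or near-empty vector here, which is the weakness the baseline error analysis points to.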

Page 19: Multimodal Alignment of Scholarly Documents and Their Presentations

ALIGNMENT MODALITY

2. LINEAR ORDERING

• Outputs a linear alignment vector (VOs) for each slide s
• Probability mass centered at

E.g., a presentation with 20 slides and 9 (sub-)sections:

Section:      1.   2.   2.1  3.   3.1  3.2  4.   5.   5.1
Probability:  0    0    0.1  0.2  0.4  0.2  0.1  0    0

[Figure: slides numbered 1-20 alongside the section distribution.]
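The ordering vector in the example above can be sketched as a fixed kernel placed around a center section. The kernel values (0.1, 0.2, 0.4, 0.2, 0.1) are copied from the slide's example; everything else is an assumption. In particular, the slide does not preserve how the center is computed, so the hypothetical `ordering_vector` below takes the center as an argument, and clipping plus renormalizing at the edges is a choice made for this sketch only.

```python
# Kernel copied from the slide's 9-section example.
KERNEL = (0.1, 0.2, 0.4, 0.2, 0.1)


def ordering_vector(num_sections, center):
    """Spread KERNEL around `center` (0-based section index).

    Mass that would fall outside the section range is dropped and the
    remainder is renormalized to sum to 1 (an assumption of this sketch).
    """
    if not 0 <= center < num_sections:
        raise ValueError("center must be a valid section index")
    half = len(KERNEL) // 2
    vec = [0.0] * num_sections
    for offset, mass in enumerate(KERNEL, start=-half):
        j = center + offset
        if 0 <= j < num_sections:
            vec[j] = mass
    total = sum(vec)
    return [v / total for v in vec]


# Reproduces the shape of the slide's example: 9 sections, mass centered on "3.1".
vo = ordering_vector(9, 4)
```

Passing the center in keeps the sketch neutral about whether it comes from the slide's relative position or from the previously aligned pair.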

Page 20: Multimodal Alignment of Scholarly Documents and Their Presentations

ALIGNMENT MODALITY

3. SLIDE IMAGE CLASSIFIER

Slides → Take Snapshot → Image → Image Classifier → one of four classes:
1. Text
2. Outline
3. Drawing
4. Results

Note: Different classes than in the earlier analysis

Page 21: Multimodal Alignment of Scholarly Documents and Their Presentations

CLASSIFIER RESULTS

• Used a different set of 750 manually-annotated slides
• Linear SVM, using a single feature class of Histogram of Oriented Gradients (HOG)
• 10-fold cross validation

Image Class | Text | Outline | Drawing | Result | Average
Recall | 0.89 | 1.00 | 1.00 | 1.00 | 0.97
Precision | 0.84 | 0.94 | 0.82 | 0.83 | 0.85
F1 measure | 0.86 | 0.96 | 0.90 | 0.90 | 0.90

Presentation only material: Table not in paper.
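For readers unfamiliar with the evaluation protocol, here is a minimal sketch of how 10-fold cross validation partitions 750 items. It covers only the splitting; HOG feature extraction and SVM training are left out. The function name, the shuffle, and the fixed seed are assumptions, not details from the slides.

```python
import random


def k_fold_indices(num_items, k=10, seed=0):
    """Yield (train, test) index lists so that each item is tested exactly once."""
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 750 slides and k=10, each round trains on 675 slides and tests on 75.
for train, test in k_fold_indices(750):
    pass  # fit a classifier on `train`, score it on `test`
```

Per-class recall and precision would then be accumulated over the ten test folds.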

Page 22: Multimodal Alignment of Scholarly Documents and Their Presentations

MULTIMODAL FUSION

• Input for each slide:
  1. Text Alignment Vector VTs
  2. Ordering Alignment Vector VOs
  3. Class assigned from image classifier
• Define 3 weights as: WTs + WOs + Wnil = 1.00
• Tune weights according to image classes
• Apply Nil classifier
• Output for each slide: Final Alignment Vector FAVs

N.B.: not image evidence
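One way to picture the fusion step is as a weighted sum of the two vectors. The sketch below is a guess at the general shape, not the authors' formula: the slides only state that the three weights sum to 1, so treating Wnil as mass withheld from the section scores is an assumption, as are the function names and the example weights.

```python
def fuse(vt, vo, w_text, w_order, w_nil):
    """Combine text and ordering vectors into a final alignment vector.

    w_nil is not spread over sections here; it is simply mass withheld
    from them (an assumption of this sketch).
    """
    if len(vt) != len(vo):
        raise ValueError("VT and VO must cover the same sections")
    if abs(w_text + w_order + w_nil - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [w_text * t + w_order * o for t, o in zip(vt, vo)]


def best_section(fav):
    """Index of the section with the highest fused score."""
    return max(range(len(fav)), key=lambda j: fav[j])


# Hypothetical vectors and weights for a three-section paper.
fav = fuse([0.1, 0.6, 0.3], [0.5, 0.3, 0.2], w_text=0.5, w_order=0.3, w_nil=0.2)
section = best_section(fav)
```

Class-dependent tuning would amount to choosing a different weight triple per image class before calling `fuse`.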

Page 23: Multimodal Alignment of Scholarly Documents and Their Presentations

SLIDE IMAGE CLASSIFICATION

RE-WEIGHTING: Initial Distribution

[Figure: weights Wnil, WOs and WTs for the slide classes 1. Text, 2. Outline, 3. Drawing, 4. Results.]

Page 24: Multimodal Alignment of Scholarly Documents and Their Presentations

SLIDE IMAGE CLASSIFICATION

RE-WEIGHTING: Text Slide

[Figure: weights Wnil, WOs and WTs for the slide classes 1. Text, 2. Outline, 3. Drawing, 4. Results.]

Page 25: Multimodal Alignment of Scholarly Documents and Their Presentations

SLIDE IMAGE CLASSIFICATION

RE-WEIGHTING: Outline Slide

[Figure: weights Wnil, WOs and WTs for the slide classes 1. Text, 2. Outline, 3. Drawing, 4. Results.]

Page 26: Multimodal Alignment of Scholarly Documents and Their Presentations

SLIDE IMAGE CLASSIFICATION

RE-WEIGHTING: Drawing Slide

[Figure: weights Wnil, WOs and WTs for the slide classes 1. Text, 2. Outline, 3. Drawing, 4. Results.]

Leave weights as initially uniform

Page 27: Multimodal Alignment of Scholarly Documents and Their Presentations

SLIDE IMAGE CLASSIFICATION

EXCEPTION 1: RESULTS

Results Slide

[Figure: weights Wnil, WOs and WTs for the slide classes 1. Text, 2. Outline, 3. Drawing, 4. Results.]

Ignore weights and align to “Experiment and Results” section

// end

Page 28: Multimodal Alignment of Scholarly Documents and Their Presentations

EXCEPTION 2: NIL CLASSIFIER

Use a heuristic to discard nil slides from alignment:
• Nil factor =
• If Nil factor > 0.40, classify as nil

Page 29: Multimodal Alignment of Scholarly Documents and Their Presentations

FINAL ALIGNMENT VECTOR

If the exceptions do not apply, i.e.,
– the slide s was not a “Results” slide,
– and it was not classified as nil,

then s is aligned to the section with the highest probability in the final alignment vector (FAVs).

Page 30: Multimodal Alignment of Scholarly Documents and Their Presentations

EXPERIMENTS

For comparative evaluation:
S1. Text-only Paragraph-to-slide alignment

To further the state-of-the-art:
S2. Text-only Section-to-slide alignment
S3. S2 + Linear Ordering
S4. S3 + Image Classification

Page 31: Multimodal Alignment of Scholarly Documents and Their Presentations

RESULTS

[Figure: results chart. Series: Baseline, Section, Ordering, Image Class. Callout: 16%.]

Page 32: Multimodal Alignment of Scholarly Documents and Their Presentations

RESULTS BY SLIDE TYPE

[Figure: bar chart of correct vs. incorrect alignments by slide type (nil, Outline, Image, Table, Drawing), Baseline compared with S4. Y-axis: number of slides (0-140).]

• Improvement in all categories
• Especially in Image and nils

Recent Work. Not in published paper.

Page 33: Multimodal Alignment of Scholarly Documents and Their Presentations

SUMMARY

• More than 40% of slides contain elements other than text
• Baseline analysis shows the error rate:
  – 13% of overall incorrect alignment on text slides.
  – 26% of overall incorrect alignment on others.
• We use visual content to classify the slides
  – Heuristic and weights depending on slide class

Final system (S4): 9% and 13%
50% reduction in targeted errors

Page 34: Multimodal Alignment of Scholarly Documents and Their Presentations

CONCLUSION

• Many slides with images and drawings, where text is insufficient evidence for alignment.
• Visual evidence serves to drive the alignment:
  – As evidence (Image Classification)
  – As a system architecture driver (Multimodal Fusion)

THANK YOU

Page 35: Multimodal Alignment of Scholarly Documents and Their Presentations

BACK UP SLIDES


Page 36: Multimodal Alignment of Scholarly Documents and Their Presentations

APPLICATIONS

• Help the process of learning for beginners by reviewing a paper along with its presentation.
• Improve the quality of the skimming process for researchers and professionals.
• Generate a large dataset of aligned slides and sections for the purpose of (semi-)automatic presentation generation.

Page 37: Multimodal Alignment of Scholarly Documents and Their Presentations

FUTURE WORK

• More accurate text similarity measures.
• Differentiate between title and body text, and account for slide formatting.
• Handle slides that include hyperlinks, videos, animations, or other multimedia.

Page 38: Multimodal Alignment of Scholarly Documents and Their Presentations

OLD SYSTEM ARCHITECTURE

[Figure: old system architecture. Inputs: Presentation, Document. Components: Text Extraction; Textual Similarity; Linear Ordering; nil; Slide Image Classifier (1. Text, 2. Index, 3. Drawing, 4. Results); Multimodal Fusion. Output: Alignment Map.]

Page 39: Multimodal Alignment of Scholarly Documents and Their Presentations

OLD WEIGHT TUNING

1. Text
   – Text similarity alignment weight (WTs): Increase 2/3
2. Outline
   – Text similarity alignment weight (WTs): Decrease 1/3
   – Linear ordering alignment weight (WOs): Decrease 1/3
3. Drawing
   – Uniform probability for all weights
4. Result
   – Exceptional rule: Align directly to “Experiment and Result” section