Multimodal Alignment of Scholarly Documents and Their Presentations
Bamdad Bahrani and Min-Yen Kan
Slides Available: http://bit.ly/1bMSJee
24 Jul 2013, JCDL 2013, Indianapolis, USA
We read papers, lots of papers!
How do we make sense of this knowledge?
By reading the proceedings?
Photo Credits: Mike Dory @ Flickr
Photo Credits: Xeeliz @ Flickr
We attend conferences in part to help learn from each other.
A key artifact is the slide presentation, which often summarizes the work in an accessible manner.
But they:
• Are not detailed enough
• Miss important technical details
Idea: Use both together
Better to juxtapose both media together in a fine-grained manner.
Output: an alignment map
ALIGNING PAPERS TO THEIR PRESENTATIONS
PROBLEM STATEMENT
• Generate an alignment map for a pair:
– Paper, containing m (sub)sections, and
– Presentation, containing n slides
• A slide-centric alignment: each slide is aligned to
– either a section of the paper, or
– unaligned (termed nil alignment)
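The slide-centric alignment map can be sketched as a simple data structure. The sketch below (names are illustrative, not from the paper) represents it as a mapping from slide number to section label, with None marking nil alignments:

```python
# A slide-centric alignment map: each of the n slides maps to one of the
# m (sub)sections, or to None for a nil alignment. Illustrative sketch only.
from typing import Dict, Optional

def make_alignment_map(n_slides: int) -> Dict[int, Optional[str]]:
    """Start with every slide unaligned (nil)."""
    return {slide: None for slide in range(1, n_slides + 1)}

alignment = make_alignment_map(5)
alignment[2] = "1. Introduction"   # slide 2 aligned to a section
alignment[3] = "3.1 Methodology"   # slide 3 aligned to a subsection
# slides 1, 4, and 5 remain nil-aligned
```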
OUTLINE
• Motivation and Problem Statement
• Baseline Analysis on an Existing Dataset
• Methodology: Multimodal Alignment
• Experimental Results
RELATED WORK
How can we improve on past work?
Prior work, compared on four dimensions: text similarity, monotonic alignment, nil identification, and visual content:
• Hayama et al. 2005
• Ephraim 2006
• Kan 2007
• Beamer & Girju 2009
• Our work: multimodal alignment
(Some approaches only suggested monotonic alignment or nil identification rather than implementing them.)
We note that none of the prior work considered visual content.
ANALYSIS OF A BASELINE
Use the public dataset from (Ephraim, 2006):
• 20 presentation–paper pairs
– Papers in .PDF, sourced from DBLP, divided into sections/subsections
– Presentations in .PPT, verified to have been constructed by the same author, divided into slides
DEMOGRAPHICS
Total number of sections: 515
Average number of sections per paper: 25.75
Total number of slides: 751
Average number of slides per presentation: 37.5
BASELINE ERROR ANALYSIS

Slide Type | Common reason                                       | % Incorrectly Aligned by Baseline
Nil        | Doesn't know where to align; aligns to best fit      | 64%
Outline    | Names some sections, so aligns to the longest one    | 36%
Image      | Very little text available                           | 81%
Drawing    | Noisy data: lots of shapes and text boxes            | 53%
Table      | Little text, noisy data                              | 50%
Text       |                                                      | 24%

Approximately 70% of these errors belong to "Evaluation" or "Results" slides.
MONOTONIC ALIGNMENT
We observed that the alignment between slides and sections is largely monotonic.
[Plot: Slides (1-37) against Sections (1-26)]
Why 26 sections and 37 slides? The average number of each in the pairs in the dataset.
New work! Not in the paper.
EVIDENCE FOR ALIGNMENT
1. Text Similarity (Baseline): between each slide and each section
2. Linear Ordering: slides and sections are often monotonically aligned with respect to the previous aligned pair
3. Visual Content: represented by a slide image classifier
COMBINING EVIDENCE
Represent each of the three sources as a probability distribution or preference:
1. Text Similarity
2. Linear Ordering
3. Visual Content
Handle obvious exceptions. Weight the distributions together to find the most likely point as the alignment.
SYSTEM ARCHITECTURE
[Diagram: Input Presentation and Input Document → Pre-processing → Text Alignment → Linear Ordering Alignment → Multimodal Alignment, informed by the Slide Image Classifier (1. Text, 2. Outline, 3. Drawing, 4. Results) → Output: Alignment map, with a nil option]
Current architecture. Slightly different from published paper.
TEXT EXTRACTION
• Presentation: MS PowerPoint → VB compiler → 1. slide text, 2. slide number
• Paper: PDF → xPDF parser (via Python) → XML section text
PRE-PROCESSING: STEMMING AND TAGGING
• Stemming, to conflate semantically similar words
– For both the presentation and paper text
– Replace each word with its stem, e.g., "Tagging" → "Tag"
• Part-of-Speech (POS) Tagging, to reduce noise
– For the paper text
– Tag all words, retaining only the important tags: noun, verb, adjective, adverb, and conjunction
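A minimal sketch of the stemming step. The suffix rules below are illustrative stand-ins for a full stemmer such as Porter's, and the POS-filtering step is only noted in a comment:

```python
# Simplified suffix-stripping stemmer: conflates inflected variants by
# removing common English suffixes. A real pipeline would use a full
# stemmer (e.g., Porter) plus a POS tagger that keeps only nouns, verbs,
# adjectives, adverbs, and conjunctions, as the slide describes.
def stem(word: str) -> str:
    w = word.lower()
    for suf in ("ing", "ed", "es", "s"):          # longest-first
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            w = w[: len(w) - len(suf)]
            break
    if len(w) >= 2 and w[-1] == w[-2]:            # collapse doubled final letter
        w = w[:-1]
    return w

def preprocess(text: str) -> list:
    """Stem every whitespace-separated token."""
    return [stem(tok) for tok in text.split()]

print(preprocess("Tagging tagged tags"))  # → ['tag', 'tag', 'tag']
```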
1. TEXT SIMILARITY (ALIGNMENT MODALITY)
• tf.idf cosine-based similarity measure
– Previous works have all used textual evidence
– We use it as the baseline
– Primary alignment component
• For each slide s, compute the similarity to all sections
– Forms a probability distribution
– Outputs a text alignment vector (VTs)
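The text-similarity modality can be sketched in plain Python. The tf.idf weighting and normalization details below are generic choices, not necessarily the paper's exact formulation:

```python
# Sketch of the text alignment vector VT_s: tf.idf cosine similarity of
# one slide against every section, renormalized into a distribution.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf.idf vectors (term -> weight dicts) for token lists."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def text_alignment_vector(slide_tokens, section_token_lists):
    """VT_s: per-section similarity, normalized to sum to 1."""
    vecs = tfidf_vectors(section_token_lists + [slide_tokens])
    slide_vec, section_vecs = vecs[-1], vecs[:-1]
    sims = [cosine(slide_vec, sv) for sv in section_vecs]
    total = sum(sims)
    return [s / total for s in sims] if total else [1.0 / len(sims)] * len(sims)
```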
2. LINEAR ORDERING (ALIGNMENT MODALITY)
• Outputs a linear alignment vector (VOs) for each slide s
• Probability mass is centered with respect to the previously aligned pair. E.g., for a presentation with 20 slides and 9 (sub)sections:

Section:     1.  2.  2.1  3.   3.1  3.2  4.   5.  5.1
Probability: 0   0   0.1  0.2  0.4  0.2  0.1  0   0
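A hedged sketch of such an ordering prior. The triangular shape and spread below are illustrative choices, not the paper's exact distribution, though the output resembles the example above:

```python
# Sketch of the ordering alignment vector VO_s: a triangular prior
# centered at the section where the previous slide aligned, which
# encourages roughly monotonic slide-to-section alignment.
def ordering_alignment_vector(expected_idx, n_sections, spread=2):
    """Normalized triangular distribution over section indices,
    peaking at expected_idx and falling to zero beyond `spread`."""
    weights = [max(spread + 1 - abs(i - expected_idx), 0)
               for i in range(n_sections)]
    total = sum(weights)
    return [w / total for w in weights]

# 9 (sub)sections, mass centered at index 4 (the fifth entry):
print(ordering_alignment_vector(4, 9))
```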
3. SLIDE IMAGE CLASSIFIER (ALIGNMENT MODALITY)
[Diagram: Slides → take snapshot → image → image classifier → one of four classes: 1. Text, 2. Outline, 3. Drawing, 4. Results]
Note: different classes than in the earlier analysis
CLASSIFIER RESULTS
• Used a different set of 750 manually-annotated slides
• Linear SVM, using a single feature class: Histogram of Oriented Gradients (HOG)
• 10-fold cross-validation

Image Class | Text | Outline | Drawing | Result | Average
Recall      | 0.89 | 1.00    | 1.00    | 1.00   | 0.97
Precision   | 0.84 | 0.94    | 0.82    | 0.83   | 0.85
F1 measure  | 0.86 | 0.96    | 0.90    | 0.90   | 0.90
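The paper trains a linear SVM on full HOG features; the sketch below only illustrates the core idea behind HOG, a magnitude-weighted histogram of gradient orientations, in pure Python (real HOG adds local cells and block normalization):

```python
# Much-simplified HOG-style feature: one global histogram of gradient
# orientations over a grayscale image, weighted by gradient magnitude.
import math

def orientation_histogram(img, bins=8):
    """`img` is a 2D list of intensities; returns a normalized
    orientation histogram (illustrative only, not real HOG)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]      # central differences
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi    # unsigned orientation
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]
```

A slide snapshot with strong horizontal edges (e.g., rows of text) concentrates mass in the vertical-gradient bins, which is the kind of cue that separates text-heavy slides from drawings.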
Presentation only material: Table not in paper.
MULTIMODAL FUSION
• Input for each slide:
1. Text Alignment Vector VTs
2. Ordering Alignment Vector VOs
3. Class assigned from image classifier
• Define 3 weights such that WTs + WOs + Wnil = 1.00
• Tune weights according to image classes
• Apply Nil classifier
• Output for each slide: Final Alignment Vector FAVs
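The fusion step can be sketched as a weighted sum of the two alignment vectors. The nil-factor computation below is a placeholder (the slide's exact formula is not reproduced here), and all names are illustrative:

```python
# Sketch of multimodal fusion: FAV_s = WTs * VT_s + WOs * VO_s, with a
# nil decision. The nil factor here is a stand-in heuristic, not the
# paper's formula.
def fuse(vt, vo, w_t, w_o, w_nil, nil_threshold=0.40):
    """Return (final alignment vector, aligned section index or None)."""
    assert abs(w_t + w_o + w_nil - 1.0) < 1e-9, "weights must sum to 1"
    fav = [w_t * t + w_o * o for t, o in zip(vt, vo)]
    nil_factor = w_nil / (max(fav) + w_nil)     # placeholder heuristic
    if nil_factor > nil_threshold:
        return fav, None                         # nil alignment
    return fav, fav.index(max(fav))              # best-scoring section

vt = [0.1, 0.6, 0.3]     # text alignment vector for slide s
vo = [0.2, 0.5, 0.3]     # ordering alignment vector
fav, section = fuse(vt, vo, w_t=0.5, w_o=0.4, w_nil=0.1)
print(section)  # → 1
```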
N.B.: not image evidence
RE-WEIGHTING
The slide image class (1. Text, 2. Outline, 3. Drawing, 4. Results) drives how the weights WTs, WOs, and Wnil are redistributed:
• Initial distribution: uniform weights
• Text slide: [bar chart of re-weighted WTs, WOs, Wnil]
• Outline slide: [bar chart of re-weighted WTs, WOs, Wnil]
• Drawing slide: leave weights as initially uniform
EXCEPTION 1: RESULTS
• Results slide: ignore the weights and align directly to the "Experiment and Results" section
EXCEPTION 2: NIL CLASSIFIER
Use a heuristic to discard nil slides from alignment:
• Nil factor = [formula on slide]
• If Nil factor > 0.40, classify as nil
FINAL ALIGNMENT VECTOR
If the exceptions do not apply, i.e.,
– the slide s was not a "Results" slide,
– and it was not classified as nil,
then s is aligned to the section with the highest probability in the final alignment vector.
EXPERIMENTS
For comparative evaluation:
S1. Text-only paragraph-to-slide alignment
To further the state of the art:
S2. Text-only section-to-slide alignment
S3. S2 + linear ordering
S4. S3 + image classification
RESULTS
[Chart: overall results for the Baseline, Section (S2), Ordering (S3), and Image Class (S4) systems; a 16% gain is highlighted]
RESULTS BY SLIDE TYPE
[Bar chart: number of correct vs. incorrect alignments per slide type (nil, Outline, Image, Table, Drawing), comparing the Baseline against S4]
• Improvement in all categories
• Especially in Image and nil slides
Recent Work. Not in published paper.
• More than 40% of slides contain elements other than text
• Baseline analysis shows the error rate:
– 13% of overall incorrect alignments are on text slides
– 26% of overall incorrect alignments are on other slides
• We use visual content to classify the slides
– Heuristics and weights depend on the slide class
SUMMARY
Final system (S4)
50% reduction in targeted errors
CONCLUSION
• Many slides contain images and drawings, where text is insufficient evidence for alignment.
• Visual evidence serves to drive the alignment:
– As evidence (image classification)
– As a system architecture driver (multimodal fusion)
THANK YOU
BACKUP SLIDES
APPLICATIONS
• Help beginners learn by reviewing a paper along with its presentation.
• Improve the quality of skimming for researchers and professionals.
• Generate a large dataset of aligned slides and sections for (semi-)automatic presentation generation.
FUTURE WORK
• More accurate text similarity measures.
• Differentiate between title and body text, and account for slide formatting.
• Handle slides that include hyperlinks, videos, animations, or other multimedia.
OLD SYSTEM ARCHITECTURE
[Diagram: Input Presentation and Input Document → Text Extraction → Textual Similarity and Linear Ordering → Multimodal Fusion, informed by the Slide Image Classifier (1. Text, 2. Index, 3. Drawing, 4. Results) → Output: Alignment Map, with a nil option]
OLD WEIGHT TUNING
1. Text: text similarity alignment weight (WTs) increased (2/3)
2. Outline: text similarity weight (WTs) decreased (1/3); linear ordering weight (WOs) decreased (1/3)
3. Drawing: uniform probability for all weights
4. Result: exceptional rule, align directly to the "Experiment and Result" section