23
Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Embed Size (px)

Citation preview

Page 1: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Hierarchical Segmentation:Finding Changes in a Text Signal

Malcolm Slaney and Dulce Ponceleon

IBM Almaden Research Center

Page 2: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Problem Statement Problem

How do we browse video? Goal

Create a table-of-contents Solution

Look for topic changes in text

Page 3: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

TOC Example

Chapter 1

Chapter 2

Page 4: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Overview of This Talk Goal and approach Latent semantic indexing (LSI) Scale space Combination Results

LSIScaleSpaceFilter

Segment

Page 5: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Approach Sentences -> Semantic Space Filter at multiple scales Look for large jumps Three subjects (loops) shown

Loop 1: Polychromaticity Artifacts Loop 2: Emission Tomography Loop 3: Ultrasound Tomography

Page 6: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Courtesy of Jianbo Shi (CMU)

Building on Previous Work LSI and clustering Text tiling Change point analysis Segmentation Scale space

Page 7: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Latent Semantic Indexing Collect histogram of word

frequencies Use SVD to capture frequent

combinations Orthogonal decomposition

Represent in low-dimensional space

Word

s

Docs Docs

10D

Page 8: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

LSI Within a Document Split into chunks

Fixed size Sentences

Compute histograms Perform SVD Look at results Sources

“Principles of Computerized Tomographic Imaging”

PBS News Hour

Page 9: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

LSI – 2D Projection

Chapter 4 of Principles of Computerized Tomographic Imaging

Page 10: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

LSI – Self-similarity Measure

similarity Cosine of angle

between “documents”

Plot all pairs of chunks/sentences

Look for block diagonal

Chapter 4 of Principles of Computerized Tomographic Imaging

Page 11: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Scale-space Filtering What size are the features? Look at different scales! Continuous scale Used for

Object Recognition Feature Detection

Page 12: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Scale-space Movie Green line

marks best high-level segmentation

10d semantic space

Scale varies from 1 to 400 sentences

Page 13: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Scale-space Segmentation Low pass filter signal Form image of scale vs. time Look for changes Track peaks of vector derivative

across scale

Page 14: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Scale-space Example

Derivative as function of scale and sentence

Page 15: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

LSI and Scale Space Putting it all together Split document/transcript Perform LSI analysis Look at change in angle Perform scale-space segmentation Show tree

Page 16: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Scale-Space Image

Peaks in scale-space derivative

Peaks traced to their origin

Page 17: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Results – CT Comparison

Scale-Space Book Headings

Page 18: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Results – News Comparison

Scale-Space Ground Truth

Page 19: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Results – Autocorrelation Block

sentences Measure

correlation Positive

Peak Anti-

correlation

Page 20: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Discussion Issues Evaluation (and ground truth)

Lafferty’s measure Temporal properties

Histogram/SVD chunking size Autocorrelation

Page 21: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

Computational Effort Histogram: O(N) SVD: O(N3) Scale space: O(N2) N < 1000

Number of sentences in a video or document is not large

Page 22: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

LSI Document Lookup Histogram documents Entropy term weighting Compute SVD Use first 10-100 vectors to model

space Encode query as histogram Look for documents in similar

direction

Page 23: Hierarchical Segmentation: Finding Changes in a Text Signal Malcolm Slaney and Dulce Ponceleon IBM Almaden Research Center

LSI Example Collection of

book titles Differential

equations vs. algorithms and applications