
Hierarchical Segmentation: Finding Changes in a Text Signal

Malcolm Slaney and Dulce Ponceleon

IBM Almaden Research Center

Problem Statement
- Problem: How do we browse video?
- Goal: Create a table of contents.
- Solution: Look for topic changes in the text.

TOC Example

[Figure: sample table of contents with entries Chapter 1 and Chapter 2]

Overview of This Talk
- Goal and approach
- Latent semantic indexing (LSI)
- Scale space
- Combination
- Results

[Figure: processing pipeline, LSI -> Scale-space filter -> Segment]

Approach
- Sentences -> semantic space
- Filter at multiple scales
- Look for large jumps
- Three subjects (loops) shown

Loop 1: Polychromaticity Artifacts
Loop 2: Emission Tomography
Loop 3: Ultrasound Tomography

Courtesy of Jianbo Shi (CMU)

Building on Previous Work
- LSI and clustering
- Text tiling
- Change-point analysis
- Segmentation
- Scale space

Latent Semantic Indexing
- Collect histogram of word frequencies
- Use SVD to capture frequent combinations
- Orthogonal decomposition
- Represent in low-dimensional space

[Figure: words x docs matrix reduced to a 10-D representation of the docs]
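A minimal NumPy sketch of the LSI step, assuming the word histograms are already stacked into a words x docs count matrix. The function name `lsi` and the default of 10 dimensions (taken from the 10-D figure) are illustrative choices, not the authors' code, and no term weighting is applied here.

```python
import numpy as np

def lsi(counts, k=10):
    """Project documents into a k-dimensional latent semantic space.

    counts: words x docs matrix of word frequencies (one column per document).
    Returns (doc_vectors, U_k, s_k); doc_vectors is k x docs.
    """
    counts = np.asarray(counts, dtype=float)
    # Thin SVD: counts = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    # Keep the k strongest orthogonal word combinations.
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    # Each column is one document's coordinates in the reduced space
    # (equivalently U_k.T @ counts).
    doc_vectors = np.diag(s_k) @ Vt_k
    return doc_vectors, U_k, s_k
```

Keeping only the leading singular vectors is what makes the space low-dimensional: words that tend to co-occur load on the same few directions.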

LSI Within a Document
- Split into chunks
  - Fixed size
  - Sentences
- Compute histograms
- Perform SVD
- Look at results

Sources
- “Principles of Computerized Tomographic Imaging”
- PBS News Hour
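The chunking and histogram steps might look like the following sketch. The regular expressions used to split sentences and tokenize words are simplifying assumptions (a transcript without punctuation would need a different splitter), and the helper names are hypothetical.

```python
import re
from collections import Counter

def sentence_chunks(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def fixed_chunks(text, size):
    # Fixed-size chunks of `size` words; the last chunk may be shorter.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def histograms(chunks):
    # Lowercased word tokens; vocabulary kept in order of first appearance.
    tokenized = [re.findall(r"[a-z']+", c.lower()) for c in chunks]
    vocab = list(dict.fromkeys(w for toks in tokenized for w in toks))
    index = {w: i for i, w in enumerate(vocab)}
    # words x chunks count matrix, one column per chunk.
    matrix = [[0] * len(chunks) for _ in vocab]
    for j, toks in enumerate(tokenized):
        for w, n in Counter(toks).items():
            matrix[index[w]][j] = n
    return vocab, matrix
```

Sentence chunks follow the natural units of a transcript, while fixed-size chunks give every column a comparable word count; the histogram step is the same for both.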

LSI – 2D Projection

Chapter 4 of Principles of Computerized Tomographic Imaging

LSI – Self-similarity Measure
- Similarity: cosine of the angle between “documents”
- Plot all pairs of chunks/sentences
- Look for block-diagonal structure

Chapter 4 of Principles of Computerized Tomographic Imaging
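The self-similarity matrix can be sketched as the cosine between every pair of LSI vectors; chunks about one topic then form high-similarity blocks along the diagonal. This assumes a k x n array with one LSI column per chunk or sentence, and the function name is hypothetical.

```python
import numpy as np

def self_similarity(vectors):
    # vectors: k x n, one LSI column per chunk.
    # result[i, j] = cosine of the angle between chunk i and chunk j.
    norms = np.linalg.norm(vectors, axis=0)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # leave all-zero chunks as zeros
    return unit.T @ unit
```

Plotting the returned n x n matrix as an image gives the all-pairs view described on the slide.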

Scale-space Filtering
- What size are the features?
- Look at different scales!
- Continuous scale
- Used for
  - Object recognition
  - Feature detection
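One common way to realize a continuous scale is Gaussian low-pass filtering with a varying standard deviation `sigma`; the sketch below assumes that choice (the slides do not name the filter) and smooths each dimension of a k x n semantic signal along time. Repeating the edge samples at the boundaries is another assumption, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma):
    # Truncate at +/- 3 sigma and normalize so a constant signal is unchanged.
    r = max(1, int(np.ceil(3 * sigma)))
    g = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    return g / g.sum()

def smooth(signal, sigma):
    # Low-pass filter each of the k rows of a k x n signal along time.
    g = gaussian_kernel(sigma)
    r = len(g) // 2
    padded = np.pad(signal, ((0, 0), (r, r)), mode="edge")
    return np.stack([np.convolve(row, g, mode="valid") for row in padded])

def scale_space(signal, sigmas):
    # Shape (num_scales, k, n): the same signal viewed at each scale.
    return np.stack([smooth(signal, s) for s in sigmas])
```

Narrow features such as a one-sentence blip fade quickly as `sigma` grows while broad ones survive, which is what lets the scale axis separate fine from coarse changes.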

Scale-space Movie
- Green line marks best high-level segmentation
- 10-D semantic space
- Scale varies from 1 to 400 sentences

Scale-space Segmentation
- Low-pass filter the signal
- Form an image of scale vs. time
- Look for changes
- Track peaks of the vector derivative across scale
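A sketch of the segmentation step, assuming Gaussian low-pass filtering: each row of the image is the magnitude of the vector derivative at one scale, and a boundary found at a coarse scale is traced back to the finest scale by snapping to the nearest local peak. The nearest-peak tracking rule and all function names are hypothetical simplifications, not the tracking method used in the talk.

```python
import numpy as np

def smooth(signal, sigma):
    # Gaussian low-pass along time (axis 1); edges extended by repetition.
    r = max(1, int(np.ceil(3 * sigma)))
    g = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    g /= g.sum()
    padded = np.pad(signal, ((0, 0), (r, r)), mode="edge")
    return np.stack([np.convolve(row, g, mode="valid") for row in padded])

def derivative_image(signal, sigmas):
    # Row i: magnitude of the vector derivative at scale sigmas[i];
    # column t: change between sentence t and sentence t + 1.
    return np.array([np.linalg.norm(np.diff(smooth(signal, s), axis=1), axis=0)
                     for s in sigmas])

def local_peaks(row):
    # Indices that rise strictly from the left and do not fall to the right.
    return [t for t in range(1, len(row) - 1)
            if row[t] > row[t - 1] and row[t] >= row[t + 1]]

def trace_to_origin(image, start):
    # Walk from the coarsest scale (last row) down to the finest (row 0),
    # snapping to the nearest peak at each finer scale.
    pos = start
    for row in image[-2::-1]:
        peaks = local_peaks(row)
        if peaks:
            pos = min(peaks, key=lambda p: abs(p - pos))
    return pos
```

Peaks that persist to coarse scales mark the major topic changes; tracing them to the finest scale recovers a sentence-accurate boundary location.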

Scale-space Example

Derivative as a function of scale and sentence

LSI and Scale Space
Putting it all together:
- Split the document/transcript
- Perform LSI analysis
- Look at the change in angle
- Perform scale-space segmentation
- Show the tree
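The "change in angle" step can be read as the angle between consecutive sentence vectors in the semantic space; the sketch below computes it under that interpretation (an assumption), with a small guard against zero-length vectors. The function name is hypothetical.

```python
import numpy as np

def angle_change(vectors):
    # vectors: k x n, one LSI column per sentence; returns n - 1 angles in radians.
    a, b = vectors[:, :-1], vectors[:, 1:]
    denom = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0)
    cos = np.sum(a * b, axis=0) / np.maximum(denom, 1e-12)
    # Clip guards against rounding slightly outside [-1, 1] before arccos.
    return np.arccos(np.clip(cos, -1.0, 1.0))
```

Using angles rather than raw distances ignores vector length, which in LSI mostly reflects how many words a chunk contains rather than what it is about.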

Scale-Space Image

Peaks in scale-space derivative

Peaks traced to their origin

Results – CT Comparison

[Figure: scale-space segmentation vs. book headings]

Results – News Comparison

[Figure: scale-space segmentation vs. ground truth]

Results – Autocorrelation
- Block sentences
- Measure correlation

[Figure labels: positive peak; anti-correlation]
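One way to compute such a curve, assumed here since the slide gives no formula: remove the mean LSI vector, then average the cosine between vectors `lag` sentences apart. Mean removal is what allows negative values, i.e. anti-correlation once the text has moved to a different subject. The function name is hypothetical.

```python
import numpy as np

def semantic_autocorrelation(vectors, max_lag):
    # vectors: k x n, one LSI column per sentence (or per block of sentences).
    centered = vectors - vectors.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=0)
    unit = centered / np.where(norms == 0, 1.0, norms)
    n = unit.shape[1]
    # Mean cosine between columns t and t + lag, for lag = 0 .. max_lag.
    return np.array([np.mean(np.sum(unit[:, :n - lag] * unit[:, lag:], axis=0))
                     for lag in range(max_lag + 1)])
```

For text that alternates between two subjects every few sentences, the curve is positive at lag 0, dips to a negative value at the lag where the subjects swap, and rises again at the period.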

Discussion
Issues:
- Evaluation (and ground truth)
- Lafferty’s measure
- Temporal properties
- Histogram/SVD chunking size
- Autocorrelation

Computational Effort
- Histogram: O(N)
- SVD: O(N^3)
- Scale space: O(N^2)
- N < 1000: the number of sentences in a video or document is not large
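As a rough order-of-magnitude check of the costs listed above, taking the upper bound N = 1000 sentences and ignoring constant factors:

$$
N = 10^{3}, \qquad N^{2} = 10^{6}, \qquad N^{3} = 10^{9}
$$

So the SVD term dominates, at roughly 10^9 basic operations in the worst case; this is an asymptotic bound, not a measured runtime.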

LSI Document Lookup
- Histogram documents
- Entropy term weighting
- Compute SVD
- Use the first 10-100 vectors to model the space
- Encode the query as a histogram
- Look for documents in a similar direction
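A compact sketch of the lookup, assuming the common log-entropy weighting scheme (log(1 + tf) locally, times an entropy-based global weight per word). Queries and documents are both projected with the top-k left singular vectors and compared by cosine; other LSI variants scale by the singular values differently. Function names are hypothetical, and at least two documents are required so that log(n_docs) is nonzero.

```python
import numpy as np

def entropy_weights(counts):
    # Global weight per word: 1 + sum_j p_ij log p_ij / log(n_docs), p_ij = tf_ij / gf_i.
    # Words spread evenly over all documents get weight 0; words in one document get 1.
    counts = np.asarray(counts, dtype=float)
    gf = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, gf, out=np.zeros_like(counts), where=gf > 0)
    plogp = p * np.log(np.where(p > 0, p, 1.0))  # treats 0 log 0 as 0
    return 1.0 + plogp.sum(axis=1) / np.log(counts.shape[1])

def build_index(counts, k):
    counts = np.asarray(counts, dtype=float)
    g = entropy_weights(counts)
    weighted = np.log1p(counts) * g[:, None]
    U, _, _ = np.linalg.svd(weighted, full_matrices=False)
    U_k = U[:, :k]
    return g, U_k, U_k.T @ weighted  # k x n_docs document coordinates

def rank(query_counts, g, U_k, docs):
    # Encode the query histogram with the same weighting, then project it.
    q = U_k.T @ (np.log1p(np.asarray(query_counts, dtype=float)) * g)
    denom = np.linalg.norm(docs, axis=0) * np.linalg.norm(q)
    sims = docs.T @ q / np.maximum(denom, 1e-12)
    return np.argsort(-sims), sims  # best-matching documents first
```

Comparing directions (cosines) rather than distances means a short query can still match a long document about the same subject.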

LSI Example
- Collection of book titles
- Differential equations vs. algorithms and applications
