CS 430 / INFO 430 Information Retrieval, Lecture 23: Non-Textual Materials 2

Page 1

CS 430 / INFO 430 Information Retrieval

Lecture 23

Non-Textual Materials 2

Page 2

Course Administration

Assignment 3

Grades and comments will be sent out tomorrow

Assignment 4 has been posted

Page 3

Automatic Creation of Surrogates for Non-textual Materials

Discovery of non-textual materials usually requires surrogates

• How far can these surrogates be created automatically?

• Automatically created surrogates are much less expensive than manually created ones, but have high error rates.

• If surrogates have high rates of error, is it possible to have effective information discovery?

Page 4

Example: Informedia Digital Video Library

Collections: Segments of video programs, e.g., TV and radio news and documentary broadcasts, from Cable News Network (CNN), the British Open University, and WQED television.

Segmentation: Automatically broken into short segments of video, such as the individual items in a news broadcast.

Size: More than 4,000 hours, about 2 terabytes.

Objective: Research into automatic methods for organizing and retrieving information from video.

Funding: NSF, DARPA, NASA and others.

Principal investigator: Howard Wactlar (Carnegie Mellon University).

Page 5

Informedia Digital Video Library

History

• Carnegie Mellon has broad research programs in speech recognition, image recognition, natural language processing.

• 1994. A basic mock-up demonstrated the general concept: speech recognition builds an index from the sound track, which is then matched against spoken queries. (DARPA funded.)

• 1994-1998. Informedia developed the concept of multi-modal information discovery with a series of user interface experiments. (NSF/DARPA/NASA Digital Libraries Initiative.)

• 1998 onward. Continued research and a commercial spin-off (which failed).

Page 6

The Challenge

A video sequence is awkward for information discovery:

• Textual methods of information retrieval cannot be applied

• Browsing requires the user to view the sequence. Fast skimming is difficult.

• Computing requirements are demanding (MPEG-1 requires 1.2 Mbits/sec).

Surrogates are required

Page 7

Multi-Modal Information Discovery

The multi-modal approach to information retrieval

Computer programs analyze video materials for clues, e.g., changes of scene:

• methods from artificial intelligence, e.g., speech recognition, natural language processing, image recognition.

• analysis of video track, sound track, closed captioning if present, any other information.

Each mode gives imperfect information. Therefore use many approaches and combine the evidence.
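
The combination step can be as simple as a weighted sum of per-mode relevance scores. Below is a minimal sketch, assuming each mode (speech transcript, screen text, image similarity) has already produced a normalized score for each segment; the weights and names are illustrative assumptions, not Informedia's actual method.

    # Sketch: combine imperfect evidence from several modes into one ranking.
    # Scores are assumed normalized to [0, 1]; the weights are illustrative.
    MODE_WEIGHTS = {"speech": 0.5, "ocr": 0.3, "image": 0.2}

    def combined_score(mode_scores: dict) -> float:
        """Weighted linear combination of the per-mode scores for one segment."""
        return sum(MODE_WEIGHTS.get(mode, 0.0) * score
                   for mode, score in mode_scores.items())

    segments = {
        "seg-017": {"speech": 0.82, "ocr": 0.10, "image": 0.40},
        "seg-042": {"speech": 0.35, "ocr": 0.90, "image": 0.55},
    }
    ranked = sorted(segments, key=lambda s: combined_score(segments[s]), reverse=True)
    print(ranked)   # segment ids ordered by combined evidence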

Page 8

Informedia Library Creation

(Diagram) The video, audio, and text tracks are processed by image extraction, speech recognition, and natural language interpretation; segmentation then produces segments with derived metadata.

Page 9

Informedia: Information Discovery

(Diagram) The user browses via multimedia surrogates and queries via natural language; requested segments and their metadata are returned from the store of segments with derived metadata.

Page 10

Text Extraction

Source

Sound track: Automatic speech recognition using the Sphinx II and III recognition systems (unrestricted vocabulary, speaker independent, multi-lingual, background sounds). Error rates of 25% and up. (A sketch follows at the end of this slide.)

Closed captions: Digitally encoded text. (Not on all video. Often inaccurate.)

Text on screen: Can be extracted by image recognition and optical character recognition. (Matches speaker with name.)

Query

Spoken query: Automatic speech recognition using the same system as is used to index the sound track.

Typed query: Entered directly by the user.
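
As a concrete illustration of the sound-track source, here is a minimal sketch using the Python SpeechRecognition package with the CMU Sphinx (pocketsphinx) backend, a lightweight descendant of the Sphinx systems named above; the file name is a placeholder, and real use would need vocabulary and acoustic-model tuning.

    # Sketch: derive searchable text from an audio track with CMU Sphinx.
    # Assumes: pip install SpeechRecognition pocketsphinx
    import speech_recognition as sr

    def transcribe(wav_path: str) -> str:
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)          # read the whole file
        try:
            return recognizer.recognize_sphinx(audio)  # expect a high word error rate
        except sr.UnknownValueError:
            return ""                                  # nothing usable was recognized

    text = transcribe("news_segment.wav")              # placeholder file name
    print(text)    # this text becomes the segment's searchable surrogate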

Page 11

Image Understanding

Informedia has developed specialized tools for various aspects of image understanding

• scene break detection
  - segmentation
  - icon selection

• image similarity matching

• camera motion and object tracking

• video-OCR (recognize text on screen)

• face detection and association
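
As one illustration, a minimal face-detection sketch using OpenCV's bundled Haar cascade; this is a generic off-the-shelf technique, not Informedia's own detector, and the frame file name is a placeholder.

    # Sketch: detect faces in a single video frame with OpenCV's Haar cascade.
    # Assumes: pip install opencv-python
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    frame = cv2.imread("frame_0042.png")               # placeholder frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        # Association would then link each detected face to a name, e.g. from video-OCR.
        print(f"face at x={x}, y={y}, size={w}x{h}")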

Page 12

Multimodal Metadata Extraction

Page 13

An Evaluation Experiment

Test corpus:

• 602 news stories from CNN, etc. Average length 672 words.

• Manually transcribed to obtain accurate text.

• Speech recognition of the audio using Sphinx II (50.7% word error rate).

• Errors introduced artificially to give error rates from 0% to 80%.

• Relative precision and recall (using a vector ranking) were used as measures of retrieval performance.

As word error rate increased from 0% to 50%:

• Relative precision fell from 80% to 65%

• Relative recall fell from 90% to 80%
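
Word error rate here is the usual edit-distance measure over words: (substitutions + insertions + deletions) divided by the number of words in the reference transcript. A small self-contained sketch of the computation, not tied to the corpus above:

    # Sketch: word error rate via a standard edit-distance dynamic program.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("the peace talks resumed today",
                          "the please talks resumed"))   # 0.4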

Page 14

Speech recognition and retrieval performance

Page 15

User Interface Concepts

Users need a variety of ways to search and browse, depending on the task being carried out and their preferred style of working:

• Visual icons

- one-line headlines
- film strip views
- video skims
- transcript following of the audio track

• Collages

• Semantic zooming

• Results set

• Named faces

• Skimming

Page 16

Page 17

Thumbnails, Filmstrips and Video Skims

Thumbnail:

• A single image that illustrates the content of a video

Filmstrip:

• A sequence of thumbnails that illustrate the flow of a video segment

Video skim:

• A short video that summarizes a longer sequence by combining brief excerpts of video and sound that give an overview of the whole sequence

Page 18

Creating a Filmstrip

Separate video sequence into shots

• Use techniques from image recognition to identify dramatic changes in scene. Frames with similar color characteristics are assumed to be part of a single shot (see the sketch at the end of this slide).

Choose a sample frame

• Default is to select the middle frame from the shot.

• If there is camera motion, select the frame where the motion ends.

User feedback:

• Frames are tied to time sequence.
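
Below is a minimal sketch of the shot-separation and sample-frame steps, comparing color histograms of successive frames with OpenCV; the threshold, histogram parameters, and file name are illustrative assumptions, not Informedia's actual settings.

    # Sketch: split a video into shots by comparing color histograms of
    # successive frames, then keep the middle frame of each shot.
    import cv2

    def filmstrip(video_path: str, threshold: float = 0.6) -> list:
        cap = cv2.VideoCapture(video_path)
        shots, current, prev_hist = [], [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                                [0, 256, 0, 256, 0, 256])
            hist = cv2.normalize(hist, hist).flatten()
            if prev_hist is not None and \
               cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                shots.append(current)          # dramatic color change: start a new shot
                current = []
            current.append(frame)
            prev_hist = hist
        if current:
            shots.append(current)
        cap.release()
        return [shot[len(shot) // 2] for shot in shots]   # one sample frame per shot

    frames = filmstrip("news_broadcast.mpg")              # placeholder file name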

Page 19

Creating Video Skims

Static:

• Precomputed based on video and audio phrases

• Fixed compression, e.g., one minute skim of 10 minute sequence

Dynamic:

• After a query, skim is created to emphasize context of the hit

• Variable compression selected by user

• Adjustable during playback
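
A toy sketch of the fixed-compression static case described above: pick evenly spaced sub-clips whose total length is the target skim length (one tenth of the source in the example). The five-second clip length is an arbitrary assumption.

    # Sketch: choose evenly spaced (start, end) windows so that 10x compression
    # yields, e.g., a one-minute skim of a ten-minute segment.
    def skim_windows(duration_s: float, compression: float = 10.0,
                     clip_s: float = 5.0) -> list:
        target = duration_s / compression          # total skim length in seconds
        n_clips = max(1, int(target // clip_s))    # number of short clips to keep
        step = duration_s / n_clips                # spread them evenly over the source
        return [(i * step, i * step + clip_s) for i in range(n_clips)]

    print(skim_windows(600))   # 12 five-second clips from a 10-minute segment = 60 s skim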

Page 20

Limits to Scalability

Informedia has demonstrated effective information discovery with moderately large collections

Problems with increased scale:

• Technical -- storage, bandwidth, etc.

• Diversity of content -- difficult to tune heuristics

• User interfaces -- complexity of browsing grows with scale

Page 21

Lessons Learned

• Searching and browsing must be considered integrated parts of a single information discovery process.

• Data (content and metadata), computing systems (e.g., search engines), and user interfaces must be designed together.

• Multi-modal methods compensate for incomplete or error-prone data.

Page 22

CS 430 / INFO 430 Information Retrieval

Lecture 23

Architecture of Information Retrieval Systems

Page 23

Basic Architecture 1: Single Homogeneous Collection

• Documents and indexes are held on a single computer system (may be several computers).

• The user interface and search methods are selected for the specific service.

Examples: Medline (medical information), Cornell University library catalog

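A minimal sketch of this architecture as a toy in-memory system that holds both the documents and an inverted index over them; real services such as Medline or a library catalog add ranking, fielded records, and persistent storage.

    # Sketch: one system holds both the documents and the index over them.
    from collections import defaultdict

    documents = {}                      # doc id -> text
    index = defaultdict(set)            # term   -> set of doc ids

    def add_document(doc_id: str, text: str) -> None:
        documents[doc_id] = text
        for term in text.lower().split():
            index[term].add(doc_id)

    def search(query: str) -> set:
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(index.get(terms[0], set()))
        for term in terms[1:]:
            result &= index.get(term, set())
        return result

    add_document("d1", "measles vaccination in rural clinics")
    add_document("d2", "catalog record for a rural history collection")
    print(search("rural clinics"))      # {'d1'}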

Page 24

Basic Architecture 2: Several Similar Collections -- One Computer System

• Several more or less similar collections are held on a single computer system.

• Each collection is indexed separately using the same software, procedures, algorithms, etc. (but tuned for each collection, e.g., stoplists).

• The user interface is the same (or very similar) for each service.

Example: OCLC's FirstSearch

Page 25

Distributed Architecture 1: Standard Search Protocols


Strict adherence to standards allows any user interface to search any conforming search service.
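
The idea can be sketched as a shared programming interface: any client can search any service that implements it. This is a toy abstraction standing in for a real standard protocol; the class and method names are invented for illustration.

    # Sketch: a standard search interface; any user interface can call
    # any service that conforms to it.
    from abc import ABC, abstractmethod

    class SearchService(ABC):
        """What every conforming search service must implement."""
        @abstractmethod
        def find(self, query: str) -> list:
            ...

    class LibraryCatalog(SearchService):
        def find(self, query: str) -> list:
            return [f"catalog record matching {query!r}"]

    class MedicalIndex(SearchService):
        def find(self, query: str) -> list:
            return [f"journal article matching {query!r}"]

    def user_interface(service: SearchService, query: str) -> None:
        # The interface does not care which conforming service it talks to.
        for hit in service.find(query):
            print(hit)

    user_interface(LibraryCatalog(), "x")
    user_interface(MedicalIndex(), "x")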

Page 26

Distributed Architecture 2: Broadcast Search (a.k.a. Federated Search)


An interface server broadcasts a query to each collection, combines the results and returns them to the user.

Examples: Dienst (digital library protocol), Web metasearch services
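
A minimal sketch of the broadcast step, assuming each collection object exposes a find(query) method as in the interface sketch above; the thread pool and round-robin merge are illustrative choices, not how any particular metasearch service works.

    # Sketch: an interface server broadcasts one query to every collection
    # in parallel, then merges the result lists for the user.
    from concurrent.futures import ThreadPoolExecutor

    def broadcast_search(query: str, collections: list) -> list:
        if not collections:
            return []
        with ThreadPoolExecutor() as pool:
            result_lists = list(pool.map(lambda c: c.find(query), collections))
        merged = []                                   # simple round-robin merge
        for i in range(max(len(r) for r in result_lists)):
            for results in result_lists:
                if i < len(results):
                    merged.append(results[i])
        return merged

    # e.g. broadcast_search("x", [LibraryCatalog(), MedicalIndex()])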

Page 27

Distributed Architecture 3: Centralized Search Services


Batch indexing: Metadata about all items is accumulated in a central system.

Real-time searching: The user (a) searches the central system, and (b) retrieves items from collections.

Examples: Union catalogs, Web search services
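
A minimal sketch of the two phases: metadata harvested from the collections is indexed centrally in batch, and at search time the user gets back records that point into the source collections. The record format and names are assumptions for illustration.

    # Sketch: (a) batch indexing of harvested metadata into a central index,
    # (b) real-time search returning pointers back to the source collections.
    from collections import defaultdict

    central_index = defaultdict(set)    # term -> set of (collection, item id)
    metadata_store = {}                 # (collection, item id) -> metadata record

    def batch_index(collection: str, records: dict) -> None:
        """Harvest metadata records from one collection into the central index."""
        for item_id, record in records.items():
            key = (collection, item_id)
            metadata_store[key] = record
            for term in record["title"].lower().split():
                central_index[term].add(key)

    def search(query_term: str) -> list:
        """Search the central system; the user then retrieves items from the collections."""
        return [dict(metadata_store[key], source=key)
                for key in central_index.get(query_term.lower(), set())]

    batch_index("libA", {"42": {"title": "Union catalogs explained"}})
    batch_index("webB", {"7": {"title": "Catalogs of early printed books"}})
    print(search("catalogs"))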
