CS 430 / INFO 430 Information Retrieval, Lecture 23: Non-Textual Materials 2

Page 1

CS 430 / INFO 430 Information Retrieval

Lecture 23

Non-Textual Materials 2

Page 2

Course Administration

Assignment 3

Grades and comments will be sent out tomorrow

Assignment 4 has been posted

Page 3

Automatic Creation of Surrogates for Non-textual Materials

Discovery of non-textual materials usually requires surrogates

• How far can these surrogates be created automatically?

• Automatically created surrogates are much less expensive than manually created ones, but have high error rates.

• If surrogates have high rates of error, is it possible to have effective information discovery?

Page 4

Example: Informedia Digital Video Library

Collections: Segments of video programs, e.g., TV and radio news and documentary broadcasts, from Cable News Network (CNN), the British Open University, and WQED television.

Segmentation: Automatically broken into short segments of video, such as the individual items in a news broadcast.

Size: More than 4,000 hours, about 2 terabytes.

Objective: Research into automatic methods for organizing and retrieving information from video.

Funding: NSF, DARPA, NASA and others.

Principal investigator: Howard Wactlar (Carnegie Mellon University).

Page 5

Informedia Digital Video Library

History

• Carnegie Mellon has broad research programs in speech recognition, image recognition, natural language processing.

• 1994. A basic mock-up demonstrated the general concept: speech recognition builds an index from the sound track, which is then matched against spoken queries. (DARPA funded.)

• 1994-1998. Informedia developed the concept of multi-modal information discovery with a series of user interface experiments. (NSF/DARPA/NASA Digital Libraries Initiative.)

• 1998 onward. Continued research and a commercial spin-off (which failed).

Page 6

The Challenge

A video sequence is awkward for information discovery:

• Textual methods of information retrieval cannot be applied

• Browsing requires the user to view the sequence. Fast skimming is difficult.

• Computing requirements are demanding (MPEG-1 requires 1.2 Mbits/sec).

Surrogates are required

Page 7

Multi-Modal Information Discovery

The multi-modal approach to information retrieval

Computer programs analyze video materials for clues, e.g., changes of scene:

• methods from artificial intelligence, e.g., speech recognition, natural language processing, image recognition.

• analysis of video track, sound track, closed captioning if present, any other information.

Each mode gives imperfect information. Therefore use many approaches and combine the evidence.
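
The combination step can be as simple as a weighted sum of per-mode relevance scores. Below is a minimal sketch, assuming each mode (speech transcript, screen text, image similarity) has already produced a normalized score for each segment; the weights and names are illustrative assumptions, not Informedia's actual method.

    # Sketch: combine imperfect evidence from several modes into one ranking.
    # Scores are assumed normalized to [0, 1]; the weights are illustrative.
    MODE_WEIGHTS = {"speech": 0.5, "ocr": 0.3, "image": 0.2}

    def combined_score(mode_scores: dict) -> float:
        """Weighted linear combination of the per-mode scores for one segment."""
        return sum(MODE_WEIGHTS.get(mode, 0.0) * score
                   for mode, score in mode_scores.items())

    segments = {
        "seg-017": {"speech": 0.82, "ocr": 0.10, "image": 0.40},
        "seg-042": {"speech": 0.35, "ocr": 0.90, "image": 0.55},
    }
    ranked = sorted(segments, key=lambda s: combined_score(segments[s]), reverse=True)
    print(ranked)   # segment ids ordered by combined evidence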

Page 8

Informedia Library Creation

(Diagram) The video, audio, and text tracks are processed by image extraction, speech recognition, and natural language interpretation; segmentation then produces segments with derived metadata.

Page 9

Informedia: Information Discovery

(Diagram) The user browses via multimedia surrogates and queries via natural language; requested segments and their metadata are returned from the store of segments with derived metadata.

Page 10

Text Extraction

Source

Sound track: Automatic speech recognition using the Sphinx II and III recognition systems (unrestricted vocabulary, speaker independent, multi-lingual, background sounds). Error rates of 25% and up. (A sketch follows at the end of this slide.)

Closed captions: Digitally encoded text. (Not on all video. Often inaccurate.)

Text on screen: Can be extracted by image recognition and optical character recognition. (Matches speaker with name.)

Query

Spoken query: Automatic speech recognition using the same system as is used to index the sound track.

Typed query: Entered directly by the user.
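
As a concrete illustration of the sound-track source, here is a minimal sketch using the Python SpeechRecognition package with the CMU Sphinx (pocketsphinx) backend, a lightweight descendant of the Sphinx systems named above; the file name is a placeholder, and real use would need vocabulary and acoustic-model tuning.

    # Sketch: derive searchable text from an audio track with CMU Sphinx.
    # Assumes: pip install SpeechRecognition pocketsphinx
    import speech_recognition as sr

    def transcribe(wav_path: str) -> str:
        recognizer = sr.Recognizer()
        with sr.AudioFile(wav_path) as source:
            audio = recognizer.record(source)          # read the whole file
        try:
            return recognizer.recognize_sphinx(audio)  # expect a high word error rate
        except sr.UnknownValueError:
            return ""                                  # nothing usable was recognized

    text = transcribe("news_segment.wav")              # placeholder file name
    print(text)    # this text becomes the segment's searchable surrogate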

Page 11

Image Understanding

Informedia has developed specialized tools for various aspects of image understanding

• scene break detection
  - segmentation
  - icon selection

• image similarity matching

• camera motion and object tracking

• video-OCR (recognize text on screen)

• face detection and association
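
As one illustration, a minimal face-detection sketch using OpenCV's bundled Haar cascade; this is a generic off-the-shelf technique, not Informedia's own detector, and the frame file name is a placeholder.

    # Sketch: detect faces in a single video frame with OpenCV's Haar cascade.
    # Assumes: pip install opencv-python
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    frame = cv2.imread("frame_0042.png")               # placeholder frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        # Association would then link each detected face to a name, e.g. from video-OCR.
        print(f"face at x={x}, y={y}, size={w}x{h}")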

Page 12

Multimodal Metadata Extraction

Page 13

An Evaluation Experiment

Test corpus:

• 602 news stories from CNN, etc. Average length 672 words.

• Manually transcribed to obtain accurate text.

• Speech recognition of the audio using Sphinx II (50.7% word error rate).

• Errors introduced artificially to give error rates from 0% to 80%.

• Relative precision and recall (using a vector ranking) were used as measures of retrieval performance.

As word error rate increased from 0% to 50%:

• Relative precision fell from 80% to 65%

• Relative recall fell from 90% to 80%
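
Word error rate here is the usual edit-distance measure over words: (substitutions + insertions + deletions) divided by the number of words in the reference transcript. A small self-contained sketch of the computation, not tied to the corpus above:

    # Sketch: word error rate via a standard edit-distance dynamic program.
    def word_error_rate(reference: str, hypothesis: str) -> float:
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution or match
        return d[len(ref)][len(hyp)] / len(ref)

    print(word_error_rate("the peace talks resumed today",
                          "the please talks resumed"))   # 0.4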

Page 14

Speech recognition and retrieval performance

Page 15

User Interface Concepts

Users need a variety of ways to search and browse, depending on the task being carried out and their preferred style of working:

• Visual icons

- one-line headlines
- film strip views
- video skims
- transcript following of the audio track

• Collages

• Semantic zooming

• Results set

• Named faces

• Skimming

Page 16

Page 17

Thumbnails, Filmstrips and Video Skims

Thumbnail:

• A single image that illustrates the content of a video

Filmstrip:

• A sequence of thumbnails that illustrate the flow of a video segment

Video skim:

• A short video that summarizes a longer sequence by combining brief excerpts of video and sound that give an overview of the whole sequence

Page 18

Creating a Filmstrip

Separate video sequence into shots

• Use techniques from image recognition to identify dramatic changes in scene. Frames with similar color characteristics are assumed to be part of a single shot (see the sketch at the end of this slide).

Choose a sample frame

• Default is to select the middle frame from the shot.

• If there is camera motion, select the frame where the motion ends.

User feedback:

• Frames are tied to time sequence.
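
Below is a minimal sketch of the shot-separation and sample-frame steps, comparing color histograms of successive frames with OpenCV; the threshold, histogram parameters, and file name are illustrative assumptions, not Informedia's actual settings.

    # Sketch: split a video into shots by comparing color histograms of
    # successive frames, then keep the middle frame of each shot.
    import cv2

    def filmstrip(video_path: str, threshold: float = 0.6) -> list:
        cap = cv2.VideoCapture(video_path)
        shots, current, prev_hist = [], [], None
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                                [0, 256, 0, 256, 0, 256])
            hist = cv2.normalize(hist, hist).flatten()
            if prev_hist is not None and \
               cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                shots.append(current)          # dramatic color change: start a new shot
                current = []
            current.append(frame)
            prev_hist = hist
        if current:
            shots.append(current)
        cap.release()
        return [shot[len(shot) // 2] for shot in shots]   # one sample frame per shot

    frames = filmstrip("news_broadcast.mpg")              # placeholder file name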

Page 19

Creating Video Skims

Static:

• Precomputed based on video and audio phrases

• Fixed compression, e.g., one minute skim of 10 minute sequence

Dynamic:

• After a query, skim is created to emphasize context of the hit

• Variable compression selected by user

• Adjustable during playback
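
A toy sketch of the fixed-compression static case described above: pick evenly spaced sub-clips whose total length is the target skim length (one tenth of the source in the example). The five-second clip length is an arbitrary assumption.

    # Sketch: choose evenly spaced (start, end) windows so that 10x compression
    # yields, e.g., a one-minute skim of a ten-minute segment.
    def skim_windows(duration_s: float, compression: float = 10.0,
                     clip_s: float = 5.0) -> list:
        target = duration_s / compression          # total skim length in seconds
        n_clips = max(1, int(target // clip_s))    # number of short clips to keep
        step = duration_s / n_clips                # spread them evenly over the source
        return [(i * step, i * step + clip_s) for i in range(n_clips)]

    print(skim_windows(600))   # 12 five-second clips from a 10-minute segment = 60 s skim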

Page 20

Limits to Scalability

Informedia has demonstrated effective information discovery with moderately large collections

Problems with increased scale:

• Technical -- storage, bandwidth, etc.

• Diversity of content -- difficult to tune heuristics

• User interfaces -- complexity of browsing grows with scale

Page 21

Lessons Learned

• Searching and browsing must be considered integrated parts of a single information discovery process.

• Data (content and metadata), computing systems (e.g., search engines), and user interfaces must be designed together.

• Multi-modal methods compensate for incomplete or error-prone data.

Page 22

CS 430 / INFO 430 Information Retrieval

Lecture 23

Architecture of Information Retrieval Systems

Page 23

Basic Architecture 1: Single Homogeneous Collection

• Documents and indexes are held on a single computer system (may be several computers).

• The user interface and search methods are selected for the specific service.

Examples: Medline (medical information), Cornell University library catalog

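A minimal sketch of this architecture as a toy in-memory system that holds both the documents and an inverted index over them; real services such as Medline or a library catalog add ranking, fielded records, and persistent storage.

    # Sketch: one system holds both the documents and the index over them.
    from collections import defaultdict

    documents = {}                      # doc id -> text
    index = defaultdict(set)            # term   -> set of doc ids

    def add_document(doc_id: str, text: str) -> None:
        documents[doc_id] = text
        for term in text.lower().split():
            index[term].add(doc_id)

    def search(query: str) -> set:
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(index.get(terms[0], set()))
        for term in terms[1:]:
            result &= index.get(term, set())
        return result

    add_document("d1", "measles vaccination in rural clinics")
    add_document("d2", "catalog record for a rural history collection")
    print(search("rural clinics"))      # {'d1'}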

Page 24

Basic Architecture 2: Several Similar Collections -- One Computer System

• Several more or less similar collections are held on a single computer system.

• Each collection is indexed separately using the same software, procedures, algorithms, etc. (but tuned for each collection, e.g., stoplists).

• The user interface is the same (or very similar) for each service.

Example: OCLC's FirstSearch

Page 25

Distributed Architecture 1: Standard Search Protocols


Strict adherence to standards allows any user interface to search any conforming search service.
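
The idea can be sketched as a shared programming interface: any client can search any service that implements it. This is a toy abstraction standing in for a real standard protocol; the class and method names are invented for illustration.

    # Sketch: a standard search interface; any user interface can call
    # any service that conforms to it.
    from abc import ABC, abstractmethod

    class SearchService(ABC):
        """What every conforming search service must implement."""
        @abstractmethod
        def find(self, query: str) -> list:
            ...

    class LibraryCatalog(SearchService):
        def find(self, query: str) -> list:
            return [f"catalog record matching {query!r}"]

    class MedicalIndex(SearchService):
        def find(self, query: str) -> list:
            return [f"journal article matching {query!r}"]

    def user_interface(service: SearchService, query: str) -> None:
        # The interface does not care which conforming service it talks to.
        for hit in service.find(query):
            print(hit)

    user_interface(LibraryCatalog(), "x")
    user_interface(MedicalIndex(), "x")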

Page 26

Distributed Architecture 2: Broadcast Search (a.k.a. Federated Search)


An interface server broadcasts a query to each collection, combines the results and returns them to the user.

Examples: Dienst (digital library protocol), Web metasearch services
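
A minimal sketch of the broadcast step, assuming each collection object exposes a find(query) method as in the interface sketch above; the thread pool and round-robin merge are illustrative choices, not how any particular metasearch service works.

    # Sketch: an interface server broadcasts one query to every collection
    # in parallel, then merges the result lists for the user.
    from concurrent.futures import ThreadPoolExecutor

    def broadcast_search(query: str, collections: list) -> list:
        if not collections:
            return []
        with ThreadPoolExecutor() as pool:
            result_lists = list(pool.map(lambda c: c.find(query), collections))
        merged = []                                   # simple round-robin merge
        for i in range(max(len(r) for r in result_lists)):
            for results in result_lists:
                if i < len(results):
                    merged.append(results[i])
        return merged

    # e.g. broadcast_search("x", [LibraryCatalog(), MedicalIndex()])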

Page 27

Distributed Architecture 3: Centralized Search Services


Batch indexing: Metadata about all items is accumulated in a central system.

Real-time searching: The user (a) searches the central system, and (b) retrieves items from collections.

Examples: Union catalogs, Web search services
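
A minimal sketch of the two phases: metadata harvested from the collections is indexed centrally in batch, and at search time the user gets back records that point into the source collections. The record format and names are assumptions for illustration.

    # Sketch: (a) batch indexing of harvested metadata into a central index,
    # (b) real-time search returning pointers back to the source collections.
    from collections import defaultdict

    central_index = defaultdict(set)    # term -> set of (collection, item id)
    metadata_store = {}                 # (collection, item id) -> metadata record

    def batch_index(collection: str, records: dict) -> None:
        """Harvest metadata records from one collection into the central index."""
        for item_id, record in records.items():
            key = (collection, item_id)
            metadata_store[key] = record
            for term in record["title"].lower().split():
                central_index[term].add(key)

    def search(query_term: str) -> list:
        """Search the central system; the user then retrieves items from the collections."""
        return [dict(metadata_store[key], source=key)
                for key in central_index.get(query_term.lower(), set())]

    batch_index("libA", {"42": {"title": "Union catalogs explained"}})
    batch_index("webB", {"7": {"title": "Catalogs of early printed books"}})
    print(search("catalogs"))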
