
Metadata for Motion Pictures: Media Streams


Page 1: Metadata for Motion Pictures: Media Streams

IS 202 – Fall 2002 (2002.09.17)

SIMS 202: Information Organization and Retrieval

Lecture 08: Media Streams

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday, 10:30 am – 12:00 pm

http://www.sims.berkeley.edu/academics/courses/is202/f02/

Page 2: Metadata for Motion Pictures: Media Streams

Lecture 08: Media Streams

• Problem Setting

• Current Approaches

• Representing Media

• New Solutions

• Methodological Considerations

• Future Work

Page 3: Metadata for Motion Pictures: Media Streams

Lecture 08: Media Streams

• Problem Setting

• Current Approaches

• Representing Media

• New Solutions

• Methodological Considerations

• Future Work

Page 4: Metadata for Motion Pictures: Media Streams

What is the Problem?

• Today people cannot easily create, find, edit, share, and reuse media

• Computers don’t understand media content
  – Media is opaque and data rich
  – We lack structured representations

• Without content representation (metadata), manipulating digital media will remain like word-processing with bitmaps

Page 5: Metadata for Motion Pictures: Media Streams

Lecture 08: Media Streams

• Problem Setting

• Current Approaches

• Representing Media

• New Solutions

• Methodological Considerations

• Future Work

Page 6: Metadata for Motion Pictures: Media Streams

The Search for Solutions

• Current approaches to creating metadata don’t work
  – Signal-based analysis
  – Keywords
  – Natural language

• Need standardized metadata framework
  – Designed for video and rich media data
  – Human and machine readable and writable
  – Standardized and scalable
  – Integrated into media capture, archiving, editing, distribution, and reuse

Page 7: Metadata for Motion Pictures: Media Streams

Signal-Based Parsing

• Practical problem
  – Parsing unstructured, unknown video is very, very hard

• Theoretical problem
  – Mismatch between percepts and concepts

Page 8: Metadata for Motion Pictures: Media Streams

Why Keywords Don’t Work

• Are not a semantic representation

• Do not describe relations between descriptors

• Do not describe temporal structure

• Do not converge

• Do not scale

Page 9: Metadata for Motion Pictures: Media Streams

Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

Natural Language vs. Visual Language

Page 10: Metadata for Motion Pictures: Media Streams

Natural Language vs. Visual Language

Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

Page 11: Metadata for Motion Pictures: Media Streams

Notation for Time-Based Media: Music

Page 12: Metadata for Motion Pictures: Media Streams

Visual Language Advantages

• A language designed as an accurate and readable representation of time-based media
  – For video, especially important for actions, expressions, and spatial relations

• Enables Gestalt view and quick recognition of descriptors due to designed visual similarities

• Supports global use of annotations

Page 13: Metadata for Motion Pictures: Media Streams

Lecture 08: Media Streams

• Problem Setting

• Current Approaches

• Representing Media

• New Solutions

• Methodological Considerations

• Future Work

Page 14: Metadata for Motion Pictures: Media Streams

Representing Video

• Streams vs. Clips

• Video syntax and semantics

• Ontological issues in video representation

Page 15: Metadata for Motion Pictures: Media Streams

Video is Temporal

Stream of 100 Frames of Video

A Clip from Frame 47 to Frame 68 with Descriptors

Page 16: Metadata for Motion Pictures: Media Streams

Streams vs. Clips

The Stream of 100 Frames of Video with 6 Annotations Resulting in Many Possible Segmentations of the Stream

Stream of 100 Frames of Video

Page 17: Metadata for Motion Pictures: Media Streams

Stream-Based Representation

• Makes annotation pay off
  – The richer the annotation, the more numerous the possible segmentations of the video stream

• Clips
  – Change from being fixed segmentations of the video stream to being the results of retrieval queries based on annotations of the video stream (see the sketch below)

• Annotations
  – Create representations which make clips, not representations of clips
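To make the stream/clip distinction concrete, here is a minimal Python sketch (my own illustration, not the Media Streams implementation): annotations are frame-interval descriptors over a 100-frame stream, and a "clip" is simply the frame range computed by a query over those annotations. Richer annotation directly yields more retrievable segmentations.

    from typing import List, Optional, Tuple

    # (start_frame, end_frame, descriptor) intervals over a 100-frame stream;
    # descriptors and frame numbers are invented for illustration.
    ANNOTATIONS: List[Tuple[int, int, str]] = [
        (0, 99, "outdoor street"),
        (10, 60, "adult male police officer"),
        (10, 45, "walking left"),
        (47, 68, "waving left arm"),
    ]

    def retrieve_clip(terms: List[str]) -> Optional[Tuple[int, int]]:
        """Return the frame range where every query term is annotated, if any."""
        lo, hi = 0, 99
        for term in terms:
            spans = [(s, e) for s, e, d in ANNOTATIONS if term in d]
            if not spans:
                return None                       # nothing annotated with this term
            lo = max(lo, min(s for s, _ in spans))
            hi = min(hi, max(e for _, e in spans))
        return (lo, hi) if lo <= hi else None     # the "clip" is computed, not stored

    # retrieve_clip(["police officer", "waving"]) -> (47, 60)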

Page 18: Metadata for Motion Pictures: Media Streams

Video Syntax and Semantics

• The Kuleshov Effect

• Video has a dual semantics

– Sequence-independent invariant semantics of shots

– Sequence-dependent variable semantics of shots

Page 19: Metadata for Motion Pictures: Media Streams

Ontological Issues for Video

• Video plays with rules for identity and continuity

– Space

– Time

– Character

– Action

Page 20: Metadata for Motion Pictures: Media Streams

Space and Time: Actual vs. Inferable

• Actual Recorded Space and Time
  – GPS
  – Studio space and time

• Inferable Space and Time
  – Establishing shots
  – Cues and clues

Page 21: Metadata for Motion Pictures: Media Streams

Time: Temporal Durations

• Story (Fabula) Duration
  – Example: Brushing teeth in story world (5 minutes)

• Plot (Syuzhet) Duration
  – Example: Brushing teeth in plot world (1 minute: 6 steps of 10 seconds each)

• Screen Duration
  – Example: Brushing teeth (10 seconds: 2 shots of 5 seconds each)

Page 22: Metadata for Motion Pictures: Media Streams

Character and Continuity

• Identity of character is constructed through
  – Continuity of actor
  – Continuity of role

• Alternative continuities
  – Continuity of actor only
  – Continuity of role only

Page 23: Metadata for Motion Pictures: Media Streams

Representing Action

• Physically-based description for sequence-independent action semantics
  – Abstract vs. conventionalized descriptions
  – Temporally and spatially decomposable actions and subactions

• Issues in describing sequence-dependent action semantics
  – Mental states (emotions vs. expressions)
  – Cultural differences (e.g., bowing vs. greeting)

Page 24: Metadata for Motion Pictures: Media Streams

“Cinematic” Actions

• Cinematic actions support the basic narrative structure of cinema
  – Reactions/Proactions
    • Nodding, screaming, laughing, etc.
  – Focus of Attention
    • Gazing, head turning, pointing, etc.
  – Locomotion
    • Walking, running, etc.

• Cinematic actions can occur
  – Within the frame/shot boundary
  – Across the frame boundary
  – Across shot boundaries

Page 25: Metadata for Motion Pictures: Media Streams

Lecture 08: Media Streams

• Problem Setting

• Current Approaches

• Representing Media

• New Solutions

• Methodological Considerations

• Future Work

Page 26: Metadata for Motion Pictures: Media Streams

New Solutions for Creating Metadata

• After Capture (Media Streams)
• During Capture (Active Capture)

Page 27: Metadata for Motion Pictures: Media Streams

After Capture: Media Streams

Page 28: Metadata for Motion Pictures: Media Streams

Media Streams Features

• Key features (illustrated by the sketch below)
  – Stream-based representation (better segmentation)
  – Semantic indexing (what things are similar to)
  – Relational indexing (who is doing what to whom)
  – Temporal indexing (when things happen)
  – Iconic interface (designed visual language)
  – Universal annotation (standardized markup schema)

• Key benefits
  – More accurate annotation and retrieval
  – Global usability and standardization
  – Reuse of rich media according to content and structure
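As a rough illustration of how semantic, relational, and temporal indexing might combine in a single stream annotation, here is a hedged Python sketch; the field names and example values are my own, not the actual Media Streams data model.

    from dataclasses import dataclass

    @dataclass
    class StreamAnnotation:
        start_frame: int       # temporal indexing: when it happens
        end_frame: int
        category: str          # semantic indexing: hierarchical descriptor
        agent: str             # relational indexing: who ...
        action: str            # ... is doing what ...
        patient: str = ""      # ... to whom (optional)

    wave = StreamAnnotation(
        start_frame=47,
        end_frame=68,
        category="character/action/gesture/wave",
        agent="Jack, adult male police officer",
        action="waves left arm",
    )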

Page 29: Metadata for Motion Pictures: Media Streams

Media Streams GUI Components

• Media Time Line

• Icon Space
  – Icon Workshop
  – Icon Palette

Page 30: Metadata for Motion Pictures: Media Streams

Media Time Line

• Visualize video at multiple time scales

• Write and read multi-layered iconic annotations

• One interface for annotation, query, and composition

Page 31: Metadata for Motion Pictures: Media Streams

Media Time Line

Page 32: Metadata for Motion Pictures: Media Streams

Icon Space

• Icon Workshop
  – Utilize categories of video representation
  – Create iconic descriptors by compounding iconic primitives
  – Extend set of iconic descriptors

• Icon Palette
  – Dynamically group related sets of iconic descriptors
  – Reuse descriptive effort of others
  – View and use query results

Page 33: Metadata for Motion Pictures: Media Streams

Icon Space

Page 34: Metadata for Motion Pictures: Media Streams

Icon Space: Icon Workshop

• General to specific (horizontal)
  – Cascading hierarchy of icons with increasing specificity on subordinate levels

• Combinatorial (vertical)
  – Compounding of hierarchically organized icons across multiple axes of description (see the sketch below)
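To make "general to specific" and "combinatorial" concrete, here is a small Python sketch under my own assumptions (the slash-separated descriptor paths and axis names are invented, not Media Streams' internal representation): each primitive sits in a general-to-specific hierarchy, and a compound descriptor combines one primitive from each axis of description.

    character = "character/human/adult/male/police officer"   # one axis of description
    action    = "action/gesture/wave/left arm"                # another axis
    space     = "space/outdoor/street"                        # another axis

    def generalize(primitive: str) -> str:
        """Step one level up the hierarchy (toward a more general descriptor)."""
        return primitive.rsplit("/", 1)[0] if "/" in primitive else primitive

    def compound(*primitives: str) -> tuple:
        """A compound iconic descriptor combines one primitive from each axis."""
        return tuple(primitives)

    jack_waving = compound(character, action, space)
    # generalize(action) -> "action/gesture/wave"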

Page 35: Metadata for Motion Pictures: Media Streams

Icon Space: Icon Workshop Detail

Page 36: Metadata for Motion Pictures: Media Streams

Icon Space: Icon Palette

• Dynamically group related sets of iconic descriptors

• Collect icon sentences

• Reuse descriptive effort of others

Page 37: Metadata for Motion Pictures: Media Streams

Icon Space: Icon Palette Detail

Page 38: Metadata for Motion Pictures: Media Streams

Video Retrieval In Media Streams

• Same interface for annotation and retrieval

• Assembles responses to queries as well as finds them

• Query responses use semantics to degrade gracefully
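One way to read "degrade gracefully" is that a query which finds no exact match can fall back to more general descriptors in the semantic hierarchy rather than returning nothing. The Python sketch below is only my illustration of that idea, with invented descriptor paths and frame ranges.

    INDEX = {
        "character/human/adult/male": [(10, 60)],
        "action/gesture/wave": [(47, 68)],
    }

    def lookup(path: str):
        """Return the most specific ancestor of `path` that has annotations."""
        while path:
            if path in INDEX:
                return path, INDEX[path]
            path = path.rsplit("/", 1)[0] if "/" in path else ""
        return None, []

    # lookup("character/human/adult/male/police officer")
    # -> ("character/human/adult/male", [(10, 60)])   # a near match, not a failure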

Page 39: Metadata for Motion Pictures: Media Streams

Media Streams Technologies

• Minimal video representation distinguishing syntax and semantics

• Iconic visual language for annotating and retrieving video content

• Retrieval-by-composition methods for repurposing video

Page 40: Metadata for Motion Pictures: Media Streams

New Solutions for Creating Metadata

• After Capture (Media Streams)
• During Capture (Active Capture)

Page 41: Metadata for Motion Pictures: Media Streams

Creating Metadata During Capture

• New Capture Paradigm: 1 good capture drives multiple uses

• Current Capture Paradigm: multiple captures to get 1 good capture

Page 42: Metadata for Motion Pictures: Media Streams

Active Capture

• Active engagement and communication among the capture device, agent(s), and the environment

• Re-envision capture as a control system with feedback

• Use multiple data sources and communication to simplify the capture scenario

• Use HCI to support “human-in-the-loop” algorithms for computer vision and audition
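As a rough way to picture "capture as a control system with feedback," here is a short, self-contained Python sketch; the Camera and Agent classes and the single face_visible check are hypothetical stand-ins of my own, not a real device API or the actual Active Capture system.

    import random
    from dataclasses import dataclass

    @dataclass
    class Shot:
        face_visible: bool            # stand-in for real vision/audition analysis

    class Camera:                     # hypothetical capture device
        def record(self) -> Shot:
            return Shot(face_visible=random.random() > 0.5)

    class Agent:                      # the person being captured
        def instruct(self, prompt: str) -> None:
            print("DIRECTION:", prompt)

    def active_capture(camera: Camera, agent: Agent) -> Shot:
        """Capture, analyze, direct, and re-capture until the shot is usable."""
        shot = camera.record()
        while not shot.face_visible:                       # 'vision' check (placeholder)
            agent.instruct("Please look at the camera.")   # feedback to the agent
            shot = camera.record()
        return shot

    # good_shot = active_capture(Camera(), Agent())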

Page 43: Metadata for Motion Pictures: Media Streams

Active Capture

[Diagram: Active Capture at the intersection of Computer Vision (processing), HCI (interaction), and Direction/Cinematography (capture)]

Page 44: Metadata for Motion Pictures: Media Streams

Automated Capture: Good Capture

Page 45: Metadata for Motion Pictures: Media Streams

Automated Capture: Error Handling

Page 46: Metadata for Motion Pictures: Media Streams

Evolution of Media Production

• Customized production
  – Skilled creation of one media product

• Mass production
  – Automatic replication of one media product

• Mass customization
  – Skilled creation of adaptive media templates
  – Automatic production of customized media

Page 47: Metadata for Motion Pictures: Media Streams

Central Idea: Movies as Programs

• Movies change from being static data to programs

• Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 & 5,969,716) – see the toy sketch below

[Diagram: media assets are parsed into content representations, which a producer program uses to compute new media]
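A minimal sketch of the idea, assuming invented shot data and a toy template function (this is not the patented mechanism, just an illustration of computing a sequence from content representations):

    # Illustrative "movie as program": a template queries shot annotations
    # and computes a personalized sequence of shot ids.
    SHOTS = [
        {"id": "a", "descriptors": {"Jim Lanahan", "close-up", "greeting"}},
        {"id": "b", "descriptors": {"product logo", "title card"}},
        {"id": "c", "descriptors": {"Jim Lanahan", "waving", "wide shot"}},
    ]

    def personalized_ad(person: str) -> list:
        """Compute a two-shot sequence: a close-up of `person`, then the title card."""
        opener = next(s for s in SHOTS
                      if person in s["descriptors"] and "close-up" in s["descriptors"])
        closer = next(s for s in SHOTS if "title card" in s["descriptors"])
        return [opener["id"], closer["id"]]

    # personalized_ad("Jim Lanahan") -> ["a", "b"]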

Page 48: Metadata for Motion Pictures: Media Streams

Jim Lanahan in an MCI Ad

Page 49: Metadata for Motion Pictures: Media Streams

Jim Lanahan in an @Home Banner

Page 50: Metadata for Motion Pictures: Media Streams

Automated Media Production Process

[Diagram: four-stage pipeline – (1) Automated Capture, (2) Annotation and Retrieval of media assets into a reusable online asset database, (3) Automatic Editing by the Adaptive Media Engine, (4) Personalized Delivery via web integration and streaming media services (Flash Generator, WAP, HTML email, print/physical media)]
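To show how the four stages might chain together, here is a toy Python pipeline; every function is a named placeholder of my own (not the Adaptive Media Engine or any real service), kept only to make the data flow explicit.

    def automated_capture() -> list:                         # 1. Automated Capture
        return [{"id": "shot1", "descriptors": set()}]

    def annotate(assets: list) -> list:                      # 2. Annotation and Retrieval
        for a in assets:
            a["descriptors"].add("greeting")                 # placeholder metadata
        return assets

    def automatic_edit(assets: list, wanted: set) -> list:   # 3. Automatic Editing
        return [a for a in assets if wanted & a["descriptors"]]

    def personalized_delivery(edit: list, channel: str = "HTML email") -> dict:
        return {"channel": channel, "shots": [a["id"] for a in edit]}   # 4. Delivery

    result = personalized_delivery(automatic_edit(annotate(automated_capture()),
                                                  {"greeting"}))
    # -> {"channel": "HTML email", "shots": ["shot1"]}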

Page 51: Metadata for Motion Pictures: Media Streams

Proposed Technology Architecture

[Diagram: layered architecture – OS-level media capture, file, A/V out, network, and device control; a Media Processing layer with a database; and Analysis, Interaction, Adaptive Media, Annotation and Retrieval (MPEG-7), and Delivery engines]

Page 52: Metadata for Motion Pictures: Media Streams

Lecture 08: Media Streams

• Problem Setting

• Representing Media

• Current Approaches

• New Solutions

• Methodological Considerations

• Future Work

Page 53: Metadata for Motion Pictures: Media Streams

Non-Technical Challenges

• Standardization of media metadata (MPEG-7)

• Broadband infrastructure and deployment

• Intellectual property and economic models for sharing and reuse of media assets

Page 54: Metadata for Motion Pictures: Media Streams

Technical Research Challenges

• Develop end-to-end metadata system for automated media capture, processing, management, and reuse

• Creating metadata
  – Represent action sequences and higher level narrative structures
  – Integrate legacy metadata (keywords, natural language)
  – Gather more and better metadata at the point of capture (develop metadata cameras)
  – Develop "human-in-the-loop" indexing algorithms and interfaces

• Using metadata
  – Develop media components (MediaLego)
  – Integrate linguistic and other query interfaces

Page 55: Metadata for Motion Pictures: Media Streams

For More Info

• Marc Davis Web Site
  – www.sims.berkeley.edu/~marc

• Spring 2003 course on “Multimedia Information” at SIMS

• URAP and GSR positions

• TidalWave II “New Media” program

Page 56: Metadata for Motion Pictures: Media Streams

Next Time

• Metadata for Motion Pictures: MPEG-7 (MED)

• Readings for next time (in Protected)
  – “MPEG-7: The Generic Multimedia Content Description Interface, Part 1” (J. M. Martinez, R. Koenen, F. Pereira)
  – “MPEG-7: Overview of MPEG-7 Description Tools, Part 2” (J. Martinez)

Page 57: Metadata for Motion Pictures: Media Streams

Homework (!)

• Assignment 4: Revision of Photo Metadata Design and Project Presentation

  – Due by Monday, September 23
    • Completed (Revised) Photo Classifications and Annotated Photos
      – [groupname]_classification.xls file
      – [groupname]_photos.xls file

  – Due by Thursday, September 26
    • Group Presentation
      – 2 minutes: Presentation of application idea
      – 6 minutes: Presentation of classification and photo browser
      – 2 minutes: residual time for completing explanations and Q + A
    • Photo Browser Page (will be sent to you)