A Motivating Scenario for Designing an Extensible Audio-Visual Description Language
Monday 25th of October, 2004
Raphaël Troncy, Jean Carrive, Steffen Lalande and Jean-Philippe Poli
Raphaël Troncy CoRIMedia - 10/25/2004 2
Description of the AV content
• Various uses / Different granularity:
– identification of the content creator and the content provider: Dublin Core metadata, VRA core categories, TV Anytime metadata …
– feature extraction from the video signal: storing and exchanging the results of automatic tools (MPEG-7)
– structural decomposition into video segments corresponding to the logical structure of the program: time-code, spatial coordinates
– semantic description of these segments: controlled vocabulary, thesaurus, free text annotation
Description of the AV content (cultural heritage point of view)
• Segmentation
– locate and date some events
• Description
– type each segment with an AV genre
– type each segment with a general theme
– give hints on the production
– describe the scene (who, when, where, what, …)
[Figure: an example segment on the time axis t, labelled "report", "athletics" and "fade in/out", with the scene description "Michael Johnson smashed the 200m world record to complete a 200m in 19''32 in Atlanta for the Olympic Games"]
⇒ needs a powerful description language
Motivating scenario
• Generic application for manually describing TV programs w.r.t.:
– structural constraints: patterns represent the logical structure of a document
– semantic constraints: the description of the content is machine understandable
• Let us define the temporal structure of a Sports Magazine
MPEG-7, the natural candidate description language?
• ISO standard since December of 2001
• Main components:
– Descriptors (Ds) and Description Schemes (DSs)
– DDL (XML Schema + extensions)
• Concerns all types of media
[Figure: overview of MPEG-7 Part 5 (MDS). Basic elements: Schema Tools, Basic datatypes, Links & media localization, Basic Tools. Content management: Media, Creation & Production, Usage. Content description: Structural aspects, Semantic aspects. Navigation & Access: Summaries, Views, Variations. Content organization: Collections, Models. User interaction: User Preferences, User History]
MPEG-7: an unsuitable description language for this scenario
1. A non-extensible language
• closed set of descriptors
2. An exchange syntax rather than a real machine-processable multimedia description language
• non object-based data model
• non-modular language (universal approach)
3. No formal semantics provided
• applications cannot access the meaning of the documents
⇒ is the DDL (XML Schema) at fault?
MPEG-7: an unsuitable description language for this scenario
⇒ the critical issue: how to reconcile object-oriented semantic expression with structural validation?
• How to define new descriptors?
• How to define new description schemes?
• How to make the description machine understandable?
Our proposal: AVDL
• AVDL: a reduced yet extensible audio-visual description language
– an object meta-model (an instance model specifies the vocabulary for, and the rules followed by, the descriptions)
– an XML syntax
– a semantics (close to Description Logics (DL) for the descriptors)
• Description Schemes
– Descriptors
– Properties
– Structures
• Descriptions
– valid instances w.r.t. description schemes
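As a rough illustration of this meta-model, the sketch below represents a description scheme as a vocabulary of descriptors and properties, and checks whether a description is a valid instance of it. It is written in Python for brevity, not in the C#/.NET of the actual implementation, and every name in it (`Property`, `DescriptionScheme`, `is_valid`) is a hypothetical choice rather than AVDL's API. Structures are left out of this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Property:
    """A named property with a domain (descriptor name) and a primitive range."""
    name: str
    domain: str
    range_type: type


@dataclass
class DescriptionScheme:
    """Vocabulary (descriptors, properties) that descriptions must follow."""
    descriptors: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

    def add_descriptor(self, name):
        self.descriptors.add(name)

    def add_property(self, prop):
        if prop.domain not in self.descriptors:
            raise ValueError(f"unknown domain descriptor: {prop.domain}")
        self.properties[prop.name] = prop

    def is_valid(self, descriptor, values):
        """True if `values` only uses properties declared for `descriptor`
        and each value has the declared primitive type."""
        if descriptor not in self.descriptors:
            return False
        for name, value in values.items():
            prop = self.properties.get(name)
            if prop is None or prop.domain != descriptor:
                return False
            if not isinstance(value, prop.range_type):
                return False
        return True


# Hypothetical scheme: one descriptor with one integer property.
scheme = DescriptionScheme()
scheme.add_descriptor("Tracking")
scheme.add_property(Property("nbDetection", domain="Tracking", range_type=int))

print(scheme.is_valid("Tracking", {"nbDetection": 1}))      # True
print(scheme.is_valid("Tracking", {"nbDetection": "one"}))  # False
```

Keeping the scheme as data, rather than hard-coding one class per descriptor, is what lets new descriptors be added without touching the validation code.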
The meta class level
The class level
Location
Document, Content and Media
• Distinction:
– Document vs Content vs Media
– Virtual content vs physical content
• Media: a content abstraction for decomposition
– audio tracks, subtitles
Defining Structures
• A structure defines how the descriptors may, and must, be combined
– allows descriptions to be controlled
– allows descriptions to be completed automatically
• AVDL provides some predefined structure models
– containment: gives the list of the possible sub-segments of an AV segment (in space and in time)
– regular expression: by analogy with a grammar, for temporal succession
• Other models are currently being studied: temporal constraints, etc.
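The regular-expression model can be illustrated with a small sketch: each segment type is encoded as one letter, and the temporal succession of segments is matched against a pattern. The segment names, the sports-magazine grammar and the `matches_structure` helper are all hypothetical; the slides do not give the actual grammar.

```python
import re

# Hypothetical segment types for a sports magazine, one letter each.
SEGMENT_CODES = {
    "opening_credits": "O",
    "studio": "S",
    "report": "R",
    "closing_credits": "C",
}

# Assumed grammar: opening credits, then one or more blocks made of a
# studio segment followed by at least one report, then closing credits.
SPORTS_MAGAZINE = re.compile(r"O(SR+)+C")


def matches_structure(segments, pattern=SPORTS_MAGAZINE):
    """Return True if the temporal succession of segment types fits the pattern."""
    encoded = "".join(SEGMENT_CODES[s] for s in segments)
    return pattern.fullmatch(encoded) is not None


ok = ["opening_credits", "studio", "report", "report",
      "studio", "report", "closing_credits"]
bad = ["opening_credits", "report", "studio", "closing_credits"]

print(matches_structure(ok))   # True
print(matches_structure(bad))  # False
```

The same pattern could also drive automatic completion: after an opening segment, the only type the assumed grammar allows next is a studio segment.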
AVDL Implementation
• XML Serialization
– Independent of any schema language
– Uses XML Schema validation (mainly for datatypes)
• C#
– Object inheritance
– Use of .NET reflection
XML Serialization
[Figure: Audio-Visual Description Language (avdl.xsd), Description Schemes (ds-17.xml, ds-17.xsd) and Descriptions (d-162.xml), linked by arrows labelled "transformation" and "partial control" (twice)]
XML Syntax (DS)
<!-- Descriptor: a located descriptor named "Tracking" -->
<Descriptor xsi:type="LocatedDescriptorType" id="id-d2" name="Tracking">
  <Property ref="id-p2"/>
  <Structure ref="id-s2"/>
  <DescriptionRelationship characterization="string">
    <Location type="TemporalInterval"/>
    <Media type="Media"/>
  </DescriptionRelationship>
</Descriptor>
<!-- Property: integer-valued, with the Tracking descriptor as its domain -->
<Property id="id-p2" name="nbDetection">
  <Domain descriptor="id-d2"/>
  <Range>
    <Primitive nameType="int"/>
  </Range>
</Property>
<!-- Structure: a temporal sequence of zero or more Detection elements -->
<Structure id="id-s2" name="TrackingStructure">
  <FormalModel>
    <Constraint type="temporal" validation="full" method="system" parser="XMLSchema">
      <xsd:sequence minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="Detection" type="DetectionType"/>
      </xsd:sequence>
    </Constraint>
  </FormalModel>
</Structure>
XML Syntax (Descriptions)
<Tracking type="LocatedDescriptorType" nbDetection="1">
  <DescriptionRelationship>
    <Location>
      <avdl:Begin timeRef="147329280"/>
      <avdl:End timeRef="147329280"/>
    </Location>
    <Media id="CPB86006610.mpg" name="CPB86006610.mpg" contentID="CPB86006610.mpg"/>
  </DescriptionRelationship>
  <Structure constraintType="temporal">
    <Detection type="LocatedDescriptorType" nbFeature="1">
      <DescriptionRelationship>
        <Location>
          <avdl:Instant timeRef="147329280"/>
        </Location>
        <Media id="CPB86006610.mpg" name="CPB86006610.mpg"
               contentID="CPB86006610.mpg" frameHeight="288" frameWidth="352"/>
      </DescriptionRelationship>
      <Structure constraintType="spatial">
        <Feature xsi:type="FaceType">
          <DescriptionRelationship>
            <Location>
              <avdl:BoundingBox>
                <avdl:NE numX="92" denX="352" numY="217" denY="288"/>
                <avdl:NW numX="92" denX="352" numY="267" denY="288"/>
                <avdl:SE numX="136" denX="352" numY="217" denY="288"/>
                <avdl:SW numX="136" denX="352" numY="267" denY="288"/>
              </avdl:BoundingBox>
            </Location>
...
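The corner coordinates in the description above are stored as rationals (`numX/denX`, `numY/denY`). The sketch below converts them to pixel positions, assuming that the denominators correspond to the `frameWidth` and `frameHeight` given on the `Media` element; this reading is an interpretation, not something the slides state. The `to_pixels` helper and the `extent` tuple are hypothetical names, and the code is Python rather than the C# of the actual implementation.

```python
from fractions import Fraction

# frameWidth / frameHeight taken from the Media element of the description.
FRAME_WIDTH, FRAME_HEIGHT = 352, 288

# Corner values copied from the description: (numX, denX, numY, denY).
corners = {
    "NE": (92, 352, 217, 288),
    "NW": (92, 352, 267, 288),
    "SE": (136, 352, 217, 288),
    "SW": (136, 352, 267, 288),
}


def to_pixels(num_x, den_x, num_y, den_y):
    """Scale a rational position to pixel coordinates of the frame."""
    x = Fraction(num_x, den_x) * FRAME_WIDTH
    y = Fraction(num_y, den_y) * FRAME_HEIGHT
    return int(x), int(y)


pixels = {name: to_pixels(*c) for name, c in corners.items()}
xs = [x for x, _ in pixels.values()]
ys = [y for _, y in pixels.values()]

# Bounding box as (x, y, width, height), independent of the corner labels.
extent = (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

print(pixels["NE"])  # (92, 217)
print(extent)        # (92, 217, 44, 50)
```

Storing positions as fractions of the frame keeps the description valid if the same content is later delivered at a different resolution.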
.NET implementation
[Figure: Description Schemes (ds-17.xml, ds-17.dll) and Descriptions (d-162.xml) in relation to Memory, linked by arrows labelled "parsing" (twice), "read/write" and ".NET instantiation"]
Two kinds of applications
• Static Description Schemes
– DSs are known in advance
– The developer uses generated libraries
• Dynamic Description Schemes
– DSs are created by the application
– Use of the dynamic instantiation mechanism (reflection) of .NET
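The dynamic case relies on creating classes at run time. The sketch below illustrates the idea with Python's built-in `type()`, as an analogy to .NET reflection and not as the mechanism the authors use. The `build_descriptor_class` helper and the content of the scheme dictionary are hypothetical.

```python
def build_descriptor_class(name, properties):
    """Create a descriptor class at run time from a property-name -> type mapping."""

    def __init__(self, **values):
        for prop, value in values.items():
            if prop not in properties:
                raise AttributeError(f"{name} has no property {prop!r}")
            if not isinstance(value, properties[prop]):
                raise TypeError(f"{prop} expects {properties[prop].__name__}")
            setattr(self, prop, value)

    return type(name, (object,), {"__init__": __init__, "properties": properties})


# A description scheme known only at run time (hypothetical content).
scheme = {"Tracking": {"nbDetection": int}}

classes = {n: build_descriptor_class(n, props) for n, props in scheme.items()}
tracking = classes["Tracking"](nbDetection=1)

print(type(tracking).__name__)  # Tracking
print(tracking.nbDetection)     # 1
```

In the static case the equivalent classes would be generated ahead of time and compiled into a library, so the application would import them instead of building them.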
Carrying out the scenario
• Definition of new descriptors and properties
– associating behavior with the corresponding classes
– performing reasoning on the descriptions with the formal definitions in OWL
• Definition of logical and temporal structures
– the description is controlled and validated by a grammar
Conclusion and Future Work
• AVDL: a reduced yet extensible Audio-Visual Description Language
– descriptors, properties, structures
– XML syntax and DL semantics
– .NET implementation and APIs
• About structure validation:
– which constructors to use? which semantics?
• Trade-off: expressivity vs computability
– OWL Full is undecidable
– constraint satisfaction problems can be computationally hard