MPEG Augmented Reality Tutorial

DESCRIPTION

I made this tutorial for the Web3D 2012 conference. It presents MPEG's position on AR and the technologies currently used, and explains how to set up AR applications.

MPEG Augmented Reality Tutorial

Web3D Conference, August 4-5, Los Angeles, CA

Marius Preda, MPEG 3DG Chair, Institut Mines TELECOM

http://www.slideshare.net/MariusPreda/mpeg-augmented-reality-tutorial

Topics of the day

MPEG-A Part 14 Augmented Reality Reference Model

MPEG-A Part 13 Augmented Reality Application Format

MPEG offer in the Augmented Reality field

What is MPEG?

What is MPEG?

Coding/compression of elementary media:
– Audio (MPEG-1, 2 and 4)
– Video (MPEG-1, 2 and 4)
– 2D/3D graphics (MPEG-4)

Storage and Transport
– MPEG-2 Transport
– File Format (MPEG-4)
– Dynamic Adaptive Streaming over HTTP (DASH)

Hybrid (natural & synthetic) scene description, user interaction (MPEG-4)
Metadata (MPEG-7)
Media management and protection (MPEG-21)
Sensors and actuators, virtual worlds (MPEG-V)
Advanced user interaction (MPEG-U)
Media-oriented middleware (MPEG-M)

More ISO/IEC standards under development for
– 3D Video, 3D Audio
– Coding and Delivery in Heterogeneous Environments
– …

A suite of ~130 ISO/IEC standards

A standardization activity continuing for 24 years
– Supported by several hundred companies/organisations from ~25 countries
– ~500 experts participating in quarterly meetings
– More than 2300 active contributors
– Many thousands of experts working in companies

A proven manner to organize the work to deliver useful and used standards
– Developing standards by integrating individual technologies
– Well-defined procedures
– Subgroups with clear objectives
– Ad hoc groups continuing coordinated work between meetings

MPEG standards are widely referenced by industry
– 3GPP, ARIB, ATSC, DVB, DVD-Forum, BDA, ETSI, SCTE, TIA, DLNA, DECE, OIPF…

Billions of software and hardware devices built on MPEG technologies
– MP3 players, cameras, mobile handsets, PCs, DVD/Blu-ray players, STBs, TVs, …

What is MPEG? Involvement, approach, deployment

MPEG technologies related to AR

– 1992/4: MPEG-1/2 (AV content)
– 1997: VRML
– 1998: MPEG-4 v.1
  • Part 11 – BIFS: binarisation of VRML; extensions for streaming, server commands and 2D graphics; real-time augmentation with audio & video
  • Part 2 – Visual: 3D mesh compression, face animation
– 1999: MPEG-4 v.2
  • Part 2 – Visual: body animation

First form of broadcast signal augmentation

MPEG technologies related to AR (cont.)

– 2003: MPEG-4 Part 16 – AFX: a rich set of 3D graphics tools; compression of geometry, appearance, animation
– 2005: AFX 2nd Edition: animation by morphing, multi-texturing
– 2007: AFX 3rd Edition: WSS for terrain and cities, frame-based animation
– 2011: AFX 4th Edition: scalable complexity mesh coding

MPEG-4: a rich set of 3D graphics representation and compression tools

MPEG technologies related to AR (cont.)

– 2004: MPEG-4 Part 16 – X3D Interactive Profile
– 2009: MPEG-4 Part 25 – Compression of third-party XML (X3D, COLLADA)

MPEG technologies related to AR (cont.)

– 2011: MPEG-V – Media Context and Control, 1st Edition: sensors and actuators; interoperability between virtual worlds
– 201x: MPEG-V 2nd Edition: GPS, biosensors, 3D camera
– 2012: MPEG-U – Advanced User Interface
– 201x: CDVS: feature-point based descriptors for image recognition
– 201x: MPEG-H: 3D Video (compression of video + depth), 3D Audio

MPEG-V: a rich set of sensors and actuators

Main features of MPEG AR technologies

All AR-related data is available from MPEG standards
Real-time composition of synthetic and natural objects
Access to
– Remotely/locally stored BIFS/compressed 2D/3D mesh objects
– Streamed real-time BIFS/compressed 2D/3D mesh objects
Inherent object scalability (e.g. for streaming)
User interaction & server-generated scene changes
Physical context
– Captured by a broad range of standard sensors
– Affected by a broad range of standard actuators

MPEG vision on AR: the MPEG AR Browser

Point to a URL – no need to download new applications for each context. The browser
– Retrieves the scenario from the internet
– Starts video acquisition
– Tracks objects
– Recognizes objects from visual signatures
– Recovers camera pose
– Gets streamed 3D graphics and objects from a remote server
– Composes new scenes
– Gets inputs from various sensors
– Offers an optimal AR experience by constantly adapting interaction possibilities

Industry
– Maximize the number of customers through MPEG-compliant authoring tools and browsers
– No need to develop a new application for each use case and device platform
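The browser behaviors above can be sketched as a per-frame loop. All names here (ARBrowser, Scenario, run_frame) and the injected backends are illustrative stand-ins, not part of any MPEG specification:

```python
# Hypothetical sketch of the MPEG AR Browser per-frame loop.
from dataclasses import dataclass

@dataclass
class Scenario:
    url: str
    reference_images: list  # visual signatures to recognize
    augmentations: dict     # signature -> remote 3D asset URL

class ARBrowser:
    def __init__(self, fetch, recognize, estimate_pose):
        # The three capabilities are injected so the sketch stays testable.
        self.fetch = fetch
        self.recognize = recognize
        self.estimate_pose = estimate_pose

    def run_frame(self, scenario, frame):
        """Process one camera frame: recognize, recover pose, compose."""
        composed = []
        for signature in self.recognize(frame, scenario.reference_images):
            pose = self.estimate_pose(frame, signature)   # camera pose w.r.t. marker
            asset = self.fetch(scenario.augmentations[signature])
            composed.append((asset, pose))                # scene composition step
        return composed

# Minimal fake backends standing in for real trackers/decoders.
browser = ARBrowser(
    fetch=lambda url: f"asset:{url}",
    recognize=lambda frame, refs: [r for r in refs if r in frame],
    estimate_pose=lambda frame, sig: (0.0, 0.0, 1.0),
)
scenario = Scenario("http://example.org/scene.araf",
                    ["marker1"], {"marker1": "http://example.org/model.mp4"})
print(browser.run_frame(scenario, {"marker1"}))
```

In a real browser the recognizer and pose estimator would run on decoded camera frames and the fetch step would stream compressed MPEG-4 assets.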

MPEG vision on AR

[Diagram: an authoring tool produces and compresses MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V content, which is downloaded by an MPEG player]

Architecture

[Diagram: an AR Player consumes an AR file or stream and communicates with Media Servers, Service Servers, the User, local and remote sensors & actuators, and local and remote real-world environments]

MPEG ongoing work on AR

ISO/IEC 23000-14 Augmented Reality Reference Model
– WD stage, collaborating with SC24/WG9, ARStandards, OGC, Khronos, Web3D
ISO/IEC 23000-13 Augmented Reality Application Format
– CD stage, based on MPEG standards

Augmented Reality Reference Model

WD2.0 content:
– Glossary
– Viewpoints: Enterprise (community objectives), Information, Computational, Engineering, Technology (from abstract/design to implementation/development)
– Use cases: Guide, Create, Play

Enterprise viewpoint: global architecture and actors

[Diagram: the AR Player communicates with Media Servers, Service Servers, the User, the local/remote context, and the AR Document]

Actors:
– AR Tools Creator (ARTC)
– AR Experience Creator (AREC)
– Assets Creator (AC)
– Assets Aggregator (AA)
– Device Manufacturer (DM)
– Middleware/Component Provider (MCP)
– Online Middleware/Component Provider (OMCP)
– AR Service Provider (ARSP)
– Telecommunication Operator (TO)
– End-User (EU)

Information viewpoint

– Scene/Real World: raw image, sensed data, virtual camera view, detected features, areas of interest/anchors
– Tracking objects: markers, marker-less
– Device context: device capabilities; location of device (location, orientation)
– Spatial models: coordinate reference systems, (geo)location, projections, coordinate conversion
– Presentation: augmentation, registration, styling/complexity, spatial filtering (e.g. range)
– User input: query, manipulation of presentation, topics of interest, preferences
– Digital assets: presentation data, trigger/event rules, accuracy based

Computational viewpoint

[Diagram: numbered interactions (1–5) between the AR Player, Media Servers, Service Servers, the User, the local/remote context, and the AR Document]

Engineering viewpoint

[Diagram: within the AR Player, sensors (camera, mic, accelerometer, compass, GPS, …) feed the Application Engine and the Rendering Engine, which drives the display (audio/visual/haptic); the player exchanges data with Media Servers, Service Servers, the User, the local/remote context, and the AR Document]

Glossary

Use cases

How to contribute? Use Trac: http://wg11.sc29.org/trac/augmentedreality/

MPEG-A Part 13 ARAF

Three components: scene, sensors/actuators, media

A set of scene graph nodes/protos as defined in MPEG-4 Part 11
– Existing nodes: audio, image, video, graphics, programming, communication, user interactivity, animation
– New standard PROTOs: Map, MapMarker, Overlay, ReferenceSignal, ReferenceSignalLocation, CameraCalibration, AugmentedRegion

Connection to sensors as defined in MPEG-V
– Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude
– Local camera sensor

Compressed media

Node/Proto and element names in MPEG-4 BIFS / XMT, by category:

Elementary media
– Audio: AudioSource, Sound, Sound2D
– Image and video: ImageTexture, MovieTexture
– Textual information: FontStyle, Text
– Graphics: Appearance, Color, LineProperties, LinearGradient, Material, Material2D, Rectangle, Shape, SBVCAnimationV2, SBBone, SBSegment, SBSkinnedModel, MorphShape, Coordinate, TextureCoordinate, Normal, IndexedFaceSet, IndexedLineSet

Programming
– Script

User interactivity
– InputSensor, SphereSensor, TimeSensor, TouchSensor, MediaSensor, PlaneSensor

Scene-related information (spatial and temporal relationships)
– AugmentationRegion, Background, Background2D, CameraCalibration, Group, Inline, Layer2D, Layer3D, Layout, NavigationInfo, OrderedGroup, ReferenceSignal, ReferenceSignalLocation, Switch, Transform, Transform2D, Viewpoint, Viewport, Form

Dynamic and animated scene
– OrientationInterpolator, ScalarInterpolator, CoordinateInterpolator, ColorInterpolator, PositionInterpolator, Valuator

Communication and compression
– BitWrapper, MediaControl

Maps
– Map, MapOverlay, MapMarker

Terminal
– TermCap

Scene: 63 XML Elements

Scene: the distance between ARAF and X3D is 32 (XML Elements)

Marker tracking example

Reference image overlay: Name: Park Chu-Young, Position: FW, Team: Arsenal FC
3D graphic synchronized with the movement of the marker image

Scene:: Reference Signal

<ProtoDeclare name="ReferenceSignal" locations="org:mpeg:referencesignal">
  <field name="source" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="referenceResources" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="enabled" type="Boolean" vrml97Hint="exposedField" booleanValue="false"/>
  <field name="detectionHints" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="onInputDetected" type="Integer" vrml97Hint="eventOut"/>
  <field name="onError" type="Integer" vrml97Hint="eventOut"/>
</ProtoDeclare>

Scene:: Reference Signal Location

<ProtoDeclare name="ReferenceSignalLocation" locations="org:mpeg:referencesignallocation">
  <field name="source" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="referenceResources" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="enabled" type="Boolean" vrml97Hint="exposedField" booleanValue="false"/>
  <field name="detectionHints" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="translation" type="Vector3Array" vrml97Hint="exposedField" vector3ArrayValue=""/>
  <field name="rotation" type="Rotations" vrml97Hint="exposedField" rotationArrayValue=""/>
  <field name="onInputDetected" type="Integer" vrml97Hint="eventOut"/>
  <field name="onTranslationChanged" type="Integer" vrml97Hint="eventOut"/>
  <field name="onRotationChanged" type="Integer" vrml97Hint="eventOut"/>
  <field name="onError" type="Integer" vrml97Hint="eventOut"/>
</ProtoDeclare>

Scene:: Augmentation Region

[Diagram: a broadcaster transmits content containing an augmentation region; AR service providers A and B deliver different augmentations of that region to User A and User B]

<ProtoDeclare name="AugmentationRegion" locations="org:mpeg:augmentationregion">
  <field name="source" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="2DRegion" type="Vector2Array" vrml97Hint="exposedField" vector2ArrayValue=""/>
  <field name="arProvider" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
  <field name="enabled" type="Boolean" vrml97Hint="exposedField" booleanValue="false"/>
  <field name="translation" type="Vector3Array" vrml97Hint="exposedField" vector3ArrayValue=""/>
  <field name="rotation" type="Rotations" vrml97Hint="exposedField" rotationArrayValue=""/>
  <field name="onTranslationChanged" type="Integer" vrml97Hint="eventOut"/>
  <field name="onRotationChanged" type="Integer" vrml97Hint="eventOut"/>
  <field name="onARProviderChanged" type="Boolean" vrml97Hint="eventOut"/>
  <field name="onError" type="Integer" vrml97Hint="eventOut"/>
</ProtoDeclare>
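A player consuming AugmentationRegion has to decide whether detected content falls inside the region described by the 2DRegion field. A minimal sketch, assuming (our assumption, not an ARAF requirement) that the region is an axis-aligned rectangle given as two corner points in normalized screen coordinates:

```python
# Hypothetical hit test for AugmentationRegion's 2DRegion field.
# Assumes the region is (x1, y1, x2, y2) in normalized coordinates;
# ARAF itself does not mandate this layout.

def point_in_region(point, region):
    """Return True if a 2D point lies inside an axis-aligned region."""
    px, py = point
    x1, y1, x2, y2 = region
    return min(x1, x2) <= px <= max(x1, x2) and min(y1, y2) <= py <= max(y1, y2)

# A detected feature at (0.4, 0.25), inside the top-left quarter of the frame.
print(point_in_region((0.4, 0.25), (0.0, 0.0, 0.5, 0.5)))
```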

Scene:: Map, MapMarkers and Overlay

<ProtoDeclare name="Map" protoID="1" locations="org:mpeg:map">
  <field name="addChildren" type="Nodes" vrml97Hint="eventIn"/>
  <field name="removeChildren" type="Nodes" vrml97Hint="eventIn"/>
  <field name="addOverlays" type="Nodes" vrml97Hint="eventIn"/>
  <field name="removeOverlays" type="Nodes" vrml97Hint="eventIn"/>
  <field name="translate" type="Vector2" vrml97Hint="eventIn"/>
  <field name="zoom_in" type="Boolean" vrml97Hint="eventIn"/>
  <field name="zoom_out" type="Boolean" vrml97Hint="eventIn"/>
  <field name="gpscenter_changed" type="Vector2" vrml97Hint="eventOut"/>
  <field name="children" type="Nodes" vrml97Hint="exposedField">
    <nodes></nodes>
  </field>
  <field name="overlays" type="Nodes" vrml97Hint="exposedField">
    <nodes></nodes>
  </field>
  <field name="gpsCenter" type="Vector2" vrml97Hint="exposedField" vector2Value="0 0"/>
  <field name="mode" type="Strings" vrml97Hint="exposedField" stringArrayValue="ROADMAP"/>
  <field name="provider" type="Strings" vrml97Hint="exposedField" stringArrayValue="ANY"/>
  <field name="size" type="Vector2" vrml97Hint="exposedField" vector2Value="0 0"/>
  <field name="mapWidth" type="Float" vrml97Hint="exposedField" floatValue="0"/>
  <field name="zoomLevel" type="Integer" vrml97Hint="exposedField" integerValue="0"/>
</ProtoDeclare>
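The Map proto leaves tile retrieval to the player. One common way an implementation could turn gpsCenter and zoomLevel into a tile index is standard Web Mercator tiling; this is a hypothetical sketch of player internals, not something ARAF specifies:

```python
# Hypothetical helper a Map-proto implementation might use to fetch tiles:
# standard Web Mercator tile indexing (not specified by ARAF itself).
import math

def gps_to_tile(lat_deg, lon_deg, zoom_level):
    """Convert a gpsCenter (latitude, longitude) and zoomLevel to x/y tile indices."""
    n = 2 ** zoom_level                      # tiles per axis at this zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    return x, y

# Example: Los Angeles (site of the Web3D 2012 conference) at zoom level 10.
print(gps_to_tile(34.05, -118.24, 10))
```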

<ProtoDeclare name="MapOverlay" locations="org:mpeg:mapoverlay">
  <field name="addChildren" type="Nodes" vrml97Hint="eventIn"/>
  <field name="removeChildren" type="Nodes" vrml97Hint="eventIn"/>
  <field name="children" type="Nodes" vrml97Hint="exposedField">
    <nodes></nodes>
  </field>
  <field name="keywords" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
</ProtoDeclare>

<ProtoDeclare name="MapMarker" locations="org:mpeg:mapmarker">
  <field name="addChildren" type="Nodes" vrml97Hint="eventIn"/>
  <field name="removeChildren" type="Nodes" vrml97Hint="eventIn"/>
  <field name="gpsPosition" type="Vector2" vrml97Hint="exposedField" vector2Value="0 0"/>
  <field name="children" type="Nodes" vrml97Hint="exposedField">
    <nodes></nodes>
  </field>
  <field name="keywords" type="Strings" vrml97Hint="exposedField" stringArrayValue=""/>
</ProtoDeclare>

Sensors/Actuators


MPEG-4 player and MPEG-V sensors

[Diagram: MPEG-V sensors (acceleration, orientation, angular velocity, global position, altitude, camera) are mapped to InputSensor nodes in the MPEG-4 scene; the compositor renders the scene to the screen]

[Diagram: the camera input (e.g. hw://camera/back) is delivered as a raw stream to the decoder and compositor, with the captured data mapped into both the scene and the compositor]
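The sensor-to-scene mapping above can be sketched as a small dispatch layer. SensorBus and this InputSensor class are illustrative stand-ins (our names), not the MPEG-4/MPEG-V node APIs:

```python
# Sketch of routing MPEG-V sensor readings to InputSensor nodes
# that subscribed to them. All class names are illustrative.

class InputSensor:
    def __init__(self, device_url):
        self.device_url = device_url   # e.g. an "hw://orientation"-style URL
        self.last_value = None

    def deliver(self, value):
        self.last_value = value        # a scene-side script would react here

class SensorBus:
    def __init__(self):
        self.routes = {}

    def attach(self, sensor_type, input_sensor):
        self.routes.setdefault(sensor_type, []).append(input_sensor)

    def publish(self, sensor_type, value):
        # Fan a reading out to every InputSensor mapped to this sensor type.
        for node in self.routes.get(sensor_type, []):
            node.deliver(value)

bus = SensorBus()
orientation_node = InputSensor("hw://orientation")
bus.attach("orientation", orientation_node)
bus.publish("orientation", (0.0, 90.0, 0.0))
print(orientation_node.last_value)
```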

Sensors/Actuators:: MPEG-V


[Diagram: MPEG-V architecture. The real world exposes sensor devices and sensory devices; the virtual world and an adaptation engine exchange sensed information, sensory effects, device commands, sensor/sensory device capabilities, sensor adaptation preferences, sensory effect preferences, and VW object characteristics]

V→R Adaptation: converts Sensory Effects from the virtual world into Device Commands applied to the real world
R→V Adaptation: converts Sensed Information from the real world into VW Object Characteristics/Sensed Information applied to the virtual world
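A minimal sketch of the R→V adaptation step, assuming a simple linear mapping from the sensor's declared capability range into the virtual-world value range (the function name and signature are ours, not MPEG-V's):

```python
# Illustrative sketch (not an MPEG-V API) of R->V adaptation:
# sensed information is interpreted against the sensor's declared
# capability before being applied to a virtual-world object.

def adapt_r2v(sensed_value, capability, vw_range):
    """Map a raw sensed value into the virtual world's value range.

    capability: (min, max) the sensor can report (its MPEG-V capability)
    vw_range:   (min, max) accepted by the virtual-world characteristic
    """
    cap_min, cap_max = capability
    vw_min, vw_max = vw_range
    # Clamp to the declared capability, then rescale linearly.
    clamped = min(max(sensed_value, cap_min), cap_max)
    ratio = (clamped - cap_min) / (cap_max - cap_min)
    return vw_min + ratio * (vw_max - vw_min)

# E.g. a temperature sensor reporting -20..60 C driving a 0..1 scene parameter.
print(adapt_r2v(20.0, (-20.0, 60.0), (0.0, 1.0)))
```

Real adaptation engines would also apply the user's sensor adaptation preferences, which this sketch omits.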

Sensors/Actuators:: MPEG-V types

Actuators: Light, Flash, Heating, Cooling, Wind, Vibration, Sprayer, Scent, Fog, Color correction, Initialize color correction parameter, Rigid body motion, Tactile, Kinesthetic, Global position command

Sensors: Light, Ambient noise, Temperature, Humidity, Distance, Atmospheric pressure, Position, Velocity, Acceleration, Orientation, Angular velocity, Angular acceleration, Force, Torque, Pressure, Motion, Intelligent camera type, Multi interaction point, Gaze tracking, Wind, Dust, Body height, Body weight, Body temperature, Body fat, Blood type, Blood pressure, Blood sugar, Blood oxygen, Heart rate, Electrograph (EEG, ECG, EMG, EOG, GSR), Weather, Facial expression, Facial morphology, Facial expression characteristics, Geomagnetic, Global position, Altitude, Bend, Gas

Compression


Media: compression tool name (reference standard)
– Image: JPEG (ISO/IEC 10918), JPEG 2000 (ISO/IEC 15444)
– Video: Visual (ISO/IEC 14496-2), Advanced Video Coding (ISO/IEC 14496-10)
– Audio: MP3 (ISO/IEC 11172-3), Advanced Audio Coding (ISO/IEC 14496-3)
– 3D Graphics: Scalable Complexity Mesh Coding (ISO/IEC 14496-16), Bone-based Animation (ISO/IEC 14496-16)
– Scenes: BIFS (ISO/IEC 14496-11)

Exercises

AR Quiz, Augmented Book

http://youtu.be/LXZUbAFPP-Y
http://youtu.be/la-Oez0aaHE

AR Quiz setting: preparing the media

images, videos, audio, 2D/3D assets, GPS location

AR Quiz XML inspection

http://tiny.cc/MPEGARQuiz

AR Quiz Authoring Tool

www.MyMultimediaWorld.com, go to Create / Augmented Reality

Augmented Book setting

images, audio

Augmented Book XML inspection

http://tiny.cc/MPEGAugBook

Augmented Book Authoring Tool

www.MyMultimediaWorld.com, go to Create / Augmented Books

Next Steps

Support for metadata at scene and object level
Support for usage rights at scene and object level
Collisions between real and virtual objects, partial rendering

ARAF distance to X3D

On Scene Graph
– 32 elements, including 2D graphics, humanoid animation, generic input, media control, and pure AR protos
On Sensors/Actuators
– 6 elements
On Compression
– MPEG-4 Part 25 already compresses X3D

Conclusions

• MPEG promoted a first version of an integrated and consistent solution for representing content in AR applications and services
– Continue synchronized/harmonized development of technical specifications with the X3D, COLLADA and OGC content models
• Joint development of the AR Reference Model
– The community at large is invited to react/contribute so that the model becomes a reference
– http://wg11.sc29.org/trac/augmentedreality
