Overview of MPEG-4

Lihang Ying

Department of Computing Science

University of Alberta, Edmonton, Canada

These slides are available online: www.cs.ualberta.ca/~lihang/Share/mpeg4

Outline

MPEG-4 Demos and Overview
  Demos
  Overview

How to Organize MPEG-4 Contents – Scene/Object Description
  Examples Study

Synthetic and Natural Hybrid Coding (SNHC) – Visual Part
  2D Mesh Coding
  3D Mesh Coding

Demos

EnvivioTV: http://www.envivio.com/products/etv/content/technical.jsp

It is a plug-in for RealPlayer, Windows Media Player, or QuickTime

Characteristics (1)

MPEG-4 vs MPEG-1/2
  Not merely video and audio
  Interactive
  Object-based
  Scalability

Characteristics (2)

Why MPEG-4?
Interoperability:
  Run on all kinds of platforms and devices
  Reuse multimedia contents
  Create once, use everywhere
Multi-network Delivery
  Internet/Mobile/Broadcast networks
  Different bandwidths
Scalability
  Different capacities (e.g. display resolution) of different devices

MPEG-J

API:
  org.iso.mpeg.mpegj
  org.iso.mpeg.mpegj.scene
  org.iso.mpeg.mpegj.resource
  org.iso.mpeg.mpegj.decoder
  org.iso.mpeg.mpegj.net

Implement an MPEG-4 coder/decoder conveniently with the MPEG-J API

Create the coder/decoder once, run it on all kinds of devices and platforms

Profile/Level

Different implementations:
Profile
  Divides functionality into different subsets
Level
  Constraints on parameters (bitrate, frames/sec, …)

Example: EnvivioTV
  Video: Advanced Simple Profile at levels 0-5
  Audio: High Quality Profile at levels 1-2
  Graphics: Advanced profile

Interactive
Multi-network Delivery
Coder/Decoder: Using MPEG-J
Scalability: Different Capacity
Profile/Level
Not merely audio/video
Object-based
Interoperability

How to Organize Contents

Scene Descriptor
  Assembles objects into an audiovisual scene
  Scene description format: binary format for MPEG-4 scenes (BIFS)

Object Descriptor
  Describes objects

[Figure: the initial object descriptor holds ES_Descriptors that point to the scene description stream (carrying BIFS updates that replace the scene) and to the object descriptor stream (carrying object descriptor updates). The scene description references video and audio sources; their object descriptors hold ES_Descriptors that identify, by ES_ID, the visual stream (base layer), a further visual stream (e.g. temporal enhancement layer), and the audio stream.]

Scene Description - BIFS

Represented in XMT-A format:
  Similar to XML
  Expresses the bitstream syntax in a document
  Enables easy generation of a bitstream parser

BIFS Examples: …

BIFS Example(1)–Trivial Scene(MPEG-2/DVD)

Scene Tree

Layer2D
  Sound2D
    AudioSource
  Shape
    Bitmap
    Appearance
      MovieTexture
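The scene tree for the trivial scene can be mirrored in a small data structure. The following is a minimal Python sketch, not BIFS or XMT-A syntax: `Node`, `trivial_scene`, and `walk` are hypothetical names, real BIFS nodes carry typed fields that are omitted here, and the parent/child nesting is an assumption based on the order of the node names on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a scene node: a name plus child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)


def trivial_scene() -> Node:
    """Build the trivial scene: one audio branch and one visual branch."""
    return Node("Layer2D", [
        Node("Sound2D", [Node("AudioSource")]),
        Node("Shape", [
            Node("Bitmap"),
            Node("Appearance", [Node("MovieTexture")]),
        ]),
    ])


def walk(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, visiting nodes depth-first."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(walk(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(walk(trivial_scene())))
```

A richer scene, such as the movie with subtitles, would add further children under the same root in this representation.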

BIFS Example(2)–Movie with Subtitles

BIFS Example(3)–Icons

Icons

BIFS Example(4)–Buttons

Event Response

Object Description

Syntactic Description Language (SDL)
  Expresses the bitstream syntax in a document
  Enables easy generation of a bitstream parser

SDL Example:…

Object Description - SDL

ObjectDescriptor

class ObjectDescriptor extends ObjectDescriptorBase : bit(8) tag=ObjectDescrTag {
  bit(10) ObjectDescriptorID;
  bit(1) URL_Flag;
  const bit(5) reserved=0b1111.1;
  if (URL_Flag) {
    bit(8) URLlength;
    bit(8) URLstring[URLlength];
  } else {
    ES_Descriptor esDescr[1..255];
    OCI_Descriptor ociDescr[0..255];
    IPMP_DescriptorPointer ipmpDescrPtr[0..255];
  }
  ExtensionDescriptor extDescr[0..255];
}
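To show how an SDL description translates into a parser, here is a minimal Python sketch that reads the fixed fields of the ObjectDescriptor class. The function name `parse_od_header` is hypothetical. It assumes the input starts at ObjectDescriptorID (the tag and any size field were consumed by the caller), that the URL is UTF-8 text, and it does not parse the nested ES, OCI, IPMP, or extension descriptors.

```python
from typing import Dict, Optional, Union


def parse_od_header(payload: bytes) -> Dict[str, Union[int, bool, Optional[str]]]:
    """Parse ObjectDescriptorID, URL_Flag, reserved, and the optional URL."""
    if len(payload) < 2:
        raise ValueError("need at least 2 bytes for the fixed fields")

    word = int.from_bytes(payload[:2], "big")
    od_id = word >> 6                    # bit(10) ObjectDescriptorID
    url_flag = bool((word >> 5) & 1)     # bit(1)  URL_Flag
    reserved = word & 0x1F               # bit(5)  reserved, expected 0b11111

    url = None
    if url_flag:
        if len(payload) < 3:
            raise ValueError("URL_Flag set but URLlength is missing")
        length = payload[2]              # bit(8) URLlength
        raw = payload[3:3 + length]      # bit(8) URLstring[URLlength]
        if len(raw) != length:
            raise ValueError("URLstring is shorter than URLlength")
        url = raw.decode("utf-8")        # text encoding is an assumption

    return {"id": od_id, "url_flag": url_flag, "reserved": reserved, "url": url}
```

Each `bit(n)` field becomes one shift-and-mask step, which is why a syntax written this way makes it easy to generate a bitstream parser.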

Object Descriptor Summary

ObjectDescriptor
  ObjectDescriptorID
  URL_Flag
  ES_Descriptor // Elementary Stream
    ES_ID, streamDependenceFlag, URL_Flag, OCRstreamFlag, streamPriority, DecoderConfigDescriptor, SLConfigDescriptor, IPI_DescrPointer, IP_IdentificationDataSet, IPMP_DescriptorPointer, LanguageDescriptor, QoS_Descriptor, ...
  OCI_Descriptor // Object Content Information
    ContentClassificationDescriptor, KeywordDescriptor, RatingDescriptor, LanguageDescriptor, ShortTextualDescriptor, ExpandedTextualDescriptor, ContentCreatorNameDescriptor, ContentCreationDateDescriptor, OCICreatorNameDescriptor, OCICreationDateDescriptor, SmpteCameraPositionDescriptor, MediaTimeDescriptor, ...
  IPMP_DescriptorPointer // Intellectual Property Management and Protection

Applications of OCI/IPMP–eDonkey’s problems

MPEG-4 Objects and Tools

Audio
  Natural Audio
  Synthetic and Natural Hybrid Coding (SNHC)
Visual
  Natural Video
    Object-based/Scalability
  SNHC
    2D/3D Mesh Object/Face and Body Animation
  Image
  Text
  …

[2D Mesh Coding]

Natural Video Coding
  Block-based texture and motion coding
  Shape information coding

2D Mesh Coding
  Designed for video manipulation
  2D meshes, i.e. 2D planar graphs with triangles
  Natural images and video are mapped onto 2D meshes
  Applications: object tracking, content-based video retrieval (e.g. motion-based queries), 2D animation, augmented reality, …

Example

(a) Original frame

(b) Generated mesh

(c) Text overlaid on video: the text moves along with the fish's mesh

Architecture of 2D Mesh Coding

2D Mesh Object
  Also called 2D Dynamic Mesh
  Supports video coding by moving the vertices of the mesh
  The topology of the mesh does not change within one session

Mesh data includes:
  Connectivity: how vertices are connected
  Geometry: 2D coordinates of vertices
  Motion: temporal difference of the vertices' positions

I-MOP and P-MOP

I-MOP: Intra-Mesh Object Plane
  For a given session, connectivity and geometry information needs to be transmitted only once

P-MOP: Inter-Mesh Object Plane
  The deformation of the given mesh over time can be described as the temporal difference of the geometry, i.e. the geometry motion
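The relationship between I-MOPs and P-MOPs can be sketched as follows: the I-MOP supplies the initial vertex positions, and each P-MOP supplies one motion vector per vertex that is added to the previous positions. This is a minimal Python illustration with hypothetical names (`apply_p_mop`, `decode_session`); it ignores entropy decoding and motion prediction, and keeps connectivity implicit because it does not change within a session.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def apply_p_mop(geometry: List[Point], motion: List[Point]) -> List[Point]:
    """Move every vertex by its motion vector; vertex order is unchanged."""
    if len(geometry) != len(motion):
        raise ValueError("one motion vector per vertex is required")
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(geometry, motion)]


def decode_session(i_mop: List[Point], p_mops: List[List[Point]]) -> List[List[Point]]:
    """Return the vertex positions for the I-MOP and each following P-MOP."""
    frames = [list(i_mop)]
    for motion in p_mops:
        frames.append(apply_p_mop(frames[-1], motion))
    return frames
```

Because only the motion is sent after the first plane, a static vertex costs a zero vector rather than a full coordinate pair.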

2D Mesh Decoding Scheme

Mesh Data - Connectivity

Uniform Triangulation:
  Suited for rectangular video objects
  Vertices are located on a regular x/y grid
  Specified by the length of the grid intervals
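A uniform mesh is fully determined by the grid size and the grid intervals, which is why so little data has to be sent for it. The sketch below is a minimal Python illustration with a hypothetical function `uniform_mesh`; it assumes every grid cell is split along the same diagonal, which is only one possible way of splitting the cells.

```python
from typing import List, Tuple

Vertex = Tuple[int, int]
Triangle = Tuple[int, int, int]


def uniform_mesh(cols: int, rows: int, dx: int, dy: int) -> Tuple[List[Vertex], List[Triangle]]:
    """Build a grid of cols x rows vertices and two triangles per cell.

    dx and dy are the grid intervals; triangles hold vertex indices.
    """
    vertices = [(c * dx, r * dy) for r in range(rows) for c in range(cols)]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left = r * cols + c
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            # Split the cell along the top-right / bottom-left diagonal.
            triangles.append((top_left, bottom_left, top_right))
            triangles.append((top_right, bottom_left, bottom_right))
    return vertices, triangles
```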

Mesh Data - Connectivity

Delaunay Triangulation:
  Suited for arbitrarily shaped video objects
  Guarantees:
    Close to equilateral: produces the largest minimum angle
    Unique: a unique triangulation for the given vertices

Coding of Connectivity Data

Uniform Triangulation:

Delaunay Triangulation:
  Differential coding: x_n = x_{n-1} + dx_n, y_n = y_{n-1} + dy_n
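The differential coding rule can be illustrated with a short round trip: the encoder stores the difference to the previous vertex, and the decoder accumulates the differences. This is a minimal Python sketch with hypothetical function names; coding the first vertex relative to (0, 0) is an assumption made to keep the example self-contained.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def encode_deltas(points: List[Point]) -> List[Point]:
    """Replace each vertex by its difference to the previous vertex."""
    deltas = []
    prev_x, prev_y = 0, 0  # assumed predictor for the first vertex
    for x, y in points:
        deltas.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return deltas


def decode_deltas(deltas: List[Point]) -> List[Point]:
    """Rebuild vertices: x_n = x_{n-1} + dx_n, y_n = y_{n-1} + dy_n."""
    points = []
    x, y = 0, 0
    for dx, dy in deltas:
        x, y = x + dx, y + dy
        points.append((x, y))
    return points
```

Neighbouring vertices tend to be close together, so the differences are small numbers that an entropy coder can represent compactly.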

Coding Order of Delaunay Triangulation

1) Boundary vertices
  Start from the top-left-most vertex
  Proceed counterclockwise

2) Interior vertices
  Choose the closest vertex next
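The ordering of the interior vertices can be sketched as a greedy nearest-neighbour walk. This is a minimal Python illustration with a hypothetical function `order_interior`; it assumes that "closest" is measured from the most recently coded vertex using Euclidean distance, and that ties are broken by input order.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def order_interior(last_boundary: Point, interior: List[Point]) -> List[Point]:
    """Order interior vertices by repeatedly taking the nearest remaining one.

    The walk starts at the last boundary vertex that was coded.
    """
    remaining = list(interior)
    ordered = []
    current = last_boundary
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nearest)
        ordered.append(nearest)
        current = nearest
    return ordered
```

Visiting nearby vertices consecutively keeps the coordinate differences small, which suits the differential coding of the vertex positions.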

Coding of Mesh Motion

Motion: temporal difference of the vertices' positions

Mesh traversal:
  1) Start from top-left, breadth-first
  2) Right (next counterclockwise)
  3) Left
  This order remains unchanged until the next I-MOP is decoded

Mesh motion coding:
  Each vertex is encoded based on two previously encoded neighboring vertices, e.g. in triangle abc, vertex c is predicted from a and b
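Predicting a vertex's motion from two already coded neighbours can be sketched as follows. This is a minimal Python illustration, not the normative procedure: it assumes the predictor is the component-wise average of the two neighbours' motion vectors with floor rounding, and the function names are hypothetical.

```python
from typing import Tuple

Vector = Tuple[int, int]


def predict_mv(mv_a: Vector, mv_b: Vector) -> Vector:
    """Assumed predictor: floor of the average of the two neighbours."""
    return ((mv_a[0] + mv_b[0]) // 2, (mv_a[1] + mv_b[1]) // 2)


def encode_mv(mv_c: Vector, mv_a: Vector, mv_b: Vector) -> Vector:
    """Return the residual that would be entropy coded for vertex c."""
    px, py = predict_mv(mv_a, mv_b)
    return (mv_c[0] - px, mv_c[1] - py)


def decode_mv(residual: Vector, mv_a: Vector, mv_b: Vector) -> Vector:
    """Rebuild the motion vector of c from the residual and the predictor."""
    px, py = predict_mv(mv_a, mv_b)
    return (residual[0] + px, residual[1] + py)
```

Encoder and decoder must traverse the mesh in the same order so that both have the same two neighbours available when a vertex is reached.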

[3D Mesh Coding]

2D Mesh Coding:
  Supports mapping natural images and video onto 2D meshes

3D Mesh Coding:
  Represents and compresses 3D objects onto which images and videos may be mapped
  Compresses static 3D models, not their animation

Functionalities

High compression
  2%-4% of the VRML ASCII file size

Incremental rendering
  Builds the model from a partial bitstream

Error resilience
  Suffers less from network errors

Hierarchical buildup
  Scalable bitstream with different resolutions, depending on viewing distance

Incremental Rendering

Data of 3D Mesh Object

Connectivity: how vertices are connected

Geometry: 3D coordinates of vertices

Photometry
  Colors
  Normals
  Texture

Bitstream of 3D Mesh Coding

Connectivity Data
  Vertex graph
  Triangle tree

Triangle Data
  Contains: geometry coordinates, colors, normals, texture coordinates
  Largest part of the bitstream

Bitstream of 3D Mesh Coding

Connectivity Data is packed separately, before the Triangle Data.

Benefits:
  Incremental rendering:
    Triangle Data can be decoded incrementally, since the full connectivity (topology) data is already available
    Shortens the latency
  Error resilience:
    The 3D structure can still be formed when some Triangle Data is missing

Decoding Scheme of 3D Mesh

Vertex Graph

Triangle Tree

Coding of Geometry and Photometry Data

1) Quantization

2) Differential Coding
  No prediction
  Parallelogram prediction
  Tree prediction

3) Adaptive Arithmetic Entropy Coding
  Codes the differential values
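The first two steps can be illustrated with uniform quantization and parallelogram prediction. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the quantizer is a plain uniform one over a known value range, and the predictor is the usual parallelogram rule, which completes the parallelogram spanned by an already decoded triangle (a, b, c) across its edge (b, c). The entropy coding step is not shown.

```python
from typing import Tuple

Vec3 = Tuple[int, int, int]


def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)


def dequantize(q: int, lo: float, hi: float, bits: int) -> float:
    """Map a quantized integer back to an approximate value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + q / levels * (hi - lo)


def parallelogram_predict(a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Predict the vertex opposite a across edge (b, c) as b + c - a."""
    return (b[0] + c[0] - a[0], b[1] + c[1] - a[1], b[2] + c[2] - a[2])


def residual(actual: Vec3, a: Vec3, b: Vec3, c: Vec3) -> Vec3:
    """Difference between the actual vertex and its prediction."""
    p = parallelogram_predict(a, b, c)
    return (actual[0] - p[0], actual[1] - p[1], actual[2] - p[2])
```

On smooth surfaces the predicted position is close to the actual one, so the residuals are small and compress well in the entropy coding step.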

3D Mesh Coding Modes

Error-Resilience Mode
  To minimize the impact of errors, the data is divided into partitions or packets
  Partitions are rendered independently

Progressive Transmission Mode
  Scalable coding
    One base layer
    One or more enhancement layers
  Provides Forest Split operations
    Each contains a face forest, a triangle tree, and triangle data

Forest Split Operation

(a) Cut through the edges of vertex tree

(b) Open the dotted line

(c) Triangulate the opening to form a triangle tree

(d) Refined mesh

References

Books:
  Major reference: Fernando Pereira, Touradj Ebrahimi, The MPEG-4 Book, Prentice Hall PTR, 2002
  Natural video coding technology: Joan L. Mitchell et al., MPEG Video Compression Standard, Chapman & Hall, 1996

MPEG official websites:
  Overview: http://mpeg.telecomitalialab.com/standards.htm
  Resources: http://www.m4if.org/resources.php

Demos:
  http://www.envivio.com/products/etv/content/technical.jsp
  http://www.ivast.com/aboutmpeg4/index.html

MPEG-4 series slides, course presentation of C640/2003 Winter, U. of Alberta:
  http://www.cs.ualberta.ca/~anup/Courses/604/604_3D.htm

The End

Acknowledgements Yongjie Liu Michael Closson

Questions and Comments?

DecoderConfigDescriptor

class DecoderConfigDescriptor extends BaseDescriptor : bit(8) tag=DecoderConfigDescrTag {
  bit(8) objectTypeIndication;
  bit(6) streamType;
  bit(1) upStream;
  const bit(1) reserved=1;
  bit(24) bufferSizeDB;
  bit(32) maxBitrate;
  bit(32) avgBitrate;
  DecoderSpecificInfo decSpecificInfo[0..1];
  profileLevelIndicationIndexDescriptor profileLevelIndicationIndexDescr[0..255];
}
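The fixed part of DecoderConfigDescriptor adds up to 104 bits, i.e. 13 bytes, so it can be read with simple byte slicing. This is a minimal Python sketch with a hypothetical function `parse_decoder_config`; it assumes the input starts at objectTypeIndication (tag and any size field already consumed) and does not parse the optional trailing descriptors.

```python
from typing import Dict, Union


def parse_decoder_config(payload: bytes) -> Dict[str, Union[int, bool]]:
    """Parse the fixed 13-byte part of a DecoderConfigDescriptor."""
    if len(payload) < 13:
        raise ValueError("need at least 13 bytes for the fixed fields")

    flags = payload[1]
    return {
        "objectTypeIndication": payload[0],                   # bit(8)
        "streamType": flags >> 2,                             # bit(6)
        "upStream": bool((flags >> 1) & 1),                   # bit(1)
        "reserved": flags & 1,                                # bit(1), expected 1
        "bufferSizeDB": int.from_bytes(payload[2:5], "big"),  # bit(24)
        "maxBitrate": int.from_bytes(payload[5:9], "big"),    # bit(32)
        "avgBitrate": int.from_bytes(payload[9:13], "big"),   # bit(32)
    }
```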
