Overview of MPEG-4

Lihang Ying
Department of Computing Science
University of Alberta, Edmonton, Canada

These slides are available online: www.cs.ualberta.ca/~lihang/Share/mpeg4

Outline

• MPEG-4 Demos and Overview
  • Demos
  • Overview
• How to Organize MPEG-4 Contents: Scene/Object Description
  • Examples Study
• Synthetic and Natural Hybrid Coding (SNHC): Visual Part
  • 2D Mesh Coding
  • 3D Mesh Coding

Demos

• EnvivioTV: http://www.envivio.com/products/etv/content/technical.jsp
  • A plug-in for RealPlayer, Windows Media Player, or QuickTime

Characteristics (1)

• MPEG-4 vs. MPEG-1/2
  • Not merely video and audio
  • Interactive
  • Object-based
  • Scalability

Characteristics (2)

• Why MPEG-4?
  • Interoperability:
    • Runs on all kinds of platforms and devices
    • Reuse of multimedia content
    • Create once, use everywhere
  • Multi-network delivery
    • Internet/mobile/broadcast networks
    • Different bandwidths
  • Scalability
    • Different capabilities (e.g. display resolution) of different devices

MPEG-J

• API packages:
  • org.iso.mpeg.mpegj
  • org.iso.mpeg.mpegj.scene
  • org.iso.mpeg.mpegj.resource
  • org.iso.mpeg.mpegj.decoder
  • org.iso.mpeg.mpegj.net
• Implement an MPEG-4 coder/decoder conveniently with the MPEG-J API
• Create the coder/decoder once, run it on all kinds of devices and platforms

Profile/Level

• Different implementations:
  • Profile
    • Divides functionality into different subsets
  • Level
    • Constraints on parameters (bitrate, frames/sec, …)
• Example: EnvivioTV
  • Video: Advanced Simple profile at levels 0-5
  • Audio: High-quality profile at levels 1-2
  • Graphics: Advanced profile

• Interactive
• Multi-network delivery
• Coder/decoder: using MPEG-J
• Scalability: different capacity
• Profile/Level
• Not merely audio/video
• Object-based
• Interoperability


How to Organize Contents

• Scene Descriptor
  • Assembles objects into an audiovisual scene
  • Scene description format: binary format for MPEG-4 scenes (BIFS)
• Object Descriptor
  • Describes objects

[Figure: The initial object descriptor holds ES_Descriptors that point to the scene descriptor stream and the object descriptor stream. The scene descriptor stream carries the scene description and BIFS updates (replace scene); scene nodes labeled Video Source and Audio Source refer to object descriptors. The object descriptor stream carries object descriptors and object descriptor updates; each object descriptor holds ES_Descriptors whose ES_IDs identify the elementary streams: visual stream (base layer), visual stream (e.g. temporal enhancement layer), and audio stream.]
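The chain shown in the figure (scene node, then object descriptor, then elementary streams) can be sketched in a few lines of Python. This is only an illustration, not the normative data model: the class names, the `depends_on` field, and all ID values are hypothetical and chosen to mirror the labels in the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ESDescriptor:
    es_id: int                        # identifies one elementary stream
    stream_type: str                  # simplified label, e.g. "visual" or "audio"
    depends_on: Optional[int] = None  # ES_ID of the layer this stream enhances


@dataclass
class ObjectDescriptor:
    od_id: int
    es_descriptors: List[ESDescriptor] = field(default_factory=list)


def streams_for_node(od_id: int, object_descriptors: Dict[int, ObjectDescriptor]) -> List[int]:
    """Return the ES_IDs behind one object descriptor, base layers first."""
    od = object_descriptors[od_id]
    # Streams without a dependency are base layers; sorted() is stable,
    # so enhancement layers keep their relative order after them.
    ordered = sorted(od.es_descriptors, key=lambda es: es.depends_on is not None)
    return [es.es_id for es in ordered]


# Hypothetical IDs: a video object with base and enhancement layer, and an audio object.
object_descriptors = {
    1: ObjectDescriptor(1, [ESDescriptor(11, "visual", depends_on=10),
                            ESDescriptor(10, "visual")]),
    2: ObjectDescriptor(2, [ESDescriptor(20, "audio")]),
}

# Scene nodes refer to media only through object descriptor IDs.
scene_nodes = {"VideoSource": 1, "AudioSource": 2}

for node, od_id in scene_nodes.items():
    print(node, "->", streams_for_node(od_id, object_descriptors))
```

The point of the indirection is that the scene description never names a stream directly, so an object descriptor update can swap streams without touching the scene.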

Scene Description - BIFS

• Represented in XMT-A format:
  • Similar to XML
  • Express bitstream syntax in document
  • Enable easy generation of bitstream parser
• BIFS examples: …

BIFS Example (1): Trivial Scene (MPEG-2/DVD)

• Scene Tree

    Layer2D
      Sound2D
        AudioSource
      Shape
        Bitmap
        Appearance
          MovieTexture

BIFS Example (2): Movie with Subtitles

BIFS Example (3): Icons

• Icons

BIFS Example (4): Buttons

• Event response

Object Description

• Syntactic Description Language (SDL)
  • Express bitstream syntax in document
  • Enable easy generation of bitstream parser
• SDL example: …

Object Description - SDL

• ObjectDescriptor

    class ObjectDescriptor extends ObjectDescriptorBase : bit(8) tag=ObjectDescrTag {
        bit(10) ObjectDescriptorID;
        bit(1) URL_Flag;
        const bit(5) reserved=0b1111.1;
        if (URL_Flag) {
            bit(8) URLlength;
            bit(8) URLstring[URLlength];
        } else {
            ES_Descriptor esDescr[1..255];
            OCI_Descriptor ociDescr[0..255];
            IPMP_DescriptorPointer ipmpDescrPtr[0..255];
        }
        ExtensionDescriptor extDescr[0..255];
    }
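To make the bit layout concrete, here is a minimal Python sketch that reads the fixed fields of the SDL definition above. It assumes the 8-bit tag (and any length field) has already been consumed, and it does not parse the nested descriptors of the `else` branch. The function name and the example bytes are hypothetical.

```python
def parse_object_descriptor(body: bytes) -> dict:
    """Parse the fields that follow the tag: 10-bit ID, 1-bit flag, 5 reserved bits."""
    if len(body) < 2:
        raise ValueError("need at least 2 bytes for ID, URL_Flag and reserved bits")
    word = int.from_bytes(body[:2], "big")
    fields = {
        "ObjectDescriptorID": word >> 6,       # upper 10 bits
        "URL_Flag": bool((word >> 5) & 0x1),   # next 1 bit
        "reserved": word & 0x1F,               # lower 5 bits, all ones in the SDL
    }
    if fields["URL_Flag"]:
        if len(body) < 3:
            raise ValueError("URL_Flag is set but URLlength is missing")
        url_length = body[2]
        url_bytes = body[3:3 + url_length]
        if len(url_bytes) != url_length:
            raise ValueError("URLstring is truncated")
        fields["URLstring"] = url_bytes.decode("utf-8")
    return fields


# Hypothetical example: ID 5, URL_Flag set, reserved bits all ones, URL "a.mp4".
header = (5 << 6 | 1 << 5 | 0b11111).to_bytes(2, "big")
example = header + bytes([5]) + b"a.mp4"
print(parse_object_descriptor(example))
```

Because every field width is stated in the SDL, a parser like this can in principle be generated mechanically, which is the benefit the slides attribute to SDL.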

Object Descriptor Summary

• ObjectDescriptor
  • ObjectDescriptorID
  • URL_Flag
  • ES_Descriptor // Elementary Stream
    ES_ID, streamDependenceFlag, URL_Flag, OCRstreamFlag, streamPriority, DecoderConfigDescriptor, SLConfigDescriptor, IPI_DescrPointer, IP_IdentificationDataSet, IPMP_DescriptorPointer, LanguageDescriptor, QoS_Descriptor, ...
  • OCI_Descriptor // Object Content Information
    ContentClassificationDescriptor, KeywordDescriptor, RatingDescriptor, LanguageDescriptor, ShortTextualDescriptor, ExpandedTextualDescriptor, ContentCreatorNameDescriptor, ContentCreationDateDescriptor, OCICreatorNameDescriptor, OCICreationDateDescriptor, SmpteCameraPositionDescriptor, MediaTimeDescriptor, ...
  • IPMP_DescriptorPointer // Intellectual Property Management and Protection
• Applications of OCI/IPMP: eDonkey’s problems

MPEG-4 Objects and Tools

• Audio
  • Natural Audio
  • Synthetic and Natural Hybrid Coding (SNHC)
• Visual
  • Natural Video
    • Object-based/Scalability
  • SNHC
    • 2D/3D Mesh Object/Face and Body Animation
• Image
• Text …


[2D Mesh Coding]

• Natural Video Coding
  • Block-based texture and motion coding
  • Shape information coding
• 2D Mesh Coding
  • Designed for video manipulation
  • 2D meshes, i.e. 2D planar graphs with triangles
  • Natural images and video are mapped onto 2D meshes
  • Applications: object tracking, content-based video retrieval (e.g. motion-based queries), 2D animation, augmented reality, …

Example

• (a) Original frame
• (b) Generated mesh
• (c) Text overlaid on video: the text moves along with the fish’s mesh

Architecture of 2D Mesh Coding

2D Mesh Object

• Also called 2D Dynamic Mesh
• Supports video coding by moving the vertices of the mesh
• Topology of the mesh does not change within one session
• Mesh data includes:
  • Connectivity: how vertices are connected
  • Geometry: 2D coordinates of vertices
  • Motion: temporal difference of vertices’ positions

I-MOP and P-MOP

• I-MOP: Intra Mesh Object Plane
  • For a given session, connectivity and geometry information needs to be transmitted only once
• P-MOP: Inter Mesh Object Plane
  • The deformation of the given mesh over time can be described as the temporal difference of the geometry, i.e. geometry motion
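The split between I-MOP and P-MOP can be illustrated with a small Python sketch: the I-MOP supplies vertices and triangles once, and each P-MOP only moves the vertices. The dictionary layout and function name are hypothetical; real P-MOPs carry entropy-coded, predicted motion vectors rather than plain offsets.

```python
def decode_mesh_sequence(i_mop, p_mops):
    """Return the vertex positions for the I-MOP and every following P-MOP.

    i_mop:  {"vertices": [(x, y), ...], "triangles": [(i, j, k), ...]}
    p_mops: list of per-vertex motion lists [(dx, dy), ...]
    """
    vertices = list(i_mop["vertices"])
    frames = [vertices]
    for motion in p_mops:
        if len(motion) != len(vertices):
            raise ValueError("a P-MOP needs one motion vector per vertex")
        # Only the geometry changes; the triangle list from the I-MOP is reused.
        vertices = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(vertices, motion)]
        frames.append(vertices)
    return frames


# Hypothetical one-triangle mesh followed by two P-MOPs.
i_mop = {"vertices": [(0, 0), (4, 0), (0, 4)], "triangles": [(0, 1, 2)]}
p_mops = [[(1, 0), (0, 0), (0, 1)],
          [(0, 0), (-1, 0), (0, 0)]]

for number, frame in enumerate(decode_mesh_sequence(i_mop, p_mops)):
    print(number, frame)
```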

2D Mesh Decoding Scheme

Mesh Data - Connectivity

• Uniform Triangulation:
  • Suited for rectangular video objects
  • Vertices are located on a regular grid in x and y
  • Specified by the length of the grid intervals
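A uniform mesh is fully determined by the object size and the grid intervals, which is why so little needs to be transmitted. The following Python sketch builds one; the function name, the parameters, and the fixed choice of diagonal per cell are assumptions made for illustration.

```python
def uniform_mesh(width, height, step_x, step_y):
    """Build grid vertices and two triangles per grid cell."""
    cols = width // step_x + 1
    rows = height // step_y + 1
    vertices = [(c * step_x, r * step_y) for r in range(rows) for c in range(cols)]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left = r * cols + c
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            # Assumed split: every cell is cut along the same diagonal.
            triangles.append((top_left, bottom_left, top_right))
            triangles.append((top_right, bottom_left, bottom_right))
    return vertices, triangles


# Hypothetical 8x4 object with a 4-pixel grid interval in both directions.
vertices, triangles = uniform_mesh(8, 4, 4, 4)
print(len(vertices), "vertices,", len(triangles), "triangles")
```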

Mesh Data - Connectivity

• Delaunay Triangulation:
  • Suited for arbitrarily shaped video objects
  • Guarantees:
    • Close to equilateral: produces the largest minimal angle
    • Unique: a unique triangulation for given vertices
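The defining property behind both guarantees is the empty-circumcircle test: a triangle belongs to the Delaunay triangulation when no other vertex lies inside its circumcircle. The brute-force Python sketch below applies that test to every vertex triple. It is meant only to show the criterion (it is far too slow for real meshes), and it assumes no four points lie on a common circle, which is also the condition under which the triangulation is unique.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None for collinear points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_bruteforce(points):
    """Keep every triangle whose circumcircle contains no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        inside = any(
            (px - ux) ** 2 + (py - uy) ** 2 < r2 - 1e-9
            for n, (px, py) in enumerate(points)
            if n not in (i, j, k)
        )
        if not inside:
            triangles.append((i, j, k))
    return triangles


# Hypothetical vertex set: four points forming a convex quadrilateral.
points = [(0, 0), (4, 0), (0, 3), (5, 4)]
print(delaunay_bruteforce(points))
```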

Coding of Connectivity Data

• Uniform Triangulation:
• Delaunay Triangulation:
  • Differential coding: $x_n = x_{n-1} + dx_n$, $y_n = y_{n-1} + dy_n$
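The differential coding rule can be checked with a short round trip in Python. The sketch assumes the first vertex is coded relative to the origin (0, 0); the function names and coordinates are hypothetical.

```python
def encode_deltas(vertices):
    """Turn absolute (x, y) positions into differences to the previous vertex."""
    deltas = []
    prev_x, prev_y = 0, 0  # assumption: the first vertex is coded relative to the origin
    for x, y in vertices:
        deltas.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return deltas


def decode_deltas(deltas):
    """Apply x_n = x_(n-1) + dx_n and y_n = y_(n-1) + dy_n."""
    vertices = []
    x, y = 0, 0
    for dx, dy in deltas:
        x, y = x + dx, y + dy
        vertices.append((x, y))
    return vertices


# Hypothetical vertex positions in coding order.
vertices = [(10, 12), (14, 12), (15, 17)]
deltas = encode_deltas(vertices)
print(deltas)
print(decode_deltas(deltas))
```

Neighbouring vertices in the coding order tend to be close together, so the differences are small numbers that an entropy coder can represent with few bits.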

Coding Order of Delaunay Triangulation

• 1) Boundary vertices
  • Start from the top-left-most vertex
  • Proceed counterclockwise
• 2) Inside vertices
  • Choose the closest vertex as the next one
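The ordering of the inside vertices can be sketched as a greedy nearest-neighbour walk. This Python example reads "closest" as closest to the most recently coded vertex, which is an assumption about the slide's wording; the function name and coordinates are hypothetical, and boundary ordering is not covered.

```python
import math


def order_interior(last_boundary_vertex, interior):
    """Greedily pick the not-yet-coded vertex nearest to the last coded one."""
    remaining = list(interior)
    ordered = []
    current = last_boundary_vertex
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nearest)
        ordered.append(nearest)
        current = nearest
    return ordered


# Hypothetical data: the boundary walk ended at (0, 0); three inside vertices remain.
interior_order = order_interior((0, 0), [(5, 5), (1, 1), (2, 3)])
print(interior_order)
```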

Coding of Mesh Motion

• Motion: temporal difference of vertices’ positions
• Mesh Traversal:
  • 1) Start from the top-left, breadth-first
  • 2) Right (next counterclockwise)
  • 3) Left
  • This order remains unchanged until the next I-MOP is decoded
• Mesh Motion Coding
  • Encoded based on two previously encoded neighboring vertices, e.g. in ∆abc, (a, b) → c
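The slide states only that the motion of c is coded from the two already coded vertices a and b. A common way to do this, assumed here for illustration, is to predict the motion vector of c as the average of the motion vectors of a and b and to code only the residual. The function names and values below are hypothetical.

```python
def predict_motion(mv_a, mv_b):
    """Assumed predictor: average of the two neighbouring motion vectors."""
    return ((mv_a[0] + mv_b[0]) / 2, (mv_a[1] + mv_b[1]) / 2)


def encode_residual(mv_c, mv_a, mv_b):
    """Residual that would be passed on to the entropy coder."""
    px, py = predict_motion(mv_a, mv_b)
    return (mv_c[0] - px, mv_c[1] - py)


def decode_motion(residual, mv_a, mv_b):
    """Rebuild the motion vector of c from the residual and the same prediction."""
    px, py = predict_motion(mv_a, mv_b)
    return (residual[0] + px, residual[1] + py)


# Hypothetical motion vectors for triangle abc.
mv_a, mv_b, mv_c = (2, 0), (4, 2), (3, 2)
residual = encode_residual(mv_c, mv_a, mv_b)
print(residual)
print(decode_motion(residual, mv_a, mv_b))
```

Since vertices of one triangle usually move in a similar way, the residual is small compared with the motion vector itself.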

[3D Mesh Coding]

• 2D Mesh Coding:
  • Supports mapping natural images and video onto 2D meshes
• 3D Mesh Coding:
  • Represents and compresses 3D objects onto which images and videos may be mapped
  • Compresses static 3D models, not their animation

Functionalities

• High compression
  • 2%-4% of the size of a VRML ASCII file
• Incremental rendering
  • Builds the model from a partial bitstream
• Error resilience
  • Suffers less from network errors
• Hierarchical buildup
  • Scalable bitstream with different resolutions, depending on viewing distance

Incremental Rendering

Data of 3D Mesh Object

• Connectivity:
  • How vertices are connected
• Geometry:
  • 3D coordinates of vertices
• Photometry
  • Colors
  • Normals
  • Texture

Bitstream of 3D Mesh Coding

• Connectivity Data
  • Vertex graph
  • Triangle tree
• Triangle Data
  • Contains: geometry coordinates, colors, normals, texture coordinates
  • Largest part of the bitstream

Bitstream of 3D Mesh Coding

• Connectivity Data is packed separately and before the Triangle Data.
• Benefits:
  • Incremental rendering:
    • Triangle Data can be decoded incrementally since the full Connectivity (topology) Data is already available
    • Shortens the latency
  • Error resilience:
    • The 3D structure can be formed even with some missing Triangle Data

Decoding Scheme of 3D Mesh

Vertex Graph

Triangle Tree


Coding of Geometry and Photometry Data

• 1) Quantization
• 2) Differential Coding
  • No prediction
  • Parallelogram prediction
  • Tree prediction
• 3) Adaptive Arithmetic Entropy Coding
  • Codes the differential values
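The first two steps can be sketched in Python for one of the listed modes, parallelogram prediction: the new vertex d of a triangle sharing edge (b, c) with an already decoded triangle (a, b, c) is predicted as b + c - a, and only the difference is passed to the entropy coder. The quantizer below is a plain uniform one over an assumed bounding range; the function names and numbers are hypothetical, and the arithmetic coding step is left out.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer index on a uniform grid of 2**bits levels."""
    levels = (1 << bits) - 1
    return round((value - lo) / (hi - lo) * levels)


def parallelogram_predict(a, b, c):
    """Predict the vertex opposite to a across the shared edge (b, c)."""
    return tuple(bi + ci - ai for ai, bi, ci in zip(a, b, c))


def prediction_residual(actual, a, b, c):
    """Difference between the actual vertex and its prediction."""
    predicted = parallelogram_predict(a, b, c)
    return tuple(x - p for x, p in zip(actual, predicted))


# Hypothetical, already quantized vertices of a decoded triangle and its neighbour.
a, b, c = (0, 0, 0), (4, 0, 0), (0, 4, 0)
actual_d = (4, 5, 1)
print(quantize(0.25, 0.0, 1.0, 8))
print(parallelogram_predict(a, b, c))
print(prediction_residual(actual_d, a, b, c))
```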

3D Mesh Coding Modes

• Error-Resilience Mode
  • To minimize the impact of errors, the data is divided into partitions or packets
  • Partitions are rendered independently
• Progressive Transmission Mode
  • Scalable coding
    • One base layer
    • One or more enhancement layers
  • Provides Forest Split operations
    • Contain face forest, triangle tree, triangle data

Forest Split Operation

(a) Cut through the edges of vertex tree

(b) Open the dotted line

(c) Triangulate the opening to form a triangle tree

(d) Refined mesh

References

• Books:
  • Major reference: Fernando Pereira, Touradj Ebrahimi, The MPEG-4 Book, Prentice Hall PTR, 2002
  • Natural video coding technology: Joan L. Mitchell et al., MPEG Video Compression Standard, Chapman & Hall, 1996
• MPEG official websites:
  • Overview: http://mpeg.telecomitalialab.com/standards.htm
  • Resources: http://www.m4if.org/resources.php
• Demos:
  • http://www.envivio.com/products/etv/content/technical.jsp
  • http://www.ivast.com/aboutmpeg4/index.html
• MPEG-4 Series Slides, Course Presentation of C640/2003 Winter, U. of Alberta:
  • http://www.cs.ualberta.ca/~anup/Courses/604/604_3D.htm

The End

• Acknowledgements
  • Yongjie Liu
  • Michael Closson
• Questions and Comments?

DecoderConfigDescriptor

    class DecoderConfigDescriptor extends BaseDescriptor : bit(8) tag=DecoderConfigDescrTag {
        bit(8) objectTypeIndication;
        bit(6) streamType;
        bit(1) upStream;
        const bit(1) reserved=1;
        bit(24) bufferSizeDB;
        bit(32) maxBitrate;
        bit(32) avgBitrate;
        DecoderSpecificInfo decSpecificInfo[0..1];
        profileLevelIndicationIndexDescriptor profileLevelIndicationIndexDescr[0..255];
    }
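The fixed part of this descriptor adds up to 104 bits, i.e. 13 bytes, which makes it easy to read with standard byte operations. The Python sketch below does so under the assumption that the tag (and any length field) has already been consumed; the trailing optional descriptors are ignored, and the function name and example values are hypothetical.

```python
import struct


def parse_decoder_config(body: bytes) -> dict:
    """Parse the 13 bytes of fixed fields that follow the descriptor tag."""
    if len(body) < 13:
        raise ValueError("need 13 bytes of fixed fields")
    flags = body[1]
    max_bitrate, avg_bitrate = struct.unpack(">II", body[5:13])
    return {
        "objectTypeIndication": body[0],               # 8 bits
        "streamType": flags >> 2,                      # upper 6 bits
        "upStream": bool((flags >> 1) & 0x1),          # 1 bit
        "reserved": flags & 0x1,                       # 1 bit, constant 1 in the SDL
        "bufferSizeDB": int.from_bytes(body[2:5], "big"),  # 24 bits
        "maxBitrate": max_bitrate,                     # 32 bits
        "avgBitrate": avg_bitrate,                     # 32 bits
    }


# Illustrative values only: object type 0x20, stream type 4, downstream.
example = (
    bytes([0x20, (4 << 2) | (0 << 1) | 1])
    + (65536).to_bytes(3, "big")
    + struct.pack(">II", 512000, 384000)
)
print(parse_decoder_config(example))
```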
