Jens-Rainer Ohm Multimedia Communication Technology

Jens-RainerOhm MultimediaCommunicationTechnology978-3-642-18750-6/1.pdf · Preface Information technology provides a plenty ofnew ways to process, store, distribute and access audiovisual

Jens-Rainer Ohm

Multimedia Communication Technology

Springer-Verlag Berlin Heidelberg GmbH

Engineering ONLINE LIBRARY

springeronline.com

Jens-Rainer Ohm

Multimedia Communication Technology Representation, Transmission and Identification of Multimedia Signals

With 441 Figures

Springer

Professor Jens-Rainer Ohm RWTH Aachen University Chair and Institute of Communications Engineering Melatener Str. 23 52074 Aachen Germany

Cataloging-in-Publication Data applied for

ISBN 978-3-642-62277-9 ISBN 978-3-642-18750-6 (eBook) DOI 10.1007/978-3-642-18750-6

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under German Copyright Law.

springeronline.com

© Springer-Verlag Berlin Heidelberg 2004
Originally published by Springer-Verlag Berlin Heidelberg New York in 2004
Softcover reprint of the hardcover 1st edition 2004

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Digital data supplied by author
Cover-Design: Design & Production, Heidelberg
Printed on acid-free paper 62/3020 Rw 5 4 3 2 1 0

Preface

Information technology provides plenty of new ways to process, store, distribute and access audiovisual information. Beyond traditional broadcast and telephone channels and analog storage media like film or tapes, the emerging Internet, mobile networks and digital storage are going to revolutionize the terms of distribution and access. This development is ruled by the convergence of audiovisual media technology, information technology and telecommunications technology. Through the capabilities of digital processing, established media like photography, movie, television and radio are changing their roles and are becoming subsumed by new integrated services which are mobile, interactive, pervasive, usable from anywhere, giving freedom to play with, and penetrating everyday life. Multimedia communication establishes new forms of communication between people and between people and machines, and also allows communication between machines using audiovisual information or related feature parameters. Intelligent media interfaces are becoming increasingly important, and machine assistance in accessing media, in acquiring, organizing, distributing, manipulating and consuming audiovisual information will become inevitable in the future.

This book aims to provide deep insight into important enabling technologies of multimedia communication systems, namely methods of multimedia signal processing, analysis, identification and recognition, and schemes for multimedia signal representation, compression and expression by features or other properties. All of these are lively and highly innovative areas at present. The book reviews state-of-the-art technology and its scientific foundations, but is primarily meant to support a systematic understanding of the underlying methods, algorithms and their theoretical foundations. It is strongly believed that this is the best approach to contribute to future improvements in the field.

In part, the book is a substantially upgraded translation of my German-language textbook on digital image and video coding, which was published in the mid-'90s. Since then, the progress made in compression of audiovisual data has been breathtaking, and consequently the newest developments are reflected, including the Advanced Video Coding standard and motion-compensated Wavelet coding. The second basis for this book is my lectures on topics of multimedia communications, held regularly at RWTH Aachen University. These treat all aspects of image, video and audio compression, including networking interfaces, and also include multimedia signal identification and recognition. These latter aspects, topically related to the MPEG-7 multimedia content description standard, establish a profound basis for intelligent multimedia systems.

Most chapters are supplemented by homework problems, for which solutions are available from http://www.ient.rwth-aachen.de.

The book would not have been possible without contributions of numerous students and many other people who have worked with me on topics of image, video and audio processing, encoding and recognition over more than 15 years. These are (in alphabetical order) Sven Bauer, Michael Becker, Markus Beermann, Sven Brandau, Nicole Brandenburg, Michael Brünig, Ferry Bunjamin, Kai Clüver, Emmanuelle Come, Holger Crysandt, Sila Ekmekci, Christoph Fehn, Ingo Feldmann, Oliver Fromm, Karsten Grüneberg, Karsten Grünheit, Jens Guther, Hafez Hadinejad, Konstantin Hanke, Guido Heising, Hans Dieter Holme, Michael Hoynck, Laetitia Hue, Ebroul Izquierdo, Peter Kauff, Jorg Kramer, Silko Kruse, Patrick Laurent, Thomas Ledworuski, Wolfram Liebsch, Oliver Lietz, Phuong Ma, Bela Makai, Claudia Mayer, Bernd Menser, Domingo Mery, Karsten Muller, Patrick Ndjiki-Nya, Bernhard Pasewaldt, Andreas Praatz, Lars Prokop, Oliver Rockinger, Katrin Rümmler, Thomas Rusert, Mihaela van der Schaar, Ansgar Schiffler, Oliver Schreer, Holger Schulz, Aljoscha Smolic, Frank Sperling, Peter Stammnitz, Jens Wellhausen, Mathias Wien and Detlef Zier. Please forgive me if I forgot anybody.

Very special thanks are also directed to my scientific mentors Peter Noll, Hans Dieter Luke and Irmfried Hartmann, all people of IENT and to my family.

Aachen, August 15, 2003
Jens-Rainer Ohm

Table of Contents

1 Introduction 1

1.1 Concepts and Terminology 1
1.1.1 Signal Representation by Source Coding 4
1.1.2 Optimization of Transmission 6
1.1.3 Content Identification 7

1.2 Signal Sources and Acquisition 9

1.3 Digital Representation of Multimedia Signals 13
1.3.1 Image and Video Signals 13
1.3.2 Speech and Audio Signals 18

1.4 Problems 19

Part A: Multimedia Signal Processing and Analysis 21

2 Signals and Sampling 23

2.1 Signals and Fourier Spectra 23
2.1.1 Spatial Signals and Two-dimensional Spectra 24
2.1.2 Spatio-temporal Signals 30

2.2 Sampling of Multimedia Signals 33
2.2.1 The Sampling Theorem 33
2.2.2 Separable Two-dimensional Sampling 35
2.2.3 Non-separable Two-dimensional Sampling 37
2.2.4 Sampling of Video Signals 42

2.3 Problems 46

3 Statistical Analysis of Multimedia Signals 49

3.1 Properties Related to Sample Statistics 49

3.2 Joint Statistical Properties 54

3.3 Spectral Properties 63

3.4 Statistical Modeling and Tests 68

3.5 Statistical Foundations of Information Theory 73

3.6 Problems 77

4 Linear Systems and Transforms 79

4.1 Two- and Multi-dimensional Linear Systems 79
4.1.1 Properties of Two-dimensional Filters 79
4.1.2 Frequency Transfer Functions of Multi-dimensional Filters 85
4.1.3 Image filtering by Matrix Operations 91
4.1.4 Realization of Two-dimensional Filters 93

4.2 Linear Prediction 96
4.2.1 One- and Two-dimensional Autoregressive Models 96
4.2.2 Linear Prediction 104

4.3 Linear Block Transforms 109
4.3.1 Orthogonal Basis Functions 109
4.3.2 Basis Functions of Orthogonal Transforms 113
4.3.3 Efficiency of Transforms 126
4.3.4 Fast Transform Algorithms 129
4.3.5 Transforms with Block Overlap 130

4.4 Filterbank Transforms 133
4.4.1 Decimation and Interpolation 135
4.4.2 Properties of Subband Filters 138
4.4.3 Implementation of Filterbank Structures 145
4.4.4 Wavelet Transform 151
4.4.5 Two- and Multi-dimensional Filter Banks 160
4.4.6 Pyramid Decomposition 164

4.5 Problems 167

5 Pre- and Postprocessing 171

5.1 Nonlinear Filters 171
5.1.1 Median Filters and Rank Order Filters 172
5.1.2 Morphological Filters 175
5.1.3 Polynomial Filters 179

5.2 Signal Enhancement 180

5.3 Amplitude-value transformations 182
5.3.1 Amplitude Mapping Functions 183
5.3.2 Probability Distribution Modification and Equalization 185

5.4 Interpolation 187
5.4.1 Zero- and First-order Interpolators 188
5.4.2 Interpolation using linear Filters 190
5.4.3 Interpolation based on Frequency Extension 193
5.4.4 Spline and Lagrangian Interpolation 194
5.4.5 Interpolation on Irregular 2D Grids 198

5.5 Problems 200

Part B: Content-related Multimedia Signal Analysis 203

6 Perceptual Properties of Vision and Hearing 205

6.1 Properties of Vision 205
6.1.1 Physiology of the Eye 205
6.1.2 Sensitivity Functions 207
6.1.3 Color Vision 210

6.2 Properties of Hearing 211
6.2.1 Physiology of the Ear 211
6.2.2 Sensitivity Functions 212

7 Features of Multimedia Signals 217

7.1 Color 217
7.1.1 Color Space Transformations 218
7.1.2 Representation of Color Features 223

7.2 Texture 228
7.2.1 Statistical Texture Analysis 229
7.2.2 Spectral Features of Texture 235

7.3 Edge Analysis 242
7.3.1 Edge Detection by Gradient Operators 242
7.3.2 Edge Characterization by second Derivative 244
7.3.3 Edge Finding and Consistency Analysis 247
7.3.4 Edge Model Fitting 249
7.3.5 Description and Analysis of Edge Properties 251

7.4 Contour and Shape Analysis 253
7.4.1 Contour fitting 253
7.4.2 Contour Description by Orientation and Curvature 259
7.4.3 Geometric Features and Binary Shape Features 263
7.4.4 Projection and geometric mapping 267
7.4.5 Moment analysis 274
7.4.6 Shape Analysis by Basis Functions 278
7.4.7 Three-dimensional Shapes 279

7.5 Correspondence analysis 284

7.6 Motion Analysis 288
7.6.1 Mapping of motion into the image plane 288
7.6.2 Motion Estimation by the Optical Flow Principle 292
7.6.3 Motion Estimation by Matching 297
7.6.4 Estimation of Parameters for Warping Grids 307
7.6.5 Estimation of non-translational Motion Parameters 310
7.6.6 Estimation of Motion Vector Fields at Object Boundaries 313
7.6.7 Analysis of 3D Motion 315

7.7 Disparity and Depth Analysis 316
7.7.1 Central Projection in Stereoscopic and Multiple-camera Systems 321
7.7.2 Epipolar Geometry for arbitrary Camera Configurations 323

7.8 Mosaics 326

7.9 Face Detection and Description 328

7.10 Audio Signal Features 331
7.10.1 Basic Features 332
7.10.2 Speech Signal Analysis 333
7.10.3 Musical Signals, Instruments and Sounds 334
7.10.4 Room Properties 344

7.11 Problems 346

8 Signal and Parameter Estimation 353

8.1 Observation and Degradation Models 353

8.2 Estimation based on linear filters 355
8.2.1 Inverse Filtering 355
8.2.2 Wiener Filtering 356

8.3 Least Squares Estimation 358

8.4 Singular Value Decomposition 361

8.5 ML and MAP Estimation 363

8.6 Kalman Estimation 366

8.7 Outlier rejection in estimation 370

8.8 Problems 373

9 Feature Transforms and Classification 375

9.1 Feature Transforms 375
9.1.1 Eigenvector Analysis of Feature Value Sets 376
9.1.2 Independent Component Analysis 377
9.1.3 Generalized Hough Transform 378

9.2 Feature Value Normalization and Weighting 379
9.2.1 Normalization of Feature Values 380
9.2.2 Simple Distance Metrics 381
9.2.3 Distance Metrics related to Statistical Distributions 382
9.2.4 Distance Metrics based on Class Features 385
9.2.5 Reliability measures 387

9.3 Feature-based Comparison 389

9.4 Feature-based Classification 391
9.4.1 Linear Classification of two Classes 393
9.4.2 Generalization of Linear Classification 398
9.4.3 Nearest-neighbor and Cluster-based Methods 400
9.4.4 Maximum a Posteriori (Bayes) Classification 404
9.4.5 Artificial Neural Networks 407
9.4.6 Hidden Markov Models 414

9.5 Problems 415

10 Signal Decomposition 417

10.1 Segmentation of Image Signals 418
10.1.1 Pixel-based Segmentation 418
10.1.2 Region-based Methods 423
10.1.3 Texture Elimination 425
10.1.4 Relaxation Methods 428
10.1.5 Image Region Labeling 433

10.2 Segmentation of Video Signals 434
10.2.1 Temporal Segmentation for Scene Changes 434
10.2.2 Combination of Spatial and Temporal Segmentation 436
10.2.3 Segmentation of Objects based on Motion Information 438

10.3 Segmentation and Decomposition of Audio Signals 440

10.4 Problems 441

Part C: Coding of Multimedia Signals 443

11 Quantization and Coding 445

11.1 Scalar Quantization 445

11.2 Coding Theory 450
11.2.1 Source Coding Theorem and Rate Distortion Function 450
11.2.2 Rate-Distortion Function for Correlated Signals 451
11.2.3 Rate Distortion Function for Multi-dimensional Signals 454

11.3 Rate-Distortion Optimization of Quantizers 456

11.4 Entropy Coding 461
11.4.1 Properties of Variable-length Codes 461
11.4.2 Huffman Codes 464
11.4.3 Systematic Variable-length Codes 466
11.4.4 Arithmetic Coding 470
11.4.5 Context-dependent Entropy Coding 475
11.4.6 Adaptive Entropy Coding 476
11.4.7 Entropy Coding and Transmission Errors 478
11.4.8 Run-length Coding 479
11.4.9 Lempel-Ziv Coding 481

11.5 Vector Quantization 483
11.5.1 Basic Principles of Vector Quantization 483
11.5.2 Vector Quantization with Uniform Codebooks 488
11.5.3 Vector Quantization with Non-uniform Codebooks 491
11.5.4 Structured Codebooks 494
11.5.5 Rate-constrained Vector Quantization 498

11.6 Sliding Block Coding 501
11.6.1 Trellis Coding 502
11.6.2 Tree Coding 504

11.7 Problems 506

12 Still Image Coding 509

12.1 Compression of Binary Images 509

12.2 Vector Quantization of Images 514

12.3 Predictive Coding 521
12.3.1 DPCM Systems 521
12.3.2 Predictor filters in 2D DPCM 524
12.3.3 Quantization and Encoding of Prediction Errors 526
12.3.4 Error propagation in DPCM 531

12.4 Transform Coding 533
12.4.1 Block Transform Coding 533
12.4.2 Subband and Wavelet Transform Coding 544
12.4.3 Vector Quantization of Transform Coefficients 554
12.4.4 Adaptation of transform bases to signal properties 557
12.4.5 Transform coding and transmission losses 559

12.5 Fractal Coding 562
12.5.1 Principles of Fractal Transforms 563
12.5.2 Collage Theorem 563
12.5.3 Fractal Decoding 565

12.6 Region-based coding 571
12.6.1 Binary Shape Coding 571
12.6.2 Contour shape coding 573
12.6.3 Coding within arbitrary-shaped Regions 575

12.7 Problems 578

13 Video Coding 583

13.1 Methods without Motion Compensation 583
13.1.1 Frame Replenishment 585
13.1.2 3D Transform and Subband coding 586

13.2 Hybrid Video Coding 590
13.2.1 Motion-compensated Hybrid Coders 590
13.2.2 Characteristics of Interframe Prediction Error Signals 592
13.2.3 Quantization error feedback and error propagation 595
13.2.4 Forward, Backward and Multiframe Prediction 598
13.2.5 Bi-directional Prediction 600
13.2.6 Improved Methods of motion compensation 604
13.2.7 Hybrid Coding of Interlaced Video Signals 611
13.2.8 Scalable Hybrid Coding 613
13.2.9 Multiple-description Video Coding 624
13.2.10 Optimization of Hybrid Encoders 627

13.3 MC Prediction Coding using the Wavelet Transform 629
13.3.1 Wavelet Transform in the Prediction Loop 630
13.3.2 Frequency Coding with In-band Motion Compensation 631

13.4 Spatio-temporal Frequency Coding with MC 637
13.4.1 Temporal-axis Haar Filters with MC 638
13.4.2 Temporal-axis Lifting Filters for arbitrary MC 643
13.4.3 Improvements on Motion Compensation 653
13.4.4 Quantization and Encoding of 3D Wavelet Coefficients 656
13.4.5 Delay and Complexity of 3D Wavelet Coders 662

13.5 Encoding of Motion Parameters 666
13.5.1 Spatial Contexts in Motion Coding 666
13.5.2 Temporal Contexts in Motion Coding 668
13.5.3 Fractal Video Coding 670

13.6 Problems 671

14 Audio Coding 673

14.1 Coding of Speech Signals 673

14.2 Waveform Coding of Audio signals 676

14.3 Parametric Coding of Audio and Sound Signals 681

Part D: Applications and Standards 685

15 Transmission and Storage 687

15.1 Convergence of Digital Multimedia Services 687

15.2 Adaptation to Channel Characteristics 690
15.2.1 Rate and Transmission Control 693
15.2.2 Error Control 697

15.3 Digital Broadcast 703

15.4 Media Streaming 706

15.5 Content-based Media Access 711

15.6 Content Protection 715

16 Signal Composition, Rendering and Presentation 717

16.1 Composition and Mixing of Visual Signals 718

16.2 Warping and Morphing 724

16.3 Viewpoint Adaptation 725

16.4 Frame Rate Conversion 728

16.5 Rendering of Image and Video Signals 732

16.6 Composition and Rendering of Audio Signals 735

17 Multimedia Representation Standards 739

17.1 Interoperability and Compatibility 739

17.2 Definitions at Systems Level 745

17.3 Still Image Coding 751
17.3.1 The JBIG Standards 751
17.3.2 The JPEG Standards 752
17.3.3 MPEG-4 Still Texture Coding 760

17.4 Video Coding 760
17.4.1 ITU-T Recommendations H.261 and H.263 761
17.4.2 MPEG-1 and MPEG-2 764
17.4.3 MPEG-4 Visual 769
17.4.4 H.264/MPEG-4 Part 10 Advanced Video Coding (AVC) 774

17.5 Audio Coding 778
17.5.1 Speech Coding 778
17.5.2 Music and Sound Coding 779

17.6 Multimedia Content Description Standard MPEG-7 783
17.6.1 Elements of MPEG-7 Descriptions 785
17.6.2 Generic Multimedia Description Concepts 786
17.6.3 Visual Descriptors 789
17.6.4 Audio Descriptors 794

17.7 Multimedia Framework MPEG-21 797

Appendices 801

A Quality Measurement 803

A.1 Signal Quality 803
A.1.1 Objective Signal Quality Measurements 803
A.1.2 Subjective Assessment 806

A.2 Classification Quality 808

B Vector and Matrix Algebra 813

C Symbols and Variables 819

D Acronyms 823

References 829

Index 853