Academic Press Library in

Signal Processing

Volume 4

Image, Video Processing and Analysis, Hardware,

Audio, Acoustic and Speech Processing

Editors

Joel Trussell

Department of Electrical and Computer

Engineering, North Carolina State University,

Raleigh, NC, USA

Anuj Srivastava

Department of Statistics, Florida State

University, Tallahassee, FL, USA

Amit K. Roy-Chowdhury

Department of Electrical Engineering,

University of California, CA, USA

Ankur Srivastava

Department of ECE, University of Maryland,

College Park, MD, USA

Patrick A. Naylor

Department of Electrical and Electronic Engineering,

Imperial College, Exhibition Road, London, UK

Rama Chellappa

Department of Electrical and Computer Engineering

and Center for Automation Research, University of

Maryland, College Park, MD, USA

Sergios Theodoridis

Department of Informatics & Telecommunications,

University of Athens, Greece

ELSEVIER

AMSTERDAM • WALTHAM • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO

SAN FRANCISCO • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier

Contents

Introduction xxvii

About the Editors xxxi

Section Editors xxxiii

Authors Biography xxxv

SECTION I IMAGE ENHANCEMENT RESTORATION AND DIGITAL IMAGING

CHAPTER 1 Digital Imaging: Capture, Display, Restoration, and Enhancement 3

H. Joel Trussell

4.01.1 Introduction 3

4.01.2 Image capture 4

4.01.2.1 Digital camera quality issues 4

4.01.2.2 Image and document capture 5

4.01.3 Image displays 5

4.01.3.1 Printing 6

4.01.3.2 Small mobile displays 7

4.01.3.3 Small mobile image processing 7

4.01.4 Restoration 8

4.01.4.1 Classical restoration 8

4.01.4.2 Iterative restoration 8

CHAPTER 2 Image Quality in Consumer Digital Cameras 11

Bruce H. Pillman and James E. Adams, Jr.

4.02.1 Introduction 11

4.02.2 Digital camera image processing chain 13

4.02.3 Camera engineering 16

4.02.3.1 Optical issues 17

4.02.3.2 Sensor sampling 25

4.02.3.3 Temporal sampling 27

4.02.3.4 Spectral sensitivity 29

4.02.3.5 Sensitivity and noise 32

4.02.3.6 Resource limitations 33

4.02.4 Quality modeling 34

4.02.5 System interactions 38

4.02.6 Processing methods and algorithms 40

4.02.6.1 Improving images 40

4.02.6.2 Camera control algorithms (A*) 41

4.02.6.3 Color and tone 47

4.02.6.4 Spatial processing 53

4.02.6.5 Camera-specific algorithms 63

4.02.7 Applications 71

4.02.8 Open issues and future directions 72

4.02.9 Implementations 73

4.02.10 Datasets 74

Glossary 75

References 76

CHAPTER 3 Image and Document Capture—State-of-the-Art and a Glance into the Future 79

Kathrin Berkner


4.03.2 Basic steps of conventional document capture processing 80

4.03.2.1 Image capture pipeline 81

4.03.2.2 Digital image processing steps 81

4.03.3 Document and image capture applications that are still challenging today 85

4.03.3.1 Multispectral analysis of documents 86

4.03.3.2 Adaptive learning for document analysis algorithms 86

4.03.4 Looking into the future of document and image capture 87

4.03.4.1 Riding the mobile wave—document image capture with digital cameras 87

4.03.4.2 Other mobile phone applications 88

4.03.5 Data capture via novel sensor multiplexing techniques 88

4.03.6 Data sets and open source code 90

4.03.7 Conclusions and future trends 90

References 91

CHAPTER 4 Image Display—Mobile Imaging and Interactive Image Processing 95

Oscar C. Au and Lu Fang

4.04.1 The small screen challenge in mobile imaging 95

4.04.2 Subpixel-based hardware design in mobile display 95

4.04.3 Subpixel-based software design in mobile display: font rendering 99

4.04.4 Subpixel-based software design in mobile display: color image down-sampling 100

4.04.4.1 Spatial-domain algorithm design of subpixel-based down-sampling 101

4.04.4.2 Frequency-domain analysis of subpixel-based down-sampling 106

4.04.5 Conclusion 114

References 114

CHAPTER 5 Image Display—Printing (Desktop, Commercial) 117

Philipp Urban, Simon Stahl, and Edgar Dörsam


4.05.2 Printing technologies 119

4.05.2.1 Gravure printing 119

4.05.2.2 Flexographic printing 120

4.05.2.3 Offset printing 121

4.05.2.4 Inkjet printing 122

4.05.2.5 Electrophotography 124

4.05.3 Workflow 124

4.05.3.1 Basic colorimetry used in printing 125

4.05.3.2 Gamut mapping 127

4.05.3.3 Separation 131

4.05.3.4 Halftoning 134

4.05.4 Printer models 141

4.05.4.1 Mechanical dot gain 142

4.05.4.2 Optical dot gain 142

4.05.4.3 Murray-Davies model 143

4.05.4.4 Neugebauer model 145

4.05.4.5 Yule-Nielsen modified spectral Neugebauer (YNSN) model 146

4.05.4.6 Enhanced YNSN-based models 147

4.05.4.7 Clapper-Yule model 148

4.05.4.8 Other models 150

4.05.5 Research directions in printing 150

4.05.5.1 Spectral printing 150

4.05.5.2 Printed reproductions beyond color 155

4.05.5.3 Functional printing 156

Glossary 156

References 157

CHAPTER 6 Image Restoration: Fundamentals of Image Restoration 165

Stanley J. Reeves


4.06.2 Observation model 165

4.06.2.1 Blur models 167

4.06.2.2 Sensor nonlinearity 168

4.06.2.3 Discretization 169

4.06.2.4 Noise models 170

4.06.3 Restoration algorithms 170

4.06.3.1 Inverse filter 170

4.06.3.2 Wiener filter 172

4.06.3.3 Constrained least squares 174

4.06.3.4 Regularized least squares 176

4.06.3.5 Bayesian solution 180

4.06.4 Boundary effects 181

4.06.5 Blur identification 182

4.06.5.1 Background 182

4.06.5.2 Classical techniques 183

4.06.5.3 Parametric methods 186

4.06.5.4 Dual smoothing methods 187


Glossary 190

References 191

CHAPTER 7 Iterative Methods for Image Restoration 193

Sebastian Berisha and James G. Nagy


4.07.2 Background 194

4.07.2.1 Regularization 195

4.07.2.2 Matrix structures and matrix-vector multiplications 198

4.07.2.3 Preconditioning 201

4.07.2.4 MATLAB notes 203

4.07.3 Model problems 204

4.07.3.1 Spatially invariant Gaussian blur 204

4.07.3.2 Spatially invariant atmospheric turbulence blur 206

4.07.3.3 Spatially variant Gaussian blur 207

4.07.3.4 Spatially variant motion blur 209


4.07.4 Iterative methods for unconstrained problems 215

4.07.4.1 Richardson iteration 216

4.07.4.2 Preconditioned Richardson methods 217

4.07.4.3 Steepest descent methods 222

4.07.4.4 Conjugate gradient methods 223


4.07.5 Iterative methods with nonnegativity constraints 228

4.07.5.1 Projection methods 228

4.07.5.2 Statistically motivated methods 229


4.07.6 Examples 234

4.07.6.1 Unconstrained iterative methods 235

4.07.6.2 Nonnegativity constrained iterative methods 238

4.07.7 Concluding remarks and open questions 242

Acknowledgments 243

References 243

CHAPTER 8 Image Processing at Your Fingertips: The New Horizon of Mobile Imaging 249

Xin Li

4.08.1 Historical background and overview 249

4.08.2 Mobile imaging: following Feynman's idea on infinitesimal machinery 250

4.08.3 Mobile computing: interacting with computer without an interface 251

4.08.4 Image processing at fingertips: where mobile imaging meets mobile computing 252

4.08.4.1 Intelligent image acquisition 252

4.08.4.2 Interactive image matting 253

4.08.4.3 Dynamic image mosaicing 254

4.08.4.4 Supervised image restoration 256


4.08.5.1 Mobile photography 259

4.08.5.2 Computer vision and pattern recognition 260

4.08.5.3 Human network interaction 260

4.08.6 Open issues and problems 261

A. Appendix: course material, source codes and datasets 262

References 263

SECTION II IMAGE ANALYSIS AND RECOGNITION

CHAPTER 9 Image Analysis and Recognition 267

Anuj Srivastava

4.09.1 General background 267

4.09.2 Chapter introductions 269

References 270

CHAPTER 10 Multi-Path Marginal Space Learning for Object Detection 271

Adrian Barbu


4.10.2 Related work 272

4.10.3 Marginal space learning overview 274

4.10.4 Face detection with Marginal Space Learning 275

4.10.4.1 Scale and rotation invariant eye detection 275

4.10.4.2 Four parameter face detection 276

4.10.5 Multiple computational paths in Marginal Space Learning 279

4.10.6 Experimental validation 282

4.10.6.1 The training dataset 283

4.10.6.2 Implementation details 283

4.10.6.3 Evaluation 284



4.10.9 Datasets 287


Glossary 288

References 289

CHAPTER 11 Markov Models and MCMC Algorithms in Image Processing 293

Xavier Descombes

4.11.1 Introduction: the probabilistic approach in image analysis 293

4.11.2 Lattice based models and the Bayesian paradigm 294

4.11.2.1 Modeling 294

4.11.2.2 Optimization 296

4.11.3 Some inverse problems 299

4.11.3.1 Denoising and deconvolution: the restoration problem 299

4.11.3.2 Segmentation problem 301

4.11.3.3 Texture modeling 304

4.11.4 Spatial point processes 306

4.11.4.1 Modeling 307

4.11.4.2 Optimizing 309

4.11.5 Multiple objects detection 312

4.11.5.1 Trees counting 312

4.11.5.2 Road network detection 316


References 323

CHAPTER 12 Identifying Multivariate Imaging Patterns: Supervised, Semi-Supervised, and Unsupervised Learning Perspectives 327

Roman Filipovych, Bilwaj Gaonkar, and Christos Davatzikos


4.12.2 Materials 329

4.12.3 Supervised learning of predictive models 329

4.12.4 Semi-supervised learning of predictive models 331

4.12.4.1 Support vector machines (SVM): supervised vs. semi-supervised classification 331

4.12.4.2 Predicting conversion from MCI to AD via semi-supervised classification 332

4.12.5 Unsupervised learning as the means to disentangle heterogeneity 334

4.12.5.1 Exploring disease heterogeneity: a sparse dictionary learning-based approach 334

4.12.5.2 Understanding heterogeneity of normal aging via clustering 335

4.12.6 Summary 338

Acknowledgment 339

References 339

SECTION III VIDEO PROCESSING

CHAPTER 13 Video Processing—An Overview 343

Amit K. Roy-Chowdhury

4.13.1 Basic tasks in video analysis 343

4.13.2 Applications in video analysis 345

4.13.3 Overview of chapters 346

CHAPTER 14 Foveated Image and Video Processing and Search 349

Andrew Floren and Alan C. Bovik


4.14.2 The human visual system 353

4.14.2.1 Anatomy of the eye 353

4.14.2.2 Optics of the eye 354

4.14.2.3 The retina and fovea 355

4.14.2.4 Photoreceptors 356

4.14.2.5 Retinal ganglion cells 357

4.14.2.6 Primary visual cortex 358

4.14.3 Modeling the human visual system 359

4.14.3.1 Psychophysics 359

4.14.3.2 Optical filtering 360

4.14.3.3 Photoreceptor sampling 360

4.14.3.4 Retinal ganglion cell models 360

4.14.3.5 The primary visual cortex and Gabor filters 361

4.14.3.6 Eccentricity 361

4.14.3.7 Spatial frequency 362

4.14.3.8 The contrast sensitivity function 363

4.14.3.9 Information theory and entropy 365

4.14.3.10 Efficient coding hypothesis and natural scene statistics 366

4.14.4 Foveated images and video 367

4.14.4.1 Foveation as compression 367

4.14.4.2 Foveated quality assessment 368

4.14.4.3 Foveation filtering 372

4.14.4.4 Foveated motion estimation and compensation 376

4.14.4.5 Foveated rate control 378

4.14.4.6 Multiple fixation points 379

4.14.4.7 Hardware foveation 380

4.14.5 Fixation selection 381

4.14.5.1 Human fixation prediction 381

4.14.5.2 Video compression 384

4.14.5.3 Search and detection 385


4.14.6.1 Teleconferencing 386

4.14.6.2 Teleoperation 386

4.14.6.3 Super high resolution images 387

4.14.6.4 Infrared and wide band imaging 387

4.14.6.5 Detection and search 387


4.14.8 Implementation/code 389

4.14.8.1 Psychophysical models 389

4.14.8.2 Conversion functions 391

4.14.8.3 Foveated quality assessment 391

4.14.8.4 Foveation filtering 393

4.14.9 Data sets 395

4.14.9.1 Psychophysics 395

4.14.9.2 Perceptual quality 395

4.14.9.3 Human fixation prediction 396


Glossary 397

References 398

CHAPTER 15 Segmentation-Free Biometric Recognition Using Correlation Filters 403

Andres Rodriguez and B.V.K. Vijaya Kumar


4.15.2 Advanced correlation filters 405

4.15.2.1 Matched filter and efficient computation of correlation filters 407

4.15.2.2 Equal correlation peak synthetic discriminant function (ECPSDF) filter 408

4.15.2.3 Minimum variance synthetic discriminant function (MVSDF) filter 409

4.15.2.4 Minimum average correlation energy (MACE) filter 410

4.15.2.5 Optimal tradeoff synthetic discriminant function (OTSDF) filter 411

4.15.2.6 Minimum noise and correlation energy (MINACE) filter 413

4.15.2.7 Gaussian MACE (GMACE) filter 414

4.15.2.8 Unconstrained CFs: MACH and UMACE and UMSESDF filters 415

4.15.2.9 Average of synthetic exact filters (ASEF) 418

4.15.2.10 Minimum output sum of squared error (MOSSE) filter 419

4.15.2.11 Optimal tradeoff circular harmonic function (OTCHF) filter 420

4.15.2.12 MACE-Mellin radial harmonic (MACE-MRH) filter 424

4.15.2.13 Distance classifier correlation filter (DCCF) 428

4.15.2.14 Polynomial correlation filters (PCFs) 430

4.15.2.15 Quadratic correlation filters 432

4.15.3 Pre- and post-processing images 435

4.15.3.1 Preprocessing the training images 435

4.15.3.2 Selecting and registering training images 435

4.15.3.3 Transforming the images and the filter to produce sharp peaks 436

4.15.3.4 Normalizing correlation output 437

4.15.3.5 Selecting parameters 440

4.15.4 Correlation filters for videos 441

4.15.4.1 Frame to frame CFs 441

4.15.4.2 Adaptive CFs 441

4.15.4.3 Multi-frame correlation filter 441

4.15.4.4 Kalman correlation filter 442

4.15.4.5 Action MACH filter 446

4.15.5 Experiments: recognizing subjects in video only using ocular regions 447


A Appendix 456

A.1 Minimizing a quadratic subject to linear constraints 456

A.2 Minimizing a ratio of quadratic terms 457

References 457

CHAPTER 16 Dynamical Systems in Video Analysis 461

Gianfranco Doretto, Avinash Ravichandran, René Vidal, and Stefano Soatto

4.16.1 Introduction 461

4.16.2 Model 463

4.16.3 Identification 463

4.16.3.1 Constrained identification 466

4.16.4 Comparing dynamical models 470

4.16.4.1 Distances between LDSs 470

4.16.4.2 Kernels between LDSs 472

4.16.5 Applications 473

4.16.5.1 Synthesis and editing 474

4.16.5.2 Recognition 476

4.16.5.3 Segmentation 477

4.16.5.4 Registration 479

4.16.6 Datasets 481

4.16.6.1 Recognition datasets 481

4.16.6.2 Segmentation datasets 482

4.16.7 Discussion 484

References 485

CHAPTER 17 Image-Based Rendering 489

Yao-Jen Chang and Tsuhan Chen

4.17.1 Introduction 489

4.17.2 Integral imaging 490

4.17.3 Sampling 494

4.17.3.1 The light field and the plenoptic function 494

4.17.3.2 The 2D and 3D light fields 495

4.17.3.3 The 4D light field 497

4.17.3.4 Spectral analysis on light field sampling 499

4.17.4 Scene representation 501

4.17.4.1 No geometry 502

4.17.4.2 Implicit geometry 502

4.17.4.3 Explicit geometry 503

4.17.5 Rendering 506

4.17.5.1 Light field fusion 506

4.17.5.2 Plane sweeping and spherical surface sweeping 507

4.17.5.3 Image-based visual hull and voxel sweeping 511

4.17.5.4 Depth sweeping 513

4.17.5.5 Depth-image-based rendering 514


4.17.6.1 Motivation and background 519

4.17.6.2 System architecture 521

4.17.6.3 Camera calibration 522

4.17.6.4 Scene depth estimation 522

4.17.6.5 Rendering 524

4.17.6.6 Post-processing and video broadcasting 524


4.17.8 Implementation/code 526

4.17.9 Data sets 527


Glossary 528

References 529

CHAPTER 18 Activity Retrieval in Large Surveillance Videos 535

Greg Castanon, Pierre-Marc Jodoin, Venkatesh Saligrama, and Andre Caron


4.18.1.1 Previous work 537

4.18.2 Feature extraction 538

4.18.2.1 Structure 538

4.18.2.2 Resolution 539

4.18.2.3 Features 539

4.18.3 Indexing 540

4.18.3.1 Hashing 541

4.18.3.2 Data structure 542

4.18.3.3 Building the lookup table 543

4.18.4 Search engine 544

4.18.4.1 Queries 544

4.18.4.2 Partial matches 545

4.18.4.3 Full matches 546

4.18.5 Experimental results 551

4.18.5.1 Datasets 551

4.18.5.2 Comparison with HDP-based video search 551

4.18.5.3 Eleven tasks 554

4.18.5.4 Dynamic programming 554

4.18.5.5 Discussion 556


References

CHAPTER 19 Multi-Target Tracking in Video 561

Fabio Poiesi and Andrea Cavallaro


4.19.2 Problem formulation 561

4.19.3 Challenges 565

4.19.4 Feature extraction 568

4.19.5 Prediction 570

4.19.6 Localization and association 571

4.19.6.1 Sequential localization 573

4.19.6.2 Batch association 574

4.19.7 Track initialization and termination 576

4.19.8 Scene contextual information 576

4.19.9 Summary and outlook 578

References 579

CHAPTER 20 Compressive Sensing for Video Applications 583

Ashok Veeraraghavan, Aswin C. Sankaranarayanan, and Richard G. Baraniuk


4.20.1.1 A brief tour of compressive sensing 583

4.20.1.2 Video compressive sensing 587

4.20.2 Imaging architectures 588

4.20.2.1 Classification of imaging architectures 588

4.20.2.2 Single pixel camera 589

4.20.2.3 Flutter shutter video camera 589

4.20.2.4 Voxel sub-sampling camera 590

4.20.2.5 Programmable pixel compressive camera 592

4.20.2.6 Coded aperture video camera 592

4.20.2.7 Hyper-spectral video camera 593

4.20.2.8 Common multiplexing constraints 594

4.20.2.9 Comparison of architectures 594

4.20.3 Signal models and algorithms 594

4.20.3.1 Wavelet sparsity 595

4.20.3.2 Total variation 595

4.20.3.3 Dictionary learning 596

4.20.3.4 Motion flow 596

4.20.4 Existing systems for video compressive sensing 597

4.20.4.1 Coded strobing camera 597

4.20.4.2 Single pixel camera 597

4.20.4.3 Coded aperture snapshot spectral imager (CASSI) 599

4.20.4.4 Coded exposure video camera 600

4.20.4.5 Flexible voxels 601

4.20.4.6 Programmable pixel compressive camera (P2C2) 601

4.20.4.7 Flutter shutter video camera (FSVC) 603

4.20.5 Discussion 603

Acknowledgments 604

References 604

CHAPTER 21 Virtual Vision for Camera Networks Research 609

Faisal Z. Qureshi and Demetri Terzopoulos


4.21.1.1 Virtual vision 609

4.21.1.2 Overview 611

4.21.2 Related work 611

4.21.3 Virtual vision simulators 612

4.21.3.1 A train station 612

4.21.3.2 A floor of an office tower 613

4.21.3.3 Simulated cameras 614

4.21.3.4 Visual analysis 614

4.21.3.5 Image-driven PTZ zoom and fixation 615

4.21.3.6 Behavior-based camera nodes 616

4.21.4 Prototype camera networks 618

4.21.4.1 Active camera scheduling 618

4.21.4.2 Collaborative persistent surveillance 620

4.21.4.3 Proactive PTZ control 620

4.21.4.4 Multi-tasking PTZ cameras 620


Glossary 624

Acknowledgments 624

References 624

SECTION IV HARDWARE AND SOFTWARE

CHAPTER 22 Introduction: Hardware and Software 629

Ankur Srivastava

4.22.1 Hardware and software systems 630

4.22.2 New developments: 3D integration 631

CHAPTER 23 Distributed Smart Cameras for Distributed Computer Vision 633

Marilyn Wolf and Jason Schlessman


4.23.2 Basic techniques in computer vision 634

4.23.3 Camera calibration 635

4.23.4 Gesture recognition 637

4.23.5 Tracking with overlapping fields-of-view 638

4.23.6 Tracking in sparse camera networks 638

4.23.7 Summary 640

Acknowledgments 640

References 640

CHAPTER 24 Mapping Parameterized Dataflow Graphs onto FPGA Platforms 643

Hsiang-Huang Wu, Chung-Ching Shen, Hojin Kee, Nimish Sane, William Plishker, and Shuvra S. Bhattacharyya


4.24.2 Background 644

4.24.2.1 FPGA technology 644

4.24.2.2 Model-based design of signal processing systems 646

4.24.2.3 Parameterized dataflow 648

4.24.3 Dynamic reconfiguration techniques in FPGAs 649

4.24.4 Modeling dynamic reconfiguration using PSDF techniques 651

4.24.4.1 Execution of PSDF graphs 652

4.24.4.2 PSDF design methodology 653

4.24.5 Hardware mapping 654

4.24.5.1 Modular mapping 654

4.24.5.2 Schedule-based mapping 658

4.24.6 Case studies 660

4.24.6.1 Reconfigurable phase-shift keying 660

4.24.6.2 Foreground/background extraction 664


References 670

CHAPTER 25 Distributed Estimation

Yuzhe Xu, Vijay Gupta, and Carlo Fischione

4.25.1 Notation 675

4.25.2 Network with a star topology 675

4.25.2.1 Static sensor fusion 675

4.25.2.2 Dynamic sensor fusion 679

4.25.3 Non-ideal networks with star topology 683

4.25.3.1 Sensor fusion in presence of message loss 683

4.25.3.2 Sensor fusion with limited bandwidth 686

4.25.4 Network with arbitrary topology 692

4.25.4.1 Static sensor fusion with limited communication range 692

4.25.4.2 Dynamic sensor fusion 693

4.25.5 Computational complexity and communication cost 699

4.25.5.1 On computational complexity 699

4.25.5.2 On communication cost 701

4.25.5.3 Summary of the computational complexity and communication cost 701


Appendix 702

A.1 Optimal mean square estimate of a random variable 702

A.2 Matrix inversion formula 704

References 705

SECTION V AUDIO SIGNAL PROCESSING

CHAPTER 26 Introduction to Audio Signal Processing 709

Patrick A. Naylor


4.26.2 Overview of the chapters 710

4.26.2.1 Music signal processing 710

4.26.2.2 Audio coding 710

CHAPTER 27 Music Signal Processing 713

Meinard Müller and Anssi Klapuri


4.27.2 Pitch and harmony 714

4.27.2.1 Music representations 714

4.27.2.2 Spectrogram 715

4.27.2.3 Log-frequency spectrogram 716

4.27.2.4 Chromagram 719

4.27.2.5 Applications 721

4.27.3 Tempo and beat 724

4.27.3.1 Onset detection and novelty curve 725

4.27.3.2 Periodicity analysis and tempo estimation 728

4.27.3.3 Beat tracking 733

4.27.3.4 Applications 735

4.27.4 Timbre and instrumentation 737

4.27.4.1 Spectral envelope 738

4.27.4.2 Temporal evolution 741

4.27.4.3 Musical instrument recognition 742

4.27.4.4 Music classification and similarity 743

4.27.5 Melody and vocals 744

4.27.5.1 Melody transcription 746

4.27.5.2 Vocals separation 749

4.27.5.3 Applications 751

References 752

CHAPTER 28 Perceptual Audio Coding 757

Jürgen Herre and Sascha Disch

Introduction 757

4.28.1 Principles and background 758

4.28.1.1 Redundancy and irrelevancy 758

4.28.1.2 Auditory perception and psychoacoustics 758

4.28.1.3 Sound quality 760

4.28.2 Concepts and architectures 761

4.28.2.1 Monophonic audio coding and decoding 761

4.28.2.2 Practical issues 763

4.28.2.3 Joint stereo coding 766

4.28.2.4 Tools for coding enhancement 771

4.28.2.5 Bandwidth extension 771

4.28.2.6 Parametric multi-channel coding/spatial audio coding 774

4.28.2.7 Parametric coding of audio objects 777

4.28.2.8 Scalable coding 779

4.28.3 Standards 780

4.28.3.1 MPEG-1 780

4.28.3.2 MPEG-2 781

4.28.3.3 MPEG-2 AAC 783

4.28.3.4 MPEG-4 783

4.28.3.5 MPEG AAC family (MPEG-2/4 AAC, HE-AAC, HE-AAC v2 and MPEG-D USAC) 784

4.28.3.6 MPEG-4 low delay AAC family (AAC-LD, AAC-ELD and AAC-ELD v2) 787

4.28.3.7 MPEG-4 lossless audio coding (ALS, SLS) 788

4.28.3.8 MPEG-D MPEG surround 789

4.28.3.9 MPEG-D MPEG Spatial Audio Object Coding (SAOC) 790

4.28.3.10 MPEG-D unified speech and audio coding (USAC) 792

4.28.4 Summary and conclusions 793

References 794

SECTION VI ACOUSTIC SIGNAL PROCESSING

CHAPTER 29 Introduction to Acoustic Signal Processing 803

Patrick A. Naylor



4.29.2.1 Sound field synthesis 804

4.29.2.2 Acoustic echo control 804

4.29.2.3 Dereverberation 805

CHAPTER 30 Acoustic Echo Control 807

Gerald Enzner, Herbert Buchner, Alexis Favrot and Fabian Kuech


4.30.1.1 Problem statement and early developments 809

4.30.1.2 Typical building blocks of current systems 810

4.30.1.3 Applications of acoustic echo control 815

4.30.1.4 Quality measures 816

4.30.1.5 Outline of this chapter 818

4.30.2 Echo cancellation and postfiltering 819

4.30.2.1 Uncertainty model of the linear echo path 820

4.30.2.2 Generalized Wiener filter architecture 822

4.30.2.3 General and special cases 826

4.30.2.4 Dynamical echo path modeling 827

4.30.2.5 Adaptive algorithms 829

4.30.3 Echo suppression 832

4.30.3.1 Alternative problem statement 833

4.30.3.2 Echo path estimation 835

4.30.3.3 Echo suppression filter 839

4.30.3.4 Perceptual acoustic echo suppression 840

4.30.3.5 Derivation of echo estimation filters 841

4.30.4 Multichannel acoustic echo cancellation 843

4.30.4.1 Description of signals and systems 844

4.30.4.2 Elements from estimation and information theory 846

4.30.4.3 Elements from psychoacoustics 847

4.30.4.4 MIMO processing and elements from wave physics 850

4.30.5 Nonlinear modeling and cancellation of echo 851

4.30.5.1 Nonlinear acoustic echo paths 852

4.30.5.2 Cascaded structure 853

4.30.5.3 Power filters 855

4.30.5.4 Second-order Volterra filters 857

4.30.6 Application to realistic and real systems 859

4.30.6.1 Car environment 859

4.30.6.2 Desktop conferencing 861

4.30.6.3 Living room 863

4.30.6.4 Mobile phones 865

4.30.7 Links to codes and recommendations 867

4.30.8 Conclusions, open issues, future trends 868

Glossary 869

References 870

CHAPTER 31 Dereverberation 879

Patrick A. Naylor

4.31.1 Introduction and overview 879

4.31.2 Example applications 880

4.31.2.1 Hands-free speech telecommunication 880

4.31.2.2 Hearing aids 881

4.31.2.3 Music and film production 881

4.31.3 Room reverberation 883

4.31.3.1 Room acoustics 883

4.31.3.2 Listener perception of reverberation 885

4.31.3.3 Simulating room acoustics 885

4.31.3.4 Overview of the main approaches for dereverberation 886

4.31.4 Measurement of reverberation 887

4.31.4.1 Reverberation time 888

4.31.4.2 The energy decay curve 889

4.31.4.3 Channel-based measures 890

4.31.4.4 Signal-based measures 891

4.31.5 Spatial filtering for dereverberation 892

4.31.5.1 Concept overview 892

4.31.5.2 Delay-and-sum beamformer 893

4.31.5.3 Beamforming in 2D and 3D 894

4.31.5.4 Further advances in beamforming 895

4.31.6 Speech enhancement methods for dereverberation 896


4.31.6.2 Methods based on linear prediction and harmonicity 897

4.31.6.3 Methods based on spectral subtraction 898

4.31.7 Acoustic channel-based methods for dereverberation 899


4.31.7.2 Blind SIMO acoustic system identification 900

4.31.7.3 Inverse filtering 902

4.31.8 Summary and conclusions 908

Acknowledgments 909

References 909

CHAPTER 32 Sound Field Synthesis 915

Rudolf Rabenstein, Sascha Spors and Jens Ahrens


4.32.1.1 Virtual acoustic environments 915

4.32.1.2 Rendering of virtual acoustic environments 916

4.32.1.3 Organization of this chapter 918

4.32.2 Acoustic wave equation 919

4.32.2.1 Coordinate free representation 919

4.32.2.2 Coordinate systems 920

4.32.3 Signal representations 924

4.32.3.1 Introduction 924

4.32.3.2 The phasor approach for the wave equation 925

4.32.3.3 Fourier transformations in time and space 928

4.32.3.4 Circular harmonics 934

4.32.3.5 Spherical harmonics 939

4.32.4 Response to sound sources 946

4.32.4.1 The inhomogeneous wave equation 946

4.32.4.2 Green's function 947

4.32.4.3 Calculation of the response to interior and exterior sources 948

4.32.4.4 Calculation of the Green's function 949

4.32.5 Physical foundations of sound field synthesis 951

4.32.5.1 The Kirchhoff-Helmholtz integral equation 951

4.32.5.2 Monopole only synthesis 953

4.32.5.3 Three-dimensional synthesis 955

4.32.5.4 2.5-dimensional synthesis 955

4.32.6 Near-field Compensated Higher Order Ambisonics (NFC-HOA) 956

4.32.6.1 Outline 956

4.32.6.2 Spherical secondary source distributions 957

4.32.6.3 Circular secondary source distributions 959

4.32.6.4 Spatial sampling and application example 961

4.32.6.5 Extensions to basic principle 962

4.32.7 Spectral Division Method (SDM) 962

4.32.7.1 Planar secondary source distributions 962

4.32.7.2 Linear secondary source distributions 965


4.32.7.4 Approximate solution for non-planar and non-linear secondary source distributions 968

4.32.8 Wave Field Synthesis (WFS) 971

4.32.8.1 Outline 971

4.32.8.2 Three-dimensional synthesis 972

4.32.8.3 2.5-dimensional synthesis 973


4.32.8.5 Extensions to basic principle 974

Acknowledgments 975

References 975

SECTION VII SPEECH PROCESSING

CHAPTER 33 Introduction to Speech Processing 983

Patrick A. Naylor



4.33.2.1 Speech production modeling and analysis 984

4.33.2.2 Speech enhancement 984

CHAPTER 34 Speech Production Modeling and Analysis 985

Jon Gudnason


4.34.1.1 Mathematical background 985

4.34.1.2 Historical overview 986

4.34.1.3 Phonetics and articulation 987

4.34.1.4 Applications of speech modeling 991

4.34.2 Speech production modeling 991

4.34.2.1 Linear source-filter model 992

4.34.2.2 Vocal tract modeling 992

4.34.2.3 Modeling the voice source 996

4.34.2.4 Estimating autoregressive parameters 996

4.34.2.5 Current work on time-varying estimation 1000

4.34.3 Estimating the voice source signal 1000

4.34.3.1 Time varying covariance analysis of speech 1001

4.34.3.2 Closed-phase covariance analysis 1005

4.34.3.3 Other methods and current research 1007

4.34.4 Glottal closure instants 1007

4.34.4.1 Overview of GCI detection 1008

4.34.4.2 Preprocessing for GCI detection 1008

4.34.4.3 Postprocessing: determining the significance of an energy peak 1011

4.34.5 Voice source modeling 1012

4.34.5.1 Piecewise approximation of the voice source 1013

4.34.5.2 Data-driven analysis of the voice source 1013

References 1016

CHAPTER 35 Enhancement 1019

Mike Brookes and Nikolay D. Gaubitch


4.35.2 Speech enhancement methods 1020

4.35.2.1 Noise cancelation 1021

4.35.2.2 Static filtering 1022

4.35.2.3 Signal dependent filtering 1023

4.35.2.4 Time-frequency gain modification 1029

4.35.2.5 Minimum mean square estimators (MMSE) 1031

4.35.2.6 Temporal gain changes 1033

4.35.2.7 Excision/interpolation 1034

4.35.3 Enabling algorithms 1037

4.35.3.1 Time-frequency analysis and synthesis 1037

4.35.3.2 Voice activity detection 1039

4.35.3.3 Noise estimation 1040

4.35.3.4 Pitch tracking 1041

4.35.4 Intelligibility and quality measures 1042

4.35.4.1 Objectives 1042

4.35.4.2 Evaluation of speech intelligibility 1042

4.35.4.3 Evaluation of speech quality 1044

References 1048

INDEX 1057
