Chapter 2. Advanced Telecommunications and Signal

Chapter 2. Advanced Telecommunications and Signal Processing Program

Chapter 2. Advanced Telecommunications and SignalProcessing Program

Academic and Research Staff

Professor Jae S. Lim

Visiting Scientists and Research Affiliates

Dr. Hae-Mook Jung

Graduate Students

John G. Apostolopoulos, David M. Baylon, Shiufun Cheung, Raynard O. Hinds, Kyle K. Iwai, Peter A. Monta,Julien J. Nicolas, Alexander Pfajfer, Eric C. Reed, Lon E. Sunshine, Carl Taniguchi, Chang Dong Yoo

Technical and Support Staff

Cindy LeBlanc, Denise M. Rossetti

2.1 Introduction

The present television system was designed nearly40 years ago. Since then, there have been signif-icant developments in technology which are highlyrelevant to the television industries. For example,advances in very large scale integration (VLSI)technology and signal processing make it feasibleto incorporate frame-store memory and sophisti-cated signal processing capabilities into a televisionreceiver at a reasonable cost. To exploit this newtechnology in developing future television systems,Japan and Europe have established large laborato-ries, funded by government or industry-wideconsortia. The need for this type of organization inthe United States was considered necessary for thebroadcasting and equipment manufacturing indus-tries. In 1983 the advanced television researchprogram (ATRP) was established at MIT by a con-sortium of U.S. companies.

The major objectives of the ATRP are:

* To develop the theoretical and empirical basisfor improving existing television systems, aswell as the design of future television systems;

* To educate students through television-relatedresearch and development and motivate themto pursue careers in television-related indus-tries;

* To facilitate the continuing education of scien-tists and engineers already working in theindustry;

* To establish a resource center where problemsand proposals can be brought for discussionand detailed study; and

* To transfer the technology developed from theATRP program to the television-related indus-tries.

Research areas of the program include the designof a receiver-compatible advanced television (ATV)system and digital ATV system, as well as thedevelopment of transcoding methods. Significantadvances have already been made in some ofthese research areas. A digital ATV system wasdesigned and tested in the fall of 1992 by the FCCfor its possible adoption as the U.S. HDTV standardfor terrestrial broadcasting. Some elements of thissystem are likely to be included in the U.S. HDTVstandard.

In addition to research on advanced televisionsystems, our research program also includesresearch on speech processing. Current researchtopics include the development of a new speechmodel and algorithms to enhance speech degradedby background noise.

2.2 ATRP Facilities

The ATRP facilities are currently based on anetwork of eight SUN workstations and threeDecStation 5000 workstations. There is approxi-mately 14.0 GB of disk space, distributed amongthe various machines. Attached to one of the SUNworkstations is a VTE display system with 256 MBof RAM. This display system is capable of drivingthe Sun-4 monitors, or a 29-inch Conrac monitor, atrates up to 60 frames/sec. In addition to displayinghigh-resolution real-time sequences, the ATRP facil-ities include a Metheus frame buffer which drives a

309


Sony 2Kx2K monitor. For hard copy output, the labuses a Kodak XL7700 thermal imaging printerwhich can produce 2Kx2K color or black and whiteimages on 11xl 1 inch photographic paper.

Other peripherals include tape drives (Exabyte 8mm and AMPEX DST600), a 16-bit digital audiointerface with two channels and sampling rates upto 48 kHz per channel, and an audio workstationwith power amplifier, speakers, CD player, tapedeck, etc. In addition, the lab has a 650 MB opticaldisk drive, a CD-ROM drive, two laser printers, andone color printer. The ATRP facilities also includea PowerMacintosh, two Mac II computers, and anApple LaserWriter for preparing presentations.

A fast network (FDDI) is under consideration toaugment the current 10 Mbps Ethernet. The newnetwork would enable much faster data transfer todisplay devices and would support large NFS trans-fers more easily.

2.3 Signal Representations forVery-low-bit-rate Video Compression

Sponsors

AT&T FellowshipAdvanced Telecommunications Research Program

Project Staff

John G. Apostolopoulos

Video, which plays an important role in many appli-cations today, is expected to become even moreimportant in the near future with the advent of multi-media and personal communication devices. Thelarge amount of data contained in a video signal,together with the limited transmission capacity inmany applications, requires the compression of avideo signal. A number of video compression algo-rithms have been developed for different applica-tions from video-phone to high-definition television.These algorithms perform reasonably well atrespective bit rates of 64 kb/s to tens of Mb/s.However, many applications, particularly thoseassociated with portable or wireless devices, willprobably be required in the near future to operate atconsiderably lower bit rates, possibly as low as 10kb/s. The video compression methodologies devel-oped to date cannot be applied at such low bitrates. The goal of this research is to create effi-cient signal representations which can lead toacceptable video quality at extremely low bit rates.

Conventional video compression algorithms may bedescribed as block-based coding schemes; they

partition each frame into square blocks and thenindependently process each block. Examples ofthese compression techniques include block-basedtemporal motion-compensated prediction andspatial block discrete-cosine transformation. Block-based processing is the basis for virtually all videocompression systems today because it is a simpleand practical approach for achieving acceptablevideo quality at required bit rates. However, block-based coding schemes can not effectively representa video signal at very low bit rates because thesource model is extremely limited: Block-basedschemes inherently assume a source model of(translational) moving square blocks, but a typicalvideo scene is not composed of translated squareblocks. In effect, block-based schemes impose anartificial structure on the video signal beforeencoding, instead of exploiting the structureinherent to a particular video scene.

The goal of this research is to develop signal repre-sentations that better match the structure that existswithin a video scene. By identifying and efficientlyrepresenting this structure, it may be possible toproduce acceptable video quality at very low bitrates. For example, since real scenes containobjects, a promising source model is one with two-or three-dimensional moving objects. Thisapproach may provide a much closer match to thestructure in a video scene than the block-basedschemes mentioned above. Three fundamentalissues must be addressed for the success of thisapproach: (1) appropriate segmentation of the videoscene into objects, (2) encoding the segmentationinformation, and (3) encoding the object interiors.In regard to the third issue, significant statisticaldependencies exist in regions belonging to eachobject and must be exploited. Conventionalapproaches to encoding arbitrarily shaped regionsare typically simple extensions of the block-basedapproaches, and hence suffer from inefficiencies. Anumber of novel methods for efficiently representingthe interior regions have been developed.

2.3.1 Publication

Apostolopoulos, J., A. Pfajfer, H.M. Jung, and J.S.Lim. "Position-Dependent Encoding." Pro-ceedings of ICASSP V: 573-576 (1994).

2.4 Constant Quality Video Coding

Sponsors

INTEL FellowshipAdvanced Telecommunications Research Program

310 RLE Progress Report Number 137


Project Staff

David M. Baylon

Traditional video compression algorithms for fixedbandwidth systems typically operate at fixed tar-geted compression ratios. A problem with fixed bitrate systems is that video quality can vary greatlywithin an image and across an image sequence. Inregions of low image complexity, quality is high,whereas in regions of high image complexity,quality can be low due to coarse quantization.

Recently, with the development of asynchronoustransfer mode (ATM) switching for broadband net-works, there has been increasing interest in vari-able rate video coding. Asynchronous channelscan use bandwidth efficiently and provide a timevarying bit rate well suited for the nonstationarycharacteristics of video. By allowing the com-pression ratio to vary with scene contents and com-plexity, constant quality video can be delivered.

This research focuses on methods for encodinghigh-definition video to yield constant high qualityvideo while minimizing the average bit rate. Simplemean square error approaches are insufficient formeasuring video quality, therefore distortion func-tions which incorporate visual models are beinginvestigated. Adaptive quantization schemes tomaintain constant quality are being studied also.This study includes development of a spatially andtemporally adaptive weighting matrix for frequencycoefficients based upon a visual criterion. Forexample, high-frequency coefficients correspondingto edges can be more visually important than thosecorresponding to texture. By studying how to varybit allocation with scene contents to maintain aspecified level of quality, a statistical characteriza-tion of the variable bit rate video coding can beobtained.

2.5 Signal Representations in AudioCompression

Sponsor

Advanced Telecommunications Research Program

Project Staff

Shiufun Cheung

The demand for high-fidelity audio in transmissionsystems such as digital audio broadcast (DAB) andhigh-definition television (HDTV) and in commercialproducts such as MiniDisc and Digital CompactCassette has generated considerable interest inaudio compression schemes. The common objec-

tive is to achieve high quality at a rate significantlysmaller than the 16 bits/sample used in current CDand DAT systems. We have been consideringapplications to HDTV; an earlier implementation,the MIT Audio Coder (MIT-AC), is one of thesystems that was considered for inclusion in theU.S. HDTV standard. In this research, we seek tobuild upon our previous efforts by studying animportant aspect of audio coder design: theselection of an appropriate and efficient acousticsignal representation.

In conventional audio coders, short-time spectraldecomposition serves to recast the audio signal in arepresentation that is not only amenable to percep-tual modeling but also conducive to deriving trans-form coding gains. This is commonly achieved by amultirate filter bank, or, equivalently, a lapped trans-form.

In the first stage of this research, we replace ouroriginal uniform multirate filter bank with a nonuni-form one. Filter banks with uniform subbands,while conceptually simpler and easier to implement,force an undesirable tradeoff between time and fre-quency resolution. For example, if the analysisbandwidth is narrow enough to resolve the criticalbands at low frequencies, poor temporal resolutioncan result in temporal artifacts such as the "pre-echo" effect. A nonuniform filter bank, on the otherhand, allows enough flexibility to simultaneouslysatisfy the requirements of good pre-echo controland frequency resolution consistent with critical-band analysis. In our prototype implementation, weuse a hierarchical structure based on a cascade ofM-band perfect-reconstruction cosine-modulatedfilter banks. Testing the new representation hasshown improvement over a uniform filter bankscheme. While pre-distortion still exists in transientsignals, listening sessions show that pre-echo isinaudible with the new filter bank.

In the second stage of this research, we arestudying the incorporation of biorthogonality intoMalvar's lapped transforms. Lapped transforms arepopular implementations of cosine-modulated filterbanks. They form the major building blocks in bothour uniform and nonuniform signal decompositionschemes. Conventional lapped transforms aredesigned to be orthogonal filter banks in which theanalysis and synthesis filters are identical. Byallowing the analysis and synthesis filters to differ,we create a biorthogonal transform with additionaldegrees of freedom over the orthogonal case. Thisincreased flexibility is very important for achieving aspectral analysis that approximates the criticalbands of the human ear well enough for subse-quent perceptual modeling.


2.5.1 Publications

Cheung, S., and J.S. Lim. "Incorporation ofBiorthogonality into Lapped Transforms forAudio Compression." Proceedings of ICASSP.Forthcoming.

Monta, P.A., and S. Cheung. "Low Rate AudioCoder with Hierarchical Filter Banks and LatticeVector Quantization." Proceedings of ICASSP II:209-212 (1994).

2.6 Pre-Echo Detection and Reduction

SponsorAdvanced Telecommunications Research Program

Project StaffKyle K. Iwai

In recent years, there has been an increasinginterest in data compression for storage and datatransmission. In the field of audio processing,various varieties of transform coders have success-fully demonstrated reduced bit rates while main-taining high audio quality. However, there arecertain coding artifacts which are associated withtransform coding. The pre-echo is one suchartifact. Pre-echos typically occur when a sharpattack is preceded by silence. Quantization noiseadded by the coding process is normally hiddenwithin the signal. When the coder is stationary overthe window length, an assumption breaks down in atransient situation. The noise is unmasked in thesilence preceding the attack, creating an audibleartifact called a pre-echo.

If the length of the noise can be shortened to about5 ms, psycho-acoustic experiments tell us that thenoise will not be audible. Using a shorter windowlength shortens the length of the pre-echo.However, shorter windows also have poorer fre-quency selectivity and poorer coder efficiency. Onesolution is to use shorter windows only when thereis a quiet region followed by a sharp attack.

In order to use adaptive window length selection, adetector had to be designed. A simple detectorwas implemented which compares the variancewithin two adjacent sections of audio. In a transientsituation, the variance suddenly increases from onesection to the next. The coder then uses shortwindows to reduce the length of the pre-echo, ren-dering the artifact inaudible.

2.7 Video Source Coding for High-Definition Television

Sponsor


Project Staff

Peter A. Monta

Efficient source coding is the technology for high-definition television (HDTV) which will enable broad-casting over the relatively narrow channels (e.g.,terrestrial broadcast and cable) envisioned for thenew service. Coding rates are on the order of 0.3bits/sample, and high quality is a requirement. Thiswork focuses on new source coding techniques forvideo relating to representation of motion-compensated prediction errors, quantization andentropy coding, and other system issues.

Conventional coders represent video with the use ofblock transforms with small support (typically 8x8pixels). Such independent blocks result in a simplescheme for switching a predictor from a motion-compensated block to a purely spatial block; this isnecessary to prevent the coder from wastingcapacity in some situations.

Subband coders of the multiresolution or wavelettype-with their more desirable localization proper-ties, lack of "blocking" artifacts, and better match tomotion-compensated prediction errors-complicatethis process of switching predictors, since theblocks now overlap. A novel predictive codingscheme is proposed in which subband coders cancombine the benefits of good representation andflexible adaptive prediction.

Source-adaptive coding is a way for HDTV systemsto support a more general imaging model than con-ventional television. With a source coder that canadapt to different spatial resolutions, frame rates,and coding rates, the system may then maketradeoffs among the various imagery types (forexample, 60 frames/s video, 24 frames/s film, highlydetailed still images, etc.). In general, this is aneffort to make HDTV more of an image transportsystem rather than a least-common-denominatorformat to which all sources must either adhere orbe adjusted to fit. These techniques are also appli-cable to NTSC to some extent; one result is analgorithm for improved chrominance separation forthe case of "3-2" NTSC, that is, NTSC upsampledfrom film.



2.8 Transmission of HDTV Signals in aTerrestrial Broadcast Environment

Sponsor

Advanced Telecommunciations Research Program

Project Staff

Julien J. Nicolas

High-definition television (HDTV) systems currentlybeing developed for broadcast applications require15-20 Mbps to yield good quality images for roughlytwice the horizontal and vertical resolutions of thecurrent NTSC standard. Efficient transmission tech-niques must be found in order to deliver this signalto a maximum number of receivers while respectingthe limitations stipulated by the FCC for over-the-airtransmission. This research focuses on the princi-ples that should guide the design of such trans-mission systems.

The major constraints related to the transmission ofbroadcast HDTV include (1) a bandwidth limitation(6 MHz, identical to NTSC), (2) a requirement forsimultaneous transmission of both NTSC and HDTVsignals on two different channels (Simulcastapproach), and (3) a tight control of the interferenceeffects between NTSC and HDTV, particularly whenthe signals are sharing the same frequency bands.Other considerations include complexity and costissues of the receivers and degradation of thesignal as a function of range.

A number of ideas are currently being studied; mostsystems proposed to date use some form offorward error-correction to combat channel noiseand interference from other signals. The overheaddata reserved for the error-correction schemesrepresents up to 30 percent of the total data, it istherefore well worth trying to optimize theseschemes. Current work is focusing on the use ofcombined modulation/coding schemes capable ofexploiting the specific features of the broadcastchannel and the interference signals. Other areas ofinterest include the use of combined source/channelcoding schemes for HDTV applications and multi-resolution coded modulation schemes.

2.8.1 Publication

Nicolas, J., and J.S. Lim. "On the Performance ofMulticarrier Modulation in a Broadcast MultipathEnvironment." Proceedings of ICASSP III:245-248 (1994).

2.9 Position-Dependent Encoding

Sponsor

Advanced Telcommunications Research Program

Project Staff

Alexander Pfajfer

In typical video compression algorithms, the DCT isapplied to the video, and the resulting DCT coeffi-cients are quantized and encoded for transmissionand storage. Some of the DCT coefficients are setto zero. Efficient encoding of the DCT coefficientsis usually achieved by encoding the location andamplitude of the nonzero coefficients. Since intypical MC-DCT compression algorithms, up to 90percent of the available bit rate is used to encodethe location and amplitude of the nonzero quantizedDCT coefficients, efficient encoding of the locationand amplitude information is extremely important forhigh quality compression.

A novel approach to encoding of the location andamplitude information, position-dependent encoding,is being examined. Position-dependent runlengthencoding and position-dependent encoding of theamplitudes attempts to exploit the inherent differ-ences in statistical properties of the runlengths andamplitudes as a function of their position. Thisnovel method is being compared to the classical,separate, single-codebook encoding of therunlength and amplitude, as well as to the jointrunlength and amplitude encoding.

2.9.1 Publication

Apostolopoulos, J., A. Pfajfer, H.M. Jung, andLim. "Position-Dependent Encoding."ceedings of ICASSP V: 573-576 (1994).

J.S.Pro-

2.10 MPEG Compression

SponsorsU.S. Navy - Office of Naval Research

NDSEG Graduate FellowshipAdvanced Telecommunications Research Program

Project StaffEric C. Reed

In a typical MC-DCT compression algorithm, almost90 percent of the available bit rate is used toencode the location and amplitude of the non zeroquantized DCT coefficients. Therefore efficientencoding of the location and amplitude information

313


is extremely important. One novel approach toencoding the location and amplitude information ofthe nonzero coefficients is position-dependentencoding. Position-dependent encoding, in contrastto single-codebook encoding, exploits the inherentdifferences in statistical properties of the runlengthsand amplitudes as a function of position.

Position-dependent encoding has been investigatedas an extension to separate encoding of therunlengths and amplitudes and has proven toprovide a substantial reduction in the overall bit ratecompared to single-codebook methods. However,MPEG compression does not allow separateencoding of the runlengths and amplitudes. There-fore, this research involves developing a position-dependent extension to encode the runlengths andamplitudes jointly as a single event. Rather thanhaving two separate codebooks for the runlengthsand amplitudes, one two-dimensional codebook willbe utilized. This method will be compared to con-ventional approaches as well as the position-dependent encoding approach using separatecodebooks.

2.11 HDTV Transmission FormatConversion and the HDTV MigrationPath

Sponsor


Project Staff

Lon E. Sunshine

The current proposal for terrestrial HDTV broad-casting allows for several possible transmissionformats. Because production and display formatsmay differ, it will be necessary to convert betweenformats effectively. Since HDTV will presumablymove toward progressive display systems, de-interlacing non-progressive source material will be akey process. This research will consider topicsrelating to conversion among the six formats beingproposed for the U.S. HDTV standard.

As HDTV evolves, it is probable that more trans-mission formats will be accepted. Furthermore,additional bandwidth may be allocated for somechannels (terrestrial and/or cable). This researchwill consider the issues related to the migration ofHDTV to higher resolutions. Backward compatibilityand image compression and coding issues will beaddressed.

2.12 Removing Degradations in ImageSequences

Sponsor


Project Staff

Carl Taniguchi

The development of two-dimensional noisesmoothing algorithms have been an active area ofresearch since the 1960s. Many of the traditionalalgorithms fail to use the temporal correlation thatexists between frames when processing imagesequences. However, with the increasing speed ofmicroprocessors and the rising importance of video,three-dimensional algorithms have not only becomefeasible, but also practical.

Three-dimensional median filters that are sensitiveto motion is the first step in using the temporal cor-relation in images. Existing algorithms of this typeeffectively reduce to two-dimensional median filtersunder areas of the image undergoing motion. Animprovement in the use of temporal correlation canbe obtained by using a motion estimation algorithmbefore filtering the image with a three-dimensionalmedian filter. Various median filters and motioncompensation algorithms will be tested in the pres-ence of noise.

Uniform processing of an image tends to unneces-sarily blur areas of the image that are not affectedby noise. In this case, a degradation detector maybe of practical use.

2.13 Speech Enhancement

Sponsor

Maryland Procurement OfficeContract MDA904-93-C-4180

Project Staff

Chang Dong Yoo

The development of the dual excitation (DE) speechmodel has led to some interesting insights into theproblem of speech enhancement. Based on theideas of the DE model, a new speech model isbeing developed. The DE model provides moreflexible representation of speech and possessesfeatures which are particularly useful to the problemof speech enhancement. These features, alongwith a variable length window, are the backbone ofthe new speech model being developed.



Because the DE model does not place anyrestrictions on its characterization of speech, theenhancement system based on the DE model per-forms better than the one based on any of the pre-vious speech models. While a model should beinclusive in its characterization, it should have somerestrictions. Specifically, a speech model shouldpertain to speech. The DE model is somewhatunrestrictive and simple in its characterization ofspeech. It is solely based on the separation of thevoiced and unvoiced components. Whether itmakes sense to represent a stop as a voiced andan unvoiced component is only one of many inter-esting issues which are being investigated. Anextension of the DE model which deals with theseissues better is currently being studied.

All model-based enhancement methods to datehave been formulated on the premise that eachsegment of speech is stationary for a fixed windowlength. The performance of these enhancementalgorithms depends on the validity of this assump-tion. We use a variable-length window to capturevarying durations of stationarity in the speech.There are several algorithms which adaptivelydetect changes in auto-regressive model parame-ters in quasi-stationary signals which have beensuccessfully used in speech recognition. Wepropose to investigate some of these algorithms.Using a variable length window will allow (1) betterand "cleaner" separation of the voiced andunvoiced components, and (2) a greater reductionin the number of characteristic parameters, such asthe amplitudes of the voiced components and theLP coefficients of the unvoiced component.

315

Professor Emeritus William F. Schreiber (Photo by John F. Cook)


Documents

Chapter 2. Advanced Telecommunications and Signal