Digital Multimedia on Demand

RMI System: Internet Meets the Future Home Theater

Roger Zimmermann, Chris Kyriakakis, Cyrus Shahabi, Christos Papadopoulos, Alexander A. Sawchuk, and Ulrich Neumann
University of Southern California


The Remote Media Immersion (RMI) system blends multiple cutting-edge media technologies to create the ultimate digital media delivery platform. Its streaming media server delivers multiple high-bandwidth streams, transmission resilience and flow-control protocols ensure data integrity, and high-definition video combined with immersive audio provide the highest quality rendering.

The charter of the Integrated Media Systems Center (IMSC) at the University of Southern California (USC) is to investigate new methods and technologies that combine multiple modalities into highly effective, immersive technologies, applications, and environments. One result of these research efforts is the Remote Media Immersion (RMI) system. RMI's goal is to create and develop a complete aural and visual environment that places participants in a virtual space where they can experience events that occurred elsewhere. RMI technology realistically recreates, on demand, the visual and aural cues recorded in widely separated locations [1].

The RMI system is the next step in audio-visual fidelity for streaming media delivered on demand over the Internet. The RMI system pushes the boundaries beyond what's currently available in any commercial system or other research prototype. Its emphasis is on the highest quality of audio-visual experiences and realistic, immersive rendering. To achieve our goal, we were faced with numerous challenges and had to design novel techniques to make RMI a reality.

In this article, we detail some of the RMI components and the techniques that we employ within each (focusing mainly on the transmission and rendering aspects). The hope is that our advances in digital media delivery will enable new applications in the future in the entertainment sector (sports bars, digital cinemas, and eventually the home theater), distance education, or elsewhere.

The RMI effort

The focus of the RMI effort is to enable the most realistic recreation of an event possible while streaming the data over the Internet. Therefore, we push the technological boundaries beyond what current video-on-demand or streaming media systems can deliver. As a consequence, the system requires high-end rendering equipment and significant transmission bandwidth. However, we trust that advances in electronics, compression, and residential broadband technologies will make such a system financially feasible, first in commercial settings and later at home, in the not-too-distant future. One indicator that supports this assumption is that the next-generation DVD specification calls for network access in DVD players [2]. Furthermore, Forrester Research forecasts that by 2005 almost 15 percent of films will be viewed via on-demand services rather than on DVD or video [3]. The infrastructure necessary for these services is gradually being built. For instance, in Utah, 17 cities are planning to construct an ultra-high-speed network for businesses and residents [4].

The RMI project integrates several technologies that are the result of research efforts at IMSC. The current operational version is based on four major components that are responsible for acquiring, storing, transmitting, and rendering high-quality media.

❚ Acquiring high-quality media streams. This authoring component is an important part of the overall chain to ensure users experience high-quality rendering. As the saying "garbage in, garbage out" implies, no amount of quality control in later stages of the delivery chain can make up for poorly acquired media. In the current RMI version, authoring is an offline process and involves its own set of technologies. Due to space constraints, this article doesn't focus on this component.

❚ Real-time digital storage and playback of multiple independent streams. The Yima Scalable Streaming Media Architecture [5] provides real-time storage, retrieval, and transmission capabilities. The Yima server is based on a scalable cluster design. Each cluster node is an off-the-shelf PC with attached storage devices and, for example, a Fast or Gigabit Ethernet connection. The Yima server software manages the storage and network resources to provide real-time service to multiple clients requesting media streams. Media types include, but aren't limited to, MPEG-2 at NTSC and HDTV resolutions, multichannel audio (such as 10.2-channel immersive audio), and MPEG-4.

❚ Protocols for synchronized, efficient real-time transmission of multiple media streams. A selective data retransmission scheme improves playback quality while maintaining real-time properties. A flow-control component reduces network traffic variability and enables streams of various characteristics to be synchronized at the rendering location. Industry-standard networking protocols such as the Real-Time Transport Protocol (RTP) and Real-Time Streaming Protocol (RTSP) provide compatibility with commercial systems.

❚ Rendering immersive audio and high-resolution video. Immersive audio is a technique developed at IMSC for capturing the audio environment at a remote site and accurately reproducing the complete audio sensation and ambience at the client location with full fidelity, dynamic range, and directionality for a group of listeners (16 channels of uncompressed linear PCM at a data rate of up to 17.6 Mbits per second [Mbps]). The RMI video is rendered in HDTV resolutions (1080i or 720p format) and transmitted at a rate of up to 45 Mbps.

Overview of components

RMI's group sizes can range from one person or family at home to a large venue seating hundreds. For the visual streams, we decided that we required at least high-definition resolution as defined by the Advanced Television Systems Committee (ATSC) [1]. The highest quality ATSC modes are either 1920 × 1080 pixels at an interlaced frame rate of 29.97 frames per second, or 1280 × 720 pixels at a progressive frame rate of 59.94 frames per second. (We discuss video playback details later.) For the audio rendering, we rely on the immersive audio technology developed at IMSC, which uses a 10.2-channel playback system. Immersive audio's rendering capabilities go beyond current stereo and 5.1-channel systems, which we also discuss in more detail later.
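As a quick illustration (our arithmetic, not part of the RMI system), the two modes trade spatial resolution against temporal resolution and end up with similar raw pixel rates, which is why both count as highest-quality modes:

    # Raw pixel throughput of the two highest-quality ATSC modes
    # (illustrative calculation only; numbers are from the text above).
    modes = {
        "1080i": (1920, 1080, 29.97),  # interlaced, 29.97 frames per second
        "720p": (1280, 720, 59.94),    # progressive, 59.94 frames per second
    }
    for name, (width, height, fps) in modes.items():
        print(f"{name}: {width * height * fps / 1e6:.1f} Mpixels/s")
    # 1080i: 62.1 Mpixels/s
    # 720p: 55.2 Mpixels/s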

Each presentation session retrieves and plays back at least one high-definition visual and one immersive aural stream in synchronization. The available media content imposed this choice on the RMI system; it isn't an inherent limitation of the Yima design. We store the streams separately on the server for two reasons. First, we designed the RMI system to be extensible so that additional video or other streams can become part of a presentation in the future. Second, separate storage lets RMI retrieve different components of a presentation from different server locations. The final, fine-grained synchronization is achieved at the client side.

Delivering high-resolution media

An important component of delivering isochronous multimedia over IP networks to end users and applications is the careful design of a multimedia storage server. Such a server must efficiently store the data and schedule the data's retrieval and delivery precisely before it is transmitted over the network. RMI relies on our Yima streaming media architecture. IMSC researchers designed Yima to be a scalable media delivery platform that can support multiple high-bandwidth streams. For space reasons, this article focuses on Yima's features and techniques that are most relevant for the RMI system. (See related research for additional information [5].)

Figure 1 shows the server cluster architecture, which can harness the resources of many nodes and many disk drives per node concurrently. In our current implementation, the server consists of a four-way cluster of rack-mountable PCs running Red Hat Linux. However, larger configurations are possible to increase the number of concurrent RMI sessions supported. Each cluster node is attached to a local network switch with a Fast or Gigabit Ethernet link. The nodes communicate with each other and send the media data via these network connections. We connected the local switch to both a wide area network (WAN) backbone (to serve distant clients) and a local area network (LAN) environment with local clients. Choosing an IP-based network keeps the per-port equipment cost low and is immediately compatible with the public Internet.

The RMI system stores the media data in segments (called blocks) across multiple (say, four) high-performance hard disk drives. We can assign a media object's data blocks (in a load-balanced manner) to the magnetic disk drives that form the storage system in a round-robin sequence [6] or randomly [7]. Yima uses a pseudorandom data placement combined with a deadline-driven scheduling approach. This combination enables short startup latencies and can easily support multimedia applications with nonsequential data access patterns, including variable bit rate (VBR) video or audio as well as interactive operations such as pause and resume. It also enables efficient data reorganization during system and storage scaling [8].
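To sketch the placement idea (a minimal sketch under assumed details; the deployed placement and scheduling mechanisms are described in the cited papers), a seeded pseudorandom function can map each block number to a disk, so any node or client can recompute a block's location from the seed alone while the blocks remain statistically load balanced:

    import random

    def place_block(seed: int, block_no: int, num_disks: int) -> int:
        """Map a block to a disk with a reproducible pseudorandom draw.

        Illustrative sketch only; the production placement function may
        differ. Because the draw is seeded, every node and client can
        recompute where block i lives without a lookup table.
        """
        return random.Random(seed * 1_000_003 + block_no).randrange(num_disks)

    # Blocks spread roughly evenly across four disks:
    counts = [0] * 4
    for block in range(10_000):
        counts[place_block(seed=42, block_no=block, num_disks=4)] += 1
    print(counts)  # roughly [2500, 2500, 2500, 2500]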

Error control with selective retransmissions from multiple server nodes

One characteristic of continuous media streams is that they require data to be delivered from the server to a client location at a predetermined rate. This rate can vary over time for streams that have been compressed with a VBR media encoder. VBR streams enhance the rendering quality, but they generate bursty traffic on a packet-switched network such as the Internet. In turn, this can easily lead to packet loss due to congestion. Such data loss adversely affects compressed audio and video streams because the compression algorithm has already removed much of the data's temporal or spatial redundancy. Furthermore, important data such as audio–video synchronization information might get lost, which will introduce artifacts in a stream for longer than a single frame. As a result, it's imperative that as little as possible of a stream's data is lost during the transmission between the server and a rendering location.

The Yima cluster architecture takes advantage of the distributed storage resources among the multiple nodes and the multiple network connections that link all the nodes together. Media data is transmitted via RTP encapsulated in connectionless User Datagram Protocol (UDP) packets. To avoid traffic bottlenecks, each node transmits the data blocks that it holds directly to the clients via RTP. Hence, each client will receive RTP data packets from each server node within the cluster.

The current Internet infrastructure wasn't designed for streaming media and provides only best-effort packet delivery. Therefore, RTP/UDP datagrams aren't guaranteed to arrive in order, or at all. We can easily achieve reordering with the help of a global sequence number across all packets, but information loss requires special provisions if the rendered streams' quality at the receiving side is to be acceptable.

[Figure 1. The Yima server cluster architecture. Cluster nodes (off-the-shelf PCs with high-performance Ultra320 SCSI disks, up to 320 Mbps per controller, on a PCI bus) connect through 100-Mbps or 1-Gbps NICs to an Ethernet switch, which links via Fast or Gigabit Ethernet to Internet backbone routers and on to the end users.]

One possible solution is using forward error correction (FEC). With this method, the server continuously adds redundant information to the stream that aids the receiver in reconstructing the original information if data is corrupted or lost during transmission. Because of its preemptive nature, FEC can add significant overhead that consumes additional bandwidth even when it isn't needed. With RMI, we're transmitting some streams that require in excess of 45 Mbps bandwidth [9]. In that case, retransmission-based error control (RBEC) is an attractive option. RBEC can be an effective solution for streaming media applications that use a play-out buffer at the client side [10].

A central question arises when data is randomly stored across multiple server nodes and RBEC is used: When multiple servers deliver packets that are part of a single stream, and a packet doesn't arrive, how does the client know which server node attempted to send it? In other words, it isn't obvious where the client should send its request for retransmission of the packet. There are two general solutions to this problem: the client can broadcast the retransmission request to all server nodes, or it can compute the server node to which it issues the retransmission request.

Broadcast retransmissions. With the broadcast approach, all server nodes receive a packet retransmission request. The client can target the request broadcast well in this scenario, reaching all the server nodes but no other computers: by observing the source IP addresses in the RTP/UDP packet headers, the client can easily establish the complete set of server nodes. Once a server receives a request, it checks whether it holds the packet in question, and either ignores the request or retransmits. Because every node must receive and process every request even though only one of them holds the packet, this approach wastes network bandwidth and increases server load.

Unicast retransmissions. The second, more efficient and scalable method of sending retransmission requests requires that we identify the unique server node that holds the missing packet. To accomplish this, the client could reproduce the pseudorandom number sequence that was originally used to place the data across the multiple server nodes. This approach has several drawbacks. First, clients and servers must run identical placement algorithms at all times; if we upgrade the server software, then we must upgrade all clients immediately, too. The logistics of such an undertaking can be daunting if the clients are distributed among thousands of end users. Second, during scaling operations the number of server nodes or disk drives changes, and the new parameters need to be propagated to the clients immediately; otherwise, the server nodes will be misidentified. Third, if for any reason the client computation runs ahead of or behind the server computation (for example, the number of packets received doesn't match the number of packets sent), then all future computations will be wrong. This could happen if the client has only limited memory and packets arrive sufficiently far out of sequence.

A more robust approach exists. The client can determine the server node from which a lost RTP packet was intended to be delivered by detecting gaps in node-specific packet-sequence numbers. We term these local sequence numbers (LSNs), as opposed to the global sequence number (GSN) that orders all packets. Although this approach requires each packet to carry a node-specific sequence number along with the GSN, the clients need little computation to identify and locate missing packets.
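The following sketch illustrates the idea (a minimal, assumed implementation; the deployed client-server protocol is described in our cited work [11]). Each arriving packet carries the sending node's ID and its LSN; a gap in a node's LSN sequence immediately identifies both the missing packets and the node to ask for the retransmission:

    def check_packet(expected_lsn, node_id, lsn):
        """Detect per-node sequence gaps; return LSNs to request again.

        expected_lsn: dict mapping node_id -> next LSN expected from it.
        Illustrative sketch only; real packets also carry a global
        sequence number (GSN) used to reorder data for playback.
        """
        missing = []
        nxt = expected_lsn.get(node_id, lsn)  # first packet sets the baseline
        if lsn > nxt:
            # Gap: every LSN in [nxt, lsn) from this node was lost or is
            # still in flight; request retransmission from node_id only.
            missing = list(range(nxt, lsn))
        expected_lsn[node_id] = max(nxt, lsn) + 1 if lsn >= nxt else nxt
        return missing

    expected = {}
    for node, lsn in [(0, 0), (1, 0), (0, 1), (1, 2), (0, 2)]:
        lost = check_packet(expected, node, lsn)
        if lost:
            print(f"request LSNs {lost} from node {node}")
    # Prints: request LSNs [1] from node 1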

We implemented this technique and evaluated it with an extensive set of experiments across LAN and WAN environments. The results show that the method is feasible and effective. (See our related research for more details [11].)

Client-server adaptive flow control

RMI relies on the efficient transfer of multimedia streams between a server and a client. These media streams are captured and displayed at a predetermined rate. For example, video streams might require a rate of 24, 29.97, 30, 59.94, or 60 frames per second. Audio streams can require 44,100 or 48,000 samples per second. An important measure of quality for such multimedia communications is the precisely timed playback of the streams at the client location.

Achieving this precise playback is complicated by the popular use of VBR media stream compression. VBR encoding algorithms allocate more bits per time unit to complex parts of a stream and fewer bits to simple parts to keep the visual and aural quality at near-constant levels. For example, a movie's action sequence might require more bits per second than its credits. As a result, different transmission rates might be necessary over the length of a media stream to avoid starvation or overflow of the client buffer. As a contradictory requirement, we want to minimize the variability of the data transmitted through the network. High variability produces uneven resource utilization and might lead to congestion and exacerbate display disruptions.

The RMI flow-control mechanism's focus is on achieving high-quality media playback by reducing the variability of the transmitted data and hence avoiding display disruptions due to data starvation or overflow at the client. We identified the following desirable characteristics for the algorithm:

❚ Online operation. This is necessary for live streaming and desirable for stored streams.

❚ Content independence. An algorithm that isn't tied to any particular encoding technique will continue to work when new compression algorithms are introduced.

❚ Minimal feedback control signaling. The overhead of online signaling should be negligible to compete with offline methods that don't need any signaling.

❚ Rate smoothing. The peak data rate and the number of rate changes should be lowered compared with the original, unsmoothed stream. This greatly simplifies the design of efficient real-time storage, retrieval, and transport mechanisms to achieve high resource utilization.

We designed a high-performance rate-control algorithm that adjusts the multimedia traffic based on an end-to-end rate-control mechanism in conjunction with an intelligent buffer management scheme. Unlike previous approaches, we consider multiple signaling thresholds and adaptively predict future bandwidth requirements. With this multithreshold flow-control (MTFC) scheme, we accommodate VBR streams without a priori knowledge of the stream bit rate. Furthermore, because the MTFC algorithm encompasses server, network, and clients, it adapts itself to changing network conditions. Display disruptions are minimized even with few client resources (such as a small buffer size). MTFC uses a modular rate-change computation framework in which we can easily incorporate new consumption prediction algorithms and feedback-delay estimation algorithms. It is currently implemented in our Yima streaming media server and clients [5, 12]. This lets us measure and verify its effectiveness in an end-to-end application such as RMI.

The client play-out buffer is a crucial component of any feedback-control paradigm. The server is the data producer that places data into the buffer, and the media decoder is the consumer that retrieves data. If the production and consumption rates are exactly the same, then the amount of data in the buffer doesn't change. If there is a mismatch, however, data will either accumulate or drain. If the buffer overflows or underflows, then display disruptions will appear. Hence, the goal of managing the buffer is to keep the data level approximately at half the buffer size so that fluctuations in either direction can be absorbed. In an online feedback scheme, when the data level deviates sufficiently from the buffer midpoint, a correction message is sent to the server to adjust the sending rate.

We can configure the MTFC buffer manager with a variable number of data-level watermarks, or thresholds. For example, we can mark the buffer space with five, nine, or 17 thresholds (creating six, 10, or 18 logical data partitions). If the buffer data level crosses one of the thresholds, then a speed-up or slow-down message is sent to the server. Note that the thresholds trigger a message only if the data increase or drain happens in the correct direction: thresholds above the midlevel send a slow-down message for data increases, and thresholds below the midlevel trigger a speed-up message when data continues to drain. The desired rate at which the server should speed up or slow down the transmission is calculated based on the current buffer level, its rate of increase or decrease, and the predicted future transmission. This design provides soft and infrequent rate changes during normal operation, while it takes corrective action aggressively when a buffer overflow or underflow looms. MTFC also has different threshold spacings at its disposal to aid in its operation. We can space thresholds at equal distances, geometrically, or logarithmically to provide different control characteristics.
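To make the threshold logic concrete (a minimal sketch under assumed details; the actual MTFC rate computation, prediction, and feedback-delay estimation are described in the cited work [5, 12]), the client-side watermark check might look as follows:

    class MTFCBufferMonitor:
        """Client-side watermark check for multithreshold flow control.

        A minimal sketch: thresholds are equally spaced here, and the
        suggested correction is a simple proportional term instead of
        MTFC's predictive rate-change computation.
        """

        def __init__(self, buffer_size, num_thresholds=5):
            step = buffer_size / (num_thresholds + 1)
            self.thresholds = [step * (i + 1) for i in range(num_thresholds)]
            self.midpoint = buffer_size / 2.0
            self.buffer_size = buffer_size
            self.prev_level = self.midpoint

        def on_level_change(self, level):
            """Return a feedback message when a threshold is crossed in the
            direction that moves the data level away from the midpoint."""
            message = None
            for t in self.thresholds:
                rose_past = self.prev_level < t <= level
                fell_past = level <= t < self.prev_level
                if rose_past and t > self.midpoint:
                    # Filling past a high watermark: ask the server to slow down.
                    message = ("slow-down", self._correction(level))
                elif fell_past and t < self.midpoint:
                    # Draining past a low watermark: ask the server to speed up.
                    message = ("speed-up", self._correction(level))
            self.prev_level = level
            return message

        def _correction(self, level):
            # Proportional stand-in: larger deviations suggest larger changes.
            return abs(level - self.midpoint) / self.buffer_size

    monitor = MTFCBufferMonitor(buffer_size=8_000_000)  # hypothetical 8-Mbyte buffer
    for level in (5_500_000, 7_200_000, 2_500_000):
        feedback = monitor.on_level_change(level)
        if feedback:
            print(level, feedback)  # two slow-down messages, then one speed-up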

From our real-world experiments, we conclude that more than three buffer thresholds reduce the variability of the data transmission and the feedback overhead. At the same time, a consumption-rate prediction algorithm smoothes streams with no prior knowledge of their transmission schedule. Therefore, our technique is well suited for highly dynamic environments that need to adapt to changing network and load conditions. Furthermore, the achieved smoothing allows for improved resource use by reducing peak data rates. The experimental results further show that our scheme outperforms existing algorithms in terms of traffic smoothness with similar or less signaling frequency. Figure 2 shows an example of an original VBR movie transmission profile and the resulting schedule when using MTFC.

[Figure 2. An example of an unsmoothed versus smoothed movie transmission profile: data rate (bytes/second, 0 to 3,000,000) over time (0 to 1,400 seconds), comparing the movie consumption rate with the server sending rate.]

Rendering

The rendering side of the RMI system consists of several parts. The video and audio streams are received over a sufficiently fast IP network connection on a PC running two instances of the Yima playback software. We structured this media player into several components, and only one of them interfaces with the actual media decoder. This lets us plug in multiple software and hardware decoders and hence support various media types. For RMI, one of the players interfaces with a CineCast HD MPEG-2 decompression board manufactured by Vela Research. This decoder accepts MPEG-2 compressed high-definition video at data rates in excess of 50 Mbps and in both 1080-interlaced and 720-progressive formats. An HD serial digital interface (SDI) connection transports the video to a high-resolution front projector. Depending on the venue's size, we can use different projector models. MPEG-2 video at a data rate of 40 to 50 Mbps is referred to as having contribution quality; it's often the format of choice between production facilities and provides high visual quality. Consequently, extending the visual field requires that we improve the aural presentation as well.

Immersive audio

Audio can play a key role in creating a fully immersive experience. Achieving this requires that we exceed the spatial limitations of traditional two-channel stereo. Researchers have proposed several rendering methods that use digital filters to represent the spectral characteristics of sound sources from different directions in 3D space [13]. These methods rely on accurate representation of head-related transfer functions (HRTFs), which represent the modifications imparted on sound by the head and pinnae. To deliver sound over loudspeakers, these methods also require precise cancellation of the crosstalk signals resulting from the opposite-side loudspeaker to deliver the desired sound to each ear. As a result, they work well only for a single listener in a precisely defined position.

Multichannel surround-sound systems exist that use three front channels and two surround channels to provide a sense of envelopment for multiple listeners. Although these systems work well for movies, they aren't suitable for immersive systems. They leave significant spatial gaps in the azimuth plane (for example, at 90 degrees to the side of the listeners) and provide no coverage in the median plane (no elevation cues).

Furthermore, a listener who moves just a few centimeters from the designated listening spot is subjected to high imaging distortion and no longer experiences the correct localization cues. Increasing the number of loudspeakers addresses this problem: to minimize localization errors, the number of loudspeakers must increase linearly with the listening area's width [14].

We designed and implemented a multichannel rendering system that addresses some of these limitations. The psychoacoustics literature explains that human listeners can localize sounds precisely in the front hemisphere and less precisely to the sides and in the rear hemisphere [15]. Therefore, localization errors from the front channels that arise when the listener's position is offset from the desired center location are particularly evident.

In our implementation, we allocate five channels to the front horizontal plane by augmenting the traditional three front loudspeakers with two additional wide channels. This reduces localization errors in the front hemisphere and provides the wide channels with simulated side-wall reflection cues that increase the sense of spatial envelopment [16]. In addition to the five front channels, we added a rear surround channel to fill the gap directly behind the listener. We also reproduce elevation information by placing two height channels above the front left and right loudspeakers. Early experiments we performed with this configuration show that these 10 channels significantly increase listeners' sense of localization and envelopment.
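To make the channel count concrete, here is a hypothetical enumeration: the front, wide, height, and back-surround channels are named in the text above, the left/right side surrounds are an assumption (they are standard in surround layouts and complete the count of 10), and the two subwoofers are implied by the ".2":

    # Hypothetical 10.2-channel enumeration based on the description above.
    CHANNELS_10_2 = [
        "front-left", "center", "front-right",  # traditional front trio
        "wide-left", "wide-right",              # added wide channels
        "height-left", "height-right",          # elevation above front L/R
        "surround-left", "surround-right",      # assumed: standard side surrounds
        "back-surround",                        # fills the gap behind the listener
        "subwoofer-1", "subwoofer-2",           # the ".2" low-frequency channels
    ]
    assert len(CHANNELS_10_2) == 12  # matches the 12 locations in Figure 3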

The RMI implementation uses a second software player to interface with a 16/24-channel sound card, an RME 9652 Hammerfall. The audio data is received via 16 channels in either 16- or 24-bit uncompressed linear PCM format at a data rate of up to 17 Mbps. The audio is transported via Alesis Digital Audio Tape (ADAT) lightpipes to a Digidesign ProTools system and rendered via individual equalizers and powered speakers. Figure 3 shows the speaker locations of a typical 10.2-channel setup.
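The quoted data rate is consistent with a quick back-of-the-envelope check (illustrative arithmetic only, assuming the 44.1-kHz sample rate mentioned earlier and 24-bit samples):

    channels, sample_rate, bits = 16, 44_100, 24
    rate_bps = channels * sample_rate * bits
    print(f"{rate_bps / 1e6:.1f} Mbps")  # 16.9 Mbps, i.e., "up to 17 Mbps"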

A major challenge in multilistener environments arises from room acoustical modes, particularly in small rooms, that cause significant variation in the responses measured at different listener locations. Responses measured only a few centimeters apart can vary by up to 15 to 20 decibels (dB) at certain frequencies. This makes it difficult to equalize an audio system for multiple simultaneous listeners. Furthermore, it makes it impossible to render a remote performance for a large audience and have them experience the sound exactly as it is being produced at the remote event. Previous methods for room equalization have used multiple microphones and spatial averaging with equal weighting. This approach tends to smooth out large variations due to standing waves, but it doesn't account for the effects of room modes that for some frequencies can be more concentrated in one region than another.

[Figure 3. An immersive 10.2-channel audio setup illustrating the 12 speaker locations.]

We developed a new method that derives a representative room response from several room responses that share similar characteristics [17, 18]. We can use this response to equalize the entire class of responses it represents. Our approach is based on a fuzzy unsupervised technique (c-means clustering) for finding similarities, clustering room responses, and determining their representatives. Our results show a significant improvement in equalization performance over single-point equalization methods, and we're currently implementing a real-time version that can perform the necessary corrections on the incoming audio channels.
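A compact sketch of the clustering step (illustrative only, with assumed parameters; the perceptually motivated details are in the cited papers [17, 18]): fuzzy c-means assigns each measured room magnitude response a graded membership in every cluster, and each cluster's representative is the membership-weighted average of the responses:

    import numpy as np

    def fuzzy_c_means(responses, c=2, m=2.0, iters=100, seed=0):
        """Cluster room magnitude responses with fuzzy c-means.

        responses: (n, f) array, one magnitude response per row.
        Returns (centers, memberships); centers are the representative
        responses from which equalization filters could be designed.
        Minimal sketch; the cited work adds perceptual weighting.
        """
        n, _ = responses.shape
        rng = np.random.default_rng(seed)
        u = rng.random((c, n))
        u /= u.sum(axis=0)  # memberships sum to 1 for each response
        for _ in range(iters):
            w = u ** m
            # Representative of each cluster: membership-weighted average.
            centers = (w @ responses) / w.sum(axis=1, keepdims=True)
            # Distance of every response to every cluster center.
            d = np.linalg.norm(responses[None, :, :] - centers[:, None, :], axis=2)
            d = np.fmax(d, 1e-12)
            # Standard fuzzy c-means membership update.
            u = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2 / (m - 1))).sum(axis=1)
        return centers, u

    # Example: two groups of synthetic "responses" with different shapes.
    freqs = np.linspace(0, 1, 64)
    group_a = np.sin(6 * freqs) + 0.1 * np.random.default_rng(1).random((5, 64))
    group_b = np.cos(6 * freqs) + 0.1 * np.random.default_rng(2).random((5, 64))
    centers, u = fuzzy_c_means(np.vstack([group_a, group_b]), c=2)
    print(u.argmax(axis=0))  # e.g., [0 0 0 0 0 1 1 1 1 1] (recovered grouping)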

Applications and demonstrations

We tested and showcased the RMI system in several configurations and locations. We deployed Yima servers in our laboratory on the USC campus and remotely at the Information Sciences Institute/East in Arlington, Virginia. The connectivity was provided either through Internet2 or via a SuperNet link.

We demonstrated the client and rendering setup at the following venues:

❚ The demonstration room next to our laboratory on the USC campus seats 10 people for an intimate, living-room type of experience. This room contains a permanent setup and is our primary test location. Our first public demonstration of the RMI system took place there in June 2002. We routinely update the equipment with the latest software and hardware versions and continually demonstrate the system to IMSC visitors.

❚ In October 2002, we presented a recorded performance by the New World Symphony, conducted by Alasdair Neale, in USC's Bing Theater. This venue seats approximately 500 people and was specially equipped with a 30 × 17-foot screen and a 10.2-channel audio system for the RMI performance, which was part of an Internet2 Consortium meeting hosted by USC.

❚ In March 2003, we demonstrated the RMI system as part of the Internet2 Performance Production Workshop and the Internet2 Music Education Symposium at the New World Symphony's Lincoln Theater in Miami Beach, Florida. Again, the venue was specifically equipped with a large screen and a 10.2-channel audio system.

❚ In September 2003, we demonstrated the RMI system at a library dedication at Inha University, South Korea. The library was dedicated by Y.H. Cho, the chairman of Korean Airlines, in memory of his father. This milestone marked the first international demonstration of RMI.

We've shown that RMI is adaptable to different situations. It has also proven reliable enough to be moved or recreated in different locations and for large audiences. Work to simplify the rendering setup (speaker calibrations and so forth) is currently under way at IMSC.

Discussion and conclusions

The current RMI setup is out of reach for most home users. For widespread adoption, numerous technological advances will be necessary, which subsequently will lead to more affordable prices and make the RMI system feasible for high-end home use. For example, we're currently using the MPEG-2 algorithm at a low compression ratio to achieve our target visual quality. Improved compression algorithms and affordable hardware will most certainly make the same quality available at lower bit rates in the future; MPEG-4 is a candidate here. Hence, we envision a crossover point in the next few years, when the bandwidth required for RMI drops below the bit rates that new high-speed residential broadband technologies, such as very high-speed DSL (VDSL), can deliver.

Additionally, we're working toward simplifying and automating the setup and calibration procedures currently necessary. Subsequently, we can incorporate these signal-processing algorithms into multichannel home receivers at moderate cost.


In the more distant future, we can envision a portable device with a high-resolution screen and a personal immersive audio system. For a single listener, it's possible to achieve enveloping audio rendered with only two speakers, as long as we know the position and shape of the listener's ears. The PDA of the future will at some point have enough processing capability to combine visual tracking with adaptive rendering of both audio and video.

References

1. D. McLeod et al., "Integrated Media Systems," IEEE Signal Processing, vol. 16, Jan. 1999, pp. 33-76.

2. T. Smith, "Next DVD Spec. to Offer Net Access, Not More Capacity," The Register, 27 Oct. 2003.

3. T. Richardson, "CDs and DVDs Are 'Doomed'," The Register, 2 Sept. 2003.

4. M. Richtel, "In Utah, Public Works Project in Digital," The New York Times, 17 Nov. 2003, Section C, p. 1; http://www.nytimes.com/2003/11/17/technology/17utopia.html?8hpib.

5. C. Shahabi et al., "Yima: A Second Generation Continuous Media Server," Computer, vol. 35, no. 6, June 2002, pp. 56-64.

6. S. Berson et al., "Staggered Striping in Multimedia Information Systems," Proc. ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 1994, pp. 79-90.

7. J.R. Santos and R.R. Muntz, "Performance Analysis of the RIO Multimedia Storage System with Heterogeneous Disk Configurations," Proc. ACM Multimedia Conf., ACM Press, 1998, pp. 303-308.

8. J.R. Santos, R.R. Muntz, and B. Ribeiro-Neto, "Comparing Random Data Allocation and Data Striping in Multimedia Servers," Proc. SIGMETRICS Conf., ACM Press, 2000, pp. 44-55.

9. E.A. Taub, "On Internet of the Future, Surfers May Almost Feel the Spray," The New York Times, Circuits section, 9 May 2002, p. C4; http://dmrl.usc.edu/pubs/NYTimes-RMI.pdf.

10. C. Papadopoulos and G.M. Parulkar, "Retransmission-Based Error Control for Continuous Media Applications," Proc. 6th Int'l Workshop Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 96), LNCS 1080, Springer-Verlag, 1996, pp. 5-12.

11. R. Zimmermann et al., "Retransmission-Based Error Control in a Many-to-Many Client-Server Environment," Proc. SPIE Conf. Multimedia Computing and Networking (MMCN), SPIE Press, 2003, pp. 34-44.

12. R. Zimmermann et al., "Yima: Design and Evaluation of a Streaming Media System for Residential Broadband Services," Proc. VLDB 2001 Workshop Databases in Telecommunications (DBTel 2001), LNCS 2209, Springer-Verlag, 2001, pp. 116-125.

13. C. Kyriakakis, "Fundamental and Technological Limitations of Immersive Audio Systems," Proc. IEEE, vol. 86, 1998, pp. 941-951.

14. R. Rebscher and G. Theile, "Enlarging the Listening Area by Increasing the Number of Loudspeakers," 88th AES (Audio Engineering Society) Convention, 1990, preprint no. 2932.

15. J.C. Makous and J.C. Middlebrooks, "Two-Dimensional Sound Localization by Human Listeners," J. Acoustical Soc. of America, vol. 87, no. 5, May 1990, pp. 2188-2200.

16. Y. Ando, Concert Hall Acoustics, Springer-Verlag, 1985.

17. S. Bharitkar and C. Kyriakakis, "Perceptual Multiple Location Equalization with Clustering," Proc. 36th IEEE Asilomar Conf. Signals, Systems, and Computers, IEEE Press, 2002, pp. 179-183.

18. S. Bharitkar, P. Hilmes, and C. Kyriakakis, "The Influence of Reverberation on Multichannel Equalization: An Experimental Comparison between Methods," Proc. 37th Asilomar Conf. Signals, Systems, and Computers, IEEE Press, 2003, pp. 155-159.

Christos Papadopoulos is an assistant professor at the University of Southern California (USC). His research interests include network security, multimedia communications, and operating systems. Papadopoulos has a PhD in computer science from Washington University in St. Louis, Missouri. He is affiliated with the Integrated Media Systems Center (IMSC) and Information Sciences Institute (ISI).

Chris Kyriakakis is the director of the IMSC Immersive Audio Laboratory at USC. His research interests include multichannel audio signal processing, automatic room equalization, microphone arrays for sound acquisition, stereoscopic high-definition video capture and rendering, virtual microphones for multichannel audio synthesis, and streaming of multichannel audio over high-bandwidth networks. Kyriakakis has a PhD in electrical engineering from the University of Southern California.

Ulrich Neumann is an associate professor of computer science at USC. He is also the director of the Integrated Media Systems Center (IMSC), an NSF Engineering Research Center (ERC). Neumann holds an MSEE from SUNY at Buffalo and a PhD in computer science from the University of North Carolina at Chapel Hill. His research relates to immersive environments and virtual humans.

Alexander A. Sawchuk is a professor in the Department of Electrical Engineering and deputy director of the Integrated Media Systems Center at USC. His research includes image processing, acquisition and display, and optoelectronic data storage and network systems. Sawchuk has a BS from the Massachusetts Institute of Technology, and an MS and PhD from Stanford University, all in electrical engineering.

Cyrus Shahabi is an assistant professor, the director of the Information Laboratory (InfoLAB) at the Computer Science Department, and a research area director at the Integrated Media Systems Center (IMSC) at USC. His current research interests include streaming architectures and multidimensional databases. Shahabi has a PhD in computer science from the University of Southern California.

Roger Zimmermann is a research assistant professor of computer science, a research area director with the Integrated Media Systems Center (IMSC), and the director of the Data Management Research Laboratory (DMRL). His research interests are in streaming media architectures and distributed database integration. Zimmermann has a PhD in computer science from the University of Southern California.

Readers may contact Zimmermann at the Univ. of Southern California, 3740 McClintock Ave., Suite 131, Los Angeles, CA 90089; [email protected].
