THE EMU10K1 DIGITAL AUDIO PROCESSOR

Thomas C. Savell
Joint Emu/Creative Technology Center

THIS PC AUDIO SOLUTION FULFILLS ITS ORIGINAL DESIGN GOAL AS A MICROSOFT DIRECTSOUND ACCELERATOR AND, WITH ITS ENVIRONMENTAL SIMULATION CAPABILITIES, PROMPTED THE DEVELOPMENT OF THE ENVIRONMENTAL AUDIO EXTENSIONS (EAX) TO MICROSOFT DIRECTSOUND3D. ORIGINALLY DESIGNED FOR THE CONSUMER COMPUTER MARKET, THE EMU10K1 ALSO SUPPORTS PROFESSIONAL AUDIO CONTENT DEVELOPMENT.

Consumers and professional audio content developers continue to demand more powerful audio systems in the virtual worlds of games. These needs led our company to produce the EMU10K1 digital audio processor, which places both a high-quality music synthesizer and a powerful audio effects processor on the same die.

Implemented in a 0.35-micron, three-metal-layer CMOS process, the processor resides on a 6.7-mm x 6.5-mm die containing 2,439,711 transistors, and it uses a 33-MHz PCI clock and 50-MHz and 100-MHz internal audio clocks. It has a 64-channel wavetable synthesizer with per-channel sample rate converter, digital filter, envelope generator, low-frequency oscillator, and routing/mixing logic. The EMU10K1 accesses digital audio data stored in system memory via a 64-channel PCI bus master system with virtual memory mapping identical to that in Intel CPUs. A powerful audio effects processor adds environmental simulation, 3D positioning, and special effects to audio from the wavetable synthesizer and numerous other digital audio sources such as S/PDIF (Sony/Philips Digital Interface), I2S (Philips Inter-IC Sound bus), and AC97 (Audio Codec 97). Since the processor operates at a fixed 48-kHz audio sampling rate, it contains dedicated, high-quality sample rate converters.

PC audio subsystem

This subsystem has evolved to be quite complex, with numerous methods for software applications to interact with various audio sources and destinations. Applications can either call the operating system to play back audio stored in disk files or directly place the digital audio waveforms in system memory and make device driver calls for playback. Applications can produce music using an abstract language known as MIDI1 that supports commands such as "play middle-C" and "use grand piano sound." MIDI commands can even originate from an external synthesizer or keyboard controller connected via a cable to the computer. In addition to these sources, other devices such as CD-ROM, DVD, and microphones also produce audio that the user can hear and record to a disk file. Recent 3D computer games tend to include 3D audio as well as graphics, requiring additional processing to create a virtual 3D audio environment.
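As a concrete illustration of such commands, the sketch below shows the raw bytes an application might send for "use grand piano sound" followed by "play middle-C." The three-byte message format and the General MIDI program number are standard; the helper function is purely illustrative.

/* Illustrative sketch of the MIDI bytes behind "use grand piano sound" and
 * "play middle-C".  The helper is hypothetical; only the byte values follow
 * the MIDI 1.0 specification. */
#include <stdint.h>
#include <stdio.h>

static void send_midi(const uint8_t *msg, int len)
{
    /* A real driver would write these bytes to the MIDI device. */
    for (int i = 0; i < len; i++)
        printf("%02X ", msg[i]);
    printf("\n");
}

int main(void)
{
    uint8_t program_change[] = { 0xC0, 0 };       /* channel 1, program 0: Acoustic Grand Piano */
    uint8_t note_on[]        = { 0x90, 60, 100 }; /* channel 1, note 60 (middle C), velocity 100 */
    uint8_t note_off[]       = { 0x80, 60, 0 };   /* release the note */

    send_midi(program_change, 2);
    send_midi(note_on, 3);
    send_midi(note_off, 3);
    return 0;
}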

Architecture

The major architectural elements of the EMU10K1 are the PCI bus master and slave interfaces, 64-channel wavetable synthesizer, effects processor, high-quality sample rate converters, and digital audio receivers and transmitters. Figure 1 illustrates the connection between blocks and their interaction with the application environment.

Wavetable synthesizer

Each of the 64 channels in the wavetable synthesizer consists of an interpolating oscillator; a resonant two-pole, digital low-pass filter; three envelope generators; and a mixing amplifier with multiple selectable output buses. See Figure 2.

Background. Wavetable synthesis uses a digital recording of a natural sound such as a note played on a piano. The wavetable synthesizer replays this recording in response to a player striking keys on a keyboard. When synthesizing an instrument such as a piano, it is not necessary to record every note the instrument can play. To conserve memory, the sound designer samples a small number of notes from across the entire range of the instrument, and the synthesizer creates intermediate notes on the fly with pitch-shifting hardware. The equivalence of pitch shifting and sample rate conversion allows the user freedom in choosing the source sample rate. The wavetable synthesizer includes looping hardware to create notes of arbitrary duration and further conserve sample memory. It responds to the player's touch with greater expressiveness, using the envelope generator to control the amplitude and filter cutoff.

Interpolating oscillator. The EMU10K1 wavetable synthesis oscillator contains an interpolating sample rate converter and an addressing unit capable of looping and proceeding at an arbitrary, noninteger rate. The addressing unit maintains a phase accumulator that contains the current memory address, stored in an integer.fraction format. The integer portion determines the memory read locations, and the fractional portion determines the interpolation phase shift. The addressing unit adds a phase increment, stored in the same integer.fraction format, to the current address every sample period.
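A minimal software sketch of this addressing scheme follows; the 32-bit fraction width and the structure and field names are assumptions for illustration, since the article does not give the hardware's exact field sizes.

/* Sketch of an integer.fraction phase accumulator, as described above.
 * The 32-bit fraction width and the names are assumptions. */
#include <stdint.h>

typedef struct {
    uint64_t phase;      /* upper 32 bits: sample index, lower 32 bits: fraction */
    uint64_t increment;  /* pitch ratio in the same integer.fraction format */
} oscillator_t;

/* Advance one output sample period and report where to read and how far
 * between input samples we are. */
static void osc_step(oscillator_t *o, uint32_t *read_index, uint32_t *frac)
{
    o->phase += o->increment;                         /* add the phase increment every sample period */
    *read_index = (uint32_t)(o->phase >> 32);         /* integer part: memory read location */
    *frac = (uint32_t)(o->phase & 0xFFFFFFFFu);       /* fractional part: interpolation phase shift */
}

With this layout, an increment of 0x180000000 advances 1.5 input samples per output sample, raising the pitch of the recording by a factor of 1.5.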


Figure 1. PC audio subsystem. All audio sources pass through the effects processor where they are mixed together to create a virtual 3D environment for rendering on a multiple-speaker system.

Figure 2. EMU10K1 wavetable synthesizer.


The looping hardware uses two programmable addresses, the loop start address and the loop end address. When the waveform address passes the loop end address, the addressing unit loads the current address with a value equal to the loop start address plus the amount by which the address would have passed the loop end address. This causes data to be fetched from the loop start address again, and the loop repeats until the oscillator is stopped. Note that the current address register can accept any starting value and proceeds without interruption until it reaches or passes the loop end address. This accommodates a different, nonrepeating waveform during the initial attack of a note, while providing an effective form of data compression and placing no predetermined limit on the length of time a sound can play.
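A sketch of that wraparound rule, using the same assumed integer.fraction representation as above:

/* Sketch of the loop-wrap rule described above: when the address passes the
 * loop end, reload it with the loop start plus the overshoot, so looping is
 * phase-continuous.  Field widths are illustrative assumptions. */
#include <stdint.h>

static uint64_t advance_with_loop(uint64_t current, uint64_t increment,
                                  uint64_t loop_start, uint64_t loop_end)
{
    current += increment;
    if (current >= loop_end) {
        /* carry the amount by which we overshot the loop end back into the loop */
        current = loop_start + (current - loop_end);
    }
    return current;  /* any starting value before loop_end plays the attack portion once */
}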

The 8-point interpolating sample rate converter uses the fractional address to determine the position between input samples at which to output a new sample. This is performed using the well-known Smith-Gossett2 algorithm, which requires 16 multiplies and 24 adds. An on-chip ROM stores the filter coefficients used in the interpolation algorithm. The EMU10K1 uses extended-precision arithmetic to tolerate intermediate overflows and to detect output overflow. In the case of output overflow, the sample rate converter produces a saturated output, avoiding severe two's-complement wraparound distortion.
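The structure of the computation can be sketched as follows in floating point; the table resolution and names are assumptions, and the hardware's fixed-point arithmetic and ROM coefficients are not reproduced. Note the operation count: 2 multiplies and 3 adds per tap, or 16 multiplies and 24 adds per output sample.

/* Floating-point sketch of 8-point interpolation with linearly interpolated
 * filter coefficients (the structure described above).  NPHASES and the
 * coefficient table are illustrative; the EMU10K1 works in saturating fixed
 * point with coefficients stored in on-chip ROM. */
#include <stdint.h>

#define NTAPS   8
#define NPHASES 64   /* assumed coefficient table resolution */

static float interpolate8(const float coef[NPHASES + 1][NTAPS],
                          const float *x,      /* the 8 input samples around the read point */
                          uint32_t frac)       /* 32-bit fractional address */
{
    uint32_t idx = frac >> 26;                           /* top bits select the coefficient phase */
    float    f   = (frac & 0x03FFFFFFu) / 67108864.0f;   /* remaining bits, scaled to [0,1) */

    float acc = 0.0f;
    for (int t = 0; t < NTAPS; t++) {
        float c = coef[idx][t] + f * (coef[idx + 1][t] - coef[idx][t]);  /* interpolate coefficient */
        acc += c * x[t];                                                 /* convolve with the data */
    }
    /* the hardware saturates the result instead of wrapping around */
    if (acc >  1.0f) acc =  1.0f;
    if (acc < -1.0f) acc = -1.0f;
    return acc;
}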

Resonant digital filter. The resonant digital filter implementation is a two-pole low-pass design. Figure 3 shows the filter topology, which requires two storage elements, three multiplies, and four adds. The EMU10K1 uses techniques presented by Rossum3 in 1991 to reduce quantization noise, increase the pole angle resolution at low frequencies, and increase the pole radius resolution near the unit circle. The implementation of these techniques requires special encoding of the coefficients and an additional two's-complement operation.

The filter is capable of more than 20 dB of resonance. The envelope generator can sweep the filter cutoff frequency in real time.

Figure 3. Digital filter topology.
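For readers who want a concrete starting point, the following is a textbook two-pole resonant low-pass in the same spirit as Figure 3. It is only a floating-point sketch; it does not reproduce the Rossum coefficient encoding or the fixed-point topology the chip actually uses.

/* Textbook two-pole resonant low-pass: two storage elements, pole pair at
 * radius r and angle w.  Illustrative only; the EMU10K1 uses special
 * coefficient encoding to control quantization noise. */
#include <math.h>

typedef struct {
    float b1, b2, gain;   /* feedback coefficients and DC-normalizing gain */
    float y1, y2;         /* the two storage elements */
} lowpass2_t;

/* Place the pole pair at radius r (resonance) and angle w = 2*pi*fc/fs (cutoff). */
static void lp2_set(lowpass2_t *f, float fc, float r, float fs)
{
    float w = 6.2831853f * fc / fs;
    f->b1 = 2.0f * r * cosf(w);
    f->b2 = -r * r;
    f->gain = 1.0f - f->b1 - f->b2;   /* normalize for unity gain at DC */
}

static float lp2_run(lowpass2_t *f, float x)
{
    float y = f->gain * x + f->b1 * f->y1 + f->b2 * f->y2;
    f->y2 = f->y1;
    f->y1 = y;
    return y;
}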

Envelope generator. Traditional envelope generators for music synthesizers contain four phases: attack, decay, sustain, and release. The envelope generator of the EMU10K1 actually has six phases: delay, attack, hold, decay, sustain, and release, as illustrated in Figure 4. The delay and hold phases provide additional control to the sound designer.

Simple playback of recorded samples does not have the expressiveness of a musician playing a real instrument. This is because there are numerous ways the instrumental sound changes in response to a musician's playing style. When the musician strikes an instrument harder, the sound is louder, has a sharper attack, and is brighter. A softer stroke results in a quieter sound, a more gradual attack, and a duller timbre. The envelope generator of the EMU10K1 produces these effects by separately controlling the amplifier gain and filter cutoff frequency.

Figure 4. The various phases of the EMU10K1 envelope generator overlaid on the digital audio waveform. The attack phase typically contains more high-frequency information as well as wideband noise. The high frequencies and noise content become attenuated during the decay phase, and a fundamental oscillation, or pitch, becomes prevalent. Once this occurs, a few points that are looped during the sustain and release phases can represent the remainder of the sound.
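A simplified per-sample state machine for the six phases is sketched below; the linear segment shapes and parameter names are assumptions, since the article does not specify how the hardware encodes its segments.

/* Simplified six-phase (delay-attack-hold-decay-sustain-release) envelope as
 * a per-sample state machine.  Linear segments and parameter units are
 * illustrative assumptions. */
typedef enum { ENV_DELAY, ENV_ATTACK, ENV_HOLD, ENV_DECAY,
               ENV_SUSTAIN, ENV_RELEASE, ENV_DONE } env_phase_t;

typedef struct {
    env_phase_t phase;
    int   delay_samples, hold_samples;        /* remaining time in the timed phases */
    float level;                              /* current output, 0..1 */
    float attack_step, decay_step, release_step;
    float sustain_level;
} envelope_t;

static float env_step(envelope_t *e, int key_released)
{
    if (key_released && e->phase < ENV_RELEASE)
        e->phase = ENV_RELEASE;

    switch (e->phase) {
    case ENV_DELAY:   if (--e->delay_samples <= 0) e->phase = ENV_ATTACK; break;
    case ENV_ATTACK:  e->level += e->attack_step;
                      if (e->level >= 1.0f) { e->level = 1.0f; e->phase = ENV_HOLD; }
                      break;
    case ENV_HOLD:    if (--e->hold_samples <= 0) e->phase = ENV_DECAY; break;
    case ENV_DECAY:   e->level -= e->decay_step;
                      if (e->level <= e->sustain_level) {
                          e->level = e->sustain_level;
                          e->phase = ENV_SUSTAIN;
                      }
                      break;
    case ENV_SUSTAIN: /* hold sustain_level until the key is released */ break;
    case ENV_RELEASE: e->level -= e->release_step;
                      if (e->level <= 0.0f) { e->level = 0.0f; e->phase = ENV_DONE; }
                      break;
    case ENV_DONE:    break;
    }
    return e->level;   /* scales the amplifier gain or sweeps the filter cutoff */
}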

Virtual memory mapping

Within the PC environment, application memory is virtualized into 4-Kbyte pages that are mapped into a larger contiguous address space. An application program must request an allocation of memory, a buffer, from the operating system. Throughout the course of normal computer usage, programs allocate and deallocate memory buffers millions of times. As the process continues, it may become impossible to find a single block of physical memory to satisfy an allocation request. The operating system uses virtual memory mapping to solve this problem. Each buffer of memory, while addressed as a contiguous whole, may in fact be fragmented into many 4-Kbyte pieces randomly scattered throughout the physical memory. Since a DMA bus master must present physical addresses to the bus, there must be a way to resolve the virtual addresses used by the application program.

Double buffering. One solution is to allocate a single, physically contiguous buffer into which audio is copied from fragmented virtual memory prior to DMA transport. Due to operating system limitations, physically contiguous buffer allocation is not guaranteed, and it can only be requested during system start-up. Even if it is successful, the memory cannot be dynamically allocated and deallocated at runtime. This is a decided disadvantage to the user, as a large segment of main memory must be permanently assigned as an audio buffer. It also incurs the CPU overhead of copying buffers from virtualized memory to the physically contiguous buffer.

Scatter-gather. Another way is to pass a scatter-gather list to the stream transport engine. A scatter-gather list is an ordered set of physical addresses that must be parsed as the stream is being transported. This allows dynamic allocation and deallocation and does not require the copying of buffers. However, the same sound is often triggered multiple times simultaneously, and for music and game applications each instance may require a different sample rate conversion ratio. This requires redundant copies of the same scatter-gather list. Even more significantly, sounds that loop in order to play for long periods require excessively long lists.

Page table. The EMU10K1 uses a better method, translating addresses in a similar fashion to the CPU by using a page table and a translation look-aside buffer. The page table is located in system memory and contains the physical addresses of each 4-Kbyte page within the logical sound memory, as shown in Figure 5.

The on-chip translation look-aside buffer contains the physical mapping of the current logical address. As the stream transport engine moves through logical memory, it detects when the logical-to-physical mapping is no longer valid and issues a read from the page table to reload the translation look-aside buffer. This method is very efficient for both the CPU and the audio stream transport engine. It permits dynamic allocation of audio buffers and has a minimum of redundancy.

Figure 5. Scatter-gather (upper) versus page table (lower) virtual memory approaches. Both approaches render the same waveform. However, scatter-gather can require a very long list to support looping, while the page table requires no extra data.
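A minimal software model of this translation, assuming a single-entry look-aside buffer and a page table that stores the physical base address of each logical 4-Kbyte page (the entry format and names are assumptions):

/* Simplified model of the page-table walk described above: split a logical
 * sound address into a 4-Kbyte page number and an offset, and cache the most
 * recent mapping in a one-entry translation look-aside buffer. */
#include <stdint.h>

#define PAGE_SHIFT 12                        /* 4-Kbyte pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

typedef struct {
    uint32_t cached_page;    /* logical page currently mapped */
    uint32_t cached_phys;    /* physical base address of that page */
    int      valid;
} tlb_t;

/* page_table[i] holds the physical base address of logical page i and lives
 * in system memory; reading it here stands in for the PCI read the hardware
 * issues to reload the TLB. */
static uint32_t translate(uint32_t logical, const uint32_t *page_table, tlb_t *tlb)
{
    uint32_t page = logical >> PAGE_SHIFT;

    if (!tlb->valid || tlb->cached_page != page) {
        tlb->cached_phys = page_table[page];   /* mapping no longer valid: reload the TLB */
        tlb->cached_page = page;
        tlb->valid = 1;
    }
    return tlb->cached_phys + (logical & PAGE_MASK);
}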

As wavetable oscillators proceed, they submit PCI bus master requests for data. A two-level priority scheme based on the degree of data starvation arbitrates bus master access. This reduces the probability that an oscillator will run out of input samples. A single bad sample can be audible, especially when the audio is input to a recursive delay line.

Effects processor

The EMU10K1 effects processor creates the final output that the listener will hear on a multiple-speaker system. It satisfies the mixing requirements of the PC environment with a total of 31 inputs and 32 outputs. The 24-bit audio I/O capability provides an ample 144-dB dynamic range, enough to span the entire human hearing range from perceived silence to past the threshold of eardrum damage. However, all instructions use 32-bit integer or fixed-point operands to support the additional precision required for complex operations such as recursive filtering.

ALU. At the core of most signal-processing algorithms is the multiply-accumulate operation. Naturally, the center of the EMU10K1 effects processor is a high-speed arithmetic logic unit that implements variants of this powerful operation. All instructions use four register addresses: result, accumulator, multiplier, and multiplicand. For maximum flexibility, the operand address space is symmetrical in that any address is valid for use in any operand. The effects processor forwards the result register to conceal pipeline delays and permits the result to be used as an input operand on the next instruction period. It processes 32-bit fixed-point and integer operands and has a 67-bit accumulator to allow for intermediate overflows during multistage operations such as finite-length impulse response filtering.
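A rough software rendering of such an instruction is shown below. The Q31 scaling, the per-instruction saturation, and the use of a 128-bit integer to stand in for the 67-bit accumulator are assumptions made for illustration; the article specifies only the four-operand format and the accumulator width. The code requires a compiler with __int128 support, such as GCC or Clang.

/* Sketch of a four-operand multiply-accumulate:
 * result = accumulator + multiplier * multiplicand on 32-bit fixed-point
 * operands, with a wide intermediate and saturation of the stored result. */
#include <stdint.h>

static int32_t saturate32(__int128 v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* gpr[] models the symmetrical operand address space: any register index is
 * valid for any of the four operand fields of an instruction. */
static void mac_sat(int32_t *gpr, int r, int a, int x, int y)
{
    __int128 wide = (__int128)gpr[a] + (((__int128)gpr[x] * gpr[y]) >> 31);  /* Q31 product */
    gpr[r] = saturate32(wide);   /* result forwarding lets gpr[r] feed the next instruction */
}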

Delay lines. Delay line buffer registers provide zero-latency access to a small on-chip delay memory and a large delay memory located in the PC host memory. A dedicated delay line processor performs the tedious job of maintaining all the circular addresses. The microcode works with logical offset addresses that do not need updates on every sample period, except for special effects that require modulation of the delay length. Still, only the modulation requires microcode for calculation; the delay line processor automatically performs the circular address increment operation.

A dedicated data movement engine absorbs the long latency of a large external memory. The effects processor requires instant access to the delay line. That is not possible when the delay memory is physically located across an arbitrated bus such as the PCI bus. Access time can even present a challenge to a local memory, since delay memories must be many thousands of bytes deep to be useful. By dedicating a small buffer memory to store the "current" data for each delay pointer, access time can be instantaneous. The data movement engine can then spread out the accesses to the large delay memory over the course of a sample period.
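Conceptually, each delay pointer behaves like the small circular buffer sketched below; the structure and names are illustrative rather than the hardware's actual register layout.

/* Sketch of a circular delay line: the write pointer advances by one each
 * sample period, and reads are taken at a logical offset behind it. */
#include <stdint.h>

typedef struct {
    int32_t  *buf;
    uint32_t  length;    /* delay memory size in samples */
    uint32_t  write;     /* circular write index, advanced automatically */
} delay_line_t;

static int32_t delay_read(const delay_line_t *d, uint32_t offset /* samples of delay */)
{
    return d->buf[(d->write + d->length - offset) % d->length];
}

static void delay_step(delay_line_t *d, int32_t in)
{
    d->buf[d->write] = in;
    d->write = (d->write + 1) % d->length;   /* circular increment, no microcode needed */
}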

Instruction sequencing. The effects processor has a novel architecture: it does not contain a program counter or support branch/loop constructs. Instead, a powerful conditional execution mechanism provisionally stores the results of operations. Conditional execution is accomplished by a special SKIP instruction that implements the following logic:

IF (cond) THEN SKIP instruction_count

The conditional expression, cond, is a Boolean AND-OR-NOT combination of a mask operand with a hardware register that stores condition codes from previous instructions. This supports both conventional expressions such as less-than and more complex conditions such as negative-and-overflow-or-zero. The execution of a SKIP instruction does not alter the order in which instructions are fetched. Rather, the processor still fetches, decodes, and executes skipped instructions, but does not write back the results.

The processor determines the number of consecutive instructions to be skipped from the instruction_count operand. This permits skipping entire blocks of code and facilitates real-time loading and unloading of signal-processing programs without affecting audio output. The SKIP instruction even serves as the NOP instruction, using the construct ALWAYS SKIP ZERO, which does not store results and continues executing the program on the next instruction cycle.

For every output sample period, the effects processor reads all microcode memory in sequential order and skips the writing of result registers based on the conditions specified in SKIP instructions. This provides for if-then-else execution, but not looping. A very important side effect of this architecture is that the total execution time of all concurrent programs is always exactly one sample period.
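The execution model can be summarized with the simplified interpreter below. The instruction encoding, the mask-AND condition test, and the stand-in ALU operation are invented for illustration; only the fetch-everything, suppress-the-write-back behavior follows the description above.

/* Simplified interpreter for the execution model described above: every
 * instruction in microcode memory is visited once per output sample period,
 * and a SKIP only suppresses the write-back of the next skip_count results. */
#include <stdint.h>

typedef struct {
    int      is_skip;      /* SKIP instruction? */
    uint32_t cond_mask;    /* ANDed with the condition-code register (simplified) */
    int      skip_count;   /* how many following instructions to suppress */
    int      r, a, x, y;   /* operand register addresses otherwise */
} instr_t;

static void run_sample_period(const instr_t *code, int n, int32_t *gpr, uint32_t *ccr)
{
    int suppress = 0;
    for (int pc = 0; pc < n; pc++) {   /* always sequential: no program counter games */
        const instr_t *i = &code[pc];

        if (suppress > 0) {
            /* hardware still fetches, decodes, and executes this instruction;
             * only its write-back is dropped, which this model skips entirely */
            suppress--;
            continue;
        }
        if (i->is_skip) {
            if (i->cond_mask & *ccr)   /* condition true: skip the following block */
                suppress = i->skip_count;
            continue;
        }
        gpr[i->r] = gpr[i->a] + gpr[i->x] * gpr[i->y];  /* stand-in ALU operation;
                                                           a real model also updates *ccr */
    }
}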

In a general-purpose DSP, one must take care to ensure that the entire program executes within a sample period. Otherwise, audible defects can result. In the case of infinite-length impulse response filters and recursive delay lines, a single sample defect can remain audible for a very long time. Strange bugs in audio-processing algorithms have been traced to an occasional interrupt service routine that caused program execution time to exceed the length of a sample period. These types of bugs are not possible in the EMU10K1 effects processor.

Asynchronous digital audio receivers

We designed the EMU10K1 to receive digital audio directly from devices such as CD-ROM and DVD drives. However, the sample rate of compact disc audio is 44.1 kHz, and the EMU10K1 output sample rate is 48 kHz. Due to manufacturing tolerances and drift, the clock frequency of each compact disc player differs slightly. Even if the output sample rate were also 44.1 kHz, the slight differences in clock frequency would cause the relative phases of the input and output sample rates to drift over time, eventually resulting in repeated or dropped samples.

It is possible to force the clock frequencies to be exactly synchronous by using a tracking phase-locked loop, or PLL, rather than a fixed-frequency oscillator. Professional recording studios distribute a master clock to all interconnected digital audio devices, which derive local clocks from the master. This guarantees that all digital audio streams are synchronous. This approach is expensive and difficult, requiring PLL-based synchronization capabilities in all digital audio devices. Devices that cannot synchronize to an external clock source must become the master clock source. This is a distinct disadvantage, as there can be only one master clock at a time.

A better solution is to use sample rate conversion to resolve the incoming sample rate to the output rate. This requires a sample rate detector that continuously updates an estimate of the asynchronous digital input rate. The sample rate estimator maintains a phase accumulator that controls a 16-point Smith-Gossett2 sample rate converter. Such an asynchronous sample rate converter avoids the cost of a tracking PLL and provides support for multiple, simultaneous asynchronous audio streams. The EMU10K1 can support three simultaneous asynchronous stereo streams using high-quality asynchronous sample rate conversion.
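The following is a much-simplified model of the idea: estimate the incoming rate, then advance the converter's read phase by the resulting ratio for every 48-kHz output sample. The smoothing constant and the linear interpolation (standing in for the 16-point kernel) are assumptions for illustration.

/* Much-simplified asynchronous sample rate converter: a smoothed estimate of
 * the input rate drives a phase accumulator that reads through the input
 * stream, producing one output per 48-kHz clock. */
typedef struct {
    double rate_estimate;   /* smoothed estimate of the input sample rate, Hz */
    double phase;           /* fractional read position in the input stream */
} async_src_t;

/* Called whenever the detector measures how many input samples arrived over a
 * known span of output clocks (one simple way to estimate the incoming rate). */
static void src_update_estimate(async_src_t *s, double measured_rate)
{
    s->rate_estimate += 0.01 * (measured_rate - s->rate_estimate);  /* low-pass the estimate */
}

/* Produce one 48-kHz output sample by linear interpolation (stand-in for the
 * real 16-point kernel) between input samples in[i] and in[i+1]. */
static double src_output(async_src_t *s, const double *in)
{
    int    i = (int)s->phase;
    double f = s->phase - i;
    double y = in[i] + f * (in[i + 1] - in[i]);
    s->phase += s->rate_estimate / 48000.0;   /* e.g. roughly 44100/48000 for CD audio */
    return y;
}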

Digital audio recording

Speech recognition, Internet telephony, music recording, and content authoring applications need digital audio recording capabilities. Noncritical recordings can use reduced sample rates to decrease data storage requirements. The standard rates needed within the PC are 48, 44.1, 32, 24, 22.05, 16, 11.025, and 8 kHz. The conventional method is to reduce the clock rate of an analog-to-digital converter, or ADC. However, this requires the use of separate ADC and DAC chips rather than an inexpensive monolithic codec.

To ensure adequate alias rejection at all sample rates, more expensive tracking analog filters must be used as well. The EMU10K1 supports the various recording sample rates with very high-quality, 64-point sample rate converters, thus permitting the use of low-cost, monolithic codec chips and simple analog antialiasing filters. Recording incurs very low CPU overhead by using bus master DMA directly to system RAM. At the half-buffer and full-buffer points, an interrupt signals that software should flush the audio to disk. All 32 output channels of the effects processor may be selected for multichannel recording, enabling the use of the EMU10K1 in a home studio environment.

Sample rate conversion

The EMU10K1 achieves relatively high-order multipoint conversion using Smith and Gossett's particularly efficient algorithm of linearly interpolating the filter coefficients for convolution with the sample data stream. Rather than incurring the high cost of ideal conversion, the EMU10K1 uses a perceptual optimization technique so that most distortion components are inaudible. The key discovery was that most of the energy in real musical sounds is in the low-frequency band of human hearing. The images of these low-frequency components are located near multiples of the sample rate.



Sidebar: Digital audio mixing

Audio mixing is a weighted summation of the inputs with saturation to avoid the severe distortion caused by two's-complement overflow. Mixing multiple digital audio streams requires that the streams have exactly the same sample rate. However, CD audio sampled at 44.1 kHz will inevitably need to be mixed with voice or sound effects sampled at some other rate. Lower sampling rates provide an effective data compression method for sounds that do not have substantial high-frequency content. In addition, the native sample rate of the digital audio output may be different, typically 48 kHz. The solution is to convert all sound sources to the output sample rate before mixing, as shown in Figure A.
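A minimal sketch of such a weighted, saturating mix, assuming 16-bit samples and Q15 gain weights (the hardware works at higher precision):

/* Weighted summation of n inputs with saturation instead of two's-complement
 * wraparound.  Sample width and gain format are illustrative assumptions. */
#include <stdint.h>

static int16_t mix_sample(const int16_t *inputs, const int16_t *gains /* Q15 weights */, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += ((int32_t)inputs[i] * gains[i]) >> 15;   /* weighted summation */

    /* saturate rather than letting the sum wrap around */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}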

Digital audio is simply a numeric representation of an analog waveform. There are infinitely many digital representations of the same analog waveform. Sample rate conversion is a way to transform one representation into another, which can be done with various degrees of quality and corresponding complexity. To create a new sample stream at a different rate, one must interpolate between the original samples. Figure B illustrates the three commonly used forms of interpolation: nearest-neighbor, linear, and multipoint.

The simplest form, nearest-neighbor interpolation, requires no arithmetic and has the worst quality. A better method is linear interpolation, which requires a small amount of arithmetic and has reasonable quality. Multipoint interpolation requires significant arithmetic, a coefficients table, and multiple input samples to create a single output sample,1 but produces the best quality.

We can view sample rate conversion as a three-stage process. First, the conversion algorithm inserts a number of intermediate zero-valued samples in between the original samples, thus creating a new sample stream at a higher sample rate. The new rate is an integer multiple of the original rate and is typically thousands of times higher when converting arbitrary sample rates. Then, the algorithm filters the new high-rate stream to the lower of the input and output Nyquist frequencies. The stopband of the low-pass filter must sufficiently reject aliases to achieve the desired audio quality. Finally, the low-pass filtered stream is decimated to generate output samples at the desired rate.
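The three stages can be rendered literally, if inefficiently, for an integer ratio L/M as in the sketch below; a practical converter never materializes the high-rate stream and evaluates only the filter taps that contribute to surviving output samples.

/* Conceptual rendering of the three stages: zero-stuff by L, low-pass filter,
 * keep every Mth sample.  For 44.1 kHz to 48 kHz, L/M = 160/147.  The caller
 * supplies a low-pass FIR designed for the lower of the two Nyquist limits. */
static void resample_naive(const float *in, int n_in,
                           const float *fir, int n_fir,
                           float *out, int L, int M)
{
    int n_high = n_in * L;                              /* length of the zero-stuffed stream */
    for (int m = 0, k = 0; k < n_high; k += M, m++) {   /* stage 3: decimate by M */
        float acc = 0.0f;
        for (int t = 0; t < n_fir; t++) {               /* stage 2: convolve with the low-pass FIR */
            int h = k - t;                              /* index into the high-rate stream */
            if (h >= 0 && h % L == 0)                   /* stage 1: nonzero only at multiples of L */
                acc += fir[t] * in[h / L];
        }
        out[m] = L * acc;                               /* compensate for the energy lost to zero-stuffing */
    }
}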

High-order sample rate conversion is an extremely computationally intensive operation. To maintain equivalent quality from input to output, noise due to aliasing must be below the magnitude of the least significant bit of the input signal. For example, a 20-bit digital audio waveform has a dynamic range of about 120 decibels. To get equivalent 20-bit output, noise introduced in the sample rate conversion process must be less than -120 dB. For a 20-bit stereo signal, this requires more than 320 MIPS of processing power.

Sidebar reference
1. R.E. Crochiere and L.R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, Upper Saddle River, N.J., 1983, pp. 127-190.

Figure A. Digital audio mixing.

Figure B. Interpolation methods: nearest-neighbor (1), linear (2), and ideal multipoint (3).


Designing the antialiasing filters to have deep notches at multiples of the sample rate, at the expense of some additional high-frequency aliasing, significantly improves the perceived sound quality. Rossum4,5 received a patent in 1992 for interpolation filters designed in this manner. The high-frequency alias components do not have a great deal of energy because the source material is music, which has very little information in the upper band. In addition, Fletcher and Munson6 discovered that human hearing is strongest in the middle frequency range and very weak in the extreme low- and high-frequency ranges. Thus, the human hearing system itself attenuates the small amount of high-frequency aliasing. These notches provide a dramatic improvement in the sound quality with no increase in computational complexity. Consequently, the EMU10K1 wavetable synthesizer has higher sound quality than would otherwise be expected when using only 8 points for interpolation. All EMU10K1 sample rate converters use the same technique.

Physical design characteristics

As stated earlier, the EMU10K1 is implemented in a 0.35-micron CMOS process with three metal layers. The 6.7-mm x 6.5-mm die with 2,439,711 transistors uses a 33-MHz PCI clock and internal audio clocks running at 50 and 100 MHz. The chip has 108 signal pins in a 144-pin PQFP. The device dissipates about one watt in normal operation, resulting in a worst-case junction temperature of 105 degrees C, assuming an ambient temperature of 70 degrees C. The core operates at an internal voltage of 3.3 V, but the I/O is 5-V tolerant and supports both 3.3-V and 5-V PCI buses.

Design methodology

We used a VHDL-synthesis method to design the EMU10K1, although we first developed portions of the design using a C-language model and then translated them into VHDL.

Logic verification

Initially, we verified the individual blocks using RTL simulation. As chip-level integration progressed, we ran more simulations to verify the connections between blocks. As we got closer to tape-out, the importance of simulation faded and gave way to emulation. The design complexity, coupled with the numerous asynchronous clock boundaries, called for the use of emulation for final verification. Using the emulator, we could operate the design in an actual PC and run millions of times more cycles than would be possible using simulation alone. We could also sniff out a few nasty asynchronous boundary bugs that would have otherwise made it into silicon. Emulation does have its pitfalls, however; it is an immature technology that often left us feeling frustrated.

Because of the emulator's physical size and its use of FPGAs, the design cannot be run at full speed. To accommodate the low clock rate of the emulated design, we modified the target PC motherboard to reduce its PCI clock speed. We found that a factor of about 1:50 was necessary for reliable operation, so we operated the PCI bus at 0.6 MHz. We then ran the audio codec at the reduced rate, so we could observe analog audio output on an oscilloscope and even perform spectral analysis to detect distortion. These benefits far outweighed the inconveniences we suffered in getting the emulation environment up and running.

To make efficient use of the emulator, we required early software support to write a chip debugger that gave us direct access to the chip's features. The debugging software needed to be a DOS program, since Windows required over an hour to boot. The chip debugger supported macro scripts, so we could build higher-level operations from sequences of commands. That made it easy to perform first-silicon evaluation by reusing the macro scripts written during development.

Timing analysis

We used static timing analysis for the majority of sign-offs. We could not properly analyze certain design sections, so we relied on a combination of SDF-annotated (Standard Delay Format) gate-level simulation and "correct-by-design" practices supported with billions of emulation cycles.

One of the more difficult challenges overcome during timing analysis was fixing setup and hold violations on the internal RAM inhibit pins. We chose to inhibit the RAMs on all unused clock cycles to conserve power, thereby reducing the worst-case junction temperature and increasing our chance of successful silicon. The vendor's RAM inhibit pins worked by gating the clock. This required setup time before the clock's falling edge and hold time after the clock's rising edge, reducing our timing window to much less than 5 ns in some cases. We solved the problem with careful placement and routing, and by tweaking the clock arrival times in layout. Also, satisfying critical PCI timing required careful placement of the PCI core and, in some cases, flip-flop duplication so the outputs could be placed close to the pad cell.

Testability

The chip has several test modes, including scan, parametric NAND tree, and RAM test. To support extremely high production volumes, we used functional test vectors to supplement the scan test and achieve the highest possible fault coverage. We developed the test vectors by running an RTL simulation to record the pin states.

Interestingly, we ran a gate-level simulation only to verify the vectors. We used emulation as our primary logic verification tool, so the primary reason to run the gate-level simulation was to verify that the vectors would not fail because of RTL/gate differences in the modeling of unknowns (X's). The vectors also provided handoff verification at the vendor. We encountered a number of vector mismatches during the handoff that were traced to inconsistencies in the handling of X's by the internal SRAM models in the various simulators involved.

The EMU10K1 is an evolutionary step forward in the development of digital audio for the PC. The choice of placing both a high-quality music synthesizer and a powerful audio effects processor on the same die has helped to push the expectations for 3D audio in the PC.

The role of the 3D audio accelerator has become analogous to that of the 3D graphics accelerator. The demand for more power will continue as game developers create more elaborate virtual worlds that use up processing bandwidth faster than CPU manufacturers can create it.

The 33-MHz, 32-bit PCI bus appears to provide adequate audio bandwidth for now, but it could become a bottleneck in the near future. The EMU10K1's internal arbitration and priority scheme is highly optimized and leaves very little room for increased memory access performance. Newer 0.25-micron and smaller processes present some challenges in terms of I/O voltage, especially considering that most PC motherboards have 5-V-only PCI buses. These system limitations will eventually need to be eliminated to allow further growth and innovation in the PC audio subsystem.

References
1. The Complete MIDI 1.0 Detailed Specification, MIDI Manufacturers Assoc., 1984-1996.
2. J.O. Smith and P. Gossett, "A Flexible Sampling-Rate Conversion Method," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, IEEE Press, Piscataway, N.J., Mar. 1984, pp. 19.4.1-19.4.4.
3. D. Rossum, "The Armadillo Coefficient Encoding Scheme for Digital Audio Filters," Proc. IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1991.
4. D. Rossum, U.S. patent no. 5,111,727, May 12, 1992.
5. D. Rossum, "Constraint Based Audio Interpolators," Proc. IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, 1993.
6. H. Fletcher and W.A. Munson, "Loudness, Definition, Measurement and Calculation," J. Acoustic Soc. of America, 1933, Vol. 6, p. 59.

Thomas C. Savell is a staff ASIC engineer at the Joint E-mu/Creative Technology Center. His responsibilities include digital and audio VLSI specification, architecture, design, implementation, and verification. He holds a BA in music technology from the University of California, San Diego, with a double minor in computer science and cognitive science. He is a member of the IEEE Computer Society, the Signal Processing Society, and the MIDI Manufacturers Association Technical Standards Board for 1997 and 1998.

Direct questions concerning this article to the author at the Joint Emu/Creative Technology Center, 1600 Green Hills Road, Suite 101, PO Box 660015, Scotts Valley, CA 95067-0015; [email protected].
