Audio in VEs
Ruth Aylett

Lectures on Virtual Environment Development L6



Virtual Reality development is taking the world by storm. Follow all 16 lecture notes to learn how to build your own VR experiences. - By Ruth Aylett, Professor of Computer Science at Heriot-Watt University


Page 1: Lectures on Virtual Environment Development L6

Audio in VEs

Ruth Aylett

Page 2

Use of audio in VEs

Important but still under-utilised channel for HCI, including virtual environments.

Speech recognition for hands-free input
All computers now have sound output
– At least a beep
– Usually CD-quality stereo sound

Conventional stereo places a sound in any spot between the left and right loudspeakers.
– In true 3D sound, the source can be placed in any location: right or left, up or down, near or far.

Page 3

Potential uses

Associate sounds with particular events
Associate sounds with static objects
Associate sound with the motion of an object
Use localised sound to attract attention to an object
Use ambient sounds to add to the feeling of immersion
Use sounds to add to the feeling of realism
Use speech to communicate with devices or avatars
Use sound as a warning or alarm signal

Page 4

Overall impact

High quality audio provides:
– Increased realism
• Reinforces visuals
– A strong immersive sense
• The world exists beyond the part that is seen
– Strong positional cues
– Extra information about the environment
– The shape of the world

What does ‘High Quality’ mean?

Page 5

VR sound environment

VR equipment creates a difficult sound environment.

CAVE
– Stand in a glass box
– Pretend you can’t hear the echoes
– Also hard to place speakers

Semi-immersive VR theatre
– Sit in a big cylinder
– Reflects sound in a very strange way.

Page 6

Workbench

Not so bad, but still...
– Big flat screen 1 metre in front of you
– Sound coming from surround speakers
– Creates echoes inappropriate for the scene

Much VR audio is based on very high quality headphones
– Use head tracking to get position and orientation
– Play to the user
– Problem solved! Well, no

Page 7

Stereo sound

[Diagram: a monaural source sent to the L and R speakers produces a phantom source between them]

In the entertainment industry, stereo was the first successful commercial product involving spatial sound.

To place a sound on the left, send its signal to the left loudspeaker; to place it on the right, send its signal to the right loudspeaker.

Page 8

Stereo techniques

If the same signal is sent to both speakers*, the phantom source seems to originate from a point midway between them.

Crossfading the signal from one speaker to the other gives the impression of the source moving continuously between the two positions.

Simple crossfading cannot create the impression of a source outside the line segment between the speakers.

Can also shift the location of the phantom source by exploiting the precedence effect (delay).

*and if the speakers are wired "in phase", the listener is more or less midway between the speakers, and the room is not too acoustically irregular
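Crossfading can be sketched in a few lines of code (an illustrative example, not from the lecture; the function name and the constant-power pan law are my own choices). A constant-power law keeps the perceived loudness roughly steady as the phantom source sweeps between the speakers:

```python
import math

def constant_power_pan(pan):
    """Constant-power pan law.

    pan is in [-1, 1]: -1 = hard left, +1 = hard right, 0 = centre.
    Returns (left_gain, right_gain) to apply to the same mono signal.
    Equal gains at pan = 0 place the phantom source midway between
    the speakers; sweeping pan crossfades it between them.
    """
    angle = (pan + 1.0) * math.pi / 4.0   # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

# Centre: equal signal to both speakers gives a phantom source midway
left, right = constant_power_pan(0.0)
```

As the slide notes, no choice of gains here can place the source outside the segment between the two speakers.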

Page 9

A world of sound

You are surrounded by sound all the time
– Silence is unheard of!

The environment affects (shapes?) the sound you hear
– Size, shape, materials

Page 10

Rendering sound: auralisation

To generate correct echoes we must model sound behaviour in the space
– Rooms are complex
– Filled with different materials
• Reflective, absorbent, frequency-filtering

Just like rendering light

Page 11

Simulation of Room Characteristics

[Diagram: direct sound, early reflections, and the reverberant field]
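The three components can be illustrated with a toy impulse response (a hypothetical Python sketch, not a real auralisation model; all delay times and gains are invented): an initial impulse for the direct sound, a few discrete early reflections, and a decaying noise tail for the reverberant field:

```python
import random

def toy_room_impulse_response(fs=8000, length_s=0.5):
    """Toy room impulse response with the three components shown:
    direct sound, discrete early reflections, reverberant field.
    Delay times and gains are made up purely for illustration."""
    n = int(fs * length_s)
    h = [0.0] * n
    h[0] = 1.0                                 # direct sound
    for delay_ms, gain in [(12, 0.6), (23, 0.45), (31, 0.3)]:
        h[int(fs * delay_ms / 1000)] += gain   # early reflections
    rng = random.Random(0)
    start = int(fs * 0.05)                     # diffuse tail begins ~50 ms in
    for i in range(start, n):                  # reverberant field: decaying noise
        h[i] += rng.gauss(0.0, 0.2) * 0.5 ** (i / (fs * 0.1))
    return h
```

Convolving a dry source signal with such a response is what auralisation does: it adds the room's echoes and reverberation to the sound.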

Page 12

Over time

Page 13

Putting sound into a VE

What sound?

‘Ambient’ sounds
– ‘Surround’ sound
– Often use recorded sounds

Positional sounds
– Designed to give a strong sense of something happening in a particular place
– Also often provided by using recorded sounds

Page 14

Positional sound

Using sound to create the sense of active things in the environment
– Enhances presence
– Enhances immersion

Need to deal with many components
– Reflections (echoes)
– Diffraction effects

Page 15

City models

VisClim: scene in Linköping’s Storatorget

Surrounding environment
– Vehicles?
• Several roads nearby
– People?
• Many people in the square
– Weather noise effects
• Rainfall
• Snowfall (no sound, but a damping effect)

Page 16

VisClim

Page 17

Air Traffic control

Simulation
– No ‘ambient’ sound required
– No aircraft noises
– No realism wanted?

Positional warnings?
– Designed to draw the user’s attention to the location of a problem
– Which may be out of the field of view

Page 18

What it looked like

Page 19

Creating positional sound

Amplitude– (or more)

Synchronisation– Audio delays

Frequency– Head-Related Transfer Function (HRTF)

Page 20

Amplitude

Generate audio from positioned sources
Calculate amplitude from distance
Include damping factors
– Air conditions
– Snow
– Directional effect of the ears
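A minimal distance-amplitude sketch (Python; the function name and the simple 1/d law are my own illustrative choices, with an optional per-metre damping term standing in for effects like air conditions or snow):

```python
def distance_gain(distance_m, extra_damping_db_per_m=0.0):
    """Amplitude gain for a source at the given distance.

    Uses the inverse-distance law (amplitude falls as 1/d, i.e. 6 dB
    per doubling of distance) relative to a 1 m reference, plus an
    optional per-metre damping term in dB for media such as snowy
    ground or humid air. Values are invented for illustration.
    """
    d = max(distance_m, 1.0)   # clamp inside the 1 m reference distance
    gain = 1.0 / d             # 1/d amplitude falloff
    gain *= 10.0 ** (-extra_damping_db_per_m * d / 20.0)
    return gain
```

A directional ear model would multiply in a further angle-dependent factor; the slide lists it as one more damping term.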

Page 21

Synchronisation

Ears are very precise instruments
Very good at hearing when something happens after something else
– Sound travels slowly (c. 340 m/s in air): a different distance to each ear
Use this to help define direction
– Difference in amplitude gives only very approximate direction information

Page 22

Speed effect

30 centimetres ≈ 0.0009 seconds (0.3 m ÷ 340 m/s)

Humans can hear differences of ≤ 700 µs
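The interaural time difference behind these numbers can be sketched as follows (Python; the simple d·sin(azimuth) path-difference model is an assumption for illustration and ignores the head's actual shape):

```python
import math

SPEED_OF_SOUND = 340.0  # m/s in air, as quoted in the lecture

def interaural_time_difference(azimuth_deg, ear_separation_m=0.3):
    """Interaural time difference for a distant source.

    azimuth_deg: 0 = straight ahead, +90 = directly to the right.
    The extra path to the far ear is roughly d * sin(azimuth), so
    the delay is that path difference over the speed of sound.
    """
    path_diff_m = ear_separation_m * math.sin(math.radians(azimuth_deg))
    return path_diff_m / SPEED_OF_SOUND   # seconds; sign encodes the side
```

A source directly to one side of a 30 cm head gives about 0.00088 s, comfortably above the roughly 700 µs resolution quoted on the slide.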

Page 23

What is 3D sound?

Able to position sounds all around a listener.
Sounds created by loudspeakers/headphones are perceived as coming from arbitrary points in space.

Conventional stereo systems generally cannot position sounds to the side, rear, above, or below.

Some commercial products claim 3D capability, e.g. stereo multimedia systems marketed as having “3D technology”. But this is usually untrue.

Page 24

3D positional sound

Humans have stereo ears
Two sound pulse impacts
– One difference in amplitude
– One difference in time of arrival

How is it that a human can resolve sound in 3D?
– Should only be possible in 2D?

Page 25

Frequency

Frequency responses of the ears change in different directions
– Role of the pinnae
– You hear a different frequency filtering in each ear
– Use that data to work out 3D position information

Page 26

Head-Related Transfer Function

Unconscious use of time delay, amplitude difference, and tonal information at each ear to determine the location of the sound.
– Known as sound localisation cues.
– Sound localisation by human listeners has been studied extensively.

The transformation of sound from a point in space to the ear canal can be measured accurately
– Head-Related Transfer Functions (HRTFs).

Measurements are usually made by inserting miniature microphones into the ear canals of a human subject or a manikin.

Page 27

HRTFs

HRTFs are 3D
– Depend on ear shape (pinnae) and the resonant qualities of the head!
– Allows positional sound to be 3D

Computationally difficult
– Originally done in special hardware (the Convolvotron)
– Can now be done in real time using DSP

Page 28

HRTFs

First series of HRTF measurement experiments in 1994 by Bill Gardner and Keith Martin, Machine Listening Group at MIT Media Lab.

Data from these experiments made available for free on the web.

Picture shows Gardner and Martin with the dummy used for the experiment, called a KEMAR dummy.

A measurement signal is played by a loudspeaker and recorded by the microphones in the dummy head.

Page 29

HRTFs

Recorded signals are processed by computer, which derives two HRTFs (left and right ears) corresponding to the sound source location.
– An HRTF typically consists of several hundred numbers
– describes the time delay, amplitude, and tonal transformation for a particular sound source location to the left and right ears of the subject.

The measurement procedure is repeated for many locations of the sound source relative to the head
– giving a database of hundreds of HRTFs describing the sound transformation characteristics of a particular head.

Page 30

HRTFs

Mimic the process of natural hearing
– reproducing the sound localisation cues at the ears of the listener.

Use a pair of measured HRTFs as the specification for a pair of digital audio filters.

Sound signal processed by the digital filters and listened to over headphones
– Reproduces the sound localisation cues for each ear
– the listener should perceive the sound at the location specified by the HRTFs.

This process is called binaural synthesis (binaural signals are defined as the signals at the ears of a listener).
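A minimal sketch of binaural synthesis (Python; the toy two- and three-tap "HRIRs" are invented for illustration, whereas real measured HRTFs have several hundred coefficients per ear):

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution (an FIR digital filter)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one mono source with the left- and right-ear impulse
    responses (time-domain HRTFs) measured for the desired source
    location; the resulting pair is played over headphones."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy example: for a source on the listener's left, the right ear
# hears a delayed, quieter copy (filter values are invented).
left, right = binaural_synthesis([1.0, 0.5], [1.0], [0.0, 0.0, 0.6])
```

In practice the convolution is done with DSP hardware or FFT-based filtering; the direct loop above is only to show what the filters compute.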

Page 31

HRTFs

You should be able to describe the process involved ingenerating true 3D audio using HRTFs.

Page 32

The problem

Rendering audio is really, really hard
Much bigger problem than lighting
Material properties are more complex
– Can’t fake it as easily
– Properties are always a problem

Good methods exist, but the problem is too computationally hard for these to be in general use at present

Page 33

What is possible now

The constraint is real-time audio rendering
– Must adapt to a dynamic user who moves unpredictably

Simple (reflectionless) stereo positional sound
– Using amplitude
– Using synchronisation
– Using HRTF frequency filtering

Useful for audio cues and simple environmental sounds

Page 34

What about surround sound?

The principal format for digital discrete surround is the "5.1 channel" system.

The 5.1 name stands for five channels (in front: left, right and centre; behind: left surround and right surround) of full-bandwidth audio (20 Hz to 20 kHz)
– a sixth channel at times contains additional bass information to maximise the impact of scenes such as explosions, etc.
– This channel has a narrow frequency response (3 Hz to 120 Hz), and is thus sometimes referred to as the ".1" channel.

Page 35

What about surround sound?

Surround sound systems are NOT true 3D audio systems - just a collection of more speakers.

Various commercial surround sound formats - for home entertainment, Dolby is the big name.
– Dolby Surround Digital.

Lots of other proprietary approaches - e.g. the BattleChair (pictured).

Page 36

Dolby Headphone

Dolby Headphone: based on a proprietary algorithm, presumably similar to HRTFs, originally developed by the Australian company Lake Technology
– attempts to produce convincing surround-sound effects through ordinary stereo headphones.

Technology originally developed for VR and tele-conferencing applications but not marketed for consumer applications.

A more genuine 3D audio system was developed by the UK company Sensaura.

Page 37

Voice interaction

Voice input for control– Continuous? Discrete?

Voice output for information– Positional - alerts– Non-positional - ‘voice over’– Character-based - social channel

Page 38

Voice output

Voice synthesis
– Computer strings together a set of phonemes (basic language sound units)
– Problems with articulation: sounds robotic

Unit selection voices
– Uses a large database created from real voices
– Plus a sophisticated algorithm for putting the bits together
– Good results, but needs very large memory (1 gigabyte) to hold the database
– Takes lots of time and expertise to create a ‘voice’