44
CS1315: Introduction to Media Computation Sound Encoding and Manipulation

CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Embed Size (px)

Citation preview

Page 1: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

CS1315:Introduction to Media Computation

Sound Encoding and Manipulation

Page 2: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Sound is made when something vibrates

The vibration disturbs the air around it and makes changes in air pressure

These changes in air pressure move through the air as sound waves

The sound waves cause pressure changes against our ear drum sending nerve impulses to our brain

Page 3: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Sound waves are pressure waves

Each air molecule oscillates back and forth, affecting their neighbors…air molecules themselves don’t travel from vibration source to your ear

Creates areas of high and low pressure, called compressions and rarefactions

Page 4: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Sound waves are longitudinal

Representation of sound by a sine wave is merely an attempt to illustrate the sinusoidal nature of the pressure-time fluctuations

Page 5: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Properties of sound waves

Pressure fluctuation comes in cycles frequency of wave is number of cycles per second (cps), or Hertz

(Complex sounds have more than one frequency in them.) amplitude is maximum height of the wave (aka maximum pressure

fluctuation)

Each repetition of a waveform is called a cycle

Page 6: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Volume and pitch: Psychoacoustics, the psychology of sound

Our perception of pitch is related (logarithmically) to frequency Higher frequencies are perceived as higher pitches A above middle C is 440 Hz

Our perception of volume is related (logarithmically) to changes in amplitude If the amplitude doubles, it’s about a 3 decibel (dB) change

Page 7: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Humans and Pitch

In general we can hear frequencies between 20 Hz and 20,000 Hz

Hz = hertz = cycles per second 20,000 Hz = 20 kHz (kilohertz) The older we get, the less we can perceive really high

frequencies – recall those commercials for ‘silent’ ring-tones for teens

Many animals can make sounds and hear frequencies that are beyond what we can hear – hence the creation of dog whistles

Page 8: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

“Logarithmically?”

It’s strange, but our hearing works on ratios not differences, e.g., for pitch. We hear the difference between 200 Hz and 400 Hz, as

the same as 500 Hz and 1000 Hz Similarly, 200 Hz to 600 Hz, and 1000 Hz to 3000 Hz

Intensity (volume) is measured as watts per meter squared A change from 0.1W/m2 to 0.01 W/m2, sounds the same

to us as 0.001W/m2 to 0.0001W/m2

Page 9: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Decibel is a logarithmic measure

A decibel is a ratio between two intensities: 10 * log10(I1/I2) As an absolute measure, it’s in comparison to threshold

of audibility 0 dB can’t be heard. Normal speech is 60 dB. A shout is about 80 dB

Page 10: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Digitizing Sound: How do we get that into numbers? Waves are analog (continuous) Remember in calculus,

estimating the curve by creating rectangles?

We can do the same to estimate the sound curve Analog-to-digital conversion

(ADC) will give us the amplitude at an instant as a number: a sample

How many samples do we need?

Remember…pictures are continuous. How do we represent them digitally?

Page 11: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Understanding how computers represent sound Consider how a film represents motion…

Movie is made by taking still photos in rapid sequence at a constant rate, usually twenty-four frames per second

When the photos are displayed in sequence at that same rate, it fools us into thinking we are seeing continuous motion, even though we are actually seeing twenty-four discrete images per second

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 12: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

http://en.wikipedia.org/wiki/Animation

A collection of still frames

Page 13: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 14: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 15: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 16: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 17: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 18: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 19: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 20: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 21: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 22: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 23: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 24: CS1315: Introduction to Media Computation Sound Encoding and Manipulation
Page 25: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

How computers represent sound

Digital recording of sound works on the same principle We take many discrete samples of the sound wave's

instantaneous amplitude, store that information, then later reproduce those amplitudes at the same rate to create the illusion of a continuous wave

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 26: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

How often should we take samples?

Many many samples must be taken per second--many more than are necessary for filming a visual image

In fact, we need to take more than twice as many samples as the highest frequency we wish to record…enter Nyquist Theorem

The number of samples taken per second is known as the sampling rate

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 27: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Nyquist Theorem* (will be on exam)

We need twice as many samples as the maximum frequency in order to represent (and recreate, later) the original sound.

The number of samples recorded per second is the sampling rate If we capture 8000 samples per second, the highest frequency

we can capture is 4000 Hz That’s how phones work

CD quality is 44,100 samples per second…what is the max frequency a CD can represent?

Page 28: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Nyquist Theorem

Think back to the example of a film camera, which shoots 24 frames per second If we're shooting a movie of a car, and the car wheel spins at a

rate greater than 12 revolutions per second, it's exceeding half the "sampling rate" of the camera… the wheel completes more than 1/2 revolution per frame.

If, for example it actually completes 18/24 of a revolution per frame, it will appear to be going backward at a rate of 6 revolutions per second

In other words, if we don't witness what happens between samples, a 270-degree revolution of the wheel is indistinguishable from a -90-degree revolution. The samples we obtain in the two cases are precisely the same

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 29: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Nyquist Theorem For audio sampling, the phenomenon is

practically identical Consider a graph of a 4,000 Hz cosine wave,

being sampled at a rate of 22,050 Hz Sample is taken every 0. 045 milliseconds

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 30: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Nyquist Theorem Consider same 4,000 Hz cosine wave sampled at

6,000 Hz Sample is taken every 0. 167 milliseconds Wave completes more than 1/2 cycle per sample, and

the resulting samples are indistinguishable from those that would be obtained from a 2,000 Hz cosine wave

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 31: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Nyquist Theorem

The simple lesson to be learned from the Nyquist theorem: digital audio cannot accurately represent any frequency greater than half the sampling rate

If we want to record frequencies as high as 20,000 Hz, how many times per second would we need to sample the sound?

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 32: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Digitizing sound with the computer: bit depth Each sample is a numerical value representing the

instantaneous amplitude of the signal at the moment it was sampled

The range of possible numbers used by a computer depends on the number of binary digits (bits) used to store each number

As the number of bits increases, the range of possible numbers they can express increases by a power of two

How many bits did we use per color channel per pixel? How many values could each color channel take on?

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 33: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Digitizing sound with the computer

Each sample is stored as a number (two bytes=16 bits) What’s the range of available combinations?

16 bits, 216 = 65,536 But we want both positive and negative values

To indicate compressions and rarefactions. What if we use one bit to indicate positive (0) or negative (1)? That leaves us with 16 - 1 = 15 bits 15 bits, 215 = 32,768 One of those combinations will stand for zero

We’ll use a “positive” one, so that’s one less pattern for positives

Page 34: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

+/- 32K

Each sample can be between -32,768 and 32,767

Compare this to 0..255 for light intensity

(i.e. 8 bits or 1 byte giving us 256 different values)

Why such a bizarre number?

Because 32,768 + 32,767 + 1 = 216

i.e. 16 bits, or 2 bytes< 0 > 0 0

Page 35: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Sounds as arrays (called lists in jython) Samples are just stored one right after the other in the

computer’s memory That’s called an array

It’s an especially efficient (quickly accessed) memory structure

(Like pixels in a picture)

Page 36: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Working with sounds

We’ll use pickAFile and makeSound. We want .wav files (NOTE: not all .wav files will work)

We’ll use getSamples to get all the sample objects out of a sound

We can also get the value at any index with getSampleValueAt

Can get a sound’s length (getLength(sound)) Can get a sound’s sampling rate

(getSamplingRate(sound)) Can save sounds with writeSoundTo(sound,"file.wav")

Page 37: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Demonstrating Working with Sound in JES

>>> filename = pickAFile()>>> print filename/Users/monica/Desktop/MediaSources/preamble.wav>>> sound = makeSound(filename)>>> print soundSound of length 421109>>> samples = getSamples(sound)>>> print samplesSamples, length 421109>>> print getSampleValueAt(sound, 1)36>>> print getSampleValueAt(sound, 2)29

Page 38: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Demonstrating working with samples>>> print getLength(sound)220568>>> print getSamplingRate(sound)22050.0>>> print getSampleValueAt(sound, 220568)68>>> print getSampleValueAt(sound, 220570) # note this is too highI wasn't able to do what you wanted.The error java.lang.ArrayIndexOutOfBoundsException has occuredPlease check line 0 of >>> print getSampleValueAt(sound, 1)36>>> setSampleValueAt(sound,1, 12)>>> print getSampleValueAt(sound, 1)12

Page 39: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Working with Samples

We can get sample objects out of a sound with getSamples(sound) or getSampleObjectAt(sound, index)

A sample object it still part of the sound object, so if you change the sample object, the sound changes.

Sample objects understand getSampleValue(sample) and setSampleValue(sample, value)

Page 40: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Example: Manipulating Samples>>> soundfile=pickAFile()>>> sound=makeSound(soundfile)>>> sample=getSampleObjectAt(sound, 1)>>> print sampleSample at 1 value at 59>>> print soundSound of length 387573>>> print getSound(sample)Sound of length 387573>>> print getSampleValue(sample)59>>> setSampleValue(sample, 29)>>> print getSampleValue(sample)29

Page 41: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

“But there are thousands of these samples!” How do we do something to these samples to

manipulate them, when there are thousands of them per second?

We use a loop and get the computer to iterate in order to do something to each sample.

An example loop that just gets and reassigns the same value:

for sample in getSamples(sound): value = getSampleValue(sample) setSampleValue(sample, value)

Page 42: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

N.B.: function name changes

setSampleValue(sample, 200) used to be called setSample(sample, 200)

getSampleValue(sample) used to be called getSample(sample)

So if you are using an old version of JES (pre-3.1) you may have to use setSample and getSample.

Page 43: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Properties of sound

Frequency Amplitude Wavelength

http://music.arts.uci.edu/dobrian/digitalaudio.htm

Page 44: CS1315: Introduction to Media Computation Sound Encoding and Manipulation

Properties of digital sound

Sampling rate/frequency – not to be confused with the frequency of the sound we are trying to capture!

Bit depth

http://music.arts.uci.edu/dobrian/digitalaudio.htm