
Multimedia Systems

Chapter 3: Audio and Video Technology

Audio

• Audio is a wave resulting from an air pressure disturbance that reaches our eardrum, generating the sound we hear.
  – Humans can hear frequencies in the range 20-20,000 Hz.
• ‘Acoustics’ is the branch of physics that studies sound.

Facsimile Technology

All modes of mass communication are based on the process of facsimile technology. That is, sounds from a speaker and pictures on a TV screen are merely representations, or facsimiles, of their original form.

In general, the more faithful the reproduction or facsimile is to the original, the greater its fidelity. High-fidelity audio, or hi-fi, is a close approximation of the original speech or music it represents. And a videocassette recorder marketed as high fidelity boasts better picture quality than a VCR without hi-fi (known as H-Q, to distinguish video high fidelity from its audio counterpart).

The second point about facsimile technology is that in creating their facsimiles, radio and TV are not limited to plaster of Paris, crayon, oils, or even photographic chemicals and film. Instead, unseen elements such as radio waves, beams of light, and digital bits and bytes are utilized in the process.

Bear in mind that the engineer’s goal in radio, TV, and cable is to:

• create the best possible facsimile of the original sound or image;

• transport that facsimile without losing too much fidelity (degradation known as signal loss); and

• re-create that sound or image as closely as possible to its original form. Today, engineers use both analog and digital systems to transport images and sounds, but more and more we are switching to digital transmission.

Transduction

• Another basic concept is transduction, the process of changing one form of energy into another: when the telephone operator says “the number is 555-2796” and you write it down on a sheet of notepaper, you have transduced sound energy into a written record.

Why does this matter?

Getting a sound or picture from a TV studio or concert hall to your home usually involves at least three or four transductions. At each phase, loss of fidelity is possible and must be controlled. With our current system of broadcasting, it is possible that at any phase the whole process may break down into noise (unwanted interference), rendering the communication impossible.

Sound and Audio

• Sound refers to the ability of vibrations to pass through a medium and reflect off surfaces; audio is sound created digitally using electronic equipment.

Sound is a continuous wave that travels through air. The wave itself is composed of pressure differences. Detection of sound is accomplished by measuring these pressure levels and their succession in time. The human ear does this detection naturally when the wave, with its pressure differences, impinges on the eardrum.

• The properties of sound include: frequency, wavelength, wave number, amplitude, sound pressure, sound intensity, speed, and direction. The speed of sound differs depending on the medium through which it travels.

The frequency refers to the rate at which the wave repeats. It is expressed in cycles per second, or hertz (Hz). The human ear is capable of perceiving wave frequencies in the range 20 Hz to 20 KHz, which is audio in nature. The amplitude is a measure of the displacement of the wave from the mean. For human perception this is related to, but not the same as, loudness.

[Figure: air pressure vs. time for one particular frequency component; the amplitude and one period of the wave are labeled.]

The wavelength of a sound is the distance the disturbance travels in one cycle and is related to the sound’s speed and frequency.
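Written out (the standard relation, with the numbers added here for illustration): wavelength = speed / frequency. At a sound speed of 344 m/s, a 20 Hz tone has a wavelength of 344/20 ≈ 17.2 m, and a 20 KHz tone has a wavelength of 344/20000 ≈ 1.7 cm.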

However, in order to store this input in a computer, one has to convert it to a digital form, that is, into 0s and 1s. Further, a continuous wave has infinite resolution, which cannot be represented in a computer.

Waveform Representation

[Diagram: Audio Generation and Playback – Audio Source → Audio Capture → Sampling & Digitization → Storage or Transmission → Receiver (Digital to Analog) → Playback (speaker) → Human Ear]

SIGNAL GENERATION

• This step involves the creation of the necessary oscillations, or detectable vibrations of electrical energy, which correspond to the frequencies of their original counterparts in nature. In plain language, signal generation involves getting the sound vibrations into a microphone, or the bits and bytes onto a CD, a DVD, or an MP3 player.

Audio Signal Generation

• Sound signals are generated by two main transduction processes: mechanical and electronic. Mechanical methods, like microphones, phonograph records, and tape recorders, have been in use for many years.

Mechanical Methods: Mechanical generation

• Mechanical means are used to translate sound waves into a physical form, one you can hold in your hand, like a phonograph record or an audiocassette.

Inside the microphone

• One place where speech or music is mechanically re-created to produce electrical signals is inside a microphone. There are three basic types of microphones: dynamic, velocity, and condenser. Each produces the waveforms required for transmission in a different manner.

Dynamic Microphone

• In the center of the microphone is a coil of electrical wire, called a voice coil. Sound pressure vibrates the diaphragm, which moves the voice coil up and down between the magnetic poles, inducing an electrical signal that mirrors the sound wave.

Digitization

• Digitization is achieved by recording, or sampling, the continuous sound wave at discrete points. The more frequently one samples, the closer one gets to capturing the continuity of the wave.


Principles of Digitization

• Sampling: Divide the horizontal axis (time) into discrete pieces.

• The other aspect of digitization is the measurement of the voltages at these discrete sampling points. As it turns out, these values may be of arbitrary precision; that is, we could have values containing small fractions or decimal numbers that take more bits to represent.

• Quantization: Divide the vertical axis (signal strength, i.e. voltage) into pieces. For example, 8-bit quantization divides the vertical axis into 256 levels; 16-bit quantization gives you 65536 levels. The lower the quantization, the lower the quality of the sound.

Coding

• The process of representing quantized values digitally.
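As a rough sketch of how sampling, quantization, and coding fit together (the tone frequency, sampling rate, and bit depth below are illustrative choices, not values from the text):

```python
import math

SAMPLE_RATE = 8000   # samples per second (Hz); an illustrative choice
BITS = 3             # 3-bit quantization -> 2**3 = 8 levels
LEVELS = 2 ** BITS

def source(t):
    """A stand-in analog source: a 440 Hz sine wave in [-1, 1]."""
    return math.sin(2 * math.pi * 440 * t)

codes = []
for n in range(8):
    t = n / SAMPLE_RATE                        # sampling: discrete time points
    x = source(t)
    # Quantization: map the continuous value in [-1, 1] to one of 8 levels.
    level = min(LEVELS - 1, int((x + 1) / 2 * LEVELS))
    # Coding: represent the quantized level digitally as a 3-bit string.
    codes.append(format(level, "03b"))

print(codes)  # first sample (t = 0) is mid-scale: '100'
```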


Sampling and Quantization

[Figure: two plots of sample value vs. time – left, sampling of the waveform; right, the same samples after 3-bit quantization.]

• Sampling rate: the number of samples per second (measured in Hz).
• E.g., CD standard audio uses a sampling rate of 44,100 Hz (44,100 samples per second).
• 3-bit quantization gives 8 possible sample values.
• E.g., CD standard audio uses 16-bit quantization, giving 65536 values.

Why Quantize? To Digitize!

Quantizing

• Instead of sending the actual sample, the sampled signal is first mapped to a known number of levels, which is agreed upon with the receiver.

• Suppose that instead of sending a whole range of voltages, the source informs the destination that it will send only 4 voltage levels, say 0-3 V. For example, if the sample is 2.7 V, the source first converts it into a 3 V sample, which is then sent through the transmission medium. Suppose the destination receives a sample of 3.3 V. It immediately knows that this is not an agreed level, hence the sent value was changed in transit, and it converts the 3.3 V sample back into 3 V.
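The 4-level example above can be sketched in a few lines of Python (the helper name nearest_level is mine, not from the text):

```python
LEVELS = [0.0, 1.0, 2.0, 3.0]       # the four agreed voltage levels (0-3 V)

def nearest_level(v):
    """Snap a voltage to the closest agreed level."""
    return min(LEVELS, key=lambda level: abs(level - v))

sent = nearest_level(2.7)           # source quantizes 2.7 V to 3.0 V
received = 3.3                      # noise on the channel: 3.3 V arrives
restored = nearest_level(received)  # destination snaps it back to 3.0 V
print(sent, received, restored)     # 3.0 3.3 3.0
```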

Linear Quantization

• With linear quantization, every increment in the sampled value corresponds to a fixed-size analogue increment. E.g., an 8-bit A-D or D-A converter with a 0-1 V analogue range has 1/256 = 3.9 mV per bit, regardless of the actual signal amplitude.

Non-linear Quantization

• With non-linear quantization you normally have some sort of logarithmic encoding, so that the increment for small sample values is much smaller than the increment for large sample values. Ideally, the step size should be roughly proportional to the sample size.
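One widely used logarithmic scheme of this kind is mu-law companding, standard in digital telephony; the sketch below is illustrative and not taken from the text:

```python
import math

MU = 255  # the mu value used for 8-bit telephone audio

def mu_law(x):
    """Compress a sample in [-1, 1] logarithmically."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# The same input step of 0.01 moves the output far more near zero than
# near full scale, i.e. small samples get much finer step sizes:
print(mu_law(0.02) - mu_law(0.01))  # ~0.098
print(mu_law(0.99) - mu_law(0.98))  # ~0.0018
```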

Quality of voice

• The quality of voice transmission is measured by the signal-to-noise (S/N) ratio: the original signal value divided by the change made when quantizing.

• S/N ratios were calculated (on the accompanying slide) for two signals, one high and one low, quantized with the same step size.

Linear Quantizing Is Not Used. Why?

• It is noticeable that even though the quantization noise is the same for both these signals, the S/N ratios are very different: linear quantizing gives high S/N ratios for high-level signals and low S/N ratios for low-level signals.
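A quick numerical sketch (parameters are illustrative) shows the effect: the same 8-bit linear quantizer gives a full-scale sine a much better S/N ratio than a quiet one:

```python
import math

STEP = 2.0 / 256  # step size of an 8-bit linear quantizer over [-1, 1]

def snr_db(amplitude, n=10000):
    """S/N ratio in dB for a sine of the given amplitude."""
    signal_power = noise_power = 0.0
    for k in range(n):
        x = amplitude * math.sin(2 * math.pi * k / n)
        q = round(x / STEP) * STEP   # fixed step, whatever the amplitude
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)

print(snr_db(1.0))   # loud signal:  roughly 50 dB
print(snr_db(0.01))  # quiet signal: roughly 10 dB
```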

Nyquist Theorem

• Any analog signal consists of components at various frequencies. The simplest case is the sine wave, in which all the signal energy is concentrated at one frequency. In practice, analog signals usually have complex waveforms, with components at many frequencies. The highest frequency component in an analog signal determines the bandwidth of that signal: the higher the frequency, the greater the bandwidth, if all other factors are held constant.

• Suppose the highest frequency component, in hertz, for a given analog signal is fmax. According to the Nyquist Theorem, the sampling rate must be at least 2fmax, or twice the highest analog frequency component. The sampling in an analog-to-digital converter is actuated by a pulse generator (clock). If the sampling rate is less than 2fmax, some of the highest frequency components in the analog input signal will not be correctly represented in the digitized output.


Nyquist Theorem

• For lossless digitization, the sampling rate should be at least twice the maximum frequency component.

[Figure: a sine wave sampled once per cycle appears as a constant signal; sampled 1.5 times per cycle, it appears as a low-frequency sine signal.]
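The same effect can be shown numerically. In this sketch (illustrative frequencies), a 1000 Hz sine is sampled well above, at, and just above its cycle rate:

```python
import math

def sampled(f_signal, f_sample, n=8):
    """First n samples of a sine of frequency f_signal taken at f_sample."""
    return [round(math.sin(2 * math.pi * f_signal * k / f_sample), 3)
            for k in range(n)]

print(sampled(1000, 8000))  # 8x the signal frequency: the sine is captured
print(sampled(1000, 1000))  # once per cycle: all zeros, a constant signal
print(sampled(1000, 1500))  # 1.5x per cycle: aliases to a 500 Hz sine
```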

Characteristics of Audio

• Audio has normal wave properties:
  – Reflection
  – Refraction
  – Diffraction
• A sound wave has several different properties:
  – Amplitude (loudness/intensity)
  – Frequency (pitch)
  – Envelope (waveform)
• Refraction occurs when a wave crosses a boundary from one medium to another; a wave entering a medium at an angle will change direction.
• Diffraction refers to the "bending of waves around an edge" of an object; diffraction depends on the size of the object relative to the wavelength of the wave.

Decibel (dB)

• The decibel (dB) is a logarithmic unit used to describe a ratio. The ratio may be of power, voltage, intensity, or several other quantities.

• Suppose we have two loudspeakers, the first playing a sound with power P1, and another playing a louder version of the same sound with power P2, but everything else (how far away, frequency) kept the same.

• The difference in decibels between the two is given by 10 log (P2/P1) dB

• If the second produces twice as much power as the first, the difference in dB is 10 log (P2/P1) = 10 log 2 = 3 dB.

• If the second had 10 times the power of the first, the difference in dB would be 10 log (P2/P1)= 10 log 10 = 10 dB.

• If the second had a million times the power of the first, the difference in dB would be 10 log (P2/P1) = 10 log 1000000 = 60 dB.
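These three cases can be checked with a one-line helper (a sketch, not from the text):

```python
import math

def power_ratio_db(p2, p1):
    """Difference in decibels between two sound powers."""
    return 10 * math.log10(p2 / p1)

print(power_ratio_db(2, 1))          # double the power:   ~3 dB
print(power_ratio_db(10, 1))         # 10 times the power: 10 dB
print(power_ratio_db(1_000_000, 1))  # a million times:    60 dB
```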

What happens when you halve the sound power?

• The log of 2 is 0.3, so the log of 1/2 is -0.3. So, if you halve the power, you reduce the power and the sound level by 3 dB. Halve it again (down to 1/4 of the original power) and you reduce the level by another 3 dB. That is exactly what was done in the first graphic and sound file accompanying this example.

The first sample of sound is white noise (a mix of all audible frequencies, just as white light is a mix of all visible frequencies). The second sample is the same noise, with the voltage reduced by a factor of the square root of 2. 2^(-0.5) is approximately 0.7, so -3 dB corresponds to reducing the voltage or the pressure to 70% of its original value.

How big is a decibel?

• One decibel is close to the Just Noticeable Difference (JND) for sound level. As you listen to these files, you will notice that the last is quieter than the first, but it is rather less clear to the ear that the second of any pair is quieter than its predecessor. 10*log10(1.26) = 1, so to increase the sound level by 1 dB, the power must be increased by 26%, or the voltage by 12%.

Standard reference levels ("absolute" sound level)

• When the decibel is used to give the sound level for a single sound rather than a ratio, a reference level must be chosen. For sound pressure in air, the reference level is usually chosen as 20 micropascals, or 0.02 mPa.


• This is very low: it is 2 ten billionths of an atmosphere. Nevertheless, this is about the limit of sensitivity of the human ear, in its most sensitive range of frequency. Usually this sensitivity is only found in rather young people or in people who have not been exposed to loud music or other loud noises.

• Personal music systems with in-ear speakers ('walkmans') are capable of very high sound levels in the ear, and are believed by some to be responsible for much of the hearing loss in young adults in developed countries.

• So if you read of a sound pressure level of 86 dB, it means that

  20 log (p2/p1) = 86 dB

• where p1 is the sound pressure of the reference level, and p2 that of the sound in question. Divide both sides by 20:

  log (p2/p1) = 4.3, so p2/p1 = 10^4.3

• 4 is the log of 10 thousand and 0.3 is the log of 2, so this sound has a sound pressure 20 thousand times greater than that of the reference level (p2/p1 = 20,000). 86 dB is a loud but not dangerous level of sound, if it is not maintained for very long.

What does 0 dB mean?

• This level occurs when the measured sound pressure is equal to the reference level, i.e., it is the sound level corresponding to 0.02 mPa. In this case we have sound level = 20 log (p_measured/p_reference) = 20 log 1 = 0 dB.

• So 0 dB does not mean no sound, it means a sound level where the sound pressure is equal to that of the reference level. This is a small pressure, but not zero. It is also possible to have negative sound levels: - 20 dB would mean a sound with pressure 10 times smaller than the reference pressure, i.e. 2 micropascals.

Audio Amplitude

• In microphones, audio is captured as an analog signal (continuous in amplitude and time) that responds proportionally to the sound pressure, p.
• The power in a sound wave, all else equal, goes as the square of the pressure.
  – Pressure is expressed in dynes/cm2.
• The difference in sound pressure level between two sounds with pressures p1 and p2 is therefore 20 log10 (p2/p1) dB.
• The “acoustic amplitude” of sound is measured in reference to p1 = pref = 0.0002 dynes/cm2.
  – The human ear is insensitive to sound pressure levels below pref.
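A small sketch tying this formula to the reference level (the function name is mine; the values follow the text):

```python
import math

P_REF = 0.0002  # reference pressure in dynes/cm^2 (= 20 micropascals)

def spl_db(p):
    """Sound pressure level in dB relative to the reference pressure."""
    return 20 * math.log10(p / P_REF)

print(spl_db(P_REF))          # 0 dB: pressure equal to the reference
print(spl_db(20000 * P_REF))  # ~86 dB: 20,000x the reference, as above
```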

Audio Amplitude

Intensity    | Typical Examples
0 dB         | Threshold of hearing
20 dB        | Rustling of paper
25 dB        | Recording studio (ambient level)
40 dB        | Residence (ambient level)
50 dB        | Office (ambient level)
60 - 70 dB   | Typical conversation
80 dB        | Heavy road traffic
90 dB        | Home audio listening level
120 - 130 dB | Threshold of pain
140 dB       | Rock singer screaming into microphone

Audio Frequency

• Audio frequency is the number of high-to-low pressure cycles that occur per second.
  – In music, frequency is referred to as pitch.
• Different living organisms have different abilities to hear high-frequency sounds:
  – Dogs: up to 50 KHz
  – Cats: up to 60 KHz
  – Bats: up to 120 KHz
  – Dolphins: up to 160 KHz
  – Humans: 20 Hz - 20 KHz, called the audible band.
• The exact audible band differs from one person to another and deteriorates with age.

Audio Frequency

• The frequency range of sounds can be divided into:
  – Infrasound: 0 Hz - 20 Hz
  – Audible sound: 20 Hz - 20 KHz
  – Ultrasound: 20 KHz - 1 GHz
  – Hypersound: 1 GHz - 10 GHz
• Sound waves propagate at a speed of around 344 m/s in humid air at room temperature (20°C).
  – Hence, audio wavelengths typically vary from 17 m (corresponding to 20 Hz) to 1.7 cm (corresponding to 20 KHz).
• Sound can be divided into periodic (e.g. whistling wind, bird songs, sound from music) and nonperiodic (e.g. speech, sneezes and rushing water).

Audio Frequency

• Most sounds are combinations of different frequencies and wave shapes. Hence, the spectrum of a typical audio signal contains one or more fundamental frequencies, their harmonics, and possibly a few cross-modulation products.
  – Fundamental frequency
  – Harmonics
• The harmonics and their amplitudes determine the tone quality, or timbre.

Audio Envelope

• When sound is generated, it does not last forever. The rise and fall of the intensity of the sound is known as the envelope.

• A typical envelope consists of four sections: attack, decay, sustain and release.

Audio Envelope

• Attack: the intensity of a note increases from silence to a high level.

• Decay: the intensity decreases to a middle level.

• Sustain: the middle level is sustained for a short period of time.

• Release: the intensity drops from the sustain level to zero.
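A minimal sketch of such an envelope (all segment lengths and levels below are illustrative choices, not values from the text):

```python
def adsr(t, attack=0.05, decay=0.1, sustain_level=0.6,
         sustain_time=0.5, release=0.2):
    """Envelope amplitude (0..1) at time t seconds after the note starts."""
    if t < attack:                 # attack: silence -> peak
        return t / attack
    t -= attack
    if t < decay:                  # decay: peak -> sustain level
        return 1.0 - (1.0 - sustain_level) * t / decay
    t -= decay
    if t < sustain_time:           # sustain: hold the middle level
        return sustain_level
    t -= sustain_time
    if t < release:                # release: sustain level -> zero
        return sustain_level * (1.0 - t / release)
    return 0.0

# Sample the envelope every 0.1 s over the note's lifetime:
print([round(adsr(0.1 * k), 2) for k in range(10)])
# -> [0.0, 0.8, 0.6, 0.6, 0.6, 0.6, 0.6, 0.45, 0.15, 0.0]
```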

Audio Envelope

• Different instruments have different envelope shapes:
  – Violin notes have slower attacks but a longer sustain period.
  – Guitar notes have quick attacks and a slower release.