Upload
rajannaabhshek
View
218
Download
0
Embed Size (px)
Citation preview
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 1/64
Basics of Digital Audio
Dr. Shashidhara H LDr. Shashidhara H L
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 2/64
Chapter 6
Basics of Digital Audio
6.1 Digitization of Sound
6.2 MIDI: Musical Instrument DigitalInterface
6.3 Quantization and Transmission of
Audio
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 3/64
6.1 Digitization of Sound
What is Sound?Sound is a wave phenomenon like light, but ismacroscopic and involves molecules of air
being compressed and expanded under theaction of some physical device.(a) For example, a speaker in an audio systemvibrates back and forth and produces alongitudinal pressure wave that we perceive assound.(b) Since sound is a pressure wave, it takes oncontinuous values, as opposed to digitizedones.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 4/64
Digitization of Sound
(c) Even though such pressure waves arelongitudinal, they still have ordinary waveproperties and behaviors, such asreflection (bouncing), refraction (changeof angle when entering a medium with adifferent density) and diffraction (bendingaround an obstacle).
(d) If we wish to use a digital version of sound waves we must form digitizedrepresentations of audio information.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 5/64
Digitization
Digitization means conversion to astream of numbers, and preferably
these numbers should be integers forefficiency.
Fig. 6.1 shows the 1-dimensionalnature of sound: amplitude values
depend on a 1D variable, time. (Andnote that images depend instead ona 2D set of variables, x and y ).
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 6/64
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 7/64
Digitization
The graph in Fig. 6.1 has to be madedigital in both time and amplitude. To
digitize, the signal must be sampled ineach dimension: in time, and inamplitude.
(a) Sampling means measuring the
quantity we are interested in, usually atevenly-spaced intervals.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 8/64
Digitization
(b) The first kind of sampling, usingmeasurements only at evenly spaced timeintervals, is simply called, sampling. The rate at
which it is performed is called the samplingfrequenc y (see Fig. 6.2(a)).
(c) For audio, typical sampling rates are from 8kHz (8,000 samples per second) to 48 kHz.
This range is determined by Nyquist theoremdiscussed later.
(d) Sampling in the amplitude or voltagedimension is called quantization. Fig. 6.2(b)shows this kind of sampling.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 9/64
Digitization
Fig. 6.2: Sampling and Quantization.(a): Sampling the analog signal in the timedimension. (b): Quantization is sampling theanalog signal in the amplitude dimension.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 10/64
Digitization
Thus to decide how to digitize audiodata we need to answer the followingquestions:
1. What is the sampling rate?2. How finely is the data to bequantized, and is quantizationuniform?3. How is audio data formatted? (fileformat)
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 11/64
Nyquist Theorem
Signals can be decomposed into a sumof sinusoids.
Fig. 6.3 shows how weighted sinusoidscan build up quite a complex signal.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 12/64
Nyquist Theorem
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 13/64
Nyquist Theorem
Whereas frequency is an absolute measure,pitch is generally relative - a perceptualsubjective quality of sound.
(a) Pitch and frequency are linked by settingthe note A above middle C to exactly 440 Hz.
(b) An octave above that note takes us toanother A note. An octave corresponds to
doubling the frequenc y . Thus with the middle ³A´ on a piano (³A4´ or ³A44´) set to 440 Hz,the next ³A´ up is at 880 Hz, or one octaveabove.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 14/64
Nyquist Theorem
(c) Harmonics: any series of musical toneswhose frequencies are integral multiples of the frequency of a fundamental tone: Fig. 6.
(d) If we allow non-integer multiples of thebase frequency, we allow non - ³A´ notes andhave a more complex resulting sound.
The Nyquist theorem states how frequently we
must sample in time to be able to recover theoriginal sound.
(a) Fig. 6.4(a) shows a single sinusoid: it is asingle, pure, frequency (only electronic
instruments can create such sounds).
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 15/64
Nyquist Theorem
(b) If sampling rate just equals the actualfrequency, Fig. 6.4(b) shows that a false signalis detected: it is simply a constant, with zerofrequency.
(c) Now if sample at 1.5 times the actualfrequency, Fig. 6.4(c) shows that we obtain anincorrect (alias) frequency that is lower thanthe correct one | it is half the correct one (the
wavelength, from peak to peak, is double thatof the actual signal).(d) Thus for correct sampling we must use asampling rate equal to at least twice thema x imum frequenc y content in the signal. This
rate is called the Nyquist rate.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 16/64
Nyquist Theorem
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 17/64
Nyquist Theorem
Fig.6.4: Aliasing. (a): A single frequency.(b): Sampling at exactly the frequency
produces a constant. (c): Sampling at 1.5times per cycle produces an alias perceived
frequency.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 18/64
Nyquist Theorem
Nyquist Theorem: If a signal is band-limited, i.e., there is a lower limit f 1 and anupper limit f 2 of frequency components in the
signal, then the sampling rate should be atleast 2(f 2 í f 1).
Nyquist frequency: half of the Nyquist rate.
- Since it would be impossible to recover
frequencies higher than Nyquist frequency inany event, most systems have an antialiasingf ilter that restricts the frequency content in theinput to the sampler to a range at or belowNyquist frequency.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 19/64
Nyquist Theorem
The relationship among the SamplingFrequency, True Frequency, and the AliasFrequency is as follows:
falias = f sampling í ftrue; for ftrue <f sampling < 2 ftrue (6:1)
In general, the apparent frequency of asinusoid is the lowest frequency of a sinusoidthat has exactly the same samples as the inputsinusoid. Fig. 6.5 shows the relationship of the
apparent frequency to the input frequency.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 20/64
Nyquist Theorem
Fig. 6.5: Folding of sinusoid frequencywhich is sampled at 8,000 Hz. Thefolding frequency, shown dashed, is4,000 Hz.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 21/64
Signal to Noise Ratio (SNR)
The ratio of the power of the correct signaland the noise is called the signal to noi seratio (SNR ) - a measure of the quality of
the signal. The SNR is usually measured in decibels
(dB), where 1 dB is a tenth of a bel. TheSNR value, in units of dB, is dened in
terms of base-10 logarithms of squaredvoltages, as follows:
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 22/64
Signal to Noise Ratio (SNR)
a) The power in a signal is proportional to thesquare of the voltage. For example, if thesignal voltage Vsignal is 10 times the noise,
then the SNR is 20 log10(10)=
20dB. b) In terms of power, if the power from ten
violins is ten times that from one violin playing,then the ratio of power is 10dB, or 1B.
c) T o know: Power - 10; Signal Voltage - 20.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 23/64
Signal to Noise Ratio (SNR)
The usual levels of sound we hear around
us are described in terms of decibels, as aratio to the quietest sound we are capable
of hearing. Table 6.1: Magnitude levels of common sounds, in decibels
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 24/64
Signal to Quantization Noise
Ratio (SQNR) Aside from any noise that may have beenpresent in the original analog signal, there isalso an additional error that results fromquantization.
(a) If voltages are actually in 0 to 1 but wehave only 8 bits in which to store values, theneffectively we force all continuous values of voltage into only 256 different values.
(b) This introduces a round off error. It is notreally ³noise´. Nevertheless it is calledquantization noise (or quantization error).
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 25/64
Signal to Quantization Noise
Ratio (SQNR) The quality of the quantization ischaracterized by the Signal toQuantization Noise Ratio (SQNR ).
(a) Quantization noise: the differencebetween the actual value of the analogsignal, for the particular sampling time,and the nearest quantization intervalvalue.
(b) At most, this error can be as much ashalf of the interval.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 26/64
Signal to Quantization Noise
Ratio (SQNR) (c) For a quantization accuracy of N
bits per sample, the SQNR can besimply expressed:
Notes:(a) We map the maximum signal to 2N í1 í 1 (' 2N í1) and the most negative signal to í2N í1 .(b) Eq. (6.3) is the Peak signal-to-noise ratio,PSQNR: peak signal and peak noise.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 27/64
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 28/64
Linear and Non-linear
Quantization
Linear f ormat: samples are typicallystored as uniformly quantized values.
Non-unif orm quantization: set up
more finely-spaced levels where humanshear with the most acuity.
- Weber's Law stated formally says thatequally perceived differences have valuesproportional to absolute levels:
¨Response ¨Stimulus/Stimulus
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 29/64
Linear and Non-linear
Quantization
Inserting a constant of proportionality k , we have adifferential equation that states:
dr = k (1 / s) d s (6:6)
with response r and stimulus s.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 30/64
Linear and Non-linear
Quantization
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 31/64
Linear and Non-linear
Quantization
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 32/64
Linear and Non-linear
Quantization
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 33/64
Quantization and Transmission
of Audio
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 34/64
Quantization and Transmission
of Audio
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 35/64
Pulse Code Modulation
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 36/64
Pulse Code Modulation
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 37/64
Pulse Code Modulation
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 38/64
Pulse Code Modulation
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 39/64
Pulse Code Modulation
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 40/64
Pulse Code Modulation
2. A discontinuous signal contains not justfrequency components due to the originalsignal, but also a theoretically infinite setof higher-frequency components:
(a) This result is from the theory of Fourier analysis, in signal processing.
(b) These higher frequencies areextraneous.
(c) Therefore the output of the digital-to-analog converter goes to a low-passf ilter that allows only frequencies up tothe original maximum to be retained.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 41/64
MIDI:MIDI: Musical Instrument Digital
Interface
Use the sound card's defaults for sounds: =>use a simple scripting language and hardwaresetup called MIDI.
MIDI Overview(a) MIDI is a scripting language - it codes ³events´ that stand for the production of sounds. E.g., a MIDI event might include
values for the pitch of a single note, itsduration, and its volume.
(b) MIDI is a standard adopted by theelectronic music industry for controllingdevices, such as synthesizers and sound cards,that roduce music.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 42/64
Musical Instrument Digital
Interface
(c) The MIDI standard is supported bymost synthesizers, so sounds created onone synthesizer can be played and
manipulated on another synthesizer andsound reasonably close.
(d) Computers must have a special MIDIinterface, but this is incorporated into
most sound cards. The sound card mustalso have both D/A and A/D converters.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 43/64
MIDI Concepts
MIDI channels are used to separatemessages.
(a) There are 16 channels numbered from 0to 15. The channel forms the last 4 bits (theleast significant bits) of the message.
(b) Usually a channel is associated with aparticular instrument: e.g., channel 1 is thepiano, channel 10 is the drums, etc.
(c) Nevertheless, one can switch instrumentsmidstream, if desired, and associate anotherinstrument with any channel.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 44/64
System messages
(a) Several other types of messages, e.g.general message for all instruments indicatingchange in tuning or timing.
(b) If the first 4 bits are all 1s, then the messagis interpreted as a system common message.
The way a synthetic musical instrumenresponds to a MIDI message is usually by simpl
ignoring any play sound message that is not foits channel.
- If several messages are for its channel, thethe instrument responds, provided it is multivoice, i.e., can play more than a single note aonce.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 45/64
System messages
It is easy to confuse the term voice with theterm timbre - the latter is MIDI terminologyfor just what instrument that is trying to beemulated, e.g. a piano as opposed to a violin: it
is the quality of the sound. (a) An instrument (or sound card) that is
multi-timbral is one that is capable of playingmany different sounds at the same time, e.g.,piano, brass, drums, etc.
(b) On the other hand, the term voice, whilesometimes used by musicians to mean thesame thing as timbre, is used in MIDI to meanevery different timbre and pitch that the tone
module can produce at the same time.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 46/64
System messages
Different timbres are produced digitally byusing a patch - the set of control settingsthat define a particular timbre. Patches
are often organized into databases, calledbanks.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 47/64
General MIDI: A standard mapping specifying what instrumen
will be associated with what channels. (a) In General MIDI, channel 10 is reserved f
percussion instruments, and there are 1patches associated with standard instruments.
(b) For most instruments, a typical messamight be a Note On message (meaning, e.g.,key press and release), consisting of whchannel, pitch and ³velocity´ (i.e., volume).
(c) For percussion instruments, however, tpitch data means which kind of drum. (d) A Note On message consists of ³status´ byte
which channel, what pitch - followed by two dabytes. It is followed by a Note off message, whi
also has a pitch (which note to turn o) and
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 48/64
Status ByteStatus Byte
The data in a MIDI status by te is between128 and 255; each of the data by tes isbetween 0 and 127. Actual MIDI bytes are
10-bit, including a 0 start and 0 stop bit.
Fig. 6.8: Stream of 10-bit bytes
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 49/64
MIDIMIDI
A MIDI device often is capable of programmability, and also can changethe envelope describing how the
amplitude of a sound changes over time.
Fig. 6.9: Stages of amplitude versus time for a music note
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 50/64
Hardware Aspects of MIDI
The MIDI hardware setup consists of a31.25 kbps serial connection. Usually,MIDI-capable units are either Input
devices or Output devices, not both.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 51/64
MIDI HardwareMIDI Hardware
The physical MIDI ports consist of 5-pinconnectors for IN and OUT, as well as a thirdconnector called THRU.
(a) MIDI communication is half-duplex.
A communications channel allowing alternatingtransmission in two directions, but not in bothdirections simultaneously.
(b) MIDI IN is the connector via which the
device receives all MIDI data.(c) MIDI OUT is the connector through whichthe device transmits all the MIDI data itgenerates itself.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 52/64
MIDI HardwareMIDI Hardware
(d) MIDI THRU is the connector by whichthe device echoes the data it receivesfrom MIDI IN. Note that it is only the MIDI
IN data that is echoed by MIDI THRU - allthe data generated by the device itself issent via MIDI OUT.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 53/64
MIDI SET UPMIDI SET UP
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 54/64
Structure of MIDI Messages
MIDI messages can be classified into two types:channel messages and system messages.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 55/64
A. Channel messages:
Can have up to 3 bytes:
a) The first byte is the status byte has itsmost significant bit set to 1.
b) The 4 low-order bits identify whichchannel this message belongs to (for 16possible channels).
c) The 3 remaining bits hold the message.For a data byte, the most significant bit is
set to 0.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 56/64
A.1. Voice messages:
a) This type of channel message controls avoice, i.e., sends information specifyingwhich note to play or to turn o, andencodes key pressure.
b) Voice messages are also used tospecify controller effects such as sustain,vibrato, tremolo, and the pitch wheel.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 57/64
A.2. Channel mode messages:
a) Channel mode messages: special case of theControl Change message í> opcode B (themessage is &HBn, or 1011nnnn).
b) However, a Channel Mode message has itsfirst data byte in 121 through 127 (&H79-7F).
c) Channel mode messages determine how aninstrument processes MIDI voice messages:
respond to all messages, respond just to thecorrect channel, don't respond at all, or go overto local control of the instrument.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 58/64
B. System Messages:
a) System messages have no channel number± commands that are not channel specific, suchas timing signals for synchronization,positioning information in pre-recorded MIDI
sequences, and detailed setup information forthe destination device.
b) Opcodes for all system messages start with&HF.
c) System messages are divided into three
classifications, according to their use:
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 59/64
B.3. System exclusive message:
a) After the initial code, a stream of anyspecific messages can be inserted thatapply to their own product.
b) A System Exclusive message issupposed to be terminated by aterminator byte &HF7.
c) The terminator is optional and the datastream may simply be ended by sendingthe status byte of the next message.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 60/64
General MIDI General MIDI is a scheme for standardizing theassignment of instruments to patch numbers. a) A standard percussion map species 47
percussion sounds. b) Where a ³note´ appears on a musical score
determines what percussion instrument is beingstruck: a bongo drum, a cymbal.
c) Other requirements for General MIDIcompatibility: MIDI device must support all 16
channels; a device must be multitimbral (i.e.,each channel can play a differentinstrument/program); a device must bepolyphonic (i.e., each channel is able to playmany voices); and there must be a minimum of
24 dynamically allocated voices.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 61/64
General MIDI Level2: An extendedgeneral MIDI has recently been dened,with a standard .smf ³Standard MIDI File´
format defined - inclusion of extracharacter information, such as karaokelyrics.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 62/64
MIDI to WAV Conversion
Some programs, such as early versions of Premiere, cannot include .mid les -instead, they insist on .wav format les.
a) Various shareware programs exist for
approximating a reasonable conversionbetween MIDI and WAV formats.
b) These programs essentially consist of large lookup files that try to substitutepre-defined or shifted WAV output
for MIDI messages, with inconsistentsuccess.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 63/64
6.3 Quantization and
Transmission of Audio
Coding of Audio: Quantization andtransformation of data are collectively knownas coding of the data.
a) For audio, the -law technique forcompanding audio signals is usually combinedwith an algorithm that exploits the temporalredundancy present in audio signals.
b) Differences in signals between the presentand a past time can reduce the size of signalvalues and also concentrate the histogram of pixel values (differences, now) into a muchsmaller range.
8/2/2019 Audio MM Org Part3
http://slidepdf.com/reader/full/audio-mm-org-part3 64/64
Quantization and Transmission
of Audio
c) The result of reducing the variance of values is that lossless compressionmethods produce a bit stream with shorter
bit lengths for more likely values In general, producing quantized sampled
output for audio is called PCM (Pulse CodeModulation). The differences version is
called DPCM (and a crude but efficientvariant is called DM). The adaptiveversion is called ADPCM.