

  Slide 27

    ITNP80: Multimedia, Sound-II

    Slide 28

    Sound compression (I)

    •  Compression of sound data requires different techniques from those for graphical data

    •  Requirements are less stringent than for video
       –  data rate for CD-quality audio is much less than for video, but still exceeds the capacity of dial-up Internet connections
    •  Data rate is 44,100 × 2 × 2 bytes/s = 176,400 bytes/s = 1.41 Mbit/s
       –  a 3-minute song recorded in stereo occupies about 31 Mbytes
    •  Sound is difficult to compress using lossless methods
       –  complex and unpredictable nature of sound waveforms
    •  Different requirements depending on the nature of the sound
       –  speech
       –  music (for a pub or a bar, or for an audiophile?)
       –  natural sounds
       –  …and on the nature of the application
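The data-rate figures above can be checked with a short calculation (a sketch; the 3-minute figure assumes exactly 180 seconds of 16-bit stereo at 44.1 kHz):

```python
# CD-quality audio: 44,100 samples/s, 2 bytes/sample, 2 channels
SAMPLE_RATE = 44_100
BYTES_PER_SAMPLE = 2
CHANNELS = 2

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS
bits_per_second = bytes_per_second * 8
song_bytes = bytes_per_second * 180  # a 3-minute song

print(bytes_per_second)             # 176400 bytes/s
print(bits_per_second)              # 1411200 bits/s, i.e. ~1.41 Mbit/s
print(song_bytes)                   # 31752000 bytes, i.e. ~31 Mbytes
```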

    Slide 29

    Sound compression (II)

    •  A simple lossless compression method is to record the length of a period of silence
       –  no need to record 44,100 samples of value zero for each second of silence
       –  a form of run-length encoding
       –  in reality this is not lossless, as silence virtually never corresponds to sample values of exactly zero; rather, some threshold value is applied
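The idea can be sketched as a thresholded run-length pass (a minimal illustration; the threshold value here is an arbitrary assumption, a real coder would pick it from the recording's noise floor):

```python
def encode_silence(samples, threshold=4):
    """Replace runs of near-silent samples with ('S', run_length) markers.
    Samples with |value| <= threshold count as silence (hence slightly lossy)."""
    out = []
    run = 0
    for s in samples:
        if abs(s) <= threshold:
            run += 1
        else:
            if run:
                out.append(('S', run))
                run = 0
            out.append(s)
    if run:
        out.append(('S', run))
    return out

# A stretch of near-silence collapses to a single token:
print(encode_silence([0, 1, -2] * 3 + [900, -800]))
# [('S', 9), 900, -800]
```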

    •  Difference between how we perceive sounds and images results in different lossy compression techniques for the two media
       –  high spatial frequencies can be discarded in images
       –  high sound frequencies, however, are highly significant

    •  So what can we discard from sound data?

    Slide 30

    Sound compression (III)

    •  Our perception of loudness is essentially logarithmic in the amplitude of a sound
    •  Nonlinear quantization techniques provide compression by requiring a smaller sample size (i.e. number of bits) to cover the full range of input than a linear quantization technique
    •  Remember…
       –  Linear: levels to which a signal is quantised are linearly spaced
       –  Logarithmic: provides more resolution at lower levels; the idea is to use non-linearly spaced quantisation levels, with higher levels spaced further apart than the low ones, so quieter sounds are represented in greater detail than louder ones
       –  This matches the way in which we perceive differences in volume
       –  Two main non-linear quantisation schemes: mu-law (µ-law) and A-law
       –  This is also a form of data compression (see next!)

    [Figure: LINEAR vs. LOGARITHMIC quantization; axes: SIGNAL value vs. SAMPLE (quantized value)]

  Slide 31

    Companding

    •  Non-linear quantization developed by telephone companies
       –  known as companding (compressing/expanding)
       –  (equations below are NOT examinable)

    •  mu-law (µ-law)

    •  A-law

    •  Telephone signals are sampled at 8KHz. At this rate, µ-law compression is able to squeeze a dynamic range of 12 bits into just 8 bits, giving a one-third reduction in data-rate.

    µ-law:  y = log(1 + µx) / log(1 + µ),  for x ≥ 0

    A-law:  y = A|x| / (1 + log A),             for 0 ≤ |x| ≤ 1/A
            y = (1 + log(A|x|)) / (1 + log A),  for 1/A ≤ |x| ≤ 1
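The µ-law curve can be sketched directly (an illustration only, since the equations are not examinable; µ = 255 is the value used in the North American telephone standard):

```python
import math

MU = 255  # value used in the North American telephone standard

def mu_law_compress(x, mu=MU):
    """Map a sample x in [-1, 1] onto the logarithmic scale y in [-1, 1]."""
    return math.copysign(math.log(1 + mu * abs(x)) / math.log(1 + mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse mapping: recover x from the companded value y."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

# Quiet sounds get far more of the output range than loud ones:
print(round(mu_law_compress(0.01), 3))  # ~0.23: a 1%-amplitude signal uses ~23% of the range
print(round(mu_law_compress(0.5), 3))   # ~0.88
print(mu_law_expand(mu_law_compress(0.3)))  # round-trips back to ~0.3
```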

    Slide 32

    Adaptive Differential Pulse Code Modulation

    •  ADPCM: based on storing the difference between data samples
       –  related to interframe compression of video, BUT less straightforward
    •  Audio waveforms change rapidly, so there is no reason to assume that differences between samples are small
       –  unlike video, where two consecutive frames may be very similar
    •  Differential Pulse Code Modulation (DPCM) computes a predicted value for a sample based on preceding samples
       –  stores the difference between the predicted and actual sample
       –  the difference will be small if the prediction is good
    •  ADPCM extends this by using an adaptive step (sample) size to obtain further compression
       –  large sample differences are quantized using large steps (small sample size) while small differences are quantized using small steps (large sample size)
       –  hence the amount of detail that is preserved scales with the size of the difference and, like companding, the aim is to make efficient use of bits, taking account of the rate of change of the signal
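The DPCM half of the idea can be sketched with the simplest possible predictor (a minimal illustration; real ADPCM additionally quantises the differences and adapts the step size as described above):

```python
def dpcm_encode(samples):
    """Store each sample as the difference from the previous one
    (simplest predictor: predicted value = previous sample)."""
    prev = 0
    diffs = []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    prev = 0
    out = []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 101, 105, 110]
diffs = dpcm_encode(samples)
print(diffs)                          # [100, 2, -1, 4, 5] -- mostly small values
print(dpcm_decode(diffs) == samples)  # True: lossless round trip
```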

    Slide 33

    Linear Predictive Coding

    •  Radical approach to compression of speech
    •  Uses a mathematical model of the vocal tract
    •  Instead of transmitting speech as audio samples, the parameters describing the state of the vocal tract are sent
    •  At the receiving end these parameters are used to reconstruct the speech by applying them to the same model
    •  Achieves very low data rates: 2.4 kbps (usable on a poor-quality telephone line!)
    •  Speech has a machine-like quality
       –  suitable for accurate transmission of content, but not faithful rendition of a particular voice
    •  Similar in concept (but rather more complicated!) to vector coding of 2D and 3D graphics

    Slide 34

    Perceptually based compression (I)

    •  Is there data corresponding to sounds we do not perceive in a sound sample?
       –  If so, then we can discard it, thereby achieving compression
    •  A sound may be too quiet to be heard
    •  One sound may be obscured by another sound
    •  Threshold of hearing
       –  the minimum level at which a sound can be heard
       –  varies nonlinearly with frequency
       –  very low or high frequency sounds must be much louder than mid-range sounds to be heard
       –  we are most sensitive to sounds in the frequency range corresponding to human speech
    •  Sounds below the threshold can be discarded
       –  the compression algorithm uses a psycho-acoustical model that describes how the threshold of hearing varies with frequency
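A widely quoted closed-form approximation of the threshold-of-hearing curve is Terhardt's formula, used in the psychoacoustic models behind MPEG audio. This sketch is an illustration only, not part of the course material:

```python
import math

def threshold_of_hearing_db(f_hz):
    """Terhardt's approximation of the threshold of hearing in dB SPL,
    as a function of frequency in Hz (valid roughly 20 Hz - 20 kHz)."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# The curve dips in the mid-range (around speech frequencies) and rises
# steeply at the extremes, matching the bullet points above:
for f in (100, 1000, 4000, 15000):
    print(f, "Hz:", round(threshold_of_hearing_db(f), 1), "dB SPL")
```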

  Slide 35

    Perceptually based compression (II)

    •  Loud tones can obscure softer tones that occur at roughly the same time
       –  not just a function of relative loudness, but also of the relative frequencies of the two tones
    •  This is known as masking
       –  modification (raising) of the threshold-of-hearing curve in the region of a loud tone
       –  hence, sounds normally above the unmodified threshold are no longer heard
    •  Masking can hide noise as well as other tones
       –  coarser quantization (smaller sample size = fewer bits) can be used in regions of loud sounds, since any resulting quantization noise can be hidden under the loud masking sound
    •  Actually making use of masking in a compression algorithm is very complicated
       –  MPEG standards use this sort of compression for the audio tracks of video
       –  MP3, which is MPEG-1 Layer 3 audio, achieves 10:1 compression, and is particularly suitable for compressing songs (which have characteristics of both speech and music)
       –  AAC, from MPEG-4, is even better: approx. 128 kbit/second
          •  used by iTunes and QuickTime 6

    Slide 36

    Sound files and formats

    •  There are many sound file formats:
       –  Windows PCM waveform (.wav): a form of the RIFF specification; basically uncompressed data
       –  Windows ADPCM waveform (.wav): another form of RIFF file, but compressed to 4 bits/channel
       –  CCITT mu-law (A-law) waveforms (.wav): another form, using 8-bit logarithmic compression
       –  NeXT/SUN file format (.snd or .au): actually many different varieties: a header followed by data; the data may be in many forms (linear, mu-law, etc.)
       –  RealAudio (.ra): used for streaming audio; a compressed format
       –  MPEG format (includes MP3, AAC): has various different forms of compression
       –  QuickTime, AVI and Shockwave Flash: can include audio as well as video

    Slide 37

    Examples of sizes

    •  A 2.739-second stretch of sound, digitised at 22,050 samples/second, 16 bits, mono takes
       –  120,856 bytes as a .snd
       –  120,876 bytes as an uncompressed .wav
       –  60,474 bytes as an A-law .wav
       –  30,658 bytes as a compressed .wav
       –  5,860 bytes as RealAudio
       –  321,736 bytes as ASCII text
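These figures can be sanity-checked from the sampling parameters (a sketch; the few hundred bytes above the raw data size are file headers):

```python
# 2.739 s of mono 16-bit audio at 22,050 samples/s
duration = 2.739
rate = 22_050
raw_bytes = round(duration * rate) * 2  # 2 bytes per 16-bit sample

print(raw_bytes)       # 120790 -- close to the .snd/.wav sizes; the rest is header
print(raw_bytes // 2)  # 60395  -- ~the A-law size (8 bits/sample)
print(raw_bytes // 4)  # 30197  -- ~the compressed (4 bits/sample) size
```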

    •  …and if you listen hard you can hear the difference!

    Slide 38

    Another example: speech

    •  Original (64,000 bps): the original speech signal, sampled at 8,000 samples/second and µ-law quantized at 8 bits/sample.
    •  ADPCM (32,000 bps): speech compressed using the Adaptive Differential Pulse Code Modulation (ADPCM) scheme. The bit rate is 4 bits/sample (compression ratio of 2:1).
    •  LPC10 (2,400 bps): speech compressed using the Linear Predictive Coding (LPC10) scheme. The bit rate is 0.3 bits/sample (compression ratio of 26.6:1).

  Slide 39

    Music (I)

    •  So far we have considered digitizing sounds recorded from the real world

    •  To store and transmit natural sounds it is generally necessary to use digitized recordings of the real sounds

    •  But we have seen that speech can be specified as states of the vocal tract

    •  Music can also be specified: musical scores
    •  We can send a piece of music to someone as either
       –  a recording of an actual performance
       –  or some notation of the score, provided the receiver has some means of recreating the music from the score, e.g. they can play it on a piano
    •  This is akin to bitmapped versus vector-based graphics

    Slide 40

    Music (II)

    •  Automated music production
       –  pianolas: player pianos
    •  Synthesizers
       –  electronic instruments for producing many different sounds, usually controlled via a keyboard
    •  Many sound cards in PCs can synthesize sounds
    •  Automation requires an appropriate language for specifying sounds
       –  requires software which can specify and interpret musical scores (e.g. a note and its duration, etc.)
       –  also needs some means of producing sounds that correspond to the appropriate musical instruments
    •  MIDI does this and more...

    Slide 41

    MIDI (I)

    •  Musical Instrument Digital Interface (MIDI)
    •  Enables people to use multimedia computers and electronic musical instruments to create, enjoy and learn about music
    •  Resynthesis instead of reproduction
       –  like a player piano, instead of a CD player
    •  Consists of commands which result in notes being played (synthesised)

    Slide 42

    MIDI (II)

    •  MIDI files are a means of communicating music
    •  Messages sent can define
       –  note on/off
       –  pitch of note
       –  pitch bend
       –  control change
       –  timbre select
       –  other messages include selecting which electronic musical instrument to play, mixing and panning sound, etc.
    •  Synthesisers which use FM or wavetables can also be controlled
    •  For more information see http://www.midi.org
    •  Data rate is typically 0.1% of high-quality digitised sound
       –  but there are no guarantees that the music produced is exactly as the composer intended
       –  instrument voices will vary with the quality of the sound hardware
    •  Not useful for speech or vocal music
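The low data rate follows from the message format: a MIDI "note on" event is just three bytes. A minimal sketch (note 60 is middle C; the status byte packs the message type and the channel number):

```python
def note_on(channel, note, velocity):
    """Build a 3-byte MIDI Note On message.
    Status byte: 0x90 | channel (channels 0-15); then note and velocity (0-127)."""
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note):
    """Note Off: status byte 0x80 | channel, with velocity 0."""
    return bytes([0x80 | channel, note, 0])

msg = note_on(0, 60, 100)  # middle C on channel 1, medium-loud
print(msg.hex())           # 903c64 -- three bytes per note event
```

Compare this with sampled audio: a whole note event costs 3 bytes, while one second of CD audio costs 176,400 bytes.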

  Slide 43

    End of Lecture

    •  Next lecture (last one on sound) will consider the use of sound in multimedia and HCI