
Digital sampling and voice compression

20th/Jun/2015


Department: IT

Project Name: Digital sampling and voice compression

Presenters: 1. Mohammad Ali “Yawari”

2. Mohammad Asif “Dawlatnazar”

3. Aminullah “Fayaz”

4. Sayed Ali Reza “Morowat”


Table of contents:
What is sampling?
Nyquist Sampling Rate
Sampling Theorem
Digitizing and compressing speech
The CODEC, the VoIP compression workhorse
ITU-T standards G.711, G.726 and G.729
A-law and µ-law PCM
Summary

 


What is Sampling?

Sampling is the process of recording the values of a signal at given points in time. For A/D converters, these points in time are equidistant.

The number of samples taken during one second is called the sample rate. Keep in mind that these samples are still analogue values. The mathematical description of ideal sampling is the multiplication of the signal with a sequence of Dirac pulses.

In real A/D converters the sampling is carried out by a sample-and-hold buffer. The sample-and-hold buffer splits the sample period into a sample time and a hold time.

When a voltage is being sampled, a capacitor is switched to the input line during the sample time. During the hold time it is detached from the line and holds its voltage.
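As a rough illustration of equidistant sampling (a minimal Python sketch with a made-up test tone and rates, not part of the original slides):

```python
import numpy as np

# Minimal sketch of ideal (equidistant) sampling: read the analogue
# signal value at the instants t = n / fs. The tone frequency, sample
# rate and duration are illustrative choices.
fs = 8000                      # sample rate in Hz
duration = 0.01                # seconds of signal to sample
f_tone = 440                   # analogue test tone in Hz

def analogue_signal(t):
    """Stand-in for the continuous input voltage."""
    return np.sin(2 * np.pi * f_tone * t)

n = np.arange(int(duration * fs))   # sample indices
t = n / fs                          # equidistant sampling instants
samples = analogue_signal(t)        # still real-valued ("analogue") amplitudes

print(f"{len(samples)} samples taken in {duration} s at {fs} Hz")
```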


Nyquist Sampling Rate

The Nyquist Theorem

The number of samples taken per second during the sampling stage, also called the sampling rate, has a significant impact on the quality of the digitized signal. A higher sampling rate yields better quality, but it also generates more bits per second that must be transmitted. Based on the Nyquist theorem, a signal sampled at a rate of at least twice its highest frequency yields enough samples for accurate reconstruction of the signal at the receiving end.


Continue…

The human ear can sense sounds within a frequency range of 20 to 20,000 Hz. Telephone lines were designed to transmit analog signals within the frequency range of 300 to 3400 Hz. The top and bottom frequency levels produced by a human speaker cannot be transmitted over a phone line.

However, the frequencies that are transmitted allow the human on the receiving end to recognize the speaker and sense his/her tone of voice and inflection.

Nyquist proposed that the sampling rate must be at least twice the highest frequency of the signal to be digitized. Taking 4000 Hz as that highest frequency, which is above the 3400 Hz maximum a phone line was designed to transmit, the Nyquist theorem gives a required sampling rate of 8000 samples per second.
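The rule can be checked numerically. The sketch below (illustrative code; the tone frequencies are arbitrary) samples two tones at 8000 Hz: a 3400 Hz tone stays where it belongs, while a 5000 Hz tone, which violates the Nyquist condition, shows up aliased at 3000 Hz.

```python
import numpy as np

# Sketch: show aliasing when the Nyquist condition fs >= 2*f is violated.
# 8000 Hz matches the telephony sampling rate discussed above.
fs = 8000
t = np.arange(0, 0.1, 1 / fs)           # 0.1 s of samples

for f in (3400, 5000):                   # 3400 Hz obeys Nyquist, 5000 Hz does not
    x = np.sin(2 * np.pi * f * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    peak = freqs[np.argmax(spectrum)]
    print(f"{f} Hz tone sampled at {fs} Hz appears at ~{peak:.0f} Hz")
# The 5000 Hz tone shows up near 3000 Hz (8000 - 5000), i.e. it is aliased.
```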


Continue..



Figure 1-1 Effect of Higher Sampling Rate


Sampling Theorem


In this section we discuss the celebrated Sampling Theorem, also called the Shannon Sampling Theorem or the Shannon-Whittaker-Kotelnikov Sampling Theorem, after the researchers who discovered the result. This result gives conditions under which a signal can be exactly reconstructed from its samples. The basic idea is that a signal that changes rapidly will need to be sampled much faster than a signal that changes slowly, but the sampling theorem formalizes this in a clean and elegant way. It is a beautiful example of the power of frequency-domain ideas.
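To make the statement concrete, here is a small numerical sketch (illustrative Python, not part of the original slides) of the Whittaker-Shannon reconstruction formula x(t) = sum_n x[n] sinc((t - nT)/T), which recovers a band-limited signal from its samples when the sampling rate exceeds twice its highest frequency. The tone frequency and block length are arbitrary choices.

```python
import numpy as np

# Sketch of Whittaker-Shannon (sinc) reconstruction from samples.
# Illustrative parameters: a 300 Hz tone, sampled at 8000 Hz.
fs = 8000
T = 1 / fs
f = 300
n = np.arange(80)                       # 10 ms of samples
x_n = np.sin(2 * np.pi * f * n * T)     # the samples x[n]

def reconstruct(t):
    """x(t) = sum_n x[n] * sinc((t - nT)/T); np.sinc(u) is sin(pi*u)/(pi*u)."""
    return np.sum(x_n * np.sinc((t - n * T) / T))

# Evaluate the reconstruction between two sampling instants, near the
# middle of the block.
t_test = 40.25 * T
print("reconstructed:", reconstruct(t_test))
print("true value:   ", np.sin(2 * np.pi * f * t_test))
# The two values agree closely; a small residual remains because the sum
# runs over a finite block of samples instead of an infinite one.
```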


Digitizing and compressing speech

The first step of every VoIP connection is digitizing the analogue signal into digital data that can be packetized. This can be done in a number of ways.

The easiest way is to take a fixed sampling rate that is high enough to capture all needed frequencies and to divide the signal strength into a number of levels. 8000 Hz and 256 quantization levels are commonly seen settings.

In this way the signal is read by a normal analogue-to-digital converter (ADC), which samples the data at the fixed frequency with a depth of 8 bits. The data is sent uncompressed to the other party and converted back by a digital-to-analogue converter (DAC).

The combination of an 8 kHz sampling rate and a sample depth of 8 bits is good enough to replace normal telephony conversations.
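A minimal sketch of this uncompressed path (illustrative Python; the test tone and helper names are made up): sample at 8 kHz, quantize each sample onto one of 256 linear levels as an 8-bit code, and map the codes back to amplitudes.

```python
import numpy as np

# Sketch of the simplest digitizing chain described above:
# 8000 samples/s, 8 bits (256 linear levels), no compression.
fs = 8000
t = np.arange(0, 0.02, 1 / fs)
signal = 0.8 * np.sin(2 * np.pi * 440 * t)      # analogue input in [-1, 1]

def adc_8bit(x):
    """Uniform 8-bit quantizer: map [-1, 1] onto integer codes 0..255."""
    return np.clip(np.round((x + 1.0) * 127.5), 0, 255).astype(np.uint8)

def dac_8bit(codes):
    """Map the 8-bit codes back to amplitudes in [-1, 1]."""
    return codes.astype(np.float64) / 127.5 - 1.0

codes = adc_8bit(signal)          # what would be sent over the line
restored = dac_8bit(codes)
print("max quantization error:", np.max(np.abs(restored - signal)))
# With 256 levels the error stays within half a quantization step,
# roughly 0.4% of the peak amplitude.
```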


Continue..


But good quality comes at a price. Uncompressed sampling at this rate generates a continuous data stream of 8 kilobytes per second.
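That figure follows directly from the sampling parameters above; a quick check:

```python
# Worked check of the uncompressed data rate quoted above.
samples_per_second = 8000
bits_per_sample = 8
bits_per_second = samples_per_second * bits_per_sample    # 64,000 bit/s = 64 kbps
bytes_per_second = bits_per_second // 8                   # 8,000 bytes/s ~ 8 kB/s
print(bits_per_second, "bit/s =", bytes_per_second, "bytes/s")
```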

This is not a big deal for broadband connections, but it can be too much for remote locations with slower internet connections—or even worse—via a mobile internet connection. Therefore several attempts have been made to reduce the number of kilobytes needed per second to achieve acceptable voice quality.

This can in principle be achieved in several ways. You can reduce the sampling frequency somewhat, but this has a negative effect on quality because the higher frequencies are filtered out.

According to the Nyquist theorem, which dates back to 1928, long before there was any VoIP or even an internet, signals cannot be faithfully digitized at a sample rate lower than twice the highest frequency in the spectrum.


Continue..

Lowering the sampling rate to 4000 Hz, for example, would reduce the maximum allowed frequency in the analogue signal to 2000 Hz, which is well below frequencies that are common in speech, especially from women and children.

So while reducing the sampling rate may help somewhat in reducing the bandwidth allocation of the VoIP application, it only helps by a small fraction.

Another approach is therefore to reduce the number of bits necessary to store one data sample. As mentioned earlier, 8 bits will give a reasonably high quality encoding of a speech signal.

Reducing the number to 4 would reduce the bandwidth by 50%. Unfortunately this reduction also comes at a price.

With 8 bits, there are 256 possible signal levels. Decoding such a signal back to analog gives a smooth signal where the step from one level to the next is less than 0.5% of the peak-to-peak signal value.


Continue…


With a 4-bit encoding depth, only 16 different signal levels are available. That is not much. Every step in the digital-to-analogue conversion is then about 7% of the range when the levels are divided linearly between the maximum possible signal peaks.

Imagine that someone is speaking softly on the telephone, so that the signal strength is no more than 25% of the peak. In that case the digital encoding is almost binary, giving a Donald Duck-like sound at the receiver's side. So reducing the sample depth by 50%, from 8 to 4 bits, degrades the quality by a factor of 16. That is also not what we want.
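A quick check of the step sizes quoted above (illustrative code):

```python
# Linear quantization step as a fraction of the peak-to-peak range,
# for the two bit depths discussed above.
for bits in (8, 4):
    levels = 2 ** bits
    step = 1 / (levels - 1)          # step between adjacent linear levels
    print(f"{bits} bits: {levels} levels, step = {step:.1%} of peak-to-peak")
# 8 bits -> about 0.4% per step, 4 bits -> about 6.7% per step,
# matching the figures quoted in the text.
```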


Continue..

One way to combat poor voice quality at low signal volumes is not to divide the signal range into 16 equal levels, but to place more levels around the zero line and fewer levels near maximum volume.

A common approach is to use a logarithmic scale rather than a linear one. Logarithmic scales are natural in this application, because our ears perceive volume differences roughly logarithmically: ten times more volume in terms of energy is heard as only about twice as loud by the human ear.
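The effect of non-linear level spacing can be sketched numerically. The example below is illustrative Python, not an algorithm from the slides: it uses the standard continuous µ-law companding curve and a made-up quiet test tone at 5% of full scale, quantized with 4 bits once on a linear scale and once after logarithmic companding.

```python
import numpy as np

# Sketch: 4-bit quantization of a quiet signal, linear vs. logarithmic
# (mu-law style) level spacing.
mu = 255.0                                   # standard mu-law parameter

def compress(x):                             # mu-law compression, x in [-1, 1]
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def expand(y):                               # inverse of compress()
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

def quantize(x, bits):                       # uniform mid-rise quantizer on [-1, 1]
    levels = 2 ** bits
    step = 2.0 / levels
    return np.clip(np.floor(x / step) * step + step / 2, -1, 1)

t = np.linspace(0, 0.02, 160, endpoint=False)
quiet = 0.05 * np.sin(2 * np.pi * 440 * t)   # someone speaking very softly

linear = quantize(quiet, 4)
log    = expand(quantize(compress(quiet), 4))

print("linear 4-bit max error:", np.max(np.abs(linear - quiet)))
print("mu-law 4-bit max error:", np.max(np.abs(log - quiet)))
# The companded version keeps the error several times smaller for this
# quiet signal, because most of its 16 levels sit near the zero line
# where the signal actually lives.
```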


Continue..

If someone is speaking at low volume, these algorithms automatically boost the signal, and the error of signal quantization is never much more than 7% at a 4-bit sample depth.

The best way to reduce the bandwidth needs of a VoIP application is to use a proprietary low-loss compression protocol.

We all know compression from our computer. Applications like ZIP reduce the size of files by analyzing bit patterns and calculating alternative bit patterns and conversion tables that take up less space than the original file.


Continue..

Compression techniques such as those used in ZIP are called no-loss (lossless) compression, because it is possible to extract the original files from the compressed version without any loss of information.

Other techniques are low-loss (lossy) and accept that some information is lost in exchange for extra compression.

Low-loss compression is often used for pictures, as with the JPG format. The decompressed version looks like the original, but on close observation you may see artefacts caused by the compression algorithm. This type of algorithm works best when it has been developed with knowledge of the data it must compress. Specific compression algorithms have been developed for voice which combine low loss with a very small bandwidth allocation; the compression used in mobile phones is one example.


The CODEC, the VoIP compression workhorse

With so many different ways in which digitized speech can be encoded for transmission over a digital line, VoIP applications must know which encoding method the other party supports in order to make a successful connection. This is achieved by letting the encoding and decoding be performed by a standardized piece of hardware or software, the CODEC (coder/decoder). Codecs are used in many applications, including video, but we will focus solely on codecs that can be used with VoIP.


Continue..

Common codecs in VoIP applications

Name           Compression           Bitrate (kbps)    Application
G.711          A-law and µ-law PCM   64                General telephony
G.726          ADPCM                 16, 24, 32, 40    International telephony, DECT
GSM 06.10 FR   RPE-LTP               13.2              Original GSM codec
G.729          CS-ACELP              8                 VoIP over low-speed connections
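To get a feel for what these bitrates mean on the wire, the sketch below (illustrative; the 20 ms packetization interval is a common but not universal choice, and IP/UDP/RTP header overhead is ignored) computes the voice payload carried per packet:

```python
# Rough voice-payload sizes per packet for the codecs above, assuming a
# 20 ms packetization interval and ignoring header overhead.
codecs_kbps = {"G.711": 64, "G.726 (32k mode)": 32, "GSM 06.10 FR": 13.2, "G.729": 8}
packet_ms = 20

for name, kbps in codecs_kbps.items():
    payload_bytes = kbps * 1000 * packet_ms / 1000 / 8
    print(f"{name:18s} {kbps:5.1f} kbps -> {payload_bytes:5.1f} bytes per {packet_ms} ms packet")
# G.711: 160 bytes, G.726-32: 80 bytes, GSM FR: 33 bytes, G.729: 20 bytes.
```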


ITU-T standards G.711, G.726 and G.729

Standardization is important to let two VoIP applications communicate with each other.

Fortunately, the telecommunications sector has always felt the need to standardize protocols and information exchange; the first official organization for this, the ITU or International Telegraph Union, dates back to 1865.

This organization became an official United Nations agency in 1947.

The standardization arm of the ITU evolved into the CCITT, or Comité Consultatif International Téléphonique et Télégraphique, in 1956 and was finally renamed ITU-T in 1993.


Continue…

The abbreviation CCITT is still used in many places, for example when talking about specific CRC calculation algorithms.

The ITU-T has defined a number of speech compression algorithms which are used in national and international telephony.

All these compression standards are named with the letter G followed by a number.

As a rule of thumb, the numbering reflects the order in which the standards were defined: higher numbers generally describe more complex compression techniques, which require more computational effort than the lower-numbered standards but achieve a better speech-quality-to-bandwidth ratio.


A-law and µ-law PCM

The compression standard G.711 allows two ways of compressing incoming voice data.

These two compression formats are often referred to as A-law and µ-law. Both use PCM, or pulse-code modulation, as the base sampling method.

With PCM the data is sampled at a regular interval. G.711 uses a PCM frequency of 8 kHz, which results in 8000 samples per second. Each sample has a bit depth of 13 bits (A-law) or 14 bits (µ-law), which gives an initially high quality with only small errors due to the quantization of the signal.

The use of A-law and µ-law compression is mainly geographically defined. In North America and Japan µ-law is mainly used, while A-law is the standard for the rest of the world. There are also slight algorithmic differences which make A-law easier to implement and less computationally intensive than its counterpart µ-law.
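As a rough sketch of the idea (illustrative Python; real G.711 uses segmented, piecewise-linear approximations of these curves and a specific 8-bit code layout that is not reproduced here), the continuous A-law and µ-law formulas compress a wide linear sample range into values that fit in 8 bits:

```python
import numpy as np

# Concept sketch of G.711-style companding: squeeze a wide linear sample
# range (13-bit for A-law, 14-bit for mu-law) into 8 bits with a
# logarithmic curve. This is the idea only, not the actual G.711 bit layout.
MU = 255.0
A = 87.6

def mu_law(x):
    """Continuous mu-law curve, x in [-1, 1] -> y in [-1, 1]."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def a_law(x):
    """Continuous A-law curve, x in [-1, 1] -> y in [-1, 1]."""
    ax = np.abs(x)
    y = np.where(ax < 1 / A,
                 A * ax / (1 + np.log(A)),
                 (1 + np.log(np.maximum(A * ax, 1.0))) / (1 + np.log(A)))
    return np.sign(x) * y

def to_8bit(y):
    """Quantize the companded value onto one of 256 codes (0..255)."""
    return np.clip(np.round((y + 1.0) * 127.5), 0, 255).astype(np.uint8)

sample = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])   # illustrative linear samples
print("mu-law codes:", to_8bit(mu_law(sample)))
print("A-law codes: ", to_8bit(a_law(sample)))
```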


Summary

Techniques to compress general digital audio signals include µ-law companding and adaptive differential pulse code modulation (ADPCM).

These simple approaches apply low-complexity, low-compression, and medium audio quality algorithms to audio signals.

A third technique, the MPEG/audio compression algorithm, is an ISO standard for high-fidelity audio compression.

The MPEG/audio standard has three layers of successive complexity for improved compression performance.


Continue…


What is sampling? Sampling is the process of recording the values of a signal at given points in time. For A/D converters, these points in time are equidistant.

Nyquist Sampling Rate: the quantization process has a set of quantization intervals associated in a one-to-one fashion with binary code words. Each binary code word corresponds to a discrete amplitude. Quantization introduces a certain amount of error or distortion into the signal samples.

Sampling Theorem: the celebrated Sampling Theorem, also called the Shannon Sampling Theorem or the Shannon-Whittaker-Kotelnikov Sampling Theorem after the researchers who discovered the result, gives conditions under which a signal can be exactly reconstructed from its samples.


Continue..

Digitizing and compressing speech

The CODEC, the VoIP compression workhorse

ITU-T standards G.711, G.726 and G.729

A-law and µ-law PCM


Thanks for your Attention

