
Converting Voice to Packets and Bandwidth Calculation


Page 1: Converting Voice to Packets and Bandwidth Calculation

Converting voice to packets and Bandwidth calculation

Cvoice

Jim Nechleba

Page 2: Converting Voice to Packets and Bandwidth Calculation

Nyquist theorem

The Bell Systems Corporation was trying to find a way to deploy more voice circuits with less wire, because analog voice technology required one pair of wires for each voice line. For organizations requiring many voice circuits, this meant running bundles of cable.

Long ago, Dr. Harry Nyquist (and many others) created a process that allows equipment to convert analog signals (flowing waveforms) into digital format (1s and 0s).

Nyquist found that he could accurately reconstruct audio streams by taking samples that numbered twice the highest audio frequency used in the audio.

• The average human ear is able to hear frequencies from 20–20,000 Hz.
• Human speech uses frequencies from 200–9000 Hz.
• Telephone channels typically transmit frequencies from 300–3400 Hz.
• Sampling per the Nyquist theorem (8000 samples per second) can reproduce frequencies from 300–4000 Hz.

Studies have found that telephone equipment can accurately transmit understandable human conversation by sending only a limited range of frequencies. The telephone channel frequency range (300–3400 Hz) gives you enough sound quality to identify the remote caller and sense their mood.

Nyquist believed that you could accurately reproduce an audio signal by sampling at twice the highest frequency. Because the telephone channel tops out at 4000 Hz, this means sampling 8000 times (2 * 4000) every second.

A sample is a numeric value. More specifically, in the voice realm, a sample is a numeric value that consumes a single byte of information.

Page 3: Converting Voice to Packets and Bandwidth Calculation

Converting Analog Voice Signals to Digital

The sampling device puts an analog waveform against a Y-axis lined with numeric values. This process of converting the analog wave into digital, numeric values is known as quantization.

Because 1 byte of information is only able to represent the values 0–255, and one of its bits is reserved to carry the sign, the quantization of the voice scale is limited to values measuring a maximum peak of +127 and a maximum low of –127.

Positive and negative values are not evenly spaced. This is by design. To achieve a more accurate numeric value (and thus, a more accurate reconstructed signal at the other end), the frequencies more common to voice are tightly packed with numeric values, whereas the “fringe frequencies” on the high and low end of the spectrum are more spaced apart.

The sampling device breaks the 8 binary bits in each byte into two components: a positive/negative indicator and the numeric representation. The first bit indicates positive or negative, and the remaining seven bits represent the actual numeric value.

For example, in the byte 10110100, the first bit is a 1, so you read the number as positive. The remaining seven bits (0110100) represent the number 52. This would be the digital value used for one voice sample.
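The sign/magnitude packing described above can be sketched in Python. These helper names are made up for illustration; the bit layout follows the text: bit 7 is the sign flag (1 = positive) and bits 0–6 hold the magnitude.

```python
def encode_sample(value: int) -> int:
    """Pack a quantized sample (-127..+127) into one sign/magnitude byte.

    Bit 7 is the sign flag (1 = positive, per the text); bits 0-6 hold
    the absolute numeric value.
    """
    if not -127 <= value <= 127:
        raise ValueError("sample outside the quantization range")
    sign = 0b10000000 if value >= 0 else 0
    return sign | abs(value)


def decode_sample(byte: int) -> int:
    """Reverse the packing: read the sign flag, then the 7-bit magnitude."""
    magnitude = byte & 0b01111111
    return magnitude if byte & 0b10000000 else -magnitude


# The sample from the text: sign bit 1 (positive), magnitude 52
print(bin(encode_sample(52)))  # 0b10110100
```

Round-tripping any value through `encode_sample` and `decode_sample` returns the original, which is exactly what the receiving endpoint does when it rebuilds the waveform.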

The Nyquist theorem dictates that you need to take 8000 of those samples every single second. Doing the math, figure 8000 samples a second times the 8 bits in each sample, and you get 64,000 bits per second (64 kbps).
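That arithmetic is worth a quick sanity check; a few lines of Python reproduce the 64-kbps figure from the numbers in the text:

```python
HIGHEST_FREQUENCY_HZ = 4000  # upper edge of the telephone channel
BITS_PER_SAMPLE = 8          # each sample consumes one byte

# Nyquist: sample at twice the highest frequency
samples_per_second = 2 * HIGHEST_FREQUENCY_HZ

# Bit rate of uncompressed telephone-quality audio
bits_per_second = samples_per_second * BITS_PER_SAMPLE

print(samples_per_second, bits_per_second)  # 8000 64000
```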

Figure: The quantization method

Page 4: Converting Voice to Packets and Bandwidth Calculation

G.711 μ-law, G.711 a-law, and the G.729 Codec

There are two forms of the G.711 codec: μ-law (used primarily in the United States and Japan) and a-law (used everywhere else). The quantization method described in the preceding paragraph represents G.711 a-law. G.711 μ-law codes in exactly the opposite way: if you were to take all the 1 bits and make them 0s and take all the 0 bits and make them 1s, you would have the G.711 μ-law equivalent.
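Taken literally, the bit-flipping relationship the text describes is a one-line operation. This sketch renders only that description (the full G.711 companding rules involve more detail than a bit inversion, so treat this as an illustration of the idea, not a production converter):

```python
def flip_bits(byte: int) -> int:
    """Turn every 1 bit into a 0 and every 0 bit into a 1,
    per the text's description of the a-law/mu-law relationship."""
    return byte ^ 0xFF  # XOR with 11111111 inverts all eight bits


# The sample byte from the earlier example, flipped:
print(bin(flip_bits(0b10110100)))  # 0b1001011
```

Flipping twice returns the original byte, so the same function works in either direction.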

Advanced codecs, such as G.729, allow you to compress the number of samples sent and thus use less bandwidth.

This is possible because sampling human voice 8000 times a second produces many samples that are very similar or identical.

The process G.729 (and most other compressed codecs) uses to compress this audio is to send a sound sample once and simply tell the remote device to continue playing that sound for a certain time interval.

This is often described as “building a codebook” of the human voice traveling between the two endpoints. Using this process, G.729 is able to reduce bandwidth down to 8 kbps for each call, a fairly massive reduction in bandwidth.

Unfortunately, chopping the amount of bandwidth down comes with a price.

Quality is usually impacted by the compression process.

Page 5: Converting Voice to Packets and Bandwidth Calculation

Choosing a Voice Codec

When selecting a voice codec for your network, you should ask the following questions regarding the codec:

• How many Digital Signal Processor (DSP) resources does it take to code audio using the codec?

• How much bandwidth does the codec consume?

• How does the codec handle packet loss?

• Does the codec support multiple sample sizes? What are the ramifications of using them?

Page 6: Converting Voice to Packets and Bandwidth Calculation

Calculating Codec Bandwidth Requirements

Step 1. Determine the audio bandwidth required for the audio codec itself.
Step 2. Determine data link, network, and transport layer overhead.
Step 3. Add any additional overhead amounts.
Step 4. Add it all together.
Step 5. Subtract bandwidth-savings measures.

Page 7: Converting Voice to Packets and Bandwidth Calculation

Calculating Codec Bandwidth Requirements

Step 1: Determine the Audio Bandwidth Required for the Audio Codec Itself

To find the amount of bandwidth required for the audio codec, you need to determine the size (in bytes) of audio contained in each packet. This size is directly impacted by the audio sample size contained in each packet. The sample size is a specific time interval of audio; for most audio codecs, the sample size is 20 milliseconds (ms) by default.

Increasing the sample size gives you a bandwidth savings benefit because the router sends fewer packets overall (and fewer packets mean less header information).

The drawback to increasing the sample size is that the overall delay in building the packet is increased. If the two devices communicating already have significant delay between them (due to distance, traffic sharing the link, and so on), the additional coding delay could cause quality of service (QoS) issues.

You can use the following formula to determine the voice payload size:

Bytes_Per_Packet = (Sample_Size * Codec_Bandwidth) / 8

The Sample_Size variable in the formula uses a unit value of seconds, and the Codec_Bandwidth variable uses a unit value of bits per second (bps). So, if you had a G.729 call using a 20-ms sample size, the formula would calculate like this:

Bytes_Per_Packet = (.02 * 8000) / 8
Bytes_Per_Packet = 160 / 8
Bytes_Per_Packet = 20
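The payload formula translates directly into code (function and parameter names here are illustrative, not from any standard API):

```python
def bytes_per_packet(sample_size_s: float, codec_bandwidth_bps: int) -> float:
    """Step 1: voice payload per packet.

    Bytes_Per_Packet = (Sample_Size * Codec_Bandwidth) / 8
    where sample size is in seconds and codec bandwidth in bits per second.
    """
    return (sample_size_s * codec_bandwidth_bps) / 8


# G.729 (8 kbps) with a 20-ms sample size:
print(bytes_per_packet(0.02, 8000))  # 20.0
```

Swapping in 64,000 bps shows that a G.711 call carries 160 bytes of voice per packet at the same 20-ms sample size.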

Page 8: Converting Voice to Packets and Bandwidth Calculation

Calculating Codec Bandwidth Requirements

Step 2: Determine Data Link, Network, and Transport Layer Overhead

After you’ve found the amount of voice contained in each packet, you then need to calculate the amount of data contained in the header in each packet. The following values represent the amount of overhead for common data link layer network technologies:

• Ethernet: 20 bytes
• Frame Relay: 4–6 bytes
• Point-to-Point Protocol (PPP): 6 bytes

At the network and transport layers of the OSI model, the values are fixed amounts:

• IP: 20 bytes
• UDP: 8 bytes
• Real-time Transport Protocol (RTP): 12 bytes

Step 3: Add Any Additional Overhead Amounts

Additional overhead gets added into the equation primarily if you are using VoIP over a VPN connection. The following are common overhead values based on the type of VPN used:

• GRE/L2TP: 24 bytes
• MPLS: 4 bytes
• IPsec: 50–57 bytes
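The overhead values from Steps 2 and 3 can be collected into a small lookup table. The byte counts come from the lists above; the IPsec entry uses the low end of its 50–57-byte range, and the dictionary keys are made-up labels for this sketch:

```python
# Step 2: data link layer overhead, in bytes (Frame Relay shown at 6)
DATA_LINK_OVERHEAD = {"ethernet": 20, "frame-relay": 6, "ppp": 6}

# Step 2: fixed network/transport overhead: IP (20) + UDP (8) + RTP (12)
IP_UDP_RTP = 20 + 8 + 12

# Step 3: optional VPN overhead (IPsec shown at the low end of 50-57)
VPN_OVERHEAD = {"gre": 24, "l2tp": 24, "mpls": 4, "ipsec": 50, None: 0}


def header_bytes(link: str, vpn=None) -> int:
    """Total per-packet header overhead for Steps 2 and 3 combined."""
    return DATA_LINK_OVERHEAD[link] + IP_UDP_RTP + VPN_OVERHEAD[vpn]


print(header_bytes("ethernet"))           # 60
print(header_bytes("ethernet", "ipsec"))  # 110
```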

Page 9: Converting Voice to Packets and Bandwidth Calculation

Calculating Codec Bandwidth Requirements

Step 4: Add It All Together

When you have all the values from the first three steps, you can add them together in a final equation:

Total_Bandwidth = Packet_Size * Packets_Per_Second

Now remember, you’re after the total bandwidth per call. So, first you need to add together the values from Steps 1–3 to form the packet size. For example, if you were using the G.729 codec with a 20-ms sample size over an Ethernet network, the packet size would be as follows:

  20 bytes (voice payload)
+ 20 bytes (IP header)
+  8 bytes (UDP header)
+ 12 bytes (RTP header)
+ 20 bytes (Ethernet header)
---------------------------
  80 bytes per packet

That gives you one piece of the equation: the packet size with overhead. To find the number of packets per second, some simple reasoning comes into play. Remember, each packet contains a 20-ms sample, and 1 second is 1000 milliseconds. So 1000 ms / 20 ms = 50, which tells you it takes 50 packets per second to deliver a full second of audio. This now gives you all the pieces you need to find the final amount of bandwidth per call:

Total_Bandwidth = Packet_Size * Packets_Per_Second
Total_Bandwidth = 80 bytes * 50 packets per second
Total_Bandwidth = 4000 bytes per second

Because network engineers do not usually assess network speed in bytes per second, you might want to multiply the final answer by 8 to find the bits per second (because there are 8 bits in a byte):

4000 * 8 = 32,000 bits per second (more commonly written 32 kbps) for one G.729 call.