INTERNET PROTOCOL (IP) SPEAKERPHONE REFERENCE DESIGN

INTERNET PROTOCOL (IP) SPEAKERPHONE

REFERENCE DESIGN

Khosrow Mossannen-Amini

A REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ENGINEERING

In the School of

Engineering Science

© Khosrow Mossannen-Amini 2006

SIMON FRASER UNIVERSITY

Spring 2006

All rights reserved. This work may not be reproduced in whole or in part, by photocopy

or other means, without the permission of the author.

SIMON FRASER UNIVERSITY LIBRARY

DECLARATION OF PARTIAL COPYRIGHT LICENCE

The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.

The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.

The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.

It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.

Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.

The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.

Simon Fraser University Library Burnaby, BC, Canada

Approval

Name: Khosrow Mossannen-Amini

Title of Project: Internet Protocol (IP) Speakerphone Reference Design

Degree: Master of Engineering

Examining Committee

Chair: Dr. Bonnie Gray Assistant Professor of School of Engineering Science

Dr. Stephen Hardy Senior Supervisor Professor of School of Engineering Science

Warren Tam Supervisor CPD Applications Leader of PMC-Sierra, Inc.

Dr. Tejinder S. Randhawa Internal Examiner Adjunct Professor of School of Engineering Science

Date Defended/Approved:

IP SPEAKERPHONE REFERENCE DESIGN ii

Abstract

This engineering project, undertaken at PMC-Sierra, Inc., is a paper reference design that describes the scope and the deliverables required for a wired Internet Protocol (IP) phone with speakerphone functionality. This reference design can assist engineers in designing an IP phone and therefore allows them to bring their designs to market more quickly.

A wireless (WiFi) IP phone kit has been built at a PMC-Sierra design centre in China and will be released to production by January 2006.

This report includes a set of schematics and a bill of materials (BOM) as a paper reference design of a wired Internet Protocol (IP) phone with speakerphone functionality, based on the PMC-Sierra MSP2020 multi-service microprocessor. The schematics and BOM, with an optional interface to an analog fax machine to support Fax over IP (FoIP), can be downloaded from PMC-Sierra's web site (www.pmc-sierra.com) with advance permission from PMC-Sierra, Inc.

Keywords: Codec, Echo Canceller, FoIP, Internet, VoIP


Acknowledgments

I would like to thank the following individuals for their support, feedback, and guidance throughout this project:

Warren Tam (technical supervisor) CPD Applications, Leader PMC-Sierra, Inc.

Rob ltcush CPD Applications, Manager PMC-Sierra, Inc.

Dr. Stephen Hardy (academic supervisor) Professor, School of Engineering Science, SFU

Dr. Tejinder S. Randhawa (committee member) Adjunct Professor School of Engineering Science, SFU

Hassen Karaa Co-op Applications Engineer PMC-Sierra, Inc.


Table of Contents

Approval
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 IP Telephony
  1.2 Voice over IP (VoIP)
  1.3 Fax over IP
  1.4 Voice Processing Module
  1.5 Latency in VoIP Networks
  1.6 Jitter Buffer
2 Features
  2.1 Hardware Features for the IP Phone
3 IP Phone Design with MSP2020
4 Block Diagram
5 Functional Description
  5.1 MSP2020
  5.2 Memory
  5.3 ATH3100
  5.4 FXS Interface
  5.5 WAN Uplink
  5.6 Power Supply
6 Circuit Design Considerations
  6.1 MSP2020 Circuit Design
    6.1.1 Power Requirements and Supply Filtering (Page 2 of Schematics)
    TDM Interface to ATH3100 (Pages 2, 3 and 4 of schematics)
    ELB Interface to Flash Memory (Pages 2 and 5 of schematics)
    SDRAM Interface (Pages 2 and 5 of schematics)
    MII (10/100 Ethernet) Interface (Pages 2 and 7 of schematics)
    TDM Interface to SLIC/SLAC (Pages 2 and 6 of schematics)
    SPI/MPI Interface (Pages 2 and 6 of schematics)
    Serial Interface (Page 1 of schematics)
    JTAG Interface (Page 1 of schematics)
    GPIO Allocation (Page 1 of schematics)
    Keypad Interface (Page 1 of schematics)
    LCD Interface (Page 1 of schematics)
    Unused Pins
    Power Supply Circuit Design (Page 8 of schematics)
    Thermal Management
    Simulation Models
7 Layout Design Considerations
  7.1 MSP2020 Layout
    7.1.1 Placement
    7.1.2 SDRAM Interface Layout
    7.1.3 Flash Memory Interface
    7.1.4 MII (10/100 Fast Ethernet) Interface Layout
    7.1.5 TDM Interface Layout
    7.1.6 FXS Interface Layout
    7.1.7 PLL Filter Layout
    7.1.8 Audio Interface Layout
8 Conclusion
9 Disclaimer
Appendix: Jitter Buffer Performance Results
Acronym List
References


List of Figures

Figure 1: VPM Block Diagrams
Figure 2: Echo Cancellation
Figure 3: Typical Fax Call Sequence of Events
Figure 4: IP Phone
Figure 5: IP Speakerphone Block Diagram
Figure 6: MSP2020 PLL Decoupling Circuitry
Figure 7: MSP2020 External Clock Circuitry
Figure 8: Connection between the MSP2020 and the ATH3100
Figure 9: Connection Between MSP2020 and (x8) Flash Memory Devices
Figure 10: MSP2020 and One 256Mbit (x32) SDRAM Memory Device
Figure 11: MSP2020 and Ethernet PHY Connection
Figure 12: SLIC/SLAC Interface with TDM and SPI/MPI Interface
Figure 13: Serial Port Interface to MSP2020
Figure 14: MSP2020 and LCD Connection
Figure 15: Top View of MSP2020 with Locations of Signal Groups
Figure 16: SDRAM Clock with 50 ohm Termination
Figure 17: SDRAM Clock min results
Figure 18: SDRAM Clock max results


List of Tables

Table 1: Echo Canceller Disabling Tones
Table 2: Codec Standards Supported by the VPM Firmware
Table 3: High and Low Frequency Tone Combinations for Keypad Digits
Table 4: Parameters for Generating DTMF Digits
Table 5: Fax Session Ending Events
Table 6: Ring Cadences in Regions of the World
Table 7: Buffer Chain Size Restrictions
Table 8: Maximum Current Consumption
Table 9: Flash Read Timing
Table 10: Flash Write Timing
Table 11: SDRAM Timing
Table 12: MII Timing
Table 13: GPIO Allocation for IP Phone Design with FAX Support


1 Introduction

1.1 IP Telephony

An IP phone is a broadband hard phone: a self-contained IP telephone that looks just like a conventional phone but, instead of a conventional phone jack, has an Ethernet port through which it communicates directly with a Voice over Internet Protocol (VoIP) server, VoIP gateway, VoIP Analog Terminal Adaptor (ATA), or another VoIP phone.

IP telephony is the technology for transmitting voice communications over a network using open-standards-based IP. IP phones combine the functions of a traditional telephone with an Ethernet connection. Since an IP phone communicates directly with a VoIP-based system, it requires neither a personal computer nor any software running on a personal computer to make or receive VoIP phone calls. It can be used independently; all that is required is an Internet connection.

1.2 Voice over IP (VoIP)

Voice over IP (VoIP) uses IP to transmit voice as packets over an IP network, so VoIP can be achieved on any data network that uses IP, such as the Internet, intranets, and Local Area Networks (LANs).

In VoIP, the analog voice signal is digitized, compressed, converted to IP packets, and then transmitted over the IP network. Signaling protocols are used to set up and tear down calls, carry the information required to locate users, and negotiate capabilities. A VoIP call involves the following main steps:

1. A low-pass filter removes the high-frequency components from the speech spectrum. The human speech spectrum contains frequencies beyond 12 kHz. Narrowband telephones are designed to eliminate frequencies above 3.4 kHz, although the nominal voice band is 4 kHz. Wideband telephones eliminate frequencies above 7 kHz.

2. An ADC (analog-to-digital converter) converts the analog voice signal to a digital signal using Pulse Code Modulation (PCM). Analog speech signals are sampled at 8 kHz for narrowband PCM applications and at 16 kHz for wideband PCM. The digitization process measures the analog signal at each sample time and produces a binary code value representing the instantaneous amplitude (quantization). The quantization error can be reduced by using a sample-and-hold circuit ahead of the ADC. For most telephony applications, speech coders are designed to have a signal-to-noise ratio (SNR) above 30 dB over most of their range.

3. The bits are compressed into a standard format (to reduce the bit rate while keeping voice quality at an acceptable level) for transmission, using one of the ITU-T codec standards such as G.711, G.723, G.726, or G.729.

4. The compressed voice frames are inserted into data packets using a real-time protocol, typically RTP over UDP over IP.

5. A signaling protocol is used to activate and coordinate the various components to complete a call. Signaling is accomplished by the exchange of IP datagram messages between the components. The format of these messages is defined by any of a number of standard protocols, such as IETF SIP or ITU-T H.323.

6. At the receiving end, the packets are disassembled and the data is extracted, then converted to an analog voice signal and sent to the receiving IP phone's speaker.
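As a rough illustration of steps 3 and 4, the sketch below estimates how many codec bytes fit in one packet and what one-way bit rate results at the IP layer once RTP, UDP, and IPv4 headers are added. The 20 ms packet interval and the function names are assumptions made for this example; they are not taken from the reference design.

```python
# Header sizes in bytes for RTP, UDP and IPv4 (no options, no link-layer framing).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20


def payload_bytes(codec_bitrate_bps: int, frame_ms: int) -> int:
    """Bytes of codec output carried in one packet covering frame_ms milliseconds."""
    return codec_bitrate_bps * frame_ms // (8 * 1000)


def ip_bandwidth_bps(codec_bitrate_bps: int, frame_ms: int) -> float:
    """One-way bit rate at the IP layer, including RTP/UDP/IPv4 headers."""
    packet = (payload_bytes(codec_bitrate_bps, frame_ms)
              + RTP_HEADER + UDP_HEADER + IPV4_HEADER)
    packets_per_second = 1000 / frame_ms
    return packet * 8 * packets_per_second


if __name__ == "__main__":
    # Assumed 20 ms packet interval for every codec.
    for name, rate in (("G.711", 64000), ("G.726", 32000), ("G.729", 8000)):
        print(name, payload_bytes(rate, 20), "bytes per packet,",
              ip_bandwidth_bps(rate, 20) / 1000, "kbit/s at the IP layer")
```

The header overhead is fixed per packet, so low-bit-rate codecs gain less from compression than their nominal bit rates suggest.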

1.3 Fax over IP

Currently, there are two ways to implement Fax over IP (FoIP). The first method is based on the ITU-T T.37 standard and is used mainly for store-and-forward faxing. It defines how Internet email can be adapted to support a facsimile service and specifies the format in which a fax is delivered as an e-mail attachment.

The second method is based on the ITU-T T.38 standard, a protocol for real-time delivery of FoIP. With T.38 real-time FoIP, faxes are delivered in real time exactly like a regular fax call. The two fax machines first establish a connection (synchronize) and then send data over a local telephone connection, with an IP network between the two local connections. If the receiving fax machine is busy, the caller gets a busy signal and the user has the option to retry later or to revert to store-and-forward mode as a transport mechanism. A key point is that the confirmation takes place during the T.38 fax session, not at a later point.

The fax sent by a fax machine will be T.30 end to end. When that fax reaches the IP subsystem of the phone, the MSP2020 subsystem of the IP phone (the hardware, the firmware, and the software) encapsulates the T.30 protocol data in T.38 packets, which are then sent to the IP network through the Ethernet connection. At the other end, the IP phone extracts the T.30 protocol data from the T.38 packets. Thus, the fax call is T.30 end to end. This is different from G.711 pass-through, which is an option on many IP gateways, ATAs, and phones.

Current implementations of real-time FoIP via T.38 support a maximum of V.17 (14.4 kbit/s) fax. An updated version of T.38 that supports V.34 (33.6 kbit/s) fax operation over IP has been standardized, but there are no major implementations to date.

1.4 Voice Processing Module

Besides the steps above, considerable firmware and software work is needed to ensure acceptable voice quality over an IP network. Figure 1 illustrates the major firmware functions used in a VoIP system, collectively known as the Voice Processing Module (VPM).


[Figure: Voice Processing Engine software architecture. Legible block labels: Packetization, Controls, Packet and TDM data, Receive Path, Gain In, Gain Out, Cross Connects, Intelligent Signal Classifier.]

Figure 1: VPM Block Diagrams

Some of the functions in the VPM that relate to the operation of an IP phone are described in the sub-sections below:

Echo Canceller


Echo is a delayed, slightly altered version of a speech signal that is reflected back to the speaker. In line echo (LE), this reflection is due to impedance mismatches at the hybrid circuit, the interface between the four-wire network between central offices and the two-wire network that connects individual subscribers to the central office. In acoustic echo (AE), which occurs in speakerphone applications, the echo is generated when reflections of the audio signal coming out of the loudspeaker are fed back into the microphone. In any telephony application, echo can degrade or even prevent effective communication if it is not effectively cancelled.

Echo cancellation is a process that removes the echo from the signal that is transmitted back to the speaker. According to the ITU-T G.168 [6] standard, echo cancellers are devices or modules that use adaptive signal processing to reduce or eliminate echoes. They cancel or reduce the echo by subtracting an estimate of the echo from the returned echo signal. Echo cancellers commonly use an adaptive algorithm such as Least Mean Square (LMS) or Normalized Least Mean Square (NLMS).
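The subtract-an-estimate idea can be sketched with a textbook NLMS adaptive filter. This is a minimal illustration under stated assumptions, not the VPM firmware's echo canceller: the three-tap echo path, the step size `mu`, and the 8-tap filter length are all hypothetical values chosen so the example runs quickly.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, mu=0.5, eps=1e-8):
    """Return the residual signal after subtracting the estimated echo."""
    weights = [0.0] * num_taps      # adaptive estimate of the echo path
    history = [0.0] * num_taps      # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate       # residual after echo subtraction
        power = sum(h * h for h in history) + eps
        step = mu * e / power       # NLMS: step size normalised by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def simulate(num_samples=2000, seed=1):
    """Far-end noise through a hypothetical echo path, then cancellation."""
    rng = random.Random(seed)
    echo_path = [0.5, -0.3, 0.1]    # assumed impulse response, for illustration only
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    mic = []
    for n in range(num_samples):
        taps = range(min(len(echo_path), n + 1))
        mic.append(sum(echo_path[k] * far_end[n - k] for k in taps))
    return far_end, mic, nlms_echo_canceller(far_end, mic)
```

In this noiseless, single-talk simulation the residual decays toward zero as the weights converge on the echo path; a real canceller also has to cope with near-end speech and background noise.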

PSTN specifications require echo to be cancelled in any network where the round-trip delay from source to destination and back again is longer than 50 ms. Since VoIP networks almost always introduce more delay than this in packetization and transmission, most VoIP gateways that connect to the PSTN need to provide echo cancellation.


Echo cancellers are typically placed as close as possible to the hybrid that causes the echo; see Figure 2. The echo canceller stores samples of the incoming (received) speech signal, then uses the samples to calculate an estimate of the echo signal that will be reflected back to the far-end speaker. This estimated echo is subtracted from the local signal that is transmitted to the far end. In this way, normal speech from the near-end speaker is transmitted to the far end, but the echo of the far-end speaker's voice is removed.

Echo Tail Length

The tail circuit shown in Figure 2 is all the equipment between the voice gateway and the telephone: all the switches, multiplexers, cabling, and so on. To effectively cancel the echo produced by the hybrid, the echo canceller must be able to process samples stored over a period of time at least as long as the round-trip delay through this tail circuit. This processing period is called the echo canceller tail length, and it is an important parameter in the performance of the echo canceller. If the echo tail length is too short (too few samples are stored), the echo canceller cannot adequately remove the entire echo from the received signal. However, setting the echo tail length too high wastes memory and processor cycles and can degrade the overall performance of the processor.

The echo tail length should be set to the round-trip delay through the tail circuit, plus 4 to 6 ms. In residential and SOHO VoIP gateway applications, the echo tail length is typically set to 8 ms or 16 ms. In speakerphone applications, the echo tail length is typically set to 64 ms.


[Figure: echo canceller placement relative to the hybrid and the tail circuit (echo path)]

Figure 2: Echo Cancellation

The number of adaptive filter taps used in an echo canceller algorithm depends on the tail length:

16 ms tail length: 128 taps

32 ms tail length: 256 taps

64 ms tail length: 512 taps
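The tap counts above follow from one filter tap per sample at the 8 kHz narrowband sampling rate, that is, 8 taps per millisecond of tail. The short sketch below reproduces that arithmetic and picks a tail length from a measured round-trip delay; the helper names, the 4 ms default margin, and the list of supported tail lengths are assumptions for illustration.

```python
SAMPLE_RATE_HZ = 8000  # narrowband PCM sampling rate


def taps_for_tail(tail_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """One adaptive-filter tap per sample stored over the tail length."""
    return tail_ms * sample_rate_hz // 1000


def tail_length_ms(round_trip_ms: float, margin_ms: float = 4.0,
                   supported=(8, 16, 32, 64)) -> int:
    """Smallest supported tail length covering the round-trip delay plus a margin."""
    needed = round_trip_ms + margin_ms
    for candidate in supported:
        if candidate >= needed:
            return candidate
    raise ValueError("round-trip delay exceeds the longest supported tail length")


if __name__ == "__main__":
    for tail in (16, 32, 64):
        print(tail, "ms ->", taps_for_tail(tail), "taps")
```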

Non-Linear Processing and Comfort Noise Generation

Echo cancellers do not perfectly remove the entire echo from the

transmitted signal; some residual echo remains after echo cancellation. To

maximize echo cancellation, ITU-T G.168 [6] specifies non-linear processing

(NLP) to completely suppress the signal sent to the far end when the near-end

speaker is silent. When NLP is enabled, the echo canceller classifies short

segments of the near-end speech signal as either voice or background noise.

When it determines that the signal is background noise, the echo canceller turns


on the non-linear processor to eliminate the residual echo that would otherwise

be reflected back to the far-end speaker.

Muting the residual echo also mutes the background noise from the near

end. As a result, the far-end speaker will hear the background noise pulsing on

and off as the non-linear processor activates and deactivates. These transitions

degrade the perceived quality of the call, so echo cancellers use comfort noise

generation (CNG) in conjunction with NLP. When the non-linear processor is

active, a comfort noise generator replaces the muted residual echo with a

synthesized noise signal of the same level and similar spectral content as the

background noise.

The codecs also use voice activity detection and comfort noise generation

to suppress packets when a speaker is silent. These codec-based functions are

independent of the echo canceller functions described here, and there are subtle

differences in operation:

Echo-canceller NLP and CNG: When the near-end speaker is silent, the near-end echo canceller suppresses the transmitted echo signal and replaces the background noise (which is also suppressed) with comfort noise. The near-end codec sends packets filled with comfort noise to the IP network.

Codec-based VAD and CNG: When the near-end speaker is silent, the near-end codec sends no packets at all to the IP network. The far-end codec generates comfort noise toward the TDM interface to replace the packets.


Double-Talk Detection

Most echo cancellers continually adapt their internal digital filters to the

conditions present in the tail circuit to provide an accurate estimate of the echo.

They continually compare the estimated echo signal with the echo that is actually

reflected from the near end of the connection, and refine their internal settings

accordingly.

However, in order for the echo canceller to converge on internal settings

that produce an accurate echo estimate, the near-end speaker must be silent

while the far-end speaker is talking. If the near-end speaker is talking at the

same time, a situation known as double-talk, the near-end speech disrupts the

adaptation process. The estimated echo diverges and the echo canceller

performs poorly, which degrades the perceived quality of the call.

To prevent problems, echo cancellers use a double-talk detector to

indicate when both parties are speaking at the same time. When this occurs, the

echo canceller effectively suspends the adaptive algorithm and "freezes" its

internal settings so that it cannot diverge during the double-talk condition. When

the double-talk condition goes away, the adaptive algorithm starts up again to

keep the echo canceller converged to an accurate estimate of the echo, and to

respond to any changes in the tail circuit.
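The text does not say which double-talk detection method the VPM firmware uses. As one well-known possibility, the sketch below implements the classic Geigel detector, which declares double-talk when the near-end sample is larger than a fraction of the recent far-end peak. The window length, the 0.5 threshold (corresponding to an assumed 6 dB of hybrid loss), and the hangover count are illustrative assumptions.

```python
from collections import deque


class GeigelDoubleTalkDetector:
    """Geigel-style detector: compare near-end level with the recent far-end peak."""

    def __init__(self, window=128, threshold=0.5, hangover=240):
        self.recent_far = deque([0.0] * window, maxlen=window)
        self.threshold = threshold   # assumes at least 6 dB of echo path loss
        self.hangover = hangover     # samples to stay frozen after a detection
        self.hold = 0

    def update(self, far_sample: float, near_sample: float) -> bool:
        """Return True while filter adaptation should stay frozen."""
        self.recent_far.append(abs(far_sample))
        if abs(near_sample) > self.threshold * max(self.recent_far):
            self.hold = self.hangover    # near-end speech detected
        elif self.hold > 0:
            self.hold -= 1               # count down the hangover period
        return self.hold > 0
```

An echo canceller would call `update` once per sample and skip its weight update whenever the result is True, resuming adaptation once the hangover period expires.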

Echo Canceller Tone Disabling

Echo canceller tone disabling refers to special tones that are used to

automatically disable the echo canceller for fax or modem transmissions. Echo


cancellation can interfere with fax and modem transmissions, and in many cases

must be disabled when a connection is carrying fax or modem data. To

accomplish this automatically, fax machines and modems transmit special tones

at the beginning of their transmission. When the echo canceller detects these

tones, it automatically disables itself so that the fax or data transmission can

proceed unhindered.

Table 1 shows the three types of tones used to automatically disable the

echo canceller.

Table 1: Echo Canceller Disabling Tones

Tone | Description

2100 Hz without phase reversal | The echo canceller is automatically disabled when it detects a 2100-Hz tone transmitted for 3 s.

2100 Hz with phase reversal | The echo canceller is automatically disabled when it detects a 2100-Hz tone transmitted for 4 s, with a 180° phase reversal every 450 ms.

2100 Hz with or without phase reversal | The echo canceller is disabled when it detects a 2100-Hz tone with phase reversals. The nonlinear processor is disabled when it detects a 2100-Hz tone without phase reversals.
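Detecting a single known frequency such as 2100 Hz is commonly done with the Goertzel algorithm, which evaluates one DFT bin without computing a full FFT. The sketch below is an illustration of that general technique, not the firmware's detector: it only reports whether a block of samples is dominated by 2100 Hz, and it does not time the tone or look for phase reversals. The `ratio` threshold is an assumed value.

```python
import math


def goertzel_power(samples, target_hz, sample_rate_hz=8000):
    """Squared magnitude of the DFT of `samples` evaluated at target_hz."""
    omega = 2.0 * math.pi * target_hz / sample_rate_hz
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def is_2100hz_tone(samples, sample_rate_hz=8000, ratio=0.5):
    """True when most of the block's energy sits at 2100 Hz."""
    total_energy = sum(x * x for x in samples)
    if total_energy == 0.0:
        return False
    # For a pure tone at the target frequency, the bin power is roughly
    # (N / 2) times the total energy of the block.
    expected = total_energy * len(samples) / 2.0
    return goertzel_power(samples, 2100.0, sample_rate_hz) / expected > ratio


def tone(freq_hz, count=400, sample_rate_hz=8000):
    """Test helper: `count` samples of a unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]
```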

Calibration

Calibration refers to calibrating, or normalizing, the gain stages of the analog front end. The same digital sequence of numbers can correspond to saturation at an analog stage of the signal path, or to a very small analog signal with insufficient SNR.

The calibration function is a digital signal generator on a TDM output path. When it is turned on, a digital sequence defining a 1 kHz sine wave at the 0 dBm nominal level, as specified in ITU-T Recommendation G.711, is sent out on the TDM output port. Calibration is performed on both the input and output paths of a TDM port.

Calibration on the TDM output path: The designer turns on the calibration mode with the appropriate loading in the circuit set up on the port 0 output. Measurements and adjustments can then be made to ensure that the analog signal ends up at a known and desired level.

Calibration on the TDM input path: The designer must first complete the calibration on the output path so that the gain at the output is known. The calibration mode is then turned off so that no calibration signal is sent out from the processor. The TDM loop-back mode is set up such that the digital signal input from TDM port 0 is transferred directly back to the output. With a known analog signal injected at the input, measurements and adjustments can then be made along the input path to ensure that the analog signal ends up at a known and desired level at the measurement point.

Voice Activity Detection and Comfort Noise

On average, up to 50% of human speech may be periods of silence. If an

application transmits packets continuously during a call, even when a speaker is

not talking, it uses up a lot of bandwidth sending packets that do not contain any

speech information. Suppressing packet transmission while a caller is not

speaking can therefore realize significant improvements in bandwidth efficiency.


Codec-based voice activity detection (VAD) allows the gateway to

suppress packets when the near-end speaker is silent. It classifies short

segments of the voice signal as either speech or background noise, based on the

level and spectral content of the signal. When VAD indicates background noise,

the application does not send any packets, so it does not waste bandwidth

transmitting packets that do not contain any useful information.

However, suppressing the packets means that background noise is not

being transmitted. This results in silence on the line, which can cause a listener

at the far end to believe that the line has gone dead. Since this can be

disconcerting and degrades the perceived call quality, the gateway at the

listener's end generates comfort noise whenever it is not receiving packets from

the IP network. Comfort noise is a synthesized noise signal of the same level

and similar spectral content as background noise. To the listener, comfort noise

generation (CNG) results in no noticeable transition between speech and silence

at the speaker's end.
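The frame-by-frame classification described above can be sketched with a simple level-based detector. This is a deliberately reduced illustration: it uses only the signal level, whereas the text states that the codec-based VAD also considers spectral content. The margin, initial noise floor, and smoothing factor are assumed values.

```python
import math


def frame_level_db(frame):
    """Mean-square level of one frame in dB relative to full scale (samples in [-1, 1])."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)


def classify_frames(frames, margin_db=6.0, noise_floor_db=-60.0, smoothing=0.9):
    """Label each frame 'speech' (send a packet) or 'noise' (suppress it)."""
    decisions = []
    for frame in frames:
        level = frame_level_db(frame)
        if level > noise_floor_db + margin_db:
            decisions.append("speech")
        else:
            decisions.append("noise")
            # Track the background level only while no speech is present.
            noise_floor_db = smoothing * noise_floor_db + (1.0 - smoothing) * level
    return decisions
```

The tracked noise floor is also the natural input to a comfort noise generator, since it is an estimate of the background level that the far end would synthesize.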

Codecs

Codecs compress a digitized voice signal into a lower-bandwidth format

that can be transported across the IP network. The output of a codec is a data

stream that is placed into packets and transported across the IP network. At the

receiving end, a codec performs the reverse process to decompress the data and

extract the digitized voice signal.


Table 2 summarizes the codec standards that are supported by the VPM firmware. Codecs with a lower output bit rate (output bandwidth) typically require more time and processing power to compress the digitized voice signal, which adds to the latency in VoIP communications, and they generally produce a lower-quality speech signal after decompression.

Table 2: Codec Standards Supported by the VPM Firmware

Codec Standard | Encoding | Output Bandwidth | Minimum Supported Frame Size
G.711 µ-Law and A-Law (notes 1, 2) | PCM | 64 kbit/s | 10 ms
G.726 (note 3) | ADPCM | 32 kbit/s | 10 ms
G.729A/B (note 4) | CS-ACELP | 8 kbit/s | 10 ms
G.723-53 and G.723-63 | Multirate CELP | 5.3 kbit/s or 6.3 kbit/s |

Note:

1. G.711 µ-law encoding is used in North America and Japan; A-law is used in Europe and the rest of the world.

2. G.711 is the only standard that can be used with T.38 FAX Relay.

3. G.726 codecs are supported on the MSP4200 device only.

4. G.729A/B uses a relatively low-complexity conversion algorithm and includes voice-activity detection and comfort-noise generation.
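The relationship between a codec's output bandwidth and its frame size can be made concrete with a short calculation. The sketch below is illustrative only: it assumes one codec frame per packet and plain IPv4, UDP and RTP headers (20 + 8 + 12 bytes), and the function names are hypothetical rather than part of the VPM firmware.

```python
# Per-packet payload size and IP-layer bit rate for a codec.
# Assumptions (not from the report): one codec frame per packet,
# IPv4 + UDP + RTP headers without options or header compression.

IPV4_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def payload_bytes(codec_bit_rate_bps, frame_ms):
    """Bytes of encoded voice produced during one frame."""
    # bits/s * ms / 1000 gives bits per frame; / 8 gives bytes
    return codec_bit_rate_bps * frame_ms // 8000


def ip_bit_rate_bps(codec_bit_rate_bps, frame_ms):
    """Bit rate at the IP layer, including the assumed headers."""
    packet_bytes = payload_bytes(codec_bit_rate_bps, frame_ms) + IPV4_UDP_RTP_HEADER_BYTES
    packets_per_second = 1000 / frame_ms
    return packet_bytes * 8 * packets_per_second


if __name__ == "__main__":
    for name, rate in (("G.711", 64000), ("G.726", 32000), ("G.729A/B", 8000)):
        print(name, payload_bytes(rate, 10), ip_bit_rate_bps(rate, 10))
```

Under these assumptions, a 10 ms G.729A/B frame carries only 10 payload bytes behind 40 header bytes, which is why the packet size chosen by the application matters as much as the codec bit rate.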

TDM Companding

The VPM firmware supports both µ-law and A-law companding for

compressing and decompressing voice traffic on the TDM interface. The two are

very similar; both are logarithmic companding schemes defined by ITU-T G.711


that compress 16-bit linear data into eight-bit logarithmic data. Logarithmic

companding breaks the amplitude of a voice signal into 16 segments and

encodes each segment as an eight-bit value. The four most significant bits

identify the segment, and the four least significant bits quantize the value of the

amplitude within the segment.

Each segment is twice the size of the segment below it. As a result the

lower amplitudes, which contain most of the information in speech, are split into

smaller segments (i.e. have higher bit resolution) than higher amplitudes, but the

dynamic range is wide enough to encode high-amplitude signals. Logarithmic

companding provides 2:1 bit compression without requiring too much processing

power to decode.

The differences between the two schemes are in the actual coding levels

and in bit inversion. µ-law encoding is used in North America and Japan for voice

traffic; A-law is used in Europe and the rest of the world.
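The segment-and-step structure described above can be sketched in a few lines. This is a minimal illustration of µ-law encoding and decoding, not the VPM firmware's implementation. The bias (0x84) and clip (32635) constants are the conventional values used by common G.711 reference code and are assumptions here, as are the function names.

```python
# Minimal mu-law companding sketch (illustrative, not the VPM firmware code).
# BIAS and CLIP are conventional reference-implementation constants (assumed).

BIAS = 0x84
CLIP = 32635


def linear_to_ulaw(sample):
    """Compress one signed 16-bit linear sample into an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment: position of the highest set bit, from 7 down to 0.
    segment = 7
    mask = 0x4000
    while segment > 0 and not (magnitude & mask):
        segment -= 1
        mask >>= 1
    # Four bits quantize the amplitude inside the segment.
    step = (magnitude >> (segment + 3)) & 0x0F
    # mu-law transmits the byte with all bits inverted.
    return ~(sign | (segment << 4) | step) & 0xFF


def ulaw_to_linear(code):
    """Expand an 8-bit mu-law byte back into a signed linear sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    segment = (code >> 4) & 0x07
    step = code & 0x0F
    magnitude = (((step << 3) + BIAS) << segment) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for s in (0, 100, 1000, 32767, -1000):
        c = linear_to_ulaw(s)
        print(s, hex(c), ulaw_to_linear(c))
```

The sign bit and the three segment bits together form the four most significant bits mentioned above. Because each segment is twice as wide as the one below it, the quantization error grows with amplitude.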

DTMF Digits

Dual-Tone Multi-Frequency (DTMF) digits use a set of four high-frequency

tones and a set of four low-frequency tones to uniquely identify each of the 16

digits on a telephone keypad. Each keypad digit is represented by two tones,

one from each set. When a telephone user presses the digit on the keypad, the

telephone generates a sinusoidal signal comprising the high-frequency tone and

the low-frequency tone that represent that digit. Table 3 shows the combinations

of high- and low-frequency tones for each keypad digit.


Table 3: High and Low Frequency Tone Combinations for Keypad Digits

High-Frequency Tones: 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz

A VoIP application needs to collect DTMF digits that are dialed by a

telephone user and convert the dialed number to an IP address for the call. In

the opposite direction, the VoIP application must be able to generate DTMF

tones toward the TDM interface to control an end system, such as an Interactive

Voice Response (IVR), voice mail, or calling-card system.

Certain applications, such as Interactive Voice Response systems and

calling card systems, may need to identify DTMF digits that persist for more than

two seconds. Such events are referred to as "long" DTMF digits.

Generating (Playing) DTMF Digits

An application may need to generate DTMF tones toward the TDM

interface to control an end system, such as an Interactive Voice Response (IVR),

voice mail, or calling-card system. The tones must be generated in such a way

that a DTMF detector in the end system can correctly interpret them. This

scheme requires setting the four parameters shown in Table 4.

Table 4: Parameters for Generating DTMF Digits

Definition | Min | Max | Units
Tone duration: Length of time that a digit persists | 65 (note 1) | N/A | ms
Inter-digit pause: Silent period between digits | 65 (note 1) | N/A | ms
Power level of the digit's low-frequency tone | -12 | +12 | dB
Power level of the digit's high-frequency tone | -12 | +12 | dB

Note:

1. As specified in ITU-T Q.23 ([7])
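The four parameters in Table 4 map directly onto a simple tone generator. The sketch below is illustrative only: the function names, the 8 kHz sample rate, the default levels and the use of a unit-amplitude reference for the dB values are assumptions, and the keypad frequency assignment is the standard DTMF layout rather than something taken from the VPM firmware API.

```python
import math

# Standard DTMF keypad layout: rows select the low tone, columns the high tone.
LOW_HZ = (697, 770, 852, 941)
HIGH_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")

MIN_TONE_MS = 65   # minimum tone duration from Table 4
MIN_PAUSE_MS = 65  # minimum inter-digit pause from Table 4


def dtmf_frequencies(digit):
    """Return the (low, high) tone pair in Hz for one keypad digit."""
    if len(digit) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(digit)
            if col >= 0:
                return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError("not a DTMF digit: %r" % (digit,))


def dtmf_samples(digit, tone_ms=65, pause_ms=65, low_db=-6.0, high_db=-6.0,
                 rate_hz=8000):
    """Return float samples for one digit followed by its inter-digit pause.

    Levels are in dB relative to an amplitude of 1.0 (an assumed reference).
    """
    if tone_ms < MIN_TONE_MS or pause_ms < MIN_PAUSE_MS:
        raise ValueError("tone and pause must each last at least 65 ms")
    low, high = dtmf_frequencies(digit)
    a_low = 10 ** (low_db / 20)
    a_high = 10 ** (high_db / 20)
    n_tone = rate_hz * tone_ms // 1000
    n_pause = rate_hz * pause_ms // 1000
    tone = [a_low * math.sin(2 * math.pi * low * n / rate_hz)
            + a_high * math.sin(2 * math.pi * high * n / rate_hz)
            for n in range(n_tone)]
    return tone + [0.0] * n_pause


if __name__ == "__main__":
    print(dtmf_frequencies("5"), len(dtmf_samples("5")))
```

Keeping separate levels for the low and high tones allows the two tones to be sent at different power levels when a line attenuates high frequencies more than low ones.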

Detecting and Collecting DTMF Digits

When a user presses digits on a telephone keypad, the application

software must detect the event and collect the digits for further processing. The

VPM firmware handles DTMF digits as unsolicited events. Each DTMF digit is

processed as two events, one for the start of the digit and one for its end.

If the DTMF digit persists for more than 2 seconds, a third event indicates the

end of the long DTMF digit.


Enabling DTMF Relay

DTMF relay provides an out-of-band signaling mechanism for carrying

DTMF digits across a VoIP infrastructure. This is important in applications that

use a low bit-rate codec but that must send DTMF digits across the VoIP

network. If the DTMF tones are compressed by a low bit-rate codec such as

G.723 or G.729, the tones are distorted to the point that digits may be lost.

When DTMF relay is enabled, the local gateway listens for DTMF digits

during a call, then sends them uncompressed as either RTP or H.245 packets to

the remote gateway, which regenerates them. This method prevents digit loss

due to compression in low bit-rate codecs.


T.38 Fax Relay

In T.38 Fax Relay, a call manager protocol such as SIP connects two

media endpoints together in a fax call. The endpoint in a fax call that transmits

the fax is referred to as the sending fax relay; the endpoint that receives the fax

is the receiving fax relay.

Every fax session starts out as a voice call and then switches to a fax call

when the call classifier at the sending fax relay detects that the call is a fax

transmission. The sequence of events is outlined below and shown in Figure 3.

The call manager protocols at both ends of the call connect the two

endpoints using a voice connection.

The call classifier on the receiving fax relay detects the start of a fax call (see footnote 1).

The call managers shut down the voice call and set up a fax call using

G.711 and different TCP/IP ports than those used by the voice connection. It is

important to use different ports because the voice and fax data streams use

different packet formats and cannot be mixed. The call manager at each

endpoint negotiates the parameters for the fax call, including port numbers,

connection rate, and connection mode, based on the capabilities of the endpoint

and passes the negotiated parameters to the T.38 module.

The T.38 module receives PCM data from the fax machine and

repackages it as T.38 packets that can be sent over an IP connection. In the

reverse direction, the T.38 module receives data packets from the IP network and

converts them to PCM data that the fax machine can understand. The fax

connection remains active until the receiving fax relay detects EOF. To avoid

data loss, both call managers wait until the receiving fax relay detects EOF

before tearing down the connection.

1 This is the most common procedure. However, either endpoint can detect the start of a fax call.

Both the T.38 module and the TCP/IP network stack buffer data internally,

which means the call managers need to be careful not to shut down the fax

session too early. If the call manager at either end of the connection shuts down

the fax session while there is still data in a buffer, the receiving fax relay will not

be able to receive the fax correctly. To avoid problems, the receiving fax relay is

responsible for telling the sending fax relay that the fax session has ended, which

it does via the call manager protocol. The sending fax relay never tells the

receiving fax relay that the session has ended, even under error conditions.

The sending and receiving fax relays process ending events as shown in

Table 5.


Table 5: Fax Session Ending Events

When Received by the Sending Fax Relay:

On-Hook Event: Indicates that the sending fax machine has gone on-hook after completing a fax transmission. There may still be fax data buffered in the T.38 module and the network stack. The call manager at the sending fax relay therefore does nothing. Eventually the receiving fax relay will detect the end of session and send a Call Manager End of Call Event back to the sending fax relay.

Fax EOP Event: Indicates that the sending fax relay has finished processing data and the fax session has ended. There may still be fax data buffered in the T.38 module and the network stack. The call manager at the sending fax relay therefore does nothing. Eventually the receiving fax relay will detect the end of session and send a Call Manager End of Call Event back to the sending fax relay.

Network Socket Disconnect Event: Indicates that TCP has detected a network disconnect from the receiving fax relay. The call manager discards any T.38 packets it has buffered, because the receiving side has terminated the connection and therefore cannot process more T.38 data. However, it does not disable T.38 because some fax data may still be in the T.38 module. Eventually the receiving fax relay will detect the end of session and send a Call Manager End of Call Event back to the sending fax relay. Note: A UDP connection has no way to detect a network disconnect from the receiving fax relay.

Call Manager Disconnect Event: Indicates that the receiving fax relay wishes to terminate the call. The call manager tears down the call because the receiving side has indicated that the fax session is over. This is the normal way that a fax session should end.

When Received by the Receiving Fax Relay:

On-Hook Event: Indicates that the receiving fax machine has gone on-hook after completing a fax reception. The call manager records that the on-hook event has been received and starts a long-term timer (> 10 sec) to ensure that the call is eventually torn down even in error conditions. Otherwise the receiving side should do nothing with the on-hook event because there may be data buffered in the T.38 module and the network stack.

Fax EOP Event: Indicates that the receiving fax relay has finished processing data and the fax session has ended. There may still be fax data buffered in the network stack, so the call manager waits at least 500 ms, then tears down the call and sends a Call Manager End of Call event to the sending fax relay. This is the normal way that a fax session should end.

Network Socket Disconnect Event: Indicates that TCP has detected a network disconnect from the sending fax relay. The call manager discards any T.38 packets it has buffered for transmission over the network, because the sending side has terminated the connection and therefore cannot process more T.38 data. However, it does not disable T.38 because some fax data may still be in the T.38 module. When the T.38 protocol times out it will generate an EOF, which the receiving fax relay detects as the end of session. The call manager can then disable T.38. Note: A UDP connection has no way to detect a network disconnect from the sending fax relay.

Call Manager Disconnect Event: Indicates that the sending fax relay wishes to terminate the call. There may still be fax data buffered in the T.38 module and the network stack. The call manager waits until the receiving fax relay detects the end of session.
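The asymmetry in Table 5 (only the receiving fax relay ends the session) can be summarized as a lookup from role and event to the call manager's reaction. The sketch below is a simplified illustration; the enum and function names are hypothetical and do not correspond to any actual call manager or VPM firmware API.

```python
from enum import Enum, auto


class Event(Enum):
    ON_HOOK = auto()
    FAX_EOP = auto()
    NETWORK_SOCKET_DISCONNECT = auto()
    CALL_MANAGER_DISCONNECT = auto()


class Action(Enum):
    DO_NOTHING = auto()            # keep waiting; data may still be buffered
    DISCARD_BUFFERED_T38 = auto()  # drop queued T.38 packets, leave T.38 enabled
    START_LONG_TIMER = auto()      # record on-hook, arm a > 10 s safety timer
    WAIT_THEN_TEAR_DOWN = auto()   # wait >= 500 ms, tear down, notify the sender
    TEAR_DOWN = auto()             # tear the call down immediately


# Reactions of the call manager at the sending fax relay.
SENDING_RELAY = {
    Event.ON_HOOK: Action.DO_NOTHING,
    Event.FAX_EOP: Action.DO_NOTHING,
    Event.NETWORK_SOCKET_DISCONNECT: Action.DISCARD_BUFFERED_T38,
    Event.CALL_MANAGER_DISCONNECT: Action.TEAR_DOWN,
}

# Reactions of the call manager at the receiving fax relay.
RECEIVING_RELAY = {
    Event.ON_HOOK: Action.START_LONG_TIMER,
    Event.FAX_EOP: Action.WAIT_THEN_TEAR_DOWN,
    Event.NETWORK_SOCKET_DISCONNECT: Action.DISCARD_BUFFERED_T38,
    Event.CALL_MANAGER_DISCONNECT: Action.DO_NOTHING,
}


def action_for(is_sending_relay, event):
    """Look up how the call manager should react to an ending event."""
    table = SENDING_RELAY if is_sending_relay else RECEIVING_RELAY
    return table[event]


if __name__ == "__main__":
    for ev in Event:
        print(ev.name, action_for(True, ev).name, action_for(False, ev).name)
```

Only two entries tear the call down: Fax EOP at the receiving relay, and the resulting Call Manager Disconnect at the sending relay. This reflects the rule that the receiving side decides when the session is over.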


[Sequence diagram between the sending fax relay (VPM firmware and connection manager) and the receiving fax relay (connection manager and VPM firmware). Sending side: dial tone, DTMF digit, dial tone stop, DTMF digit(s), start codec, stop codec, enable T.38, start G.711, fax transmission in progress, fax EOF, stop G.711, fax disable. Network: negotiate network voice connection, start voice flow over network, shut down network connection, negotiate network fax connection, start fax flow over network, shut down network connection after fax EOF is detected at receiving end. Receiving side: ring, start codec, fax detect, stop codec, enable T.38, start G.711, on-hook, fax EOF, stop G.711, fax disable.]

Figure 3: Typical Fax Call Sequence of Events


Call Classification

All connections are initially set up as voice calls. Call classification

provides a mechanism for identifying those calls that are actually fax

transmissions so that the connection can be reconfigured as a fax call.

At the beginning of a fax transmission, the sending fax relay transmits a

2100-Hz tone. This tone identifies the transmission as a fax and distinguishes it

from a voice or modem transmission. At the far end, the receiving fax relay

identifies this tone in the received data stream and uses it for two distinct

purposes:

To automatically disable the echo canceller and/or non-linear

processor.

To initiate the process of changing the channel from a voice

connection to a fax connection.

These two purposes are independent. Echo cancellation tone disabling

does not affect the operation of the fax call classifier, and vice versa.

When the call classifier identifies the 2100-Hz tone and classifies an

incoming call as a fax, it generates an unsolicited event to alert the VPM. The

call manager (e.g. SIP or H.323) must then do the following:

Tear down or suspend the voice channel and change the operating

mode to T.38 fax relay.

Negotiate a set of parameters for the fax session.


Ringing

The VPM firmware allows an application to make a TDM port ring with any

ring cadence required. Ring cadence refers to the sequence of ringing and

silence, including the duration of each ring and each pause between rings. Table

6 shows standard ring cadences for different regions of the world.

Table 6: Ring Cadences in Regions of the World

Country | Standard Ring Cadence
United States | Two seconds of ringing, four seconds of silence.
Japan | One second of ringing, two seconds of silence (NTT regular ring)
Japan | 0.25 seconds of ringing, 0.2 seconds of silence, 0.25 seconds of ringing, 2.3 seconds of silence (NTT non-regular ring)
United Kingdom | 0.4 seconds of ringing, 0.2 seconds of silence, 0.4 seconds of ringing, 2 seconds of silence
Other European countries | Varies from country to country

Caller Identification

Caller Identification is a feature that sends information about a caller to the

telephone being called. The type and format of the information depends on the

country. The VPM firmware should at least support the following caller ID

formats:

US Caller ID

Japanese Number Display

European Calling Line Identity


Generating Caller ID Information

If caller ID information is available, it is generated (sent to the destination

phone) during a non-ringing period in the first ring. Normally, application software

should allow a delay of at least 0.5 seconds between the falling edge of the ring

envelope and the start of the caller ID transmission. Shorter delays than this may

prevent the attached telephone from reliably decoding the caller ID information.

Security Buffer Chains

Security buffer chain functions, which are included in the Security Module,

allow encryption and decryption operations on memory blocks no larger than

4088 bytes. However, it is possible to build chains of buffers that exceed 4088

bytes in total. These buffer chains allow encryption and decryption operations on

memory blocks greater than 4088 bytes and on non-contiguous memory.

The total size of a buffer chain is the sum of the sizes of each buffer in the

chain. The total size of a buffer chain must comply with the restrictions shown in

Table 7.

Table 7: Buffer Chain Size Restrictions

Security Operation Type | Total Size of Buffer Chain
 | Must be a multiple of 8
Hashing, padding disabled | Must be a multiple of 64
Hashing, padding enabled | No restrictions


After a security operation is performed on a particular buffer chain, the

buffers are no longer associated with the chain and the chain ID becomes invalid.
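The size rules above can be illustrated with a small helper that splits a large block into buffers of at most 4088 bytes and checks the chain's total size against a required multiple. This is only a sketch: the function names are hypothetical and are not the Security Module's buffer chain API, and the required multiple is passed in by the caller according to Table 7.

```python
# Illustrative buffer-chain sizing helpers (hypothetical names, not the
# Security Module API). A single buffer may hold at most 4088 bytes.

MAX_BUFFER_BYTES = 4088


def split_into_buffers(data, max_bytes=MAX_BUFFER_BYTES):
    """Split a byte string into consecutive buffers of at most max_bytes."""
    return [data[i:i + max_bytes] for i in range(0, len(data), max_bytes)]


def chain_total_size(buffers):
    """Total size of a chain is the sum of the sizes of its buffers."""
    return sum(len(b) for b in buffers)


def size_is_allowed(total_bytes, required_multiple=1):
    """Check the total against a size restriction (1 means no restriction)."""
    return total_bytes % required_multiple == 0


if __name__ == "__main__":
    chain = split_into_buffers(bytes(10000))
    total = chain_total_size(chain)
    print([len(b) for b in chain], total,
          size_is_allowed(total, 8), size_is_allowed(total, 64))
```

A 10000-byte block needs three buffers. Its total size satisfies a multiple-of-8 restriction but not a multiple-of-64 restriction, so it would have to be padded before an operation that requires the latter.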

1.5 Latency in VoIP Networks

Latency is the delay that a voice signal experiences as it travels from a

speaker at one end of a connection to a listener at the other end of the

connection. If the latency in a network is too large, it will severely impact the

ability of users to maintain a two-way conversation. ITU-T G.114 recommends

that the one-way delay through a network be less than 150 ms for acceptable

voice quality.

Propagation delay, the time a voice signal takes to travel across the

network, is unavoidable in any telephony application. It is compounded in IP

networks, however, because packets may be buffered in switches, routers, and

other network elements en route to their destination. This can be mitigated with

efficient VoIP gateway and network design that, for example, prioritizes voice

packets to minimize the switching and routing delays they experience.

Delays are also incurred by the process of sampling voice data, encoding

it, and placing it in packets for transmission over the IP network. At the receiving

end, of course, the reverse operations also contribute to delay, as do processes

such as echo cancellation, noise suppression, and filtering. These delays

depend on a number of factors, including:


The capabilities of the processor and the speed of the media.

These factors must be considered in the design of the IP phone,

gateway, and of the network itself.

The type of speech codec.

The size of the packets. This parameter must be controlled by the

application software.

1.6 Jitter Buffer

A major challenge in supporting interactive audio over any WAN,

including IP networks, is the need to provide synchronous playout of audio

packets in the face of stochastic end-to-end network delays. This is

typically achieved by delaying the playout of received audio packets through

buffering the packets for sufficient time so that most of the packets will have been

received before their scheduled playout times. The additional artificial delay until

playout can either be fixed throughout the duration of a call or vary adaptively

during a call's lifetime. Packets that are not received before their scheduled

playout time are considered lost. Depending on the codec used to encode the

voice and on how missing packets are masked, a packet loss ratio of between

1 and 10% can be tolerated in most VoIP systems.

In IP networks, end-to-end delays may fluctuate rapidly and

significantly over small intervals of time. Adaptive playout algorithms that adjust

rapidly to these changing delays can achieve a lower rate of lost packets for both

a given average playout delay and a given maximum buffer size, and are therefore

commonly used in VoIP systems.


If both the propagation delay and the distribution of the variable

component of network delay are known, a fixed playout delay can be computed

such that no more than a given fraction of arriving packets are lost due to late

arrival. In this approach, the playout delay is fixed either for the duration of the

audio call or is recalculated at the beginning of each talkspurt. One potential

problem with this approach is that the propagation delay is not known, although it

can be estimated and typically remains fixed throughout the duration of the call. A

more serious problem is that the end-to-end delay distribution of packets within a

talkspurt is not known and can alter over relatively short time scales.

A better approach to deal with the unknown nature of the delay distribution

is to estimate the delays and adaptively respond to their change by dynamically

adjusting the playout delay. The adaptive playout algorithms determine a playout

delay on a per talkspurt basis. Within a talkspurt, packets are played out in a

periodic manner, thus reproducing their periodic generation at the source. But,

the algorithms may alter the playout delay from one talkspurt to the next, thus the

silence periods between two consecutive talkspurts at the receive end may be

artificially expanded or compressed with respect to the original length of the

corresponding silence period at the sender. According to many studies, a

small change in the length of the silence periods is not noticeable in the

played-out speech.

Among the available adaptive playout algorithms, the one developed by R.

Ramjee has been deployed and implemented on either microprocessors or DSPs

by many VoIP system designers. This algorithm relates every other delay

parameter to the delay of the first packet in a talkspurt and makes no assumption

about the synchronization of the sender and receiver host clocks. A summary of

this algorithm is given below:

Given $R_i$ as the receive time of packet $i$ and $T_i$ as the transmit time of packet $i$ (time stamp), the end-to-end delay for packet $i$ is computed as

$$D_i = R_i - T_i$$

and the average delay is

$$d_i = (1 - U)\,d_{i-1} + U\,D_i$$

where $U$ is a constant (e.g., 1/100).

The standard deviation of the delay is calculated as

$$V_i = (1 - U)\,V_{i-1} + U\,\lvert D_i - d_i \rvert$$

The execution (playout) time of the first packet is

$$P_i = T_i + d_i + K\,V_i$$

where $K$ is a positive constant.

The time between transmission and playout of the first packet is

$$Q_i = P_i - T_i$$

The playout time of the next packet ($i+1$) is offset from its transmit time by the same amount as the first packet:

$$P_{i+1} = T_{i+1} + Q_i$$

If $T_i - T_{i-1} > 20$ ms (the silence limit), then packet $i$ is treated as the first packet of a new talkspurt, unless the gap is due to packet loss.

This algorithm can be improved to check and adjust for a spike

characterized by a sudden large increase in packet delay.
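The per-talkspurt behaviour of this kind of adaptive playout algorithm can be sketched as a small estimator. The code below is a simplified illustration under stated assumptions, not the implementation used on the MSP2020. The value K = 4, the initialisation of the average delay from the first packet, and the class and method names are assumptions. Talkspurt boundaries are detected from timestamps only, so neither the packet-loss exception nor spike detection is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayoutEstimator:
    """Adaptive playout delay, recomputed at the start of each talkspurt."""
    u: float = 0.01                 # weighting constant U (e.g. 1/100)
    k: float = 4.0                  # safety multiplier K (assumed value)
    silence_limit_ms: float = 20.0  # timestamp gap that marks a new talkspurt
    d: float = 0.0                  # running average delay d_i
    v: float = 0.0                  # running delay deviation V_i
    q: float = 0.0                  # playout offset Q of the current talkspurt
    last_t: Optional[float] = None  # transmit time of the previous packet

    def playout_time(self, t_ms, r_ms):
        """Return the playout time for a packet sent at t_ms, received at r_ms."""
        delay = r_ms - t_ms                          # D_i = R_i - T_i
        if self.last_t is None:
            # First packet of the call: seed the estimates (assumption).
            self.d, self.v = delay, 0.0
            first_of_talkspurt = True
        else:
            self.d = (1 - self.u) * self.d + self.u * delay
            self.v = (1 - self.u) * self.v + self.u * abs(delay - self.d)
            first_of_talkspurt = (t_ms - self.last_t) > self.silence_limit_ms
        if first_of_talkspurt:
            self.q = self.d + self.k * self.v        # Q_i = d_i + K * V_i
        self.last_t = t_ms
        return t_ms + self.q                         # P = T + Q


if __name__ == "__main__":
    est = PlayoutEstimator()
    for t, r in ((0.0, 50.0), (20.0, 70.0), (40.0, 90.0), (200.0, 260.0)):
        print(t, r, est.playout_time(t, r))
```

Within a talkspurt the offset Q stays fixed, so packets are played out with the same spacing with which they were generated. The estimates d and V keep updating on every packet, so the next talkspurt starts with a fresh offset.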

GIPS NetEQ Jitter Buffer

The MSP2020 uses the Global IP Sound NetEQ jitter buffer to overcome

the delay jitter experienced from IP networks. The GIPS NetEQ implements an

adaptive playout mechanism similar to the one described above. It supports 10-60

ms speech packet frames at the input to the buffer and 10 ms of playout at the

output of the buffer. This higher jitter buffer resolution reduces excessive packet

and system delays and increases speech quality. The physical size of the buffer

is programmable, but by default it is set to 300 ms. NetEQ places no restrictions

on the codec decoders that can be used with it.

Appendix A shows the voice quality measurements on three different

Analog Telephone Adaptor (ATA) platforms: PMC-Sierra's Mckinley and Stein

reference designs using MSP2015 or MSP2020 processors, and the Japanese Yahoo

BB (YBB) ISP gateway/ATA. All three platforms underwent the same tests

using G.711 and G.729 codecs, the same delay distributions (delay between

consecutive packets), Gaussian with different means and standard deviations and

uniform with different boundary values, and packet loss ratios of 0%, 2%,

and 3%. As seen from the plots of Perceptual Evaluation of Speech Quality

(PESQ) versus Reference Voice Files, the Mean Opinion Score (MOS) obtained

from Mckinley and Stein is, on average, higher than that from YBB. In some tests,

Mckinley showed a higher score than Stein for the G.711 codec.


2 Features

2.1 Hardware Features for the IP Phone

The MSP2020 IP Phone Reference Design provides the following

hardware features:

Handset speaker and microphone

Hands-free speaker and microphone (Speakerphone)

Line-In and Line-Out interfaces

One FXS port for FAX

One RJ45 for a WAN port

Keypad interface

Off-hook/On-hook switch

LCD module interface

12V DC power input

Four LEDs for power indication and status information

Push-button reset switch

RS232 interface

Header for spare GPIO pins for user-specific applications

JTAG interface to the processor (MSP2020)


acoustic and line echo. The sampling rate for both ADC and DAC is generally

about 8 kHz to achieve an audio bandwidth of 4 kHz for human voice.

As an option, one Foreign exchange Subscriber (FXS) channel is also

connected to the MSP2020 TDM bus. This FXS channel is connected to an analog

FAX machine. Fax transmissions can be sent in clear channel or T.38 based on

bandwidth requirements.

The FXS circuit is made up of two main parts: A CODEC and a Subscriber

Line Interface Circuit (SLIC). A CODEC consists of an ADC, which converts the

analog signal from the analog fax machine into a digital signal, and DAC, which

converts digital signals to analog ones to drive the fax machine. The sampling

rate for both ADC and DAC is generally about 8 kHz. In ATA applications, the

SLIC device also emulates PSTN voltage levels, detects whether the phone is off-hook

or on-hook, and generates a ringing voltage of up to 120 Volts.

On the packet side, one of the three independent MSP2020 10/100

Ethernet MAC controllers, configured in MII mode, is connected to a 10/100

Mbps Ethernet WAN PHY for the Internet connection.

MSP2020 receives digitized voice and fax data from the TDM bus,

converts it to data packets and uses a variety of internet and voice related

protocols to send it to a local device or across the IP network to a destination.

In the packet receive direction, the MSP2020 reconverts the data packets

received from the MII interface to a digital voice or fax signal and sends it out to the

vocoder or the FXS port via the TDM bus. The vocoder converts the digital voice



signal to analog voice signal and then the analog signal is amplified and output

from the handset speaker or hands-free speaker. The FXS port converts the

digital fax signal to analog format and sends it out to the fax machine.


5 Functional Description

5.1 MSP2020

The MSP2020 is a multi-service processor capable of numerous end-use

applications. The MSP2020 includes a glueless interface to 133MHz SDRAMs,

an ELB interface for Flash memory, three MII/RMII Ethernet interfaces for direct

connection to external Ethernet PHY devices, a TDM interface for vocoder or

speakerphone ICs and SLIC/SLAC devices, and several other peripheral device

interfaces not used in this design.

For this application, the MSP2020 is interfaced with the ATH3100, a

speakerphone IC from Acoustic Technologies, to provide two acoustic transducer

interfaces (a base speaker and a base microphone), a handset interface

(handset speaker and microphone), acoustic echo cancellation, gain adjustment,

noise reduction, and optional DTMF and ring tone generation. These tones can

also be generated via the MSP2020 Voice Processing Module (VPM) firmware.

For the FoIP option, the MSP2020 is interfaced with the LE88221, a dual-channel

SLIC/SLAC device from Legerity, although only one channel is used for

one-line fax support.

Data is transferred between the MSP2020 and both the ATH3100 and

LE88221 through the TDM (PCM) interface. The 2-Wire interface is used for

serial microprocessor access to/from the ATH3100. The SPI/MPI interface is

used for signaling and microprocessor access to/from the LE88221.


In this application, the ATH3100 is configured to supply the TDM clocks and

frame pulse to both the MSP2020 and the LE88221. Some of the GPIO pins of

the MSP2020 are used for control (reset and chip select) and interrupt purposes

for the ATH3100 and LE88221.

Some of the MSP2020 GPIO pins are also used to interface a standard

12-key keypad and a text-based LCD module.

5.2 Memory

The MSP2020 has a dedicated SDRAM interface and is connected to a

32-bit 133MHz SDRAM device providing 128 Mbit of RAM space.

The boot code will reside on a Flash memory device. The MSP2020 has a

dedicated ELB interface to the Flash memory devices and in this application is

connected to 32Mbits of Flash.

5.3 ATH3100

The ATH3100 is the next generation Full-Duplex Speakerphone SoC from

Acoustic Technologies, Inc. This device builds on a patented core full-duplex

echo cancellation, noise reduction, and sound enhancement technology with

added features and enhanced functionality (compared with the older generation

ATH3000) for improving the audio quality and providing phone management

capabilities for digital PBX, standard PSTN telephony terminals and VoIP

(well-suited for IP applications). The added features and enhanced performance,


shown below, provide improved sound quality, full duplex performance and

natural communication for all speakerphone-enabled applications.

ATH3100 enhancements are:

Integrated Caller-ID

Acoustic Echo Cancellation of 65dB with a 64ms adaptive filter tail

Noise Reduction up to 18dB

Network Echo Cancellation of 45dB with a 16ms adaptive filter tail

Automatic Gain Control for Microphone and Line-Input

Low Power dissipation of 65mW

Virtually Pin-for-Pin compatible with the older generation, the

ATH3000

Green Packaging Option Available

5.4 FXS Interface

The FXS interface is designed using an off-the-shelf integrated

SLIC/SLAC device. The Legerity LE88221 performs all line functions and is

programmable for global usage. In this application, the codec receives the PCM

fax stream from the MSP2020 via its TDM interface. The MSP2020 will control

the operation of the codec over its MPI/SPI interface.


5.5 WAN Uplink

The WAN uplink supports 10/100Mbps Ethernet and provides the logical

connection of traffic to the Internet. An off-the-shelf PHY transceiver is

connected to the MSP2020 via its MII interface.

5.6 Power Supply

The board is powered from a 12V AC/DC wall adapter.

The components that consume the majority of the total board power are listed

below in Table 8.

Table 8: Maximum Current Consumption

Device | No. of devices | 3.3V Rail Current (mA) | 1.8V Rail Current (mA) | 5V Rail Current (mA) | 12V Rail Current (mA)

SLIC/SLAC (note 1) 1 78 120

Audio Amplifier 1 6

ATH3100

Ethernet PHY

Note:

1. SLIC/SLAC power during ringing condition is estimated.

1

1


20 1 - 148

6 Circuit Design Considerations

The following sections comment on the schematic circuit design for the

MSP2020 and the rest of the IP phone's circuit design. Refer to the schematics

page number listed in the section title for the circuit connections described in that

section.

6.1 MSP2020 Circuit Design

6.1.1 Power Requirements and Supply Filtering (Page 2 of Schematics)

The MSP2020 requires two power supplies, 1.8 V core and 3.3 V I/O. It is

important to connect all power pins to the correct power supply as damage can

occur to the device if any are left unconnected. Refer to Table 8 for per rail

requirements and power consumption specifications.

6.1.1.1 Digital Power Pin Decoupling

It is recommended that digital power de-coupling capacitors be evenly

distributed around the device. Ideally there should be a 0.1 µF high frequency

capacitor as close as possible to each cluster of power pins with a 10 µF bulk

capacitor placed close to the device.

6.1.1.2 PLL1 and PLL0 Power Pin Decoupling

The internal device clocking requires stable quiet power through the PLL0

and PLL1 power pins. Figure 6 illustrates the required decoupling circuit for the

MSP2020 internal PLL.


6.1.2 TDM interface to ATH3100 (Pages 2,3 and 4 of schematics)

Figure 8 shows the interface between the MSP2020 and the ATH3100. In

this application, the ATH3100 is used as the master device that generates the TDM

(PCM) clock (1.544MHz for T1 and 2.048MHz for E1 line rate) and the 8 kHz frame

pulse to both the MSP2020 and SLIC/SLAC device TDM interfaces.
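The two clock rates follow from the 8 kHz frame pulse and 8-bit timeslots, as the short calculation below shows. The function name is hypothetical, and the interpretation of the spare T1 bit as the framing bit is standard T1 practice rather than something stated in this report.

```python
# Bits and 8-bit timeslots per TDM frame for a given bit clock.
# One frame is sent per frame pulse (8 kHz, i.e. every 125 microseconds).

FRAME_RATE_HZ = 8000


def frame_layout(bit_clock_hz, frame_rate_hz=FRAME_RATE_HZ, bits_per_slot=8):
    """Return (bits per frame, whole timeslots, leftover bits)."""
    bits_per_frame, remainder = divmod(bit_clock_hz, frame_rate_hz)
    if remainder:
        raise ValueError("bit clock is not a multiple of the frame rate")
    slots, spare_bits = divmod(bits_per_frame, bits_per_slot)
    return bits_per_frame, slots, spare_bits


if __name__ == "__main__":
    print("E1:", frame_layout(2_048_000))  # 32 timeslots per frame
    print("T1:", frame_layout(1_544_000))  # 24 timeslots plus 1 spare bit
```

Each timeslot carries one 8-bit companded sample per frame, so a single timeslot corresponds to one 64 kbit/s voice channel.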

The reset signal to the ATH3100 is driven from GPIO-17 on the

MSP2020. The ATH3100 will come out of reset configured to pass audio in

speakerphone mode between the base acoustic interface and the PCM line

interface. Driving the reset signal from a GPIO pin will hold the ATH3100 in reset

until the MSP2020 configuration is completed.

Also, a segment of MSP2020 accessible memory can be used to store the

ATH3000 register values and must be alterable without needing to rebuild the

MSP2020 code and preferably without disturbing any other constants used by the

MSP2020. We can accomplish this in this reference design by placing a

separate I2C EEPROM on the 2-Wire serial bus. The ATH3100 S/W utility, LOcho,

allows the contents of this EEPROM to be read and written without disturbing anything

else.

On the audio interface connected to the ATH3100, to avoid echo

generated by the electrical coupling from speaker to microphone at very low

signal levels, it is recommended that designers do not power the base speaker

amplifier and the microphone bias from the same supplies. There are several

options, but one is chosen in this reference design to solve the problem by


6.1.3.1 Timing

Timing for both the read and write to the flash is shown in Table 9 and

Table 10. A 25.00MHz ELB output clock and AMD/Spansion MBM29LV320DB-90

was used for the timing analysis.

Table 9: Flash Read Timing

Read Cycle Time | Trc (ns) | Trc (ns) | Margin/Remarks
Address to Output Delay | Tacc (ns) |
Chip Enable to Output Delay | Tce (ns) |
Output Enable to Output Delay | Toe (ns) |
Data Setup Time | Toe (ns) | Ts (ns) | Margin/Remarks
| min | max | min | max | 45ns
Chip Enable to Output High-Z | Tdf (ns) | Tdf (ns) | Margin/Remarks
| min | max | min | max | 14ns
Output Enable to Output High-Z | Tdf (ns) | Tdf (ns) | Margin/Remarks
min
30
max 14ns

Table 10: Flash Write Timing

Parameter                      | Flash Specification | MSP2020 Specification | Margin/Remarks
Write Cycle Time               | Twc (ns): 90        | Twc (ns): 185         | 95 ns
Address Setup                  | Tas (ns)            | Tas (ns)              |
Address Hold                   | Tah (ns)            | Tah (ns)              |
Data Setup                     | Tds (ns)            | Tds (ns)              |
Data Hold                      | Tdh (ns)            | Tdh (ns)              |
Read Recover Time Before Write | Tghwl (ns)          | Tghwl (ns)            |
Write Pulse Width              | Twp (ns): min 35    | Twp (ns): min 115     | 80 ns
Write Pulse Width High         | Twph (ns)           | Twph (ns)             |
CE Setup Time                  | Tcs (ns)            | Tcs (ns)              |
CE Hold Time                   | Tch (ns)            | Tch (ns)              |
CE Pulse Width High            | Tch (ns)            |                       |

Values not tied to a row in this copy: 0; 25; 45; margins 15 ns, 60 ns and 110 ns.

6.1.4.1 Timing

Timing for both the read and write to the SDRAM is shown in Table 11. A 133 MHz DRAM clock and a Micron MT48LCaM32B2-7 were used for the timing analysis.

Table 11: SDRAM Timing

Parameter                       | SDRAM Specification  | MSP2020 Specification | Margin/Remarks
SDR-CK-OUT Clock Period (cycle) | Dclk (ns)            | Dclk (ns)             |
Fall Time                       | Trclk (ns)           | Trclk (ns)            |
SDR-CK-OUT High Period          | Thclk (ns): min 2.75 | Thclk (ns)            | 0.63 ns
SDR-CK-OUT Low Period           | Tlclk (ns)           | Tlclk (ns)            |
Processor Read, Data Setup time |                      |                       | 1.7 ns
Processor Write                 |                      |                       |

Values not tied to a row in this copy: 0; 2.75; 3.38 (min); 4.13 (twice); 5.5; 0.3; margin 0.63 ns.
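Each margin in a table of this kind is the difference between what the MSP2020 guarantees and what the memory device requires. The sketch below reproduces the 0.63 ns figure on the assumption that 2.75 ns is the SDRAM's minimum clock high period and 3.38 ns is the MSP2020's guaranteed minimum. That pairing is inferred from the values in Table 11, and the function names are illustrative.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds."""
    return 1_000.0 / freq_mhz


def margin_ns(provided_min_ns: float, required_min_ns: float) -> float:
    """Timing margin for a 'minimum' requirement, rounded to 0.01 ns."""
    return round(provided_min_ns - required_min_ns, 2)


SDRAM_CLOCK_MHZ = 133.0
REQUIRED_HIGH_MIN_NS = 2.75  # assumed: SDRAM minimum clock high period
PROVIDED_HIGH_MIN_NS = 3.38  # assumed: MSP2020 guaranteed minimum high period

print(f"clock period = {period_ns(SDRAM_CLOCK_MHZ):.2f} ns")
print(f"high-period margin = {margin_ns(PROVIDED_HIGH_MIN_NS, REQUIRED_HIGH_MIN_NS)} ns")
```

A positive margin means the requirement is met with room to spare; a negative result would flag a violation that needs a slower clock or a faster device.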

6.1.5 MII (10/100 Ethernet) Interface (Pages 2 and 7 of schematics)

Of the three independent MII interfaces of the MSP2020, the MACA (MII-A) interface is used for WAN connectivity; the MACB (MII-B) and MACC (MII-C) interfaces are not used in this reference design. The MSP2020 is connected to either the IC+ IP101A or the Realtek RTL8201 PHY device. The MSP2020 ELB-CLKO and GPIO-0 are used as the PHY-CLK and device reset signals, respectively, for the IP101A. The timing analysis below is specific to these two devices, but in general it can be applied to any 10/100 MII PHY. Figure 11 shows how the MSP2020 is connected to the IP101A in the MII mode of operation.


Interface                       | Parameter            | PHY Specification               | MSP2020 Specification
MII, Transmit                   | Data/Cntrl Hold time | Thtxclk (wrt Rising Edge) (ns)  |
MII, Receive                    | Data/Cntrl Setup time| Tdrxclk (wrt Falling Edge) (ns) | Tsrxclk (wrt Rising Edge) (ns)
MII, Receive                    | Data/Cntrl Hold time | Thrxclk (wrt Rising Edge) (ns)  | Thrxclk (wrt Rising Edge) (ns)
Management Interface            | MDC                  | MDC (ns)                        | MDC (ns)
Management, Processor Write     | Data Setup time      | Tds (wrt Rising Edge) (ns)      |
Management, Processor Write     | Data Hold time       | Tdh (wrt Rising Edge) (ns)      |
Management, Processor Read      | Data Setup time      | Tds (wrt Rising Edge) (ns)      |
Management, Processor Read      | Data Hold time       | Tdh (wrt Rising Edge) (ns)      |
Management, Processor Read      | Clock to Output      | Tco (wrt Rising Edge) (ns)      |

Values and remarks not tied to a row in this copy: min 0.5 (Thtxclk); min 1 7 (receive setup); min 7 (Thrxclk); min 280; max 600; min 10 (Tds); min 10 (Tdh); margins 16 ns and 13 ns; 600 ns > 100 ns; Meets IEEE 802.3 Specs (0 - 10 ns); Meets IEEE 802.3 Specs (10 - 300 ns).


6.1.9 JTAG Interface (Page 1 of schematics)

The MSP2020 supports the IEEE Boundary Scan Specification as

described in the IEEE 1149.1 standard. Refer to PMC-2021518 JTAG Test

Features Description application note for a description of the test features

included on the MSP2020. The JTAG interface is also used to load boot code

and burn it to the Flash.

The JTAG interface of the MSP2020 in this reference design is connected to a 2x7 header so that, if the JTAG interface is not used, the JTAG data and clock signals can be pulled low or high and the JTAG reset signal (TRST-N) can be connected to the master reset signal (RESET-N).

6.1.10 GPIO Allocation (Page 1 of schematics)

Table 13 lists the GPIO allocation for this reference design.

Table 13: GPIO Allocation for IP Phone Design with FAX Support

GPIO | MSP Function | IP Phone Function
0    | TIMER-A      | PHY-RESET-
1    | TIMER-B      | Flash button (I)
15   | UART1-SOUT   | LCD Soft key 2
30   | ELB-CS4-N    | Mute LED
31   | ELB-CS3-N    | Hold (I)
32   | ELB-CS2-N    | Mute Button
44   | MII-C-       | LED FAN ind.

Entries not tied to a GPIO number in this copy:

MSP Function | IP Phone Function
ELB-CS7-N    | Volume HI/LO
ELB-CS6-N    | Volume HI/LO
SMPI-SDO     | FXS Signaling
SMPI-SCLK    | FXS Signaling
INT1-N       | PHY-INT
MII-C-       | PWRAMP-SHU

Other MSP functions listed (GPIO not available): TDM-RXD, TDM-TXD, TDM-RXCLK, PCI-AD-8.
Other IP phone functions listed: Speaker (ON/OFF) (I), OFF HOOK (I), LCD-DATA, Key Pad 2, Key Pad 3, Key Pad 4, Key Pad Exten., HDR Spare.
Other GPIO numbers listed: 2, 4, 8, 11, 12, 13, 17, 24, 25, 35, 42, 43, 45, 46.

6.1.11 Keypad Interface (Page 1 of schematics)

For this IP Phone reference design, a standard 3x4 keypad (e.g., -102 model from Grayhill) that needs 7 input signals is used. Seven GPIO pins of the MSP2020 are allocated for this keypad interface. Refer to Table 13 for the mapping of the GPIO pins used as the keypad input signals. The keypad can be

36 - 54
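A 3x4 keypad uses seven signals because it is wired as a matrix of 4 rows and 3 columns. The sketch below shows one conventional way to decode such a matrix, assuming the software activates one column at a time and samples the four row lines. The `read_rows` callback, the key legend and the scan order are hypothetical and do not describe the MSP2020 driver.

```python
# Standard telephone legend, indexed as KEYMAP[row][column] (assumed wiring).
KEYMAP = (
    ("1", "2", "3"),
    ("4", "5", "6"),
    ("7", "8", "9"),
    ("*", "0", "#"),
)


def scan_keypad(read_rows):
    """Return the list of keys currently pressed.

    read_rows(col) is a caller-supplied function that activates column `col`
    (0..2) and returns four booleans, one per row, True where the row line
    is active.
    """
    pressed = []
    for col in range(3):
        for row, active in enumerate(read_rows(col)):
            if active:
                pressed.append(KEYMAP[row][col])
    return pressed


if __name__ == "__main__":
    # Simulated hardware: only the key at row 1, column 1 ("5") is held down.
    def fake_read_rows(col):
        return tuple(row == 1 and col == 1 for row in range(4))

    print(scan_keypad(fake_read_rows))
```

A real driver would also debounce each key by requiring the same scan result on several consecutive passes.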


The shift register is preferred to a dedicated serial-in/parallel-out register because data is latched on the rising edge of the clock, so timing and clock polarity are not an issue here. Before data can be written, the shift register is cleared by loading every latch with zeros. Next, to provide the "E" gate, a high voltage (logic "1") is written, followed by the "RS" bit and the four data bits. Once the register is loaded correctly, the LCD-Data signal is pulsed to strobe the "E" bit.
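The loading sequence described above can be sketched as the list of bits clocked into the shift register for each 4-bit transfer. This is an illustration only: the six-stage register length, the MSB-first data order and the high-nibble-first byte order (the usual convention for HD44780-style controllers in 4-bit mode) are assumptions, and the function names are hypothetical.

```python
SHIFT_STAGES = 6  # assumed: one "E" gate bit, one "RS" bit and four data bits


def nibble_frame(rs: int, nibble: int) -> list:
    """Bits to clock in, first bit first, for one 4-bit LCD transfer."""
    if rs not in (0, 1) or not 0 <= nibble <= 0xF:
        raise ValueError("rs must be 0 or 1 and nibble must fit in 4 bits")
    clear = [0] * SHIFT_STAGES                        # load every latch with zeros
    data = [(nibble >> bit) & 1 for bit in (3, 2, 1, 0)]  # MSB first (assumed)
    return clear + [1, rs] + data                     # "E" gate, then RS, then data


def byte_frames(rs: int, value: int) -> list:
    """Split one byte into two 4-bit transfers, high nibble first (assumed)."""
    return [nibble_frame(rs, (value >> 4) & 0xF), nibble_frame(rs, value & 0xF)]


if __name__ == "__main__":
    # After each frame is shifted in, LCD-Data is pulsed to strobe "E".
    for frame in byte_frames(rs=1, value=ord("A")):
        print(frame)
```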

A voltage divider circuit is used to input a constant voltage to the LCD

controller's Contrast pin. The other alternative would be to use a potentiometer

wired as a voltage divider to provide an adjustable contrast function.
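For reference, the contrast voltage produced by a two-resistor divider is V_out = V_in * R_bottom / (R_top + R_bottom). The sketch below evaluates that formula. The supply voltage and resistor values are placeholders chosen for easy arithmetic, not the values used on the board.

```python
def divider_output(v_in: float, r_top: float, r_bottom: float) -> float:
    """Output voltage of an unloaded resistive divider."""
    return v_in * r_bottom / (r_top + r_bottom)


def pot_output(v_in: float, wiper_position: float) -> float:
    """Potentiometer used as a divider; wiper_position runs from 0.0 to 1.0."""
    if not 0.0 <= wiper_position <= 1.0:
        raise ValueError("wiper_position must be between 0 and 1")
    return v_in * wiper_position


# Placeholder values: 5 V supply, 9 kOhm on top, 1 kOhm to ground.
print(divider_output(5.0, 9_000.0, 1_000.0))
```

The fixed divider gives a single contrast setting, while the potentiometer covers the whole range from 0 V to the supply voltage.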

A 16-pin header is also provided to connect the LCD module directly or via

ribbon cable to the PCB.

6.1.13 Unused Pins

Terminate unused pins according to the instructions below:

If the JTAG port is not used, connect TRSTB to RSTB. The boundary scan state machine must be reset prior to normal device operation to prevent some or all device I/O pins being held in test mode.

Depending on the interfaces used, the unused interfaces' active-high input pins should be grounded, and active-low input pins should be tied high. Use pull-up and pull-down resistors where possible; this makes future modification easier.

Unused output pins can be left floating. The unused GPIO pins are connected to a header in this reference design so that they can be used for additional applications.

6.1.14 Power Supply Circuit Design (Page 8 of schematics)

The reference design board contains devices that require four different voltage levels: 12V, 5V, 3.3V and 1.8V. The input voltage to the board is 12V, supplied via an AC/DC adapter from a standard 100-240 VAC, 50-60Hz wall outlet. The 12V is then converted to 5V and 3.3V through two dedicated regulators. The 3.3V supply is also converted to 1.8V for the core power to the MSP2020.

A master hardware reset circuit with a push-button switch is also provided on the power supply design schematics page.

6.1.15 Thermal Management

The MSP2020 was designed to operate over a wide temperature range

when used with a heat sink. Refer to the device compact model, located in the

Package and Thermal section of [8], to determine if a heat sink is required for

your system.

6.1.16 Simulation Models

The MSP2020 IBIS simulation model is available for download from the http://www.pmc-sierra.com website. The IBIS model can be used to simulate all I/O except for high-speed I/O, differential I/O, and other interfaces that cannot be accurately modeled with IBIS.


7 Layout Design Considerations

7.1 MSP2020 Layout

7.1.1 Placement

This section describes some guidelines that can assist in the placement

and orientation of the MSP2020 and other external components in PCB designs.

It is important to know where the various signal groups are located on the

MSP2020 prior to placement and orientation of the chip. Figure 15 shows a top

view of the MSP2020 and the approximate physical locations of the signal

groups. (Note that although the center region of the chip shows power and

ground connections, these connections are also mixed in amongst the other

signal regions as well.)

The placement and orientation of the MSP2020 and other external

components should be done while paying attention to the guidelines presented in

the rest of section 7.1.



Match trace lengths of address, data and control lines to within

0.05" to minimize skew from signal routing. Skew decreases the

timing margins resulting in incorrect memory accesses.

Minimize vias and via stub length on the interface.

For optimal signal integrity, simulations should be performed.
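The effect of the length-matching rule above can be estimated by converting a length mismatch into skew. The sketch below assumes a propagation delay of about 170 ps per inch, a typical order of magnitude for FR-4 that depends on the actual stack-up. The 133 MHz clock from the SDRAM timing analysis is used only as a yardstick, and the function names are illustrative.

```python
def skew_ps(length_mismatch_in: float, delay_ps_per_in: float = 170.0) -> float:
    """Skew caused by a trace length mismatch (delay per inch is an assumption)."""
    return length_mismatch_in * delay_ps_per_in


def skew_fraction_of_period(skew: float, clock_mhz: float) -> float:
    """Skew expressed as a fraction of one clock period."""
    period_ps = 1e6 / clock_mhz
    return skew / period_ps


mismatch = 0.05  # inches, the matching tolerance given above
s = skew_ps(mismatch)
print(f"skew = {s:.1f} ps, "
      f"{100 * skew_fraction_of_period(s, 133.0):.2f}% of a 133 MHz period")
```

Under these assumptions the skew from a 0.05" mismatch is a few picoseconds, which is small compared with the sub-nanosecond margins in the timing tables.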

7.1.4 MII (10/100 Fast Ethernet) Interface Layout

The following guidelines should be used for routing the Ethernet interface signals:

To keep the skew value in the timing analysis performed in section 6.1.5 as low as possible, trace length matching was performed to keep the signals in the same transmit or receive direction of the MII interfaces to +/- 0.010".

Ethernet PHY layout should follow guidelines provided in the PHY

device Layout Guide. This document should describe requirements

between the PHY and the RJ-45 connector and to the magnetic

circuits. Some Ethernet PHY vendors recommend using separate

analog power and ground planes for each PHY on the board.

7.1.5 TDM Interface Layout

The following guidelines should be used for routing the TDM interface

signals:

Trace length matching should be performed to keep the signals in the same transmit or receive direction of the TDM interfaces to +/- 0.010".

Signal integrity simulations on the clock line should be performed to determine if termination resistors are necessary.

Use series termination for each TDM clock and frame pulse to overcome any glitch in the clock and frame pulse signals.

7.1.6 FXS Interface Layout

FXS requires isolation around the connector and TIP/RING traces (i.e. no planes around the connector and TIP/RING traces).

7.1.7 PLL Filter Layout

PLL0-AVDH and PLL1-AVDH RC filter circuits should be placed as close

as possible to the device pin to reduce noise picked up on the path to the device

pin.

7.1.8 Audio Interface Layout

Run a dedicated trace (fairly wide, approx. 0.020") from the LM4950 audio amplifier's ground pin to a point as close as possible to where the +12V supply enters the board. The critical concern here is to keep the fairly substantial current created through the speaker from creating even tiny voltages on the analog ground, especially near the return point for the base microphone.


8 Conclusion

VoIP products such as ATAs, IP phones, wired and wireless routers and gateways have gained significant popularity worldwide over the last couple of years. While full deployment of VoIP networks has started in Korea, Japan, Taiwan and some countries in Europe, in North America, China and India this technology is rapidly gaining market share and challenging the Internet Service Providers (ISP) and traditional PSTN networks to reduce their prices, particularly for long distance phone calls.

The IP Speakerphone Reference Design will assist engineers in designing low cost IP speakerphone boards using PMC-Sierra's MSP2020 multi-service processor and in bringing their designs to market more quickly. This reference design supports most major telephony applications in hardware, along with an optional T.38 FAX interface.

In this project, board layout considerations such as PCB signaling layers and the analogue and digital trace lengths, widths and impedance were not examined in detail. However, as with any other high-speed board design, specific requirements must be met by the physical layout designers based on the implemented technology.

Depending on the type and dimensions of the enclosure selected to contain the PCB, software tuning is needed to configure the AEC module of the ATH3100 for optimal performance.


Also, this project did not include the software development phase; however, for future work, the software for this Reference Design should include the following modules:

Driver modules to configure and control the operation of the MSP2020, ATH3100, SLIC/SLAC device, Ethernet PHY device and LCD controller.

Voice Processing firmware module running on the MIPS core of the MSP2020.

Application software for basic telephony functions such as on-hook/off-hook, ring generation, busy tone, mute, speed call, hold, forward, etc.

A real time operating system with reliable multi-tasking and multi-scheduling capabilities to control and harmonize the devices' operations and to satisfy the real time processing specifications.

SIP application or other signaling protocol to activate and coordinate the various components to complete a call.

After the prototype board is built and the software is available, the

designer should debug the board by performing a feature test plan to test the

main functionality of the design.

Although various versions of IP phones are currently available on the market, designing one with high voice quality, reduced power consumption (for wireless applications), a full feature set and less echo is still one of the real challenges in today's VoIP world.


9 Disclaimer

This document is a paper reference design, and as such, has not been

built or tested as of this date.

Because the schematics and BOM are part of PMC-Sierra's intellectual property, they are not included in this report. The schematics and BOM can be downloaded from PMC-Sierra's web site (www.pmc-sierra.com) with advance permission from PMC-Sierra, Inc.


Acronym List

ADC     Analog to Digital Converter
AEC     Acoustic Echo Canceller
ATA     Analog Telephone Adapter
CNG     Comfort Noise Generation
DAC     Digital to Analog Converter
ELB     External Logic Bus
FoIP    Fax over IP
FXO     Foreign exchange Office
FXS     Foreign exchange Subscriber
IC      Integrated Circuit
IP      Internet Protocol
LAN     Local Area Network
LCD     Liquid Crystal Display
LEC     Line Echo Canceller
LMS     Least Mean Square
MII     Media Independent Interface
MOS     Mean Opinion Score
NLMS    Normalized Least Mean Square
PCB     Printed Circuit Board
PCM     Pulse Code Modulation
PESQ    Perceptual Evaluation of Speech Quality
PLL     Phase Lock Loop
RTP     Real-time Transport Protocol
SIP     Session Initiation Protocol
SLAC    Subscriber Line Audio-processing Circuit
SLIC    Subscriber Line Interface Circuit
SoC     Systems on Chip
TDM     Time Division Multiplexing
UDT     Unstructured Data Transfer
UNI     User-network Interface
VAD     Voice Activity Detection
VoIP    Voice over Internet Protocol
WAN     Wide Area Network

Definitions

TDM (Time Division Multiplexing) - A method of multiplexing by which a transmission channel is divided into discrete time intervals.

PESQ MOS - PESQ stands for Perceptual Evaluation of Speech Quality and is an enhanced perceptual quality measurement for voice quality in telecommunications. PESQ was specifically developed to be applicable to end-to-end voice quality testing under real network conditions, like VoIP, POTS, ISDN, GSM etc. The PESQ MOS score as defined by the ITU recommendation P.862 ranges from 1.0 (worst) up to 4.5 (best).


References

1. Acoustic Technologies, August 2004, "ATH3000 Data sheet", Version 1.5.

2. Acoustic Technologies, July 2004, "ATH3000 Use & Integration Guide", Version 1.5.

3. Acoustic Technologies, June 2005, "ATH3100 Preliminary Information", Version 0.1.

4. Balaji Kumar, 1995, Broadband Communications, McGraw-Hill, Inc.

5. Daniel Minoli & Emma Minoli, 2002, Delivering Voice over IP Networks, 2nd Edition, Wiley.

6. ITU-T Recommendation G.168, 08/2004, "Digital Network Echo Cancellers".

7. ITU-T Recommendation Q.23, 11/1998, "Technical Features of Push-Button Telephone Sets".

8. PMC-Sierra Inc., October 2004, PMC-2041639, "MSP2020 Multi-Service Processor Data sheet", Issue 1.

9. PMC-Sierra Inc., October 2004, PMC-2041704, "MSP2020 Multi-Service Processor Data sheet Addendum", Issue 1.

10. PMC-Sierra Inc., October 2004, PMC-2041640, "MSP2020 Multi-Service Processor Hardware User's Manual", Issue 1.

11. PMC-Sierra Inc., November 2005, PMC-2041860, "MSP20xx Errata", Issue 2.

12. PMC-Sierra Inc., October 2004, PMC-2041641, "MSP2020 Multi-Service Processor Product Overview", Issue 1.

13. Legerity, "Le88221/226/241/246 Dual AHS VoicePort Device Data sheet".

