INTERNET PROTOCOL (IP) SPEAKERPHONE
REFERENCE DESIGN
Khosrow Mossannen-Amini
A REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF ENGINEERING
In the School of
Engineering Science
© Khosrow Mossannen-Amini 2006
SIMON FRASER UNIVERSITY
Spring 2006
All rights reserved. This work may not be reproduced in whole or in part, by photocopy
or other means, without the permission of the author.
SIMON FRASER UNIVERSITY LIBRARY
DECLARATION OF PARTIAL COPYRIGHT LICENCE
The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.
The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection, and, without changing the content, to translate the thesis/project or extended essays, if technically possible, to any medium or format for the purpose of preservation of the digital work.
The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies.
It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission.
Permission for public performance, or limited permission for private scholarly use, of any multimedia materials forming part of this work, may have been granted by the author. This information may be found on the separately catalogued multimedia material and in the signed Partial Copyright Licence.
The original Partial Copyright Licence attesting to these terms, and signed by this author, may be found in the original bound copy of this work, retained in the Simon Fraser University Archive.
Simon Fraser University Library Burnaby, BC, Canada
Approval
Name: Khosrow Mossannen-Amini
Title of Project: Internet Protocol (IP) Speakerphone Reference Design
Degree: Master of Engineering
Examining Committee
Chair: Dr. Bonnie Gray Assistant Professor of School of Engineering Science
Dr. Stephen Hardy Senior Supervisor Professor of School of Engineering Science
Warren Tam Supervisor CPD Applications Leader of PMC-Sierra, Inc.
Dr. Tejinder S. Randhawa Internal Examiner Adjunct Professor of School of Engineering Science
Date Defended/Approved: [illegible]
IP SPEAKERPHONE REFERENCE DESIGN ii
Abstract
This engineering project, undertaken at PMC-Sierra, Inc., is a paper
reference design that describes the scope and the deliverables required for a
wired Internet Protocol (IP) phone with speakerphone functionality. This
reference design can assist engineers in designing an IP phone and therefore
allows them to bring their designs to market more quickly.
A wireless (WiFi) IP phone kit has already been built in a PMC-Sierra
design centre in China and will be released to production by January 2006.
This report includes a set of schematics and a bill of materials (BOM) as a
paper reference design of a wired IP phone with speakerphone functionality
based on the PMC-Sierra MSP2020 multi-service microprocessor.
The schematics and BOM, with an optional interface to an analog fax machine
to support Fax over IP (FoIP), can be downloaded from PMC-Sierra's web site
(www.pmc-sierra.com) with advance permission from PMC-Sierra, Inc.
Keywords: Codec, Echo Canceller, FoIP, Internet, VoIP
Acknowledgments
I would like to thank the following individuals for their support, feedback,
and guidance throughout this project:
Warren Tam (technical supervisor) CPD Applications, Leader PMC-Sierra, Inc.
Rob ltcush CPD Applications, Manager PMC-Sierra, Inc.
Dr. Stephen Hardy (academic supervisor) Professor, School of Engineering Science, SFU
Dr. Tejinder S. Randhawa (committee member) Adjunct Professor School of Engineering Science, SFU
Hassen Karaa Co-op Applications Engineer PMC-Sierra, Inc.
Table of Contents
Approval
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 IP Telephony
  1.2 Voice over IP (VoIP)
  1.3 Fax over IP
  1.4 Voice Processing Module
  1.5 Latency in VoIP Networks
  1.6 Jitter Buffer
2 Features
  2.1 Hardware Features for the IP Phone
3 IP Phone Design with MSP2020
4 Block Diagram
5 Functional Description
  5.1 MSP2020
  5.2 Memory
  5.3 ATH3100
  5.4 FXS Interface
  5.5 WAN Uplink
  5.6 Power Supply
6 Circuit Design Considerations
  6.1 MSP2020 Circuit Design
    6.1.1 Power Requirements and Supply Filtering (Page 2 of schematics)
    TDM Interface to ATH3100 (Pages 2, 3 and 4 of schematics)
    ELB Interface to Flash Memory (Pages 2 and 5 of schematics)
    SDRAM Interface (Pages 2 and 5 of schematics)
    MII (10/100 Ethernet) Interface (Pages 2 and 7 of schematics)
    TDM Interface to SLIC/SLAC (Pages 2 and 6 of schematics)
    SPI/MPI Interface (Pages 2 and 6 of schematics)
    Serial Interface (Page 1 of schematics)
    JTAG Interface (Page 1 of schematics)
    GPIO Allocation (Page 1 of schematics)
    Keypad Interface (Page 1 of schematics)
    LCD Interface (Page 1 of schematics)
    Unused Pins
  Power Supply Circuit Design (Page 8 of schematics)
  Thermal Management
  Simulation Models
7 Layout Design Considerations
  7.1 MSP2020 Layout
    7.1.1 Placement
    7.1.2 SDRAM Interface Layout
    7.1.3 Flash Memory Interface
    7.1.4 MII (10/100 Fast Ethernet) Interface Layout
    7.1.5 TDM Interface Layout
    7.1.6 FXS Interface Layout
    7.1.7 PLL Filter Layout
    7.1.8 Audio Interface Layout
8 Conclusion
9 Disclaimer
Appendix: Jitter Buffer Performance Results
Acronym List
References
List of Figures
Figure 1: VPM Block Diagrams
Figure 2: Echo Cancellation
Figure 3: Typical Fax Call Sequence of Events
Figure 4: IP Phone
Figure 5: IP Speakerphone Block Diagram
Figure 6: MSP2020 PLL Decoupling Circuitry
Figure 7: MSP2020 External Clock Circuitry
Figure 8: Connection between the MSP2020 and the ATH3100
Figure 9: Connection Between MSP2020 and (x8) Flash Memory Devices
Figure 10: MSP2020 and One 256Mbit (x32) SDRAM Memory Device
Figure 11: MSP2020 and Ethernet PHY Connection
Figure 12: SLIC/SLAC Interface with TDM and SPI/MPI Interface
Figure 13: Serial Port Interface to MSP2020
Figure 14: MSP2020 and LCD Connection
Figure 15: Top View of MSP2020 with Locations of Signal Groups
Figure 16: SDRAM Clock with 50 ohm Termination
Figure 17: SDRAM Clock min results
Figure 18: SDRAM Clock max results
List of Tables
Table 1: Echo Canceller Disabling Tones
Table 2: Codec Standards Supported by the VPM Firmware
Table 3: High and Low Frequency Tone Combinations for Keypad Digits
Table 4: Parameters for Generating DTMF Digits
Table 5: Fax Session Ending Events
Table 6: Ring Cadences in Regions of the World
Table 7: Buffer Chain Size Restrictions
Table 8: Maximum Current Consumption
Table 9: Flash Read Timing
Table 10: Flash Write Timing
Table 11: SDRAM Timing
Table 12: MII Timing
Table 13: GPIO Allocation for IP Phone Design with FAX Support
1 Introduction
1.1 IP Telephony
An IP phone is a broadband hard phone: a self-contained IP telephone
that looks just like a conventional phone but, instead of a conventional phone
jack, has an Ethernet port through which it communicates directly with a Voice
over Internet Protocol (VoIP) server, VoIP gateway, VoIP Analog Terminal Adaptor
(ATA), or another VoIP phone.
IP telephony is the technology for transmitting voice communications over
a network using open-standards-based IP. IP phones combine the functions of a
traditional telephone with an Ethernet connection. Since an IP phone
communicates directly with a VoIP-based system, it requires neither a
personal computer nor any software running on a personal computer to make or
receive VoIP phone calls. It can be used independently; all that is required is an
Internet connection.
1.2 Voice over IP (VoIP)
Voice over IP (VoIP) uses IP to transmit voice as packets over an IP
network, so VoIP can be achieved on any data network that uses IP, such as the
Internet, intranets, and Local Area Networks (LANs).
In VoIP, the analog voice signal is digitized, compressed, and converted to
IP packets, which are then transmitted over the IP network. Signaling protocols
are used to set up and tear down calls, carry the information required to locate
users, and negotiate capabilities. A VoIP phone system carries out the following
main steps:
1. A low-pass filter removes the high-frequency components from the
speech spectrum. The human speech spectrum contains
frequencies beyond 12 kHz. Narrowband telephones are
designed to eliminate frequencies above 3.4 kHz, although
nominally the voice band is 4 kHz. Wideband telephones
eliminate frequencies above 7 kHz.
2. An ADC (analog-to-digital converter) converts analog voice
signals to digital signals using Pulse Code Modulation (PCM).
Analog speech signals are sampled at 8 kHz for narrowband PCM
applications and at 16 kHz for wideband PCM. The digitization
process measures the analog signal at each sample time and
produces a binary code value representing the instantaneous
amplitude (quantization). Errors in this conversion can be reduced
by placing a sample-and-hold circuit ahead of the ADC so that the
input stays constant while each sample is converted. For most
telephony applications, speech coders are designed to have a
signal-to-noise ratio (SNR) above 30 dB over most of their range.
3. The bits are compressed into a standard format (to reduce the bit rate
while keeping voice quality at an acceptable level) for
transmission, using one of the ITU-T codec standards such as G.711,
G.723, G.726, or G.729.
4. The compressed voice frames are placed in data packets using a
real-time protocol, typically RTP over UDP over IP.
5. A signaling protocol is used to activate and coordinate the various
components to complete a call. Signaling is accomplished by the
exchange of IP datagram messages between the components. The
format of these messages is defined by one of several standard
protocols, such as IETF SIP or ITU-T H.323.
6. At the receiving end, packets are disassembled and the data is extracted,
then converted to analog voice signals and sent to the receiving IP
phone's speaker.
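The arithmetic behind steps 2 to 4 can be sketched as follows. This is an illustrative calculation only, not part of the reference design: the function name and the 20 ms packet interval are assumptions, and the header sizes are the standard minimum RTP (12 bytes), UDP (8 bytes), and IPv4 (20 bytes) headers with no options or extensions.

```python
# Header sizes in bytes for RTP over UDP over IPv4 (no options or extensions).
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20


def ip_bandwidth_kbps(codec_kbps: float, frame_ms: float) -> float:
    """IP-layer bit rate (kbit/s) of one voice direction.

    codec_kbps: codec output rate, e.g. 64 for G.711 or 8 for G.729.
    frame_ms:   amount of speech carried in each packet (assumed value).
    """
    payload_bytes = codec_kbps * frame_ms / 8   # kbit/s * ms = bits; / 8 = bytes
    packet_bytes = payload_bytes + RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return packet_bytes * 8 / frame_ms          # bits per ms = kbit/s


# Narrowband PCM: 8 kHz sampling, 8 bits per companded sample.
pcm_kbps = 8_000 * 8 / 1000

g711_rate = ip_bandwidth_kbps(pcm_kbps, frame_ms=20)
g729_rate = ip_bandwidth_kbps(8, frame_ms=20)
print(f"G.711: {g711_rate} kbit/s, G.729: {g729_rate} kbit/s")
```

With these assumptions the 40 bytes of headers per packet add a fixed 16 kbit/s at a 20 ms packet interval, which is why the relative overhead is much larger for low-bit-rate codecs than for G.711.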
1.3 Fax over IP
Currently, there are two ways to implement Fax over IP (FoIP). The first
method is based on the ITU-T T.37 standard and is used mainly for
store-and-forward faxing. It defines how Internet email can be adapted to
support a facsimile service and specifies the format in which a fax is
delivered as an e-mail attachment.
The second method is based on the ITU-T T.38 standard, a protocol for real-time
delivery of FoIP. With T.38, faxes are delivered in real time, exactly
like a regular fax call. The two fax machines first establish a connection
(synchronize) and then send data over their local telephone connections, with
an IP network between the two local connections. If the receiving fax machine
is busy, the caller gets a busy signal and the user has the option to retry
later or to revert to store-and-forward mode as a transport mechanism. A key
point is that confirmation takes place during the T.38 fax session, not at a
later point.
A fax sent by a fax machine is T.30 end to end. When that fax reaches
the IP subsystem of the phone, the MSP2020 subsystem of the IP phone (the
hardware, firmware, and software) encapsulates the T.30 protocol data
in T.38 packets, which are then sent to the IP network through the Ethernet
connection. At the other end, the IP phone extracts the T.30 protocol data from
the T.38 packets. Thus, the fax call is T.30 end to end. This is different from
G.711 pass-through, which is an option on many IP gateways, ATAs, and phones.
Current implementations of real-time FoIP via T.38 support a
maximum of V.17 (14.4 kbit/s) fax. An updated version of T.38 that supports V.34
(33.6 kbit/s) fax operation over IP has been standardized, but it has no major
implementations to date.
1.4 Voice Processing Module
Besides the steps above, considerable firmware and software work is needed
in a VoIP system to ensure acceptable voice quality over an IP
network. Figure 1 illustrates the major firmware functions used in a VoIP system,
known collectively as the Voice Processing Module (VPM).
Figure 1: VPM Block Diagrams (Voice Processing Engine software architecture;
legible block labels include Gain In, Gain Out, Receive Path, Cross Connects,
and Intelligent Signal Classifier)
Some of the functions in the VPM that relate to the operation of an IP
phone are described in the sub-sections below:
Echo Canceller
Echo is a delayed, slightly altered version of a speech signal that is
reflected back to the speaker. Line echo (LE) is caused by
impedance mismatches at the hybrid circuit, the interface between the four-wire
network between central offices and the two-wire network that connects
individual subscribers to the central office. Acoustic echo (AE), which
occurs in speakerphone applications, is generated when reflections of the
audio signal coming out of the loudspeaker are picked up by the microphone.
In any telephony application, echo can degrade or even prevent effective
communication if it is not effectively cancelled.
Echo cancellation is a process that removes the echo from the signal that
is transmitted back to the speaker. According to the ITU-T G.168 standard [6],
echo cancellers are devices or modules that use adaptive signal processing to
reduce or eliminate echoes. They cancel or reduce the echo by subtracting an
estimate of the echo from the returned signal. Echo cancellers commonly use
an adaptive algorithm such as Least Mean Square (LMS) or Normalized
Least Mean Square (NLMS).
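The subtract-an-estimate idea can be illustrated with a minimal NLMS sketch. It is not the VPM firmware's implementation: the function name, step size, tap count, and the simulated echo path are all assumptions chosen for the example, and a real canceller would also need the double-talk protection and non-linear processing described later in this section.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=128, mu=0.5, eps=1e-6):
    """Return the residual signal mic[n] minus the estimated echo at each sample.

    far_end: reference samples sent toward the echo path.
    mic:     samples coming back, containing the echo of far_end.
    """
    w = [0.0] * taps   # adaptive filter coefficients (estimate of the echo path)
    x = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for ref, d in zip(far_end, mic):
        x.insert(0, ref)
        x.pop()
        echo_estimate = sum(wi * xi for wi, xi in zip(w, x))
        e = d - echo_estimate                    # what is left after subtraction
        power = sum(xi * xi for xi in x) + eps   # normalization term of NLMS
        step = mu * e / power
        w = [wi + step * xi for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


# Simulated test signal: white noise through a short, hypothetical echo path.
random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.5, -0.3, 0.1]
mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far))]

residual = nlms_echo_cancel(far, mic, taps=8)
echo_power = sum(v * v for v in mic[-200:])
residual_power = sum(v * v for v in residual[-200:])
print(f"echo power {echo_power:.3f}, residual power {residual_power:.3e}")
```

Dividing the update by the reference signal power is what distinguishes NLMS from plain LMS; it keeps the adaptation rate roughly independent of how loud the far-end talker is.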
PSTN specifications require echo to be cancelled in any network where
the round-trip delay from source to destination and back again is longer than 50
ms. Since VoIP networks almost always introduce more delay than this in
packetization and transmission, most VoIP gateways that connect to the PSTN
need to provide echo cancellation.
Echo cancellers are typically placed as close as possible to the hybrid that
causes the echo; see Figure 2. The echo canceller stores samples of the
incoming (received) speech signal, then uses the samples to calculate an
estimate of the echo signal that will be reflected back to the far-end speaker.
This estimated echo is subtracted from the local signal that is transmitted to the
far end. In this way, normal speech from the near-end speaker is transmitted to
the far end, but the echo of the far-end speaker's voice is removed.
Echo Tail Length
The tail circuit shown in Figure 2 is all the equipment between the voice
gateway and the telephone: all the switches, multiplexers, cabling, and so on. To
effectively cancel the echo produced by the hybrid, the echo canceller must be
able to process samples stored over a period of time at least as long as the
round-trip delay through this tail circuit. This processing period is called the echo
canceller tail length, and is an important parameter in the performance of the
echo canceller. If the echo tail length is too short (too few samples are stored),
then the echo canceller cannot adequately remove the entire echo from the
received signal. However, setting the echo tail length too high wastes memory
and processor cycles and can degrade the overall performance of the processor.
The echo tail length should be set to the round-trip delay through the tail
circuit, plus 4 to 6 ms. In residential and SOHO VoIP gateway applications, the
echo tail length is typically set to 8 ms or 16 ms. In speakerphone applications,
the echo tail length is typically set to 64 ms.
Figure 2: Echo Cancellation (diagram label: tail circuit, echo path)
The number of adaptive filter taps used in an echo canceller algorithm
depends on the tail length:
16 ms tail length: 128 taps
32 ms tail length: 256 taps
64 ms tail length: 512 taps
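These tap counts follow from one tap per sample at the 8 kHz narrowband sampling rate. A minimal sketch of the calculation, with a hypothetical helper name:

```python
SAMPLE_RATE_HZ = 8_000  # narrowband PCM sampling rate


def taps_for_tail(tail_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of filter taps needed to span tail_ms of stored samples."""
    return tail_ms * sample_rate_hz // 1000


for tail_ms in (16, 32, 64):
    print(f"{tail_ms} ms tail length: {taps_for_tail(tail_ms)} taps")
```

Since the work per sample in an adaptive filter grows with the number of taps, this relationship is also why an unnecessarily long tail length costs memory and processor cycles.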
Non-Linear Processing and Comfort Noise Generation
Echo cancellers do not perfectly remove the entire echo from the
transmitted signal; some residual echo remains after echo cancellation. To
maximize echo cancellation, ITU-T G.168 [6] specifies non-linear processing
(NLP) to completely suppress the signal sent to the far end when the near-end
speaker is silent. When NLP is enabled, the echo canceller classifies short
segments of the near-end speech signal as either voice or background noise.
When it determines that the signal is background noise, the echo canceller turns
on the non-linear processor to eliminate the residual echo that would otherwise
be reflected back to the far-end speaker.
Muting the residual echo also mutes the background noise from the near
end. As a result, the far-end speaker will hear the background noise pulsing on
and off as the non-linear processor activates and deactivates. These transitions
degrade the perceived quality of the call, so echo cancellers use comfort noise
generation (CNG) in conjunction with NLP. When the non-linear processor is
active, a comfort noise generator replaces the muted residual echo with a
synthesized noise signal of the same level and similar spectral content as the
background noise.
The codecs also use voice activity detection and comfort noise generation
to suppress packets when a speaker is silent. These codec-based functions are
independent of the echo canceller functions described here, and there are subtle
differences in operation:
Echo-canceller NLP and CNG: When the near-end speaker is
silent, the near-end echo canceller suppresses the transmitted echo
signal, and replaces the background noise (which is also
suppressed) with comfort noise. The near-end codec sends
packets filled with comfort noise to the IP network.
Codec-based VAD and CNG: When the near end speaker is silent,
the near-end codec sends no packets at all to the IP network. The
far-end codec generates comfort noise toward the TDM interface to
replace the packets.
Double-Talk Detection
Most echo cancellers continually adapt their internal digital filters to the
conditions present in the tail circuit to provide an accurate estimate of the echo.
They continually compare the estimated echo signal with the echo that is actually
reflected from the near end of the connection, and refine their internal settings
accordingly.
However, in order for the echo canceller to converge on internal settings
that produce an accurate echo estimate, the near-end speaker must be silent
while the far-end speaker is talking. If the near-end speaker is talking at the
same time, a situation known as double-talk, the near-end speech disrupts the
adaptation process. The estimated echo diverges and the echo canceller
performs poorly, which degrades the perceived quality of the call.
To prevent problems, echo cancellers use a double-talk detector to
indicate when both parties are speaking at the same time. When this occurs, the
echo canceller effectively suspends the adaptive algorithm and "freezes" its
internal settings so that it cannot diverge during the double-talk condition. When
the double-talk condition goes away, the adaptive algorithm starts up again to
keep the echo canceller converged to an accurate estimate of the echo, and to
respond to any changes in the tail circuit.
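The report does not say which double-talk detection method the firmware uses. As one illustration, the sketch below uses the Geigel detector, a widely used simple method that flags double-talk when the near-end sample is too large to be echo alone. The function name and the 0.5 threshold (which assumes the echo path attenuates the signal by at least 6 dB) are assumptions for the example.

```python
def geigel_double_talk(far_history, near_sample, threshold=0.5):
    """Return True when near_sample is too large to be explained by echo alone.

    far_history: recent far-end samples covering the echo tail length.
    near_sample: current sample from the near-end (returned) path.
    threshold:   assumed worst-case echo-path gain; 0.5 corresponds to 6 dB loss.
    """
    far_peak = max((abs(s) for s in far_history), default=0.0)
    return abs(near_sample) > threshold * far_peak


far_history = [0.8, -0.6, 0.2]
for near in (0.3, 0.5):
    frozen = geigel_double_talk(far_history, near)
    print(f"near={near}: {'freeze adaptation' if frozen else 'keep adapting'}")
```

In an adaptive canceller, the coefficient update would simply be skipped for any sample where the detector returns True. The fixed-attenuation assumption suits line echo better than acoustic echo, where the loudspeaker-to-microphone gain can be higher.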
Echo Canceller Tone Disabling
Echo canceller tone disabling refers to special tones that are used to
automatically disable the echo canceller for fax or modem transmissions. Echo
cancellation can interfere with fax and modem transmissions, and in many cases
must be disabled when a connection is carrying fax or modem data. To
accomplish this automatically, fax machines and modems transmit special tones
at the beginning of their transmission. When the echo canceller detects these
tones, it automatically disables itself so that the fax or data transmission can
proceed unhindered.
Table 1 shows the three types of tones used to automatically disable the
echo canceller.
Table 1: Echo Canceller Disabling Tones
Tone                              Description
2100 Hz without phase reversal    The echo canceller is automatically disabled
                                  when it detects a 2100-Hz tone transmitted
                                  for 3 s.
2100 Hz with phase reversal       The echo canceller is automatically disabled
                                  when it detects a 2100-Hz tone transmitted
                                  for 4 s, with a 180° phase reversal every
                                  450 ms.
2100 Hz with or without           The echo canceller is disabled when it
phase reversal                    detects a 2100-Hz tone with phase reversals.
                                  The nonlinear processor is disabled when it
                                  detects a 2100-Hz tone without phase
                                  reversals.
Calibration
Calibration means calibrating, or normalizing, the analog front-end signal
gain stages. Without it, a signal represented by a given digital sequence of
numbers could saturate the analog stage of the signal path, or it could
correspond to a very small analog signal with insufficient SNR.
The calibration function is a digital signal generator on a TDM output path.
When it is turned on, a digital sequence defining a 1 kHz sine wave at a
nominal level of 0 dBm, as specified in ITU-T Recommendation G.711, is sent out
on the TDM output port. Calibration is performed on both the input and output
paths of a TDM port.
Calibration on the TDM output path: The designer turns on calibration mode
with the appropriate load connected to the port 0 output circuit.
Measurements and adjustments can then be made to ensure that the
analog signal arrives at a known and desired level.
Calibration on the TDM input path: The designer must first complete the
calibration on the output path so that the gain at the output is known.
Calibration mode is then turned off so that no calibration signal is sent out
from the processor. The TDM loop-back mode is set up so that the digital signal
input on TDM port 0 is transferred directly back to the output. With a known
analog signal injected at the input, measurements and adjustments can then be
made along the input path to ensure that the analog signal arrives at a known
and desired level at the measurement point.
Voice Activity Detection and Comfort Noise
On average, up to 50% of human speech may be periods of silence. If an
application transmits packets continuously during a call, even when a speaker is
not talking, it uses up a lot of bandwidth sending packets that do not contain any
speech information. Suppressing packet transmission while a caller is not
speaking can therefore realize significant improvements in bandwidth efficiency.
Codec-based voice activity detection (VAD) allows the gateway to
suppress packets when the near-end speaker is silent. It classifies short
segments of the voice signal as either speech or background noise, based on the
level and spectral content of the signal. When VAD indicates background noise,
the application does not send any packets, so it does not waste bandwidth
transmitting packets that do not contain any useful information.
However, suppressing the packets means that background noise is not
being transmitted. This results in silence on the line, which can cause a listener
at the far end to believe that the line has gone dead. Since this can be
disconcerting and degrades the perceived call quality, the gateway at the
listener's end generates comfort noise whenever it is not receiving packets from
the IP network. Comfort noise is a synthesized noise signal of the same level
and similar spectral content as background noise. To the listener, comfort noise
generation (CNG) results in no noticeable transition between speech and silence
at the speaker's end.
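The packet-suppression decision can be sketched with a deliberately simplified detector that looks only at signal level; the codec-based VAD described above also uses spectral content. The function names, the -40 dBFS threshold, and the test signals are assumptions for illustration.

```python
import math


def frame_level_dbfs(frame):
    """RMS level of a non-empty frame of samples in [-1, 1], in dB re full scale."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-10))   # floor avoids log10(0) on digital silence


def suppress_silence(frames, threshold_dbfs=-40.0):
    """Return only the frames classified as speech; the rest are not sent."""
    return [f for f in frames if frame_level_dbfs(f) > threshold_dbfs]


# 10 ms frames at 8 kHz (80 samples): a 400 Hz tone standing in for speech,
# and a low-level constant standing in for background noise.
speech = [0.5 * math.sin(2 * math.pi * 400 * n / 8000) for n in range(80)]
quiet = [0.001] * 80

sent = suppress_silence([speech, quiet, speech])
print(f"{len(sent)} of 3 frames would be packetized")
```

At the receiving side, the gaps left by the suppressed frames are where comfort noise would be generated.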
Codecs
Codecs compress a digitized voice signal into a lower-bandwidth format
that can be transported across the IP network. The output of a codec is a data
stream that is placed into packets and transported across the IP network. At the
receiving end, a codec performs the reverse process to decompress the data and
extract the digitized voice signal.
Table 2 summarizes the codec standards that are supported by the VPM
firmware. Codecs with a lower output bit rate (output bandwidth) typically require
more time and processing power to compress the digitized voice signal,
which adds to the latency in VoIP communications, and generally produce
a lower-quality speech signal after decompression.
Table 2: Codec Standards Supported by the VPM Firmware
| Codec Standard | Encoding | Output Bandwidth | Minimum Supported Frame Size |
|---|---|---|---|
| G.711 µ-law and A-law (notes 1, 2) | PCM | 64 kbit/s | 10 ms |
| G.726 (note 3) | ADPCM | 32 kbit/s | 10 ms |
| G.729A/B (note 4) | CS-ACELP | 8 kbit/s | 10 ms |
| G.723-5.3 and G.723-6.3 | Multirate CELP | 5.3 kbit/s or 6.3 kbit/s | |
Note:
1. G.711 µ-law encoding is used in North America and Japan; A-law is used in
Europe and the rest of the world.
2. G.711 is the only standard that can be used with T.38 FAX Relay.
3. G.726 codecs are supported on the MSP4200 device only.
4. G.729A/B uses a relatively low-complexity conversion algorithm and includes
voice-activity detection and comfort-noise generation.
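The output bandwidths in Table 2 translate directly into payload sizes per packet. The sketch below works through that arithmetic and adds the RTP, UDP, and IPv4 headers (12 + 8 + 20 bytes, without IP options or link-layer framing) to estimate the bandwidth seen on the IP network. The function names are illustrative, and the choice of RTP over UDP over IPv4 is an assumption for the example.

```python
# Header sizes in bytes: RTP (12) + UDP (8) + IPv4 without options (20).
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20

def payload_bytes(codec_kbit_per_s, packet_ms):
    """Codec payload carried in one packet (kbit/s x ms = bits)."""
    return codec_kbit_per_s * packet_ms // 8

def ip_bandwidth_kbit_per_s(codec_kbit_per_s, packet_ms):
    """Bandwidth including RTP/UDP/IPv4 headers, excluding link-layer framing."""
    packet_bits = (payload_bytes(codec_kbit_per_s, packet_ms)
                   + RTP_UDP_IPV4_HEADER_BYTES) * 8
    return packet_bits / packet_ms  # bits per millisecond equals kbit/s
```

For example, G.711 at 10 ms per packet carries 80 payload bytes, while G.729A/B at 10 ms carries only 10, so the fixed header overhead dominates for low bit-rate codecs with small packets.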
TDM Companding
The VPM firmware supports both µ-law and A-law companding for
compressing and decompressing voice traffic on the TDM interface. The two are
very similar; both are logarithmic companding schemes defined by ITU-T G.711
that compress 16-bit linear data into eight-bit logarithmic data. Logarithmic
companding breaks the amplitude of a voice signal into 16 segments and
encodes each segment as an eight-bit value. The four most significant bits
identify the segment, and the four least significant bits quantize the value of the
amplitude within the segment.
Each segment is twice the size of the segment below it. As a result the
lower amplitudes, which contain most of the information in speech, are split into
smaller segments (i.e. have higher bit resolution) than higher amplitudes, but the
dynamic range is wide enough to encode high-amplitude signals. Logarithmic
companding provides 2:1 bit compression without requiring too much processing
power to decode.
The differences between the two schemes are in the actual coding levels
and in bit inversion. µ-law encoding is used in North America and Japan for voice
traffic; A-law is used in Europe and the rest of the world.
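The segment-and-step structure described above can be illustrated with a small µ-law encoder and decoder. This is a sketch based on the widely published G.711 µ-law reference approach (bias of 0x84, clipping at 32635, all bits inverted on the wire); it is not taken from the VPM firmware, and the function names are illustrative.

```python
BIAS = 0x84    # offset added before segment search (standard mu-law value)
CLIP = 32635   # largest magnitude that can be encoded after biasing

def linear_to_ulaw(sample):
    """Encode one 16-bit signed linear sample as an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number (0..7): each segment is twice the size of the one below.
    segment = magnitude.bit_length() - 8
    # Four least significant bits quantize the amplitude within the segment.
    step = (magnitude >> (segment + 3)) & 0x0F
    return ~(sign | (segment << 4) | step) & 0xFF  # mu-law inverts all bits

def ulaw_to_linear(code):
    """Decode an 8-bit mu-law code back to a linear sample."""
    code = ~code & 0xFF
    segment = (code >> 4) & 0x07
    step = code & 0x0F
    magnitude = (((step << 3) + BIAS) << segment) - BIAS
    return -magnitude if code & 0x80 else magnitude
```

The sign bit plus the three segment bits form the four most significant bits of the code, which gives the 16 segments mentioned in the text. Decoding is a shift and an add, which is why the scheme needs little processing power.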
DTMF Digits
Dual-Tone Multi-Frequency (DTMF) digits use a set of four high-frequency
tones and a set of four low-frequency tones to uniquely identify each of the 16
digits on a telephone keypad. Each keypad digit is represented by two tones,
one from each set. When a telephone user presses the digit on the keypad, the
telephone generates a sinusoidal signal comprising the high-frequency tone and
the low-frequency tone that represent that digit. Table 3 shows the combinations
of high- and low-frequency tones for each keypad digit.
Table 3: High and Low Frequency Tone Combinations for Keypad Digits
High-Frequency Tones: 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz
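As an illustration of how a digit maps to its tone pair, the sketch below generates the samples for one DTMF digit. The high-frequency values are those listed for Table 3; the low-frequency values (697, 770, 852, and 941 Hz) and the keypad layout are the standard DTMF assignments and are supplied here as an assumption, since they are not legible in the table. The function names, the 8 kHz sample rate, and the equal tone amplitudes are also choices made for the example.

```python
import math

LOW_HZ = (697, 770, 852, 941)        # row tones (standard DTMF values)
HIGH_HZ = (1209, 1336, 1477, 1633)   # column tones, as in Table 3
KEYPAD = ("123A", "456B", "789C", "*0#D")

def dtmf_frequencies(digit):
    """Return the (low, high) tone pair in Hz for one keypad digit."""
    if len(digit) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(digit)
            if col >= 0:
                return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError("not a DTMF digit: %r" % (digit,))

def dtmf_samples(digit, duration_ms=65, sample_rate=8000):
    """Sum of the two sinusoids for the digit, scaled to stay within +/-1.0."""
    low, high = dtmf_frequencies(digit)
    n = sample_rate * duration_ms // 1000
    return [0.5 * math.sin(2 * math.pi * low * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
            for t in range(n)]
```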
A VoIP application needs to collect DTMF digits that are dialed by a
telephone user and convert the dialed number to an IP address for the call. In
the opposite direction, the VoIP application must be able to generate DTMF
tones toward the TDM interface to control an end system, such as an Interactive
Voice Response (IVR), voice mail, or calling-card system.
Certain applications, such as Interactive Voice Response systems and
calling-card systems, may need to identify DTMF digits that persist for more than
two seconds. Such events are referred to as "long" DTMF digits.
Generating (Playing) DTMF Digits
An application may need to generate DTMF tones toward the TDM
interface to control an end system, such as an Interactive Voice Response (IVR),
voice mail, or calling-card system. The tones must be generated in such a way
that a DTMF detector in the end system can correctly interpret them. This
scheme requires setting the four parameters shown in Table 4.
Table 4: Parameters for Generating DTMF Digits
| Definition | Min | Max | Units |
|---|---|---|---|
| Tone duration: length of time that a digit persists | 65 (note 1) | N/A | ms |
| Inter-digit pause: silent period between digits | 65 (note 1) | N/A | ms |
| Power level of the digit's low-frequency tone | -12 | +12 | dB |
| Power level of the digit's high-frequency tone | -12 | +12 | dB |
Note:
1. As specified in ITU-T Q.23 ([7])
Detecting and Collecting DTMF Digits
When a user presses digits on a telephone keypad, the application
software must detect the event and collect the digits for further processing. The
VPM firmware handles DTMF digits as unsolicited events. Each DTMF digit is
processed as two events, one for the start of the digit and one for the end event.
If the DTMF digit persists for more than 2 seconds, a third event indicates the
end of the long DTMF digit.
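The event handling described above can be sketched as a small collector that turns start and end events into a list of dialed digits. The event names (`"start"`, `"end"`, `"long_end"`), the tuple format, and the assumption that the long-digit event follows the normal end event are hypothetical; the actual VPM event interface is not shown in this section.

```python
def collect_digits(events):
    """Collect dialed digits from (kind, digit) event tuples.

    Returns a list of (digit, is_long) pairs. A digit is recorded when its
    end event matches the digit that most recently started.
    """
    collected = []
    active = None
    for kind, digit in events:
        if kind == "start":
            active = digit
        elif kind == "end" and digit == active:
            collected.append([digit, False])
            active = None
        elif kind == "long_end" and collected and collected[-1][0] == digit:
            collected[-1][1] = True  # third event: the digit was a long digit
    return [(digit, is_long) for digit, is_long in collected]
```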
Enabling DTMF Relay
DTMF relay provides an out-of-band signaling mechanism for carrying
DTMF digits across a VoIP infrastructure. This is important in applications that
use a low bit-rate codec but that must send DTMF digits across the VoIP
network. If the DTMF tones are compressed by a low bit-rate codec such as
G.723 or G.729, the tones are distorted to the point that digits may be lost.
When DTMF relay is enabled, the local gateway listens for DTMF digits
during a call, then sends them uncompressed as either RTP or H.245 packets to
the remote gateway, which regenerates them. This method prevents digit loss
due to compression in low bit-rate codecs.
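One common way to carry a digit in RTP is the named telephone-event payload of RFC 4733 (formerly RFC 2833). The sketch below packs that four-byte payload. Whether the VPM firmware uses this exact format is an assumption; the function name and default values are illustrative.

```python
import struct

# RFC 4733 event codes: digits 0-9 map to 0-9, then '*', '#', 'A'-'D'.
EVENT_CODES = {ch: code for code, ch in enumerate("0123456789*#ABCD")}

def telephone_event_payload(digit, duration_ms, end=False,
                            volume_dbm0=-10, clock_hz=8000):
    """Pack a 4-byte payload: event (8 bits), E flag (1), reserved (1),
    volume (6 bits), duration in RTP timestamp units (16 bits)."""
    volume = -volume_dbm0  # the field stores the power level as 0..63
    if not 0 <= volume <= 63:
        raise ValueError("volume must be between 0 and -63 dBm0")
    duration = min(duration_ms * clock_hz // 1000, 0xFFFF)
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", EVENT_CODES[digit], flags, duration)
```

Because the payload names the digit rather than carrying its waveform, the remote gateway can regenerate clean tones regardless of the voice codec in use.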
T.38 Fax Relay
In T.38 Fax Relay, a call manager protocol such as SIP connects two
media endpoints together in a fax call. The endpoint in a fax call that transmits
the fax is referred to as the sending fax relay; the endpoint that receives the fax
is the receiving fax relay.
Every fax session starts out as a voice call and then switches to a fax call
when the call classifier at the sending fax relay detects that the call is a fax
transmission. The sequence of events is outlined below and shown in Figure 3.
1. The call manager protocols at both ends of the call connect the two
endpoints using a voice connection.
2. The call classifier on the receiving fax relay detects the start of a fax call.¹
3. The call managers shut down the voice call and set up a fax call using
G.711 and different TCP/IP ports than those used by the voice connection. It is
important to use different ports because the voice and fax data streams use
different packet formats and cannot be mixed. The call manager at each
endpoint negotiates the parameters for the fax call, including port numbers,
connection rate, and connection mode, based on the capabilities of the endpoint
and passes the negotiated parameters to the T.38 module.
4. The T.38 module receives PCM data from the fax machine and
repackages it as T.38 packets that can be sent over an IP connection. In the
reverse direction, the T.38 module receives data packets from the IP network and
converts them to PCM data that the fax machine can understand. The fax
connection remains active until the receiving fax relay detects EOF. To avoid
data loss, both call managers wait until the receiving fax relay detects EOF
before tearing down the connection.
¹ This is the most common procedure. However, either endpoint can detect the start of a fax call.
Both the T.38 module and the TCP/IP network stack buffer data internally,
which means the call managers need to be careful not to shut down the fax
session too early. If the call manager at either end of the connection shuts down
the fax session while there is still data in a buffer, the receiving fax relay will not
be able to receive the fax correctly. To avoid problems, the receiving fax relay is
responsible for telling the sending fax relay that the fax session has ended, which
it does via the call manager protocol. The sending fax relay never tells the
receiving fax relay that the session has ended, even under error conditions.
The sending and receiving fax relays process ending events as shown in
Table 5.
Table 5: Fax Session Ending Events
| Event | When Received by the Sending Fax Relay | When Received by the Receiving Fax Relay |
|---|---|---|
| On-Hook Event | Indicates that the sending fax machine has gone on-hook after completing a fax transmission. There may still be fax data buffered in the T.38 module and the network stack. The call manager at the sending fax relay therefore does nothing. Eventually the receiving fax relay will detect the end of session and send a Call Manager End of Call Event back to the sending fax relay. | Indicates that the receiving fax machine has gone on-hook after completing a fax reception. The call manager records that the on-hook event has been received and starts a long-term timer (> 10 sec) to ensure that the call is eventually torn down even in error conditions. Otherwise the receiving side should do nothing with the on-hook event because there may be data buffered in the T.38 module and the network stack. |
| Fax EOP Event | Indicates that the sending fax relay has finished processing data and the fax session has ended. There may still be fax data buffered in the T.38 module and the network stack. The call manager at the sending fax relay therefore does nothing. Eventually the receiving fax relay will detect the end of session and send a Call Manager End of Call Event back to the sending fax relay. | Indicates that the receiving fax relay has finished processing data and the fax session has ended. There may still be fax data buffered in the network stack, so the call manager waits at least 500 ms, then tears down the call and sends a Call Manager End of Call Event to the sending fax relay. This is the normal way that a fax session should end. |
| Network Socket Disconnect Event | Indicates that TCP has detected a network disconnect from the receiving fax relay. The call manager discards any T.38 packets it has buffered, because the receiving side has terminated the connection and therefore cannot process more T.38 data. However, it does not disable T.38 because some fax data may still be in the T.38 module. Eventually the receiving fax relay will detect the end of session and send a Call Manager End of Call Event back to the sending fax relay. Note: A UDP connection has no way to detect a network disconnect from the receiving fax relay. | Indicates that TCP has detected a network disconnect from the sending fax relay. The call manager discards any T.38 packets it has buffered for transmission over the network, because the sending side has terminated the connection and therefore cannot process more T.38 data. However, it does not disable T.38 because some fax data may still be in the T.38 module. When the T.38 protocol times out it will generate an EOF, which the receiving fax relay detects as the end of session. The call manager can then disable T.38. Note: A UDP connection has no way to detect a network disconnect from the sending fax relay. |
| Call Manager Disconnect Event | Indicates that the receiving fax relay wishes to terminate the call. The call manager tears down the call because the receiving side has indicated that the fax session is over. This is the normal way that a fax session should end. | Indicates that the sending fax relay wishes to terminate the call. There may still be fax data buffered in the T.38 module and the network stack. The call manager waits until the receiving fax relay detects the end of session. |
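The rules in Table 5 amount to a lookup keyed by the relay's role and the event received. The sketch below encodes them that way. The enum names and action strings are hypothetical labels invented for the example; they are not identifiers from the VPM firmware or from any call manager API.

```python
from enum import Enum, auto

class Role(Enum):
    SENDING = auto()
    RECEIVING = auto()

class Event(Enum):
    ON_HOOK = auto()
    FAX_EOP = auto()
    SOCKET_DISCONNECT = auto()
    CALL_MANAGER_DISCONNECT = auto()

# Actions the call manager takes, in order, for each (role, event) pair.
ACTIONS = {
    (Role.SENDING, Event.ON_HOOK): ("wait_for_end_of_call",),
    (Role.SENDING, Event.FAX_EOP): ("wait_for_end_of_call",),
    (Role.SENDING, Event.SOCKET_DISCONNECT):
        ("discard_buffered_t38_packets", "wait_for_end_of_call"),
    (Role.SENDING, Event.CALL_MANAGER_DISCONNECT): ("tear_down_call",),
    (Role.RECEIVING, Event.ON_HOOK):
        ("record_on_hook", "start_long_term_timer"),
    (Role.RECEIVING, Event.FAX_EOP):
        ("wait_500_ms", "tear_down_call", "send_end_of_call"),
    (Role.RECEIVING, Event.SOCKET_DISCONNECT):
        ("discard_buffered_t38_packets", "wait_for_t38_eof", "disable_t38"),
    (Role.RECEIVING, Event.CALL_MANAGER_DISCONNECT):
        ("wait_for_end_of_session",),
}

def ending_actions(role, event):
    """Look up the ordered actions for an ending event at a given relay."""
    return ACTIONS[(role, event)]
```

Note that only the receiving side ever initiates the teardown; every sending-side entry except the call manager disconnect ends in waiting.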
[Message sequence chart between the sending fax relay (VPM firmware, connection manager) and the receiving fax relay (connection manager, VPM firmware). Recoverable steps: dial tone and DTMF digits; start codec; negotiate network voice connection; start voice flow over network; ring; fax detect; stop codec; shut down network connection; negotiate network fax connection; enable T.38 and start G.711; start fax flow over network; fax transmission in progress; on-hook; fax EOF; stop G.711; fax disable; shut down network connection after fax EOF is detected at receiving end.]
Figure 3: Typical Fax Call Sequence of Events
Call Classification
All connections are initially set up as voice calls. Call classification
provides a mechanism for identifying those calls that are actually fax
transmissions so that the connection can be reconfigured as a fax call.
At the beginning of a fax transmission, the sending fax relay transmits a
2100-Hz tone. This tone identifies the transmission as a fax and distinguishes it
from a voice or modem transmission. At the far end, the receiving fax relay
identifies this tone in the received data stream and uses it for two distinct
purposes:
• To automatically disable the echo canceller and/or non-linear
processor.
• To initiate the process of changing the channel from a voice
connection to a fax connection.
These two purposes are independent. Echo cancellation tone disabling
does not affect the operation of the fax call classifier, and vice versa.
When the call classifier identifies the 2100-Hz tone and classifies an
incoming call as a fax, it generates an unsolicited event to alert the VPM. The
call manager (e.g. SIP or H.323) must then do the following:
• Tear down or suspend the voice channel and change the operating
mode to T.38 fax relay.
• Negotiate a set of parameters for the fax session.
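A single-frequency detector such as the Goertzel algorithm is one common way to look for the 2100-Hz tone in the received stream. The sketch below is an illustration of that technique only; the call classifier actually used by the VPM firmware is not described here, and the function names, the 8 kHz sample rate, and the 0.5 energy-fraction threshold are assumptions.

```python
import math

def goertzel_power(samples, target_hz, sample_rate=8000):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel)."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def is_fax_tone(samples, sample_rate=8000, min_fraction=0.5):
    """True when most of the frame's energy sits at 2100 Hz."""
    energy = sum(x * x for x in samples)
    if energy == 0.0:
        return False
    # For a pure tone centred on the bin, this fraction is close to 1.0.
    fraction = goertzel_power(samples, 2100, sample_rate) / (0.5 * len(samples) * energy)
    return fraction >= min_fraction
```

With 160-sample (20 ms) frames at 8 kHz, 2100 Hz falls exactly on bin 42, so a clean tone yields a fraction near 1.0 while speech or other tones yield values near zero. A practical classifier would also require the tone to persist over several frames.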
Ringing
The VPM firmware allows an application to make a TDM port ring with any
ring cadence required. Ring cadence refers to the sequence of ringing and
silence, including the duration of each ring and each pause between rings. Table
6 shows standard ring cadences for different regions of the world.
Table 6: Ring Cadences in Regions of the World
| Country | Standard Ring Cadence |
|---|---|
| United States | Two seconds of ringing, four seconds of silence |
| Japan | One second of ringing, two seconds of silence (NTT regular ring); 0.25 seconds of ringing, 0.2 seconds of silence, 0.25 seconds of ringing, 2.3 seconds of silence (NTT non-regular ring) |
| United Kingdom | 0.4 seconds of ringing, 0.2 seconds of silence, 0.4 seconds of ringing, 2 seconds of silence |
| Other European countries | Varies from country to country |
Caller Identification
Caller Identification is a feature that sends information about a caller to the
telephone being called. The type and format of the information depends on the
country. The VPM firmware should at least support the following caller ID
formats:
• US Caller ID
• Japanese Number Display
• European Calling Line Identity
Generating Caller ID Information
If caller ID information is available, it is generated (sent to the destination
phone) during a non-ringing period in the first ring. Normally, application software
should allow a delay of at least 0.5 seconds between the falling edge of the ring
envelope and the start of the caller ID transmission. Shorter delays than this may
prevent the attached telephone from reliably decoding the caller ID information.
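The timing rule above can be combined with a ring cadence to work out when the caller ID transmission may start. In this sketch a cadence is a list of (ringing, seconds) pairs, and the transmission is scheduled in the first silent interval long enough to hold the 0.5-second guard delay plus the data burst. The representation, the function name, and the 1.0-second burst length are assumptions for illustration; regional caller ID standards differ in where the data is actually sent.

```python
# Cadences from the ring cadence table, as (ringing, seconds) pairs.
US_CADENCE = [(True, 2.0), (False, 4.0)]
NTT_NON_REGULAR = [(True, 0.25), (False, 0.2), (True, 0.25), (False, 2.3)]

def caller_id_start(cadence, guard_s=0.5, burst_s=1.0):
    """Return the offset (seconds from the start of ringing) at which caller
    ID may start, or None if no silent interval is long enough.

    guard_s is the delay after the falling edge of the ring envelope;
    burst_s is an assumed duration for the caller ID data itself.
    """
    elapsed = 0.0
    for ringing, seconds in cadence:
        if not ringing and seconds >= guard_s + burst_s:
            return elapsed + guard_s
        elapsed += seconds
    return None
```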
Security Buffer Chains
Security buffer chain functions, which are included in the Security Module,
allow encryption and decryption operations on memory blocks no larger than
4088 bytes. However, it is possible to build chains of buffers that exceed 4088
bytes in total. These buffer chains allow encryption and decryption operations on
memory blocks greater than 4088 bytes and on non-contiguous memory.
The total size of a buffer chain is the sum of the sizes of each buffer in the
chain. The total size of a buffer chain must comply with the restrictions shown in
Table 7.
Table 7: Buffer Chain Size Restrictions
| Security Operation Type | Total Size of Buffer Chain |
|---|---|
| Encryption or decryption | Must be a multiple of 8 |
| Hashing, padding disabled | Must be a multiple of 64 |
| Hashing, padding enabled | No restrictions |
After a security operation is performed on a particular buffer chain, the
buffers are no longer associated with the chain and the chain ID becomes invalid.
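The sizing rules for buffer chains can be sketched as two helpers: one that splits a block of data into buffers of at most 4088 bytes, and one that checks the total chain size against the restriction for the operation. The function names and operation keys are hypothetical and do not come from the Security Module API. Treating the multiple-of-8 rule as the one for encryption and decryption is an assumption, because that row's label is not legible in Table 7.

```python
MAX_BUFFER_BYTES = 4088  # largest single buffer the security functions accept

# Required divisor of the total chain size for each operation type.
# "cipher" -> 8 is an assumed mapping (row label not legible in Table 7).
SIZE_MULTIPLE = {"cipher": 8, "hash_unpadded": 64, "hash_padded": 1}

def build_chain(data):
    """Split data into a chain of buffers, each at most 4088 bytes."""
    return [data[i:i + MAX_BUFFER_BYTES]
            for i in range(0, len(data), MAX_BUFFER_BYTES)]

def chain_size_ok(chain, operation):
    """Check the total size of the chain against the operation's restriction."""
    total = sum(len(buf) for buf in chain)
    return total % SIZE_MULTIPLE[operation] == 0
```

For a 10,000-byte block this yields three buffers; the chain is acceptable for padded hashing but not for unpadded hashing, since 10,000 is not a multiple of 64.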
1.5 Latency in VoIP Networks
Latency is the delay that a voice signal experiences as it travels from a
speaker at one end of a connection to a listener at the other end of the
connection. If the latency in a network is too large, it will severely impact the
ability of users to maintain a two-way conversation. ITU-T G.114 recommends
that the one-way delay through a network be less than 150 ms for acceptable
voice quality.
Propagation delay, the time a voice signal takes to travel across the
network, is unavoidable in any telephony application. It is compounded in IP
networks, however, because packets may be buffered in switches, routers, and
other network elements en route to their destination. This can be mitigated with
efficient VoIP gateway and network design that, for example, prioritizes voice
packets to minimize the switching and routing delays they experience.
Delays are also incurred by the process of sampling voice data, encoding
it, and placing it in packets for transmission over the IP network. At the receiving
end, of course, the reverse operations also contribute to delay, as do processes
such as echo cancellation, noise suppression, and filtering. These delays
depend on a number of factors, including:
• The capabilities of the processor and the speed of the media.
These factors must be considered in the design of the IP phone,
the gateway, and the network itself.
• The type of speech codec.
• The size of the packets. This parameter must be controlled by the
application software.
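The contributions listed above can be added up into a simple one-way delay budget and compared with the 150 ms figure from ITU-T G.114. The breakdown into four components, the function names, and the example values are assumptions for illustration rather than measurements from this design.

```python
G114_ONE_WAY_LIMIT_MS = 150  # ITU-T G.114 recommendation cited in the text

def one_way_delay_ms(packet_ms, codec_lookahead_ms, jitter_buffer_ms,
                     network_ms, processing_ms=0):
    """Sum the main delay components of a one-way voice path."""
    return (packet_ms + codec_lookahead_ms + jitter_buffer_ms
            + network_ms + processing_ms)

def within_g114(total_ms):
    """True when the one-way delay is below the G.114 figure."""
    return total_ms < G114_ONE_WAY_LIMIT_MS
```

For instance, 20 ms packets, 5 ms of codec look-ahead, a 60 ms jitter buffer, and 40 ms of network delay total 125 ms, which leaves 25 ms of margin. Doubling the packet size to 40 ms reduces header overhead but spends most of that margin.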
1.6 Jitter Buffer
A major challenge in supporting interactive audio over any WAN,
including IP networks, is the need to provide synchronous playout of audio
packets in the face of stochastic end-to-end network delays. This is
typically achieved by buffering the received audio packets and delaying their
playout long enough that most of the packets will have been
received before their scheduled playout times. The additional artificial delay until
playout can either be fixed throughout the duration of a call or vary adaptively
during a call's lifetime. Packets that are not received before their scheduled
playout time are considered lost. Depending on the codec with which
voice is encoded and on how missing packets are masked, a packet loss ratio of
between 1% and 10% can be tolerated in most VoIP systems.
In IP networks, end-to-end delays may fluctuate rapidly and
significantly over small intervals of time. Adaptive playout algorithms that adjust
rapidly to these changing delays can achieve a lower rate of lost packets for both
a given average playout delay and a given maximum buffer size, and are therefore
commonly used in VoIP systems.
If both the propagation delay and the distribution of the variable
component of network delay are known, a fixed playout delay can be computed
such that no more than a given fraction of arriving packets are lost due to late
arrival. In this approach, the playout delay is fixed either for the duration of the
audio call or is recalculated at the beginning of each talkspurt. One potential
problem with this approach is that the propagation delay is not known, although it
can be estimated and typically remains fixed throughout the duration of the call. A
more serious problem is that the end-to-end delay distribution of packets within a
talkspurt is not known and can change over relatively short time scales.
A better approach to deal with the unknown nature of the delay distribution
is to estimate the delays and adaptively respond to their change by dynamically
adjusting the playout delay. The adaptive playout algorithms determine a playout
delay on a per talkspurt basis. Within a talkspurt, packets are played out in a
periodic manner, thus reproducing their periodic generation at the source. But,
the algorithms may alter the playout delay from one talkspurt to the next, thus the
silence periods between two consecutive talkspurts at the receive end may be
artificially expanded or compressed with respect to the original length of the
corresponding silence period at the sender. According to many studies, changing
the silence periods by a small amount is not noticeable in the played-out
speech.
Among the available adaptive playout algorithms, the one developed by R.
Ramjee has been implemented on either microprocessors or DSPs
by many VoIP system designers. This algorithm relates every other delay
parameter to the delay of the first packet in a talkspurt and makes no assumption
about the synchronization of the sender and receiver host clocks. A summary of
this algorithm is given below:
Given $R_i$ as the receive time of packet $i$ and $T_i$ as the transmit time of
packet $i$ (time stamp), the end-to-end delay for packet $i$ is computed as
$$D_i = R_i - T_i$$
and the average delay is
$$d_i = (1 - U)\,d_{i-1} + U\,D_i$$
where $U$ is a constant (e.g., 1/100).
The standard deviation of the delay is estimated as
$$V_i = (1 - U)\,V_{i-1} + U\,\lvert D_i - d_i \rvert$$
The execution (playout) time of the first packet is
$$P_i = T_i + d_i + K V_i$$
where $K$ is a positive constant.
The time between transmission and playout of the first packet is
$$Q_i = P_i - T_i$$
The playout time of the next packet ($i+1$) is displaced from its transmit time
by the same offset as the first packet:
$$P_{i+1} = T_{i+1} + Q_i$$
If $T_i - T_{i-1} > 20$ ms (silence limit), then
packet $i$ is treated as the first packet of a new talkspurt, unless the gap is
due to packet loss.
This algorithm can be improved to check and adjust for a spike
characterized by a sudden large increase in packet delay.
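The algorithm summarized above can be sketched as a small class that tracks the running delay estimates and recomputes the playout offset at the start of each talkspurt. Times are integer milliseconds. The class and parameter names are illustrative, K = 4 is an assumed value (the text only requires a positive constant), and neither the packet-loss exception to the silence-limit rule nor the spike adjustment is modelled.

```python
class AdaptivePlayout:
    """Per-talkspurt adaptive playout delay estimator (times in ms)."""

    def __init__(self, u=0.01, k=4.0, silence_limit_ms=20):
        self.u = u                      # weight of the newest sample (U)
        self.k = k                      # safety factor on the deviation (K)
        self.silence_limit_ms = silence_limit_ms
        self.d = None                   # average delay d_i
        self.v = 0.0                    # delay deviation V_i
        self.q = None                   # offset Q_i for the current talkspurt
        self.last_t = None              # transmit time of the previous packet

    def playout_time(self, t, r):
        """Return the playout time for a packet sent at t and received at r."""
        delay = r - t                   # D_i; only differences matter, so the
        if self.d is None:              # two clocks need not be synchronized
            self.d = float(delay)
        else:
            self.d = (1 - self.u) * self.d + self.u * delay
            self.v = (1 - self.u) * self.v + self.u * abs(delay - self.d)
        first = self.last_t is None or t - self.last_t > self.silence_limit_ms
        if first:                       # first packet of a talkspurt
            self.q = self.d + self.k * self.v
        self.last_t = t
        return t + self.q
```

Packets within a talkspurt keep the same offset, so they are played out with their original spacing; only the silence between talkspurts stretches or shrinks.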
GIPS NetEQ Jitter Buffer
The MSP2020 uses the Global IP Sound (GIPS) NetEQ jitter buffer to overcome
the delay jitter experienced in IP networks. GIPS NetEQ implements an
adaptive playout mechanism similar to the one described above. It supports
speech packet frames of 10 to 60 ms at the input to the buffer and plays out
10 ms at a time at the output of the buffer. This finer jitter buffer resolution
reduces excessive packet and system delays and increases speech quality. The
physical size of the buffer is programmable, but by default it is set to 300 ms.
NetEQ places no restriction on the codec decoder used with it.
Appendix A shows the voice quality measurements on three different
Analog Telephone Adaptor (ATA) platforms: PMC-Sierra's Mckinley and Stein
reference designs, which use MSP2015 or MSP2020 processors, and the Japanese
Yahoo BB (YBB) ISP gateway/ATA. All three platforms underwent the same tests
using the G.711 and G.729 codecs, the same delay distributions (delay between
consecutive packets; Gaussian with different means and standard deviations, and
uniform with different boundary values), and packet loss ratios of 0%, 2%,
and 3%. As seen from the plots of Perceptual Evaluation of Speech Quality
(PESQ) versus Reference Voice Files, the Mean Opinion Score (MOS) obtained
from Mckinley and Stein is, on average, higher than that from YBB. In some tests,
Mckinley showed a higher score than Stein for the G.711 codec.
2 Features
2.1 Hardware Features for the IP Phone
The MSP2020 IP Phone Reference Design provides the following
hardware features:
• Handset speaker and microphone
• Hands-free speaker and microphone (speakerphone)
• Line-In and Line-Out interfaces
• One FXS port for fax
• One RJ45 for a WAN port
• Keypad interface
• Off-hook/on-hook switch
• LCD module interface
• 12 V DC power input
• Four LEDs for power indication and status information
• Push-button reset switch
• RS-232 interface
• Header for spare GPIO pins for user-specific applications
• JTAG interface to the processor (MSP2020)
acoustic and line echo. The sampling rate for both ADC and DAC is generally
about 8 kHz to achieve an audio bandwidth of 4 kHz for human voice.
As an option, one Foreign exchange Subscriber (FXS) channel is also
connected to the MSP2020 TDM bus. This FXS channel is connected to an analog
fax machine. Fax transmissions can be sent in clear channel or T.38, depending on
bandwidth requirements.
The FXS circuit is made up of two main parts: a CODEC and a Subscriber
Line Interface Circuit (SLIC). The CODEC consists of an ADC, which converts the
analog signal from the analog fax machine into a digital signal, and a DAC, which
converts digital signals to analog ones to drive the fax machine. The sampling
rate for both ADC and DAC is generally about 8 kHz. In ATA applications, the
SLIC device also emulates PSTN voltage levels, detects whether the phone is
off-hook or on-hook, and generates a ringing voltage of up to 120 V.
On the packet side, one of the three independent MSP2020 10/100
Ethernet MAC controllers, configured in MII mode, is connected to a 10/100
Mbps Ethernet WAN PHY for the Internet connection.
The MSP2020 receives digitized voice and fax data from the TDM bus,
converts it to data packets, and uses a variety of Internet and voice-related
protocols to send it to a local device or across the IP network to a destination.
In the packet receive direction, the MSP2020 reconverts the data packets
received from the MII interface to a digital voice or fax signal and sends it out to the
vocoder or the FXS port via the TDM bus. The vocoder converts the digital voice
signal to analog voice signal and then the analog signal is amplified and output
from the handset speaker or hands-free speaker. The FXS port converts the
digital fax signal to analog format and sends it out to the fax machine.
5 Functional Description
5.1 MSP2020
The MSP2020 is a multi-service processor capable of numerous end-use
applications. The MSP2020 includes a glueless interface to 133 MHz SDRAMs,
an ELB interface for Flash memory, three MII/RMII Ethernet interfaces for direct
connection to external Ethernet PHY devices, a TDM interface for vocoder or
speakerphone ICs and SLIC/SLAC devices, and several other peripheral device
interfaces not used in this design.
For this application, the MSP2020 is interfaced with the ATH3100, a
speakerphone IC from Acoustic Technologies, to provide two acoustic transducer
interfaces (a base speaker and a base microphone), a handset interface
(handset speaker and microphone), acoustic echo cancellation, gain adjustment,
noise reduction, and optional DTMF and ring tone generation. These tones can
also be generated via the MSP2020 Voice Processing Module (VPM) firmware.
For the FoIP option, the MSP2020 is interfaced with the LE88221, a dual-
channel SLIC/SLAC device from Legerity, although only one channel is used for
one-line fax support.
Data is transferred between the MSP2020 and both the ATH3100 and
LE88221 through the TDM (PCM) interface. The 2-Wire interface is used for
serial microprocessor access to/from the ATH3100. The SPI/MPI interface is
used for signaling and microprocessor access to/from the LE88221.
In this application, the ATH3100 is configured to supply the TDM clocks and
frame pulse to both the MSP2020 and the LE88221. Some of the GPIO pins of
the MSP2020 are used for control (reset and chip select) and interrupt purposes
for the ATH3100 and LE88221.
Some of the MSP2020 GPIO pins are also used to interface a standard
12-key keypad and a text-based LCD module.
5.2 Memory
The MSP2020 has a dedicated SDRAM interface and is connected to a
32-bit 133 MHz SDRAM device that provides 128 Mbit of RAM space.
The boot code resides on a Flash memory device. The MSP2020 has a
dedicated ELB interface to Flash memory devices and in this application is
connected to 32 Mbit of Flash.
5.3 ATH3100
The ATH3100 is the next-generation full-duplex speakerphone SoC from
Acoustic Technologies, Inc. This device builds on a patented core full-duplex
echo cancellation, noise reduction, and sound enhancement technology with
added features and enhanced functionality (compared with the older-generation
ATH3000) for improving the audio quality and providing phone management
capabilities for digital PBX, standard PSTN telephony terminals, and VoIP (it is
well suited for IP applications). The added features and enhanced performance,
shown below, provide improved sound quality, full duplex performance and
natural communication for all speakerphone-enabled applications.
The ATH3100 enhancements are:
• Integrated Caller ID
• Acoustic echo cancellation of 65 dB with a 64 ms adaptive filter tail
• Noise reduction of up to 18 dB
• Network echo cancellation of 45 dB with a 16 ms adaptive filter tail
• Automatic gain control for microphone and line input
• Low power dissipation of 65 mW
• Virtually pin-for-pin compatible with the older generation, the ATH3000
• Green packaging option available
5.4 FXS Interface
The FXS interface is designed using an off-the-shelf integrated
SLIC/SLAC device. The Legerity LE88221 performs all line functions and is
programmable for global usage. In this application, the codec receives the PCM
fax stream from the MSP2020 via its TDM interface. The MSP2020 controls
the operation of the codec over its MPI/SPI interface.
5.5 WAN Uplink
The WAN uplink supports 10/100 Mbps Ethernet and provides the logical
connection of traffic to the Internet. An off-the-shelf PHY transceiver is
connected to the MSP2020 via its MII interface.
5.6 Power Supply
Power is supplied to the board by a 12 V AC/DC wall adapter.
The components that account for most of the total board power are listed
below in Table 8.
Table 8: Maximum Current Consumption
Device | No. of devices | 3.3V Rail Current (mA) | 1.8V Rail Current (mA) | 5V Rail Current (mA) | 12V Rail Current (mA)
SLIC/SLAC (note 1) 1 78 120
Audio Amplifier 1 6
ATH3100 1
Ethernet PHY 1
20 1 - 148
Note:
1. SLIC/SLAC power during ringing condition is estimated.
6 Circuit Design Considerations
The following sections comment on the schematic circuit design for the
MSP2020 and the rest of the IP phone's circuit design. Refer to the schematics
page number listed in the section title for the circuit connections described in that
section.
6.1 MSP2020 Circuit Design
6.1.1 Power Requirements and Supply Filtering (Page 2 of Schematics)
The MSP2020 requires two power supplies: 1.8 V for the core and 3.3 V for I/O. It is
important to connect all power pins to the correct power supply, as damage can
occur to the device if any are left unconnected. Refer to Table 8 for per-rail
requirements and power consumption specifications.
6.1.1.1 Digital Power Pin Decoupling
It is recommended that digital power decoupling capacitors be evenly
distributed around the device. Ideally there should be a 0.1 µF high-frequency
capacitor as close as possible to each cluster of power pins, with a 10 µF bulk
capacitor placed close to the device.
6.1.1.2 PLL1 and PLL0 Power Pin Decoupling
The internal device clocking requires stable, quiet power through the PLL0
and PLL1 power pins. Figure 6 illustrates the required decoupling circuit for the
MSP2020 internal PLL.
6.1.2 TDM interface to ATH3100 (Pages 2,3 and 4 of schematics)
Figure 8 shows the interface between the MSP2020 and the ATH3100. In
this application, the ATH3100 is used as the master device that generates the TDM
(PCM) clock (1.544 MHz for the T1 line rate and 2.048 MHz for the E1 line rate)
and the 8 kHz frame pulse for both the MSP2020 and the SLIC/SLAC device TDM
interfaces.
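The two TDM clock rates follow directly from the 8 kHz frame rate and the standard T1 and E1 frame formats: an E1 frame carries 32 eight-bit timeslots, and a T1 frame carries 24 eight-bit timeslots plus one framing bit. The short sketch below works through that arithmetic; the function name is illustrative.

```python
FRAME_RATE_HZ = 8000  # one TDM frame per 8 kHz frame pulse

def tdm_clock_hz(timeslots, framing_bits=0, bits_per_slot=8):
    """Bit clock needed to carry one frame per frame pulse."""
    return (timeslots * bits_per_slot + framing_bits) * FRAME_RATE_HZ

E1_CLOCK_HZ = tdm_clock_hz(32)                  # 256 bits per frame
T1_CLOCK_HZ = tdm_clock_hz(24, framing_bits=1)  # 193 bits per frame
```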
The reset signal to the ATH3100 is driven from GPIO-17 on the
MSP2020. The ATH3100 will come out of reset configured to pass audio in
speakerphone mode between the base acoustic interface and the PCM line
interface. Driving the reset signal from a GPIO pin holds the ATH3100 in reset
until the MSP2020 configuration is completed.
Also, a segment of MSP2020-accessible memory can be used to store the
ATH3000 register values. It must be alterable without rebuilding the
MSP2020 code and preferably without disturbing any other constants used by the
MSP2020. This reference design accomplishes this by placing a
separate I2C EEPROM on the 2-Wire serial bus. The ATH3100 S/W utility, LOcho,
allows the contents of this EEPROM to be read and written without disturbing anything
else.
On the audio interface connected to the ATH3100, to avoid echo
generated by electrical coupling from the speaker to the microphone at very low
signal levels, it is recommended that designers not power the base speaker
amplifier and the microphone bias from the same supply. There are several
options, but one is chosen in this reference design to solve the problem by
6.1.3.1 Timing
Timing for both the read and write to the flash is shown in Table 9 and
Table 10. A 25.00 MHz ELB output clock and an AMD/Spansion MBM29LLf320DB-90
were used for the timing analysis.
Table 9: Flash Read Timing
Parameter | Flash Specification | MSP2020 Specification | Margin/Remarks
Read Cycle Time | Trc (ns) | Trc (ns) |
Address to Output Delay | Tacc (ns) | |
Chip Enable to Output Delay | Tce (ns) | |
Output Enable to Output Delay | Toe (ns) | |
Data Set up Time | Toe (ns) | Ts (ns) | 45 ns
Chip Enable to Output High-Z | Tdf (ns) | Tdf (ns) | 14 ns
Output Enable to Output High-Z | Tdf (ns): 30 | Tdf (ns) | 14 ns
Table 10: Flash Write Timing
Parameter | Flash Specification | MSP2020 Specification | Margin/Remarks
Write Cycle Time | Twc (ns), min 90 | Twc (ns), min 185 | 95 ns
Address Setup | Tas (ns) | Tas (ns) |
Address Hold | Tah (ns) | Tah (ns) |
Data Setup | Tds (ns) | Tds (ns) |
Data Hold | Tdh (ns) | Tdh (ns) |
Read Recover Time Before Write | Tghwl (ns) | Tghwl (ns) |
Write Pulse Width | Twp (ns) | |
Write Pulse Width High | Twph (ns) | Twph (ns) |
CE Setup Time | Tcs (ns) | Tcs (ns) |
CE Hold Time | Tch (ns) | Tch (ns) |
CE Pulse Width High | Tch (ns) | |
[Values whose row is not recoverable: one write pulse width row with min 35 (flash), min 115 (MSP2020) and an 80 ns margin; specification values 45, 0 and 25; margins 60 ns, 110 ns and 15 ns.]
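The margin figures in Table 10 appear to be simple differences between what the MSP2020 provides and what the flash requires. The sketch below checks this under stated assumptions: reading 90 ns and 185 ns as the flash and MSP2020 minimum write cycle times, and 35 ns and 115 ns as the flash and MSP2020 minimum write pulse widths, is an interpretation of the table, and `margin_ns` is an illustrative helper name.

```python
def margin_ns(required_min_ns: float, provided_min_ns: float) -> float:
    """Timing margin when the controller provides at least `provided_min_ns`
    and the memory requires at least `required_min_ns`."""
    return provided_min_ns - required_min_ns


# Assumed reading of Table 10: (flash minimum, MSP2020 minimum), both in ns.
write_cycle_margin = margin_ns(required_min_ns=90, provided_min_ns=185)
write_pulse_margin = margin_ns(required_min_ns=35, provided_min_ns=115)

print(write_cycle_margin, write_pulse_margin)  # 95 80
```

A positive result means the requirement is met with room to spare; a negative one would call for extra wait states on the ELB.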
6.1.4.1 Timing
Timing for both the read and write to the SDRAM is shown in Table 11. A
133 MHz DRAM clock and a Micron MT48LCaM32B2-7 were used for the timing
analysis.
Table 11: SDRAM Timing
Parameter | SDRAM Specification | MSP2020 Specification | Margin/Remarks
SDR-CK-OUT Clock Period/Cycle | Dclk (ns) | Dclk (ns) |
Fall Time | Trclk (ns) | Trclk (ns) |
SDR-CK-OUT High Period | Thclk (ns), min 2.75 | Thclk (ns), min 3.38, max 4.13 | 0.63 ns
SDR-CK-OUT Low Period | Tlclk (ns), min 2.75 | Tlclk (ns), min 3.38, max 4.13 | 0.63 ns
Processor Read, Data Setup time | 5.5 | 0.3 | 1.7 ns
Processor Write | | |
[Not recoverable: the symbols of the Processor Read row, a second Trclk (ns) row and a value of 0 with no recoverable row.]
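The SDRAM margins can be reproduced with a short calculation. This is a sketch under assumptions: the 5.5 and 0.3 entries in the Processor Read row are read here as the SDRAM clock-to-data access time and the MSP2020 data setup time, the 2.75 and 3.38 entries as the SDRAM and MSP2020 minimum clock high periods, and board flight time and clock skew are ignored. The function names are illustrative.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def read_setup_margin_ns(freq_mhz: float, access_ns: float, setup_ns: float) -> float:
    """Slack left in one clock period after the memory access time and the
    processor's data setup requirement (trace delay and skew ignored)."""
    return clock_period_ns(freq_mhz) - access_ns - setup_ns


read_margin = read_setup_margin_ns(freq_mhz=133.0, access_ns=5.5, setup_ns=0.3)
high_period_margin = 3.38 - 2.75  # MSP2020 minimum minus SDRAM minimum

print(f"{read_margin:.1f} ns, {high_period_margin:.2f} ns")  # 1.7 ns, 0.63 ns
```

Both results match the margins listed in Table 11, which suggests the table was built from the same subtraction.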
6.1.5 MII (10/100 Ethernet) Interface (Pages 2 and 7 of schematics)
Of the three independent MII interfaces of the MSP2020, the MACA
(MII-A) interface is used for WAN connectivity; the MACB (MII-B) and MACC
(MII-C) interfaces are not used for this reference design. In this reference
design, the MSP2020 is connected to either the IC+ IP101A or Realtek RTL8201
PHY devices. The MSP2020 ELB-CLKO and GPIO-0 are used as PHY-CLK
and device reset signal, respectively, for the IP101A. The timing analysis below is
specific to these two devices but in general it can be applied to any 10/100 MII
PHY. Figure 11 shows how the MSP2020 is connected to the IP101A in the MII
mode of operation.
Parameter | PHY Specification / MSP2020 Specification | Margin/Remarks
MII Interface, Transmit
Data/Cntrl Hold time | Thtxclk (wrt Rising Edge) (ns), min 0.5 |
MII Interface, Receive Timing
Data/Cntrl Setup time | Tdrxclk (wrt Falling Edge) (ns) / Tsrxclk (wrt Rising Edge) (ns) |
Data/Cntrl Hold time | Thrxclk (wrt Rising Edge) (ns) / Thrxclk (wrt Rising Edge) (ns), min 7 |
Management Interface (Processor Write / Processor Read)
Transmit / Receive | MDC (ns) / MDC (ns) |
Data Setup time | Tds (wrt Rising Edge) (ns), min 10 |
Data Hold time | Tdh (wrt Rising Edge) (ns), min 10 |
Clock to Output | Tco (wrt Rising Edge) (ns) |
Data Setup time | Tds (wrt Rising Edge) (ns) | Meets IEEE 802.3 Specs
Data Hold time | Tdh (wrt Rising Edge) (ns) | Meets IEEE 802.3 Specs (0 - 10 ns)
[Values whose row is not recoverable: margins 16 ns and 13 ns; 600 ns > 100 ns; (10 - 300 ns); (0 - 10 ns); specification values 17, 280, 600 and 0.]
6.1.9 JTAG Interface (Page 1 of schematics)
The MSP2020 supports the IEEE Boundary Scan Specification as
described in the IEEE 1149.1 standard. Refer to PMC-2021518 JTAG Test
Features Description application note for a description of the test features
included on the MSP2020. The JTAG interface is also used to load boot code
and burn it to the Flash.
The JTAG interface of the MSP2020 in this reference design is connected
to a 2x7 header so that if the JTAG interface is not used, the JTAG interface data
and clock signals can be pulled low or high and the JTAG reset signal (TRST-N)
can be connected to the master reset signal (RESET-N).
6.1.10 GPIO Allocation (Page 1 of schematics)
Table 13 lists the GPIO allocation for this reference design.
Table 13: GPIO Allocation for IP Phone Design with FAX Support
GPIO | MSP Function | IP Phone Function
GPIO Available
1 | TIMER-B | Flash button (I)
15 | UART1-SOUT | LCD Soft key 2
 | ELB-CS7-N | Volume HI/LO
 | ELB-CS6-N | Volume HI/LO
30 | ELB-CS4-N | Mute LED
31 | ELB-CS3-N | Hold (I)
32 | ELB-CS2-N | Mute Button
44 | MII-C- | LED FAN ind.
GPIO NOT Available
0 | TIMER-A | PHY-RESET-
4 | TDM-RXD |
 | TDM-TXD |
 | TDM-RXCLK |
 | SMPI-SDO | FXS Signaling
 | SMPI-SCLK | FXS Signaling
 | INT1-N | PHY-INT
[Entries whose pin assignment is not recoverable: PCI-AD-8; Speaker (ON/OFF) (I); OFF HOOK (I); LCD-DATA; PWRAMP-SHU; Key Pad 2, Key Pad 3, Key Pad 4 and Key Pad Exten. on MII-C- pins (GPIO 42, 43, 45, 46); several HDR Spare GPIO entries.]
6.1.11 Keypad Interface (Page 1 of schematics)
For this IP Phone reference design, a standard 3x4 keypad (e.g., -102
model from Grayhill) that needs 7 input signals is used. Seven GPIO pins of the
MSP2020 are allocated for this keypad interface. Refer to Table 13 for the
mapping of the GPIO pins used as the keypad input signals. The keypad can be
The reason the shift register is preferred to a dedicated serial-in/parallel-out
register is that data is latched on the rising edge of the clock, so timing
and clock polarity are not an issue here. Before data is written, the shift
register is cleared by loading every latch with zeros. Next, to provide the "E"
gate, a high voltage (logic "1") is written, followed by the "RS" bit and the four
data bits. Once the register is loaded correctly, the LCD-Data signal is pulsed
to strobe the "E" bit.
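The load sequence described above can be modelled in a few lines. This is a behavioural sketch only: the 8-stage register length, the MSB-first order of the data bits and the names `ShiftRegister`, `load_lcd_nibble` and `rs` (the control bit that follows the gate bit) are assumptions made for illustration, not details taken from the schematics.

```python
class ShiftRegister:
    """Serial-in/parallel-out shift register that latches on each rising clock edge."""

    def __init__(self, stages: int = 8):
        self.q = [0] * stages  # q[0] is the stage nearest the serial input

    def clock_in(self, bit: int) -> None:
        """One rising edge: every stage moves along by one and q[0] takes `bit`."""
        self.q = [bit & 1] + self.q[:-1]

    def clear(self) -> None:
        """Clear the register by shifting a zero into every stage."""
        for _ in range(len(self.q)):
            self.clock_in(0)


def load_lcd_nibble(reg: ShiftRegister, rs: int, nibble: int) -> list:
    """Clear, then shift in the gate bit, the control bit and four data bits."""
    reg.clear()
    reg.clock_in(1)                       # logic "1" that later gates the "E" strobe
    reg.clock_in(rs)                      # control bit
    for shift in (3, 2, 1, 0):            # data bits, MSB first (assumed order)
        reg.clock_in((nibble >> shift) & 1)
    return reg.q[:6]                      # [d0, d1, d2, d3, rs, gate]


outputs = load_lcd_nibble(ShiftRegister(), rs=1, nibble=0b1010)
print(outputs)  # [0, 1, 0, 1, 1, 1]
```

After the six shifts the gate bit sits at a fixed output, so pulsing the data line at that point can strobe "E" without disturbing the latched nibble.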
A voltage divider circuit is used to input a constant voltage to the LCD
controller's Contrast pin. The other alternative would be to use a potentiometer
wired as a voltage divider to provide an adjustable contrast function.
A 16-pin header is also provided to connect the LCD module directly or via
ribbon cable to the PCB.
6.1.13 Unused Pins
Terminate unused pins according to the instructions below:
If the JTAG port is not used, connect TRSTB to RSTB. The
boundary scan state machine must be reset prior to normal device
operation to prevent some or all device I/O pins from being held in test
mode.
Depending on the interfaces used, the unused interfaces'
active-high input pins should be grounded, and active-low input
pins should be tied high. Use pull-up and pull-down resistors where
possible; this makes future modification easier.
Unused output pins can be left floating. The unused GPIO pins are
connected to a header in this reference design so that they can be
used for additional applications.
6.1.14 Power Supply Circuit Design (Page 8 of schematics)
The reference design board contains devices that require four different
voltage levels: 12V, 5V, 3.3V and 1.8V. The input voltage to the board
is 12V, supplied via an AC/DC adapter from a standard 100-240 VAC, 50-60Hz
wall outlet. The 12V is then converted to 5V and 3.3V through two dedicated
regulators. The 3.3V supply is in turn converted to 1.8V for the core power to
the MSP2020.
A master hardware reset circuit with a push-button switch is also
provided on the power supply page of the schematics.
6.1.15 Thermal Management
The MSP2020 was designed to operate over a wide temperature range
when used with a heat sink. Refer to the device compact model, located in the
Package and Thermal section of [8], to determine if a heat sink is required for
your system.
6.1.16 Simulation Models
The MSP2020 IBIS simulation model is available for download from the
http://www.pmc-sierra.com website. The IBIS model can be used to simulate all
I/O except for high-speed I/O, differential I/O, and other interfaces that cannot be
accurately modeled with IBIS.
7 Layout Design Considerations
7.1 MSP2020 Layout
7.1.1 Placement
This section describes some guidelines that can assist in the placement
and orientation of the MSP2020 and other external components in PCB designs.
It is important to know where the various signal groups are located on the
MSP2020 prior to placement and orientation of the chip. Figure 15 shows a top
view of the MSP2020 and the approximate physical locations of the signal
groups. (Note that although the center region of the chip shows power and
ground connections, these connections are also mixed in amongst the other
signal regions as well.)
The placement and orientation of the MSP2020 and other external
components should be done while paying attention to the guidelines presented in
the rest of section 7.1.
Match trace lengths of address, data and control lines to within
0.05" to minimize skew from signal routing. Skew reduces the
timing margins, which can result in incorrect memory accesses.
Minimize vias and via stub length on the interface.
For optimal signal integrity, simulations should be performed.
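To put the length-matching guideline in perspective, the skew contributed by a given mismatch can be estimated from the signal propagation delay. The sketch below uses the 0.05" limit from the guideline above; the figure of roughly 170 ps per inch is an assumed typical value for an inner-layer FR-4 trace and should be replaced with the value for the actual stack-up. The names are illustrative.

```python
PROP_DELAY_PS_PER_INCH = 170.0  # assumed typical FR-4 value, not from this design


def skew_ps(length_mismatch_in: float,
            delay_ps_per_inch: float = PROP_DELAY_PS_PER_INCH) -> float:
    """Routing skew in picoseconds caused by a trace length mismatch in inches."""
    return length_mismatch_in * delay_ps_per_inch


mismatch_skew = skew_ps(0.05)
print(f"{mismatch_skew:.1f} ps")  # 8.5 ps
```

Under this assumption the routing contribution is a few picoseconds, small compared with nanosecond-scale timing margins, so matching to the stated tolerance leaves those margins essentially intact.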
7.1.4 MII (10/100 Fast Ethernet) Interface Layout
The following guidelines should be used for routing the Ethernet interface
signals:
To keep the skew value in the timing analysis performed in section
6.1.5 as low as possible, trace length matching was performed to
keep the signals in the same transmit or receive direction of the MII
interfaces to +/- 0.010".
Ethernet PHY layout should follow guidelines provided in the PHY
device Layout Guide. This document should describe requirements
between the PHY and the RJ-45 connector and to the magnetic
circuits. Some Ethernet PHY vendors recommend using separate
analog power and ground planes for each PHY on the board.
7.1.5 TDM Interface Layout
The following guidelines should be used for routing the TDM interface
signals:
Trace length matching should be performed to keep the signals in
the same transmit or receive direction of the TDM interfaces to +/-
0.010".
Signal integrity simulations on the clock line should be performed to
determine if termination resistors are necessary.
Use series termination on each TDM clock and frame pulse line to
suppress glitches in the clock and frame pulse signals.
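A common starting point for sizing a series termination resistor is to make the driver output impedance plus the resistor equal to the trace impedance. The sketch below illustrates that rule of thumb only; the 50 ohm trace and 17 ohm driver values are assumptions chosen for the example, not figures from this design, and the signal integrity simulation recommended above should decide the final value.

```python
def series_termination_ohms(trace_impedance_ohms: float,
                            driver_output_ohms: float) -> float:
    """Series resistor that raises the source impedance to the trace impedance.
    Returns 0 when the driver impedance already meets or exceeds the trace's."""
    return max(trace_impedance_ohms - driver_output_ohms, 0.0)


rs_ohms = series_termination_ohms(trace_impedance_ohms=50.0, driver_output_ohms=17.0)
print(rs_ohms)  # 33.0
```

The resistor is normally placed as close as possible to the driving pin so that the reflection from the far end is absorbed at the source.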
7.1.6 FXS Interface Layout
FXS requires isolation around the connector and TIP/RING traces (i.e. no
planes around the connector and TIP/RING traces).
7.1.7 PLL Filter Layout
PLL0-AVDH and PLL1-AVDH RC filter circuits should be placed as close
as possible to the device pin to reduce noise picked up on the path to the device
pin.
7.1.8 Audio Interface Layout
Run a dedicated trace (fairly wide, approx 0.020") from the LM4950 audio
amplifier's ground pin to a point as close as possible to where the +12V supply
enters the board. The critical concern here is to keep the fairly substantial current
flowing through the speaker from creating even tiny voltages on the analog
ground, especially near the return point for the base microphone.
8 Conclusion
VoIP products such as ATAs, IP phones, wired and wireless routers and
gateways have gained significant popularity worldwide over the last couple of years.
While in Korea, Japan, Taiwan and some countries in Europe full deployment of
VoIP networks has started, in North America, China, and India this
technology is rapidly gaining market share and challenging the Internet Service
Providers (ISP) and traditional PSTN networks to reduce their prices, particularly
for long distance phone calls.
The IP Speakerphone Reference Design will assist engineers in designing
low cost IP speakerphone boards using PMC-Sierra's MSP2020
multi-service processor and bring their designs to market more quickly. This
reference design supports most major telephony applications in hardware
along with an optional T.38 FAX interface.
In this project, board layout considerations such as PCB signaling
layers and the analogue and digital trace lengths, widths and impedances
were not considered in detail. However, as with any other
high-speed board design, specific requirements must be met by the
physical layout designers based on the implemented technology.
Depending on the type and dimensions of the enclosure selected to
contain the PCB, software tuning is needed to configure the AEC
module of the ATH3100 for optimal performance.
Also, this project did not include the software development phase;
however, for future need, the software for this Reference Design
should include the following modules:
Driver modules to configure and control the operation of the MSP2020,
ATH3100, SLIC/SLAC device, Ethernet PHY device and LCD
controller.
Voice Processing firmware module running on the MIPS core of the
MSP2020.
Application software for basic telephony functions such as on-hook/off-hook,
ring generation, busy tone, mute, speed call, hold, forward, etc.
A real-time operating system with reliable multi-tasking and multi-scheduling
capabilities to control and harmonize the devices'
operations and to satisfy the real-time processing specifications.
SIP application or other signaling protocol to activate and coordinate
the various components to complete a call.
After the prototype board is built and the software is available, the
designer should debug the board by performing a feature test plan to test the
main functionality of the design.
Although various versions of IP phones are currently available on the
market, designing ones with high voice quality, reduced power
consumption (for wireless applications), full features and less echo is
still one of the real challenges in today's VoIP world.
9 Disclaimer
This document is a paper reference design, and as such, has not been
built or tested as of this date.
Because the schematics and BOM are part of PMC-Sierra's intellectual
property, they are not included in this report. The schematics and BOM can be
downloaded from PMC-Sierra's web site (www.pmc-sierra.com) with
advance permission from PMC-Sierra, Inc.
Acronym List
ADC     Analog to Digital Converter
AEC     Acoustic Echo Canceller
ATA     Analog Telephone Adapter
CNG     Comfort Noise Generation
DAC     Digital to Analog Converter
ELB     External Logic Bus
FoIP    Fax over IP
FXO     Foreign exchange Office
FXS     Foreign exchange Subscriber
IC      Integrated Circuit
IP      Internet Protocol
LAN     Local Area Network
LCD     Liquid Crystal Display
LEC     Line Echo Canceller
LMS     Least Mean Square
MII     Media Independent Interface
MOS     Mean Opinion Score
NLMS    Normalized Least Mean Square
PCB     Printed Circuit Board
PCM     Pulse Code Modulation
PESQ    Perceptual Evaluation of Speech Quality
PLL     Phase Lock Loop
RTP     Real-time Transport Protocol
SIP     Session Initiation Protocol
SLAC    Subscriber Line Audio-processing Circuit
SLIC    Subscriber Line Interface Circuit
SoC     Systems on Chip
TDM     Time Division Multiplexing
UDT     Unstructured Data Transfer
UNI     User-network Interface
VAD     Voice Activity Detection
VoIP    Voice over Internet Protocol
WAN     Wide Area Network
Definitions
TDM (Time Division Multiplexing) - A method of multiplexing by which a
transmission channel is divided into discrete time intervals.
PESQ MOS - PESQ stands for Perceptual Evaluation of Speech Quality
and is an enhanced perceptual quality measurement for voice quality in
telecommunications. PESQ was specifically developed to be applicable to end-to-end
voice quality testing under real network conditions, like VoIP, POTS, ISDN,
GSM etc. The PESQ MOS score as defined by the ITU recommendation P.862
ranges from 1.0 (worst) up to 4.5 (best).
References
1. Acoustic Technologies, August 2004, "ATH3000 Data sheet", Version 1.5.
2. Acoustic Technologies, July 2004, "ATH3000 Use & Integration Guide", Version 1.5.
3. Acoustic Technologies, June 2005, "ATH3100 Preliminary Information", Version 0.1.
4. Balaji Kumar, 1995, Broadband Communications, McGraw-Hill, Inc.
5. Daniel Minoli & Emma Minoli, 2002, Delivering Voice over IP Networks, 2nd Edition, Wiley.
6. ITU-T Recommendation G.168, 08/2004, "Digital Network Echo Cancellers".
7. ITU-T Recommendation Q.23, 11/1998, "Technical Features of Push-Button Telephone Sets".
8. PMC-Sierra Inc., October 2004, PMC-2041639, "MSP2020 Multi-Service Processor Data sheet", Issue 1.
9. PMC-Sierra Inc., October 2004, PMC-2041704, "MSP2020 Multi-Service Processor Data sheet Addendum", Issue 1.
10. PMC-Sierra Inc., October 2004, PMC-2041640, "MSP2020 Multi-Service Processor Hardware User's Manual", Issue 1.
11. PMC-Sierra Inc., November 2005, PMC-2041860, "MSP20xx Errata", Issue 2.
12. PMC-Sierra Inc., October 2004, PMC-2041641, "MSP2020 Multi-Service Processor Product Overview", Issue 1.
13. Legerity, "Le88221/226/241/246 Dual AHS VoicePort Device Data sheet".