Understanding VoIP
Dr. Jonathan Rosenberg
Chief Technology Strategist
Skype
What is this course about?
Getting “under the hood” and understanding how VoIP works
An exploration of the protocols and technologies behind VoIP
Conveying an understanding of the various problems that need to be solved for VoIP to work
What this course is not about
A general introduction to telephony
A detailed cookbook or deployment guide to VoIP
A product survey of VoIP and IP telephony products
In particular, Cisco or Skype products are not discussed except in passing
Ground Rules
Ask questions ANY TIME! I will be bored if this is a one-way conversation
No question is too stupid
Laughing at or mocking anyone's questions is unacceptable
Please ask off-the-wall or exploratory questions; there is a lot that is not in here!
Agenda
Breaking up the problem
Voice and video coding
Voice and video transport
Quality of Service
Signaling
Security
NAT traversal
Non-Agenda
Programming APIs
Emergency services, lawful intercept
Numbering, routing, naming (ENUM, TRIP)
PSTN interworking
Billing, provisioning, OAM
Conferencing, IVR, applications
Breaking Up the Problem
Figure: two endpoints connected across an IP network, supported by signaling servers, presence servers, application servers, media servers, directories/databases, accounting/billing, and OAM. Protocols labeled: RTP over IP between the endpoints; SIP, H.323, MGCP, H.248; SIMPLE, XMPP; SIP; LDAP, ENUM; RADIUS, DIAMETER.
Voice Coding
Voice Endpoint Model
Figure: processing blocks between a 2-wire interface and the packet network. Transmit side: hybrid echo canceller, loss administration, nonlinear processing, DTMF/tone detection, silence detection (speech / no speech), speech encoding, packetizer. Receive side: unpacker, speech decoding, comfort noise generation, DTMF/tone generation.
Codecs
Waveform codecs:
Directly encode speech in an efficient way by exploiting temporal and/or spectral characteristics
Attempt to reproduce the input signal's waveform by minimizing the error between input and coded signals
Source codecs / vocoders:
Estimate and efficiently encode a parametric representation of speech
CELP:
Minimizes perceptually weighted error, similar to waveform coders
Short-term predictor is the LP (vocal tract) filter
Excitation is obtained from a codebook and a long-term pitch predictor
Closed-loop search is MIPS intensive
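The short-term predictor is normally fitted by linear prediction. The minimal sketch below is not taken from any particular codec. It uses the Levinson-Durbin recursion to turn a frame's autocorrelation into predictor coefficients a[1..p], so that x[n] ≈ a[1]x[n-1] + … + a[p]x[n-p]. The function names and the toy frame are illustrative assumptions. Real CELP codecs also apply windowing and lag windowing and convert the result to line spectral pairs, all of which is omitted here.

```python
def autocorrelation(x, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one speech frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for LP coefficients a[1..order] (predictor x[n] ~ sum_k a[k] x[n-k]).

    Returns (coefficients, residual_energy). residual_energy is the part of the
    frame the short-term predictor cannot explain; the excitation must supply it.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


if __name__ == "__main__":
    frame = [1.0, 2.0, 3.0, 4.0]  # toy frame; real frames are ~160-240 samples
    coeffs, residual = levinson_durbin(autocorrelation(frame, 1), 1)
    print(coeffs, residual)  # first order: a[1] = r[1] / r[0]
```

Each extra stage can only lower the residual energy, so the prediction order trades accuracy against the bits spent on coefficients.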
Codec Comparison
Codec | Sampling | Bitrate | Latency | Comments
G.711 | 8 kHz | 64 kbps | 125 µs | PSTN codec
G.729 | 8 kHz | 8 kbps | 10 ms | CS-ACELP
G.723.1 | 8 kHz | 5.3/6.3 kbps | 37.5 ms |
AMR | 8 kHz | 4.75-12.2 kbps | 25 ms | GSM codec
G.722.1 | 16 kHz | 24/32 kbps | 40 ms | Polycom SIREN
AMR-WB | 16 kHz | 6.6-23.85 kbps | 25 ms | GSM wideband; encumbered
SILK | 8, 12, 16, 24 kHz (SWB) | 6-40 kbps | 25 ms | Skype codec
Listen at: http://www.voiceage.com/listeningroom.php
Echo Cancellation
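Figure: the echo canceller sits between the digital (packet network) side and the analog 2-4-wire hybrid. An echo path estimate is subtracted from the hybrid reflection, followed by a non-linear processor. ERL and ERLE are marked on the diagram.
This echo canceller cancels 'local' echoes from the hybrid reflection
ERL: Echo Return Loss (dB)
ERLE: Echo Return Loss Enhancement (dB)
Other key parameters: double-talk handling, convergence time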
Echo Canceller Specifics
The voice echo path is like an electrical circuit
If a ‘break’ (cancellation) is made anywhere in the ‘circuit’, you will eliminate the echo
The easiest place to make the break is with a canceller ‘looking into’ the local analog/digital telephony network, NOT the packet network (which has much longer and variable delays)
The echo canceller at the other end of the call eliminates the echoes that YOU hear, and vice versa
Echo canceller coverage (e.g. 32 ms) is the maximum length of echo impulse response that can be cancelled from the local analog/digital network (the packet network delay does not matter)
The non-linear processor is used to ‘clean-up’ any residual echo left over from the canceller
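The "echo path estimation" block is commonly built as an adaptive FIR filter. The sketch below uses normalized LMS (NLMS) adaptation. That choice is an assumption, because the slides do not name an adaptation algorithm. The sketch models the hybrid reflection as a short impulse response, subtracts the estimated echo, and reports ERLE in dB. Coverage maps directly onto filter length: 32 ms at 8 kHz is 256 taps, while the example uses 8 taps to stay short. Double-talk detection and the non-linear processor are not modeled.

```python
import math
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Cancel the echo of far_end that appears in mic with an NLMS-adapted FIR filter.

    Returns (residual, weights). residual is what would be sent back toward the
    packet network; weights is the final echo path estimate.
    """
    w = [0.0] * taps        # echo path estimate
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate
        norm = eps + sum(h * h for h in history)
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        residual.append(e)
    return residual, w


def erle_db(mic, residual):
    """Echo Return Loss Enhancement: echo power before vs. after cancellation."""
    return 10 * math.log10(sum(d * d for d in mic) / sum(e * e for e in residual))


if __name__ == "__main__":
    rng = random.Random(1)
    echo_path = [0.5, -0.3, 0.2, 0.1]  # hypothetical hybrid reflection
    far = [rng.uniform(-1, 1) for _ in range(4000)]
    mic = [sum(h * far[n - k] for k, h in enumerate(echo_path) if n >= k)
           + rng.gauss(0, 1e-3) for n in range(len(far))]
    residual, w = nlms_echo_canceller(far, mic)
    print("ERLE after convergence: %.1f dB" % erle_db(mic[-1000:], residual[-1000:]))
```

The filter only ever looks at the local hybrid. That is why packet network delay does not affect the required coverage.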
Voice Activity Detection
Figure: speech magnitude (dB) versus time for two sentences, showing the noise floor and a signal-to-noise threshold. Each detected speech segment is followed by a hang-over period (typically fixed at 200 ms). Front-end speech clipping occurs at the start of each sentence.
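The hang-over behaviour in the figure can be sketched as a per-frame energy detector. This is a simplified illustration, not a standardized VAD. It assumes frame levels have already been measured in dB. The 20 ms frame size and the 6 dB threshold are hypothetical; only the 200 ms hang-over comes from the slide.

```python
def vad_with_hangover(frame_levels_db, noise_floor_db, snr_threshold_db=6.0,
                      frame_ms=20, hangover_ms=200):
    """Return one speech/no-speech decision per frame.

    A frame is speech if it exceeds the noise floor by snr_threshold_db. After
    speech ends, frames stay marked as speech for hangover_ms, so that short
    pauses and word endings are not clipped.
    """
    hangover_frames = hangover_ms // frame_ms
    remaining = 0
    decisions = []
    for level in frame_levels_db:
        if level - noise_floor_db >= snr_threshold_db:
            remaining = hangover_frames  # speech detected: re-arm the hang-over timer
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1               # still inside the hang-over period
            decisions.append(True)
        else:
            decisions.append(False)      # silence: stop sending, far end plays comfort noise
    return decisions


levels = [-60] * 2 + [-30] * 3 + [-60] * 15
print(vad_with_hangover(levels, noise_floor_db=-60))
```

The detector only fires once a frame crosses the threshold. Any quieter speech onset before that point is lost, which is the front-end clipping shown in the figure.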
Comfort Noise Generation
Silence isn't golden… it's annoying
When speech stops… what do you play to the listener?
Simple techniques: play white/pink noise; replay the last received packet over and over
Fancier technique: the transmitter measures the local "noise environment" and sends a special "comfort noise" (CN) packet as the last packet before silence; the receiver generates noise based on the CN packet.
Voice Quality: Mean Opinion Scores
Figure: test sentences (e.g. "Nowadays, a chicken leg is a rare dish") pass through a source impairment, codec 'X', and channel simulation; listeners rate what they hear on a 1 to 5 scale.
Rating | Speech Quality | Distortion
5 | Excellent | Imperceptible
4 | Good | Just perceptible but not annoying
3 | Fair | Perceptible and slightly annoying
2 | Poor | Annoying but not objectionable
1 | Unsatisfactory | Very annoying and objectionable
MOS of 4.0 = toll quality
Clear Channel MOSs
G.711 (64 kbit/s PCM): 4.1
G.726 (32 kbit/s ADPCM): 3.8
G.723.1 (6.3 kbit/s MP-MLQ): 3.9
G.729 (8 kbit/s CS-ACELP): 3.9
IS-54 (8 kbit/s NA digital cellular): 3.44
MOS Under Varying Conditions (G.729)
Average speech level (-20 dBm0): 3.85
Low input level (-30 dBm0): 3.54
2 tandem codings: 3.46
3 tandem codings: 2.68
1% frame erasure rate (FER)
5% bit error rate: 3.24
5% FER: 3.02
10% FER
20% FER
Video Coding
Key Terms
Term | Description
Frame | An individual picture in a sequence that makes up the video
Frame rate | The number of frames per second in video. 30 is excellent (TV quality)
Resolution | The number of horizontal and vertical pixels. VGA = 640x480
Interlacing | A mechanism for transmitting video by splitting a frame into two fields, one field representing the odd lines and the other the even lines. This is the "i" in 1080i
Progressive | As opposed to interlaced, a method for transmitting video by sending each frame as a whole
HD | High-definition resolutions: 720p is 1280x720 at 60 fps; 1080i is 1920x1080 at 30 fps
Key Concept: Macroblocks
Rectangular block in an image which is a basic unit of compression. Typically 16x16 pixels.
Key Concept: Inter-Frame Prediction
Predict information in the current frame by looking at previous frames, possibly taking into account motion.
Key Concept: Discrete Cosine Transform (DCT)
A technique for representing a block of pixels (for example, an 8x8 block within a macroblock) by its component frequencies. Discarding the higher frequencies throws away the finer details without losing the core image.
Figure: DCT basis patterns arranged by increasing horizontal and vertical frequency.
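To make the DCT concrete, here is a direct (slow, O(N^4)) 8x8 orthonormal DCT-II and its inverse in plain Python. It also includes a crude step that zeroes high-frequency coefficients. The block size and the "keep" rule are illustrative assumptions. Real codecs use fast or integer transforms (H.264 uses small integer transforms) and apply quantization tables rather than hard truncation.

```python
import math

N = 8  # block size (8x8, as in JPEG- and MPEG-2-style coders)


def _alpha(k):
    """Orthonormal DCT scaling factor."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block):
    """2-D DCT-II: coeffs[u][v], u = vertical frequency, v = horizontal frequency."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (DCT-III with the same orthonormal scaling)."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


def drop_high_frequencies(coeffs, keep):
    """Zero every coefficient with u + v >= keep: a crude stand-in for quantization."""
    return [[c if u + v < keep else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]
```

A perfectly flat block compresses to a single DC coefficient. For a block of 100s that coefficient is 800 and every other coefficient is zero, which is why smooth image regions cost so few bits.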
Video Encoder Block Diagram
Key Codec Comparisons
Codec | Year | Applications
H.261 | 1990 | ISDN at multiples of 64 kbps
H.263 | 1996 | Early Flash, using the Sorenson Spark implementation. Original RealVideo codec. Required in IMS.
H.264/AVC | 2003 | YouTube, iTunes, Blu-ray; most modern video conferencing. The current primary video codec for real time. Typical VGA 15 fps bitrate = 500 kbps
H.264/SVC | 2007 | "Layered" video that provides improved quality and resilience; ideal for multiparty video conferencing
VP7 | 2005 | On2 Technologies codec; Skype. (On2's earlier VP6 was the successor to H.263 in Flash.)
Voice and Video Transport: RTP
RTP: What is it?
Real-time Transport Protocol (RFC 3550)
Product of the AVT working group
1996: proposed standard (RFC 1889)
2003: full standard (RFC 3550)
What does it do?
End-to-end transport of real-time media
Optimized for multicast
Provides sequencing, timing, framing, loss detection
Provides feedback on reception quality
What does it do? (cont.)
Provides information on group members
Provides data to correlate audio and video and other media
Works with any codec
Needs a payload format for each codec
Flexible
RTP: What isn't it?
Doesn't guarantee quality of service
Doesn't reserve network resources
Doesn't guarantee no loss or bounded delay
Can work with QoS protocols (RSVP)
Doesn't provide signaling
Other protocols must be used to set up RTP (like SIP or H.323)
Not a specific protocol type
Does not run directly on top of IP
Runs on top of UDP
No fixed port number
RTP Stack
Figure: RTP and RTCP run over UDP, which runs over IP.
Big Picture: RTP, SDP and SIP
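Figure: two end users, each served by a SIP proxy, across an IP network. SIP carrying SDP flows through the proxies; RTP flows directly between the end users. Example SDP:
c=IN IP4 123.1.2.3
m=audio 1122 RTP/AVP 0 1
m=video 1130 RTP/AVP 98
a=rtpmap:98 H263/90000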
RTP Components: Data + Control
Data, also called RTP (very confusing)
Usually on an even UDP port (NATs change this; covered later)
Provides sequencing, timing, framing, content labeling, user identification
Control = RTP Control Protocol (RTCP)
Same address as data, but usually one port higher
Provides reception quality, sender statistics, participant information (multicast), synchronization information
Real Time Data Transport
Originator breaks stream into packets (segmentation): application layer framing (ALF)!
Packets sent; network may lose, delay, reorder packets
Must, at receiver: reorder, recover, resegment, resynchronize (clock synchronization!)
Figure: an RTP source sends RTP packets through the transport system to an RTP sink.
Source: digitize audio from mike; silence suppression; echo cancellation; compress audio (G.711: 64 kbps, G.729: 8 kbps, G.723.1: 5.3/6.3 kbps); packetize audio in RTP; send
Sink: receive packets; un-packetize; decompress; comfort noise generation; reorder; recover loss; jitter buffer; D/A conversion to speakers
Jitter Buffer
Packets are delayed differently, but must be played out periodically
Packets may arrive after their designated playout time, which means loss
Insert extra delay to compensate
May need to adapt this amount
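The amount of extra delay is usually driven by a running jitter estimate. The first function below is the interarrival jitter estimator defined in RFC 3550, the same value that RTCP receiver reports carry. The playout rule after it (mean transit plus four times the jitter) is a common heuristic and only an assumption here, since the slides do not prescribe one. Both functions work in RTP timestamp units; for example, 160 ticks = 20 ms at 8 kHz.

```python
def interarrival_jitter(rtp_timestamps, arrival_times):
    """RFC 3550 interarrival jitter J, returned after each packet.

    arrival_times must already be converted to RTP clock units. For each pair of
    packets, D is the change in transit time, and J moves 1/16 of the way toward |D|.
    """
    jitter = 0.0
    prev_transit = None
    history = []
    for ts, arrival in zip(rtp_timestamps, arrival_times):
        transit = arrival - ts
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
        history.append(jitter)
    return history


def playout_delay(mean_transit, jitter, k=4.0):
    """Hypothetical adaptive rule: buffer long enough to absorb k 'jitters'."""
    return mean_transit + k * jitter
```

The 1/16 gain smooths the estimate, so one late packet nudges the buffer rather than resizing it abruptly.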
RTP Packet Header
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RTP Header Fields
Version: 2
P: indicates padding (for encryption)
X: extension bit
CSRC count (CC): for mixers (later)
M: marker bit, indicates framing; audio codecs: first packet in talkspurt; video: last packet in frame
Payload Type (PT): indicates encoding in RTP packet; allows changes per packet. Useful for adaptation, DTMF codecs, silence codecs
SN (sequence number): defines ordering of packets
Timestamp: sampling instant of the first payload octet (roughly, when the packet was generated)
SSRC: synchronization source identifier
CSRC: list of mixed users (contributing sources)
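The fixed header above can be built and parsed with Python's struct module. This is a minimal sketch with illustrative function names. Padding, header extensions, and payload handling are left out.

```python
import struct

RTP_VERSION = 2


def pack_rtp_header(pt, seq, timestamp, ssrc, marker=False, csrcs=()):
    """Build the fixed 12-byte RTP header (RFC 3550) plus an optional CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field: at most 15 CSRCs")
    first = (RTP_VERSION << 6) | len(csrcs)    # V=2, P=0, X=0, CC
    second = (int(marker) << 7) | (pt & 0x7F)  # M bit, 7-bit payload type
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_rtp_header(data):
    """Decode the fixed header and CSRC list into a dict of field values."""
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = first & 0x0F
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "marker": bool(second & 0x80),
        "pt": second & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(struct.unpack("!%dI" % cc, data[12:12 + 4 * cc])),
    }
```

A mixer that combines three talkers would list their SSRCs as CSRCs, which grows the header by 4 bytes each.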
RTP Timestamp
Tick units are dependent on codec
For speech: 125 microseconds (standard 8 kHz sampling rate)
For video: 90 kHz
For audio: 44.1 kHz (CD rate)
Gaps in TS, but not in SN, mean silence
Initial value random for security
Video: timestamp represents time at beginning of frame; many packets may have the same timestamp
Speech: time per packet may vary; depends on packetization, 20-100 ms typical
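Two of the rules above reduce to simple arithmetic. The timestamp advances by clock rate × packet duration for each packet (160 ticks for 20 ms of 8 kHz speech). A timestamp jump larger than that between consecutive sequence numbers signals a silence period. This is a minimal sketch with illustrative names. Both fields are compared modulo their wrap-around (16-bit SN, 32-bit TS).

```python
def timestamp_step(clock_rate_hz, packet_ms):
    """RTP timestamp increment between consecutive packets."""
    return clock_rate_hz * packet_ms // 1000


def silence_gaps(packets, step):
    """packets: (seq, timestamp) pairs in sequence order.

    Returns (seq, silent_ticks) wherever SN is consecutive but TS jumped by more
    than one packet's worth, i.e. the sender suppressed silence in between.
    """
    gaps = []
    for (seq_a, ts_a), (seq_b, ts_b) in zip(packets, packets[1:]):
        consecutive = ((seq_b - seq_a) & 0xFFFF) == 1
        ts_delta = (ts_b - ts_a) & 0xFFFFFFFF
        if consecutive and ts_delta > step:
            gaps.append((seq_b, ts_delta - step))
    return gaps
```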
Payload Formats
Each codec needs a way to be encapsulated in RTP
RFC 3551 (the RTP audio/video profile) defines mechanisms for many common codecs: G.711, G.729, G.723.1, G.722, etc., plus some simple video
More complex codecs have their own payload format documents: MPEG, H.263, and H.261
Payload format defines how to break a frame into packets, and the extra fields needed below the main RTP header
Advanced Topics
DTMF and Tones: RFC 2833
Special codecs for encoding touch tones (DTMF) and other signals
Can send either the waveform (frequency, amplitude)
Or the actual signal (#, 8, 0)
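For the "actual signal" option, RFC 2833 defines a 4-byte telephone-event payload with these fields:

- an 8-bit event code (0-9 for digits, 10 for *, 11 for #, 12-15 for A-D)
- an end bit
- a reserved bit
- a 6-bit volume (the tone power in dBm0 with the sign dropped)
- a 16-bit duration in timestamp units

A sketch of packing and unpacking it follows; the default volume is illustrative.

```python
import struct

EVENT_CODES = {**{str(d): d for d in range(10)}, "*": 10, "#": 11,
               "A": 12, "B": 13, "C": 14, "D": 15}
EVENT_NAMES = {code: name for name, code in EVENT_CODES.items()}


def pack_dtmf_event(key, duration_ticks, volume=10, end=False):
    """Telephone-event payload: event | E R volume | duration (network byte order)."""
    flags = (0x80 if end else 0x00) | (volume & 0x3F)  # E bit, R bit left 0, volume
    return struct.pack("!BBH", EVENT_CODES[key], flags, duration_ticks & 0xFFFF)


def unpack_dtmf_event(payload):
    """Return (key, end_flag, volume, duration_ticks)."""
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return EVENT_NAMES.get(event, event), bool(flags & 0x80), flags & 0x3F, duration
```

Sending the event rather than the waveform survives low-bitrate codecs such as G.729, which do not reproduce DTMF tones reliably.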
Compressed RTP: RFC 2508
For dialup links
Don't send the header, just send an index
Far side uses the index to retrieve the header, and then increments certain fields
Quality of Service
The problem we are trying to solve is to give “better” service to some at the expense of giving worse service to others; QoS fantasies to the contrary, it's a zero sum game
- Van Jacobson
Quality of Service: So, what's the problem?
Figure: usability (utility, 0.0 to 1.0) of a voice circuit as a function of end-to-end delay (0 to 800 ms). Marked regions: toll quality, CB zone, satellite zone, fax relay/broadcast; private network VoFR and VoIP technology; early I-Phone technology.
Improving I-Phone means:
• Lower PC delay
• Lower network latency
• Tighten network jitter
Delay Budget
Device sample capture
Encode delay (algorithmic delay + processing delay)
Packetization/framing
Move to output queue/queueing delay
Access (up) link transmission
Backbone network transmission
Access (down) link transmission
Input queue to application
Jitter buffer
Decode processing delay
Device playout delay
The access and backbone transmission steps make up "the network".
Some Techniques to Improve “Network QoS”
RED: Random Early Drop (or "Detect")
WFQ: Weighted Fair Queuing
Intserv/RSVP: ReSerVation Protocol
IP Precedence
DiffServ
CRTP: Compressed Real-time Transport Protocol
MCML: Multi-Class Multi-Link PPP
Random Early Detect (RED): this is Basic Hygiene!
Objectives:
Keep average queue size low (good for voice)
Fairness: bigger streams punished more
Avoid synchronization
Only works with loss-responsive transport protocols
Algorithm: probabilistic dropping of packets
Figure: drop probability (up to 1) versus queue size, rising between the Min and Max thresholds.
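The drop curve can be written down directly. This sketch follows the classic Floyd/Jacobson form, which is an assumption: the drop probability grows linearly from 0 at Min to max_p at Max and jumps to 1 beyond Max. It is computed on an exponentially weighted average of the queue length. The parameter values are hypothetical, and the count-based correction of the original algorithm is omitted.

```python
def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p=0.1):
    """Probability of dropping (or ECN-marking) an arriving packet."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```

The early drops only help if the sender slows down in response. That is the loss-responsiveness assumption the next slide questions for voice.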
Poll: Will RED Help Voice? (Yes / No)
No:
• Voice is not loss responsive
• Mixing voice and data in the same queue is bad
• Voice queues are usually not congested
Weighted Fair Queueing
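Each flow "sees" a dedicated amount of bandwidth Bj
A packet arriving at time t (to an otherwise empty flow queue) finishes transmission by time t + size/Bj
Figure: three flows with bandwidths B1, B2, B3 sharing a link of bandwidth B = B1 + B2 + B3.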
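What's the Problem?
WFQ is unrealizable because of variable packet sizes and causality
Example: link speed 100 kbps; flow 1 is allocated 10 kbps and sends 1500-byte packets; flow 2 is allocated 90 kbps and sends 100-byte packets
Theory: a 100-byte flow-2 packet is delayed 8.8 ms (800 bits at 90 kbps)
Actual: up to 128 ms. A 1500-byte flow-1 packet that has already started transmission holds the link for 120 ms (12,000 bits at 100 kbps), and then the 100-byte packet takes another 8 ms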
Approximations of WFQ
Many PhDs have been written on approximate, implementable algorithms
Algorithms differ in their delay bound: how much worse than perfect WFQ are they?
Delay bounds are a function of bandwidth, number of queues, and other parameters
Algorithms:
SCFQ: Self-Clocked Fair Queueing
WF2Q: Worst-Case Fair Weighted Fair Queueing
FBFQ: Frame-Based Fair Queueing
PGPS: Packet-by-packet Generalized Processor Sharing
DRR: Deficit Round Robin
WFQ Voice Configuration
How to pick the allocated bandwidth? Consider G.711 with 30 ms framing (74.6 kbps including IP/UDP/RTP headers)
If Bi = 74.6 kbps, delay is at least 30 ms; if Bi = 149.2 kbps, delay is at least 15 ms
Must set voice queue bandwidth to at least 2x actual voice usage to keep delays down!
Unused bandwidth will go to data
Need an accurate WFQ implementation
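The 74.6 kbps figure and the 30 ms / 15 ms delays can be reproduced from packet sizes. This sketch assumes 40 bytes of IP/UDP/RTP header per packet and no layer-2 overhead, which is why it lands at about 74.7 kbps rather than exactly 74.6 kbps.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def voice_stream(codec_bps, frame_ms, header_bytes=IP_UDP_RTP_HEADER_BYTES):
    """Return (bits per packet, stream bitrate in bps) for one voice stream."""
    payload_bytes = codec_bps * frame_ms // 8000  # bits per frame / 8
    packet_bits = (payload_bytes + header_bytes) * 8
    return packet_bits, packet_bits * 1000 / frame_ms


def wfq_packet_delay_ms(packet_bits, allocated_bps):
    """Time one packet needs at the flow's allocated WFQ rate."""
    return packet_bits / allocated_bps * 1000


bits, rate = voice_stream(64000, 30)  # G.711, 30 ms framing
print(bits, round(rate))              # 2240 bits per packet, about 74667 bps
print(wfq_packet_delay_ms(bits, rate), wfq_packet_delay_ms(bits, 2 * rate))
```

Doubling the allocation halves the per-packet delay. The extra allocation is not wasted, because WFQ hands any unused share to data.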
Priority Queueing
Emulates the familiar “elite airport line” experience
Voice and data packets in separate queues
If there are any packets in the voice queue, they are serviced first
Figure: separate voice and data queues feeding a single server.
Priority Queueing Considerations
Easy to configure: no bandwidth values required
Main problem: data starvation
Need to police voice queue
Doesn't work as well when there is other non-voice high-priority traffic (video)
Head-of-line blocking from data queue
Intserv: Integrated Services
Guaranteed Service (RFC 2212): mathematically provable bounds on end-to-end datagram queuing delay/bandwidth
Controlled Load Service (RFC 2211): approximate QoS from an unloaded network for delay/bandwidth
Describe traffic with a "TSPEC":
r = token bucket rate
b = token bucket depth
p = peak transmission rate
m = minimum (policed) packet size
M = maximum packet size
Describe endpoints with a "FilterSpec": source/destination IP addresses, ports, protocol
RSPEC/FSPEC provides the policy to the queuing/scheduling algorithms
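The r and b parameters describe a token bucket, which is also what a Diffserv meter typically checks. Below is a minimal conformance check. It assumes rates in bytes per second and a bucket that starts full, and it ignores the peak rate p and the m/M packet-size limits.

```python
class TokenBucket:
    """Token bucket meter with rate r (bytes/s) and depth b (bytes)."""

    def __init__(self, rate, depth):
        self.rate = rate
        self.depth = depth
        self.tokens = depth  # bucket starts full
        self.last = 0.0      # time of the previous check, in seconds

    def conforms(self, now, packet_bytes):
        """True if the packet is in profile; tokens are consumed only when it is."""
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False
```

Depth b bounds the burst a source may send at once, and rate r bounds its long-term average.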
RSVP Design
Signaling distinct from routing (modularity, deployability, evolvability)
Soft state (robustness, simplicity)
Transparent operation across non-RSVP routers (deployability)
Support shared and distinct reservations
Applies to unicast and multicast applications
Simplex and receiver-oriented
RSVP protocol
PATH (source → destination): traffic parameters of source; collects info on network capabilities; detects current route
RESV (destination → source): receiver-selected Int-Serv service; traffic parameters of the receiver-selected reservation; follows the route detected by PATH; reservation actually nailed in network
RSVP messages carried over IP; can also be carried over UDP, but few people do that
Figure: PATH messages travel from source to destination; RESV messages travel back toward the source.
RSVP: Admission Control
Figure: RSVP router model with routing protocol, routing database, route selection, switching, reservation protocol, flow request, admission control, resource utilization database, queuing policy database, and a packet scheduler per output interface (interfaces 1 to N); packets in, packets out.
Intserv/RSVP Acceptance
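Figure: enthusiasm for Intserv/RSVP over time. Early: "Intserv/RSVP will solve the world's QoS". Later, the cool thing to say: "RSVP does not scale". Other labels: vBNS, RSVP over ATM, "transparently transport RSVP". Points marked "Today" for ISPs and for enterprises; real value: RSVP for VoIP in the enterprise.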
IP Precedence & Diffserv
"Poor man's" approach to QoS
Set IP Precedence/DSCP higher on voice packets
This puts them in a different queue, resulting in isolation from best-effort traffic
Can be done by endpoint, proxy, or in routers through heuristics
Scales better than RSVP: keeps QoS control "local"; pushes work to the edges and boundaries; can provide bulk QoS by customer or network
No admission control: too much high-precedence traffic can still swamp the network
Diffserv Architectural Model
Clouds: regions of relative homogeneity (administrative control, technology, bandwidth)
Within a cloud, QoS managed by local rules
Hard work confined to boundaries of clouds: classification, conditioning/policing
QoS information exchange limited to boundaries: bi-lateral, not multi-lateral; not necessarily symmetric
Figure: clouds labeled "Me", "Not Me", "Also Not Me", and "Far Away".
Diffserv Scalability
Fundamental assumptions:
Relatively small number of feasible queuing/scheduling algorithms for high link speeds
Number of individual flows is large
Many different rules, often policy driven
Group packets explicitly by the "per-hop behavior (PHB)" they are to get: queue service, shaping/policing
Nodes in the middle of a cloud only have to deal with traffic aggregates
Diffserv Forwarding via PHBs
PHBs map to DSCPs (Diffserv Code Points)
Values chosen for backward compatibility with the IPv4 TOS byte, including IP Precedence (RFC 2474)
Packets with different DSCPs may be re-ordered
Forwarding resources partitioned by PHB/DSCP
Assured Forwarding PHB (AF, RFC 2597)
Four independent classes
Within each class, three levels of drop precedence
A congested AF node discards packets with higher drop precedence first
Packets with the lowest drop precedence must be within the subscribed profile
Expedited Forwarding PHB (EF, RFC 3246)
Targeted at VoIP and "virtual leased lines"
Roughly equivalent to priority queuing, with a safety measure to prevent starvation
Implications: no more than 50% of a link can be EF
See RFC 3247 and RFC 3248 for interesting mathematical analyses
Worst-case jitter at each hop is the max of: the number of EF microflows in the aggregate, or a single MTU packet of some other aggregate
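The DSCP occupies the upper six bits of the old IPv4 TOS byte, and that layout is what makes the backward-compatible mapping work. The sketch below shows the arithmetic:

- EF is codepoint 46 (RFC 3246).
- AFxy is 8x + 2y (RFC 2597).
- Class selectors keep IP Precedence in the top three bits.

The socket helper is illustrative only. Whether IP_TOS is honoured depends on the operating system.

```python
import socket

EF = 46  # Expedited Forwarding codepoint (RFC 3246)


def af_dscp(af_class, drop_precedence):
    """AFxy codepoint (RFC 2597): class 1-4, drop precedence 1-3."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def class_selector(ip_precedence):
    """RFC 2474 class-selector codepoint: old IP Precedence in the top three bits."""
    return ip_precedence << 3


def dscp_to_tos(dscp):
    """The DSCP sits in the upper six bits of the IPv4 TOS byte."""
    return dscp << 2


def mark_voice_socket(sock, dscp=EF):
    """Platform-dependent: ask the OS to set the DSCP on this socket's packets."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
```

EF therefore appears on the wire as TOS byte 0xB8 (184). Older equipment that only reads IP Precedence sees this as precedence 5.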
Diffserv Traffic Conditioner
Classifier: selects a packet in a traffic stream based on the content of some portion of the packet header
Meter: checks compliance with traffic parameters (e.g. a token bucket) and passes the result to the marker and shaper/dropper, triggering a particular action for in-profile or out-of-profile packets
Marker: writes/rewrites the DSCP
Shaper: delays some packets so that they are compliant with the profile
Dropper: discards packets that are out of profile
Figure: packets pass through the classifier, meter, and marker to the shaper/dropper, and leave either shaped or dropped.
Diffserv Acceptance
Figure: enthusiasm for Diffserv over time. Early: "Diffserv will solve the world's QoS". Open questions today: Diffserv engineering? Diffserv SLA? Internet end-to-end SLA? Real value: Diffserv design and deployment intra-domain.
Inter-SP Diffserv and end-to-end Internet QoS need further standardisation and commercial arrangements
Mixing Intserv & Diffserv: Aggregation
Host signals with RSVP
Edge or transit domains: aggregate reservations; mark packets using DSCP
In transit domains: blindly transfer end-to-end reservations using another IP protocol number (changed at the edge)
Routers detect egress of reservation (deaggregation) on transfer from an interior or aggregator interface to an exterior (deaggregating) interface
Aggregate reservation size varies with load
Figure: edge domains on either side of a backbone.
RTP Compression
20 ms @ 8 kbit/s yields a 20-byte payload
IP header 20 bytes; UDP header 8; RTP header 12: twice the size of the payload!
Header compression: 40 bytes down to 2-4 most of the time
Hop-by-hop: use only on the slow links
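The header overhead is easy to quantify. This sketch assumes 20 ms G.729 packets (20-byte payload) and no layer-2 overhead. It compares a full 40-byte IP/UDP/RTP header with compressed headers of 2 or 4 bytes.

```python
def call_bitrate_bps(payload_bytes, header_bytes, packet_ms):
    """One-direction IP bandwidth of a voice stream, excluding layer-2 framing."""
    return (payload_bytes + header_bytes) * 8 * 1000 / packet_ms


for label, header in [("uncompressed", 40), ("cRTP, 2-byte header", 2), ("cRTP, 4-byte header", 4)]:
    print(f"{label}: {call_bitrate_bps(20, header, 20) / 1000:.1f} kbps")
```

This prints 24.0, 8.8, and 9.6 kbps, so compression cuts the per-call cost on a slow link by well over half.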
Sample Delay Budget (G.711 - 64kbps)
Delay Source (G.711) | Budget (ms)
Device sample capture | 0.1
Encode delay (algorithmic delay + processing delay) | 2.5
Packetization/framing | 10
Move to output queue/queue delay | 0.5
Access (up) link transmission | 30
Backbone network transmission | 5
Access (down) link transmission | 10
Input queue to application | 0.5
Jitter buffer | 35
Decode processing delay | 0.5
Device playout delay | 0.5
Total | 94.6
Sample Delay Budget (G.729 - 8kbps)
Delay Source (G.729) | Budget (ms)
Device sample capture | 0.1
Encode delay (algorithmic delay + processing delay) | 17.5
Packetization/framing | 20
Move to output queue/queue delay | 0.5
Access (up) link transmission | 30
Backbone network transmission | 5
Access (down) link transmission | 10
Input queue to application | 0.5
Jitter buffer | 35
Decode processing delay | 5
Device playout delay | 0.5
Total | 119.1
Signaling: SIP
SIP is One of Many
ITU H.323: originally for video conferencing; the first standard protocol for VoIP; still in wide usage, but negative growth
MGCP: dumb phones controlled by a smart server; "softswitch" (PSTN emulation view)
Megaco/H.248: standard version of MGCP
Core SIP Functions
Establishment of peer-to-peer sessions
Management of peer-to-peer sessions: keepalives; graceful and non-graceful termination
Rendezvous: forking; search
Policy-based routing: loose routing
Mobility: limited terminal mobility; device mobility
Core SIP Functions (cont.)
Secure user identification
Exchange and management of media session data
User registration
Capability declaration
Capability query
Reliability
SIP Technology Community
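Figure: SIP (RFC 3261) and related work: DNS procedures (RFC 3263), events (RFC 3265), reliable provisional responses (RFC 3262), offer/answer (RFC 3264), RTP and SDP, SIMPLE, SigComp, SIP extensions, ENUM, MIDCOM, STUN, and ROHC.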
SIP Design Philosophy
Patterned after other successful Internet standards: HTTP
Don't reinvent the PSTN
General-purpose functionality
Do not dictate architectures or services
It needs to work on any IP network
Leverage the best of existing standards: URLs, MIME, RFC 822
Scalability: push state to the edge
Basic Design
Request/response protocol
SIP is a peer protocol: all entities send requests and receive requests
Modelled after HTTP
Each request invokes a method: the main purpose of the request
Messages contain bodies
Figure: two agents; one sends a request, the other returns a response.
Transactions
Fundamental unit of messaging exchange: a request, zero or more provisional responses, usually one final response, maybe an ACK
All signaling composed of independent transactions
Identified by CSeq: sequence number and method tag
Figure: first transaction (CSeq: 1) is INVITE, 100, 200, ACK; second transaction (CSeq: 2) is BYE, 200.
Session Independence
Body of the SIP message used to establish a call describes the session
Session could be audio, video, a game
SIP operation is independent of the type of session
SIP bodies are MIME objects
MIME = Multipurpose Internet Mail Extensions
Mechanisms for describing and carrying opaque content
Used with HTTP and email
Protocol Components
User Agent: end systems (hard and soft phones, PSTN gateways, phone adaptors, media servers); anything that originates or terminates SIP calls
Proxy: SIP server responsible for relaying and processing requests between user agents; main job: where to send the request next?
Back-to-Back User Agent (B2BUA): SIP server that terminates and re-originates SIP; SBCs, call agents, etc.
SIP Addressing
SIP addresses are URLs
URL contains several components: scheme (sip), username, hostname, optional port, parameters, headers and body
SIP allows any URI type: tel URIs, http URLs for redirects, mailto URLs; leverage the vast URI infrastructure
sip:[email protected]:5061; user=host?Subject=foo
The SIP Trapezoid
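Figure: SIP signaling between a.com and b.com travels through their servers, while RTP flows directly between the user agents.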
SIP Methods
INVITE: invites a participant to a session; idempotent; re-INVITEs for session modification
BYE: ends a client's participation in a session
CANCEL: terminates a search
OPTIONS: queries a participant about their media capabilities, and finds them, but doesn't invite
ACK: for reliability and call acceptance
REGISTER: informs a SIP server about the location of a user
SIP Architecture
Figure: a numbered call flow (steps 1 to 14) of requests, responses, and media between sp.com and b.com, including a lookup in a corporate database (Corp DB).
SIP Message Syntax
Many header fields from HTTP
Payload contains a media description: SDP (Session Description Protocol)
INVITE sip:[email protected] SIP/2.0
From: J. Rosenberg <sip:[email protected]> ;tag=76ah
Subject: Conference Call
To: John Smith <sip:[email protected]>
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: [email protected]
Content-Type: application/sdp
CSeq: 4711 INVITE
Content-Length: 187

v=0
o=user1 53655765 2353687637 IN IP4 1.2.3.4
s=Sales
c=IN IP4 1.2.3.4
t=0 0
m=audio 3456 RTP/AVP 0
SIP Address Fields
Request-URI: contains address of next-hop server; rewritten by proxies based on result of location service
To: address of original called party; contains optional display name
From: address of calling party; optional display name
SIP Responses
Look much like requests: headers, bodies
Differ in top line:
Status code: numeric, 100-699; meant for computer processing; protocol behavior based on the 100s digit; other digits give extra info
Reason phrase: text phrase for humans; can be anything
Status code classes:
100-199 (1XX): Informational
200-299 (2XX): Success
300-399 (3XX): Redirection
400-499 (4XX): Client Error
500-599 (5XX): Server Error
600-699 (6XX): Global Failure
Two groups: 100-199 are provisional (not reliable); 200-699 are final, definitive
Examples: 200 OK, 180 Ringing
Example SIP Response
Note how the top line is the main difference
Rules for generating responses:
Call-ID, To, From, CSeq are mirrored in the response
Branch parameter used as transaction ID
Tag added to To field to identify dialog
SIP/2.0 200 OK
From: J. Rosenberg <sip:[email protected]> ;tag=76ah
To: John Smith <sip:[email protected]> ;tag=112
Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9
Call-ID: [email protected]
Content-Type: application/sdp
CSeq: 4711 INVITE
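The mirroring rules can be sketched in a few lines. This is a simplified illustration, not a SIP stack. It assumes CRLF line endings and no folded or compact-form headers, and it copies only Via, From, To, Call-ID, and CSeq. It omits the Contact header and SDP body that a real 200 OK to an INVITE would carry. The example addresses and Call-ID are hypothetical.

```python
def parse_headers(message):
    """Split a SIP message into its start line and a list of (name, value) headers."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        name, value = line.split(":", 1)
        headers.append((name.strip(), value.strip()))
    return start_line, headers


MIRRORED = ("Via", "From", "To", "Call-ID", "CSeq")


def build_response(request, code, reason, to_tag):
    """Build a response that mirrors the transaction and dialog headers of request."""
    _, headers = parse_headers(request)
    lines = [f"SIP/2.0 {code} {reason}"]
    for name, value in headers:
        if name not in MIRRORED:
            continue
        if name == "To" and ";tag=" not in value:
            value += f";tag={to_tag}"  # the UAS adds its tag to identify the dialog
        lines.append(f"{name}: {value}")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


invite = "\r\n".join([
    "INVITE sip:john@b.example.com SIP/2.0",
    "Via: SIP/2.0/UDP 1.2.3.4;branch=z9hG4bK74bf9",
    "From: J. Rosenberg <sip:jdr@a.example.com>;tag=76ah",
    "To: John Smith <sip:john@b.example.com>",
    "Call-ID: example-call-id@1.2.3.4",
    "CSeq: 4711 INVITE",
    "Content-Length: 0",
]) + "\r\n\r\n"
print(build_response(invite, 200, "OK", "112"))
```

Because the Via (with its branch), Call-ID, and CSeq come back unchanged, the caller can match the response to its transaction. The new To tag then identifies the dialog.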
SIP Transport
SIP messages over UDP, TCP/TLS, or SCTP
Reliability mechanisms defined for UDP
UDP: more widely used; faster; no connection state
TCP preferred these days: NAT; larger SIP messages
Reliability mechanisms depend on the SIP request method: INVITE versus anything except INVITE
Reason: optimized for phone calls
Registrations
REGISTER creates a mapping in the server from one URI to another
REGISTER properties:
UA location in Contact
Registrar identified in Request-URI
Identifies registered user in To and From fields
Expires header indicates desired lifetime; can be different for each Contact
Registrations are soft-state
REGISTER sip:example.com SIP/2.0
To: sip:[email protected];user=phone
From: sip:[email protected];user=phone
Call-ID: [email protected]
CSeq: 123 REGISTER
Contact: sip:[email protected]
Expires: 3600
Registration Handling
Registrar is the logical function handling REGISTER
Registrar steps: authenticate; authorize; add binding; lower expiration (if needed); return all currently registered UAs (can be more than one)
SIP/2.0 200 OK
To: sip:[email protected];user=phone
From: sip:[email protected];user=phone
Call-ID: [email protected]
CSeq: 123 REGISTER
Contact: sip:[email protected];expires=3600
Contact: sip:[email protected];expires=524
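Because registrations are soft state, a registrar is essentially a table of bindings with expiry times. The sketch below covers the add-binding, lower-expiration, and return-all-contacts steps. Authentication and authorization are left out, and max_expires is a hypothetical policy value. The clock is injectable so that the behaviour can be tested.

```python
import time


class Registrar:
    """Soft-state location service: address-of-record -> {contact: expiry time}."""

    def __init__(self, max_expires=3600, clock=time.monotonic):
        self.max_expires = max_expires
        self.clock = clock
        self.bindings = {}

    def register(self, aor, contact, expires):
        """Add, refresh, or (with expires=0) remove a binding; return live contacts."""
        now = self.clock()
        granted = min(expires, self.max_expires)  # registrar may lower the expiration
        contacts = self.bindings.setdefault(aor, {})
        if granted == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + granted
        return self.current_contacts(aor)

    def current_contacts(self, aor):
        """Contacts still alive for aor, with their remaining lifetime in seconds."""
        now = self.clock()
        live = {c: exp for c, exp in self.bindings.get(aor, {}).items() if exp > now}
        self.bindings[aor] = live  # expired bindings simply vanish
        return {c: int(exp - now) for c, exp in live.items()}
```

Nothing ever has to explicitly delete a dead phone's binding. If the phone stops refreshing, the binding simply ages out.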
Forking
A proxy may have more than one address for a user
Happens when more than one SIP URL is registered for a user
Can happen based on static routing configuration
In this case, the proxy may fork
Forking is when a proxy sends a request to more than one destination at once
The first 200 OK that is received is forwarded upstream
All other unanswered requests are cancelled
Figure: a proxy forks an INVITE to several registered contacts for the same user.
Routing of Subsequent Requests
Initial SIP request sent through many proxies
No need per se for subsequent requests to go through proxies
Each proxy can decide whether it wants to receive subsequent requests: it inserts a Record-Route header containing its address
For subsequent requests, users insert a Route header containing the sequence of proxies (and the final user) that should receive the request
Figure: UA1 and UA2 connected through three proxies; the INVITE traverses all of them, while the later BYE visits only the proxies on the recorded route.
Setting up the Session
INVITE contains the Session Description Protocol (SDP) in the body
SDP conveys the desired session from the caller's perspective
Session consists of a number of media streams; each stream can be audio, video, text, application, etc.
Also contains the information needed about the session: codecs, addresses and ports
SDP also conveys other information about the session: time it will take place, who originated the session, subject of the session, URL for more information
SDP origins are multicast sessions on the Mbone; the originator of the INVITE is not the originator of the session
Anatomy of SDP
SDP contains informational headers: version (v), origin (o) with a unique ID, information (i)
Time of the session
Followed by a sequence of media streams
Each media stream contains an m line defining port, transport, codecs
Media stream also contains a c line: address information
v=0
o=user1 53655765 2353687637 IN IP4 128.3.4.5
s=Mbone Audio
i=Discussion of Mbone Engineering [email protected]
t=0 0
m=audio 3456 RTP/AVP 0 78
c=IN IP4 1.2.3.4
a=rtpmap:78 G723/8000
m=video 4444 RTP/AVP 86
c=IN IP4 1.2.3.4
a=rtpmap:86 H263/90000
Negotiating the Session
Called party receives the SDP offered by the caller
Each stream can be accepted or rejected
Accepting involves generating an SDP listing the same stream, with the port number and address of the called party and a subset of codecs from the SDP in the request
Rejecting is indicated by setting the port to zero
Resulting SDP returned in 200 OK; media can now be exchanged
v=0
o=user2 16255765 8267374637 IN IP4 4.3.2.1
t=0 0
m=audio 3456 RTP/AVP 0
c=IN IP4 4.3.2.1
m=video 0 RTP/AVP 86
c=IN IP4 4.3.2.1
Audio stream accepted, PCMU only. Video stream rejected.
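The accept/reject rule can be sketched as a function from offered streams to answer lines. This is a simplification with several limits:

- It only produces the m= and c= lines.
- It treats payload types as plain integers, whereas dynamic types would really be matched via a=rtpmap.
- The supported set and local ports are hypothetical inputs.

Run on the offer from the SDP anatomy slide with only PCMU (payload type 0) supported and no video, it reproduces the answer shown.

```python
def answer_sdp(offer_media, local_ip, supported, ports):
    """Build answer m=/c= lines: same streams, same order, rejected streams get port 0.

    offer_media: list of (media, port, [payload types]) taken from the offer.
    supported: payload types this endpoint can use.
    ports: local port for each media type this endpoint is willing to receive.
    """
    lines = []
    for media, _port, formats in offer_media:
        common = [pt for pt in formats if pt in supported]
        if common and media in ports:
            lines.append(f"m={media} {ports[media]} RTP/AVP {' '.join(map(str, common))}")
        else:
            # Reject: port zero, but the stream keeps its place in the answer
            lines.append(f"m={media} 0 RTP/AVP {' '.join(map(str, formats))}")
        lines.append(f"c=IN IP4 {local_ip}")
    return lines


offer = [("audio", 3456, [0, 78]), ("video", 4444, [86])]
print("\n".join(answer_sdp(offer, "4.3.2.1", {0}, {"audio": 3456})))
```

Keeping rejected streams in place matters: the answer's m= lines must line up one-for-one with the offer's.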
Changing Session Parameters
Once the call is started, the session can be modified
Possible changes: add a stream; remove a stream; change codecs; change address information
Call hold is basically a session change
Accomplished through a re-INVITE: same session negotiation as INVITE, except in the middle of the call
Rejected re-INVITE: call still active!
Figure: INVITE, 200, ACK sets up the call; a later re-INVITE, 200, ACK changes it.
Hanging Up
How to hang up depends on when and who
After the call is set up: either party sends a BYE request
From the caller, before the call is accepted: send CANCEL; BYE is bad since it may not reach the same set of users that got the INVITE
If the call is accepted after CANCEL, then send BYE
From the callee, before accepting: reject with 486 Busy Here
Figure: client (C) sends INVITE and server (S) replies 100. The caller hangs up and sends CANCEL while the callee accepts; CANCEL and INVITE each get a 200 OK, so the caller sends ACK and then BYE, which gets 200 OK.
Call Flow for basic call: UA to proxy to UA
Call setup: 100 Trying (hop by hop), 180 Ringing, 200 OK (acceptance)
Call parameter modification: re-INVITE, same as the initial INVITE with an updated session description
Termination: BYE method
Figure: INVITE and 100 Trying on each hop, then 180 Ringing and 200 OK relayed back to the caller, ACK, RTP media between the user agents, and finally BYE and 200 OK.
Privacy and Identity
RFC 3325: A Private Extension for Asserted Identity in Trusted Networks
RFC 3323: A Privacy Mechanism for SIP
RFC 4474: SIP Identity
RFC3325 Asserted Identity
Figure: inside a trust domain, a proxy authenticates the caller, verifies the identity, and adds a P-Asserted-Identity (PAID) header: INVITE with P-Asserted-Identity: sip:[email protected].
RFC3323 – SIP Privacy
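Figure: an anonymous caller sends INVITE with Privacy: id and From: anonymous. Within the trust domain the INVITE carries P-Asserted-Identity: sip:[email protected] alongside From: anonymous. After leaving the trust domain, only From: anonymous remains.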
4474: SIP Identity
Figure: the caller sends INVITE From: sip:[email protected]; its domain authenticates the caller, verifies the identity, and signs the request, adding Identity: asd87f7as66sda8z; the receiving domain verifies the signature.
Only useful for user@domain addresses!
Transfers and Dialog Movement: REFER (RFC 3515)
Figure: Joe, in a call with Alice, sends her a REFER with Refer-To: Bob; Alice then sends Bob an INVITE carrying Referred-By: Joe (steps 1 to 4).
Third Party Call Control (3pcc): RFC 3725
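Figure: a controller sends A an INVITE with no SDP; A answers 200 with SDP A; the controller sends B an INVITE with SDP A; B answers 200 with SDP B; the controller sends A an ACK with SDP B; RTP then flows directly between A and B (steps 1 to 6).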
SIP and Quality of Service: RFC 3312, Integration of Resource Management and SIP
Problem: how to make sure the phone doesn’t ring unless resources are reserved
Solution: SIP does not do resource reservation!
The SIP INVITE tells the far side not to ring
Both sides do regular QoS reservations (RSVP, PDP context activation)
UPDATE changes the precondition state
Caller -> Callee: INVITE with preconditions
Callee -> Caller: 183 Session Progress
Both sides: QoS reservations
Caller -> Callee: UPDATE with preconditions
Callee -> Caller: 180 Ringing
Callee -> Caller: 200 OK
Caller -> Callee: ACK
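The callee's rule ("don't ring until resources are reserved in both directions") is a two-flag gate. This sketch models reservation status as booleans; in RFC 3312 the status actually travels in SDP precondition attributes, which are not modeled here, and the class name is hypothetical.

```python
from typing import List


class CalleePreconditions:
    """Callee-side gate: alert the user only once QoS is reserved both ways."""

    def __init__(self) -> None:
        self.local_reserved = False   # our own reservation (e.g. RSVP or PDP context) is up
        self.remote_reserved = False  # the caller reported success in an UPDATE
        self.ringing = False

    def _maybe_ring(self) -> List[str]:
        if self.local_reserved and self.remote_reserved and not self.ringing:
            self.ringing = True
            return ["180 Ringing"]
        return []

    def on_local_reservation(self) -> List[str]:
        self.local_reserved = True
        return self._maybe_ring()

    def on_update(self, remote_reserved: bool) -> List[str]:
        self.remote_reserved = remote_reserved
        # The UPDATE is answered with 200 OK whether or not we can ring yet.
        return ["200 OK (UPDATE)"] + self._maybe_ring()
```

Either event can arrive first; the phone rings on whichever one completes the pair.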
Security
VoIP Security
The only totally secure system I know of is a rock
- Tony Lauck, circa 1985
But Even Rocks can be Insecure…
It Had a Great User Interface
But it had a serious security vulnerability…
VoIP Attacks
Attack | Solution
Free calls (aka toll fraud) | User authentication
Impersonation | User authentication, secure caller ID
Learning private information (calling patterns, PIN codes) | SIP encryption, media encryption
Stealing calls | SIP encryption, media encryption
DoS | ICE, others
SIP User Authentication
We want each SIP server to authenticate its own user (media flows as RTP directly between the users)
SIP Digest Authentication
Phone: Hi, I’d like to SIP REGISTER
Server: 401, OK, try again. Nonce=a7szh1
Phone computes Digest = Hash(joe, a7szh1, myPassword)
Phone: REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6
Server checks Digest = Hash(joe, a7szh1, myPassword) = z0v88a6
Server: OK, done!
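The slide's `Hash(joe, a7szh1, myPassword)` abbreviates HTTP Digest, which SIP reuses. A minimal sketch of the RFC 2617 computation without the optional `qop` parameter follows; the realm `a.com` and request URI `sip:a.com` are assumed values for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """RFC 2617 digest without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret-dependent part
    ha2 = md5_hex(f"{method}:{uri}")                 # request-dependent part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# The phone sends this value; the server recomputes it from its stored secret.
sent = digest_response("joe", "a.com", "myPassword", "REGISTER", "sip:a.com", "a7szh1")
```

The password never crosses the wire, but everything else fed into the hash does, which is what the next slide exploits.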
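Offline Dictionary Attack
An eavesdropper captures REGISTER Nonce=a7szh1 Username=joe Digest=z0v88a6, where the password is alligator: Digest = Hash(joe, a7szh1, alligator)
Server: OK, done!
The attacker then computes Hash(joe, a7szh1, word) for every word in a dictionary until one matches Digest = z0v88a6:
Word | Hash(joe, a7szh1, word)
Aardvark | 9z8v77a
Abacus | lkf88z7
Abate | 8z77x
…
Alligator | z0v88a6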
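The attack is a loop: every input to the digest except the password was visible in the captured request, so the attacker just tries candidate passwords. This sketch reuses the RFC 2617 (no `qop`) computation with the same assumed realm and URI; the word list is the slide's.

```python
import hashlib
from typing import Iterable, Optional


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(username, realm, password, method, uri, nonce) -> str:
    # Same RFC 2617 (no qop) computation the phone performs.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def dictionary_attack(observed: str, username: str, realm: str, method: str,
                      uri: str, nonce: str, words: Iterable[str]) -> Optional[str]:
    """Offline: no contact with the server is needed to test each guess."""
    for word in words:
        if digest_response(username, realm, word, method, uri, nonce) == observed:
            return word
    return None


captured = digest_response("joe", "a.com", "alligator", "REGISTER", "sip:a.com", "a7szh1")
```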
Solution: Digest over TLS
Digest = Hash(joe, a7szh1, alligator) is still sent, but inside TLS armor, so an eavesdropper cannot capture it
This is how Web security works!
Even Stronger: Mutual TLS for Devices
TLS armor between the phone (MAC 8x7a6) and a.com, authenticated in both directions
The phone has a certificate which identifies it
SIP Encryption
We want each SIP hop to be encrypted, so only the SIP servers and endpoints see the signaling
SIP Encryption: TLS
Mutual TLS authentication between the a.com and b.com servers
Media Encryption
Countermeasure against: eavesdropping, barge-in, modification
Two useful techniques: IPsec, SRTP
Complications: key management; lawful intercept (who has the keys?); firewall and NAT issues (covered later)
Alternative: Secure RTP (SRTP), authentication and encryption of RTP and RTCP packets
SRTP packet layout:
V | P | X | CC | M | PT | sequence number
timestamp
synchronization source (SSRC) identifier
contributing source (CSRC) identifiers …
RTP extension (optional)
RTP payload
SRTP MKI: 0 bytes for voice
Authentication tag: 4 bytes for voice
Encrypted portion: the RTP payload
Authenticated portion: the RTP header, extension and payload
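The layout above can be walked with `struct`: parse the 12-byte fixed header, skip the CSRC list and optional extension, then split off the MKI and authentication tag at the end. This sketch does no decryption or tag verification; the 4-byte default tag matches the slide's voice case (a 32-bit tag, as in the AES_CM_128_HMAC_SHA1_32 suite), whereas SRTP's default tag is 10 bytes.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: List[int]


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, int]:
    """Parse the fixed header, CSRC list and extension; return (header, payload offset)."""
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                        # CSRC count
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:12 + 4 * cc]))
    offset = 12 + 4 * cc
    extension = bool(b0 & 0x10)
    if extension:
        # Extension header: 16-bit profile field, 16-bit length in 32-bit words.
        _profile, words = struct.unpack("!HH", packet[offset:offset + 4])
        offset += 4 + 4 * words
    header = RtpHeader(
        version=b0 >> 6, padding=bool(b0 & 0x20), extension=extension,
        marker=bool(b1 & 0x80), payload_type=b1 & 0x7F,
        sequence=seq, timestamp=ts, ssrc=ssrc, csrcs=csrcs,
    )
    return header, offset


def split_srtp(packet: bytes, tag_len: int = 4, mki_len: int = 0):
    """Split an SRTP packet into (header, encrypted, authenticated, MKI, tag)."""
    header, offset = parse_rtp_header(packet)
    end = len(packet) - tag_len - mki_len
    encrypted = packet[offset:end]        # RTP payload only
    authenticated = packet[:end]          # header through encrypted payload
    mki = packet[end:end + mki_len]
    tag = packet[end + mki_len:]
    return header, encrypted, authenticated, mki, tag
```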
SRTP Advantages
Provides both privacy (via encryption) and authentication (via a message integrity check)
Very little bandwidth overhead
Does not break header compression schemes like cRTP
For very low-rate channels (e.g. cellular), authentication can be sacrificed so there is no packet expansion at all
Uses modern, strong crypto suites: AES counter mode for encryption and HMAC for message integrity
Disadvantages
Needs key management
End-to-end versus hop-by-hop trust tradeoffs in protecting keys
Yet another security mechanism to ensure is implemented and deployed correctly
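"Very little bandwidth overhead" can be checked with arithmetic. The sketch assumes G.711 at 20 ms per packet (160 payload bytes, 50 packets per second) over IPv4 without header compression (20 + 8 + 12 = 40 header bytes), with the slide's 4-byte tag and no MKI; those parameters are assumptions, not figures from the slide.

```python
def srtp_overhead(payload_bytes: int, packets_per_second: int,
                  tag_bytes: int = 4, mki_bytes: int = 0,
                  header_bytes: int = 40) -> tuple:
    """Return (extra bits per second, extra percent) that SRTP adds to a stream.

    header_bytes defaults to IPv4 (20) + UDP (8) + RTP (12), no cRTP.
    """
    extra_per_packet = tag_bytes + mki_bytes   # SRTP encryption itself adds no bytes
    packet_bytes = payload_bytes + header_bytes
    extra_bps = extra_per_packet * 8 * packets_per_second
    extra_pct = 100.0 * extra_per_packet / packet_bytes
    return extra_bps, extra_pct


g711 = srtp_overhead(160, 50)                  # 1600 bit/s, 2% of each packet
no_auth = srtp_overhead(160, 50, tag_bytes=0)  # cellular case: no expansion
```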
NAT Traversal
What is NAT?
Network Address Translation (NAT) creates an address binding between an internal private address and an external public address, and modifies IP addresses/ports in packets
Benefits:
Avoids network renumbering on change of provider
Allows multiplexing of multiple private addresses into a single public address ($$ savings)
Maintains privacy of internal addresses
Example: client -> NAT -> server
IP packet from the client: S: 10.0.1.1:6554 D: 67.22.3.1:80
NAT binding table: internal 10.0.1.1:6554 -> external 1.2.3.4:8877
IP packet after the NAT: S: 1.2.3.4:8877 D: 67.22.3.1:80
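The binding table in the example can be modeled as two dictionaries. This is a toy sketch: the `Nat` class is hypothetical, its mapping depends only on the internal source, and it accepts inbound packets from any sender on a bound port; real NATs vary, and many filter inbound traffic by the destinations the client has contacted.

```python
import itertools
from typing import Dict, Optional, Tuple

Addr = Tuple[str, int]


class Nat:
    """Toy NAT with one public IP; mappings depend only on the internal source."""

    def __init__(self, public_ip: str, first_port: int) -> None:
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self.bindings: Dict[Addr, Addr] = {}   # internal -> external
        self._reverse: Dict[Addr, Addr] = {}   # external -> internal

    def outbound(self, src: Addr, dst: Addr) -> Tuple[Addr, Addr]:
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        if src not in self.bindings:
            external = (self.public_ip, next(self._next_port))
            self.bindings[src] = external
            self._reverse[external] = src
        return self.bindings[src], dst

    def inbound(self, src: Addr, dst: Addr) -> Optional[Tuple[Addr, Addr]]:
        """Rewrite the destination of an incoming packet, or drop it (None)."""
        internal = self._reverse.get(dst)
        return None if internal is None else (src, internal)


nat = Nat("1.2.3.4", 8877)
out = nat.outbound(("10.0.1.1", 6554), ("67.22.3.1", 80))
```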
Problem: Getting SIP Through NATs
INVITE sip:[email protected]
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
The far side sends RTP to 10.0.1.1, a private address it cannot reach
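A receiver or intermediary can spot this problem by checking the SDP's connection address. This heuristic sketch uses Python's `ipaddress` module; note that its `is_private` covers more than the RFC 1918 ranges (documentation ranges such as 192.0.2.0/24 also count), and a public-looking address is still no guarantee media will arrive.

```python
import ipaddress
from typing import List


def sdp_ipv4_addresses(sdp: str) -> List[str]:
    """Collect the addresses from every 'c=IN IP4' line of an SDP body."""
    return [line.split()[2] for line in sdp.splitlines()
            if line.startswith("c=IN IP4 ")]


def advertises_private_address(sdp: str) -> bool:
    """True if the SDP asks the far end to send media to a private address."""
    return any(ipaddress.ip_address(a).is_private for a in sdp_ipv4_addresses(sdp))


behind_nat = "m=audio 3456 RTP/AVP 0\r\nc=IN IP4 10.0.1.1\r\n"
```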
Solution Space
Application Layer Gateways (ALGs)
Session Border Controllers (SBCs)
Simple Traversal of UDP Through NAT (STUN)
Traversal Using Relay NAT (TURN)
Interactive Connectivity Establishment (ICE)
Application Layer Gateway
INVITE as sent by the client:
INVITE sip:[email protected]
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
INVITE after the NAT's ALG rewrites it:
INVITE sip:[email protected]
m=audio 1234 RTP/AVP 0
c=IN IP4 19.1.3.2
The NAT forwards the RTP on to 10.0.1.1
The NAT also modifies SIP messages to fix them up!
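The ALG's fix-up amounts to rewriting the media address and port in the SDP. A minimal sketch using the slide's values; a real ALG must also update the SIP Content-Length, other addresses (o= line, Via, Contact), and create the matching NAT binding, and it cannot do any of this once the signaling is TLS-encrypted, which is the first drawback on the next slide.

```python
def alg_rewrite_sdp(sdp: str, public_ip: str, public_port: int) -> str:
    """Replace the private media address/port in an SDP body with the NAT's."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            line = f"c=IN IP4 {public_ip}"
        elif line.startswith("m=audio "):
            fields = line.split()
            fields[1] = str(public_port)   # m=<media> <port> <proto> <fmt> ...
            line = " ".join(fields)
        out.append(line)
    return "\r\n".join(out) + "\r\n"


rewritten = alg_rewrite_sdp("m=audio 3456 RTP/AVP 0\r\nc=IN IP4 10.0.1.1\r\n",
                            "19.1.3.2", 1234)
```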
ALG Benefits and Drawbacks
Drawbacks:
Doesn’t work when security is turned on
Hard to diagnose problems
Requires a network upgrade to support a new app
Frequent implementation problems (lack of expertise)
Incentives mismatched
Benefits:
No change to clients or servers
Session Border Controller
INVITE as sent by the client, through the NAT:
INVITE sip:[email protected]
m=audio 3456 RTP/AVP 0
c=IN IP4 10.0.1.1
The SBC (9.8.7.6) rewrites it to:
INVITE sip:[email protected]
m=audio 3225 RTP/AVP 0
c=IN IP4 9.8.7.6
The far side sends RTP to 9.8.7.6; the SBC relays RTP back to the source
SBC Benefits and Drawbacks
Drawbacks:
Expensive media relaying
Interferes with some SIP extensions
Breaks more advanced SIP security
Benefits:
No change to clients or NATs
Works with basic SIP security mechanisms
Easier to diagnose
Simple Traversal of UDP Through NAT (STUN)
Client -> STUN server (9.8.7.6), through the NAT: What is my IP address and port, please?
STUN server: It’s 1.2.3.4:3472
INVITE sip:[email protected]
m=audio 3472 RTP/AVP 0
c=IN IP4 1.2.3.4
The far side sends RTP to 1.2.3.4
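The "what is my address?" exchange is a STUN Binding request and response. This sketch builds the request and decodes an IPv4 XOR-MAPPED-ADDRESS using the RFC 5389 message format (the later revision that renamed STUN "Session Traversal Utilities for NAT"; the slide uses the original RFC 3489 name). It does no networking, and a real client should also check that the response's transaction ID matches its request.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def binding_request(txn_id: Optional[bytes] = None) -> bytes:
    """20-byte STUN header: type, length (0, no attributes), cookie, transaction ID."""
    txn_id = txn_id if txn_id is not None else os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id


def mapped_address(response: bytes) -> Tuple[str, int]:
    """Extract the IPv4 XOR-MAPPED-ADDRESS from a Binding success response."""
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[offset:offset + 4])
        value = response[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # family IPv4
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")
```

The address is XORed with the magic cookie so that NATs which rewrite any IP address they find in a payload (the ALG behavior from earlier) cannot corrupt it.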
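STUN Benefits and Drawbacks
Drawbacks:
Doesn’t always work (e.g., behind NATs that create a different binding for each destination, often called symmetric NATs)
Benefits:
No change to servers or NATs
Works with all SIP security mechanisms
Can support non-VoIP apps (e.g., games)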
Traversal Using Relay NAT (TURN)
Client -> TURN server (9.8.7.6), through the NAT: Give me an IP address and port, please?
TURN server: 9.8.7.6:2376
INVITE sip:[email protected]
m=audio 2376 RTP/AVP 0
c=IN IP4 9.8.7.6
The TURN server relays RTP to the client at 1.2.3.4
TURN Benefits and Drawbacks
Drawbacks:
Expensive media relaying
Benefits:
No change to servers or NATs
Works with all SIP security mechanisms
Can support non-VoIP apps (e.g., games)
Interactive Connectivity Establishment (ICE)
Hybrid of STUN and TURN
P2P NAT traversal
Widely deployed on the Internet
Popular with application providers
ICE Step 1: Allocation
Before making a call, the client gathers candidates
Each candidate is a potential address for receiving media
Three different types of candidates: host candidates, server reflexive candidates (STUN), relayed candidates (TURN)
Host candidates reside on the agent itself
Server reflexive (STUN) candidates are addresses residing on a NAT
Relayed (TURN) candidates reside on a TURN server
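Each gathered candidate gets a priority that later orders the connectivity checks. This sketch uses the formula and recommended type preferences from RFC 5245 (126 host, 110 peer reflexive, 100 server reflexive, 0 relayed; peer reflexive candidates are learned during checks rather than gathered here). The preferences are recommendations, and the values it produces (2130706431 for host, 1694498815 for server reflexive) differ in the low-order bits from the numbers in the offer on the next slide.

```python
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component_id: int = 1) -> int:
    """RFC 5245 formula: 2^24*type pref + 2^8*local pref + (256 - component ID)."""
    return ((1 << 24) * TYPE_PREFERENCE[cand_type]
            + (1 << 8) * local_pref
            + (256 - component_id))
```

Because the type preference sits in the top bits, any host candidate outranks any server reflexive one, which outranks any relayed one: relaying is the last resort.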
ICE Step 2: Create Offer
Each candidate is placed into an a=candidate attribute of the offer
Each candidate line has an IP address and port, plus other information needed for ICE
c=IN IP4 192.0.2.3
t=0 0
m=audio 45664 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=candidate:1 1 UDP 2130706178 10.0.1.1 8998 typ host
a=candidate:2 1 UDP 1694498562 192.0.2.3 45664 typ srflx raddr 10.0.1.1 rport 8998
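Reading the offer back means splitting each candidate line into its fields: foundation, component ID, transport, priority, address, port, `typ` and type, then optional name/value pairs such as `raddr`/`rport` (the base address behind a server reflexive candidate). A minimal parser sketch; the `Candidate` dataclass is hypothetical, and unknown extension pairs are simply ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str
    raddr: Optional[str] = None
    rport: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Parse an SDP 'a=candidate:' line into its fields."""
    if line.startswith("a="):
        line = line[2:]
    if not line.startswith("candidate:"):
        raise ValueError("not a candidate attribute")
    f = line[len("candidate:"):].split()
    if len(f) < 8 or f[6] != "typ":
        raise ValueError("malformed candidate attribute")
    # Remaining tokens are name/value pairs such as "raddr 10.0.1.1 rport 8998".
    extras = dict(zip(f[8::2], f[9::2]))
    return Candidate(
        foundation=f[0], component=int(f[1]), transport=f[2],
        priority=int(f[3]), address=f[4], port=int(f[5]), cand_type=f[7],
        raddr=extras.get("raddr"),
        rport=int(extras["rport"]) if "rport" in extras else None,
    )
```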
ICE Step 3: Send INVITE
Caller sends a SIP INVITE as normal
No ICE processing by SIP servers
ICE Step 4: Allocation
Called party does exactly the same processing as the caller and obtains its candidates
Recommended not to ring the phone yet!
ICE Step 5: Provisional Response
Callee sends a provisional (1xx) response containing its SDP with candidates
As with the INVITE, no processing by proxies
Phone has still not rung yet
ICE Step 6: Verification
Each agent pairs up its candidates (local) with its peer’s (remote) to form candidate pairs
Each agent sends a STUN-based ping (a connectivity check) on each pair, starting with the highest-priority pair
If a response is received, the check has succeeded and we know media can flow on that pair!
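"Starting with the highest-priority pair" needs a pair priority both agents agree on. RFC 5245 computes it from G (the controlling agent's candidate priority) and D (the controlled agent's), so each side gets the same ordering regardless of which end it is on. This sketch pairs candidates of the same component and sorts them; RFC 5245 also pairs only matching address families, prunes redundant pairs, groups by foundation and paces the checks, none of which is modeled. The tuple layout for candidates is an assumption.

```python
from itertools import product
from typing import List, Tuple

Cand = Tuple[str, int, int]  # (label, component ID, candidate priority)


def pair_priority(g: int, d: int) -> int:
    """RFC 5245: 2^32*MIN(G,D) + 2*MAX(G,D) + (G>D ? 1 : 0)."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def check_list(local: List[Cand], remote: List[Cand],
               controlling: bool) -> List[Tuple[int, str, str]]:
    """Pair local with remote candidates of the same component, best first."""
    pairs = []
    for (l_name, l_comp, l_prio), (r_name, r_comp, r_prio) in product(local, remote):
        if l_comp != r_comp:
            continue
        # G is always the controlling agent's priority and D the controlled
        # agent's, so both agents compute identical pair priorities.
        g, d = (l_prio, r_prio) if controlling else (r_prio, l_prio)
        pairs.append((pair_priority(g, d), l_name, r_name))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs


A = [("A-host", 1, 2130706431), ("A-srflx", 1, 1694498815)]
B = [("B-host", 1, 2130706431), ("B-relay", 1, 16777215)]
caller_view = check_list(A, B, controlling=True)
callee_view = check_list(B, A, controlling=False)
```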
ICE Benefits and Drawbacks
Drawbacks:
Requires client changes
Requires the other side to support it
Benefits:
Always works (falls back to a TURN relay when no direct path succeeds)
No change to servers or NATs
Works with all SIP security mechanisms
Minimum media relaying
Can support non-VoIP apps (e.g., games)
Built-in anti-DoS
Eliminates ghost rings (the phone does not ring until connectivity has been verified)
That’s it!
Questions?
Glossary
AIN: Advanced Intelligent Network
ADPCM: Adaptive Differential PCM
BGP: Border Gateway Protocol
CALEA: Communications Assistance for Law Enforcement Act
CBR: Constant Bit Rate
CELP: Code Excited Linear Prediction
CODEC: Coder/Decoder
COPS: Common Open Policy Service
CRTP: Compressed RTP
CSRC: Contributing Source
CTI: Computer-Telephony Integration
DSCP: DiffServ Code Point
DSL: Digital Subscriber Line
DSP: Digital Signal Processor
DTMF: Dual-Tone Multi-Frequency
ERL: Echo Return Loss
ERLE: ERL Enhancement
HFC: Hybrid Fiber/Coax
IN: Intelligent Network
ISDN: Integrated Services Digital Network
ISUP: ISDN User Part
JTAPI: Java Telephony API
LDAP: Lightweight Directory Access Protocol
MCML: Multi-class Multi-link PPP
MGCP: Media Gateway Control Protocol
MOS: Mean Opinion Score
MPLS: Multi-protocol Label Switching
NLP: Non-linear Processing
NTP: Network Time Protocol
PCM: Pulse Code Modulation
PPP: Point-to-Point Protocol
PHB: Per-hop Behavior
PQ: Priority Queueing
PSTN: Public Switched Telephone Network
QoS: Quality of Service
RED: Random Early Detection (or Drop)
RTCP: RTP Control Protocol
RTP: Real-time Transport Protocol
SCP: Service Control Point
SIP: Session Initiation Protocol
SS7: Signaling System Number 7
SSRC: Synchronization Source
TAPI: Telephony API
TDM: Time Division Multiplexed
TRIP: Telephony Routing over IP
TSPEC: Traffic Specification
WFQ: Weighted Fair Queueing