2011 IEEE International Conference on Consumer Electronics - Berlin (ICCE-Berlin), 978-1-4577-0234-1/11/$26.00 ©2011 IEEE

[IEEE 2011 IEEE First International Conference on Consumer Electronics - Berlin (ICCE-Berlin) - Berlin, Germany (2011.09.6-2011.09.8)] 2011 IEEE International Conference on Consumer

Consumer Electronics for Social Video Services

Andres Marin, Daniel Díaz-Sánchez, Florina Almenárez Mendoza, Patricia Arias Cabarcos, Rosa Sánchez Guerrero, Fabio Sanvido

Universidad Carlos III de Madrid
Avda de la Universidad 30, Leganés 28911 (Madrid), Spain

Abstract: Video conference services have been around for quite a long time. The first commercial services, enabled by ISDN, were mainly operated by telcos; later, IP enabled video conference and multiconference through the Session Description Protocol (IETF RFC 4566). The common explanation for why these services were not massively adopted was price, bandwidth limitation, and poor usability. Today bandwidth has greatly improved thanks to ADSL penetration, and many free and commercial providers offer more usable video conference services. Still, these services are not massively adopted by domestic users. We share the hypothesis that integrating video conferencing with social networks and home networks will dramatically improve usability and market uptake, and we argue that both integration and usability will be greatly increased through flexible consumer electronics. In this article we explain the requirements of such a device, its architecture, and the advantages for users and technology adoption.

I. INTRODUCTION

ISDN was designed for new data services, enabling the first generation of commercial video conference services. Telephone companies offered these services, raising great expectations among technical audiences, though they achieved very limited success. The penetration of IP in telco networks offered new possibilities for video conferencing. Furthermore, multicast was expected to be widely adopted due to its bandwidth benefits. The common explanation for why these services were not massively adopted was price, bandwidth limitation, and poor usability. Today bandwidth has greatly improved thanks to ADSL penetration, and many free and commercial providers offer more usable video conference services.

Due to the economic crisis, many companies are opting for video conferencing as a response to reduced travel budgets. Recent IDC studies¹ predict annual growth of over 17%, reaching $1 billion in Europe and $5 billion worldwide in 2015. Still, these services are not massively adopted by domestic users. Recently, significant activity has been reported on the integration of video conference services with social networks. We share the hypothesis that integration with social networks will dramatically improve the usability and market of video conference services.

This work has been partially supported by the State of Madrid (CAM), Spain, under contract number S2009/TIC-1650, project E-Madrid.

¹ www.idc.com, “Worldwide Enterprise Video Conferencing and Telepresence 2010–2014 Forecast”, Mar. 2010.

Along the development and adoption path of video services, the IETF standardized protocols such as SDP and SIP [1] to support multimedia sessions. These protocols have achieved greater success for three reasons: network adoption, flexibility, and applicability. Besides the Internet, SIP has been adopted in cellular networks as a central part of the IP Multimedia Subsystem; the protocols are independent of the underlying transport network; and they have been successfully applied in a number of applications such as video conferencing, games, presence information, instant messaging, and streaming multimedia.

Following this success story, we claim that consumer electronics manufacturers should follow the same principles in the design of a new family of devices that are cheap, flexible, and suitable for integration in a large number of applications. The user's TV, camera, and home network can greatly facilitate the adoption of social video services by offering enhanced usability. With this goal in mind, we structure the paper as follows. In section II we present a scenario and analyze it in terms of available networks, services, devices, and usability constraints. We capture the requirements in section III. Section IV discusses the architecture of the device(s). Finally, we compare with related work and draw our conclusions.

II. ANALYSIS OF SCENARIOS

Bob is at home, watching TV. Suddenly, his mobile phone rings once, signaling that Alice has just signed in to the social network and wants to start a conference. Bob clicks on Alice's icon, selects videoEasy in the pop-up menu, and then Transfer, in order to use the wall TV with the HD webcam, both plugged into the brand new videoEasy device. Bob's mobile establishes a Bluetooth connection to videoEasy, queries for acceptable video conference parameters, and sends Alice a (SIP) invitation. Once Alice confirms the video conference invitation and sends her own parameters, Bob's mobile sends them to videoEasy, and the video conference may start.

The scenario includes a social network with support for SIP add-ons, thus enabling the videoEasy device to set up the RTP/RTCP addresses for both parties. The scenario takes advantage of the social network support in Bob's mobile phone, so the TV does not need to be social enabled. Bluetooth or IrDA, ubiquitous protocols in mobile phones, are envisaged for the interaction with the videoEasy device. The exchange of parameters may be done easily over OBEX, typically supported by most phones. Finally, the videoEasy device should directly support the audio/video streams, possibly using 802.11. Having the mobile phone redirect them is not feasible today, at least not over Bluetooth.

[Figure 1: Bob's place (mobile phone showing "Alice Calling..." with a Transfer option, and the Bluetooth-paired videoEasy device), Alice, and the video-enabled social network, connected through the Internet. Steps: 0, video call to contact in the social network; 1, Bluetooth OBEX transfer; 2, SIP call negotiation and transfer (4 RTTs); 3, direct media transfer.]

Fig. 1. Video conference scenario

Fig. 1 illustrates the setup of the video conference. Step 0 requires Bob's presence in the social network and encapsulates a SIP INVITE, subsequently followed by a 100 Trying. The data is then transferred over OBEX to the videoEasy device (step 1); this starts the SIP client in the videoEasy device, which negotiates the SIP video call with Alice (step 2). Step 3 represents the actual media exchange of the video conference.

Other scenarios include business meetings in hotel rooms equipped with an Internet connection, speakers, and large screens. The equipment needed to establish a video conference includes the videoEasy device, together with a camera and possibly a microphone.

III. REQUIREMENTS

Home users, the potential consumers of social video services, are typically engaged in at least one social network. Users interact through the social network with friends, relatives, and members of common interest groups. In this section we review the main requirements of domestic social video services:

Networks

Home users have fixed Internet connectivity, mobile Internet connectivity, or both. Mobile connectivity includes different technologies such as GPRS, EDGE, and HSDPA/HSUPA, evolving to the 4G technologies WiMAX and LTE. Consequently, mobile upload rates vary from the 28.8 kbps of GPRS (2.5G) to the promised 86 Mbps of future LTE with 4x4 MIMO. The latter will be more than enough bandwidth for video; however, when aiming at a mass market, a more conservative estimate for the next couple of years is still 64 kbps, possibly doubling within two years. Mobile bandwidth below 128 kbps requires another network (a fixed network) to meet the required quality.
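The bandwidth rule in the paragraph above can be written as a small check. This is only a sketch: the 128 kbps threshold and the example rates come from the text, while the function and dictionary names are hypothetical.

```python
# Uplink rates quoted in the text, in kbps. The keys are illustrative labels only.
UPLINK_KBPS = {
    "GPRS (2.5G)": 28.8,
    "conservative estimate": 64,
    "LTE 4x4 MIMO (promised)": 86_000,
}

# Threshold stated in the text: below this, a fixed network is needed.
MIN_MOBILE_UPLINK_KBPS = 128


def needs_fixed_network(mobile_uplink_kbps: float,
                        threshold_kbps: float = MIN_MOBILE_UPLINK_KBPS) -> bool:
    """Return True when the mobile uplink alone cannot meet the required quality."""
    return mobile_uplink_kbps < threshold_kbps


if __name__ == "__main__":
    for label, rate in UPLINK_KBPS.items():
        verdict = "needs fixed network" if needs_fixed_network(rate) else "mobile only is enough"
        print(f"{label}: {verdict}")
```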

Future users with pure mobile 4G connectivity may use the mobile phone simply to transport audio and video to the home equipment. Theoretically this transport can be achieved using enhanced data rate (EDR) Bluetooth with specific profiles such as the Video Distribution Profile (VDP [2]) over the Bluetooth Audio/Video Distribution Transport Protocol (AVDTP [3]). Unfortunately, though approved by the Bluetooth Architecture Review Board, the VDP profile was discontinued, and the Bluetooth core 4.0 specification does not shed any light on this subject. Besides Bluetooth, other appealing technologies today are wireless USB, USB 3.0, and HDMI. Although they would offer superior mobility, these solutions, both wired and wireless, are only available in a restricted number of mobile phones. In the following analysis, we restrict ourselves to users who are not purely mobile, i.e., users with a second network, typically ADSL, for setting up and transporting the video services.

Protocols

Regarding transport, UDP and TCP over IP are a must in almost any scenario, as is HTTP/HTML at the application layer, which most mobile and limited devices support. SIP is an application-layer control protocol that can establish, modify, and terminate multimedia sessions (conferences) such as Internet telephony calls. SIP works together with the protocols that carry the multimedia session data. Its main functions are (1) endpoint discovery and (2) agreement on the characteristics of the session to be shared. Agreements are typically reached by means of offer/answer exchanges of Session Description Protocol messages [7]. SIP users can maintain a single externally visible identifier regardless of their network location. Using these identifiers, users are discovered with the support of SIP proxy and registrar servers. Besides those main functions, SIP lets users signal their availability to be reached, and set up and manage the session. SIP is a cornerstone of multimedia services, helping users and service providers to contact partners, configure multimedia session parameters, and control access. SIP support in limited mobile devices is increasing, though it will not reach the massive deployment levels of HTTP. Native SIP support in connected videoEasy devices will enable a larger application market, such as social video gaming.
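To illustrate the offer/answer agreement mentioned above, the following sketch computes an answer from an offer at the level of a simple data model rather than SDP wire format. The function name, the dictionary layout, and the starting port are assumptions made for illustration; the one protocol rule it relies on is that an answerer rejects a media stream by setting its port to zero.

```python
from typing import Dict, List


def answer_offer(offer: Dict[str, List[str]],
                 supported: Dict[str, List[str]],
                 first_port: int = 49170) -> Dict[str, dict]:
    """Build a simplified answer: per media type, the accepted codecs and a port.

    Codecs keep the order of the offer. A stream with no codec in common
    is rejected, which the offer/answer model signals with port 0.
    """
    answer: Dict[str, dict] = {}
    port = first_port
    for media, offered_codecs in offer.items():
        local = supported.get(media, [])
        common = [codec for codec in offered_codecs if codec in local]
        if common:
            answer[media] = {"port": port, "codecs": common}
            port += 2  # leave the next (odd) port free for RTCP
        else:
            answer[media] = {"port": 0, "codecs": []}
    return answer


if __name__ == "__main__":
    # Hypothetical capabilities: the caller offers two video codecs,
    # the device only decodes one of them.
    offer = {"audio": ["AAC-LD", "vorbis"], "video": ["H264", "VP8"]}
    device = {"audio": ["vorbis"], "video": ["VP8", "theora"]}
    print(answer_offer(offer, device))
```

A device with a wider codec list accepts more offers unchanged, which is why the requirements stress codec support.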

Codecs

Planning the provision of video services includes codecs, both for video conferencing and for entertainment. Codec support in the videoEasy device is crucial for user adoption, since users expect video devices to interoperate. H.264 was originally designed for video conferencing and it has seen important advances, such as algorithms for intra prediction modes [4] that notably reduce encoding time for H.264/AVC, or low delay audio codecs [5] for H.264/AAC. Besides H.264, and due to the latest strong efforts around HTML5 [6], which natively supports video and audio, there is a movement towards open codecs, such as Theora video and Vorbis audio in Ogg containers, and lately VP8 and Matroska containers. Other popular video formats, such as On2 VP6 (FLV4), Sorenson Spark (FLV1), and the new H.264-based Flash video (F4V), cannot easily be neglected, and may serve to differentiate products. We may also think of scenarios where video services are supported by intermediate transcoders, adapting to constrained displays and bandwidths, and helping with NAT traversal issues.

Fig. 2. videoEasy device architecture

Social Networks

Enabling video services in a social network requires support for negotiating the call, and in most scenarios this includes NAT traversal support. Maintaining a SIP registrar helps the social network to translate the information it holds about online users to SIP-based presentity servers and other SIP applications such as video conferencing. For usability, and to stay closer to the user (user-centric), the social network can facilitate SIP applications via plug-ins, as we propose in this paper. Plug-ins for mobile phone applications require some adaptation of the basic web browser plug-in.

Security

Social networks offer privacy and security support to users. Besides their configuration and policies for chat/talk services, video conferencing requires some specific settings. Whitelists and blacklists are intuitive for most users. For usability, they need to be combined with context details included in the presence information. This is achieved by establishing policies in which the user defines a set of locations and a set of states indicating his or her availability for video calls.
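One possible reading of such a policy is sketched below. The class, its field names, and the rule that a caller must be whitelisted and not blacklisted are assumptions for illustration; the text only states that lists are combined with a set of locations and a set of states.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class VideoCallPolicy:
    """Hypothetical per-user policy combining contact lists with presence context."""
    whitelist: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)
    allowed_locations: Set[str] = field(default_factory=set)
    allowed_states: Set[str] = field(default_factory=set)

    def accepts(self, caller: str, location: str, state: str) -> bool:
        # The blacklist always wins, even if the caller is also whitelisted.
        if caller in self.blacklist:
            return False
        if caller not in self.whitelist:
            return False
        # Presence context: the user must be somewhere, and in a state,
        # that was declared as available for video calls.
        return location in self.allowed_locations and state in self.allowed_states


if __name__ == "__main__":
    policy = VideoCallPolicy(
        whitelist={"alice"},
        blacklist={"mallory"},
        allowed_locations={"home"},
        allowed_states={"available"},
    )
    print(policy.accepts("alice", "home", "available"))    # call may be offered
    print(policy.accepts("alice", "office", "available"))  # wrong location
```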

IV. PROPOSED ARCHITECTURE

Fig. 2 presents the proposed architecture of the videoEasy device. The HDMI module is in charge of the connection with the TV, while the USB module is connected to the camera, and optionally to the mobile phone as discussed in section III. The session manager block is responsible for maintaining the multimedia sessions, collecting state and statistics from RTP/RTCP to inform the user. Finally, the device should ease NAT issues in the case of fixed Internet access. Such an architecture will enable other applications to establish SIP multimedia sessions for gaming, streaming, and many other purposes. We reckon that SIP support in the device will largely extend the market to many new possibilities. Even if HTML5 becomes widely deployed and support for the navigator.getUserMedia() JavaScript API extends to popular cameras², there is still a huge number of non-Internet TVs, which constitute a market for videoEasy devices.
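As an example of a statistic that a session manager could derive from RTP/RTCP, the sketch below computes the interarrival jitter estimate defined for RTP receivers, J = J + (|D| - J)/16, where D is the difference between the spacing of two packets at the receiver and their spacing in RTP timestamps. The function name and the list-based interface are assumptions; a real implementation would update the value per packet.

```python
from typing import Sequence


def interarrival_jitter(arrival_times: Sequence[float],
                        rtp_timestamps: Sequence[float]) -> float:
    """Running jitter estimate over a packet sequence.

    Both sequences must use the same unit (RTP timestamp units) and be
    listed in order of arrival.
    """
    jitter = 0.0
    for i in range(1, len(arrival_times)):
        # Difference between receiver spacing and sender spacing.
        d = (arrival_times[i] - arrival_times[i - 1]) - (
            rtp_timestamps[i] - rtp_timestamps[i - 1]
        )
        # Exponential smoothing with gain 1/16.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    # Third packet arrives 16 units late relative to its timestamp spacing.
    print(interarrival_jitter([0, 100, 216], [0, 100, 200]))
```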

NAT issues

A significant number of Internet hosts operate behind a NAT; this is the case for most domestic ADSL users and large businesses. NATs send the hosts' packets to the Internet with a public address and port that differ from those at the originating host. Consequently, NATs make it more difficult to exchange packets with other users. From a practical point of view there are two possibilities: NAT traversal or managing the NAT.

Different protocols have been proposed to facilitate NAT traversal. An intermediate server implementing Session Traversal Utilities for NAT (STUN) [8] allows an internal host to discover the presence and type of its NAT and its public address, in order to establish a direct communication. Sometimes, due to the combination of NAT types of the clients (symmetric/port restricted), the intermediate node is required to act as a communication relay. A server implementing Traversal Using Relays around NAT (TURN) [9] allows a host to control the relay and to exchange packets with its peers through it. Interactive Connectivity Establishment (ICE) [10] makes use of both STUN and TURN so that any two peers can discover and exchange three candidate transport addresses: host (the host itself), mapped (on the NAT), and relayed (on the relay server).
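ICE orders the three kinds of candidate by a numeric priority. The sketch below applies the priority formula and the recommended type preferences from the ICE specification [10]; the function and dictionary names are illustrative, and "mapped" is taken here to correspond to what the specification calls a server reflexive candidate.

```python
# Recommended type preferences from the ICE specification.
TYPE_PREFERENCE = {
    "host": 126,    # address of the host itself
    "mapped": 100,  # address on the NAT (server reflexive)
    "relayed": 0,   # address on the relay server
}


def candidate_priority(kind: str,
                       local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    type_pref = TYPE_PREFERENCE[kind]
    return (1 << 24) * type_pref + (1 << 8) * local_preference + (256 - component_id)


if __name__ == "__main__":
    # Component 1 is RTP. Direct paths are preferred over the relay.
    for kind in sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True):
        print(kind, candidate_priority(kind))
```

With these values the relay is tried last, which matches the text: it is only needed when the NAT types rule out a direct path.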

Domestic users usually have administration rights over their NATs. The NAT typically offers either an HTTP/HTML interface or an Internet Gateway Device (IGD) Standardized Device Control Protocol over UPnP [11]. In both cases users may administer port mappings and leases, allowing incoming peer packets and connections to be received by the NATed host.
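As a sketch of managing the NAT through UPnP, the function below builds the SOAP body of an AddPortMapping request without sending it. It assumes the version 1 service type WANIPConnection:1 (IGD v2 gateways expose WANIPConnection:2); the function name, default description, and lease value are hypothetical, and the control URL, which must be discovered from the gateway, is left out.

```python
from xml.sax.saxutils import escape

# Assumption: version 1 of the WAN IP connection service.
SERVICE_TYPE = "urn:schemas-upnp-org:service:WANIPConnection:1"


def add_port_mapping_body(external_port: int,
                          internal_client: str,
                          internal_port: int,
                          protocol: str = "UDP",
                          description: str = "videoEasy media",
                          lease_seconds: int = 3600) -> str:
    """Return the SOAP envelope for an AddPortMapping action as a string."""
    arguments = [
        ("NewRemoteHost", ""),  # empty value means any remote host
        ("NewExternalPort", external_port),
        ("NewProtocol", protocol),
        ("NewInternalPort", internal_port),
        ("NewInternalClient", internal_client),
        ("NewEnabled", 1),
        ("NewPortMappingDescription", description),
        ("NewLeaseDuration", lease_seconds),
    ]
    fields = "".join(
        f"<{name}>{escape(str(value))}</{name}>" for name, value in arguments
    )
    return (
        '<?xml version="1.0"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        f'<s:Body><u:AddPortMapping xmlns:u="{SERVICE_TYPE}">{fields}'
        "</u:AddPortMapping></s:Body></s:Envelope>"
    )


if __name__ == "__main__":
    # Example with a private address: map UDP port 5004 to the device.
    print(add_port_mapping_body(5004, "192.168.1.20", 5004))
```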

A. SIP

There are three possibilities for transferring the video conference to the videoEasy device: (1) merely transfer the media, (2) have the other end wait until the call is transferred, and (3) answer the call first and transfer it afterwards.

The first option is performed through a target-refresh request by means of a re-INVITE sent by the mobile phone. This is the cheapest option, since it does not require a SIP agent in the videoEasy device, and consequently no further control is needed on the device. The drawbacks of this approach are: the whole signaling path goes through the mobile phone, and the user must use it to change media in the session, such as muting audio or video; no video conference can be started from the videoEasy device, so it must always be triggered by the mobile phone; and in case of transmission problems, the videoEasy device must still report them to the mobile phone for intervention. The SIP dialogue starts with Alice's INVITE, followed by a 1xx response while the user transfers the session parameters to the videoEasy device, a 200 OK whose SDP answer carries the transport address of the videoEasy device for the video conference, and the final ACK. Alternatively, the mobile phone may first accept a restricted session and then send a re-INVITE (with a higher CSeq sequence number) to start the renegotiation of the call: the re-INVITE informs Alice of the new target address, and once the response is received the mobile phone sends the ACK and the transaction is complete.

² See www.whatwg.org
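A re-INVITE of this kind could look as in the sketch below, which keeps the dialog identifiers, increments CSeq, and points the SDP connection and media lines at the videoEasy device. All names, URIs, and addresses are hypothetical placeholders, the header set is the minimum needed for illustration, and the payload type mapping (96 for H.264) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Dialog:
    """State the mobile phone keeps for its established dialog with Alice."""
    call_id: str
    local_uri: str
    local_tag: str
    remote_uri: str
    remote_tag: str
    remote_target: str  # Contact URI learned from the peer
    cseq: int           # last CSeq used for requests sent by the phone
    sdp_version: int    # version field of the last SDP sent


def build_reinvite(dialog: Dialog, media_ip: str, video_port: int) -> str:
    """Return a re-INVITE whose SDP moves the video stream to media_ip:video_port."""
    dialog.cseq += 1         # a new request in the dialog needs a higher CSeq
    dialog.sdp_version += 1  # the session description changed
    sdp = "\r\n".join([
        "v=0",
        f"o=bob 2890844526 {dialog.sdp_version} IN IP4 phone.example.invalid",
        "s=-",
        f"c=IN IP4 {media_ip}",  # media now terminates on the videoEasy device
        "t=0 0",
        f"m=video {video_port} RTP/AVP 96",
        "a=rtpmap:96 H264/90000",
    ]) + "\r\n"
    headers = [
        f"INVITE {dialog.remote_target} SIP/2.0",
        "Via: SIP/2.0/UDP phone.example.invalid;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <{dialog.local_uri}>;tag={dialog.local_tag}",
        f"To: <{dialog.remote_uri}>;tag={dialog.remote_tag}",
        f"Call-ID: {dialog.call_id}",
        f"CSeq: {dialog.cseq} INVITE",
        f"Contact: <{dialog.local_uri}>",  # signaling stays on the phone
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


if __name__ == "__main__":
    dialog = Dialog("abc123@phone.example.invalid", "sip:bob@example.invalid", "b1",
                    "sip:alice@example.invalid", "a1",
                    "sip:alice@198.51.100.7", cseq=1, sdp_version=1)
    print(build_reinvite(dialog, "192.0.2.10", 5004))
```

Because the Contact header still names the phone, every later change to the session has to go through it, which is the drawback the text points out.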

The other two options do require a SIP agent in the videoEasy device and are considerably more flexible. SIP calls (and SIP video conferences) require a registrar server to locate the available users. The registrar will usually (though not necessarily) be in the domain of the social network, thus sharing the presence information on connected users. NAT traversal may also require the use of stateless and stateful proxies. The videoEasy device will include a SIP user agent that performs the SIP registration. SIP dialogues in the videoEasy device will follow [12] for the renegotiation of session media and transport addresses.

In the second option, the call transfer between the mobile phone and the videoEasy device is achieved by means of a simple redirection. The mobile phone generates 100 Trying responses while the user ensures that all the equipment is set and pushes the button to set up the video conference (OBEX transfer). The SIP UA in the videoEasy device is started, receives the parameters of the established session (via OBEX), performs the REGISTER, and reveals its own transport address to the mobile phone. At this point the mobile phone sends back a 302 Moved Temporarily response, and Alice sends an ACK to end the original INVITE transaction and then sends a new INVITE to Bob's new address, as specified in the Contact header of the 302 response. The videoEasy SIP client then answers the call with a 200 OK, and once the ACK is received from Alice, the video conference may start. This is illustrated in Fig. 3.
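The redirection step can be sketched as two small helpers: one builds the 302 response on the mobile phone, the other extracts the new target on Alice's side. Function names and example values are hypothetical, and the response is simplified (for instance, no tag is added to the To header).

```python
from typing import Dict, Optional


def build_redirect(request_headers: Dict[str, str], new_contact: str) -> str:
    """Build a 302 response that echoes the request's dialog headers."""
    lines = ["SIP/2.0 302 Moved Temporarily"]
    for name in ("Via", "From", "To", "Call-ID", "CSeq"):
        lines.append(f"{name}: {request_headers[name]}")
    # The Contact header carries the address of the videoEasy device.
    lines.append(f"Contact: <{new_contact}>")
    lines.append("Content-Length: 0")
    return "\r\n".join(lines) + "\r\n\r\n"


def redirect_target(response: str) -> Optional[str]:
    """Return the URI to INVITE next, or None if this is not a 302 with a Contact."""
    status_line, *header_lines = response.split("\r\n")
    if not status_line.startswith("SIP/2.0 302"):
        return None
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "contact":
            return value.strip().strip("<>")
    return None


if __name__ == "__main__":
    invite = {
        "Via": "SIP/2.0/UDP alice.example.invalid;branch=z9hG4bK-1",
        "From": "<sip:alice@example.invalid>;tag=a1",
        "To": "<sip:bob@example.invalid>",
        "Call-ID": "xyz@alice.example.invalid",
        "CSeq": "1 INVITE",
    }
    response = build_redirect(invite, "sip:bob@192.0.2.10:5060")
    print(redirect_target(response))
```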

Fig. 3. SIP dialogue of the call transfer: Alice waits until Bob is ready

The third option is the most flexible, though it requires the endpoints to handle the REFER method of RFC 3515, which is not always the case, and to establish three SIP sessions in total. It starts with the mobile phone accepting the incoming call; this requires modifications to the received SDP offer, since the mobile phone will not accept the video codecs proposed by the caller. The mobile phone generates a 200 OK response with an SDP answer restricted to audio. After Alice's ACK is received, the call is established and media flows. Bob informs Alice that he is going to set up the video conference, transfers the session parameters through OBEX, and waits for the videoEasy device to give back its transport address. At this point, the mobile phone establishes a second SIP session with the videoEasy device and sends a REFER to transfer it to Alice; Alice accepts the transfer and sends a NOTIFY to Bob's mobile phone when she starts the new session. Once the session between Alice and the videoEasy device is established, the videoEasy device ends its session with Bob's mobile, and Alice sends a NOTIFY to inform Bob that the new session is established. At this point, Bob may hang up his mobile phone, terminating the original session.

V. IMPLEMENTATION DETAILS

We are working on a prototype of the videoEasy device on a commercially available media center running Linux 2.6.12. The media center has good support for audio and video codecs, and offers Ethernet, HDMI, and a USB port. We are working on USB camera support in the media center. We use the User-Agent library of the Sofia SIP open source project³, chosen for its limited footprint, with the STUN module extension. The SIP user agent application has been compiled with gcc for the mips32r2 architecture. The Bluetooth OBEX transfer is being simulated by another application through the Ethernet connection of the device. The SIP session is transferred to the prototype from the PC that acts as sender of the simulated OBEX transfer.

VI. CONCLUSIONS AND RELATED WORKS

Users require their new mobile phones to be social network enabled. Mobile network operators have to develop new products and applications that facilitate user interaction with social networks. SIP gateways to social networks provide operators with a familiar environment, concepts, and tools, which greatly reduce the time to market. Recent work has been reported on integrating social networks with video conference services.

³ sofia-sip.sourceforge.net

Socialeyes [13] is a third-party Facebook application that facilitates video conferences with friends in the social network. It does not extend to other social networks yet. Similarly, Skype has recently been ported to a new family of Panasonic TVs and Logitech HD cameras [14]. This initiative also offers little flexibility, and it requires a specific TV set and camera to work. On the HD side, the demo shown in [15] raised big expectations but has not yet reached the commercial market. For video conference services to be massively adopted by domestic users, both approaches should be combined. Integration with social networks will not be enough: consumer electronics are required to improve usability and flexibility, and SIP can play a central role in such future devices.

REFERENCES

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, E. Schooler, SIP: Session Initiation Protocol, IETF RFC 3261, Jun. 2002.

[2] Bluetooth SIG, Specification of the Bluetooth System, Profiles, version 1.0 or later, Video Distribution Profile, Sep. 2004.

[3] Bluetooth SIG, Specification of the Bluetooth System, Profiles, version 1.0 or later, Audio/Video Distribution Transport Protocol, Apr. 2007.

[4] D. Quan, Y. Ho, Categorization for fast intra prediction mode decision in H.264/AVC, IEEE Transactions on Consumer Electronics, vol. 56, no. 2, pp. 1049-1056, May 2010.

[5] M. Schnell, M. Schmidt, M. Jander, T. Albert, R. Geiger, V. Ruoppila, P. Ekstrand, M. Lutzky, and B. Grill, MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication, in The 125th AES Conv., 2008.

[6] W3C Working Draft, HTML5: A vocabulary and associated APIs for HTML and XHTML, May 2011.

[7] M. Handley, V. Jacobson, C. Perkins, SDP: Session Description Protocol, IETF RFC 4566, Jul. 2006.

[8] J. Rosenberg, R. Mahy, P. Matthews, and D. Wing, Session Traversal Utilities for NAT (STUN), IETF RFC 5389, Oct. 2008.

[9] J. Rosenberg, R. Mahy, and P. Matthews, Traversal Using Relays Around NAT (TURN): relay extensions to Session Traversal Utilities for NAT (STUN), IETF RFC 5766, Apr. 2010.

[10] J. Rosenberg, Interactive Connectivity Establishment (ICE): a protocol for Network Address Translator (NAT) traversal for offer/answer protocols, IETF RFC 5245, Apr. 2010.

[11] P. Iyer, U. Warrier, M. Saaranen, F. Fontaine, Internet Gateway Device (IGD) V 2.0, UPnP Forum, Dec. 2010.

[12] G. Camarillo, C. Holmberg, Y. Gao, Re-INVITE and Target-Refresh Request handling in the Session Initiation Protocol (SIP), IETF RFC 6141, Mar. 2011.

[13] Socialeyes, http://www.socialeyes.com, accessed May 2011.

[14] CNET, Logitech and Panasonic team up for TV Skype cam, news.cnet.com, May 2011.

[15] Fraunhofer Presents First True HD Communication over 4G Networks to Eliminate Distance Barriers in Communication, in Mobile World Congress 2011, 14-17th February 2011, Barcelona, http://hdvoicenews.com/2010/02/02
