IEEE Network • July/August 2002

On the Scalability of Data Synchronization Protocols for PDAs and Mobile Devices

Sachin Agarwal, David Starobinski, and Ari Trachtenberg, Boston University

Abstract

Personal digital assistants and other mobile computing devices rely on synchronization protocols in order to maintain data consistency. These protocols operate in environments where network resources such as bandwidth, memory, and processing power are limited. In this article we examine a number of popular and representative synchronization protocols, such as Palm’s HotSync, Pumatech’s Intellisync, and the industry-wide SyncML initiative. We investigate the scalability performance of these protocols as a function of data and network sizes, and compare these protocols to a novel synchronization approach, CPISync, which addresses some of their scalability concerns. The conclusions of this survey are intended to provide guidance for handling scalability issues in synchronizing data on large, heterogeneous, tetherless networks.

The recent proliferation of mobile computing devices has resulted in a marked increase in the distribution of information. Under the mobile computing paradigm, users are provided access to their data wherever they are: at home on a relatively powerful and Internet-connected personal computer (PC), or on the road with a personal digital assistant (PDA) or laptop, often only weakly or intermittently connected to a network. Systems that provide such ubiquitous data access must deal with the inherent problem of data synchronization, that is, changes made on the road should be reflected back in the office or home and vice versa. The task of a data synchronization protocol is to quickly identify these changes, resolve possible conflicts, and propagate updates to the various synchronizing devices.

Synchronization protocols for mobile devices must be carefully designed, since the resources available for these devices are often constrained. This is especially the case with PDAs, which typically communicate over serial or wireless links with low bandwidth (10–100 kb/s). Moreover, PDAs have limited CPU, memory, and power resources, and thus are unable to quickly process or transfer large amounts of data. For example, the new top-of-the-line Palm m515 PDA is equipped with a 33 MHz Motorola Dragonball VZ processor [1] and 16 Mbytes of RAM storage; its batteries last about one week between recharging. In a networked setting, scalable and efficient data synchronization protocols are essential for data-intensive applications such as email editors, spreadsheets, and even collaborative work environments.

The goal of this article is to survey a number of representative PDA synchronization protocols and investigate their performance with respect to utilization of bandwidth, processing, and memory resources. In particular, we show that existing synchronization protocols often do not scale well with the number of devices in a network, and with the size of the data to be synchronized. We compare these protocols to a new algorithm, called CPISync [2], which addresses some of the scalability issues. Our conclusions are intended to provide guidance for addressing the scalability issues inherent to synchronizing data over large, heterogeneous, tetherless networks.

This article is organized as follows. We first describe some fundamentals of PDA synchronization. Next, we survey, from the perspective of scalability performance, some of the most popular synchronization protocols that have been developed for mobile devices. In particular, we explain Palm’s HotSync synchronization protocol [3], Pumatech’s Intellisync [4] synchronization protocol, and a recent industry-wide initiative called the SyncML [5] standard. Our explanation includes a description of the new scalability-oriented CPISync [2] protocol. We next provide a taxonomy of the reviewed protocols according to various scalability metrics, and consider future trends. The last section is devoted to concluding remarks and directions for further research.

Fundamentals of PDA Synchronization

A PDA can be represented by a node in an ad hoc network. As such, the node is intermittently connected to the network and needs to synchronize regularly with other nodes on the network to maintain the consistency of its data. This task is relatively simple when the PDA synchronizes with just one desktop machine. However, trends toward pervasive computing demand the ability for multidevice synchronization, in which a PDA synchronizes with various different desktops at home and/or at work, or with other mobile devices such as other PDAs, home digital assistants, or cellular phones.

Part of this work was supported by the National Science Foundation under NSF CAREER Grant CCR-0133521.

0890-8044/02/$17.00 © 2002 IEEE




Single-device synchronization, where the PDA synchronizes to the same PC every time, is simply handled by maintaining modification flags or timestamps, as done by the Fast Sync mode of the HotSync protocol detailed in a later section. Using this protocol, a PC and PDA keep track of the records that have been added, modified, or deleted since their last synchronization. This information is used during the subsequent synchronization to reconcile the databases on both devices. In the case of a conflict, where the same record has been modified on both the PC and the PDA, the choice of which modification to keep is determined manually or chronologically.

Multidevice synchronization is more difficult, because maintaining the consistency of several devices might require many individual synchronizations between pairs of devices. Figure 1 shows a simple data synchronization scenario where a single initial appointment schedule is modified separately on three devices. In order for all the devices to end up with the same appointment schedule, the PDA must first synchronize with the home PC, then with the office PC, and then again with the home PC.

In a general peer-to-peer setting, the efficiency of multidevice synchronization is determined by two factors: the order in which devices reconcile their data, and the method by which these reconciliations are performed. These two factors depend in turn on the characteristics of the underlying system. For example, a synchronization schedule based on a gossip protocol [6] would have devices randomly choose partners with which to synchronize, ending (with high probability) in the complete synchronization of the data on all devices. In this scenario, the method by which two individual devices synchronize is extremely important, because its efficiency has a multiplicative effect on the overall efficiency of synchronizing the network.
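The gossip-style schedule described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the article: each device holds a set of record identifiers, a pairwise synchronization is modeled as a plain set union (no modifications, deletions, or conflicts), and the function name `gossip_rounds` is hypothetical.

```python
import random

def gossip_rounds(devices, seed=0):
    """Randomly pair devices until every device holds the same record set.

    devices: list of sets of record ids (mutated in place).
    Returns the number of pairwise synchronizations performed.
    """
    rng = random.Random(seed)
    rounds = 0
    while any(d != devices[0] for d in devices):
        a, b = rng.sample(range(len(devices)), 2)  # pick two distinct partners
        merged = devices[a] | devices[b]           # reconciliation modeled as set union
        devices[a], devices[b] = set(merged), set(merged)
        rounds += 1
    return rounds

devices = [{1, 2}, {2, 3}, {4}, {5}]
print(gossip_rounds(devices), devices[0])
```

Because every pairwise exchange is counted once, the cost of the whole schedule is the number of rounds times the cost of one pairwise synchronization, which is the multiplicative effect noted in the text.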

An Overview of Synchronization Protocols

We now survey several representative protocols for data synchronization, with an eye toward scalability performance. Unfortunately, details of some widely used protocols, such as Microsoft’s ActiveSync, are not publicly available and could not be surveyed.

Palm HotSync Protocol

Palm PDAs, or Palm Pilots as they are popularly called, run the Palm operating system. This operating system provides an implementation of the HotSync [3] synchronization protocol as a middleware interface for programs whose data needs to be synchronized with other devices or PCs. HotSync provides two modes of operation: Slow Sync and Fast Sync.

Fast Sync: Fast Sync is applicable only when a handheld PDA is being synchronized with the same PC as in its previous synchronization. In this case the Palm Pilot uses status flags to determine what changes have occurred since the last synchronization. Status flags are maintained for each record in an application’s database, and are toggled whenever data is inserted, deleted, or modified. Upon a synchronization request by the user, the handheld transmits to the desktop PC all records whose status flags have been changed; the PC compares these records to its own database with one of the following results:
• The handheld record is added to the desktop database (insertion).
• The handheld record replaces the desktop version (modification).
• The record on the desktop is deleted (deletion).
• The record is archived on the desktop machine (archived).

The PC then sends its corrections back to the handheld and all status flags are reset. Note that all comparisons of modified records, and subsequent decisions about what to keep, are made on the PC rather than the computationally limited handheld device. If more than two devices need to synchronize with each other, as in Fig. 1, Fast Sync cannot be used, because status flags would be reset when synchronizing with one device even if they are needed for synchronizing with the second device. Thus, Fast Sync trivially does not scale with the network size.
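The flag-based exchange described above can be sketched as follows. This is an illustrative model, not the HotSync implementation: the record layout, the flag names, and the function `fast_sync` are assumptions, and the desktop-to-handheld corrections, archiving, and conflict handling are left out.

```python
def fast_sync(handheld, desktop):
    """Apply flagged handheld changes to the desktop and reset the flags.

    handheld: {record_id: {"data": str, "flag": None | "inserted" | "modified" | "deleted"}}
    desktop:  {record_id: str}
    Returns the ids of the records that were transmitted.
    """
    # Only records whose status flag is set are sent over the link.
    sent = [rid for rid, rec in handheld.items() if rec["flag"] is not None]
    for rid in sent:
        rec = handheld[rid]
        if rec["flag"] == "deleted":
            desktop.pop(rid, None)      # deletion on the desktop
            del handheld[rid]
        else:                           # insertion or modification
            desktop[rid] = rec["data"]
            rec["flag"] = None          # flag reset after synchronization
    return sent

handheld = {
    1: {"data": "9 am Meet Tom", "flag": None},
    2: {"data": "12 pm Lunch with Jackie", "flag": "modified"},
    3: {"data": "2 pm Call home", "flag": "inserted"},
}
desktop = {1: "9 am Meet Tom", 2: "12 pm Lunch with Jack"}
print(fast_sync(handheld, desktop))
```

Resetting the flags at the end is exactly what prevents a second desktop from learning about the same changes later, which is why this mode is limited to a single PC.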

Slow Sync: In all cases where Fast Sync conditions are not met, a Slow Sync is performed. In other words, a Slow Sync is performed whenever the handheld device synchronized last with a different desktop, as might happen if one alternates synchronization with one computer at home and another at work. In such cases, the status flags do not reliably convey the differences between the synchronizing systems; instead, the handheld device sends all of its data records to the desktop. Slow Sync then refers to a backup copy of the data, which was stored on the PC during the last synchronization with the PDA, to determine which records have been modified, added, or deleted. The synchronization process completes in the same way as the Fast Sync case.
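The comparison against the backup copy can be sketched as below. The function name `slow_sync_diff` and the dictionary representation are assumptions made for illustration; the point is that the whole handheld database is an input, so the transfer cost grows with the database size regardless of how few records changed.

```python
def slow_sync_diff(handheld_records, backup):
    """Classify records by comparing the full handheld database to the PC's backup.

    Both arguments map record_id -> data. Returns (added, modified, deleted) id lists.
    """
    added = sorted(rid for rid in handheld_records if rid not in backup)
    deleted = sorted(rid for rid in backup if rid not in handheld_records)
    modified = sorted(
        rid for rid in handheld_records
        if rid in backup and handheld_records[rid] != backup[rid]
    )
    return added, modified, deleted

backup = {1: "9 am Meet Tom", 2: "12 pm Lunch with Jack", 3: "4 pm Dentist"}
handheld_records = {1: "9 am Meet Tom", 2: "12 pm Lunch with Jackie", 4: "2 pm Call home"}
print(slow_sync_diff(handheld_records, backup))
```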

Figure 2 depicts the difference between HotSync’s Slow Sync and Fast Sync. In Fast Sync, the amount of communication between the PC and the PDA is reduced because only the modified records are sent from the PDA to the PC; in the Slow Sync case the entire database is transferred to the PC. For this reason Fast Sync will typically be much more efficient than Slow Sync with respect to latency and bandwidth usage.

An undesirable property of Slow Sync is that its communication and time complexities increase linearly with the number of records stored in the device, independent of the number of record modifications. Figure 3 illustrates this phenomenon on a Palm IIIxe PDA, as measured by a demonstration version of the Frontline Test Equipment software [7]. Specifically, the figure shows that the amount of data transfer needed to complete a Slow Sync grows linearly with the number of records stored in the device, whereas for Fast Sync it remains almost constant. Thus, Slow Sync does not scale with the number of entries stored on the devices. In fact, Slow Sync can require as long as 20 min for large, but practical, database sizes.

Figure 1. An appointment schedule modified on different devices: a home machine, an office machine, and a PDA. [Panels: initial schedule; modified office schedule (adds "2 pm Call home"); modified home schedule; modified PDA schedule ("Lunch with Jack" changed to "Lunch with Jackie").]

Intellisync

The Intellisync Anywhere [4] product family from Pumatech seeks to make every synchronization “Fast Sync enabled” in order to minimize connection times. The technology relies on a central server, with which all devices synchronize, and on which user accounts and security settings are maintained. Since each device always synchronizes with the same server, all synchronizations can be characterized as Fast Syncs. Thus, the performance graphs for Intellisync will be very similar to the ones shown in Fig. 3 for Fast Sync. Moreover, no peer-to-peer connections are needed to synchronize multiple devices; instead, each device connects to the central server, which maintains a modification history.

The system enables synchronization of corporate applications, such as Microsoft Outlook and Exchange, between desktop and mobile devices, including Palm PDAs, Pocket PCs, and Symbian handhelds. Figure 4 shows a company network using an Intellisync Anywhere server to synchronize mobile devices using the company’s LAN, tethered terminals (e.g., PDA cradles attached to any computer on the network), wireless LANs, or the Internet.

Periodically, the Intellisync Anywhere Server synchronizes with a Microsoft Exchange server connected to fixed devices. The period between two consecutive synchronizations is controlled by the server administrator, and reflects a trade-off between the accuracy of a user’s information and the communication overhead needed for frequent synchronizations.

This protocol suffers from the same problems that generally plague centralized architectures, namely a central point of failure and communication, and a lack of scalability with increasing numbers of clients. If the central server fails or becomes congested, synchronization is affected for all clients; this problem can be mitigated with the use of several central servers, but not without introducing secondary problems in synchronizing the various servers. Moreover, no matter how the server is situated, it will necessarily be farther away from certain locations than others, translating to increased synchronization latency.

SyncML

SyncML [5] is an open industry initiative supported by hundreds of companies including Ericsson, IBM, Lotus, Matsushita, Motorola, Nokia, Openwave, and Starfish. It seeks to provide an open standard for data synchronization across different platforms and devices.

To minimize communication time, the standard assumes that each device maintains information modification flags for each of its records with respect to every other device on the network. Thus, modifying a record on a PDA would entail toggling not just one set of status flags as in Fast Sync, but rather one set of status flags for every device on the network. After two devices synchronize with each other, only status flags corresponding to these two devices are cleared. In this way, each synchronization looks like a Fast Sync, with performance similar to that depicted in Fig. 3.

Since a given device might synchronize with any other device, a SyncML implementation would require a device to maintain modification information about each of its records with respect to each other device. Thus, the amount of memory needed by a network of n devices each with r records would be roughly nr units/device. This memory overhead can easily become significant. Consider, for instance, a small network of n = 20 devices, maintaining a replicated database of r = 10,000 records, each associated with 8 bytes of metadata detailing modification time, status flags, and so on. The total memory overhead for this system would exceed 1 Mbyte, which can be a substantial amount of memory for a portable device. Finally, it should be noted that timestamp protocols such as SyncML adapt poorly to dynamic situations where devices are frequently added or removed from the network. This is because adding or removing a device from the network entails an update to every other device in the network.

Figure 2. A comparison between the two modes of HotSync, Slow Sync and Fast Sync. Slow Sync transfers the entire database from the PDA to the PC, whereas Fast Sync transfers only modifications made after the last synchronization.

Figure 3. Comparison between the communication complexities (in bytes) of the two modes of HotSync, Slow Sync and Fast Sync. Measurements were repeated with an increasing number of 100-byte records on each device, but a fixed number of differences (i.e., 10). [Axes: number of records stored on each device vs. amount of bytes transferred; curves: Slow Sync, Fast Sync.]
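The memory estimate in the SyncML example above follows directly from the "roughly nr units/device" rule. A minimal worked calculation, where the function name is hypothetical and the inputs are the article's example values:

```python
def syncml_flag_overhead_bytes(n_devices, n_records, metadata_bytes):
    """Per-device metadata overhead: one metadata entry per record per peer device."""
    return n_devices * n_records * metadata_bytes

overhead = syncml_flag_overhead_bytes(n_devices=20, n_records=10_000, metadata_bytes=8)
print(overhead)  # 1600000 bytes, i.e. about 1.6 Mbytes, which exceeds 1 Mbyte
```

Doubling either the number of devices or the number of records doubles this figure, which is the scaling concern raised in the text.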

CPISync

The CPISync algorithm [8, 9], which is an abbreviation for Characteristic Polynomial Interpolation Synchronization, is based on an algebraic solution to the problem of reconciling two remote sets of information. This algorithm has been shown [2] to be significantly more efficient than Slow Sync in the most common scenario of data synchronization, that being where the number of differences to be reconciled is far smaller than the sizes of the reconciling databases. This scenario is typically the case for PDA applications, where changes to an address book, for example, typically consist of only one or two entries, whereas the address book might contain hundreds of entries in total.

Unlike the previous protocols, the CPISync algorithm is computationally intensive (time nearly cubic in the number of differences between synchronizing devices), although almost all the computation can be done on one device. Thus, in a PDA–PC synchronization, the computational component of the protocol can be shifted to the relatively fast PC.

One feature of the CPISync algorithm is that it maintains the communication efficiency of Fast Sync in a general peer-to-peer setting. The amount of data it transfers for a synchronization depends linearly on the number of differences between databases, and is for all intents and purposes independent of the overall sizes of the databases. Unlike the previous protocols, CPISync does not need to maintain any state information about any other devices on a network, and it can function in a peer-to-peer fashion in an ad hoc environment. In effect, CPISync can truly operate as a middleware interface to any protocol that maintains consistency of distributed information.

Protocol Overview: The CPISync algorithm represents databases of records by sets of integers, with each integer representing a record. A set \(S = \{x_1, x_2, x_3, \ldots, x_n\}\) is represented by a characteristic polynomial

\[ \chi_S(Z) = (Z - x_1)(Z - x_2)(Z - x_3) \cdots (Z - x_n). \]

The key to CPISync is the observation that for two sets \(S_A\) and \(S_B\),

\[ \frac{\chi_{S_A}(Z)}{\chi_{S_B}(Z)} = \frac{\chi_{\Delta_A}(Z)}{\chi_{\Delta_B}(Z)}, \tag{1} \]

where \(\Delta_A\) represents the set difference \(S_A - S_B\), and similarly \(\Delta_B = S_B - S_A\). This is because terms common to both sets cancel out in the numerator and denominator. Thus, the rational function in Eq. 1 can be uniquely interpolated from m samples, if there are at most m differences between synchronizing sets.

The CPISync algorithm can be generally described as follows:
• Hosts A and B evaluate their characteristic polynomials on m sample points (over a chosen finite field). Host A sends its evaluations to host B.
• The evaluations are combined to compute m sample points of the rational function \(\chi_{S_A}(Z)/\chi_{S_B}(Z)\), which are interpolated to determine \(\chi_{\Delta_A}(Z)/\chi_{\Delta_B}(Z)\).
• The numerator and denominator of the interpolated function are factored to determine the differences between \(S_A\) and \(S_B\).
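The first two steps above can be checked numerically with a short sketch. It covers only the evaluation and combination steps and verifies the identity in Eq. 1 at the sample points; the rational interpolation and factoring steps are not implemented. The prime modulus, the example sets, the sample points, and the function names are all assumptions chosen for illustration.

```python
P = 2_147_483_647  # prime modulus 2**31 - 1; an assumed choice of finite field

def char_poly_eval(records, z, p=P):
    """Evaluate chi_S(z) = product of (z - x) for x in S, over GF(p)."""
    result = 1
    for x in records:
        result = result * (z - x) % p
    return result

def ratio(num, den, p=P):
    """Compute num / den in GF(p); the inverse uses Fermat's little theorem."""
    return num * pow(den, p - 2, p) % p

set_a = {1, 2, 3, 5, 8}          # records on host A
set_b = {1, 2, 3, 9}             # records on host B
delta_a, delta_b = set_a - set_b, set_b - set_a
samples = [1000, 1001, 1002]     # m = 3 points, none of which is a record value

evals_a = [char_poly_eval(set_a, z) for z in samples]  # sent from host A to host B
evals_b = [char_poly_eval(set_b, z) for z in samples]  # computed locally on host B
combined = [ratio(fa, fb) for fa, fb in zip(evals_a, evals_b)]
expected = [
    ratio(char_poly_eval(delta_a, z), char_poly_eval(delta_b, z)) for z in samples
]
print(combined == expected)
```

Only the m evaluations cross the link, so the communication depends on the bound on the number of differences rather than on the sizes of the two sets.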

Experimental Data: Figure 5 shows the superior scalability (in terms of synchronization time) of CPISync over that of Slow Sync, for two devices with a fixed number of differences. The results were obtained for a Palm IIIxe PDA synchronizing with a Pentium III class machine. The CPISync time does not grow with the number of records on the device, whereas Slow Sync time grows linearly. For large databases and large numbers of synchronizing devices, this difference in time and communication is substantial, rendering Slow Sync effectively unusable for such environments.

The nearly cubic computation complexity of CPISync implies that there is a threshold number of differences below which CPISync is faster than Slow Sync and above which Slow Sync is faster than CPISync. This threshold can be determined analytically for a particular system [2], and then used in a hybrid scheme for efficient synchronization in a general setting. Note that the value of the threshold is typically quite large, making CPISync the protocol of choice for many synchronization applications. For instance, for 1000 records, this threshold corresponds to about 400 differences, meaning that if up to 400 records differ between synchronizing devices, CPISync will be the faster algorithm.

Figure 4. An Intellisync Anywhere server installed on a company's network. [Diagram: tethered PDAs, a wireless access point, and an Internet access point connect over the company LAN to the Intellisync Anywhere server, which synchronizes periodically (e.g., every 15 min) with a Microsoft Exchange server.]

Figure 5. Time needed to synchronize with CPISync versus Slow Sync. [Axes: number of records vs. time (s); curves: Slow Sync, CPISync.]

Figure 6 depicts an analytical computation of this threshold. It shows that Slow Sync time increases linearly with the size of the database, whereas CPISync time remains essentially constant as the database size increases. On the other hand, CPISync time increases rapidly with increasing number of differences between the two synchronizing databases, whereas Slow Sync remains fairly constant. Recent results [10] modify CPISync to avoid cubic computational growth at the expense of multiple rounds of communication.
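The threshold behavior can be illustrated with a toy cost model. Everything numeric here is an assumption rather than the article's analysis: Slow Sync time is taken as exactly linear in the number of records, CPISync time as exactly cubic in the number of differences, and the CPISync constant is calibrated so that the two costs meet at the quoted example of about 400 differences for 1000 records. The function name `pick_protocol` is hypothetical, and an estimate of the number of differences is assumed to be available.

```python
def pick_protocol(num_records, num_differences,
                  slow_cost_per_record=1.0,
                  cpi_cost_per_cubed_diff=1000 / 400**3):
    """Return the protocol with the lower modeled time (arbitrary time units)."""
    slow_time = slow_cost_per_record * num_records           # linear in database size
    cpi_time = cpi_cost_per_cubed_diff * num_differences**3  # cubic in differences
    return "CPISync" if cpi_time <= slow_time else "Slow Sync"

print(pick_protocol(1000, 100))  # few differences relative to the threshold
print(pick_protocol(1000, 800))  # many differences relative to the threshold
```

A hybrid scheme along these lines would need the system-specific threshold derived in [2] rather than the calibrated constant used in this sketch.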

Taxonomy of Synchronization Protocols

In this section we first provide a qualitative comparison of the scalability performance of the various protocols discussed in this work. We then compare the suitability of these protocols to different applications and consider future trends.

Scalability

Each of the synchronization protocols described in the previous section addresses different scalability concerns. The “radar” diagrams in Fig. 7 qualitatively compare these protocols, highlighting their strengths and weaknesses. Each protocol is judged according to the following criteria.

Data Transmission Load: This refers to the amount of data exchanged between synchronizing devices. It is a particularly important metric of scalability because it directly affects the time needed for synchronization. CPISync requires only a little more communication than the information theoretic minimum needed for synchronization [9].

Computation: This refers to the amount of processing done by devices when they synchronize. Computational capabilities of PDAs and mobile devices are generally relatively weak, so this is an important metric. Slow Sync is one winner on this criterion because it requires almost no computation. CPISync, on the other hand, is computationally intensive, but most of its computation is asymmetric and can be done on the more powerful synchronizing device.

Network Size: This refers to the ability of the protocol to scale with increasing numbers of nodes in the network. Fast Sync is weakest in this category because it only allows two hosts to synchronize. SyncML is also unsuitable for large networks because each device has to maintain synchronization information with respect to every other device in the network. Intellisync, although better, is still not very scalable with network size because its central server takes all the synchronization load.

Robustness: This is a desirable feature of any multidevice synchronization protocol. Intellisync is not very robust because its server presents a central point of failure. The other protocols are more distributed, so the risk of synchronization failure is significantly lower.

Memory: This relates to the amount of memory overhead needed by each synchronization protocol. SyncML is the least scalable protocol from this perspective. Slow Sync is scalable as long as PDAs synchronize only with PCs, whose large memory can usually accommodate a local comparison of two synchronizing databases.

Each of the synchronization methods excels in scalability within a particular framework and for specific categories. Fast Sync works best for two-device systems, whereas Slow Sync is suited to computationally weak networks with very fast interconnections. Intellisync and SyncML are suitable for small to mid-sized networks, and CPISync is best suited to large networks with relatively slow interconnections. The ideal protocol, corresponding to a full pentagon in the radar graphs of Fig. 7, can probably be forged out of a hybridization of these various protocols.

Scenarios and Trends

The HotSync protocol lends itself well to scenarios where synchronization always takes place between the same two devices, thus allowing for activation of the Fast Sync mode. Otherwise, it may be useful only when the size of the synchronizing databases is small (relative to the link speed) so that the data transfer time of Slow Sync is not a major constraint. Typical current usage of PDAs, in which synchronization only occurs with one PC, is thus well matched to HotSync.

It is interesting to note that the latency problem of Slow Sync may not necessarily be solved by the use of faster connection interfaces, such as Firewire (IEEE 1394) [11] and USB [12], due to the corresponding increase in the storage capacity of handheld devices. Table 1 compares the time required to perform a wholesale data transfer for various representative PDAs and interface technologies. We observe that a Palm Pilot (1 Mbyte storage) using a serial link to transfer its entire data to a PC will require approximately the same order of time (i.e., 1 min) as a state-of-the-art Apple iPod (5 Gbytes storage) using the Firewire 1394b data transfer technology.
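The order-of-magnitude comparison above can be reproduced with a back-of-the-envelope calculation. The link rates used here are assumptions, since Table 1 is not reproduced in this section: 115.2 kb/s for the serial link and 800 Mb/s for IEEE 1394b, with decimal storage units and no protocol overhead.

```python
def transfer_seconds(storage_bytes, link_bits_per_second):
    """Idealized time to move an entire store over a link, ignoring overhead."""
    return storage_bytes * 8 / link_bits_per_second

palm_serial = transfer_seconds(1 * 10**6, 115_200)       # assumed 115.2 kb/s serial link
ipod_firewire = transfer_seconds(5 * 10**9, 800 * 10**6)  # assumed 800 Mb/s 1394b link
print(round(palm_serial), round(ipod_firewire))
```

Under these assumptions both wholesale transfers land within the same order of about one minute, so the faster link is offset by the larger store.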

We expect centralized architectures such as Intellisync to provide good solutions for small to mid-sized corporate networks. For example, solutions based on the Intellisync technology have been deployed to allow school administrators of the New York City Board of Education to efficiently synchronize email and calendar information between their Palm OS handhelds and the New York Board of Education’s Microsoft Exchange server.

Figure 6. Synchronization time of Slow Sync vs. CPISync. The intersection of the two surfaces marks the threshold values at which Slow Sync and CPISync require equal amounts of time. [Axes: number of records, differences, and time (s); surfaces: CPISync, Slow Sync.]

The SyncML standard will allow integration of a heterogeneous mix of devices manufactured by different companies. For instance, IBM recently unveiled the SyncML-based DB2 Everyplace relational database to manage data in PDAs and embedded devices. The synchronization architecture of DB2 Everyplace [13] is based on a central server, making it akin to the Intellisync architecture. We believe that SyncML will also be useful in peer-to-peer settings. However, its memory overhead may become a serious limitation in large networks.

The CPISync protocol is a future technology. The primary strength of this protocol is its ability to perform synchronization with an almost minimal amount of communication. CPISync is thus a strong candidate for wireless synchronization and dynamic peer-to-peer settings. With increasing CPU capabilities we believe that computation will cease to be a serious bottleneck for CPISync, as shown by Fig. 8. Although we have not yet implemented CPISync between two PDAs, newer PDAs, such as those based on StrongARM [14] processors, will most probably run CPISync with acceptable computation times. For example, some commercial models of the iPAQ Pocket PC based on StrongARM architectures run at clock speeds of 206 MHz, approaching the lower end of the clock speeds shown in Fig. 8. For applications with low round-trip communication latency, the latest results in [10] would eliminate the computational bottleneck of CPISync altogether.

Conclusion

In this article we survey a number of synchronization protocols available for synchronizing PDAs with other mobile devices and wired PCs. Each of the protocols has its strengths and weaknesses, arbitrating scalability in three parameters:
• The amount of information exchanged between two devices during synchronization
• The amount of information that needs to be stored in the memory of the devices to allow consistent synchronization
• The computational complexity of the synchronization

In some applications with large propagation delays between hosts, the number of rounds of communications per synchronization, which is to say the number of times there is a two-way exchange of communication between devices, may also be an important factor.

Figure 7. Scalability strengths and weaknesses of the different synchronization protocols. An absolute scale is used in which high values denote strengths and low values denote weakness. (One panel per protocol: Fast Sync, CPISync, Slow Sync, SyncML, Intellisync. Axes in each panel: computation, data transmission load, network size, memory, central point of failure.)

Table 1. Approximate time required to perform wholesale data transfer (Slow Sync). Recent data transfer technologies such as USB and Firewire have reduced data transfer times by orders of magnitude, although the potential size of data that may be transferred to or from a PDA has also increased accordingly.

| PDA | Palm Pilot | Palm m505 | iPAQ, HP Jornada | Apple iPod |
| Storage capacity | 1 Mbyte | 8 Mbytes | 64 Mbytes | 5 Gbytes |
|---|---|---|---|---|
| RS-232 @ 115 kb/s | 1 minute | 10 minutes | 1 hour | 4 days |
| USB 1.0 @ 12 Mb/s | 1 second | 6 seconds | 1 minute | 1 hour |
| Ethernet @ 100 Mb/s | 80 ms | 1 second | 5 seconds | 7 minutes |
| Firewire 1394 @ 400 Mb/s | 20 ms | 160 ms | 1 second | 2 minutes |
| USB 2.0 @ 480 Mb/s | 18 ms | 150 ms | 1 second | 1 minute |
| Firewire 1394b @ 800 Mb/s | 10 ms | 80 ms | 1 second | 1 minute |

The optimum choice of synchronization protocol depends largely on the application and the type of data being synchronized. For example, if data changes slowly relative to the frequency of synchronizations, CPISync is the most scalable solution. On the other hand, if data changes frequently and network connections are very fast, Slow Sync should be used. Therefore, an important area of further research is the development of statistical models for user activities that predict how often users synchronize their data and what amount of data is typically updated during synchronization events. The development of such models would require collection and processing of real data from users.
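To make this trade-off concrete, the following sketch compares two simplified cost models: a wholesale transfer whose time grows with the total number of records, and a difference-based protocol whose traffic grows only with the number of differences but which pays an additional computation cost. The cubic computation term and every constant (`record_bytes`, `link_bps`, `cpu_s_per_diff_cubed`) are assumptions chosen for illustration, not measurements from this article, and the function names are hypothetical.

```python
def wholesale_seconds(n_records, record_bytes, link_bps):
    """Wholesale (Slow Sync style) transfer: every record crosses the link."""
    return n_records * record_bytes * 8 / link_bps


def difference_seconds(n_diffs, record_bytes, link_bps, cpu_s_per_diff_cubed):
    """Difference-based sync under an assumed cost model.

    Traffic scales with the number of differences; computation is
    assumed to grow cubically in the number of differences.
    """
    communication = n_diffs * record_bytes * 8 / link_bps
    computation = cpu_s_per_diff_cubed * n_diffs ** 3
    return communication + computation


def preferred_protocol(n_records, n_diffs, record_bytes=100,
                       link_bps=115e3, cpu_s_per_diff_cubed=1e-6):
    """Return the label of the cheaper option under the assumed models."""
    slow = wholesale_seconds(n_records, record_bytes, link_bps)
    diff = difference_seconds(n_diffs, record_bytes, link_bps,
                              cpu_s_per_diff_cubed)
    return "difference-based" if diff < slow else "wholesale"


# Few changes, slow serial link
print(preferred_protocol(n_records=10000, n_diffs=50))
# Many changes, fast link
print(preferred_protocol(n_records=1000, n_diffs=900, link_bps=100e6))
```

With these illustrative constants, the first case favors the difference-based protocol and the second favors wholesale transfer. Where the threshold actually lies depends on measured link rates, record sizes, and processor speeds, which is why statistical models of user behavior are needed to choose well in practice.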

This article has shown that the development of scalable data synchronization protocols is of critical importance for the deployment of large-scale mobile computing platforms. We expect that the conclusions of this article will highlight scalability issues for future-generation tetherless systems.

References

[1] Motorola, “Dragonball: Enabling the Next Generation of Wireless Devices,” http://e-www.motorola.com/collateral/DBBROCHURE.pdf
[2] A. Trachtenberg, D. Starobinski, and S. Agarwal, “Fast PDA Synchronization Using Characteristic Polynomial Interpolation,” IEEE INFOCOM, 2002, to appear.
[3] Palm developer knowledgebase manuals, http://palmos.com/dev/support/docs/palmos/ReferenceTOC.html
[4] P. Whitepapers, “Invasion of the Data Snatchers,” 1999, http://www.pumatech.com/enterprise/wp-1.html
[5] SyncML specifications, http://www.syncml.org/downloads.html
[6] A. J. Demers et al., “Epidemic Algorithms for Replicated Database Maintenance,” Proc. 6th Annual ACM Symp. Principles of Dist. Comp., no. 6, Vancouver, British Columbia, Canada, Aug. 1987, pp. 1–12.
[7] Frontline test equipment, http://www.fte.com
[8] Y. Minsky, A. Trachtenberg, and R. Zippel, “Set Reconciliation with Nearly Optimal Communication Complexity,” Tech. rep. TR1999-1778, TR2000-1796, TR2000-1813, Cornell Univ., 2000.
[9] Y. Minsky, A. Trachtenberg, and R. Zippel, “Set Reconciliation with Nearly Optimal Communication Complexity,” Int’l. Symp. Info. Theory, June 2001, p. 232.
[10] Y. Minsky and A. Trachtenberg, “Practical Set Reconciliation,” Boston Univ. tech. rep. ECE TR2002–03.
[11] IEEE 1394, http://computer.org/multimedia/articles/firewire.htm
[12] Universal serial bus, http://www.usb.org/developers/docs.html
[13] A. Noether, Extending Enterprise Data to Mobile Devices, http://www-3.ibm.com/software/data/db2/everyplace/extendingenterprise.pdf
[14] Intel PCA application processors, http://developer.intel.com/design/pca/applicationsprocessors/index.htm

Biographies

SACHIN AGARWAL ([email protected]) is a graduate student at Boston University’s ECE Department, where he has been working at the Laboratory of Networking and Information Systems as a graduate research assistant. He received his Bachelor’s degree in electronics and communication engineering from the Regional Engineering College, Warangal, India, in 2000. He was a recipient of the Corporate/NSF Travel Award for the IEEE INFOCOM 2002 conference. His interests include data synchronization in distributed and mobile networks, epidemiological protocols, and data replication in networks.

DAVID STAROBINSKI ([email protected]) received his Ph.D. in electrical engineering from the Technion, Israel Institute of Technology, in 1999. In 1999–2000 he was a visiting post-doctoral researcher at the EECS Department in the University of California, Berkeley. In 2000 he joined the Electrical and Computer Engineering Department at Boston University as an assistant professor. He received a fellowship for prospective researchers from the Swiss National Science Foundation, and awards from Intel Corp. and the Gutwirth Foundation for outstanding academic achievements. He is on the Program Committee of IEEE INFOCOM and IEEE CCW. His research interests are in network performance evaluation, Internet traffic engineering, and high-speed and wireless networking.

ARI TRACHTENBERG ([email protected]) is an assistant professor of electrical and computer engineering at Boston University. He received his S.B. degree from MIT in 1994 in mathematics with computer science, and his M.S. and Ph.D. degrees in computer science from the University of Illinois at Urbana/Champaign in 1996 and 2000, respectively. At the University of Illinois he was a University Fellow and received the Kuck Outstanding Ph.D. Thesis award. He also received the NSF CAREER award in 2002 for work on practical data synchronization. His research interests include data synchronization in distributed systems, error-correcting codes, and computational algorithms.

Figure 8. Computation time of CPISync using various processors for the reconciliation of 200 differences between two databases of arbitrary size. (Horizontal axis: processor, in order Pentium Pro 200 MHz, PII 266 MHz, Celeron 433 MHz, PII 600 MHz, PII 900 MHz, P4 1.4 GHz, Dual Xeon 1.7 GHz. Vertical axis: time (s), from 0 to 14.)