Implementing Server Side Data Synchronization for Mobile Apps


DESCRIPTION

Today mobile apps are everywhere. These apps cannot count on a reliable and constant internet connection, so working in offline mode is becoming a common pattern. This is quite easy for read-only apps, but it quickly becomes tricky for apps that create data in offline mode. This talk is a case study of a possible architecture for enabling data synchronization in these situations.


Michele Orselli, CTO@Ideato | _orso_ | micheleorselli | mo@ideato.it

Agenda

- scenario
- design choices
- implementation
- alternative approaches

Sync scenario

[Figure: three devices hold records A, B and C respectively; after the sync every device holds A, B and C]

Dealing with conflicts

[Figure: two devices hold different versions, A1 and A2, of the same record; which one should win?]

Scenario

Brownfield project
- several mobile apps for tracking user-generated data (calendar, notes, bio data)
- iOS & Android
- ~10K users, steadily growing at 1.2K/month

Scenario

- MongoDB
- Legacy app based on CodeIgniter
- Existing RPC-wannabe-REST API for data sync

Scenario

get updates:
    POST /m/<app>/get/<user_id>/<res>/<updated_from>

send updates:
    POST /m/<app>/update/<user_id>/<res_id>/<dev_id>/<res>

api

Scenario

- 6 different resources, 12 calls per sync
- apps sync by polling every 30 sec
- every call syncs little data

Challenge

- rebuild the sync API for the old apps + 2 incoming ones
- allow image synchronization
- more efficient than the previous API

Existing Solutions

- Algorithms: timestamps, vector clocks, CRDTs
- Protocols/API: SyncML, Syncano
- Platform: Azure Data Sync
- Storage: CouchDB, Riak

Not Invented Here?

Don't Reinvent The Wheel, Unless You Plan on Learning More About Wheels

J. Atwood

- 2 different mobile platforms
- several teams with different skill levels
- changing storage wasn't an option
- forcing a particular technology client side wasn't an option

Architecture

[Figure: thin clients c1, c2 and c3 talk to a central server, which holds the sync logic and the conflict resolution]

In the sync domain all resources are the same.

For every app:
- one endpoint for getting new data
- one endpoint for pushing changes
- one endpoint for uploading images

Implementation

Get changes

Get all changes (1st sync):
    GET /apps/{app}/users/{user_id}/changes

Get latest changes:
    GET /apps/{app}/users/{user_id}/changes?from={from}

Is "from" a timestamp?

Server suggests the sync time

- timestamps are inaccurate (clock skew and developer errors)
- the server suggests the "from" parameter to be used in the next request

c1 -> server: GET /changes
server -> c1: { 'next': 12345, 'data': […] }

c1 -> server: GET /changes?from=12345
server -> c1: { 'next': 45678, 'data': […] }
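The exchange above can be sketched as follows. This is a minimal illustration written in Python rather than the PHP used elsewhere in these slides. The names `ChangeLog`, `record_change` and `get_changes` are hypothetical, and using a server-side counter as the `next` value is an assumption; the slides only say that the server, not the client, chooses it.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """In-memory stand-in for the server side of GET /changes (hypothetical)."""
    seq: int = 0                                  # server-owned cursor, never a client clock
    entries: list = field(default_factory=list)   # (cursor, state) pairs

    def record_change(self, state: dict) -> None:
        self.seq += 1
        self.entries.append((self.seq, state))

    def get_changes(self, from_: int = 0) -> dict:
        # Return every state recorded after the cursor, plus the cursor
        # the client should send as ?from= on its next request.
        data = [state for seq, state in self.entries if seq > from_]
        return {"next": self.seq, "data": data}


server = ChangeLog()
server.record_change({"id": "1", "type": "note"})
first = server.get_changes()                 # first sync: no "from"
server.record_change({"id": "2", "type": "note"})
second = server.get_changes(first["next"])   # later sync: echo back "next"
```

Because the client only echoes back a value it received, its own clock never enters the comparison, which is the point of letting the server suggest the sync time.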

what to transfer

operations:
    {'op': 'add', 'id': '1', 'data': […]}
    {'op': 'update', 'id': '1', 'data': […]}
    {'op': 'delete', 'id': '1'}
    {'op': 'add', 'id': '2', 'data': […]}

states:
    {'id': '1', 'data': […]}
    {'id': '2', 'data': […]}
    {'id': '3', 'data': […]}

we chose to transfer states:
    {'id': '1', 'type': 'measure', '_deleted': true}
    {'id': '2', 'type': 'note'}
    {'id': '3', 'type': 'note'}

ps: soft delete all the things!
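A minimal sketch of what transferring states with soft deletes could look like on the receiving side, in Python rather than the deck's PHP. The function names `apply_states` and `visible` and the dict-based store are assumptions made for illustration; only the `_deleted` flag comes from the slides.

```python
def apply_states(local: dict, states: list) -> dict:
    """Merge a list of state records into a local store keyed by id."""
    for state in states:
        if state.get("_deleted"):
            # Soft delete: keep a tombstone so the deletion can itself be synced.
            local[state["id"]] = {"id": state["id"], "_deleted": True}
        else:
            local[state["id"]] = state
    return local


def visible(local: dict) -> list:
    """Records the app should show: everything not marked as deleted."""
    return [r for r in local.values() if not r.get("_deleted")]


store = {"1": {"id": "1", "type": "measure"}}
apply_states(store, [
    {"id": "1", "type": "measure", "_deleted": True},
    {"id": "2", "type": "note"},
    {"id": "3", "type": "note"},
])
```

With states, a delete is just another record version, so the same merge path covers adds, updates and deletes.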

unique identifiers

How do we generate a unique id in a distributed system?
- UUID: several implementations (RFC 4122)
- Local ids/global id: the server generates GUIDs, clients use local ids to manage their records

c1 -> server: GET /changes
server -> c1: {'data': {'guid': '58f0bdd7-1481'}}

c1 -> server: POST /merge
    { 'data': [ {'lid': '1', …}, {'lid': '2', …} ] }
server -> c1:
    { 'data': [ {'guid': '58f0bdd7-1400', 'lid': '1', …}, {'guid': '6f9f3ec9-1400', 'lid': '2', …} ] }
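The local id / global id scheme can be sketched like this, in Python rather than the deck's PHP. The helper names `assign_guids` and `remember_guids` are hypothetical, and generating the GUID with `uuid.uuid4()` is an assumption; the slides do not say which UUID variant the server used.

```python
import uuid


def assign_guids(records: list) -> list:
    """Server side: give every record without a guid a new one, keep the lid."""
    out = []
    for rec in records:
        guid = rec.get("guid") or str(uuid.uuid4())
        out.append({**rec, "guid": guid})
    return out


def remember_guids(local_by_lid: dict, response: list) -> None:
    """Client side: store the server's guid next to the local id."""
    for rec in response:
        local_by_lid[rec["lid"]]["guid"] = rec["guid"]


client = {"1": {"lid": "1"}, "2": {"lid": "2"}}
reply = assign_guids(list(client.values()))   # what POST /merge would answer
remember_guids(client, reply)
```

The response echoes each `lid` so the client can match the new GUID to the record it created offline.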

- the server handles conflict resolution
- mobile-generated data are "temporary" until synced to the server
- conflict resolution:
  - domain independent: last write wins
  - domain dependent: use domain knowledge to resolve

conflict resolution algorithm (plain data)

    function sync($data) {
        foreach ($data as $newRecord) {
            $s = findByGuid($newRecord->getGuid());

            // no conflict: the server has never seen this record
            if (!$s) {
                add($newRecord);
                send($newRecord);
                continue;
            }

            // remote wins: the client copy is newer
            if ($newRecord->updated > $s->updated) {
                update($s, $newRecord);
                send($newRecord);
                continue;
            }

            // server wins: push the stored copy back to the client
            updateRemote($newRecord, $s);
        }
    }

Example:

c1 holds:
    { 'lid': '1', 'guid': 'af54d', 'data': 'AAA', 'updated': '100' }
    { 'lid': '2', 'data': 'hello!', 'updated': '15' }

server holds:
    { 'guid': 'af54d', 'data': 'BBB', 'updated': '20' }

c1 sends both records with POST /merge.

- record lid 2 has no guid, so the server adds it (no conflict):
    { 'guid': 'e324f', 'data': 'hello!', 'updated': '15' }
- record af54d is newer on c1 (100 > 20), so the server copy is overwritten (remote wins):
    { 'guid': 'af54d', 'data': 'AAA', 'updated': '100' }

server replies:
    {'ok': { 'guid': 'af54d' }}
    {'update': { 'lid': '2', 'guid': 'e324f' }}
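The walkthrough above can be reproduced with a small last-write-wins merge, sketched here in Python rather than the deck's PHP. The function name `merge`, the injected `new_guid` callable and the shape of the "server wins" reply are assumptions. Timestamps are integers in this sketch; comparing the quoted strings shown on the slides would order them lexically ('100' < '20').

```python
def merge(server: dict, incoming: list, new_guid) -> list:
    """Last-write-wins merge of client records into `server` (keyed by guid).

    `new_guid` is a callable returning a fresh guid; it is injected so the
    example is deterministic. Returns the messages to send to the client.
    """
    replies = []
    for rec in incoming:
        stored = server.get(rec.get("guid"))
        if stored is None:
            # No conflict: unknown record, store it and report its guid.
            guid = rec.get("guid") or new_guid()
            server[guid] = {"guid": guid, "data": rec["data"], "updated": rec["updated"]}
            replies.append({"update": {"lid": rec["lid"], "guid": guid}})
        elif rec["updated"] > stored["updated"]:
            # Remote wins: the client copy is newer.
            stored.update(data=rec["data"], updated=rec["updated"])
            replies.append({"ok": {"guid": stored["guid"]}})
        else:
            # Server wins: send the stored state back to the client.
            replies.append({"update": {"lid": rec["lid"], **stored}})
    return replies


server = {"af54d": {"guid": "af54d", "data": "BBB", "updated": 20}}
replies = merge(
    server,
    [
        {"lid": "1", "guid": "af54d", "data": "AAA", "updated": 100},
        {"lid": "2", "data": "hello!", "updated": 15},
    ],
    new_guid=lambda: "e324f",
)
```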

conflict resolution algorithm (hierarchical data)

How to manage hierarchical data?
1) sync root record
2) update ids
3) sync child records

    { 'lid': '123456', 'type': 'baby', … }
    { 'lid': '123456', 'type': 'temperature', 'baby_id': '123456' }

    function syncHierarchical($data) {
        // parent records first
        sortByHierarchy($data);

        foreach ($data as $newRecord) {
            $s = findByGuid($newRecord->getGuid());

            if ($newRecord->isRoot()) {
                // no conflict: new root record, propagate its id to the children
                if (!$s) {
                    add($newRecord);
                    updateRecordIds($newRecord, $data);
                    send($newRecord);
                    continue;
                }

                if ($newRecord->updated > $s->updated) {
                    // remote wins
                    update($s, $newRecord);
                    updateRecordIds($newRecord, $data);
                    send($newRecord);
                    continue;
                } else {
                    // server wins
                    updateRecordIds($s, $data);
                    updateRemote($newRecord, $s);
                }
            } else {
                // child record: its parent id is already updated, use the plain algorithm
                sync([$newRecord]);
            }
        }
    }

Example:

c1 sends two new records with POST /merge; the second references the first by local id:
    { 'lid': '1', 'data': 'AAA', 'updated': '100' }
    { 'lid': '2', 'parent': '1', 'data': 'hello!', 'updated': '15' }

1) the server stores the root record and assigns it a guid:
    { 'lid': '1', 'guid': '32ead', 'data': 'AAA', 'updated': '100' }

2) the parent reference of the child is rewritten from the local id to the guid:
    { 'lid': '2', 'parent': '32ead', 'data': 'hello!', 'updated': '15' }

3) the child record is stored on the server:
    { 'lid': '2', 'parent': '32ead', 'data': 'hello!', 'updated': '15' }

server replies:
    {'update': { 'lid': '1', 'guid': '32ead' }}
    {'update': { 'lid': '2', 'guid': 'e324f' }}
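The three steps (sync root record, update ids, sync child records) can be sketched as below, in Python rather than the deck's PHP. This covers only the "no conflict" path for brand-new records and assumes a single level of nesting; the function name `sync_hierarchical`, the `parent` field and the injected `new_guid` callable are hypothetical.

```python
def sync_hierarchical(server: dict, incoming: list, new_guid) -> list:
    """Store parents before children and rewrite child references.

    Children point at their parent's local id through a `parent` field;
    `new_guid` is a callable returning a fresh guid.
    """
    # Parent records first: roots have no "parent" key (False sorts before True).
    ordered = sorted(incoming, key=lambda r: "parent" in r)
    lid_to_guid = {}
    replies = []
    for rec in ordered:
        rec = dict(rec)  # do not mutate the caller's data
        if "parent" in rec:
            # Update ids: swap the parent's local id for its guid.
            rec["parent"] = lid_to_guid.get(rec["parent"], rec["parent"])
        guid = rec.get("guid") or new_guid()
        lid_to_guid[rec["lid"]] = guid
        stored = {k: v for k, v in rec.items() if k != "lid"}
        server[guid] = {**stored, "guid": guid}
        replies.append({"update": {"lid": rec["lid"], "guid": guid}})
    return replies


guids = iter(["32ead", "e324f"])
server = {}
replies = sync_hierarchical(
    server,
    [
        {"lid": "2", "parent": "1", "data": "hello!", "updated": 15},
        {"lid": "1", "data": "AAA", "updated": 100},
    ],
    new_guid=lambda: next(guids),
)
```

The child arrives first in the input on purpose: sorting by hierarchy is what guarantees the parent's guid exists before the child's reference is rewritten.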

enforcing domain constraints

e.g. "only one temperature can be registered in a given day"

How do we enforce domain constraints on data?
1) relax constraints
2) integrate constraints in the sync algorithm

- from findByGuid to findSimilar
- first look up by GUID, then by domain rules
- "two measures are similar if they refer to the same date"

Example:

server holds:
    { 'guid': 'af54d', 'when': '20141005' }

c1 creates a measure for the same day while offline:
    { 'lid': '1', 'when': '20141005' }

c1 sends it with POST /merge. The record has no GUID, but the server finds a similar record by date, so no duplicate is created and c1 ends up with:
    { 'guid': 'af54d', 'when': '20141005' }
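A minimal sketch of the findByGuid to findSimilar change, in Python rather than the deck's PHP. The function name `find_similar` mirrors the slide; the dict-based store and the use of the `when` field as the domain rule for measures are assumptions made for illustration.

```python
def find_similar(server: dict, rec: dict):
    """Look up by GUID first, then fall back to the domain rule.

    Domain rule assumed here: two measures are similar when they share
    the same `when` date.
    """
    stored = server.get(rec.get("guid"))
    if stored is not None:
        return stored
    for candidate in server.values():
        if candidate["when"] == rec["when"]:
            return candidate
    return None


server = {"af54d": {"guid": "af54d", "when": "20141005"}}
match = find_similar(server, {"lid": "1", "when": "20141005"})
miss = find_similar(server, {"lid": "2", "when": "20141006"})
```

Once a similar record is found, the usual conflict resolution (remote wins or server wins) can run against it as if the GUIDs had matched.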

dealing with binary data

- Binary data are uploaded via a custom endpoint
- Sync data remain small
- Uploads can be resumed

Two steps*
1) data are synced to the server
2) related images are uploaded

* this means a record can exist without its file for some time

c1 -> server: POST /merge
    { 'lid': 1, 'type': 'baby', 'image': 'myimage.jpg' }
server -> c1:
    { 'lid': 1, 'guid': 'ac435-f8345' }
c1 -> server: POST /upload/ac435-f8345/image
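The two-step flow can be sketched with an in-memory stand-in for the server, in Python rather than the deck's PHP. The class and method names are hypothetical, and resuming by appending chunks and returning the current offset is only one possible mechanism; the slides state that uploads can be resumed but not how.

```python
class Server:
    """In-memory stand-in for POST /merge and POST /upload/{guid}/image."""

    def __init__(self):
        self.records = {}
        self.images = {}

    def merge(self, rec: dict, guid: str) -> dict:
        # Step 1: only the small record travels with the sync call;
        # the image field holds a file name, not the bytes.
        self.records[guid] = {**rec, "guid": guid}
        return {"lid": rec["lid"], "guid": guid}

    def upload(self, guid: str, chunk: bytes) -> int:
        # Step 2: bytes are appended per guid; the returned offset lets a
        # client continue from there after an interruption.
        if guid not in self.records:
            raise KeyError(guid)
        self.images[guid] = self.images.get(guid, b"") + chunk
        return len(self.images[guid])

    def has_image(self, guid: str) -> bool:
        return guid in self.images


server = Server()
reply = server.merge({"lid": 1, "type": "baby", "image": "myimage.jpg"}, guid="ac435-f8345")
pending = not server.has_image(reply["guid"])   # record exists, file not yet
offset = server.upload(reply["guid"], b"\xff\xd8")
offset = server.upload(reply["guid"], b"\xff\xd9")
```

The `pending` flag shows the window mentioned in the footnote: between the two steps the record is on the server without its file.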

What we learned

- Implementing this stuff is tricky
- Explore existing solutions if you can
- Understanding the domain is important

vector clocks


CRDT

Conflict-free Replicated Data Types (CRDTs)

Constraining the types of operations in order to:
- ensure convergence of changes to shared data by uncoordinated, concurrent actors
- eliminate network failure modes as a source of error

Math!!!

Bounded-join semilattices
- join operation defining a least upper bound
- partially ordered set
- always increasing
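To make the semilattice properties concrete, here is a grow-only counter (G-Counter), a standard textbook CRDT that is not taken from the slides, sketched in Python. The replica names are made up for the example. Its merge is an element-wise maximum, which is the join (least upper bound) of two states, so merging in any order, or more than once, gives the same result.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged with element-wise max."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica: str, by: int = 1) -> None:
        # Each replica only ever increases its own slot (always increasing).
        self.counts[replica] = self.counts.get(replica, 0) + by

    def merge(self, other: "GCounter") -> "GCounter":
        # Join = least upper bound of the two states under the
        # slot-wise <= partial order.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter(), GCounter()
a.increment("phone", 2)      # two offline edits on one device
b.increment("tablet", 3)     # three on another
ab, ba = a.merge(b), b.merge(a)
```

No conflict resolution step is needed here: the restricted operation set (increment only) is what guarantees convergence.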

Couchbase Mobile

Gateway handles sync

Data flows through channels
- partition data set
- authorization
- limit the data

Use revision trees

Riak

- Distributed DB
- Eventual/strong consistency
- Data types
- Configurable conflict resolution
  - db level for built-in data types
  - application level for custom data

Questions?

Please leave feedback! https://joind.in/11797

That’s all folks!

Links

Vector Clocks
- http://basho.com/why-vector-clocks-are-easy/
- http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks
- http://basho.com/why-vector-clocks-are-hard/

CRDTs
- http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
- http://www.infoq.com/presentations/problems-distributed-systems
- https://www.youtube.com/watch?v=qyVNG7fnubQ

Riak
- http://docs.basho.com/riak/latest/dev/using/conflict-resolution/

Couchbase Sync Gateway
- http://docs.couchbase.com/sync-gateway/
- http://www.infoq.com/presentations/sync-mobile-data

API
- http://developers.amiando.com/index.php/REST_API_DataSync
- https://login.syncano.com/docs/rest/index.html

Credits

- phones: https://www.flickr.com/photos/15216811@N06/14504964841
- wat: http://uturncrossfit.com/wp-content/uploads/2014/04/wait-what.jpg
- darth: http://www.listal.com/viewimage/3825918h
- blueprint: http://upload.wikimedia.org/wikipedia/commons/5/5e/Joy_Oil_gas_station_blueprints.jpg
- building: http://s0.geograph.org.uk/geophotos/02/42/74/2427436_96c4cd84.jpg
- brownfield: http://s0.geograph.org.uk/geophotos/02/04/54/2045448_03a2fb36.jpg
- no connection: https://www.flickr.com/photos/77018488@N03/9004800239
- no internet con: https://www.flickr.com/photos/roland/9681237793
- vector clocks: http://en.wikipedia.org/wiki/Vector_clock
- crdts: http://www.infoq.com/presentations/problems-distributed-systems
