SUP 2.1 KNOWLEDGE TRANSFER DATA MODELING
MICHAEL HO
OCTOBER 20, 2011
Company Confidential – December 12, 2011
CONTENTS
• MBO Definition
• Data Loading
• Cache Policies
• Data Model Implications on Client
• Challenges in the Field
• SUP 2.1.1 Preview
MBO DEFINITION
THE MBO DATA MODEL IS THE CLIENT DATA MODEL
MBO DATA MODEL
• MBO data model is NOT a model for backend business objects
• It is the data model for the mobile application on the device
– Example of what to avoid: a user replicates the data model for the desktop application on the device, where many of the attributes are never used, leading to slow synchronization and performance degradation
• Empirical data shows that the MBO data model impacts not only mobile application development but also synchronization performance
• MBO definition should take into consideration mobile database limitations
• Synchronization group defines what to synchronize
• Cache group defines what and when to load from backend to fill the tables in CDB
MBO DATA MODEL
• Relationship enables navigation, whole-part removal, cascade operations
– Supports association (items → product) and composition (sales order → items)
• Surrogate key scheme on the client database
– As primary key for synchronization
– Foreign key to implement relationship
– Cache associates surrogate key with backend primary key
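The surrogate key scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `KeyMap`, `new_surrogate`, `bind`, and the key value `"SO-1001"` are hypothetical names and are not part of any SUP API. The sketch shows a cache-side table that hands out surrogate keys for rows created on the device and later links each one to the primary key that the backend returns.

```python
import itertools


class KeyMap:
    """Hypothetical cache-side table linking surrogate keys to backend primary keys."""

    def __init__(self):
        self._next = itertools.count(1)   # surrogate key generator
        self._to_backend = {}             # surrogate key -> backend primary key

    def new_surrogate(self):
        """Allocate a surrogate key for a row created on the device."""
        key = next(self._next)
        self._to_backend[key] = None      # backend key not known yet
        return key

    def bind(self, surrogate, backend_pk):
        """Record the primary key returned by the backend create operation."""
        self._to_backend[surrogate] = backend_pk

    def backend_key(self, surrogate):
        return self._to_backend[surrogate]


keys = KeyMap()
order = keys.new_surrogate()      # sales order created offline
item_fk = order                   # child item refers to the parent's surrogate key
keys.bind(order, "SO-1001")       # assumed value returned by the backend create
```

Because the child row stores the parent's surrogate key as its foreign key, the relationship holds on the device before the backend key is known, and `keys.backend_key(item_fk)` resolves to the backend key once it has been bound.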
MBO DATA MODEL
• Backend create operation is expected to return the primary key so that the association between surrogate key and primary key can be formed
• Synchronization parameters serve as subscription to download data
– Can have multiple sets of synchronization parameters at any given time
– Data corresponding to these sets of synchronization parameters are downloaded to the device
– May delete the entire collection of synchronization parameter sets to reclaim storage space
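As a rough illustration of synchronization parameters acting as subscriptions, the Python sketch below keeps a collection of parameter sets and downloads every row that matches at least one of them. The `Subscriptions` class, its methods, and the `region` attribute are hypothetical and chosen only for the example; the real filtering is performed by the platform.

```python
class Subscriptions:
    """Hypothetical device-side record of synchronization parameter sets."""

    def __init__(self):
        self.param_sets = set()

    def subscribe(self, **params):
        # Each distinct parameter set acts as one subscription.
        self.param_sets.add(frozenset(params.items()))

    def download(self, backend_rows):
        # Rows matching any active parameter set are downloaded.
        return [
            row for row in backend_rows
            if any(all(row.get(k) == v for k, v in ps) for ps in self.param_sets)
        ]

    def clear(self):
        # Dropping the entire collection lets the device reclaim the storage.
        self.param_sets.clear()


rows = [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": "APJ"},
    {"id": 3, "region": "AMER"},
]
subs = Subscriptions()
subs.subscribe(region="EMEA")
subs.subscribe(region="APJ")
downloaded = subs.download(rows)   # rows 1 and 2
```

After `subs.clear()`, no parameter set remains, so a subsequent `download` returns nothing.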
GOOD MBO MODELING PRACTICE: MBO DEFINITION
• Every attribute is used by the mobile application
• MBO instance = database row (must fit within a page)
– Large row size requires larger page size, impacting performance on device and synchronization
– Do not define an MBO with more than 50 attributes
– Do not use the STRING data type. Instead, use STRING(n) to define the maximum string length (STRING defaults to 300)
– Promotion of VARCHAR(n) to LONG VARCHAR can occur during code generation if the specified page size is less than the calculated maximum row size
– Use a larger page size during code generation and run with a smaller one on device if the normal row size is much lower than the maximum
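The effect of bare STRING attributes on row size can be seen with a back-of-the-envelope calculation. The Python sketch below is an assumption-laden estimate, not the code generator's actual formula: it simply sums the declared maximum lengths, treats each character as `bytes_per_char` bytes, and ignores per-row and per-page overhead. The function names and the sample attributes are hypothetical.

```python
DEFAULT_STRING_LENGTH = 300  # from the slide: STRING without a length defaults to 300


def max_row_size(attributes, bytes_per_char=1):
    """Rough upper bound on row size in bytes.

    `attributes` maps attribute name -> declared length, or None for a bare STRING.
    The formula is an illustrative assumption, not the code generator's calculation.
    """
    return sum(
        (length if length is not None else DEFAULT_STRING_LENGTH) * bytes_per_char
        for length in attributes.values()
    )


def fits_in_page(attributes, page_size, bytes_per_char=1):
    """True when the estimated maximum row size does not exceed the page size."""
    return max_row_size(attributes, bytes_per_char) <= page_size


loose = {"name": None, "city": None, "notes": None, "status": None}   # bare STRING
tight = {"name": 40, "city": 30, "notes": 200, "status": 10}          # STRING(n)
```

Under these assumptions the four bare STRING attributes give a maximum row of 1200 bytes, which does not fit a 1k page, while the sized version needs 280 bytes and does.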
GOOD MBO MODELING PRACTICE: INDEX
• Use the minimum number of indexes to support queries used by the mobile application
– Indexes slow down update operations on device and synchronization, especially on low-end devices
– Uncheck the findByPrimaryKey and findAll queries generated for each MBO by default if they are not needed by the mobile application
– Determine if an index should be created for user-defined object queries
GOOD MBO MODELING PRACTICE: SYNCHRONIZATION GROUP
• Use synchronization group to add flexibility on what to synchronize
– Controls which MBOs to synchronize at a particular time
– Supports prioritization, e.g. get service tickets without details
– Limits the amount of data during synchronization for customers facing impaired connectivity to avoid repeatedly trying to complete a large synchronization
– Think twice if the synchronization group has more than 5 members
– Runtime flexibility available by combining synchronization groups to reduce overhead
– Relationships across synchronization groups may result in incomplete object graphs on the client
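One way to review a model for relationships that cross synchronization groups is a simple check like the Python sketch below. The MBO names, group names, and the function itself are hypothetical; the check only flags candidates for incomplete object graphs and does not reflect any tooling shipped with SUP.

```python
def cross_group_relationships(group_of, relationships):
    """Return relationships whose two MBOs sit in different synchronization groups."""
    return [
        (source, target)
        for source, target in relationships
        if group_of[source] != group_of[target]
    ]


# Hypothetical model: MBO name -> synchronization group name.
group_of = {
    "ServiceTicket": "tickets",
    "TicketDetail": "details",
    "Customer": "tickets",
}
relationships = [("ServiceTicket", "TicketDetail"), ("ServiceTicket", "Customer")]
crossing = cross_group_relationships(group_of, relationships)
```

Here only the ServiceTicket to TicketDetail relationship is reported: a client that synchronizes the `tickets` group alone would hold tickets whose details have not yet arrived.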
GOOD MBO MODELING PRACTICE: CACHE GROUP
• Use cache group to control what and when to load data into CDB
– Break up expensive data retrievals from backend
– Relationships across cache groups may result in incomplete object graphs in CDB
– Mapping cache group to synchronization group reduces unnecessary refresh not related to the triggering synchronization
– Avoid circular dependencies between cache groups
– Similarly, avoid driving the load of an MBO in one cache group based on the attributes of an MBO in another cache group
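Circular dependencies between cache groups can be detected mechanically once the dependencies are written down. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later); the `find_cycle` helper and the dependency maps passed to it are hypothetical and stand in for whatever description of the model is available.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(depends_on):
    """Return one dependency cycle among cache groups, or None if there is none.

    `depends_on` maps a cache group to the groups whose data drives its load.
    """
    try:
        TopologicalSorter(depends_on).prepare()
    except CycleError as err:
        return err.args[1]   # list of nodes forming the cycle
    return None


acyclic = find_cycle({"orders": {"customers"}, "customers": set()})
cyclic = find_cycle({"orders": {"customers"}, "customers": {"orders"}})
```

`acyclic` is `None`, while `cyclic` holds a list that starts and ends with the same cache group, naming the groups involved in the loop.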
GOOD MBO MODELING PRACTICE: PRIMARY KEY
• MBO primary key should match backend business key
• If an MBO has a composite primary key and the EIS load operation parameters do not match it in scope, data may be duplicated in the cache
• If no primary key is modeled, an implicit composite primary key that is made up of all columns is generated
GOOD MBO MODELING PRACTICE: SYNCHRONIZATION PARAMETER
• Synchronization parameters should be defined and mapped to all result-affecting load parameters
– Example: an MBO uses WS operation getAllBooksByAuthor(Author, userKey) where userKey is simply a mechanism to authenticate a user and does not affect the results of the operation. In this case "userKey" is not a result-affecting parameter and therefore should not be mapped to a synchronization parameter
– Example: an MBO uses WS operation getEmployees(Group, department). Group is mapped to a synchronization parameter but department is mapped to a personalization key. The partition identified by Group is now constantly overwritten
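The getEmployees case can be reproduced with a toy cache in Python. Everything here is a simplified assumption: the cache is a dictionary keyed by the partition key, `load` replaces a whole partition, and `get_employees` is a stand-in for the web service rather than a real operation. The point is only to show why leaving a result-affecting parameter out of the partition key causes overwrites.

```python
def load(cache, partition_key, rows):
    """Replace the contents of one cache partition with a fresh EIS result set."""
    cache[partition_key] = rows


# Stand-in for the backend operation getEmployees(Group, department).
def get_employees(group, department):
    return [f"{group}/{department}/emp{i}" for i in range(2)]


# Wrong: only Group identifies the partition, department is a personalization key.
bad_cache = {}
load(bad_cache, ("sales",), get_employees("sales", "east"))
load(bad_cache, ("sales",), get_employees("sales", "west"))   # overwrites "east"

# Right: every result-affecting parameter is part of the partition key.
good_cache = {}
load(good_cache, ("sales", "east"), get_employees("sales", "east"))
load(good_cache, ("sales", "west"), get_employees("sales", "west"))
```

`bad_cache` ends up with a single partition that holds only the "west" rows, whereas `good_cache` keeps both result sets side by side.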
GOOD MBO MODELING PRACTICE: PARTITION
• Partitions are unique EIS result sets identified by the result-affecting load parameters
• Use multiple partitions whenever possible. By default, there is a single partition
– Increase parallelism: load vs. load, load vs. update
– Reduce refresh latency
– Data/rows must not be contained in multiple partitions. Otherwise the data needs to be constantly updated, even if the actual business data didn't change, as it will bounce between partitions. This severely impacts performance and downloads incorrect data to the client
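A quick way to test whether a partitioning scheme lets rows appear in more than one partition is to compare the primary keys of sample result sets. The Python sketch below does that; the partition labels and key values are hypothetical sample data, and the helper is not part of SUP.

```python
from collections import defaultdict


def rows_in_multiple_partitions(partitions):
    """Map each primary key found in more than one partition to those partitions."""
    seen = defaultdict(list)
    for partition_key, primary_keys in partitions.items():
        for pk in primary_keys:
            seen[pk].append(partition_key)
    return {pk: keys for pk, keys in seen.items() if len(keys) > 1}


# Hypothetical result sets keyed by the partition's load parameter value.
partitions = {
    "region=EMEA": ["C1", "C2"],
    "region=APJ": ["C2", "C3"],   # C2 is returned for both partitions
}
overlap = rows_in_multiple_partitions(partitions)
```

Any key reported in `overlap` (here `C2`) is a row that would bounce between partitions on every refresh.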
GOOD MBO MODELING PRACTICE: PARTITION
– Partitions can also be used to get data in "real time" when using a cache interval of zero and very small partitions in their own cache group/sync group relationship
– Partition granularity
    – Too coarse: long refresh time
    – Too fine: high overhead due to EIS/SUP chattiness. It is more efficient to have a reasonably chunky interface between networked server components
GOOD MBO MODELING PRACTICE: SHARED READ MBO
• Use shared read MBO when appropriate
– Populate multiple MBOs with a single data retrieval invocation for efficient data loading
– All MBOs share the same partition key, so they will always be loaded/refreshed together when the cache expires
– "Apply Results to Cache" always applies only to the MBO instance the operation is executed on. You cannot fill multiple MBOs from a single operation output/result*
– For "Apply Results to Cache" to work, the output from the operation has to look just like the output from the primary read*
GOOD MBO MODELING PRACTICE: SHARED READ MBO
– Shared read is very useful to read objects into the cache efficiently and transactionally, but you still have to use client parameters for transactional write operations*
    – Child rows cannot be applied to cache
– Alternatively, use MLI to chain discrete creation operations within the hierarchy
    – Root create operation returns primary key for association with surrogate key and child creation
– Apply results should be used whenever possible if one wants to maintain the surrogate -> business key affinity
– Invalidate cache must be used if the device-side content must be confirmed in the most timely fashion
    – Multiple partitions reduce the invalidate-refresh cost to retrieve the result
DATA LOADING
FILLING THE CDB WITH ENTERPRISE DATA
DATA LOADING DESIGN PREPARATION
• Know Thy Data
– Reference vs. Transactional: Mostly Read vs. Read/Write
– Shared vs. Private
– Sources of changes: coherency implications
– Update frequency and freshness requirement
– Access pattern: peak and valley or distributed
– Data volume: size does matter
• Know Thy Data Sources
– Efficiency of interface
    – Protocol: JCO vs. Web Services
    – API: Number of invocations required
– Push vs. Pull
– Reaction to peak load
RULE OF THUMB RE: DATA LOADING
• Do not use existing API just because it is there
– Evaluate its efficiency for loading data into CDB
– Develop custom mobile adaptation if appropriate
– Load what is needed, not what is provided
• Use an efficient interface (protocol) for high data volume
• Use DCN for very large data volume
– Avoids large data transfer and differential calculation
– Does not help with initial loading
• Use multiple partitions to split the loading whenever possible
– Private data should consider the use of "partition by requester and device identity" or equivalent
– Develop backend API to load by partition if appropriate
RULE OF THUMB RE: DATA LOADING
• Do not mix DCN with scheduled or on demand
• Do not use very large DCN messages to improve efficiency
– Excessive memory consumption to process large message
– May block download due to many locked rows
• Use cache groups to group MBOs with similar usage characteristics to tune load performance
– Reference vs. transactional
– Private vs. shared
• Use shared read operations if possible
– Reduce backend interactions
REFERENCE DATA
• [Mostly Read, Large Data Volume, Low Volatility]
• Strategy: Cache and Share
• On Demand with non-zero cache interval
– Alleviate large initial data loading issue through partitioning if users take different subsets of the reference data
    – Large initial load is spread out over time
    – Load data on demand and in parallel
– Satisfy data freshness requirement through cache interval
REFERENCE DATA
• Scheduled
– Match backend with predetermined reference data update schedule, e.g. batch run @ midnight
– Partitioning to restrict loading only for subscribed data
– Not recommended for high data freshness if backend data is volatile, as we are limited by the update interval
REFERENCE DATA
• DCN (Fill and Filter Model)
– Enables backend data change propagation to cache with lowest cost compared to on demand or scheduled
– Enables SIS without extra work
– Supports high data freshness through proper change detection interval
– ∞ cache interval
– Use synchronization parameters for filtering download data
– High initial load cost for large data volume can be an issue
SERVER INITIATED SYNCHRONIZATION: REFERENCE DATA
• On Demand
– Change detection functional if someone refreshed the data + cache expiration (NZCI)
• Scheduled
– Change detection functional whenever the change is pulled into the system
• DCN
– Most optimal with SIS as changes are pushed to the cache
– Change detection functional after push
• Notification MBO pattern for On Demand with ZCI
READ/WRITE DATA WITH CACHE INTERVAL
• Data in cache using Non Zero Cache Interval ≠ System of Record
– It helps to reduce the number of data retrieval invocations to backend
– Results applied to the cache are not always the same as what is in the backend, even when the operation succeeds
– Race condition can produce a stale result in the cache until next refresh. In case of DCN, it may be until the next update
TRANSACTIONAL DATA (PRIVATE)
• [Read/Write, Per User Data, Low Volume, Moderate Volatility]
• On Demand with zero cache interval
– Data is always consistent with backend
– Requester/Device based or equivalent partitioning limits refresh cost
– SIS can be implemented through notification MBO pattern
– Evaluate if backend can handle peak load if users tend to synchronize within a certain time of the day
• On Demand with non-zero cache interval
– No benefit unless user synchronizes repeatedly in succession, e.g. submitting operation and downloading of applied results
TRANSACTIONAL DATA (PRIVATE)
• Scheduled
– Enables SIS based on cache interval
– Performance implications
• DCN
– Enables SIS without extra work
– Supports high data freshness through proper change detection interval
– Data is not always consistent with backend. May require another update to fix the inconsistency
See notes
TRANSACTIONAL DATA (SHARED)
• Sharing at two levels
– MBO instances level (ML)
– Partition level (PL)
• On Demand with non-zero cache interval
– SIS based on user synchronization activity + expiration
– Partition by Requester and Device Identity (ML)
    – Duplicated rows in non-overlapping partitions
    – Duplicated partitions if user has multiple devices
– Partition by user specific identity (ML)
    – Make sure that the user specific identity is combined with backend primary key to form the MBO primary key to avoid shared row bouncing between partitions
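The advice to combine the user-specific identity with the backend primary key can be illustrated with a tiny Python example. The user names, the key value `"T42"`, and the `mbo_primary_key` helper are hypothetical; the sketch only counts how many distinct cache rows result under each keying scheme.

```python
def mbo_primary_key(user_id, backend_pk):
    """Combine the user-specific identity with the backend key (illustrative)."""
    return (user_id, backend_pk)


# The same backend row "T42" is visible to two users, each with their own partition.
shared_backend_pk = "T42"
users = ("alice", "bob")

# Keyed by backend key alone: both partitions claim the same single cache row.
by_backend_key = {shared_backend_pk for _user in users}

# Keyed by (user, backend key): each partition owns its own copy of the row.
by_combined_key = {mbo_primary_key(user, shared_backend_pk) for user in users}
```

With the backend key alone there is one row contested by two partitions, which is the bouncing case. With the combined key there are two rows, one per partition, at the cost of duplicated data.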
CACHE POLICY
STAGING VS. CACHING
CACHE POLICY: ON DEMAND
• Refresh triggered by synchronization
• Zero cache interval
– Allows latest data from backend to be retrieved
– Unless data volume is small, should be coupled with partitioning
– User synchronization activities allow changes to be detected
• Non-zero cache interval
– Reduce data loading invocations against backend
– Coupled with partitioning to reduce amount of data to be loaded per invocation
– User synchronization activities + cache interval expiration allow changes to be detected
– Chances of inconsistency with backend
– Increase parallelism for shared data
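The difference between a zero and a non-zero cache interval can be sketched with a toy on-demand cache in Python. This is a simplified model under stated assumptions: `OnDemandCache` and its fields are hypothetical, time is passed in explicitly as seconds, and there is no partitioning or concurrency. It is not how the SUP cache is implemented.

```python
class OnDemandCache:
    """Hypothetical on-demand cache: refresh is triggered by a synchronization."""

    def __init__(self, loader, cache_interval):
        self.loader = loader                  # callable that reads from the backend
        self.cache_interval = cache_interval  # seconds; 0 means always refresh
        self.data = None
        self.loaded_at = None
        self.backend_calls = 0

    def synchronize(self, now):
        expired = (
            self.loaded_at is None
            or now - self.loaded_at >= self.cache_interval
        )
        if expired:
            self.data = self.loader()
            self.loaded_at = now
            self.backend_calls += 1
        return self.data


backend = {"value": 1}
zci = OnDemandCache(lambda: dict(backend), cache_interval=0)
nzci = OnDemandCache(lambda: dict(backend), cache_interval=60)

zci.synchronize(now=0)
nzci.synchronize(now=0)
backend["value"] = 2                 # backend changes after the first load
fresh = zci.synchronize(now=10)      # zero interval: reloads on every synchronization
stale = nzci.synchronize(now=10)     # inside the interval: served from the cache
```

The zero-interval cache has called the backend twice and sees the new value. The non-zero-interval cache has called it once and still returns the old value until the interval expires, which is the trade-off between backend load and consistency listed above.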
CACHE POLICY: SCHEDULED
• Automatic refresh based on interval
• Cache interval is base case notification granularity
• Partitioning helps to spread out initial data loading
• Match backend data update frequency especially for reference data
• Chances of inconsistency with backend
CACHE POLICY: DCN
• Single partition
• Download filtering via synchronization parameters
• Initial data loading can take a long time. Synchronization must wait for loading to complete
• Concurrency with synchronization @ row level
• DCN takes advantage of multiple SUP servers in the cluster to parallelize loading
• Use a notification MBO to let device know data is ready
• Referred-to MBOs have to be pushed before referring MBOs
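The ordering rule for DCN pushes amounts to a topological sort of the MBO reference graph. The Python sketch below uses the standard-library `graphlib` module (Python 3.9 or later) to derive one valid push order; the MBO names and their references are hypothetical sample data, and how the messages are actually sent is outside the scope of the example.

```python
from graphlib import TopologicalSorter

# Hypothetical model: each MBO maps to the MBOs it refers to.
refers_to = {
    "SalesOrderItem": {"SalesOrder", "Product"},
    "SalesOrder": {"Customer"},
    "Customer": set(),
    "Product": set(),
}

# static_order() yields a node only after all the nodes it depends on.
push_order = list(TopologicalSorter(refers_to).static_order())
```

In the resulting `push_order`, Customer precedes SalesOrder, and both SalesOrder and Product precede SalesOrderItem. The relative order of unrelated MBOs is not significant.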
DATA MODEL IMPLICATIONS ON CLIENT
LIMITATIONS OF MOBILE DATABASE
CLIENT IMPLICATIONS
• Database page size governed by maximum row size derived from MBO definition
– Lots of attributes or lengthy ones → larger rows → larger page size
– On some devices like the BlackBerry, more than memory is consumed: object handles
– Based on our observations, page sizes between 1k and 4k seem to provide best overall performance
– Do not forget to account for non-Latin encoding, which will result in larger row size
– Large rows mean fewer rows per page, so more pages must be fetched or cached. For MBOs used in list views, this can impact the UI response
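The combined effect of encoding and row size on page usage can be estimated with simple arithmetic. The Python sketch below assumes, purely for illustration, that text is stored as UTF-8 and that a page holds only row data with no overhead; the actual storage format of the device database may differ. Both helper functions are hypothetical.

```python
def encoded_size(text, encoding="utf-8"):
    """Bytes needed to store `text` in the given encoding."""
    return len(text.encode(encoding))


def rows_per_page(page_size, row_size):
    """Whole rows that fit in one page, ignoring page and row overhead."""
    return page_size // row_size


latin = "a" * 100          # 100 Latin characters
cjk = "\u6f22" * 100       # 100 CJK characters (U+6F22)

latin_bytes = encoded_size(latin)
cjk_bytes = encoded_size(cjk)
latin_rows = rows_per_page(4096, latin_bytes)
cjk_rows = rows_per_page(4096, cjk_bytes)
```

Under these assumptions the same 100-character value takes 100 bytes in Latin text and 300 bytes in CJK text, so a 4k page holds 40 rows in the first case and 13 in the second.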
CLIENT IMPLICATIONS
• Large MBO instance leads to slow and expensive object instantiation
• Object query returns object(s) and dynamic query returns result set. Use dynamic query to bypass object instantiation and selectively retrieve a subset of attributes
• For one → many or one ↔ many relationship where count(many) is large
– Navigation and cascade operation can be expensive
• Does the data model enable application to use simple queries for most use cases?
– Simple joins are expensive on mobile devices. This is true even for iPhone and the like
• Indexes slow down synchronization and updates
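The contrast between an object query and a dynamic query can be shown by analogy with Python's built-in `sqlite3` module. This is not the SUP client API: the `Customer` class stands in for a generated MBO class, the table and its columns are hypothetical, and SQLite stands in for the device database. The sketch only shows the difference between materializing full objects and fetching the few columns a list view needs.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:            # stand-in for a generated MBO class
    id: int
    name: str
    city: str
    notes: str


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT, notes TEXT)")
db.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Acme", "Dublin", "x" * 1000), (2, "Globex", "Austin", "y" * 1000)],
)

# Object-style query: every column is read and a full object is built per row.
objects = [Customer(*row) for row in db.execute("SELECT * FROM customer ORDER BY id")]

# Dynamic-style query: only the columns a list view needs, returned as plain rows.
list_rows = db.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
```

The object-style query carries the 1000-character `notes` value for every row even though a list view never shows it, while the dynamic-style query returns just `(id, name)` pairs.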