(ATS03-PLAT08) Optimizing Protocol Performance

Eddy Vande Water, Director, EMEA Field Application, eddy.vandewater@accelrys.com
Andrew LeBeau, Advisory Product Manager, andrew.lebeau@accelrys.com

An overview of techniques for improving Pipeline Pilot protocol performance. Simple principles applied while developing protocols can improve overall performance. But first, let’s see how to identify bottlenecks and avoid some very common mistakes!

The information on the roadmap and future software development efforts is intended to outline general product direction and should not be relied on in making a purchasing decision.

Agenda

• Profiling and Refactoring
• Data Access
• Data Computing
• Other key T&T
• Server optimization
• Summary

Protocol Refactoring

• Consider the first version (V1) of a “complete” protocol, perhaps ~30 components.

• Protocol building is typically an incremental process with much iterative design. Completing V1 therefore documents an intellectual process.

• However, very significant optimizations can be achieved by reviewing V1 and considering a major (perhaps complete) refactoring of the protocol, using the knowledge gained from building V1.

Component Profiling

• Identify protocol bottlenecks
• Ctrl+T to toggle between options
  – Absolute compute time (sec)
  – Compute time as percentage of total execution time
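
The two profiling views can be illustrated outside Pipeline Pilot with a minimal Python sketch. This is not Pipeline Pilot code: the stage names and the `profile_stages` helper are hypothetical stand-ins for components, and the sketch only shows how absolute time and share of total time relate.

```python
import time

def profile_stages(stages, data):
    """Run each (name, function) stage in order and record its wall-clock time."""
    timings = []
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)
        timings.append((name, time.perf_counter() - start))
    total = sum(seconds for _, seconds in timings) or 1e-12  # avoid division by zero
    # Report both views: absolute seconds and percentage of total time
    report = [(name, seconds, 100.0 * seconds / total) for name, seconds in timings]
    return data, report

# Hypothetical stages standing in for components
stages = [
    ("read", lambda _: list(range(200_000))),
    ("compute", lambda rows: [r * r for r in rows]),
    ("filter", lambda rows: [r for r in rows if r % 2 == 0]),
]

result, report = profile_stages(stages, None)
for name, seconds, percent in report:
    print(f"{name:<8} {seconds:8.4f} s  {percent:5.1f} %")
```

The percentage view makes the dominant stage obvious even when the absolute times are small.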

Demo: Protocol version 01

Protocol development flow
• Get a big file with activity on several targets and lots of other properties
• Need to pivot the data
• Only interested in one target
• Need the structure
• Join my activity
• Compute a new property
• Need additional data from the database
• Only interested in a range of data
• Create a nice report

Demo: Protocol version 02

28 seconds instead of 6 minutes!

Why? Because I used some simple principles!

Data Access

• Keep the records as small as possible for what you need to do. Don’t read in things just because they are in the file; read in only what you will use! Don’t pass anything further down the pipeline than it is needed.

• If writing to disk to pass information between pipelines, caches are faster than delimited text (or any other file).
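
As a rough illustration of "read in only what you will use", here is a minimal Python sketch, assuming a small delimited file with more columns than the protocol needs. The column names and the `read_needed` helper are hypothetical, and this is an analogy rather than Pipeline Pilot reader configuration.

```python
import csv
import io

# Hypothetical input: a wide delimited file with columns that are never used
raw = io.StringIO(
    "id,target,activity,comment,vendor\n"
    "1,T1,5.2,ok,A\n"
    "2,T2,6.1,check,B\n"
    "3,T1,7.4,ok,C\n"
)

NEEDED = ("id", "target", "activity")  # only the properties used downstream

def read_needed(handle, needed):
    """Yield one small dict per row, dropping every column not in `needed`."""
    for row in csv.DictReader(handle):
        yield {key: row[key] for key in needed}

records = list(read_needed(raw, NEEDED))
print(records[0])
```

Dropping unused properties at the reader keeps every later stage working on smaller records.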

Merge / Join / Group / Sort / Cluster / etc.

• All create implicit caches
• Filter before merging/caching
• Reduce the number of properties
• Merge on a sub-stream, then join back
• Sort before join, on the primary key
• Cache Writer: Use Pre-Index options if the cache will later be joined on
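
The "filter before merging" and "sort before join" advice can be sketched in plain Python. This is an assumed illustration, not how the Pipeline Pilot Join component is implemented: the `merge_join` helper and the sample records are hypothetical, and the join assumes both inputs are sorted on a unique key.

```python
def merge_join(left, right, key):
    """Inner-join two lists of dicts that are both sorted on a unique `key`."""
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            joined.append({**left[i], **right[j]})
            i += 1
            j += 1
    return joined

activities = [
    {"id": 3, "target": "T1", "activity": 7.4},
    {"id": 1, "target": "T1", "activity": 5.2},
    {"id": 2, "target": "T2", "activity": 6.1},
]
structures = [
    {"id": 2, "smiles": "CCO"},
    {"id": 1, "smiles": "C"},
    {"id": 3, "smiles": "CCC"},
]

# 1. Filter first so the join only sees the records of interest
wanted = [a for a in activities if a["target"] == "T1"]
# 2. Sort both sides on the primary key, then join in a single pass
wanted.sort(key=lambda r: r["id"])
structures.sort(key=lambda r: r["id"])
joined = merge_join(wanted, structures, "id")
print(joined)
```

With sorted inputs the join walks each side once, and filtering beforehand shrinks what has to be sorted and held.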

Database

• Database access should be tuned:
  – See
    • (ATS3-PLAT04) Database Connectivity for Application Development
    • (ATS2-23) Managing Data Source Connections
  – PP should be located close to the database server
  – Join in the database if possible
  – Use batch inserts, etc.
  – Use batches with the SQL Select for Each Data
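
Two of these points, batch inserts and joining in the database, can be illustrated with a minimal Python sketch using the standard `sqlite3` module. The in-memory database, table names and sample rows are hypothetical stand-ins for a real database server; the sketch does not show Pipeline Pilot's SQL components.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database server
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE activity (compound_id INTEGER, target TEXT, value REAL)")

compounds = [(1, "cpd-1"), (2, "cpd-2"), (3, "cpd-3")]
activities = [(1, "T1", 5.2), (2, "T2", 6.1), (3, "T1", 7.4)]

# Batch insert: one executemany call per table instead of one statement per row
with conn:
    conn.executemany("INSERT INTO compound VALUES (?, ?)", compounds)
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", activities)

# Join and filter in the database so only the needed rows come back
rows = conn.execute(
    """
    SELECT c.name, a.value
    FROM compound AS c
    JOIN activity AS a ON a.compound_id = c.id
    WHERE a.target = ?
    ORDER BY c.id
    """,
    ("T1",),
).fetchall()
print(rows)
conn.close()
```

Letting the database do the join and the filter means fewer rows cross the network, which matters most when the server is not close to the database.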

When and Where to Calculate Properties

• Think about the order you need to do things

• Compared with…
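
The pictures compared on this slide are not part of the text, so the following is only an assumed illustration of why ordering matters: a minimal Python sketch that counts how often a costly calculation runs when it is placed before or after a cheap filter. The functions and records are hypothetical.

```python
calls = {"count": 0}

def expensive_property(record):
    """Stand-in for a costly calculation; counts how often it runs."""
    calls["count"] += 1
    return record["activity"] ** 2

def in_range(record):
    """Cheap filter that does not depend on the computed property."""
    return 5.0 <= record["activity"] <= 6.5

records = [{"id": i, "activity": 4.0 + 0.5 * i} for i in range(10)]

# Compute first, filter later: the property is calculated for all 10 records
calls["count"] = 0
late = [dict(r, prop=expensive_property(r)) for r in records]
late = [r for r in late if in_range(r)]
calls_compute_first = calls["count"]

# Filter first, compute later: the property is calculated only for survivors
calls["count"] = 0
early = [dict(r, prop=expensive_property(r)) for r in records if in_range(r)]
calls_filter_first = calls["count"]

print(calls_compute_first, calls_filter_first, early == late)
```

Both orderings give the same records; only the number of expensive calls differs.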

Parallel Processing in Subprotocols

• Allows parallelization of computationally intensive tasks

• Need to pay attention to batch size: don’t make it too small
  – Performance can be almost linear with number of cores (our numbers and customers’)

• Can be problematic for subprotocols using R, or other external apps
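
The role of batch size can be sketched in Python, as an analogy rather than a description of how Pipeline Pilot dispatches subprotocols. The helpers below are hypothetical. Threads are used to keep the sketch simple; CPU-bound Python work would normally use `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor

def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most `batch_size` items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def process_batch(batch):
    """Stand-in for the computationally intensive subprotocol."""
    return [x * x for x in batch]

def run_parallel(items, batch_size, workers=4):
    """Process batches concurrently and return the results in input order."""
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves batch order, so results line up with the input
        results = pool.map(process_batch, batches)
        return [value for batch in results for value in batch]

data = list(range(1000))
squares = run_parallel(data, batch_size=250)
print(len(make_batches(data, 250)), squares[:5])
```

A batch size of 1 would create 1000 tiny tasks here, so the per-task dispatch overhead would dominate the actual work; that is the reason not to make batches too small.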

Demo: Protocol version 2.0

Other key T&T

• Prefer linear pipelines
  – Most efficient memory usage

• Avoid excessive branching
  – Branching pipes cause data cloning. This can be expensive for large data records

• Avoid hash tables as caches
  – Use a file cache

• Reduce usage of caches and caching components
  – Merge, Group, Sort and Cluster create unseen caches
  – Be mindful of child nodes
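
The difference between a linear pipeline and a branching one can be sketched with Python generators. This is an analogy under stated assumptions, not Pipeline Pilot internals: the stage functions are hypothetical, and `copy.deepcopy` stands in for the record cloning that a branch requires.

```python
import copy

def read_records(n):
    """Yield records one at a time instead of building a full list."""
    for i in range(n):
        yield {"id": i, "values": [i, i + 1, i + 2]}

def add_total(records):
    """Attach the sum of `values` to each record as it streams past."""
    for record in records:
        record["total"] = sum(record["values"])
        yield record

def keep_even_totals(records):
    """Pass on only the records whose total is even."""
    for record in records:
        if record["total"] % 2 == 0:
            yield record

# Linear pipeline: each record flows through every stage, then is released
linear = keep_even_totals(add_total(read_records(6)))
linear_ids = [record["id"] for record in linear]

# Branching: each branch needs its own copy, so every record is cloned
branch_a, branch_b = [], []
for record in read_records(6):
    branch_a.append(record)
    branch_b.append(copy.deepcopy(record))  # cloning cost grows with record size

print(linear_ids, len(branch_a) + len(branch_b))
```

In the linear case only one record is in flight at a time; in the branching case the number of record copies doubles.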

Other key T&T ctd.

• General relative speed of implementations:
  – Components >= Pilot Script >= Java

• Protocol Function
  – Use AJAX to call a protocol within a page
  – Can provide better performance if only part of a report needs to be updated

• Be careful!
  – Run To Completion (RTC) subprotocols can slow down protocol execution: use sparingly…
  – Checkpoints are very good for debugging but should not be kept once the protocol is finished.

__PoolID

• PP Server uses daemons and job pooling to speed up executing jobs

• Setting __PoolID sets which job pool your protocol is executed in
• You CANNOT put the __PoolID parameter on the protocol itself

=> Admin discussion in (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations

Job Pooling Illustration

Using __PoolID

• Built-in job pools (some job pools are configured to run OOTB):
  – Warm-up pool
  – Keep-warm pool
  – Default pool

• Job pools and impersonation

Server optimization

Cluster
• Built into Pipeline Pilot

Grid
• Leverages Existing Grid Engine
• Sun GridEngine
• PBS Pro
• LSF
• Custom Scripts

See (ATS3-PLAT11) Advanced Planning for AEP Deployments_Migrations
And also (ATS2-07) Solving Large Computing Challenges with Pipeline Pilot

Summary

• Protocol refactoring is a critical step.

• Applying basic principles can dramatically improve performance.

• Fine tuning needs good knowledge of the context.

• Use a specific job pool for your apps.

• Accelrys Enterprise Platform is very scalable.

For more information on the Accelrys Tech Summits and other IT & Developer information, please visit: https://community.accelrys.com/groups/it-dev