

D.4.2.2

Framework for synchronized near-time 2D operation graphs

Deliverable due date: 30.09.2015
Actual submission date: 30.09.2015

Project no: FP7 - 61005
Project start date: 01.10.13
Lead contractor: The Foundry


Project ref. no.: FP7 - 61005
Project acronym: DREAMSPACE
Project full title: Dreamspace: A Platform and Tools for Collaborative Virtual Production
Security (distribution level): RE
Contractual date of delivery: Month 24, 30/9/2015
Actual date of delivery: Month 24, 30/9/2015
Deliverable number: D4.2.2
Deliverable name: Framework for synchronized near-time 2D operation graphs
Type: Prototype and Report
Status & version: v1.3
Number of pages: 23
WP / Task responsible: The Foundry
Other contributors: -
Author(s): A Purvis, J Starck, The Foundry
Internal Reviewer: Oliver Grau
EC Project Officer: Alina Senn

DOCUMENT HISTORY

Version  Date        Reason of change
1.0      01-07-2015  Document created
1.1      14-09-2015  Version submitted for internal review
1.2      30-09-2015  Changes in response to internal review
1.3      30-09-2015  Version submitted to Commission


Contents

1 Executive Summary
2 Introduction
  2.1 Scope of work
  2.2 State-of-the-art in Virtual Production
3 Scheduling
  3.1 Overview
  3.2 The Dreamspace Scheduling Problem
  3.3 Compute Cost
  3.4 Transfer Cost
  3.5 Problem Formulation
  3.6 Optimization Strategy
  3.7 Acceleration
  3.8 Summary
4 Evaluation
  4.1 Overview
  4.2 Performance Test
  4.3 Discussion
  4.4 Summary
5 Conclusion
6 References


1 Executive Summary

Currently, high-end VFX is produced through compositing tools such as NUKE from The Foundry in film post-production. Artists build up intricate effects by creating a graph of operations to composite elements onto the footage that is captured in production. This is currently separated from on-set visualization in Virtual Production, where only a limited set of compositing operations is used to provide the live preview at the time of shooting.

One of the targets in Dreamspace is to explore a connected pipeline that is shared on-set and in post-production, with the ability to create a live composite for use on-set and reproduce it in NUKE. In WP4 a prototype application called BlinkPlayer has been developed as a proof-of-concept for a live compositing system that allows a composite to be authored in NUKE and exported to run standalone. In this deliverable we report on the development of a scheduler designed to optimize the execution of a graph of operations in BlinkPlayer, achieving the fastest execution time as part of a live system.

The Dreamspace scheduling problem has been formulated in terms of the partitioning of image processing across available compute devices. On a target platform, the compute and transfer costs are modeled and scheduling is formulated to find the optimal data partition that minimizes the total execution time. An efficient optimization scheme has been developed based on Metropolis-Hastings Markov-Chain (MHMC) sampling to explore the high-dimensional space of possible partitions. An initial evaluation is presented to provide insight into the performance of the scheduling scheme.

The system that has been developed allows a user-defined graph of operations to be authored in NUKE and run stand-alone. This provides a configurable system that is not limited to a fixed set of compositing operations. Scheduling is critical to make the best possible use of the available hardware resources. For a fixed set of operations, a hand-tuned data partition and execution can be constructed; for a configurable set of operations, a fixed static partition or the use of just a single GPU is sub-optimal. The performance evaluation demonstrates the speed-up that can be obtained by scheduling for the best resource utilization, and shows that the proposed scheduler scales to use the additional hardware available.


2 Introduction

Virtual Production for Film, Commercials and TV uses a variety of hardware components and software tools that are usually assembled into a proprietary system to create a Virtual Studio. There is a separation between the assets and pipeline used for live visualization on-set in a Virtual Studio and the creation of the final composite delivered in post-production. One of the targets in Dreamspace is to provide a connected pipeline with the flexibility to create more complex compositing operations that can be set up on-set and viewed as part of the live visualization in production, with continuity through to post-production where the composite can be reproduced and refined to deliver the final shot.


2.1 Scope of work

The target in WP4T2 is to create a real-time framework for compositing with the following key requirements that extend the state-of-the-art:

1. Support for a user-defined graph of compositing operations in Virtual Production
2. Ability to view the composite in real-time as part of a live environment
3. Ability to reproduce the composite offline in post-production

The industry-standard NUKE compositor from The Foundry has been used as the basis for this work. NUKE has GPU-accelerated operations written using BLINK, a domain-specific language designed for image processing.

In deliverable D4.2.1, BLINK was developed to run stand-alone outside NUKE with data-level parallelism and consistent results for operations run on CPU or GPU on different platforms. This provided the first proof-of-concept to author a composite in NUKE, which could then run stand-alone with consistent results.

In this deliverable, D4.2.2, we describe the development of a scheduler to distribute processing in a graph of BLINK operations across available compute resources. This provides the ability to get close to optimal performance by leveraging both CPU and GPU resources, and it enables processing to scale with the hardware available.


2.2 State-of-the-art in Virtual Production

Deliverable D2.1.1 provides a comprehensive overview of the state-of-the-art in Virtual Production. There are relatively few commercial solutions available; one complete system that includes tracking, rendering and compositing is the Previzion system from Lightcraft. Previzion provides a live composite of live-action foreground content shot against a green-screen with a virtual scene as the background, as shown in Figure 1.


Figure 1 – Lightcraft Previzion in use at Stargate Studios, from D2.1.1: "XXIT" Previzion and Maya controls on a green screen stage (left) and composite with CG background (right); "XXIT" green screen stage (left) and composite with CG foreground and background elements (right).

The Previzion system provides a fixed set of effects that can be applied to the live video feed. The most critical are control of the chroma key, to correctly separate the live-action foreground, and control of the colour correction, to match the look of the real and virtual elements. This ensures the Director and Director of Photography have an on-set preview of the final composite to make creative decisions on shot framing, lighting and timing.

Today, systems like Previzion are limited to a fixed set of operations in the live preview, the interface and controls are highly technical and require trained operators, and there is no continuity between these effects and the final composite in post-production. Figure 2 shows the controls for the chroma keyer in Previzion. These would be set up on-set for the best possible live preview, then recreated separately in post-production in NUKE.


Figure 2 – Lightcraft Previzion chroma-key controls, from D2.1.1


The aim in Dreamspace is to provide more intuitive controls in production, with continuity in the data from set through to post-production. The visual effects supervisor should have the ability to set up the key simply and quickly, then reproduce and refine it in post. There should also be more flexibility in defining the composite, so that it can be extended beyond the chroma key and colour correction if necessary for different shots, again with continuity through to post-production to reproduce the intent created on-set.

Currently, tools such as NUKE are used in post-production to combine real and virtual elements. NUKE allows an artist to build a non-destructive description of a composite that can then be reproduced, revised and iterated. This is an offline process where teams of artists work collaboratively to create the visual effects for different shots. Figure 3 shows an example of a graph of operations in NUKE that defines the compositing operations for a shot. This represents a significant level of complexity beyond the fixed stack of operations performed in Virtual Production systems today.


Figure 3 – NUKE script set up to combine elements in a final composite

NUKE is used near-set as part of virtual production for high-end film and episodic TV. This provides the ability to review the composition of shots and visual effects with a fast turnaround time. The process requires highly complex management of data and processing tasks. As an example, the round-the-clock workflow for the visual effects team on Hugo is described in the following article:

http://www.fxguide.com/featured/hugo-a-study-of-modern-inventive-visual-effects/

There is currently no connection between the systems used to provide the live visualization on-set at the time of shooting and the visual effects team that creates the composite for a shot in NUKE. Today it remains a separate process with significant overhead in data management. The work in WP4T2 aims to enable the execution of a NUKE composite as part of a real-time system.


3 Scheduling

3.1 Overview

The problem posed for scheduling is to take a graph of operations designed to modify and combine image sources, as shown in Figure 3, and distribute the processing of the operations across the available compute resources, either CPU or GPU, to obtain the fastest possible execution time. A significant body of prior research explores executing serially dependent tasks across multiple work machines or execution resources. This type of scheduling problem has previously been formalised in work on "Job Shop Scheduling Problems" (JSSP) [4][5]. JSSP was originally created to describe the mathematical optimisation of business orders in industrial manufacture. Despite this original problem domain, the heterogeneous scheduling problem posed by the Dreamspace project is clearly recognisable as a subclass of JSSP identified as the Flexible Job Shop Scheduling Problem (FJSSP) [6]. FJSSP differs from the regular JSSP in that it is specifically concerned with optimising work scheduling where each component operation may be performed by any of multiple heterogeneous work machines. In contrast, a regular JSSP only allows each operation to be performed on one specific work machine.

Several prior projects have investigated heterogeneously scheduling operations via an FJSSP approach; these include local approaches to scheduling such as the Heterogeneous Earliest Finish Time algorithm (HEFT) [2], Critical Path on Processor (CPOP) [2] and Performance Effective Task Scheduling (PETS) [3], and global approaches such as the ICENI Grid Middleware [1]. It is important to note that canonical F/JSSP approaches are only concerned with task and not data parallelism. In this work we extend the problem to schedule data processing.

In order to model Dreamspace's scheduling problem as a type of FJSSP, we only require a directed acyclic graph (DAG) of well-defined operations, a set of execution resources that these operations will be performed on, and the expected duration (cost) of performing each operation on every execution resource. The first two of these are trivially available from NUKE and the host computer's operating system. For the last, we later present a cost metric to estimate the duration of an operation in advance of its execution. We may then minimise the FJSSP's execution time to create the fastest possible schedule for a given DAG.
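As a concrete illustration of these three inputs, the following minimal sketch (not the BlinkPlayer data model; device names, operation names and costs are purely hypothetical) shows how a DAG, a device set and a per-device cost table might be represented:

```python
# Hypothetical representation of the scheduling inputs described above:
# a DAG of operations, a set of execution devices, and an estimated cost
# for running each operation in full on each device. All figures are
# illustrative, not measurements.
devices = ["CPU", "GPU0"]

graph = {
    # op name : (upstream op names, {device: estimated full-frame cost in microseconds})
    "key":   ([],        {"CPU": 900.0, "GPU0": 150.0}),
    "grade": (["key"],   {"CPU": 400.0, "GPU0":  80.0}),
    "merge": (["grade"], {"CPU": 600.0, "GPU0": 120.0}),
}
```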

3.2 The Dreamspace Scheduling Problem

There are a number of requirements imposed by Dreamspace which do not fit the FJSSP representation. For this reason, we must extend the classical FJSSP into what we will call the Dreamspace Scheduling Problem (DSSP). A classical FJSSP does not model the cost of moving intermediate results between execution resources, which is clearly not representative of modern heterogeneous computing platforms, where a system's various processors are typically separated by finite-speed buses. ICENI already extends the canonical FJSSP to include a model of this extra cost. We similarly address data transmission with an additional model of transport costs for intermediate results when they are moved between execution resources. A further limitation is that a classic FJSSP treats each operation as an indivisible and uninterruptible action to be performed on only one execution device. As Dreamspace is concerned not only with task parallelism but also data parallelism, we must modify the FJSSP to allow multiple parts of an operation to be performed in parallel on different execution units. We achieve this with a novel transformation of the input DAG prior to scheduling.


Where a task may be split into two or more parallel tasks, partitioning may be performed as a pre-transformation of the DAG in advance of scheduling. This reduces the DSSP to a formulation closer to a classical FJSSP. However, in a practical system this division splits the optimisation problem into two serial decisions: first, to optimally partition all divisible nodes of the DAG without knowledge of where the subdivisions will be executed; second, to optimally schedule the post-partition workloads onto hardware which may not be well suited to the partitions previously selected. To globally optimise the full system, it is clear that these two decisions must be made together in a single step.

To unify the partitioning of operations with the scheduling of partitions onto execution hardware, we preprocess the DAG to split every operation into one sub-operation per execution resource. Deciding the fractional size of these partitions is deferred until scheduling, where an individual partition may be sized as "0" to indicate that the given resource computes no pixels of the operation's output. This allows the partitioning and scheduling decisions to be made in a single step by selecting partition sizes and shuffling operation execution order. It is worth noting that this final formulation ultimately reduces the DSSP to a type of non-Flexible JSSP with additional partitioning parameters.

The total execution time for a schedule is called the makespan. The makespan of our DSSP is easily determined using the same methods as for any JSSP. First, the longest path in the schedule must be identified; then the start and end times may be computed using the standard methods, taking account of DAG edges where additional costs may be incurred from the transmission of intermediate results between execution devices.
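A minimal sketch of this idea follows. It splits each operation into one fractional sub-operation per device and evaluates a deliberately simplified makespan, in which the pieces of an operation are assumed to run concurrently and per-device queuing is ignored; the graph, device names and costs are hypothetical, and the full JSSP-style evaluation described above is not reproduced here.

```python
# Simplified sketch: one fractional sub-operation per device, plus a toy
# makespan evaluation (pieces of an operation assumed concurrent, per-device
# queuing ignored). Names and costs are illustrative only.
graph = {  # op -> (inputs, {device: full-frame cost in microseconds})
    "key":   ([],        {"CPU": 900.0, "GPU0": 150.0}),
    "grade": (["key"],   {"CPU": 400.0, "GPU0":  80.0}),
    "merge": (["grade"], {"CPU": 600.0, "GPU0": 120.0}),
}

def even_partition(devices):
    """Initialise every operation to an even fractional split across devices."""
    return {op: {d: 1.0 / len(devices) for d in devices} for op in graph}

def finish_time(op, partition, memo):
    if op in memo:
        return memo[op]
    inputs, cost = graph[op]
    ready = max((finish_time(i, partition, memo) for i in inputs), default=0.0)
    duration = max(partition[op][d] * cost[d] for d in partition[op])  # slowest piece
    memo[op] = ready + duration
    return memo[op]

def makespan(partition):
    memo = {}
    return max(finish_time(op, partition, memo) for op in graph)

print(makespan(even_partition(["CPU", "GPU0"])))  # even split of the toy chain
```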

3.3 Compute Cost

JSSP problems require an expected duration for all operations to be defined in advance of scheduling. We take an empirical approach to defining such a model for the DSSP. As the DSSP may partition the computation of an operation across several execution devices, our model must not only express the cost of each operation on every execution resource but also estimate the cost of performing an arbitrary fraction of an operation.

To measure the cost of operations, we perform a number of dummy executions of each operation at the time of DAG specification. Multiple dummy runs are performed for a number of equally spaced subdivisions between 0% and 100% of the full workload. A median time for each partial workload is computed, and these values are used as samples in a piecewise linear approximation of the underlying cost of the operation. Figures 4 and 5, for example, show the compute time for a horizontal and vertical Gaussian blur applied to an HD image when processing from 0% to 100% of the image rows.

It is assumed that there is no correlation between the cost of evaluating a pixel and its location within an image. A further assumption is that the runtime of an operation depends only on the size of the image region being evaluated and the parameters of that operation; in particular, execution time is assumed to be independent of the pixel values of the input images. When empirically measuring costs, there is an assumption that the device being measured is not under load and all of its computational power is available to us. It is also assumed that the cost of evaluating a zero-pixel region on any device is zero, as no work has to be performed.


Figure 4 – GPU compute cost (μs) to execute different numbers of pixel rows in an HD image

Figure 5 – CPU compute cost (μs) to execute different numbers of pixel rows in an HD image
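The measurement procedure can be sketched as follows. This is an illustrative outline only: the sampling points, number of repeats and the run_fraction callback are placeholders rather than the values or interfaces used in BlinkPlayer.

```python
# Sketch of the empirical cost model described above: median timings are taken
# for a few equally spaced fractions of the workload and interpolated piecewise
# linearly. run_fraction is a stand-in for the real benchmarking call.
import statistics
import time

def build_cost_model(run_fraction, samples=5, steps=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (fractions, median_costs) defining a piecewise linear model."""
    xs, ys = [], []
    for f in steps:
        times = []
        for _ in range(samples):
            t0 = time.perf_counter()
            run_fraction(f)                      # dummy execution of a fraction f of the rows
            times.append(time.perf_counter() - t0)
        xs.append(f)
        ys.append(statistics.median(times))
    ys[0] = 0.0                                  # zero-pixel regions cost nothing, by assumption
    return xs, ys

def estimate(xs, ys, f):
    """Linear interpolation between the measured sample points."""
    for i in range(1, len(xs)):
        if f <= xs[i]:
            w = (f - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])
    return ys[-1]
```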


3.4 Transfer Cost

Similar to operation execution, a cost metric must be defined for the transfer of intermediate results between operations. For two subsequent operations performed on the same piece of execution hardware it is clear that the transfer cost is zero, as the produced data is already in the correct location for further work. Where this is not the case, a cost will be incurred between every pair of devices that need to communicate data. It is assumed that the cost of transferring data is a direct function of the data's size in bytes. This requires the measurement of cost functions for the D·(D-1)/2 device pairs in a set of D devices. On application startup, we can empirically derive these costs using a similar technique to the measurement of operation costs.


For each pair of execution devices, dummy transfers of various sizes are made in both directions. Medians are computed and a piecewise linear function is created.

Figure 6 – Transfer cost (μs) to transfer different numbers of pixel rows in a 2K image

It has proven empirically critical that timings be taken for transfers in both directions between every pair of devices. This is due to the asymmetrical nature of some modern bus architectures.
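A sketch of the resulting lookup follows. Costs are stored per ordered device pair, reflecting the asymmetry noted above, and interpolated piecewise linearly in the number of bytes moved; the device names and sample values are hypothetical.

```python
# Sketch of a transfer-cost table of the kind implied above. One entry per
# ordered device pair, since transfer timings can be asymmetric; figures are
# illustrative, not measurements.
transfer_model = {
    # (src, dst): [(bytes, seconds), ...] sample points, measured at startup
    ("CPU",  "GPU0"): [(0, 0.0), (8_000_000, 0.0021), (32_000_000, 0.0080)],
    ("GPU0", "CPU"):  [(0, 0.0), (8_000_000, 0.0026), (32_000_000, 0.0098)],
}

def transfer_cost(src, dst, nbytes):
    if src == dst:
        return 0.0                        # data already resident on the device
    samples = transfer_model[(src, dst)]
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        if nbytes <= x1:
            return y0 + (nbytes - x0) / (x1 - x0) * (y1 - y0)
    return samples[-1][1]                 # clamp beyond the largest measurement
```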

3.5 Problem Formulation

The DSSP requires partitioning image processing operations among the available compute resources such that the total time to process (the makespan) is as small as possible, while taking account of the transfer time needed to move data between devices.

Each operation has to be allocated to the available devices. For N devices, an N-way partition of an operation can be easily represented by an N-dimensional simplex. This is a natural parameterisation, as the fractional allocations of each operation must sum to unity to ensure that the full image is processed.

We define a state matrix of size N×M, where N is the number of devices and M is the number of operations. In this formulation a column vector of the state matrix defines the allocation of an operation across the N devices and must satisfy the constraint that it represents a point within an N-dimensional unit hypercube. This column vector can be renormalized when mapping to fractional partitions such that the elements of the vector sum to 1. After such a transformation, every coordinate except the origin maps to a valid partitioning of work. It is worth noting that, like the N-simplex, the edges and corners of the N-hypercube represent partitions that do not use all available execution devices. We will exploit this later.

The DSSP is parameterised by a real-valued state. While this is treated as a continuous problem for optimisation, the partition is mapped to a discrete number of pixel rows that are executed on the available devices.
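For illustration, the mapping from one state-matrix column to whole pixel rows might look like the following sketch; the rounding policy shown (absorbing the residual in the last device) is an assumption, not necessarily the one used in the prototype.

```python
# Sketch of the mapping described above: a column of the state matrix is a
# point in the unit hypercube; it is renormalised so its entries sum to one
# and then quantised to whole pixel rows. The all-zero column is the one
# invalid point and is rejected explicitly here.
def column_to_rows(column, total_rows):
    """column: list of N non-negative floats (one per device) -> rows per device."""
    s = sum(column)
    if s == 0.0:
        raise ValueError("the origin does not map to a valid partition")
    fractions = [c / s for c in column]
    rows = [int(round(f * total_rows)) for f in fractions]
    rows[-1] += total_rows - sum(rows)        # absorb rounding error in the last device
    return rows

print(column_to_rows([0.2, 0.0, 0.6], 1080))  # e.g. [270, 0, 810] for an HD frame
```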


3.6 Optimization Strategy

Minimising the makespan of the DSSP requires a high-dimensional numerical optimisation. This is a highly non-linear problem with significant discontinuities resulting from changes in overheads when switching between different devices for processing. There is a wealth of existing work on the JSSP with Simulated Annealing (SA) [8][7][10][11], which is based on Metropolis-Hastings Markov-Chain (MHMC) sampling [9]. SA/MHMC-based approaches have achieved very promising and robust results. An SA/MHMC approach does not require derivatives or any particular properties of the function being optimised, which makes it a strong candidate for our optimisation problem.

While our parameterisation is significantly different to that of a classical JSSP, it is simple to define a neighbourhood, proposal distribution and suitable sampling technique for MHMC; these arise naturally from our geometric parameterisation. For a proposal distribution, a simple N-dimensional Gaussian offset which wraps around at the opposing faces of the hypercube may be used. Computing probabilities and sampling for this strategy is trivial. As MHMC is a stochastic optimisation technique, care must be taken if repeatable results are important. If required, a custom set of random number generators with predetermined seeds may be used to obtain deterministic behaviour.
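The core sampling loop, with no cooling schedule and a record of the best schedule found so far (the configuration we adopt later in this section), can be sketched as follows; the propose and makespan callables, the fixed temperature and the iteration count are all placeholders.

```python
# Minimal MHMC loop in the spirit described above: a fixed "temperature"
# controls acceptance, no cooling schedule is applied, and the best schedule
# seen so far is recorded. `propose` and `makespan` stand in for the mutation
# kernel and the cost evaluation discussed in this section.
import math
import random

def mhmc(initial_state, makespan, propose, iterations=1000, temperature=1.0, rng=None):
    rng = rng or random.Random(0)                 # fixed seed for repeatable runs
    state, cost = initial_state, makespan(initial_state)
    best_state, best_cost = state, cost
    for _ in range(iterations):
        candidate = propose(state, rng)
        c = makespan(candidate)
        # Metropolis acceptance: always accept improvements, sometimes accept
        # regressions so the chain can escape local minima.
        if c <= cost or rng.random() < math.exp((cost - c) / temperature):
            state, cost = candidate, c
            if cost < best_cost:
                best_state, best_cost = state, cost
    return best_state, best_cost
```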

While a wrapping-Gaussian mutation strategy is trivial to implement, it poses several weaknesses. The primary drawback is the infinitesimal probability of producing state vectors that contain ordinates exactly equal to zero. As this subset of state vectors represents schedules which use only a subset of the available devices, the faces and corners of the hypercube are particularly interesting regions of the scheduling function to explore. To improve the quality of the wrapping-Gaussian mutation, we can wrap ordinates exceeding unity and clamp any negative ordinate to zero. Once a state vector contains a zero, and thus lies "on a lower face" of the hypercube, it is then beneficial to randomly select between strategies which explore the surface of that face by mutating only the non-zero ordinates, or (to ensure ergodicity) exit the face back into the interior of the hypercube. One significant advantage of this clamping approach is a guarantee of well-defined acceptance probabilities for the MHMC algorithm. Without a discrete stay/leave probability for state vectors which lie on faces, it is possible for the MHMC acceptance probability to take zero or infinite values and for the chain to become trapped in subregions of the hypercube.

Another useful mutation is to permute the order of ordinates within a state vector. Conceptually, this is the same as swapping which image region is computed on which execution device. As a final guarantee of ergodicity, it is desirable to sometimes propose a uniformly sampled location within the hypercube irrespective of the current state vector. This effectively breaks the underlying Markov chain, as the proposal state has no relation to the current state. With this last mutation we offer an easy guarantee against becoming stuck in regions of the hypercube.
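The following sketch illustrates a proposal mixture of this kind for a single state-matrix column: a wrapped Gaussian step with negative ordinates clamped onto a face, an occasional permutation of ordinates, and a rare uniform restart. The mixture weights and step size are assumptions, and the explicit stay-on-face/leave-face choice discussed above is omitted for brevity.

```python
# Sketch of a mixed proposal for one hypercube column. Probabilities and step
# size are illustrative; the explicit face stay/leave policy described in the
# text is not reproduced here.
import random

def propose_column(column, rng, sigma=0.1, p_permute=0.1, p_restart=0.02):
    r = rng.random()
    if r < p_restart:
        return [rng.random() for _ in column]          # uniform jump anywhere in the cube
    if r < p_restart + p_permute:
        shuffled = column[:]
        rng.shuffle(shuffled)                          # swap device/region assignment
        return shuffled
    new = []
    for x in column:
        y = x + rng.gauss(0.0, sigma)
        y = y % 1.0 if y > 1.0 else y                  # wrap ordinates exceeding unity
        new.append(max(y, 0.0))                        # clamp negative ordinates onto the face
    return new
```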

In our experiments, the use of a cooling schedule [8] to modify the Metropolis-Hastings acceptance probability between iterations proved largely counterproductive when compared to pure MHMC sampling with a record of the best schedule to date. This was due in part to the difficulty of designing a suitable cooling schedule and anticipating the number of iterations required for good convergence of a graph.


3.7 Acceleration

Figure 7 shows the total cost for a schedule over time, demonstrating convergence to a minimal compute cost for a graph of operations. Given an even partitioning as an initial state, a rapid reduction of the makespan is achieved in the initial iterations. After this, we still observe a reduction of the makespan with additional iterations, but there is a dramatic decrease in the rate of improvement. To improve the convergence rate of the presented algorithm, we present several optimisations in the following subsections.


Figure 7 – Total compute cost over time as different solutions evolve in MHMC

Population Metropolis Hastings Monte Carlo (PMHMC)

MHMC is a Markov chain technique. As such, it is not easily threadable to reap the full performance of modern multi-core CPU architectures. This follows from one of the fundamental properties of a Markov chain, namely that the proposed state is a function of the chain's current state. Given this serial dependency, it is not possible to trivially partition the computation of a chain.

Instead of trying to parallelise a single Markov chain, we propose running several parallel MHMC processes, each with an independent Markov chain and random number sequence. On its own, this constitutes a simple "best of N" approach. While this can reduce the probability of generating a poorly performing chain, it does not fundamentally alter the convergence of the underlying MHMC processes. In order to improve on this "best of N" approach, we borrow an idea from Particle Swarm Optimisation (PSO) [12][13] and Genetic Algorithms (GA) [14]. We call this new technique Population Metropolis Hastings Monte Carlo (PMHMC).

PMHMC defines a small period of iterations called a super-iteration; we have found 100-200 iterations to be a good duration. During such a super-iteration, N independent MHMC processes attempt to independently optimise an initial state.


After the super-iteration is complete, the individual MHMC processes are ranked by their best result and, via a replacement policy, weaker chains are replaced by clones of more productive chains. Conceptually this allows information to be shared between MHMC chains after each super-iteration. As such, the parallel chains no longer represent an independent best-of-N approach, but a single cooperative optimisation process. A significant benefit in the convergence rate of the proposed system is achieved, as shown in Figure 8.

Figure 8 – Total compute cost reduces more rapidly during optimisation using PMHMC. Lines St1-St4 represent individual MHMC processes; lines "M08/16/32_*" represent PMHMC with population sizes of 8, 16 and 32 respectively.
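A PMHMC super-iteration of this kind might be organised as in the sketch below; the replacement fraction, the number of super-iterations and the run_mhmc_block callable (one chain's block of MHMC iterations, returning its best state and cost) are placeholders rather than the values used in our experiments.

```python
# Sketch of PMHMC as described above: independent chains run a short block of
# MHMC iterations, are ranked by their best makespan, and the weakest chains
# are replaced by clones of the strongest before the next block.
import copy
import random

def pmhmc(initial_states, run_mhmc_block, super_iterations=20, replace_fraction=0.25):
    rngs = [random.Random(i) for i in range(len(initial_states))]   # independent streams
    population = list(initial_states)
    for _ in range(super_iterations):
        results = [run_mhmc_block(s, rng) for s, rng in zip(population, rngs)]
        order = sorted(range(len(results)), key=lambda i: results[i][1])  # rank by best cost
        n_replace = int(len(population) * replace_fraction)
        for weak, strong in zip(order[-n_replace:] if n_replace else [], order[:n_replace]):
            results[weak] = copy.deepcopy(results[strong])           # clone a stronger chain
        population = [state for state, _ in results]
    return min(results, key=lambda r: r[1])                          # overall best (state, cost)
```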

Dimensionality reduction

Depending on DAG topology and image access requirements, it is possible to reduce the dimensionality of the problem space to be explored and so aid the convergence of the PMHMC/MHMC approach. Specifically, linear "chains" of operations within a graph gain no benefit from being partitioned differently if they use pixel access on their inputs (e.g. a Gain followed by an Invert followed by a Gamma Correction). Forcing this partitioning constraint guarantees that no proposed schedule requires any inter-device transfers along such chains. While the PMHMC/MHMC process will eventually settle on such a schedule through ordinary operation, convergence to this state may be accelerated by identifying such subgraphs of the DAG and enforcing the no-transfer constraint. This is easily achieved by sharing a single hypercube state vector between all operations forming such a chain, and it demonstrates improved convergence as shown in Figure 9.


Figure 9 – Convergence is improved when constraining chains of operations that have single-pixel access patterns to be parameterized by a single N-dimensional partition.
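The chain-detection step can be sketched as follows: operations that form a single-consumer, pixel-access chain are assigned a shared partition-group identifier so that they share one hypercube state vector. The graph encoding and the pixel_access flag are illustrative, and the sketch assumes the operations are visited in topological order.

```python
# Sketch of the dimensionality reduction described above: members of a linear,
# pixel-access chain share one partition group, so no schedule can introduce
# transfers inside the chain. Encoding and flags are illustrative.
def share_chain_states(graph):
    """graph: {op: {"inputs": [...], "pixel_access": bool}} -> {op: partition group id}."""
    consumers = {op: [] for op in graph}
    for op, node in graph.items():
        for i in node["inputs"]:
            consumers[i].append(op)
    group, next_id = {}, 0
    for op in graph:                                   # assumes dict order is topological
        node = graph[op]
        parent = node["inputs"][0] if len(node["inputs"]) == 1 else None
        if (parent is not None and node["pixel_access"]
                and len(consumers[parent]) == 1):      # single-parent, single-consumer link
            group[op] = group[parent]                  # extend the parent's chain
        else:
            group[op] = next_id                        # start a new partition group
            next_id += 1
    return group

example = {
    "gain":   {"inputs": [],         "pixel_access": True},
    "invert": {"inputs": ["gain"],   "pixel_access": True},
    "gamma":  {"inputs": ["invert"], "pixel_access": True},
}
print(share_chain_states(example))   # {'gain': 0, 'invert': 0, 'gamma': 0}
```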


Sub-graph scheduling

Where graphs become large, the dimensionality of the system being optimised may become extreme. In these cases convergence may suffer due to the scale of the problem space to be explored. In our experiments this effect becomes significant between 50-100 nodes per DAG on a four-device system. While graphs exceeding this are arguably larger than what can be considered within the scope of a real-time system based on contemporary technology, the principle of optimising a graph for the fastest possible time to completion may still be interesting on DAGs upward of 5000 operations. At this scale the convergence of our PMHMC/MHMC system breaks down and becomes impractical. While the development of better mutation strategies for the underlying Markov chain may help to alleviate this, the curse of dimensionality ensures that finding an optimal value within a space of unbounded dimensionality becomes harder and harder as that space grows.

To overcome this, we have prototyped a "split graph" scheduling technique. This adds two more flavours of scheduling on top of our PMHMC/MHMC algorithms (MHMC, PMHMC, SMHMC, SPMHMC). In the split version of both previous algorithms, a DAG is pre-processed into a number of subgraphs containing fewer than 50 nodes each. When constructing these subgraphs, an attempt is made to partition into groups of maximal connectivity. These subgraphs may then be scheduled efficiently using the existing PMHMC/MHMC algorithms.

Once a set of subgraphs is optimised, a graph of subgraphs can then be scheduled. The purpose of this higher-level graph scheduling is to optimise the transfers of results between individual subgraphs. Where one subgraph chooses to leave its intermediate results, the next subgraph is constrained to work with its inputs in a location that was decided outside of its control.


By iterating between subgraph and outer-graph optimisation, the system may converge to a good makespan over time.
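The alternation between the two levels could be organised as in the outline below; schedule_subgraph and schedule_outer_graph stand in for the per-subgraph and outer-graph PMHMC/MHMC runs, and the initial boundary placement and round count are arbitrary assumptions.

```python
# Outline of the alternating split-graph optimisation described above: each
# subgraph is scheduled with its input locations fixed by the current boundary
# decisions, then the graph of subgraphs is rescheduled to improve the
# inter-subgraph transfers, and the process repeats.
def split_graph_schedule(subgraphs, schedule_subgraph, schedule_outer_graph, rounds=3):
    boundary = {sg: "GPU0" for sg in subgraphs}        # arbitrary initial output placement
    inner = {}
    for _ in range(rounds):
        inner = {sg: schedule_subgraph(sg, boundary) for sg in subgraphs}  # per-subgraph runs
        boundary = schedule_outer_graph(inner)                             # optimise transfers
    return inner, boundary
```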


Figure 10 – The execution time for different numbers of operations in the graph, comparing different scheduling strategies relative to an initial execution time on a single device.

Figure 10 compares split-graph scheduling to MHMC for the same total number of optimization iterations. This was applied to a problem built from a 20-node graph connected as a linked set of chains to create larger graphs. The results are normalized against the time to execute the graph on the CPU. This demonstrates the potential to adopt split-graph scheduling for large graphs, but it is not expected to be necessary as part of the live compositing system in Dreamspace, which will need to run small graphs for real-time performance. It was also found in this test that convergence was improved when constraining chains of operations that have single-pixel access patterns to be parameterized by a single N-dimensional partition.


3.8 Summary

The classic JSSP formulation has been extended to include data parallelism and data transfer costs. The problem is parameterized by the data-parallel partitioning of image operations across the available devices. This parameterisation is represented by a state matrix, and the total execution cost is estimated from empirical models on the target platform, including the data transfer costs incurred between execution resources. An efficient optimization scheme has been developed, based on MHMC, to explore the space of possible schedules and minimize the execution time.


4 Evaluation

4.1 Overview

A prototype called BlinkPlayer has been developed as a testbed to evaluate the standalone execution of BLINK node-graphs separate from NUKE. Deliverable D4.2.1 provides an overview of the prototype. The scheduler has been integrated into BlinkPlayer so that when a node-graph is imported, the scheduler is run to define the execution order of operations on the devices available on the target platform. This section describes a test on a node-graph to provide insight into the performance of the scheduler on a test-bed platform.


4.2 Performance Test

The performance of the scheduler is demonstrated for a node-graph that might typically be used in Virtual Production. Figure 11 shows the BLINK nodes created in NUKE which perform keying, merging, grading, lens distortion and simple denoising to combine a video back-plate with a live-action foreground shot against a green-screen.


Figure 11 – BLINK node-graph authored in NUKE for a live composite in BlinkPlayer.

The execution time for the graph was tested on an HP Z820 workstation featuring two sockets loaded with 2.6 GHz Intel Xeon E5-2650 v2 CPUs (8 physical / 16 logical cores per socket) and 32 GB of DDR3 memory. The system has one AMD W9000 and one AMD W8000 GPU, both hosted on 16-lane PCIe 3 buses. The test system runs RHEL 6.6 and the 14.301.1010 AMD driver. Displays were connected to the W8000 card's DVI ports. BLINK operations are executed using OpenCL on the GPU and SSE2, SSE4.1 and AVX on the CPU.


Figure 12 shows the initial integration of the scheduler into the BlinkPlayer application. A drop-down menu also allows the user to select between several scheduling variants and a large number of static divisions of work (e.g. CPU only, second GPU only, a 20% / 20% / 60% split, etc.).


Figure 12 – Initial integration of the scheduler into BlinkPlayer.

Figure 13 shows the frame rate achieved for different schedules of the graph in BlinkPlayer on the HP Z820 workstation. It is clear that both GPUs significantly outperform the CPU in single-device configurations. Although the W9000 is marginally faster than the W8000 on paper, the latter shows slightly higher performance because it is driving the system's displays and does not require a final PCIe transfer to get the graph's output into a display buffer. It is clear that all scheduled options reap significant benefits over a single device. For a given iteration count, PMHMC achieves significantly greater convergence than the simple MHMC algorithm; PMHMC matches MHMC with only 10% of the iterations. Additionally, the best overall result comes from the PMHMC algorithm.


Figure 13 – Frame rate achieved with different schedules in BlinkPlayer.

To demonstrate the difficulty of scheduling on heterogeneous systems, we include a number of static partitioning options within BlinkPlayer. It can be seen in Figure 13 that all but one of the static partitionings are slower than the fastest single device. The only static partition which outperforms a single GPU is the 50:50 split across the two GPUs. While naively splitting across multiple GPUs can outperform a single device, the benefits are small. This is due to time lost in communication and synchronisation between the two devices, which is precisely what the scheduler seeks to optimise.

Within the current BLINK framework, we are unable to overlap data transfer and code execution on a compute resource. As transfers already represent a significant bottleneck in modern hardware systems, this is a significant limitation. In many cases the scheduler would be able to overlap unrelated computation and up/down-stream transfers were BLINK able to support this functionality. Figure 13 shows, as a dotted bar, the hypothetical performance were simultaneous transfer and computation of unrelated buffers possible. Although this is a hypothetical result, in practice we find the performance predicted by the modelled execution cost to be typically within 1% of the achieved frame rate.


4.3 Discussion

Watching the internal state of the scheduler over multiple iterations, it is clear that transfer costs are the most significant factor influencing performance. Even in situations where the scheduler has been told it has access to infinitely fast computational hardware, it may still choose not to use that hardware, depending on the transfer costs incurred when moving inputs to that device and the cost of sending the generated results to their required final destination. With modern GPUs coming close to 500 GB/s of internal bandwidth, and modern CPUs pushing as much as 150 GB/s, the 3-10 GB/s achieved across PCIe is clearly a huge systemic bottleneck. It is not uncommon to see transfers of data consume up to 50% of the total schedule duration. The impact of this problem increases with the triviality of the individual operations in the DAG.


For the simplest graphs, this means it is not possible for modern heterogeneous systems to achieve any benefit over the single-device performance of either the device owning the source images or the device on which the final result must be located. On the other hand, where DAGs contain costly compute-bound operations, significant speedups may be obtained on heterogeneous systems by dividing tasks across multiple devices and taking into account the relative suitability of each compute resource for a given task.
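To give a rough sense of scale, the following back-of-the-envelope calculation (with assumed, not measured, figures) shows why a handful of PCIe transfers can dominate a real-time frame budget:

```python
# Illustrative arithmetic for the bandwidth point above; all figures are
# assumptions drawn from the ranges quoted in the text, not measurements.
frame_bytes = 1920 * 1080 * 4 * 4          # HD frame, 4 channels, 32-bit float ~= 33 MB
pcie_bytes_per_s = 6e9                      # mid-range of the 3-10 GB/s quoted above
transfer_ms = frame_bytes / pcie_bytes_per_s * 1e3
print(f"{transfer_ms:.1f} ms per transfer vs a 16.7 ms frame budget at 60 fps")
# ~5.5 ms: two or three such transfers already dominate the schedule.
```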

Due to the comparatively low-bandwidth PCIe connections between GPUs and their host system, it is common for the scheduler to favour the GPU to which the displays are connected. Where that GPU is specified as the final destination for computations, this minimises the incurred transfer costs. Faster interconnects would diminish this effect and allow the scheduler to make better use of secondary GPUs. The scheduler becomes more aggressive in using secondary GPUs where graphs contain more compute-heavy operations or where PCIe 3.0 is available.

The weight of computation relative to data size on a given platform is called the Compute-Communications Ratio (CCR). We anticipate that the future availability of technologies like Nvidia's NVLink will decrease the CCR required for the scheduler to extract high performance from heterogeneous systems. It is worth noting that where the CCR is high enough to saturate multiple GPUs on a PCIe architecture, the scheduler becomes reluctant to use the CPU: by the time the CCR increases enough to justify involving secondary GPUs, it is typically beyond the ability of the CPU to keep pace with the GPUs, and constant transfer overheads preclude the CPU from taking part in computation. With the increasing proliferation of modern "integrated" GPGPUs able to share memory with the system's CPU, we expect that the scheduler may be used to co-operatively schedule such a system's full SoC compute power.


4.4 Summary

The scheduler has been integrated into the BlinkPlayer prototype to execute a user-defined graph authored in NUKE. The performance evaluation demonstrates that optimizing execution with the scheduler achieves faster frame rates, through better utilization of the devices available on the hardware platform, than using a fixed partitioning across the devices.


5 Conclusion

The objective in WP4T2 is to develop real-time methods for image processing and compositing to integrate live-action foreground content with a virtual set in production or as part of a performance.

A workflow has been developed in which the compositing pipeline can be authored in NUKE, exported and run as part of a live system. The same results are then available in post-production, where NUKE is normally used as an offline compositing package. To achieve this, the composite is defined using BLINK operations. BLINK is a domain-specific language for image processing developed at The Foundry that can be compiled to run on different devices, such as the CPU and GPU, for optimal performance. This advances the state-of-the-art for Virtual Production systems: it provides a means to author a flexible compositing pipeline, technical artists are free to create new or modify existing image processing operations, and the same composite is available on-set and in post-production.

In WP4T2, BLINK has been separated from NUKE to run stand-alone as the BlinkPlayer prototype. In this deliverable, the prototype has been extended to incorporate scheduling to make optimal use of the available compute devices and execute a graph in the fastest possible time. The scheduler has been designed to make use of all devices, so that the hardware architecture can be scaled to achieve real-time performance.

The state-of-the-art in this area considers task-level parallelism for the "Job Shop Scheduling Problem", in which a scheduler must assign tasks to available resources. This has been extended to solve the problem of finding the optimal partitioning of data processing for tasks across compute devices while accounting for the cost of transferring data between devices. Currently this problem would be solved in a real-time system by manually assigning tasks or data processing to devices. That approach does not scale to different hardware, and it does not provide the flexibility to handle new tasks or changes in data dependencies with different graphs. The evaluation demonstrates the performance improvement gained by scheduling a graph compared to a fixed distribution of processing across devices.

The next stage in the project is to combine the BLINK compositing pipeline and scheduler into the WP6T1 LiveView application, in conjunction with the real-time renderer, for live compositing.


6 References

[1] L. Young, S. McGough, S. Newhouse and J. Darlington. Scheduling architecture and algorithms within the ICENI grid middleware. UK e-Science All Hands Meeting, Nottingham, UK, September 2003, pp. 5-12.

[2] H. Topcuoglu, S. Hariri and M.-Y. Wu. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems 13(3), March 2002, pp. 260-274. DOI: 10.1109/71.993206.

[3] E. Ilavarasan and P. Thambidurai. Low Complexity Performance Effective Task Scheduling Algorithm for Heterogeneous Computing Environments. Journal of Computer Science, 2007. DOI: 10.3844/jcssp.2007.94.103.

[4] E. G. Coffman Jr. (ed.). Computer and Job-Shop Scheduling Theory. John Wiley, New York, 1976.

[5] S. French. Sequencing and Scheduling: An Introduction to the Mathematics of the Job-Shop. Horwood, Chichester, UK, 1982.

[6] P. Brandimarte. Routing and scheduling in a flexible job shop by tabu search. Annals of Operations Research 41(1-4), May 1993, pp. 157-183.

[7] P. J. M. van Laarhoven, E. H. L. Aarts and J. K. Lenstra. Job shop scheduling by simulated annealing. Operations Research 40(1), January 1992, pp. 113-125. DOI: 10.1287/opre.40.1.113.

[8] S. Kirkpatrick, C. D. Gelatt Jr. and M. P. Vecchi. Optimization by Simulated Annealing. 1983.

[9] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21(6), June 1953, pp. 1087-1092.

[10] A Simulated Annealing-based Heuristic Algorithm for Job Shop Scheduling to Minimize Lateness. International Journal of Advanced Robotic Systems 10:214, 2013. DOI: 10.5772/55956.

[11] K. Krishna, K. Ganeshan and D. Janaki Ram. Distributed Simulated Annealing Algorithms for Job Shop Scheduling. IEEE Transactions on Systems, Man and Cybernetics 25(7).

[12] J. Kennedy and R. Eberhart. Particle Swarm Optimization. Proceedings of the IEEE International Conference on Neural Networks IV, 1995, pp. 1942-1948. DOI: 10.1109/ICNN.1995.488968.

[13] Y. Shi and R. C. Eberhart. A modified particle swarm optimizer. Proceedings of the IEEE International Conference on Evolutionary Computation, 1998, pp. 69-73.

[14] W. Banzhaf, P. Nordin, R. Keller and F. Francone. Genetic Programming – An Introduction. Morgan Kaufmann, San Francisco, CA, 1998. ISBN 978-1558605107.