Parallel Programming languages on heterogeneous architectures4 .docx

  • 7/25/2019 Parallel Programming languages on heterogeneous architectures4 .docx


    Title

    PARALLEL PROGRAMMING LANGUAGES ON

    HETEROGENEOUS ARCHITECTURES USING

    OPENMPC, OMPSS, OPENACC AND OPENMP

    Titulo

    LENGUAJES PARA PROGRAMACION PARALELA

    EN ARQUITECTURAS HETEROGENEAS

    UTILIZANDO OPENMPC, OMPSS, OPENACC Y

    OPENMP

    Authors

    Esteban Hernández B.

    Network engineer with a master's degree in software engineering and free software construction and a minor degree in applied mathematics and network software construction. He is now pursuing doctoral studies in Engineering at Distrital University and works as principal architect at NT. His work focuses on parallel programming, high performance computing, computational numerical simulation and numerical weather forecasting.

    Contact: ehernandezb@udistrital.edu.co

    Gerardo de Jesús Montoya Gaviria

    Engineer in meteorology from the University of Leningrad, with a doctorate in physical-mathematical sciences from Moscow State University. A pioneer of meteorology in Colombia, he is in charge of the meteorology graduate program of the National University of Colombia and has been researcher and director of more than 12 graduate theses in dynamic meteorology and numerical forecasting, air quality, and the efficient use of climate and weather models. He is currently a full professor of the Faculty of Geosciences of the National University of Colombia.

    Contact: gdmontoyag@unal.edu.co

    Carlos Enrique Montenegro

    Systems engineer with a PhD and a master's degree in Informatics, director of the research group GIIRA, which focuses on social network analysis, eLearning and data visualization. He is currently an associate professor of the Engineering Faculty at Distrital University.

    Contact: cemontenegrom@udistrital.edu.co

    Abstract: Over the last 10 years a major new player has arrived in the field of parallel programming. GPUs have become important in scientific computing because they offer high computing performance, low cost and simplicity of implementation. However, one of the most important challenges is the programming languages used for these devices: recoding algorithms designed for CPUs takes considerable effort. In this article we review three of the principal frameworks for programming CUDA devices and compare them with the new directives introduced in the OpenMP 4 standard, using the Jacobi iterative method as the test problem (Meijerink & Vorst, 1977; Wang, n.d.).

    Resumen: En el campo de la programación paralela ha arribado un nuevo gran jugador en los últimos 10 años. Las GPUs han tomado una importancia relevante en la computación científica debido a que ofrecen alto rendimiento computacional, bajos costos y simplicidad de implementación; sin embargo, uno de los desafíos más grandes que poseen son los lenguajes utilizados para la programación de los dispositivos. El esfuerzo de reescribir algoritmos diseñados originalmente para CPUs es uno de los problemas más relevantes. En este artículo se revisan tres frameworks de programación para la tecnología CUDA y se realiza una comparación con el reciente estándar OpenMP versión 4 (Meijerink & Vorst, 1977; Wang, n.d.).

    Keywords: CUDA, Jacobi method, OmpSs, OpenACC, OpenMP, parallel programming.

    Palabras clave: Método de Jacobi, OmpSs, OpenACC, OpenMP, programación paralela.

    I. INTRODUCTION

    Over the last 10 years, massively parallel processing has adopted the GPU as the principal element of a new approach to parallel programming. The GPU has evolved from a graphics-specific accelerator into a general-purpose computing device, and the present is considered to be the era of GPUs (Nickolls & Dally, 2010). However, the principal obstacle to wide adoption in the programmer community has been the lack of standards that allow the different existing hardware solutions to be programmed in a unified form (Nickolls & Dally, 2010). The most important player in GPU solutions is Nvidia®, with the CUDA® programming language and its own compiler (nvcc) (Hill & Marty, 2008), thousands of installations and systems ranked on the TOP500 supercomputer list; portability, however, remains its principal problem. Some community projects and hardware alliances have proposed solutions to this issue. OmpSs, OpenACC and OpenMPC have emerged as the most promising ones (Vetter, 2012), all built on the OpenMP base model. In the last year the OpenMP board released version 4 of the standard.

    A. OmpSs

    OmpSs (a programming model from the Barcelona Supercomputing Center, based on OpenMP and StarSs) is a framework built on the task decomposition paradigm for developing parallel applications on cluster environments with heterogeneous architectures. It provides a set of compiler directives that are used to annotate sequential code and that the compiler translates; additional features have been added to support accelerators such as GPUs. With it, the same program that runs sequentially on a node with a single GPU can run in parallel on multiple GPUs, either local (single node) or remote (cluster of GPUs). Besides performing a task-based parallelization, the runtime system moves the data as needed between the different nodes and GPUs, minimizing the impact of communication by using affinity scheduling, caching, and overlapping communication with the computational tasks.

    OmpSs is based on the OpenMP programming model with modifications to its execution and memory model. It also provides some extensions for synchronization, data motion and heterogeneity support.

    1) Execution model: OmpSs uses a thread-pool execution model instead of the traditional OpenMP fork-join model. The master thread starts the execution and all other threads cooperate in executing the work it creates (whether it comes from work-sharing or task constructs). Therefore, there is no need for a parallel region. Nesting of constructs allows other threads to generate work as well (Figure 1).

    2) Memory model: OmpSs assumes that multiple address spaces may exist. As such, shared data may reside in memory locations that are not directly accessible from some of the computational resources. Therefore, all parallel code can only safely access private data and shared data that has been marked explicitly with the OmpSs extended syntax. This assumption holds even for SMP machines, as the implementation may reallocate shared data to improve memory accesses (e.g., NUMA).

    3) Extensions:

    Function tasks: OmpSs allows function declarations or definitions to be annotated with a task directive, in the style of Cilk (Duran, Perez, Ayguadé, Badia, & Labarta, 2008). In this case, any call to the function creates a new task that will execute the function body. The data environment of the task is captured from the function arguments.

    Dependency synchronization: OmpSs integrates the StarSs dependence support (Duran et al., 2008). It allows tasks to be annotated with three clauses: input, output and inout. They express, respectively, that a given task depends on some data produced before, that it will produce some data, or both. The syntax of the clauses allows specifying scalars, arrays, pointers and pointed data.
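    The effect of the three dependence clauses can be sketched in C. This is a minimal sketch, not OmpSs code: it uses the depend clause of standard OpenMP 4.0 as an analog of the OmpSs input/output/inout clauses, so that it builds with an ordinary OpenMP compiler (and runs serially when the pragmas are ignored). The function names fill, scale2, sum and pipeline are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Producer task body: v[i] = i. */
static void fill(double *v, size_t n) {
    for (size_t i = 0; i < n; ++i) v[i] = (double)i;
}

/* In-place task body: double every element. */
static void scale2(double *v, size_t n) {
    for (size_t i = 0; i < n; ++i) v[i] *= 2.0;
}

/* Consumer task body: sum of the elements. */
static double sum(const double *v, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

/* Hypothetical driver: three tasks ordered only by their data
   dependences on v, not by explicit barriers between them. */
double pipeline(double *v, size_t n) {
    assert(v != NULL && n > 0);
    double total = 0.0;
    #pragma omp parallel
    #pragma omp single
    {
        /* analog of "output": this task produces v */
        #pragma omp task depend(out: v[0:n])
        fill(v, n);

        /* analog of "inout": this task reads and rewrites v */
        #pragma omp task depend(inout: v[0:n])
        scale2(v, n);

        /* analog of "input": this task only consumes v */
        #pragma omp task depend(in: v[0:n]) shared(total)
        total = sum(v, n);

        #pragma omp taskwait
    }
    return total;
}
```

    The runtime derives the execution order fill, scale2, sum from the clauses alone; with n = 4 the vector becomes 0, 2, 4, 6 and the returned total is 12.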

    Figure 1. OmpSs execution model

    B. OpenACC

    OpenACC is an industry standard for heterogeneous computing proposed at the Supercomputing Conference 2011. OpenACC follows the OpenMP approach: sequential code is annotated with compiler directives (pragmas) that indicate the regions of code that can be executed on the GPU.

    The execution model targeted by OpenACC API-enabled implementations is host-directed execution with an attached accelerator device, such as a GPU. Much of a user application executes on the host. Compute-intensive regions are offloaded to the accelerator device under control of the host. The device executes parallel regions, which typically contain work-sharing loops, or kernels regions, which typically contain one or more loops that are executed as kernels on the accelerator. Even in accelerator-targeted regions, the host may orchestrate the execution by allocating memory on the accelerator device, initiating data transfer, sending the code to the accelerator, passing arguments to the compute region, queuing the device code, waiting for completion, transferring results back to the host, and deallocating memory (Figure 2). In most cases, the host can queue a sequence of operations to be executed on the device, one after the other (Wolfe, 2013).
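    As an illustration of this host-directed model, the following minimal sketch offloads a single loop. It assumes an OpenACC-capable compiler; without one the pragma is ignored and the loop runs on the host. The function name saxpy and its parameters are hypothetical and do not come from the article.

```c
#include <assert.h>
#include <stddef.h>

/* y = a*x + y. The data clauses stand for the host-side steps listed
   above: copyin allocates x on the device and transfers it before the
   region; copy does the same for y and also transfers the result back
   and frees the device memory when the region ends. */
void saxpy(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

    The host code contains no explicit allocation, transfer or launch calls; the single directive asks the compiler and runtime to generate them.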

    The current problems with OpenACC are that it supports only the fork-join model and that only commercial compilers support its directives (PGI, Cray and CAPS) (Wolfe, 2013; Reyes, López, Fumero, & Sande, 2012). In the last year only one open source implementation has offered support (accULL) (Reyes & López-Rodríguez, 2012).

    Figure 2. OpenACC execution model

    C. OpenMPC

    OpenMPC (OpenMP extended for CUDA) is a framework that hides the complexity of the CUDA programming model and memory model from the user (Lee & Eigenmann, 2010). OpenMPC consists of the standard OpenMP API plus a new set of directives and environment variables to control important CUDA-related parameters and optimizations.

    OpenMPC addresses two important issues in GPGPU programming: programmability and tunability. As a front-end programming model, OpenMPC provides programmers with abstractions of the complex CUDA programming model and high-level controls over various optimizations and CUDA-related parameters. OpenMPC includes a fully automatic compilation system and a user-assisted tuning system. In addition to a range of compiler transformations and optimizations, the system includes tuning capabilities for generating, pruning, and navigating the search space of compilation variants.

    OpenMPC uses the Cetus compiler (Dave, Bae, Min, & Lee, 2009) for automatic source-to-source parallelization. The C source code goes through three levels of analysis:

    Privatization

    Reduction variable recognition

    Induction variable substitution

    OpenMPC adds a number of pragmas to annotate OpenMP parallel regions and to select optimization regions. The added pragmas have the following form: #pragma cuda <<function>>.

    D. OpenMP release 4

    OpenMP is the most widely used framework for programming parallel software with shared memory and is supported by most existing compilers. With the explosion of multicore and manycore systems, OpenMP gained acceptance in the parallel programming community and among hardware vendors. From its creation to version 3 the focus of the API was CPU environments, but with the introduction of GPUs and vector accelerators the new release 4 includes support for external devices (OpenMP, 2013). Historically OpenMP has focused on the fork-join model (Figure 3); this release adds support for the Single Instruction Multiple Data (SIMD) model and extends the task-based model (Duran et al., 2008; Podobas, Brorsson, & Faxén, 2010) to gain performance through more parallelism on external devices. The most important directives introduced were target, teams and distribute. These directives allow a group of threads to be distributed on a special device and the result to be copied back to host memory (Figure 3).

    Figure 3. OpenMP 4 execution model
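    The device directives of OpenMP 4 can be combined on a single loop, as in the following minimal sketch. It assumes a compiler with OpenMP 4.0 device support; otherwise the loop runs on the host. The function name vec_add and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* c = a + b.
   target       : run the region on the default device
   teams        : create a league of thread teams on that device
   distribute   : split the loop iterations among the teams
   parallel for : split each team's share among its threads
   map(to/from) : copy a and b to the device, copy c back to the host */
void vec_add(size_t n, const float *a, const float *b, float *c) {
    assert(a != NULL && b != NULL && c != NULL);
    #pragma omp target teams distribute parallel for \
            map(to: a[0:n], b[0:n]) map(from: c[0:n])
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

    The map clauses play the same role as the data clauses of OpenACC: they state which arrays travel to the device and which come back.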

    III. JACOBI ITERATIVE METHOD

    Iterative methods are suitable for large-scale systems of linear equations. There are three commonly used iterative methods: Jacobi's method, the Gauss-Seidel method and the SOR iterative method (Gravvanis, Filelis-Papadopoulos, & Lipitakis, 2013; Huang, Teng, Wahid, & Ko, 2009).

    The last two methods converge faster than Jacobi's method but lack parallelism. They have an advantage over the Jacobi method only when implemented sequentially and executed on traditional CPUs. Jacobi's iterative method, on the other hand, is inherently parallel, which makes it suitable for CUDA devices or vector accelerators, where it can run concurrently on many cores. The basic idea of the Jacobi method is to convert the system Ax = b into an equivalent system in which each unknown is isolated, Equation (1), and then to solve that system iteratively, Equation (2):

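    $$
    \begin{aligned}
    a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1 \\
    a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \\
    a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3
    \end{aligned}
    \quad\Longrightarrow\quad
    \begin{aligned}
    x_1 &= -\frac{a_{12}}{a_{11}}x_2 - \frac{a_{13}}{a_{11}}x_3 + \frac{b_1}{a_{11}} \\
    x_2 &= -\frac{a_{21}}{a_{22}}x_1 - \frac{a_{23}}{a_{22}}x_3 + \frac{b_2}{a_{22}} \\
    x_3 &= -\frac{a_{31}}{a_{33}}x_1 - \frac{a_{32}}{a_{33}}x_2 + \frac{b_3}{a_{33}}
    \end{aligned}
    $$

    Equation (1)

    On each iteration we solve:

    $$
    x_i^{(k)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{j \neq i} a_{ij}\, x_j^{(k-1)} \Bigr), \qquad i = 1, \dots, n
    $$

    Equation (2)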

    where the values from the (k-1)st iteration are used to compute the values for the kth iteration. The pseudocode for the Jacobi method is (Dongarra, Golub, Grosse, Moler, & Moore, 2008):

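    Choose an initial guess x^(0) to the solution x.
    for k = 1, 2, ...
        for i = 1, 2, ..., n
            xbar_i = 0
            for j = 1, 2, ..., i-1, i+1, ..., n
                xbar_i = xbar_i + a_{i,j} * x_j^(k-1)
            end
            xbar_i = (b_i - xbar_i) / a_{i,i}
        end
        x^(k) = xbar
        check convergence; continue if necessary
    end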
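    The pseudocode translates almost line by line into C. The following is a minimal sequential sketch, assuming a dense row-major matrix and a stopping test on the largest change between sweeps. The function name jacobi_solve, its parameters and the stopping rule are illustrative choices, not taken from the cited source.

```c
#include <assert.h>
#include <stdlib.h>

/* Solve A x = b by Jacobi sweeps.
   A is n x n in row-major order; x holds the initial guess on entry and
   the approximate solution on exit. Returns the number of sweeps used,
   or -1 if max_iter sweeps were not enough or allocation failed. */
int jacobi_solve(int n, const double *A, const double *b, double *x,
                 double tol, int max_iter) {
    assert(n > 0 && A != NULL && b != NULL && x != NULL);
    double *x_new = malloc((size_t)n * sizeof *x_new);
    if (x_new == NULL) return -1;

    for (int k = 1; k <= max_iter; ++k) {
        double max_change = 0.0;
        for (int i = 0; i < n; ++i) {
            /* sigma = sum over j != i of a_ij * x_j^(k-1) */
            double sigma = 0.0;
            for (int j = 0; j < n; ++j) {
                if (j != i) sigma += A[i * n + j] * x[j];
            }
            x_new[i] = (b[i] - sigma) / A[i * n + i];

            double change = x_new[i] - x[i];
            if (change < 0.0) change = -change;
            if (change > max_change) max_change = change;
        }
        /* x^(k) = xbar: only now may the old values be overwritten */
        for (int i = 0; i < n; ++i) x[i] = x_new[i];

        if (max_change < tol) {
            free(x_new);
            return k;
        }
    }
    free(x_new);
    return -1;
}
```

    For the system 4x1 + x2 = 6, x1 + 3x2 = 7 with a zero initial guess, the first sweep gives (3/2, 7/3), the second gives (11/12, 11/6), and the iteration approaches the exact solution (1, 2). Because every x_new[i] depends only on values from the previous sweep, the loop over i can be executed in parallel.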

    This iterative method can be implemented in parallel form, using shared or distributed memory (Margaris, Souravlas, & Roumeliotis, 2014).

    Figure 4. Parallel form of Jacobi method on shared memory (Alsemmeri, n.d.)

    IV. OPENMP IMPLEMENTATION

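    int n = LIMIT_N;
    int m = LIMIT_M;
    float A[n][m];        // the matrix (current iterate)
    float Anew[n][m];     // the next iterate
    float y_vector[n];    // b vector
    // fill the matrix with initial conditions
    …
    // Jacobi sweep: each interior point becomes the average of its four neighbours.
    // error is shared, so its maximum is combined with a reduction to avoid a data race.
    #pragma omp parallel for shared(m, n, Anew, A) reduction(max:error)
    for (int j = 1; j < n-1; j++)
    {
        for (int i = 1; i < m-1; i++)
        {
            Anew[j][i] = 0.25f * (A[j][i+1] + A[j][i-1]
                                + A[j-1][i] + A[j+1][i]);
            error = fmaxf(error, fabsf(Anew[j][i] - A[j][i]));
        }
    }
    // copy the new iterate back into A for the next sweep
    #pragma omp parallel for shared(m, n, Anew, A)
    for (int j = 1; j < n-1; j++) {
        for (int i = 1; i < m-1; i++) {
            A[j][i] = Anew[j][i];
        }
    }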
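    The fragment above omits the declarations of error and the surrounding iteration loop. A self-contained version might look like the following minimal sketch, which assumes a fixed grid size and a stopping test on the largest change per sweep. The names N, M, sweep and relax are hypothetical; without OpenMP support the pragmas are ignored and the code runs serially.

```c
#include <assert.h>

#define N 32  /* hypothetical grid size */
#define M 32

/* One Jacobi sweep over the interior points; returns the largest change. */
static float sweep(float A[N][M], float Anew[N][M]) {
    float error = 0.0f;
    /* each thread keeps a private maximum, combined when the loop ends */
    #pragma omp parallel for reduction(max:error)
    for (int j = 1; j < N - 1; j++) {
        for (int i = 1; i < M - 1; i++) {
            Anew[j][i] = 0.25f * (A[j][i + 1] + A[j][i - 1]
                                + A[j - 1][i] + A[j + 1][i]);
            float d = Anew[j][i] - A[j][i];
            if (d < 0.0f) d = -d;
            if (d > error) error = d;
        }
    }
    /* copy the new iterate back for the next sweep */
    #pragma omp parallel for
    for (int j = 1; j < N - 1; j++) {
        for (int i = 1; i < M - 1; i++) {
            A[j][i] = Anew[j][i];
        }
    }
    return error;
}

/* Sweep until the largest change is at most tol or iter_max sweeps
   have run; returns the number of sweeps performed. */
int relax(float A[N][M], float tol, int iter_max) {
    static float Anew[N][M];
    assert(tol > 0.0f && iter_max > 0);
    float error = tol + 1.0f;
    int iter = 0;
    while (error > tol && iter < iter_max) {
        error = sweep(A, Anew);
        iter++;
    }
    return iter;
}
```

    With the boundary held at 1 and the interior started at 0, repeated sweeps drive every interior point towards 1, which gives a simple correctness check before timing larger grids.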

    #pragma acc kernels
    for (int j = 1; j < n-1; j++) {
        for (int i = 1; i < m-1; i++) {
            A[j][i] = Anew[j][i];
        }
    }
    if (iter % 100 == 0) printf("%5d, %0.6f\n", iter, error);
    iter++;
    }   // closes the iteration loop
    …
    // print the result

    This implementation introduces a new annotation, #pragma acc kernels, which marks the code that will be executed on the accelerator. This simple annotation hides a complex implementation: the CUDA kernel itself, the copy of data from host to device and from device to host, the definition of the grid of threads, and the data manipulation (Amorim & Haase, 2009; Sanders & Kandrot, 2011; Zhang et al., 2009).
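    What the kernels directive leaves implicit can be written out with explicit OpenACC clauses. The following minimal sketch of a single sweep assumes a small fixed grid; the names NY, NX and sweep_acc and the vector length of 128 are hypothetical choices, and without an OpenACC compiler the pragmas are ignored.

```c
#include <assert.h>

enum { NY = 8, NX = 8 };  /* hypothetical grid size */

/* One Jacobi sweep with the data movement and the thread layout spelled out. */
void sweep_acc(float A[NY][NX], float Anew[NY][NX]) {
    assert(A != Anew);
    /* copyin: A goes host -> device only.
       copy:   Anew goes both ways, so its untouched boundary survives. */
    #pragma acc data copyin(A[0:NY][0:NX]) copy(Anew[0:NY][0:NX])
    {
        /* rows are spread over gangs (thread blocks on a CUDA device) */
        #pragma acc parallel loop gang vector_length(128)
        for (int j = 1; j < NY - 1; j++) {
            /* columns are spread over the vector lanes (threads) of a gang */
            #pragma acc loop vector
            for (int i = 1; i < NX - 1; i++) {
                Anew[j][i] = 0.25f * (A[j][i + 1] + A[j][i - 1]
                                    + A[j - 1][i] + A[j + 1][i]);
            }
        }
    }
}
```

    In an iterative solver the data region would normally enclose the whole iteration loop, so that the arrays stay on the device between sweeps instead of being transferred every time.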

    VI. PURE CUDA IMPLEMENTATION (Wang, n.d.)

    …
    int dimB, dimT;
    dimT = 256;                  // threads per block
    dimB = (dim / dimT) + 1;     // blocks in the grid
    float err = 1.0;
    // set up the memory for GPU
    float * LU_d;
    float * B_d;
    float * diag_d;
    float *X_d, *X_old_d;
    float * tmp;
    cudaMalloc( (void **) &B_d, sizeof(float) * dim );
    cudaMalloc( (void **) &diag_d, sizeof(float) * dim );
    cudaMalloc( (void **) &LU_d, sizeof(float) * dim * dim);
    cudaMemcpy( LU_d, LU, sizeof(float) * dim * dim, cudaMemcpyHostToDevice);
    cudaMemcpy( B_d, B, sizeof(float) * dim, cudaMemcpyHostToDevice);
    cudaMemcpy( diag_d, diag, sizeof(float) * dim, cudaMemcpyHostToDevice);
    cudaMalloc( (void **) &X_d, sizeof(float) * dim);
    cudaMalloc( (void **) &X_old_d, sizeof(float) * dim);
    cudaMalloc( (void **) &tmp, sizeof(float) * dim);
    …
    // call to cuda kernels
    // 2. Compute X by A X_old
    cudaMemcpy( X_old_d, X_old, sizeof(float) * dim, cudaMemcpyHostToDevice);
    matMultVec<<<dimB, dimT>>>(LU_d, X_old_d, tmp, dim, dim); // use X_old to compute LU X_old and store the result in tmp
    substract<<<dimB, dimT>>>(B_d, tmp, X_d, dim);            // get the (B - LU X_old), which is stored in X_d
    diaMultVec<<<dimB, dimT>>>(diag_d, X_d, dim);             // get the new X
    // 3. copy the new X back to the Host Memory
    cudaMemcpy( X, X_d, sizeof(float) * dim, cudaMemcpyDeviceToHost);
    //

            }
            rowId += gridDim.x * blockDim.x;
        }
        __syncthreads();
    }

    // cuda kernel for vector subtraction
    __global__ void substract(float *a_d,
                              float *b_d,
                              float *c_d,
                              int dim)
    {
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        while ( tid < dim )
        {
            c_d[tid] = a_d[tid] - b_d[tid];
            tid += gridDim.x * blockDim.x;   // stride by the total number of threads
        }
    }

    // cuda kernel that reduces vec to its maximum, left in vec[0].
    // __syncthreads() only synchronizes the threads of one block, so this
    // reduction is only safe when the kernel is launched with a single block.
    __global__ void VecMax( float * vec, int dim)
    {
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
        while (dim > 1)
        {
            int mid = dim / 2;   // get the half size
            if ( tid < mid)      // filter the active thread
            {
                if ( vec[tid] < vec[tid+mid] )   // get the larger one between vec[tid] and vec[tid+mid]
                    vec[tid] = vec[tid+mid];     // and store the larger one in vec[tid]
            }
            // deal with the odd case
            if ( dim % 2 )       // if dim is odd, take care of the last element
            {
                if ( tid == 0 )  // only use vec[0] to compare with vec[dim-1]
                {
                    if ( vec[tid] < vec[dim-1] )
                        vec[tid] = vec[dim-1];
                }
            }
            __syncthreads();     // sync all threads of the block
            dim /= 2;            // halve the vector size
        }
    }

    Writing three kernels, managing the logic of the grid dimensions, copying data from host to GPU and from GPU to host, synchronizing the threads within the groups of threads on the GPU, and details such as the thread ID calculation demand considerable knowledge of the hardware devices and substantial programming effort.
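    The thread ID calculation used in the kernels can be checked on the host. The following minimal sketch emulates in plain C the grid-stride indexing tid = threadIdx.x + blockIdx.x * blockDim.x with stride gridDim.x * blockDim.x, and counts how often each element is visited. The function name count_visits and its parameters are hypothetical, and the sequential loops only stand in for threads that run concurrently on the GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Emulate a launch of grid_dim blocks of block_dim threads over a vector
   of dim elements; visits[i] counts the threads that touched element i. */
void count_visits(int dim, int grid_dim, int block_dim, int *visits) {
    assert(dim >= 0 && grid_dim > 0 && block_dim > 0 && visits != NULL);
    for (int i = 0; i < dim; ++i) visits[i] = 0;

    for (int block = 0; block < grid_dim; ++block) {          /* blockIdx.x  */
        for (int thread = 0; thread < block_dim; ++thread) {  /* threadIdx.x */
            int tid = thread + block * block_dim;
            while (tid < dim) {
                visits[tid] += 1;
                tid += grid_dim * block_dim;  /* gridDim.x * blockDim.x */
            }
        }
    }
}
```

    With dim = 1000 and 256 threads per block, the host code's formula dimB = (dim / dimT) + 1 gives 4 blocks, and every element is visited exactly once. The stride loop also keeps the result correct when fewer blocks are launched than the vector needs.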

    VII. TEST TECHNIQUE

    We took the three implementations of the Jacobi method (pure OpenMP, OpenACC, OpenMPC), ran them on the SUT (System Under Test) of Table 1 in multiple runs with square matrices of increasing size (Table 2), and recorded the processing time to analyze performance (Kim, n.d.; Sun & Gustafson, 1991) against the number of lines of code needed by the algorithm.

    Table 1. Characteristics of System Under Test

    System    Supermicro SYS-1027GR-TRF
    CPU       Intel® Xeon® 10-core E5-2680 V2 CPUs @ 2.80 GHz
    Memory    32GB DDR3 1600MHz
    GPU       NVIDIA Kepler K

    20 Rampp, 2013).

    IX. CONCLUSION

    The new devices for high performance computing need standardized methods and languages that permit interoperability, easy programming and well-defined interfaces for integrating the data interchange between the memory segments of the processors (CPUs) and the devices. Languages must also support working with two or more devices in parallel using the same code, automatically running the highly parallel segments on them. OpenMP is the de facto standard for the shared memory programming model, but its support for heterogeneous devices (GPUs, accelerators, FPGAs, etc.) is at a very early stage; the new frameworks and industrial APIs need to help the standard grow and mature.

    X. FINANCING

    This research was developed with the authors' own resources, as part of the doctoral thesis process at Distrital University.

    XI. FUTURE WORKS

    It is necessary to analyze how frameworks for multiple devices on the same host handle memory locality, unified memory between CPU and devices, and I/O operations from devices, using the most recent version of each standard framework. With the support for GPU devices in the upcoming GCC 5, a compiler performance comparison between commercial and standard open source solutions becomes possible. It is also necessary to review the impact of using clusters with multiple devices, message passing and MPI integration for high performance computing using the language extensions (Schaa & Kaeli, 2009).

    REFERENCES

    Dannert, T., Marek, A., & Rampp, M. (2013). Porting Large HPC Applications to GPU Clusters: The Codes GENE and VERTEX. Retrieved from http://arxiv.org/abs/1310.1

    [...] Moore, K. (2008). Netlib and NA-Net: Building a Scientific Computing Community. Annals of the History of Computing, IEEE, 30(2), 30

    Gupta, S., Xiang, P., & Zhou, H. (2013). Analyzing locality of memory references in GPU architectures. ... of the ACM SIGPLAN Workshop on Memory ..., 1-2. doi:10.11

    [...] L(137), 1

    [...] López-Rodríguez, I. (2012). accULL: An OpenACC implementation with CUDA and OpenCL support. Euro-Par 2012 Parallel ..., 2(228398), 871-882. Retrieved from http://link.springer.com/chapter/10.1007/978-3-6

    [...] Kandrot, E. (2011). CUDA by Example. An Introduction to General-Purpose GPU Programming .... Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:CUDA+by+Example#1

    Schaa, D., & Kaeli, D. (2009). Exploring the multiple-GPU design space. 2009 IEEE International Symposium on Parallel & Distributed Processing, 1-12. doi:10.1109/IPDPS.2009.5161068

    Sun, X., & Gustafson, J. (1991). Toward a better parallel performance metric. Parallel Computing, 1093-1109. Retrieved from http://www.sciencedirect.com/science/article/pii/S0167819105800286

    Vetter, J. (2012). On the road to Exascale: lessons from contemporary scalable GPU systems. ... A*CRC Workshop on Accelerator Technologies for .... Retrieved from http://www.acrc.a-star.edu.sg/astaratipreg_2012/Proceedings/Presentation - Jeffrey Vetter.pdf

    Wang, Y. (n.d.). Jacobi iteration on GPU. Retrieved September 25, 2014
