Slurm Migration


SLURM Migration Experience at
Lawrence Berkeley National Laboratory

Presented by
Jacqueline Scoggins
System Analyst, HPCS Group

Slurm Conference, September 22-23, 2014
Lugano, Switzerland


    Who are we?

National Laboratory bringing Science Solutions to the world


HPCS Group
High Performance Computing Services

- We began providing HPCS in 2003, supporting 10 departmental clusters.
- Today we manage, throughout the LBNL campus and a subset of the UC Berkeley campus:
  - over 30 clusters, ranging from 4-node to 240-node clusters
  - over 2000 user accounts
  - over 140 projects


OUR ENVIRONMENT

We use the term "SuperCluster":
- Single Master Node - provisioning node and scheduler system
- Multiple interactive login nodes
- Multiple Clusters
  - Institutional and Departmental purchased nodes
- Global Home Filesystems
- Flattened Network Topology, so everything communicates with the management/interactive nodes


[Architecture diagram: Global Filesystem, Interactive User Nodes, Management & Scheduler Node, Compute Nodes]


OUR SET UP

- Nodes are stateless and provisioned using Warewulf (interactive, compute and storage nodes)
- Jobs could not span across clusters
- All user accounts exist on the interactive nodes
- All user accounts exist on the scheduler node, but with nologin access
- User accounts are only added to the compute nodes they are allowed to run on


Moab Configuration (cont'd)

- ACLs on users/groups for the resources they could use, in the CLASSCFG and SRCFG; they were also configured within Torque for each queue
- Limits/Policies


Moab Configuration (cont'd)

- Account Allocations: Gold banking for charging CPU cycles used, and setting up rates for node types/features
- Backfill
- Standing Reservations
- Condo Computing


What made us migrate?

- Too many unpredictable scheduler issues
- Our existing support contract was up for renewal
- Slurm had the features and capabilities we needed
- Slurm's architectural design was less complex


What did we need to migrate?

- A single scheduler capable of managing department-owned and institutionally owned nodes
- The scheduler needed to be able to separate users' access from nodes they do not have accounts on
- The scheduler needed to be able to avoid scheduling conflicts with users from other groups


What did we need to migrate?

- The scheduler needed to be able to set limits on jobs/partitions/nodes independently (see the sketch below)
- We needed an allocation management scheme to be able to continue charging users
- We needed to be able to use a Node Health Check method for managing unhealthy nodes
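As a rough illustration (not from the slides), per-user and per-job limits of this kind are typically expressed in Slurm as QOS definitions managed with sacctmgr; the QOS name, user name and limit values below are made up:

    # Hypothetical QOS carrying per-user limits (name and values are examples)
    sacctmgr add qos lr_normal
    sacctmgr modify qos lr_normal set MaxJobsPerUser=20 MaxWall=72:00:00
    # Attach the QOS to a user's association so the limits actually apply
    sacctmgr modify user where name=alice set qos+=lr_normal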


What were some of our Challenges?

- Migrating several hundred users and more than a dozen clusters within a 4-month period to a new system

- Implementation design:
  - How will we configure the queues?
  - How will we get fairshare and backfill to work?
  - How will we implement our Accounting/Charging model? (Condo vs Non-condo usage on the institutional cluster)
  - Can we control user access to the nodes?


How did we configure slurm?

- Jobs run on nodes exclusively or shared
- Multiple Partitions to separate the clusters from each other
- QoSs to set up the limits for the jobs/partition, and which one a user can use within a partition
- Backfill
- Fairshare
- Priority Job Scheduling
- Network topology configured to group the nodes and switches to best place the jobs (a configuration sketch follows below)
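A minimal slurm.conf sketch of this kind of layout; the cluster names, node ranges and QOS names are invented for illustration and are not LBNL's actual configuration:

    # slurm.conf excerpt (illustrative only)
    SchedulerType=sched/backfill            # backfill scheduling
    PriorityType=priority/multifactor       # fairshare is one of the priority factors
    PriorityWeightFairshare=100000
    TopologyPlugin=topology/tree            # group nodes/switches (needs a topology.conf)
    # One partition per cluster; AllowQos limits which QOS (and so which users
    # and limits) may run there, and Shared controls exclusive vs shared nodes
    PartitionName=clusterA Nodes=cA[001-240] AllowQos=qos_clusterA Shared=EXCLUSIVE
    PartitionName=clusterB Nodes=cB[001-004] AllowQos=qos_clusterB Shared=NO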


How did we migrate?

- We ran both schedulers simultaneously during the 4-month process
- We migrated either a set of clusters to Slurm at a time, or a single cluster over to Slurm
- We created scripts for adding accounts and users
- We had to educate our users


Problems Encountered

- Accounting: we had to create a separate mechanism for managing charging. We used our existing user-data text file, adding a few fields to identify a user with all of the projects they were associated with. We need to have the key values username, project id and account name for billing the users; Slurm does not have that association in its DB (see the sketch below).
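A hedged sketch of how that mapping could be mirrored on the Slurm accounting side (the account, project and user names are hypothetical, and the billing join against the site's text file still happens outside Slurm):

    # Illustrative only: one Slurm account per project, users associated with it,
    # so usage reports can be matched against the site's billing text file
    sacctmgr -i add account projA Description="Project A" Organization=dept1
    sacctmgr -i add user alice account=projA
    # Per-account/per-user usage for a billing period
    sreport cluster AccountUtilizationByUser start=2014-08-01 end=2014-09-01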


Problems encountered

- Backfill was not properly picking up all of the queued jobs that could fit into the schedule. We had to change some of the parameters (see the slurm.conf sketch below):
  - TreeWidth
  - SchedulerParameters = bf_continue
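In slurm.conf these changes look roughly like the lines below; the TreeWidth value shown is only a placeholder example, not the value used at LBNL:

    # slurm.conf: keep the backfill scheduler walking the whole queue instead of
    # stopping early, and widen the node communication fanout
    SchedulerParameters=bf_continue
    TreeWidth=256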

- Default/Min job limits: the ability to set default or minimum limits is missing. It would be good to have at least a default setting available for wallclock limits, in case a user forgets to request one (a partition-level workaround is sketched below).
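One partition-level mitigation (a workaround, not something the slides describe) is to give each partition a DefaultTime and MaxTime, so a job that omits --time still gets a bounded wallclock; the partition name and times below are examples:

    # slurm.conf: jobs submitted without --time inherit DefaultTime,
    # and nothing can exceed MaxTime
    PartitionName=clusterA Nodes=cA[001-240] DefaultTime=01:00:00 MaxTime=72:00:00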


Condo vs Non-Condo Computing

Condo Computing - a PI purchases nodes, and we incorporate them into the compute node environment. The users of the

    condo are subject to run for free only on their contribution of

    nodes. If they run outside of their contributed condo resources,

    they are subjected to the same recharge rates as all other users.

Non-condo Computing - Institutional compute nodes + condo

    compute nodes not in use by the condo users. Users are

    charged a recharge rate of $0.01 per Service Unit (1 cent per

    service unit, SU).

    NOTE: We reduce the charge rate by 25% for each generation of nodes

    lr1 - 0.50 SU per Core CPU hour ($0.005 per core/hr)

    mako and lr2 - 0.75 SU per Core CPU hour ($0.0075 per core/hr)

lr3 - 1 SU per Core CPU hour ($0.01 per core/hr)
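As a worked example using the rates above (the 24-core job size is just an illustration): a 24-core job that runs for 10 hours on lr3 accrues 24 × 10 × 1 = 240 SUs, i.e. $2.40 at $0.01 per SU, while the same job on lr1 accrues 24 × 10 × 0.50 = 120 SUs, or $1.20.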


Plus side of SLURM

- Capabilities for separating users' resource pools via accounting associations
- Capabilities for setting up Fairshare from a parent level, where the children inherit equal shares and individually get lower priority if they use the resources over a window of time (see the example below)
- So far it has been simple and easy to use
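A small, hypothetical example of that parent-level fairshare setup (account and user names invented); sshare then shows each user's shares and usage-adjusted standing:

    # Illustrative only: shares are set on the parent account; users added under
    # it split that allocation, and heavy individual usage lowers their priority
    sacctmgr modify account where name=projA set fairshare=100
    sshare -a -A projA    # list users under projA with their effective shares and usage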