Torrent Distribution

  • 8/12/2019 Torrent Distribution

    1/13

    Costin.Grigoras@cern.ch

    Torrent-based Software Distribution in ALICE

    Outline

    GDB, Annecy 10.10.2012 - Torrent-based software distribution in ALICE

    Motivation

    How it works

    Site requirements

    History

    Migration status

    Motivation

    ALICE was using site shared areas for installing the pre-compiled experiment software packages

    Large sites suffered from AFS/NFS/... scalability issues, and the shared area was a single point of failure

    Large space needed for the many active versions

    Old model needed a site-local service to manage the installation, unpacking and deletion of the packages

    Requirement for strict site configuration to support operation excludes the use of opportunistic resources/centres

    From the very beginning, the shared SW area and its access from the VO-box were considered a security risk

    All of the above and more are solved by using the Torrent protocol to distribute the software packages

    Torrent terminology

    package.tar.gz

    Chunks of equal size

    package.tar.gz.torrent

    Clients

    Metadata of the original file

      - SHA1 of chunks

      - SHA1 of entire file

      - Tracker location

    Tracker

    Initial seeder

    Seeder

    Leech

    Leech

    Exchange chunks

    Prefer high-speed peers
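
    The terminology above can be illustrated with a minimal sketch that splits a package into equal-size chunks and records the metadata a .torrent file carries (SHA1 of each chunk, SHA1 of the entire file, tracker location). The function names, the dictionary layout and the tracker URL are assumptions made for this illustration; a real .torrent file is bencoded and is not produced by this code.

```python
import hashlib


def piece_hashes(data: bytes, piece_size: int) -> list[str]:
    """SHA1 of each equal-size chunk (the last chunk may be shorter)."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def build_metadata(name: str, data: bytes, piece_size: int, tracker_url: str) -> dict:
    """Collect the fields listed on the slide (hypothetical layout)."""
    return {
        "name": name,
        "length": len(data),
        "piece_size": piece_size,
        "piece_sha1": piece_hashes(data, piece_size),
        "file_sha1": hashlib.sha1(data).hexdigest(),
        "tracker": tracker_url,  # placeholder URL, not the real tracker path
    }


def verify_piece(meta: dict, index: int, chunk: bytes) -> bool:
    """A leech checks every received chunk against the published SHA1."""
    return hashlib.sha1(chunk).hexdigest() == meta["piece_sha1"][index]
```

    Because each chunk is verified on its own, a client can accept chunks from any peer, which is what makes exchanging chunks between leeches safe.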

    How it works

    Build servers

    Software repository (one tar.gz / version)

    AliEn file catalogue: torrent://alitorrent.cern.ch/

    Torrent tracker: alitorrent.cern.ch:8088

    Torrent seeder: alitorrent.cern.ch:8092

    Site X

    WN 1

    WN 2

    WN n

    Site Y

    WN 1

    WN 2

    WN n

    No seeding between sites

    How it works (2)

    Build servers for SLC5 (32b, 64b), SLC6 (32b, 64b), Mac OS X, Ubuntu

    Software repository: 150 GB in 600 archives

    Total size of a compressed (4x factor) software set per job is ~300 MB (this is what is downloaded to the WN)

    One central tracker and seeder

      Limited to 50 MB/s to the world

    Fallback to other download methods if the torrent download fails for any reason

      wget, xrdcp

      But seed the packages nevertheless
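
    The fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the AliEn implementation: the download methods are passed in as plain callables (hypothetical wrappers around torrent, wget and xrdcp), and `seed` is a hypothetical callable that starts seeding the downloaded file.

```python
from typing import Callable, Sequence

# A downloader takes (source, destination) and raises an exception on failure.
Downloader = Callable[[str, str], None]


def fetch_with_fallback(
    source: str,
    dest: str,
    methods: Sequence[tuple[str, Downloader]],
    seed: Callable[[str], None],
) -> str:
    """Try each method in order; seed the file whichever method succeeded."""
    errors = []
    for name, download in methods:
        try:
            download(source, dest)
        except Exception as exc:  # any failure triggers the next method
            errors.append(f"{name}: {exc}")
            continue
        seed(dest)  # "seed nevertheless", even after a wget/xrdcp download
        return name
    raise RuntimeError("all download methods failed: " + "; ".join(errors))
```

    Listing the torrent method first and the others after it keeps the central seeder as the normal path while still letting local peers profit from a package that had to be fetched another way.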

    How it works (3)

    Bootstrap

    Pilot job script fetches and installs on the local node (`pwd`) the latest AliEn build by Torrent (20 MB)

    AliEn JobAgent gets a real job from the central queue and downloads the required software packages

      It continues to seed them in the background so that other local agents can quickly get them over the LAN

    The JA will run more jobs of the same type (user and SW requirements) within the TTL of the job

    Everything is downloaded into the sandbox of the job, so it is wiped at the end of its execution
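
    The agent life cycle described above can be sketched as a simple loop. This is a hedged illustration, not AliEn JobAgent code: `get_job`, `fetch_package` and `run_job` are hypothetical callables, and a job is assumed to be a dictionary with `id` and `packages` keys.

```python
import shutil
import tempfile
import time
from pathlib import Path


def run_agent(get_job, fetch_package, run_job, ttl_seconds, clock=time.monotonic):
    """Run jobs until the TTL expires, reusing packages already in the sandbox."""
    sandbox = Path(tempfile.mkdtemp(prefix="ja-sandbox-"))
    deadline = clock() + ttl_seconds
    executed = []
    try:
        while clock() < deadline:
            job = get_job()  # returns None when no matching job is queued
            if job is None:
                break
            for pkg in job["packages"]:
                target = sandbox / pkg
                if not target.exists():  # download each package only once
                    fetch_package(pkg, target)
            run_job(job, sandbox)
            executed.append(job["id"])
    finally:
        # Everything lives in the sandbox, so one removal wipes it all.
        shutil.rmtree(sandbox, ignore_errors=True)
    return executed
```

    Running several jobs with the same software requirements in one agent means the ~300 MB package set is fetched once per agent rather than once per job.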

    Torrent features we use

    Clients explicitly publish their private IP in the central tracker

      Allowing the discovery of LAN peers via this common service, even behind NAT

    Local Peer Discovery

      Multicast to discover peers on the same network

    Peer exchange

      Peer lists are distributed between the local peers

    Distributed Hash Tables

      Decentralized seeder lookup: seeders are trackers
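
    As an illustration of the first feature, the standard BitTorrent tracker announce request has an optional `ip` parameter that a client can use to publish its own address. The sketch below builds such a request URL; the tracker path, the peer ID and the assumption that the ALICE clients publish their private IP through this exact parameter are not taken from the slides.

```python
from urllib.parse import urlencode


def announce_url(tracker: str, info_hash: bytes, peer_id: bytes,
                 port: int, private_ip: str, left: int) -> str:
    """Build a tracker announce URL that advertises the client's private IP."""
    query = urlencode({
        "info_hash": info_hash,  # 20-byte SHA1, percent-encoded by urlencode
        "peer_id": peer_id,      # 20-byte client identifier
        "ip": private_ip,        # LAN address other local peers can reach
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,            # bytes still to download
    })
    return f"{tracker}?{query}"
```

    Peers behind the same NAT appear to the tracker with one public address; publishing the private address is what lets worker nodes on the same site find each other through the central tracker.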

    Site requirements

    How to allow this to happen: iptables rules accepting

      Outgoing to alitorrent.cern.ch TCP/8088,8092

      WN-to-WN on

        TCP, UDP / 6881:6999 (aria2c default listening ports)

        UDP, IGMP -> 224.0.0.0/4 (local peer discovery)

    Typically this is already the case; in some cases the ports had to be whitelisted (very smart firewalls)

    Implicitly, sites do not exchange any torrent traffic between them

    No service to run on the site or on the machines, no shared area any more, no single point of failure, essentially no local support for this
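
    A site administrator could check the outgoing TCP requirement from a worker node with a small script such as the sketch below. The host and ports are the ones listed above and may no longer be in service; the helper names are made up for this example. It only tests TCP connectivity and says nothing about the WN-to-WN or multicast rules.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unresolvable host, ...
        return False


# Central tracker and seeder endpoints as given on the slide.
CENTRAL_ENDPOINTS = [("alitorrent.cern.ch", 8088), ("alitorrent.cern.ch", 8092)]


def check_endpoints(endpoints=CENTRAL_ENDPOINTS) -> dict:
    """Map 'host:port' to its reachability from this machine."""
    return {f"{host}:{port}": tcp_reachable(host, port) for host, port in endpoints}
```

    Calling `check_endpoints()` on a worker node returns one boolean per endpoint; a `False` entry suggests that the firewall needs the outgoing rule.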

    History

    The deployment has faced only policy difficulties

      Eventually accepted after understanding the technology

      There is no evil technology, only evil use

    First tests at CERN in 02.2009

    Site deployments starting 06.2009

      As the shared areas were proving insufficient

      First at the large sites, in operation for 2 years

    Presented in various forums within the collaboration and at CHEPs

    Large awareness call in 01.2012 at the ALICE T1/T2 Workshop in Karlsruhe

    Migration status

    First transitions done in close collaboration with the sites

      Debugging on the WNs, following up the consequences on the local network, firewalls and such

    One month ago we asked all sites for permission to enable torrent

      Most have confirmed that the policy allows the torrent protocol, checked the firewall policies, and now run torrent

      Working with the rest to solve the (mostly) non-technical issues

      Some mails went to unread mailboxes

    Migration status

    T0: in operation for 3 years

    T1s: 5 / 6 migrated

    T2s: 36 / 78 migrated

    Currently covering 2/3 of the resources, so on average more than 20K concurrent jobs are using torrent

    Rock solid, very efficient technology

      No incidents reported

    Aiming for full migration by the time the next AliEn version is deployed, to completely drop the PackMan VoBox service and the need for shared SW area and caches

    Conclusion

    Torrents have enabled us to

      Simplify site operations by removing a VoBox service and the shared SW areas

      Significantly reduce problems associated with SW deployment, relieving the sites' support staff

      Have quick software release cycles (both experiment and Grid middleware)

    The migration process was carefully staged

      Policy limitations clarified in discussion with security experts

      Discussions and deployment at T0/T1s and selected T2s (regional coverage)

      Presently moving towards complete site coverage

    Lifts some of the requirement for a site VoBox, specific configurations and services

      Forward-looking system: towards opportunistic use of resources and clouds!