Pallet Big Data - JClouds Meetup 2013

  • 7/29/2019 Pallet Big Data - JClouds Meetup 2013

    JClouds Meetup Feb 2013

    Toni Batchelli -- co-founder -- PalletOps.com

    Pallet Big Data

    Saturday, March 16, 13

    Programmable Infrastructure

    The Cloud!

    Flexible

    Powerful

    Dynamic

    ... and then what?

    ... and then what?

    Configure all the things!

    configure the servers

    configure the local systems

    configure the distributed systems

    Configuration Managers:

    build a configuration database

    wait for nodes to pull config

    Programmatic Infrastructure

    In a programmatic infrastructure, the systems are provisioned and configured by running a program (*)

    jclouds takes care of the provisioning part

    Pallet takes care of the configuration part

    (*) as opposed to configuring a server to coordinate the config, or using templates

    Why programs?

    With a program you can do many things:

    Run it anywhere

    Keep it in GitHub

    Parametrize it

    Have it run by another program

    Make it a library

    Extend it

    etc

    e.g. Hadoop Clusters

    [Diagram, built up over five slides: the JobTracker and the NameNode run on the Master; each Slave runs a TaskTracker and a DataNode. The number of Slaves grows from one to three to six.]

    Caution: Major oversimplification in progress!

    [Diagram, built up over four slides. Labels: Java; Hadoop; Job Tracker, Task Tracker, Data Node, Name Node; .jar; Master Node, Slave Node; Hadoop Cluster.]

    [Diagram: the JobTracker, the NameNode, and six Slaves (each running a TaskTracker and a DataNode), with six SSH links, one per Slave.]

    Caution: Major oversimplification in progress!

    function: authorize-node(node, group)
      (public-key, private-key) = gen-key(node)
      for target-node in nodes(group) do
        auth-key(public-key, target-node)
      done

    function: auth-key(key, node)
      when-not ~/.ssh do
        create-dir(~/.ssh)
      done
      when-not ~/.ssh/authorized_keys do
        create-file(~/.ssh/authorized_keys)
      done
      append-to-file(~/.ssh/authorized_keys, key)
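The pseudocode above can be sketched in Python. This is a minimal illustration under stated assumptions, not Pallet's implementation: each node is modelled as a local home directory (`home`), the public key is passed in as a string instead of being generated, and the function names `auth_key` and `authorize_node` simply mirror the slide. The duplicate check is an addition of this sketch; the slide always appends.

```python
from pathlib import Path
from typing import Iterable


def auth_key(key: str, home: Path) -> None:
    """Add `key` to <home>/.ssh/authorized_keys, creating what is missing."""
    ssh_dir = home / ".ssh"
    # when-not ~/.ssh: create the directory (owner-only, subject to umask).
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)

    auth_file = ssh_dir / "authorized_keys"
    # when-not ~/.ssh/authorized_keys: create an empty file.
    if not auth_file.exists():
        auth_file.touch(mode=0o600)

    # Assumption of this sketch: skip keys already present, so that
    # running the program twice leaves the file unchanged.
    existing = auth_file.read_text().splitlines()
    if key not in existing:
        with auth_file.open("a") as f:
            f.write(key + "\n")


def authorize_node(public_key: str, group_homes: Iterable[Path]) -> None:
    """Authorize one node's public key on every node of a group."""
    for home in group_homes:
        auth_key(public_key, home)
```

Calling `authorize_node(master_key, slave_homes)` more than once is harmless here, which is a useful property when the same program is re-run against a cluster that already exists.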

    function: build-cluster(infra, slave-count, RAM)
      slave-spec = build-slave-spec(RAM)
      master-spec = build-master-spec(RAM)
      slaves = procure(infra, slave-spec, slave-count)
      master = procure(infra, master-spec, 1)
      master.configure()
      for slave in slaves do
        slave.configure()
      done
      authorize-node(master, slaves)
      ...

    ec2c = build-cluster(ec2, 100, 8GB)
    rsc = build-cluster(rackspace, 100, 16GB)
    vbc = build-cluster(virtualbox, 3, 2GB)
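To show why writing this as a program pays off, here is a small Python sketch of the same function. Everything provider-related is hypothetical: `FakeProvider` is an in-memory stand-in for a real cloud API, `Node.configure()` only flips a flag, and key authorization is reduced to recording a string on each slave. The role names are taken from the cluster description later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    ram_gb: int
    roles: frozenset
    configured: bool = False
    authorized_keys: list = field(default_factory=list)

    def configure(self) -> None:
        # Stand-in for installing and configuring Hadoop on the node.
        self.configured = True


class FakeProvider:
    """Hypothetical in-memory provider; a real one would call a cloud API."""

    def __init__(self, name: str):
        self.name = name
        self.created = 0

    def procure(self, spec: dict, count: int) -> list:
        nodes = []
        for _ in range(count):
            self.created += 1
            nodes.append(
                Node(f"{self.name}-{self.created}", spec["ram_gb"], spec["roles"])
            )
        return nodes


def build_cluster(infra: FakeProvider, slave_count: int, ram_gb: int) -> dict:
    slave_spec = {"ram_gb": ram_gb, "roles": frozenset({"datanode", "tasktracker"})}
    master_spec = {"ram_gb": ram_gb, "roles": frozenset({"namenode", "jobtracker"})}

    slaves = infra.procure(slave_spec, slave_count)
    (master,) = infra.procure(master_spec, 1)

    master.configure()
    for slave in slaves:
        slave.configure()

    # authorize-node(master, slaves): record the master's key on each slave.
    master_key = f"key-of-{master.name}"
    for slave in slaves:
        slave.authorized_keys.append(master_key)

    return {"master": master, "slaves": slaves}


# Same function, different infrastructure: only the arguments change.
vbc = build_cluster(FakeProvider("virtualbox"), 3, 2)
```

Swapping `FakeProvider("virtualbox")` for another provider object is the only change needed to target a different infrastructure, which is the point the three calls on the slide make.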

    e.g. Pallet Big Data

    Pallet Big Data

    We decided we'd build something useful with all this power: liberating Amazon EMR users :)

    Build Hadoop clusters anywhere and everywhere

    Use your preferred Hadoop distro and version

    Build your own workflows

    I just saved a bunch of $$$ by switching to

    {:cluster-prefix "hc1"
     :groups {:master {:node-spec {:hardware {:hardware-id "m1.medium"}}
                       :count 1
                       :roles #{:namenode :jobtracker}}
              :slave {:node-spec {:hardware {:hardware-id "m1.medium"}}
                      :count 2
                      :roles #{:datanode :tasktracker}}}
     :node-spec {:image {:os-family :ubuntu
                         :os-version-matches "12.04"
                         :os-64-bit true}}
     :hadoop-settings {:dist :cloudera}}

    a Hadoop Cluster
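Because the cluster description is plain data, other programs can inspect it before anything is provisioned. The Python sketch below mirrors the map above as a dictionary and counts nodes per role; the helper names `nodes_per_role` and `total_nodes` are illustrative and not part of Pallet.

```python
cluster_spec = {
    "cluster-prefix": "hc1",
    "groups": {
        "master": {
            "node-spec": {"hardware": {"hardware-id": "m1.medium"}},
            "count": 1,
            "roles": {"namenode", "jobtracker"},
        },
        "slave": {
            "node-spec": {"hardware": {"hardware-id": "m1.medium"}},
            "count": 2,
            "roles": {"datanode", "tasktracker"},
        },
    },
    "node-spec": {
        "image": {
            "os-family": "ubuntu",
            "os-version-matches": "12.04",
            "os-64-bit": True,
        }
    },
    "hadoop-settings": {"dist": "cloudera"},
}


def nodes_per_role(spec: dict) -> dict:
    """Return how many nodes will run each Hadoop role."""
    totals = {}
    for group in spec["groups"].values():
        for role in group["roles"]:
            totals[role] = totals.get(role, 0) + group["count"]
    return totals


def total_nodes(spec: dict) -> int:
    """Return the number of nodes the whole cluster will need."""
    return sum(group["count"] for group in spec["groups"].values())
```

A check such as "exactly one node carries the namenode role" then becomes a one-line assertion on `nodes_per_role(cluster_spec)`.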

    {:steps [{:script-file "bootstrap/setup.sh"}
             {:script "/bin/start-daemon"}
             {:jar {:remote-file "//usr/.../image-parse.jar"}
              :main "parse"
              :input "s3n://sources/satellite-data"
              :output "hdfs://parsed-sat-img"}
             {:jar {:remote-file "//usr/.../outline-detection.jar"}
              :main "detect"
              :input "hdfs://parsed-sat-img"
              :output "s3n://results/weather-data"}]
     :on-completion :terminate-cluster}

    a Hadoop workflow
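One way to read the workflow above is as an ordered list of steps, each dispatched on the key it carries, followed by a completion action. The Python sketch below only produces a textual plan and runs nothing; how Pallet actually interprets `:script-file`, `:script` and `:jar` is not shown in the slides, so the wording of each action is an assumption. The elided jar paths are kept exactly as they appear.

```python
def describe_step(step: dict) -> str:
    """Turn one workflow step into a human-readable action line."""
    if "script-file" in step:
        return f"run script file {step['script-file']}"
    if "script" in step:
        return f"run command {step['script']}"
    if "jar" in step:
        return (
            f"run jar {step['jar']['remote-file']} main={step['main']} "
            f"{step['input']} -> {step['output']}"
        )
    raise ValueError(f"unknown step type: {step}")


def plan_workflow(workflow: dict) -> list:
    """List the actions of a workflow in execution order."""
    plan = [describe_step(step) for step in workflow["steps"]]
    on_completion = workflow.get("on-completion")
    if on_completion:
        plan.append(f"on completion: {on_completion}")
    return plan


workflow = {
    "steps": [
        {"script-file": "bootstrap/setup.sh"},
        {"script": "/bin/start-daemon"},
        {
            "jar": {"remote-file": "//usr/.../image-parse.jar"},
            "main": "parse",
            "input": "s3n://sources/satellite-data",
            "output": "hdfs://parsed-sat-img",
        },
        {
            "jar": {"remote-file": "//usr/.../outline-detection.jar"},
            "main": "detect",
            "input": "hdfs://parsed-sat-img",
            "output": "s3n://results/weather-data",
        },
    ],
    "on-completion": "terminate-cluster",
}

plan = plan_workflow(workflow)
```

Note that the output of the first jar step, `hdfs://parsed-sat-img`, is the input of the second, so the steps have to run in the order listed.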

    $ bin/hadoop start

    $ bin/hadoop job job_spec.clj

    $ bin/hadoop destroy

    run hadoop, run!

    PalletOps

    backlog

    Feature parity with Amazon EMR

    Server Rack support

    Extended workflows

    Central Management service

    Interested in giving it a try?

    [email protected]

    [email protected]
