Off the Grid


Grid computing is a form of distributed computing that is increasing in popularity in fields that have high computation and/or data storage requirements. In the presentation we give an overview of grid computing, describe our experiences using grid tools on a real project and develop a working grid across a cluster of two nodes using GridGain, an open source grid toolkit.


QJUG, February 2007

Nick Partridge, Veitch Lister Consulting

Tom Adams, Workingmouse

Introduction to Grid Computing with GridGain

Why are we here?

Large distributed application

Grid-based solution worked

Flow

Grid?

•Multiple independent computing clusters which act like a "grid" (Wikipedia)

•Many nodes, each node is indistinguishable from other nodes

•Complete machines over co-located CPUs?

•Multiple processes?

•Commodity hardware?

•Homogeneous machines?

A tale of two grids

Partition data across grid

Partition processing across grid

http://www.jroller.com/nivanov/entry/grid_computing_compute_grid_data

Selection

Requirements

•Callable from a Rails webapp

•Real-time - synchronous responses less than 30 seconds

•Large dataset - 100 GB (computation runs across all data)

Rails webapp

•Simple document-literal web service

•Ruby - soap4r

•Java - GlassFish, Spring-WS

•Not really interesting for this talk... see Brisbane.rb

Data

•Read-only

•Full control

•45 TB (became 100 GB with pre-processing)

•SQL? 3 tables, one query w/ 2 joins

Don’t want to roll our own

(Row) database good enough

And we can federate them

Result?

What about BigTable?

Column database

Result?

Where are we?

Progress

•Don’t need to distribute data ⇒ no data grid

•No off-the-shelf solutions that scale/go fast

•Understand data better ⇒ happy to roll our own as fallback

Data solution

Data

•CSV files on filesystem (now binary)

•Directories form indices

•Data files broken up into chunks
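One way to picture "directories form indices" and "data files broken up into chunks" is a lookup that turns index keys and a record number into a file path. The sketch below is only an illustration under assumptions: the directory layout, the `chunk-N.bin` naming, the fixed records-per-chunk size and the class name `ChunkLocator` are all hypothetical and do not come from the project described in the talk.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class ChunkLocator {
    // Hypothetical layout: <root>/<key1>/<key2>/.../chunk-<n>.bin
    // Each index key becomes one directory level, so finding the data for a
    // key is a path lookup rather than a database query.
    static Path chunkPath(Path root, String[] indexKeys, long recordNumber, long recordsPerChunk) {
        Path dir = root;
        for (String key : indexKeys) {
            dir = dir.resolve(key);
        }
        // Assumed fixed-size chunks: record r lives in chunk r / recordsPerChunk.
        long chunk = recordNumber / recordsPerChunk;
        return dir.resolve("chunk-" + chunk + ".bin");
    }

    public static void main(String[] args) {
        Path path = chunkPath(Paths.get("data"), new String[] {"region-a", "2007"}, 2500, 1000);
        System.out.println(path);
    }
}
```

Because the data is read-only, a layout like this needs no locking, and each chunk can be read independently of the others.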

What about the code?

http://giapet.net/wp-content/uploads/2007/05/luluwtf.gif

Need to distribute the computation

Options?

Erlang

Scala

Java

GridGain

GridGain

•“fully open source full-stack grid computing platform for Java”

•Map/reduce-based computation

•Easy to set up and use

•Can be extended via SPI implementations

•Just works

•“Scalable” (we’ve had it up to 32 nodes)

Map/reduce

When does it work

•When data is independent (pure/referentially transparent)

•When data can be combined (reduce) based solely on input

Input: foo bar / bar baz / quux bar / baz bar

split: foo, bar, bar, baz, quux, bar, baz, bar

map: foo:1, bar:1, bar:1, baz:1, quux:1, bar:1, baz:1, bar:1

reduce: foo: 1, bar: 4, baz: 2, quux: 1
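The split, map and reduce steps of the word-count example can be sketched in plain Java, with no grid involved. This is a minimal single-machine sketch. The class and method names are made up for illustration, and the input is assumed to be the four lines "foo bar", "bar baz", "quux bar" and "baz bar".

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCount {
    // split: break every input line into individual words.
    static List<String> split(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // map: turn each word into a (word, 1) pair, independently of all other words.
    static List<Map.Entry<String, Integer>> map(List<String> words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : words) {
            pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // reduce: combine the pairs by summing the counts for each word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(reduce(map(split(lines)))); // {bar=4, baz=2, foo=1, quux=1}
    }
}
```

The map step never looks at more than one word and the reduce step depends only on the pairs it is given, which is what makes the work safe to spread across machines.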

GridGain grid

Grid: takes the input (foo bar / bar baz / quux bar / baz bar) and returns foo: 1, bar: 4, baz: 2, quux: 1

How? A MasterNode divides the input between two Nodes

Node: foo bar / bar baz → foo: 1, bar: 2, baz: 1

Node: quux bar / baz bar → bar: 2, baz: 1, quux: 1

MasterNode: combines the two results into foo: 1, bar: 4, baz: 2, quux: 1

Did you say map/reduce?

MasterNode: input foo bar / bar baz / quux bar / baz bar

map (one per Node): foo bar / bar baz → foo: 1, bar: 2, baz: 1

map (one per Node): quux bar / baz bar → bar: 2, baz: 1, quux: 1

reduce (on the MasterNode): foo: 1, bar: 4, baz: 2, quux: 1
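The master/node picture can be imitated on one machine, with threads standing in for nodes. The sketch below is a simulation under assumptions, not GridGain code: `MasterNodeSketch`, `countOnNode` and `merge` are hypothetical names, and a thread pool plays the part of the grid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MasterNodeSketch {
    // Work done on one "node": count the words in the lines it was handed.
    static Map<String, Integer> countOnNode(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Work done on the "master": merge the partial counts from every node.
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    // Divide the lines into contiguous chunks, one per simulated node, then combine.
    static Map<String, Integer> run(List<String> lines, int nodes) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        try {
            int chunk = (lines.size() + nodes - 1) / nodes; // ceiling division
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                List<String> part = lines.subList(start, Math.min(start + chunk, lines.size()));
                Callable<Map<String, Integer>> job = () -> countOnNode(part);
                futures.add(pool.submit(job));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> future : futures) {
                partials.add(future.get());
            }
            return merge(partials);
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(run(lines, 2)); // {bar=4, baz=2, foo=1, quux=1}
    }
}
```

With two simulated nodes the partial results are foo: 1, bar: 2, baz: 1 and bar: 2, baz: 1, quux: 1, matching the diagram.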

Show me the types!

Diagram: the MasterNode divides the input between two Nodes and combines their partial counts into foo: 1, bar: 4, baz: 2, quux: 1

map[A, B](List[A], A → B) → List[B]

reduce[B, C](List[B], C, (C, B) → C) → C
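The same two signatures can be written with Java generics. In this sketch reduce is written as a left fold: it takes a starting value of type C and a combining function (C, B) → C, and returns a single C. The class name `MapReduceTypes` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

class MapReduceTypes {
    // map[A, B](List[A], A -> B) -> List[B]
    static <A, B> List<B> map(List<A> input, Function<A, B> f) {
        List<B> output = new ArrayList<>();
        for (A a : input) {
            output.add(f.apply(a));
        }
        return output;
    }

    // reduce[B, C](List[B], C, (C, B) -> C) -> C, folding from the left.
    static <B, C> C reduce(List<B> input, C initial, BiFunction<C, B, C> combine) {
        C accumulator = initial;
        for (B b : input) {
            accumulator = combine.apply(accumulator, b);
        }
        return accumulator;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("foo", "bar", "bar", "baz");
        List<Integer> lengths = map(words, String::length);
        int total = reduce(lengths, 0, (sum, n) -> sum + n);
        System.out.println(total); // 12
    }
}
```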

Terminology

Task: the full input held by the MasterNode (foo bar / bar baz / quux bar / baz bar)

Job: the piece of the Task that runs on a Node

Result: the combined output on the MasterNode (foo: 1, bar: 4, baz: 2, quux: 1)

Two Nodes, one Job each: (foo bar / bar baz) → foo: 1, bar: 2, baz: 1 and (quux bar / baz bar) → bar: 2, baz: 1, quux: 1

Two Nodes, two Jobs each: (foo bar), (bar baz) on one Node; (quux bar), (baz bar) on the other

Four Nodes, one Job each: (foo bar), (bar baz), (quux bar), (baz bar)
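The task/job/result vocabulary can be captured with two small interfaces. The sketch below is an assumption-laden illustration. `Task`, `Job`, `WordCountTask` and `nodeFor` are hypothetical names, they are not GridGain's `GridTask`/`GridJob` API (see the javadoc linked under "Task execution" for the real interfaces), and every job runs locally here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaskSketch {
    // Hypothetical types for illustration only; NOT GridGain's API.
    interface Job<R> {
        R execute(); // runs on whichever node the job is sent to
    }

    interface Task<T, R> {
        List<Job<R>> split(T input);  // break the task into jobs
        R reduce(List<R> jobResults); // combine job results into the final result
    }

    static class WordCountTask implements Task<List<String>, Map<String, Integer>> {
        @Override
        public List<Job<Map<String, Integer>>> split(List<String> lines) {
            List<Job<Map<String, Integer>>> jobs = new ArrayList<>();
            for (String line : lines) {
                // One job per input line, as in the four-job diagrams.
                jobs.add(() -> {
                    Map<String, Integer> counts = new TreeMap<>();
                    for (String word : line.trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            counts.merge(word, 1, Integer::sum);
                        }
                    }
                    return counts;
                });
            }
            return jobs;
        }

        @Override
        public Map<String, Integer> reduce(List<Map<String, Integer>> jobResults) {
            Map<String, Integer> total = new TreeMap<>();
            for (Map<String, Integer> partial : jobResults) {
                partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
            }
            return total;
        }
    }

    // Contiguous assignment: 4 jobs on 2 nodes gives 2 jobs each, on 4 nodes 1 each.
    static int nodeFor(int jobIndex, int jobCount, int nodeCount) {
        return jobIndex * nodeCount / jobCount;
    }

    // Runs every job locally; a real grid would ship each job to a remote node.
    static <T, R> R run(Task<T, R> task, T input) {
        List<R> results = new ArrayList<>();
        for (Job<R> job : task.split(input)) {
            results.add(job.execute());
        }
        return task.reduce(results);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(run(new WordCountTask(), lines)); // {bar=4, baz=2, foo=1, quux=1}
    }
}
```

The result does not depend on how many nodes the jobs are spread over, so the same task can run on two nodes or four.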

What defines a grid?

Grid 1: three Nodes sharing IP multicast group 228.1.2.4

Grid 2: three Nodes sharing IP multicast group 228.1.2.5
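Read this way, grid membership is a matter of configuration: nodes that share a multicast group end up in the same grid. The sketch below only models that grouping and opens no sockets. The `Node` class and the method names are hypothetical, and it assumes, as the diagram suggests, that the multicast address alone decides which grid a node joins.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GridMembership {
    // Hypothetical description of a node: a name plus its configured multicast group.
    static class Node {
        final String name;
        final String multicastGroup;

        Node(String name, String multicastGroup) {
            this.name = name;
            this.multicastGroup = multicastGroup;
        }
    }

    // True when the literal IP address lies in the multicast range.
    static boolean isMulticast(String address) {
        try {
            return InetAddress.getByName(address).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Group node names by multicast address: one entry per grid.
    static Map<String, List<String>> grids(List<Node> nodes) {
        Map<String, List<String>> grids = new TreeMap<>();
        for (Node node : nodes) {
            if (!isMulticast(node.multicastGroup)) {
                throw new IllegalArgumentException(node.multicastGroup + " is not a multicast address");
            }
            grids.computeIfAbsent(node.multicastGroup, group -> new ArrayList<>()).add(node.name);
        }
        return grids;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                new Node("a", "228.1.2.4"), new Node("b", "228.1.2.4"), new Node("c", "228.1.2.4"),
                new Node("d", "228.1.2.5"), new Node("e", "228.1.2.5"), new Node("f", "228.1.2.5"));
        System.out.println(grids(nodes)); // {228.1.2.4=[a, b, c], 228.1.2.5=[d, e, f]}
    }
}
```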

Failover

Start: Task (foo bar / bar baz / quux bar / baz bar) split into four Jobs, one on each of four Nodes

One Node fails (X): its Job (foo bar) moves to another Node

Two Nodes fail (XX): their Jobs (bar baz, foo bar) move to the remaining Nodes

Result is unchanged: foo: 1, bar: 4, baz: 2, quux: 1
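The failover sequence can be imitated with a retry loop: try the node the job was assigned to, and if it is down, move on to the next one. This is a simulation under assumptions rather than GridGain's failover mechanism. The names are hypothetical, a set of failed node numbers stands in for real node failures, and the "try the next node in order" policy is chosen only for simplicity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class FailoverSketch {
    // Thrown by a simulated node that has gone down.
    static class NodeDownException extends RuntimeException {
        NodeDownException(int node) {
            super("node " + node + " is down");
        }
    }

    // Which node finally ran the job, and what the job returned.
    static class Outcome {
        final int node;
        final int wordCount;

        Outcome(int node, int wordCount) {
            this.node = node;
            this.wordCount = wordCount;
        }
    }

    // Simulated node: runs the job (count the words in a line) unless it has failed.
    static int runOnNode(int node, Set<Integer> failedNodes, String line) {
        if (failedNodes.contains(node)) {
            throw new NodeDownException(node);
        }
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Try the preferred node first, then fail over to the other nodes in order.
    static Outcome runWithFailover(int preferredNode, int nodeCount, Set<Integer> failedNodes, String line) {
        for (int attempt = 0; attempt < nodeCount; attempt++) {
            int node = (preferredNode + attempt) % nodeCount;
            try {
                return new Outcome(node, runOnNode(node, failedNodes, line));
            } catch (NodeDownException e) {
                // This node is down: fall through and try the next one.
            }
        }
        throw new IllegalStateException("no live nodes left to run the job");
    }

    public static void main(String[] args) {
        Set<Integer> failed = new HashSet<>(Arrays.asList(0, 1)); // two of four nodes are down
        Outcome outcome = runWithFailover(0, 4, failed, "foo bar");
        System.out.println("job ran on node " + outcome.node); // job ran on node 2
    }
}
```

Because each job depends only on its own input, rerunning it on a different node gives the same answer, so the final result is unaffected.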

Task execution

http://www.gridgain.com/javadoc/org/gridgain/grid/GridTask.html

GridGain demo

The good, the bad, the ugly

Good: just works, fast, easy, extensible, scalable

Bad: error messages, documentation, code quality, coupling, odd APIs, management overview

Ugly: nomenclature, JMS?

References

•http://wiki.workingmouse.com/

•http://www.gridgain.com/

•http://labs.google.com/papers/mapreduce.html
