Off the Grid
QJUG
February 2007
Nick Partridge, Veitch Lister Consulting
Tom Adams, Workingmouse
Introduction to Grid Computing with GridGain
Why are we here?
Large distributed application
Grid-based solution worked
Flow
Grid?
•Multiple independent computing clusters which act like a "grid" (Wikipedia)
•Many nodes, each node is indistinguishable from other nodes
•Complete machines over co-located CPUs?
•Multiple processes?
•Commodity hardware?
•Homogeneous machines?
A tale of two grids
Partition data across grid
Partition processing across grid
http://www.jroller.com/nivanov/entry/grid_computing_compute_grid_data
Selection
Requirements
•Callable from a Rails webapp
•Real-time: synchronous responses in under 30 seconds
•Large dataset - 100 GB (computation runs across all data)
Rails webapp
•Simple document-literal web service
•Ruby - soap4r
•Java - GlassFish, Spring-WS
•Not really interesting for this talk... see Brisbane.rb
Data
•Read-only
•Full control
•45 TB (became 100 GB with pre-processing)
•SQL? 3 tables, one query w/ 2 joins
Don’t want to roll our own
(Row) database good enough
And we can federate them
Result?
http://battellemedia.com/archives/2007_01.php
What about BigTable?
Column database
Result?
http://failblog.wordpress.com/2008/01/29/satellite/
Where are we?
Progress
•Don’t need to distribute data ⇒ no data grid
•No off the shelf solutions that scale/go fast
•Understand data better ⇒ happy to roll our own as fallback
Data solution
Data
•CSV files on filesystem (now binary)
•Directories form indices
•Data files broken up into chunks
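A minimal sketch of what "directories form indices" and "data files broken up into chunks" could look like in code. The slides do not show the actual layout, so everything here is an assumption made for illustration: the `ChunkLayoutSketch` class, the `chunkPath` method, the one-directory-per-index-key scheme and the `chunk-NNNN.bin` file naming.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChunkLayoutSketch {
    // Hypothetical layout (not taken from the talk):
    //   <root>/<key1>/<key2>/.../chunk-<n>.bin
    // Each index key becomes one directory level, so a lookup is a path
    // resolution rather than a database query.
    public static Path chunkPath(Path root, String[] indexKeys, int chunk) {
        Path dir = root;
        for (String key : indexKeys) {
            dir = dir.resolve(key);
        }
        // Zero-padded chunk number keeps the files in order when listed.
        return dir.resolve(String.format("chunk-%04d.bin", chunk));
    }

    public static void main(String[] args) {
        // "keyA" and "keyB" are placeholder index values.
        Path p = chunkPath(Paths.get("data"), new String[] {"keyA", "keyB"}, 3);
        System.out.println(p);
    }
}
```

With read-only data, a scheme like this needs no locking or write coordination, which may be part of why a plain filesystem was enough.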
What about the code?
http://giapet.net/wp-content/uploads/2007/05/luluwtf.gif
Need to distribute the computation
Options?
Erlang
Scala
Java
Java frameworks
•Hadoop
•GridGain
•Oracle Coherence
•GigaSpaces
•Terracotta
•JavaSpaces/Jini
•Shoal
GridGain
GridGain
•“fully open source full-stack grid computing platform for Java”
•Map/reduce-based computation
•Easy to set up and use
•Can be extended via SPI implementations
•Just works
•“Scalable” (we’ve had it up to 32 nodes)
Map/reduce
When does it work?
•When data is independent (pure/referentially transparent)
•When data can be combined (reduce) based solely on input
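[Diagram: word count as split, map, reduce]
Input: foo bar / bar baz / quux bar / baz bar
split → foo, bar, bar, baz, quux, bar, baz, bar
map → foo:1, bar:1, bar:1, baz:1, quux:1, bar:1, baz:1, bar:1
reduce → foo: 1, bar: 4, baz: 2, quux: 1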
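The split, map and reduce steps of the word count can be sketched in plain Java. This is not GridGain code; the `WordCount` class and its method names are made up for illustration, and everything runs in a single process.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCount {
    // split: break every input line into individual words
    public static List<String> split(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // map: turn each word into a (word, 1) pair, independently of the others
    public static List<Map.Entry<String, Integer>> map(List<String> words) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : words) {
            pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // reduce: combine the pairs by summing the counts for each word
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(reduce(map(split(input))));
        // prints {foo=1, bar=4, baz=2, quux=1}
    }
}
```

The map step looks at one word at a time and the reduce step depends only on the pairs it is given, which is what lets the work be spread over several machines.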
GridGain grid
[Diagram: the input (foo bar / bar baz / quux bar / baz bar) goes into the grid and the result (foo: 1, bar: 4, baz: 2, quux: 1) comes out. Inside the grid, a master node hands the input to worker nodes. With two nodes, one counts "foo bar" and "bar baz" (foo: 1, bar: 2, baz: 1) and the other counts "quux bar" and "baz bar" (bar: 2, baz: 1, quux: 1). The final build shows the input handed out one line at a time.]
Did you say map/reduce?
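[Diagram: the same grid, labelled. Each node runs map over its share of the input ("foo bar", "bar baz" → foo: 1, bar: 2, baz: 1; "quux bar", "baz bar" → bar: 2, baz: 1, quux: 1). The master node runs reduce over the node results to get foo: 1, bar: 4, baz: 2, quux: 1.]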
Show me the types!
map[A, B](List[A], A → B) → List[B]
reduce[B, C](List[B], C, (C, B) → C) → C
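A rough Java rendering of these two signatures, as a sketch rather than anything from GridGain. The class name `MapReduceTypes` is invented for illustration, and reduce is written as a left fold that starts from a seed value and returns the single accumulated value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

public class MapReduceTypes {
    // map: apply f to every element of a List[A], giving a List[B]
    public static <A, B> List<B> map(List<A> xs, Function<A, B> f) {
        List<B> out = new ArrayList<>();
        for (A x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    // reduce: fold a List[B] into one C, starting from a seed
    // and combining one element at a time with f
    public static <B, C> C reduce(List<B> xs, C seed, BiFunction<C, B, C> f) {
        C acc = seed;
        for (B x : xs) {
            acc = f.apply(acc, x);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> words = List.of("foo", "bar", "quux");
        List<Integer> lengths = map(words, String::length); // [3, 3, 4]
        int total = reduce(lengths, 0, Integer::sum);       // 10
        System.out.println(lengths + " " + total);
    }
}
```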
Terminology
[Diagram: Task, Job and Result on the grid]
•Task: the whole computation, held by the master node (input: foo bar / bar baz / quux bar / baz bar)
•Job: one piece of the task, run on a node. Shown first as two jobs of two lines each, then as four single-line jobs (foo bar, bar baz, quux bar, baz bar) spread over two nodes and then over four nodes
•Result: the combined output returned on the master node (foo: 1, bar: 4, baz: 2, quux: 1)
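The task/job/result relationship can be sketched in plain Java without any grid library. This is only an illustration of the terminology and not the GridGain API: `TaskJobSketch`, `job` and `runTask` are hypothetical names, and a local thread pool stands in for the nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskJobSketch {
    // Job: counts the words in its own slice of the input.
    static Callable<Map<String, Integer>> job(String line) {
        return () -> {
            Map<String, Integer> counts = new TreeMap<>();
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            return counts;
        };
    }

    // Task: split the input into jobs, run them on the "nodes",
    // then reduce the job results into a single result.
    public static Map<String, Integer> runTask(List<String> input, int nodes) {
        ExecutorService grid = Executors.newFixedThreadPool(nodes); // threads stand in for nodes
        try {
            // Split: one job per input line.
            List<Future<Map<String, Integer>>> pending = new ArrayList<>();
            for (String line : input) {
                pending.add(grid.submit(job(line)));
            }
            // Reduce: merge each job's counts into the final result.
            Map<String, Integer> result = new TreeMap<>();
            for (Future<Map<String, Integer>> f : pending) {
                f.get().forEach((word, n) -> result.merge(word, n, Integer::sum));
            }
            return result;
        } catch (Exception e) {
            throw new RuntimeException("task failed", e);
        } finally {
            grid.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("foo bar", "bar baz", "quux bar", "baz bar");
        System.out.println(runTask(input, 4));
        // prints {bar=4, baz=2, foo=1, quux=1}
    }
}
```

The result is the same whether the four jobs run on one, two or four "nodes", which matches the builds in the diagram.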
What defines a grid?
[Diagram: two separate grids of three nodes each, one on IP multicast address 228.1.2.4 and the other on 228.1.2.5.]
Failover
[Diagram: a task (foo bar / bar baz / quux bar / baz bar) is split into four jobs on four nodes. When one node fails (X), its job (foo bar) is moved to another node. When a second node fails, its job (bar baz) is moved as well. In every build the master node still ends up with the result foo: 1, bar: 4, baz: 2, quux: 1.]
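The failover idea can be sketched as "try the job on a node; if that node fails, hand the same job to another node". The code below is an assumption-laden illustration, not GridGain's failover mechanism: the `Node` interface and `runWithFailover` are hypothetical, and a node failure is modelled as a thrown exception.

```java
import java.util.List;

public class FailoverSketch {
    /** A hypothetical node: runs a job on its input and may fail. */
    public interface Node {
        int execute(String jobInput);
    }

    // Tries each node in turn until one of them completes the job.
    public static int runWithFailover(String jobInput, List<Node> nodes) {
        RuntimeException last = null;
        for (Node node : nodes) {
            try {
                return node.execute(jobInput);
            } catch (RuntimeException e) {
                last = e; // this node failed: fail over to the next one
            }
        }
        throw new IllegalStateException("no node could run job: " + jobInput, last);
    }

    public static void main(String[] args) {
        Node dead = input -> { throw new RuntimeException("node down"); };
        Node alive = input -> input.trim().split("\\s+").length; // counts words
        System.out.println(runWithFailover("foo bar", List.of(dead, alive)));
        // prints 2
    }
}
```

Because each job depends only on its own input, rerunning it on a different node gives the same answer, so the final result is unaffected by the failure.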
Task execution
http://www.gridgain.com/javadoc/org/gridgain/grid/GridTask.html
GridGain demo
The good, the bad, the ugly
Good: just works, fast, easy, extensible, scalable
Bad: error messages, doco, code quality, coupling, odd APIs, management overview
Ugly: nomenclature, JMS?
References
•http://wiki.workingmouse.com/
•http://www.gridgain.com/
•http://labs.google.com/papers/mapreduce.html