Grid Computing: Technology and Sociology at Large Scales
Douglas Thain
University of Notre Dame
5 November 2004
The Top 500 Supercomputers
1 - Earth Simulator               5120 * NEC SX-6 (35860 GFLOPS)
2 - Thunder                       4096 * Itanium Tiger (19940 GFLOPS)
3 - ASCI Q                        8192 * Alpha (13880 GFLOPS)
4 - IBM BlueGene/L Prototype      8192 * PowerPC (11680 GFLOPS)
5 - NCSA Tungsten                 2400 * Intel Xeon (9819 GFLOPS)
445 - Notre Dame BoB              212 * Intel Xeon
500 - "Retailer B"                184 * PowerPC (684 GFLOPS)
http://www.top500.org
The Bad News
Rag-Tag Computers are Hard to Use
  Differing shapes, sizes, reliability.
  Issues of machine-user trust.
  Have to re-write software to fit.
Big Supercomputers are Also Hard to Use
  For exactly the same reasons!
The Grid
Ian Foster, University of Chicago:
Suppose that big computing facilities were as easy to use as electrical power!
http://www.globus.org
Is the Grid Real?
THE GRID - not yet.
But, many groups fairly claim to have built A GRID for a given purpose.
Grid Computing is not Easy!
Security
  Keeping out the bad guys, identifying the good guys.
Performance
  A problem of mapping the right jobs to the right resources.
Reliability
  The Internet is not known for its 24/7 reliability.
Accountability
  You used 100 hours of compute time at $1000/hour!
Debugging
  Who is to blame when a program crashes?
Social Effects
  At large scales, computers have human problems!
SETI@Home
Users 5,233,380
Results received 1,622,392,472
Total CPU time 2,113,893 years
Performance 68520 GFLOPS
The Social Issues
As a scientist, can you trust a random user?
  So, you must duplicate work units.
What is the motivation to participate?
  Fame! (Not Fortune)
How do users maximize their enjoyment?
  Get on the leader board in any way possible!
  Virus that changes the identity of the sender.
  Hack the code to run faster. (Ollie, Microsoft)
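The "duplicate work units" point can be sketched as a simple vote: send the same work unit to several users and accept a result only when enough distinct users agree. This is a minimal illustration, not SETI@Home's actual validator; the function name `accept_result` and the `quorum` parameter are assumptions made for the example.

```python
from collections import Counter

def accept_result(reports, quorum=2):
    """Return the result agreed on by at least `quorum` distinct users, else None.

    `reports` is a list of (user, result) pairs for ONE work unit.
    Hypothetical helper for illustration only.
    """
    # Count each user at most once, so one user cannot vote repeatedly.
    first_report = {}
    for user, result in reports:
        first_report.setdefault(user, result)
    if not first_report:
        return None

    votes = Counter(first_report.values())
    ranked = votes.most_common(2)
    best, count = ranked[0]
    if count < quorum:
        return None  # not enough independent agreement yet
    if len(ranked) > 1 and ranked[1][1] == count:
        return None  # tie between two answers: send the unit out again
    return best
```

A unit that returns `None` would simply be handed to another user. Note that this only defends against independent errors or cheats; users who collude on the same wrong answer would still pass.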
Name                             Results received   CPU time     Time/work unit
1) The Ministry of Serendipity   6444222            4327 years   5 hr 52 min 55.8 sec
2) Sneezy                        4164390            2694 years   5 hr 40 min 06.0 sec
3) Pigalak                       3182980            2625 years   7 hr 13 min 29.1 sec
Auditing of Results
Work Unit
First, I checked Galaxy 1, and it only rated a 5.
Then, I checked Galaxy 2, and it rated a 10, so I did the more detailed examination of the lower quadrant, but there was no signal there.
No aliens here.
What if you are doing good science, but it doesn't have a glamorous story?
AMANDA
A "Time Telescope"
  Distant Cosmic Sources
  Neutrinos Travel Far
  Neutrino+Earth = Muon
  Detector in Ice
http://amanda.berkeley.edu
http://www.cs.wisc.edu/condor
I need some Windows machines in order to do my senior thesis!
I need a LOT of small machines for AMANDA.
I need TEN Linux machines for one week.
Anyone can use these machines, but ND users have priority.
These machines can only be used at night by only Jane and Betty.
MatchMaker
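The matchmaking idea above, where users state what they need and owners state who may use their machines, can be sketched as a two-sided check. This is a rough illustration only: it is not Condor's ClassAd language, and the `Machine`, `Job`, and `match` names, as well as the definition of "night" as 20:00 to 06:00, are assumptions made for the example. Owner priorities (such as "ND users have priority") are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Machine:
    name: str
    os: str
    # Owner policy: True if `user` may run a job at `hour` (0-23).
    policy: Callable[[str, int], bool]

@dataclass
class Job:
    user: str
    os: str  # the only job requirement modeled in this sketch

def match(job: Job, machines: List[Machine], hour: int) -> Optional[Machine]:
    """Return the first machine acceptable to BOTH the job and the owner."""
    for machine in machines:
        job_is_satisfied = machine.os == job.os
        owner_is_satisfied = machine.policy(job.user, hour)
        if job_is_satisfied and owner_is_satisfied:
            return machine
    return None  # nothing matches right now; the job waits

# Two owner policies in the spirit of the slide (details are assumed).
def anyone(user: str, hour: int) -> bool:
    return True

def jane_and_betty_at_night(user: str, hour: int) -> bool:
    is_night = hour >= 20 or hour < 6
    return is_night and user in {"jane", "betty"}
```

The point of the design is that neither side is privileged: a match exists only when the job's requirements and the owner's policy both hold, which is why the owner stays in control of the machine.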
Social Concerns
The Owner is BOSS!
  Solution: Submit lots of independent jobs.
  Solution: Save your work at short intervals.
Users compete for popular machines.
  Solution: Program for less common machines.
Unusual Requests may be Rejected!
  "I need a large, fast, machine that is available for one full year and isn't in the Western hemisphere..."
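"Save your work at short intervals" amounts to checkpointing: if the owner reclaims the machine, the job restarts from its last saved state rather than from scratch. The sketch below shows the pattern on a toy computation (summing integers); the function names, the JSON file format, and the `interval` parameter are all assumptions for illustration, not part of any particular grid system.

```python
import json
import os
import tempfile

def save(path, state):
    """Write the checkpoint atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    # os.replace swaps the file in one step, so being evicted mid-write
    # never leaves a half-written checkpoint behind.
    os.replace(tmp, path)

def run_with_checkpoints(path, total_steps, interval=100):
    """Sum 0..total_steps-1, resuming from `path` if a checkpoint exists."""
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume where the last run stopped

    while state["step"] < total_steps:
        state["total"] += state["step"]  # one unit of "real" work
        state["step"] += 1
        if state["step"] % interval == 0:
            save(path, state)

    save(path, state)
    return state["total"]
```

The interval is a trade-off: a shorter interval loses less work on eviction but spends more time writing checkpoints.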
A Fundamental Problem of Grid Computing:
Why Don't You Love Me?
But There is More!
Summary so far:
  The Grid: Computing Power on Demand
  Big Science has Big Computing Needs.
  Key Problems are Social Interaction
But there is more:
  The Grid: Bringing people and equipment together.
  The Grid: Bringing lots of people together!
NEESGrid
An Earth-Shaking Grid Application
Simulation of earthquakes:
  Flexible, repeatable, cheap.
  Accurate at large scales.
  Inaccurate for small objects.
Physical emulation of earthquakes:
  Fixed, one-time, expensive.
  Perfectly reproduce small items.
http://www.neesgrid.org
Take Home Message
Grid Computing is...
  ...harnessing many computers in order to attack scientific problems of enormous scale.
  ...bringing large numbers of people and resources together over long distances.
The Hardest Problem:
  As computing systems grow larger, social issues become more important than technical problems.