Grid Computing: Technology and Sociology at Large Scales
Douglas Thain, University of Notre Dame
5 November 2004

Computing Needs of Big Science

Available Computing Power

Computing Research

Computing Power is Everywhere!

The Top 500 Supercomputers

  • 1 - Earth Simulator
    5120 * NEC SX-6 (35860 GFLOPS)

  • 2 - Thunder
    4096 * Itanium Tiger (19940 GFLOPS)

  • 3 - ASCI Q
    8192 * Alpha (13880 GFLOPS)

  • 4 - IBM BlueGene/L Prototype
    8192 * PowerPC (11680 GFLOPS)

  • 5 - NCSA Tungsten
    2400 * Intel Xeon (9819 GFLOPS)

  • 445 - Notre Dame BoB
    212 * Intel Xeon

  • 500 - “Retailer B”
    184 * PowerPC (684 GFLOPS)

http://www.top500.org

The Bad News

  • Rag-Tag Computers are Hard to Use
    Differing shapes, sizes, reliability.
    Issues of machine-user trust.
    Have to re-write software to fit.

  • Big Supercomputers are Also Hard to Use
    For exactly the same reasons!

The Grid

Ian Foster, University of Chicago:

Suppose that big computing facilities were as easy to use as electrical power!

http://www.globus.org

The Grid = Internet + Facilities

Is the Grid Real?

THE GRID - not yet.

But, many groups fairly claim to have built A GRID for a given purpose.

Grid Computing is not Easy!

  • Security
    Keeping out the bad guys, identifying the good guys.

  • Performance
    A problem of mapping the right jobs to the right resources.

  • Reliability
    The Internet is not known for its 24/7 reliability.

  • Accountability
    You used 100 hours of compute time at $1000/hour!

  • Debugging
    Who is to blame when a program crashes?

  • Social Effects
    At large scales, computers have human problems!

SETI@Home

http://setiathome.ssl.berkeley.edu/

Users              5,233,380
Results received   1,622,392,472
Total CPU time     2,113,893 years
Performance        68520 GFLOPS

The Social Issues

  • As a scientist, can you trust a random user?
    So, you must duplicate work units.

  • What is the motivation to participate?
    Fame! (Not Fortune)

  • How do users maximize their enjoyment?
    Get on the leader board in any way possible!
    Virus that changes the identity of the sender.
    Hack the code to run faster. (Ollie, Microsoft)
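Duplicating work units means sending the same unit to several volunteers and accepting a result only when enough of them agree. The sketch below illustrates that idea in Python. The function name `validate_work_unit` and the `quorum` rule are assumptions made for illustration; this is not SETI@Home's actual validation code.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def validate_work_unit(results: Sequence[Hashable], quorum: int = 2) -> Optional[Hashable]:
    """Return the most common result if at least `quorum` volunteers reported it.

    Returns None when no result reaches the quorum, which signals that the
    work unit should be sent out again.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


if __name__ == "__main__":
    # Two honest volunteers outvote one forged result.
    print(validate_work_unit(["no signal", "no signal", "aliens!"]))
    # Two conflicting reports: no quorum, so the unit must be reissued.
    print(validate_work_unit(["no signal", "aliens!"]))
```

The cost of this design is that every work unit is computed at least `quorum` times, so trust is bought with duplicated CPU time.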

Name                             Results received   CPU time     Time/work unit
1) The Ministry of Serendipity   6444222            4327 years   5 hr 52 min 55.8 sec
2) Sneezy                        4164390            2694 years   5 hr 40 min 06.0 sec
3) Pigalak                       3182980            2625 years   7 hr 13 min 29.1 sec
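The last column of the leader board is the total CPU time divided by the number of results received. A quick check in Python, assuming an average year of 365.25 days (the slide does not say which year length was used) and using the CPU times as rounded to whole years, so the agreement is only approximate:

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length

# (name, results received, CPU time in years), copied from the leader board
LEADERS = [
    ("The Ministry of Serendipity", 6444222, 4327),
    ("Sneezy", 4164390, 2694),
    ("Pigalak", 3182980, 2625),
]


def hours_per_work_unit(results: int, cpu_years: float) -> float:
    """Average CPU hours spent on one work unit."""
    return cpu_years * HOURS_PER_YEAR / results


if __name__ == "__main__":
    for name, results, cpu_years in LEADERS:
        hours = hours_per_work_unit(results, cpu_years)
        print(f"{name}: {hours:.2f} hours per work unit")
```

An unusually low time per work unit is one signal that a participant may be running modified code.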

Auditing of Results

Work Unit

First, I checked Galaxy 1, and it only rated a 5. Then, I checked Galaxy 2, and it rated a 10, so I did the more detailed examination of the lower quadrant, but there was no signal there.

No aliens here.

What if you are doing good science, but it doesn’t have a glamorous story?

AMANDA

  • A “Time Telescope”
  • Distant Cosmic Sources
  • Neutrinos Travel Far
  • Neutrino + Earth = Muon
  • Detector in Ice

http://amanda.berkeley.edu

Independent Simulation

How do you calibrate a new measuring device?

The Answer: Simulate!

x=123 y=456
x=123 y=457
x=223 y=450
x=305 y=904
x=123 y=456

http://www.cs.wisc.edu/condor

I need some Windows machines in order to do my senior thesis!

I need a LOT of small machines for AMANDA.

I need TEN Linux machines for one week.

Anyone can use these machines, but ND users have priority.

These machines can only be used at night by only Jane and Betty.

MatchMaker

Condor: 50,000 CPUs, 1000 sites
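Matchmaking pairs what each job needs with what each machine owner allows, and a match is made only when both sides agree. The Python sketch below is loosely modeled on the requests and policies quoted above. The classes, attribute names, and the definition of "night" are illustrative assumptions; this is not Condor's ClassAd language or API, and owner priority ("ND users have priority") is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Machine:
    name: str
    os: str
    # Owner policy: given the user and the hour of day, may this job run here?
    policy: Callable[[str, int], bool]


@dataclass
class Job:
    user: str
    # Job requirement: is this machine acceptable to the job?
    requirement: Callable[[Machine], bool]


def match(job: Job, machines: List[Machine], hour: int) -> Optional[Machine]:
    """Return the first machine where the job and the owner both agree, else None."""
    for machine in machines:
        if job.requirement(machine) and machine.policy(job.user, hour):
            return machine
    return None


if __name__ == "__main__":
    machines = [
        # "Only at night, only by Jane and Betty" (night assumed to be 20:00 to 06:00).
        Machine("lab1", "linux",
                lambda user, hour: user in {"jane", "betty"} and (hour >= 20 or hour < 6)),
        # "Anyone can use this machine."
        Machine("open1", "windows", lambda user, hour: True),
    ]
    thesis = Job("sam", lambda m: m.os == "windows")
    found = match(thesis, machines, hour=14)
    print(found.name if found else "no match")
```

Keeping both sides as predicates means the matchmaker never needs to understand why an owner refuses a job; it only needs both answers to be yes.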

Social Concerns

  • The Owner is BOSS!
    Solution: Submit lots of independent jobs.
    Solution: Save your work at short intervals.

  • Users compete for popular machines.
    Solution: Program for less common machines.

  • Unusual Requests may be Rejected!
    “I need a large, fast machine that is available for one full year and isn’t in the Western hemisphere...”
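Saving work at short intervals lets a job pick up where it left off after an owner reclaims the machine. The sketch below shows application-level checkpointing in Python. The file name, the interval, and the toy summation workload are hypothetical, and this is not Condor's own checkpoint mechanism.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_i": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then rename, so an interruption during the
    write does not leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path: str, n: int, interval: int = 1000) -> int:
    """Sum the integers 0..n-1, checkpointing every `interval` steps."""
    state = load_checkpoint(path)
    for i in range(state["next_i"], n):
        state["total"] += i
        state["next_i"] = i + 1
        if state["next_i"] % interval == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    # Rerunning after an interruption resumes from the last checkpoint.
    print(run("sum.ckpt", 10_000))
```

A shorter interval loses less work when the job is evicted but spends more time writing checkpoints, so the right value depends on how often owners reclaim their machines.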

A Fundamental Problem of Grid Computing:

Why Don’t You Love Me?

But There is More!

  • Summary so far:
    The Grid: Computing Power on Demand
    Big Science has Big Computing Needs.
    Key Problems are Social Interaction

  • But there is more:
    The Grid: Bringing people and equipment together.
    The Grid: Bringing lots of people together!

NEESGrid

An Earth-Shaking Grid Application

  • Simulation of earthquakes:
    Flexible, repeatable, cheap.
    Accurate at large scales.
    Inaccurate for small objects.

  • Physical emulation of earthquakes:
    Fixed, one-time, expensive.
    Perfectly reproduce small items.

http://www.neesgrid.org

Modeling a Single Door!

[Diagram: a Coordinator connected to three Interfaces]

http://www.accessgrid.org

The Access Grid Experience

Take Home Message

  • Grid Computing is...
    ...harnessing many computers in order to attack scientific problems of enormous scale.
    ...bringing large numbers of people and resources together over long distances.

  • The Hardest Problem:
    As computing systems grow larger, social issues become more important than technical problems.