David P. Anderson
Space Sciences Laboratory
University of California – Berkeley
Public Distributed Computing with BOINC
Public-resource computing
Advantages:
● scale
● free
● growth
● public education
● no policy issues
Challenges:
● low BW at client
● costly BW at server
● firewall/NAT issues
● sporadic connection
● untrustworthy, insecure clients
● server security
● heterogeneity
● need PR, glitzy GUI
[Figure: sources of computing resources. Labels: your computers, academic, business, home PCs]
Public-resource computing (cont.)
● 1 billion Internet-connected PCs in 2010
● 50% privately owned
● If 10% participate:
– At least 100 PetaFLOPs, 1 Exabyte (10^18 bytes) storage
[Figure: public computing, Grid computing, cluster computing, and supercomputing compared by cost and by CPU power / storage capacity]
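The estimate on this slide can be checked with a little arithmetic. The sketch below works backwards from the slide's totals to the contribution each participating PC would need to make. The slide does not say whether "10% participate" refers to all PCs or only to privately owned ones, so both readings are shown as assumptions; the function and variable names are illustrative only.

```python
# Back-of-envelope check of the slide's estimate.
# The totals come from the slide; the two readings of "10% participate"
# are assumptions, since the slide does not say which population is meant.
TOTAL_PCS = 1_000_000_000   # Internet-connected PCs projected for 2010
PRIVATE_PCT = 50            # percent privately owned
PARTICIPATION_PCT = 10      # percent participating
TARGET_FLOPS = 100e15       # 100 PetaFLOPs
TARGET_BYTES = 1e18         # 1 Exabyte

def per_host(hosts):
    """Return (FLOPS, bytes) each host must supply to reach the slide's totals."""
    return TARGET_FLOPS / hosts, TARGET_BYTES / hosts

# Reading 1: 10% of all Internet-connected PCs.
hosts_all = TOTAL_PCS * PARTICIPATION_PCT // 100
# Reading 2: 10% of the privately owned half.
hosts_private = TOTAL_PCS * PRIVATE_PCT // 100 * PARTICIPATION_PCT // 100

for label, hosts in [("10% of all PCs", hosts_all),
                     ("10% of privately owned PCs", hosts_private)]:
    flops, nbytes = per_host(hosts)
    print(f"{label}: {hosts:,} hosts, "
          f"{flops / 1e9:.0f} GFLOPS and {nbytes / 1e9:.0f} GB per host")
```

Under either reading, each host would need to supply only one to two GFLOPS and 10 to 20 GB of disk to reach the slide's totals.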
SETI@home
● Running since May 1999
● ~500,000 active participants
● ~60 TeraFLOPs
● Problems with current software
– hard to change/add algorithms
– can't share participants w/ other projects
– inflexible data architecture
SETI@home data architecture
[Figure: two diagrams, "current" and "ideal". Labels recovered: tapes, Internet2, commercial Internet, Berkeley, Stanford, USC, participants, 50 Mbps]
BOINC: Berkeley Open Infrastructure for Network Computing
● Goals for computing projects
– easy/cheap to create and operate distributed computing (DC) projects
– wide range of applications possible
– no central authority
● Goals for participants
– easy to participate in multiple projects
– invisible use of disk, CPU, network
General structure of BOINC
● Project:
– Scheduling server (C++)
– BOINC DB (MySQL)
– Data servers (HTTP)
– Web interfaces (PHP)
– Work generation
– Project back end: retry generation, result validation, result processing, garbage collection
● Participant:
– Core agent (C++)
– App agents
Data model
● Immutable files
● Replication across servers
● Can originate on clients or servers
● Can be retained on clients
● Computations can have multiple input and output files
● Applications can consist of multiple files
Computation model
● Redundant computing:
[Figure: pipeline with stages work generation, distribution, validation, and assimilation; label: canonical result]
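A minimal sketch of how redundant computing could pick a canonical result: the same workunit is sent to several hosts, and an output is accepted only when enough replicas agree. The function name `pick_canonical`, the `quorum` parameter, and the use of exact equality are assumptions made for illustration, not BOINC's actual validator interface.

```python
from collections import Counter

def pick_canonical(results, quorum=2):
    """Return the output that at least `quorum` replicas agree on, else None.

    `results` maps a host id to the output that host returned for one
    workunit. Exact equality is an assumption; numerical applications
    may need a tolerance-based comparison instead.
    """
    if not results:
        return None
    # Count how many hosts returned each distinct output.
    output, votes = Counter(results.values()).most_common(1)[0]
    return output if votes >= quorum else None

# Two of three hosts agree, so their output becomes the canonical result.
agreed = pick_canonical({"hostA": "42.0", "hostB": "42.0", "hostC": "41.9"})
# No two hosts agree, so no canonical result exists yet.
undecided = pick_canonical({"hostA": "1", "hostB": "2"})
```

When no quorum is reached, a project would presumably issue further replicas of the workunit and validate again once they return.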
Computation model (cont.)
● Scheduling
– task resource estimates (disk/mem/CPU)
– soft deadlines
● Long-running tasks
– trickle messages, preemption
● API
– minimal (file I/O, checkpoint, graphics)
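As an illustration of how task resource estimates and soft deadlines might drive scheduling, the sketch below decides whether a task is worth sending to a host. Every name here (`Task`, `Host`, `can_send`, and their fields) is hypothetical and the policy is an assumption; the slides do not describe the scheduler's actual logic.

```python
from dataclasses import dataclass

@dataclass
class Task:
    disk_bytes: float   # estimated disk usage
    mem_bytes: float    # estimated memory usage
    cpu_flop: float     # estimated floating-point operations to finish
    deadline_s: float   # soft deadline, in seconds from now

@dataclass
class Host:
    free_disk_bytes: float
    free_mem_bytes: float
    flops: float        # benchmarked speed, operations per second
    cpu_share: float    # fraction of CPU time available to this project

def can_send(task: Task, host: Host) -> bool:
    """Hypothetical policy: the task must fit and be likely to meet its deadline."""
    if task.disk_bytes > host.free_disk_bytes:
        return False
    if task.mem_bytes > host.free_mem_bytes:
        return False
    effective_flops = host.flops * host.cpu_share
    if effective_flops <= 0:
        return False
    estimated_runtime_s = task.cpu_flop / effective_flops
    return estimated_runtime_s <= task.deadline_s

task = Task(disk_bytes=50e6, mem_bytes=64e6, cpu_flop=3.6e12, deadline_s=7 * 86400)
fast = Host(free_disk_bytes=1e9, free_mem_bytes=256e6, flops=1e9, cpu_share=0.5)
full = Host(free_disk_bytes=10e6, free_mem_bytes=256e6, flops=1e9, cpu_share=0.5)
```

Here `can_send(task, fast)` is true (about two hours of estimated runtime against a one-week deadline), while `can_send(task, full)` is false because the host lacks disk space. Because the deadline is soft, a real scheduler could treat a predicted miss as a preference rather than a hard rejection.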
Participant features
● Can register with multiple projects, control resource allocation
● Preferences
– global, per-project
– edited via web interface
● Platforms: Windows, Mac OS X, Unix/Linux
● Anonymous platform mechanism
● Views
– GUI, screensaver, Windows service
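One way a participant's control over resource allocation could work is proportional sharing: each project gets a weight, and resources are divided in proportion to those weights. The sketch below assumes that model; the function name `cpu_fractions` and the weights are illustrative, and the slides do not specify the actual mechanism.

```python
def cpu_fractions(resource_shares):
    """Split CPU time among projects in proportion to participant-chosen weights.

    `resource_shares` maps a project name to a non-negative weight.
    Proportional sharing is an assumption made for this sketch.
    """
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {project: share / total for project, share in resource_shares.items()}

# A participant who weights SETI@home three times as heavily as a second project.
fractions = cpu_fractions({"SETI@home": 3, "Climateprediction.net": 1})
```

With these weights, `fractions` assigns 0.75 of the CPU time to SETI@home and 0.25 to Climateprediction.net.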
Participant credit
● Goals:
– credit for work actually done (CPU, network, storage)
– don't know workunit size in advance
– cheat-proof
● Integration with redundancy
– claimed credit = benchmark * CPU time
– granted credit = minimum claimed credit
● Handling graphics coprocessors
– project-specific benchmarks
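The two credit formulas above can be written out directly. In this sketch the function names and the sample numbers are illustrative assumptions; only the formulas themselves come from the slide.

```python
def claimed_credit(benchmark, cpu_time_s):
    """Credit a host claims for one result: benchmark score times CPU time."""
    return benchmark * cpu_time_s

def granted_credit(claims):
    """Credit granted to every host that returned a result for the workunit.

    Taking the minimum claim means an inflated claim from one host
    cannot raise the credit that anyone receives.
    """
    if not claims:
        raise ValueError("need at least one claim")
    return min(claims)

# Three replicas of one workunit (illustrative numbers).
claims = [
    claimed_credit(1.0, 3600),   # slower host, longer run
    claimed_credit(2.0, 1900),   # faster host, shorter run
    claimed_credit(1.0, 9000),   # host reporting an inflated CPU time
]
granted = granted_credit(claims)
```

Here the claims are 3600, 3800, and 9000, and all three hosts are granted 3600. This is how redundancy supports the cheat-proof goal: exaggerating benchmark scores or CPU time does not pay off unless every replica does the same.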
Participant web features
● User profiles
● Forums
● Self-moderating FAQs
● Teams
● XML data export (3rd party statistics reporting)
Projects
● Current (at Space Sciences Lab)
– Astropulse (black hole / pulsar search)
– SETI@home
● In progress
– Folding@home (Stanford)
– Climateprediction.net (Oxford)
● Planned
– LIGO (physics)
– CERN
– DIMES (network performance study)
Summary and status
● Public distributed computing
● BOINC: a platform for PDC
● BOINC is funded by NSF
● Source code is free for noncommercial use: http://boinc.berkeley.edu