Public Distributed Computing with BOINC
David P. Anderson
Space Sciences Laboratory
University of California, Berkeley


Public-resource computing

Advantages:
• scale
• free
• growth
• public education
• no policy issues

Challenges:
• low BW at client
• costly BW at server
• firewall/NAT issues
• sporadic connection
• untrustworthy, insecure clients
• server security
• heterogeneity
• need PR, glitzy GUI

[Figure: pools of computing resources: your computers; academic and business machines; home PCs]

Public-resource computing (cont.)

● 1 billion Internet-connected PCs in 2010
● 50% privately owned
● If 10% participate:
  – at least 100 PetaFLOPs and 1 Exabyte (10^18 bytes) of storage

[Figure: public computing compared with Grid computing, cluster computing, and supercomputing, on axes of CPU power/storage capacity and cost]
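The projection on this slide is plain multiplication. The sketch below works out what it implies for each participating PC. It assumes that "10% participate" means 10% of all 1 billion PCs; the slide does not say whether the 10% applies to all PCs or only to the privately owned half.

```python
# Back-of-envelope check of the participation projection.
# Assumption: "10% participate" means 10% of ALL Internet-connected PCs,
# not 10% of the privately owned half (the slide leaves this open).

TOTAL_PCS = 10**9            # 1 billion Internet-connected PCs
PARTICIPATION_PERCENT = 10   # assumed share of PCs that participate
TOTAL_FLOPS = 1e17           # 100 PetaFLOPs
TOTAL_STORAGE_BYTES = 1e18   # 1 Exabyte


def per_host_requirements(total_pcs, percent, total_flops, total_bytes):
    """Return (participating hosts, FLOPS per host, bytes per host)."""
    hosts = total_pcs * percent // 100
    return hosts, total_flops / hosts, total_bytes / hosts


if __name__ == "__main__":
    hosts, flops_each, bytes_each = per_host_requirements(
        TOTAL_PCS, PARTICIPATION_PERCENT, TOTAL_FLOPS, TOTAL_STORAGE_BYTES)
    # Prints: 100000000 hosts, 1.0 GFLOPs and 10 GB each
    print(f"{hosts} hosts, {flops_each / 1e9:.1f} GFLOPs and {bytes_each / 1e9:.0f} GB each")
```

Under this reading each participating PC contributes on average about 1 GFLOPS and 10 GB of disk; if only 10% of the privately owned half participate, the per-host figures double.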

SETI@home
● Running since May 1999
● ~500,000 active participants
● ~60 TeraFLOPs
● Problems with current software:
  – hard to change/add algorithms
  – can't share participants with other projects
  – inflexible data architecture

SETI@home data architecture (ideal and current)

[Figure: ideal and current SETI@home data architectures; labels: tapes, Berkeley, Stanford, USC, Internet2, commercial Internet, participants, 50 Mbps]

BOINC: Berkeley Open Infrastructure for Network Computing

● Goals for computing projects:
  – easy/cheap to create and operate distributed computing projects
  – wide range of applications possible
  – no central authority
● Goals for participants:
  – easy to participate in multiple projects
  – invisible use of disk, CPU, network

General structure of BOINC

● Project:
  – Work generation
  – Project back end: retry generation, result validation, result processing, garbage collection
  – BOINC DB (MySQL)
  – Scheduling server (C++)
  – Data servers (HTTP)
  – Web interfaces (PHP)
● Participant:
  – Core agent (C++)
  – App agents
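One way to read the project-side components is as independent back-end stages, each acting on work in a particular state. The sketch below is hypothetical: the state names, the retry limit `MAX_ATTEMPTS`, and the single-pass loop are illustrative assumptions, not BOINC's actual database schema or process structure.

```python
# Hypothetical model of the project back end: each stage inspects a work item
# and advances it only if it is in the state that stage handles. State names
# and MAX_ATTEMPTS are illustrative assumptions, not BOINC's schema.
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 3  # hypothetical limit on how often a failed item is reissued


@dataclass
class WorkItem:
    name: str
    state: str = "unsent"          # state right after work generation
    attempts: int = 0
    outcome: Optional[str] = None  # "ok" or "error" once a participant reports


def schedule(item: WorkItem) -> None:
    """Scheduling server: hand an unsent item to a participant."""
    if item.state == "unsent":
        item.state = "in_progress"
        item.attempts += 1
        item.outcome = None


def retry_generation(item: WorkItem) -> None:
    """Reissue an item whose result came back as an error, up to MAX_ATTEMPTS."""
    if item.state == "in_progress" and item.outcome == "error":
        item.state = "unsent" if item.attempts < MAX_ATTEMPTS else "failed"


def validate(item: WorkItem) -> None:
    """Result validation: accept a successfully returned result."""
    if item.state == "in_progress" and item.outcome == "ok":
        item.state = "validated"


def process_result(item: WorkItem) -> None:
    """Result processing: here it only records that the output was consumed."""
    if item.state == "validated":
        item.state = "processed"


def garbage_collect(item: WorkItem) -> None:
    """Garbage collection: mark finished or abandoned items as deleted."""
    if item.state in ("processed", "failed"):
        item.state = "deleted"


BACK_END = (retry_generation, validate, process_result, garbage_collect)


def run_back_end(item: WorkItem) -> None:
    """Run one pass of every back-end stage over the item."""
    for stage in BACK_END:
        stage(item)


if __name__ == "__main__":
    wu = WorkItem("wu_0001")
    schedule(wu)
    wu.outcome = "error"   # first participant reports a failure
    run_back_end(wu)
    print(wu.state)        # unsent (queued for a retry)
    schedule(wu)
    wu.outcome = "ok"      # second participant succeeds
    run_back_end(wu)
    print(wu.state)        # deleted (validated, processed, then cleaned up)
```

Keeping each stage a separate function that touches only items in its own state mirrors the diagram's separation of back-end components; a real deployment would presumably run these as separate processes sharing the database, which this sketch does not model.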

Data model
● Immutable files
● Replication across servers
● Can originate on clients or servers
● Can be retained on clients
● Computations can have multiple input and output files
● Applications can consist of multiple files
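The data model can be sketched as immutable records. In the sketch below, the class and field names (`FileRef`, `urls`, `origin`, `sticky`, `Workunit`) and the example.org URLs are hypothetical; frozen dataclasses stand in for the rule that a file never changes once created, and `fetch` shows one way a client could use replicated files by trying each server in turn. The `download` callable is an assumption standing in for an HTTP client.

```python
# Hypothetical records for BOINC-style files and workunits.
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class FileRef:
    """An immutable file: new content would need a new name."""
    name: str
    urls: Tuple[str, ...]        # replicas of the same file on several data servers
    origin: str = "server"       # "server" or "client" (where the file was created)
    sticky: bool = False         # True if the client should retain it after use


@dataclass(frozen=True)
class Workunit:
    """A computation with possibly several input and output files."""
    name: str
    inputs: Tuple[FileRef, ...]
    output_names: Tuple[str, ...]


def fetch(ref: FileRef, download: Callable[[str], Optional[bytes]]) -> bytes:
    """Try each replica URL in order and return the first successful download."""
    for url in ref.urls:
        data = download(url)
        if data is not None:
            return data
    raise IOError(f"no replica of {ref.name} is reachable")


if __name__ == "__main__":
    ref = FileRef("input_1.dat", ("http://data1.example.org/input_1.dat",
                                  "http://data2.example.org/input_1.dat"))
    # Simulated servers: only the second replica is up.
    servers = {"http://data2.example.org/input_1.dat": b"signal samples"}
    print(fetch(ref, servers.get))   # b'signal samples'
```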

Computation model
● Redundant computing:

[Figure: redundant-computing cycle with stages work generation, distribution, validation (producing a canonical result), and assimilation]
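Redundant computing sends the same workunit to several participants and accepts an answer only when enough of them agree. Below is a minimal sketch of picking a canonical result, assuming a quorum size and a project-supplied `equivalent` comparison (both hypothetical parameters here). The example uses a tolerance-based comparison because results computed on different platforms may differ slightly in floating point.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def find_canonical(results: Sequence[T], quorum: int,
                   equivalent: Callable[[T, T], bool]) -> Optional[T]:
    """Return the first result that at least `quorum` results (itself included)
    agree with, or None if no such result exists yet."""
    for candidate in results:
        agreeing = sum(1 for other in results if equivalent(candidate, other))
        if agreeing >= quorum:
            return candidate
    return None


def close_enough(a: float, b: float) -> bool:
    """Example comparison: treat values within an absolute tolerance as equal."""
    return abs(a - b) <= 1e-6


if __name__ == "__main__":
    # Three replicas of one workunit; the second host returned a bad value.
    print(find_canonical([3.14159, 2.0, 3.1415901], 2, close_enough))  # 3.14159
    # Only two replicas and they disagree: no canonical result yet.
    print(find_canonical([1.0, 2.0], 2, close_enough))  # None
```

When no canonical result is found, the natural next step under this model is to distribute another replica, which presumably corresponds to the back end's retry generation.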

Computation model (cont.)
● Scheduling:
  – task resource estimates (disk/mem/CPU)
  – soft deadlines
● Long-running tasks:
  – trickle messages, preemption
● API:
  – minimal (file I/O, checkpoint, graphics)
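The scheduling bullets suggest a simple feasibility test on the server: send a task only if its resource estimates fit the host and it can plausibly finish before its deadline. The sketch below uses hypothetical field names and a hypothetical `on_fraction` (the share of wall-clock time the host computes for the project). Because the deadline is soft, it is used here only to decide whether to send; how late results are handled is not modeled.

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    disk_bytes: float   # estimated disk use
    mem_bytes: float    # estimated peak memory
    fpops: float        # estimated floating-point operations (CPU estimate)
    deadline_s: float   # soft deadline, in seconds from now


@dataclass
class Host:
    free_disk_bytes: float
    free_mem_bytes: float
    flops: float        # benchmarked speed, floating-point operations per second
    on_fraction: float  # hypothetical: share of wall time spent on this project


def can_send(task: TaskEstimate, host: Host) -> bool:
    """Return True if the task fits the host and should finish before its deadline."""
    if task.disk_bytes > host.free_disk_bytes or task.mem_bytes > host.free_mem_bytes:
        return False
    est_wall_seconds = task.fpops / (host.flops * host.on_fraction)
    return est_wall_seconds <= task.deadline_s


if __name__ == "__main__":
    host = Host(free_disk_bytes=1e10, free_mem_bytes=5e8, flops=1e9, on_fraction=0.5)
    task = TaskEstimate(disk_bytes=1e8, mem_bytes=6.4e7, fpops=1e14, deadline_s=7 * 86400)
    print(can_send(task, host))  # True: about 200,000 s of wall time fits in 7 days
```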

Participant features
● Can register with multiple projects and control resource allocation among them
● Preferences:
  – global and per-project
  – edited via web interface
● Platforms: Windows, Mac OS X, Unix/Linux
● Anonymous platform mechanism
● Views:
  – GUI, screensaver, Windows service
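For participants registered with several projects, one simple reading of "control resource allocation" is a weight per project, with CPU time divided in proportion to the weights. This proportional-share rule and the `split_cpu` helper are assumptions for illustration; the slide does not specify how the allocation is enforced.

```python
from typing import Dict


def split_cpu(hours: float, shares: Dict[str, float]) -> Dict[str, float]:
    """Divide `hours` of CPU time among projects in proportion to their shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {project: hours * share / total for project, share in shares.items()}


if __name__ == "__main__":
    shares = {"SETI@home": 100, "Climateprediction.net": 50}
    print(split_cpu(24, shares))
    # {'SETI@home': 16.0, 'Climateprediction.net': 8.0}
```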

Participant credit
● Goals:
  – credit for work actually done (CPU, network, storage)
  – workunit size not known in advance
  – cheat-proof
● Integration with redundancy:
  – claimed credit = benchmark × CPU time
  – granted credit = minimum claimed credit
● Handling graphics coprocessors:
  – project-specific benchmarks
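The two credit formulas combine naturally with redundant computing. In the sketch below, the benchmark values, CPU times, and host names are made up, and the benchmark is treated as credit per CPU-second, a unit chosen only for the example. It shows why granting the minimum claim resists cheating: one inflated claim does not raise the granted credit while at least one honest claim is present.

```python
from typing import Dict


def claimed_credit(benchmark: float, cpu_seconds: float) -> float:
    """Claimed credit = benchmark * CPU time."""
    return benchmark * cpu_seconds


def granted_credit(claims: Dict[str, float]) -> float:
    """Granted credit = minimum claimed credit among the redundant results."""
    return min(claims.values())


if __name__ == "__main__":
    claims = {
        "host_a": claimed_credit(2.0, 3600),   # faster benchmark, shorter run
        "host_b": claimed_credit(1.5, 4000),   # slower benchmark, longer run
        "host_c": 50000.0,                     # inflated claim from a cheating client
    }
    print(granted_credit(claims))  # 6000.0
```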

Participant web features
● User profiles
● Forums
● Self-moderating FAQs
● Teams
● XML data export (3rd-party statistics reporting)
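A third-party statistics site would consume the XML export and aggregate it, for example into team totals. The element names in the sample (`user`, `name`, `total_credit`, `teamid`) are an assumed layout for illustration, not a documented format; the parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict

# Assumed export layout; a real project's export may differ.
SAMPLE_EXPORT = """<users>
  <user><name>alice</name><total_credit>1500.5</total_credit><teamid>3</teamid></user>
  <user><name>bob</name><total_credit>200</total_credit><teamid>3</teamid></user>
  <user><name>carol</name><total_credit>900</total_credit><teamid>7</teamid></user>
</users>"""


def team_totals(xml_text: str) -> Dict[str, float]:
    """Sum total_credit per teamid; users without a team go under 'none'."""
    totals: Dict[str, float] = defaultdict(float)
    for user in ET.fromstring(xml_text).iter("user"):
        team = user.findtext("teamid", default="none")
        totals[team] += float(user.findtext("total_credit", default="0"))
    return dict(totals)


if __name__ == "__main__":
    print(team_totals(SAMPLE_EXPORT))  # {'3': 1700.5, '7': 900.0}
```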

Projects
● Current (at Space Sciences Lab):
  – Astropulse (black hole / pulsar search)
  – SETI@home
● In progress:
  – Folding@home (Stanford)
  – Climateprediction.net (Oxford)
● Planned:
  – LIGO (gravitational-wave physics)
  – CERN
  – DIMES (network performance study)

Summary and status
● Public distributed computing (PDC)
● BOINC: a platform for PDC
● BOINC is funded by NSF
● Source code is free for noncommercial use: http://boinc.berkeley.edu
