Data, Data Everywhere: Why We Need Broadband Connectivity
By Ruzena Bajcsy


Data, Data Everywhere

Why We Need Broadband Connectivity

By Ruzena Bajcsy

Who Generates the Data?

• Astronomers
• Biologists
• High Energy Physicists
• Geophysicists
• Archeologists and Anthropologists
• Psychologists
• Engineers
• Artists

A Year of Innovation and Accomplishment

UC Santa Cruz

Center for Information Technology Research in the Interest of Society

Solving Societal-Scale Problems

Energy Conservation
Emergency Response and Homeland Defense
Transportation Efficiency

Solving Societal-Scale Problems

Health Care Monitoring
Land and Environment
Education

Societal-Scale Systems

[Diagram: societal-scale system architecture – information appliances and MEMS sensors on the "client" side connect over Gigabit Ethernet to clusters and massive clusters on the "server" side, which provide scalable, reliable, secure services; the whole forms a secure, non-stop utility of diverse components that adapts to interfaces/users and is always connected]

Mote generations: February 2000 · February 2001 · August 2001 · February 2002

Seismic Monitoring of Buildings: Before CITRIS
$8,000 each

Seismic Monitoring of Buildings: With CITRIS Wireless Motes
$70 each

Ad-hoc sensor networks work

• 29 Palms Marine Base, March 2001
  – 10 Motes dropped from an airplane landed, formed a wireless network, detected passing vehicles, and radioed information back
• Intel Developers Forum, Aug 2001
  – 800 Motes running TinyOS hidden in auditorium seats started up and formed a wireless network as participants passed them around
• tinyos.millennium.berkeley.edu
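The multi-hop behavior in the demos above — motes discovering neighbors in radio range and learning a route back to a base station — can be sketched in a few lines. This is a hypothetical simulation, not TinyOS code; the positions, radio range, and flooding scheme are all illustrative assumptions.

```python
import random

def build_links(positions, radio_range):
    """Symmetric links between motes that lie within radio range of each other."""
    links = {i: set() for i in positions}
    for i, (xi, yi) in positions.items():
        for j, (xj, yj) in positions.items():
            if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= radio_range ** 2:
                links[i].add(j)
    return links

def route_to_base(links, base):
    """Breadth-first flood from the base station: each reachable mote
    records the neighbor it should forward detections through."""
    next_hop, frontier = {base: None}, [base]
    while frontier:
        node = frontier.pop(0)
        for nbr in sorted(links[node]):
            if nbr not in next_hop:
                next_hop[nbr] = node   # detections relay hop by hop toward base
                frontier.append(nbr)
    return next_hop

random.seed(1)
motes = {i: (random.uniform(0, 100), random.uniform(0, 100)) for i in range(10)}
hops = route_to_base(build_links(motes, radio_range=45.0), base=0)
print(f"{len(hops)} of {len(motes)} motes can reach the base station")
```

The airplane-drop demo is essentially this: whatever topology the landing produces, the flood finds every mote with a connected path to the base.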

Recent Progress:

Energy Efficiency and

Smart Buildings

Arens, Culler, Pister, Orens, Rabaey, Sastry

The Inelasticity of California’s Electrical Supply

[Chart: power-exchange market price for electricity ($/MWh, 0–800) versus load (20,000–45,000 MW), California, Summer 2000]

How to Address the Inelasticity of the Supply

• Spread demand over time (or reduce peak)
  – Make the cost of energy
    • visible to the end-user
    • a function of the load curve (e.g. hourly pricing)
  – the "demand-response" approach
• Reduce average demand (demand side)
  – Eliminate wasteful consumption
  – Improve efficiency of equipment and appliances
• Improve efficiency of the generation and distribution network (supply side)

Enabled by Information!
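A toy model of the demand-response idea: if the hourly price is made a steep function of system load (mirroring the inelastic supply curve), shifting a flexible job off-peak visibly changes its cost. The price curve, slope, and loads below are invented for illustration, not California market data.

```python
def hourly_price(load_mw, base=30.0):
    """Toy price curve for an inelastic supply: roughly flat up to
    35,000 MW, then climbing steeply (slope is illustrative only)."""
    if load_mw <= 35000:
        return base
    return base + 0.05 * (load_mw - 35000)   # $/MWh

def job_cost(kwh, load_mw):
    """Cost of running a kWh-sized job while the grid is at load_mw."""
    return (kwh / 1000.0) * hourly_price(load_mw)   # MWh x $/MWh = $

dryer_kwh = 3.0                    # one clothes-dryer run (assumed)
peak, off_peak = 42000, 28000      # assumed system loads in MW
print(f"at peak:  ${job_cost(dryer_kwh, peak):.2f}")       # $1.14
print(f"off-peak: ${job_cost(dryer_kwh, off_peak):.2f}")   # $0.09
```

The point is the ratio, not the numbers: once the price is visible to the end-user, the same job costs an order of magnitude more at peak, which is exactly the signal that flattens the load curve.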

Energy Consumption in Buildings (US 1997)

End Use                   Residential  Commercial
Space heating                  6.7        2.0
Space cooling                  1.5        1.1
Water heating                  2.7        0.9
Refrigerator/Freezer           1.7        0.6
Lighting                       1.1        3.8
Cooking                        0.6         -
Clothes dryers                 0.6         -
Color TVs                      0.8         -
Ventilation/Furnace fans       0.4        0.6
Office equipment                -         1.4
Miscellaneous                  3.0        4.9
Total                         19.0       15.2

Source: Interlaboratory Working Group, 2000
(Units: quads per year; 1 quad ≈ 1.05 EJ)

A Three-Phase Approach

• Phase 1: Passive Monitoring
  – The availability of cheap, connected (wired or wireless) sensors makes it possible for the end-user to monitor the energy usage of buildings and individual appliances and act on it.
  – Primary feedback on usage
  – Monitor health of the system (30% inefficiency!)
• Phase 2: Quasi-Active Monitoring and Control
  – Combining the monitoring information with instantaneous feedback on the cost of usage closes the feedback loop between end-user and supplier.
• Phase 3: Active Energy Management through Feedback and Control: Smart Buildings and Intelligent Appliances
  – Adding instantaneous and distributed control functionality to the sensing and monitoring functions increases energy efficiency and user comfort.
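A minimal sketch of what one Phase-3 decision step might look like for a single zone, combining occupancy sensing with the Phase-2 price signal. All parameters (setpoint, comfort band, price threshold) are invented illustrative values, not a real controller.

```python
def control_step(zone_temp_c, occupied, price_per_mwh,
                 setpoint=22.0, comfort_band=1.0, price_threshold=100.0):
    """One decision step for a smart-building zone: condition only
    occupied zones, and widen the tolerated temperature band when
    electricity is expensive (all thresholds assumed for illustration)."""
    if not occupied:
        return "off"                         # empty zones are not conditioned
    band = comfort_band * (2.0 if price_per_mwh > price_threshold else 1.0)
    if zone_temp_c > setpoint + band:
        return "cool"
    if zone_temp_c < setpoint - band:
        return "heat"
    return "hold"

print(control_step(23.5, True, price_per_mwh=40.0))    # cheap power  -> cool
print(control_step(23.5, True, price_per_mwh=150.0))   # expensive    -> hold
print(control_step(23.5, False, price_per_mwh=40.0))   # empty zone   -> off
```

Run per zone against the dense sensor network, this is the "instantaneous and distributed control" of Phase 3: each node decides locally, using only its own readings plus the shared price signal.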

Cory Hall Energy Monitoring Network

50 nodes on 4th floor
30 sec sampling
250K samples to database over 6 weeks
Moved to Intel Lab – come play!

Smart Buildings

Dense wireless network of sensor, control, and actuator nodes

• Task/ambient conditioning systems allow conditioning of small, localized zones, individually controlled by building occupants and environmental conditions

• Joint projects among BWRC/BSAC, Center for the Built Environment (CBE), IEOR, Intel Lab, LBNL

[Figure: Control of HVAC systems – Underfloor Air Distribution vs. Conventional Overhead System]

Control of HVAC Systems

• An underfloor system can save energy because air near the ceiling can be allowed to get hotter
• Project with CBE (Arens, Federspiel)
• Need temperature sensors at different heights
• Simulation results
  – Hot August day in Sacramento
  – Underfloor HVAC saves 46% of energy
• Future: test in instrumented room

More sensors – air velocity

• Uses time of flight of sound to determine 3D air velocity

• Significance
  – Heat transfer (energy)
  – Air quality
  – Perception of temperature
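The time-of-flight principle can be made concrete: with two opposing flight-time measurements over a known path, the velocity component along that axis falls out, and the speed of sound cancels. A sketch of the arithmetic (not the sensor's actual firmware):

```python
def axis_velocity(d, t_down, t_up):
    """Velocity component along one transducer pair separated by d metres.
    Downstream flight time t_down = d/(c+v), upstream t_up = d/(c-v),
    so v = (d/2) * (1/t_down - 1/t_up) — the speed of sound c cancels,
    which means no temperature calibration is needed."""
    return 0.5 * d * (1.0 / t_down - 1.0 / t_up)

def velocity_3d(d, tofs):
    """tofs maps each of three orthogonal axes to its (t_down, t_up) pair."""
    return {axis: axis_velocity(d, t1, t2) for axis, (t1, t2) in tofs.items()}

# Synthetic check: speed of sound 343 m/s, true flow 2 m/s along x, d = 0.2 m
d, c, v = 0.2, 343.0, 2.0
tofs = {"x": (d / (c + v), d / (c - v)), "y": (d / c, d / c), "z": (d / c, d / c)}
print(velocity_3d(d, tofs))   # x recovers 2.0 m/s; y and z are 0.0
```

Three orthogonal transducer pairs therefore give the full 3D air velocity vector at a point, which feeds directly into the heat-transfer and comfort models above.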

Smart Dust Goes National

Academia: UCSD, UCLA, USC, MIT, Rutgers, Dartmouth, U. Illinois UC, NCSA, U. Virginia, U. Washington, Ohio State

Industry: Intel, Crossbow, Bosch, Accenture, Mitre, Xerox PARC, Kestrel

Government: National Center of Supercomputing, Wright Patterson AFB

Why Broadband Connectivity When Memory Is So Cheap?

• Because users want to interact with the data in real time

• Users need to access the data at the right time and at the right place

• They need to access data in the right format

• They want the right amount of data

Examples

• Distributed computation

• Cluster technology

• The Berkeley Millennium Project

Cluster Counts

• NOW (circa 1994): 4-proc HP -> 36-proc SPARC10 -> 100-proc Ultra1

• Millennium Central Cluster (Intel Donation)
  – 99 Dell 2300/6400/6450 Xeon Dual/Quad: 332 processors
  – Total: 211GB memory, 3TB disk
  – Myrinet 2000 + 1000Mb fiber ethernet

• OceanStore/ROC cluster, Astro cluster, Math cluster, Cory cluster, more

• CITRIS Pilot Cluster: 3/2002 deployment (Intel Donation)
  – 4 Dell Precision 730 Itanium Duals: 8 processors
  – Total: 20GB memory, 128GB disk
  – Myrinet 2000 + 1000Mb copper ethernet

Current Network

CITRIS Network Rollout

Network Rollout

• Millennium Cluster
  – Keep existing Nortel 1200/1100/8600
  – New Foundry FastIron 1500

• CITRIS Cluster– New Foundry FastIron 1500

• Backbone– 2 Foundry BigIron 8000

• Cost of expansion $280K (SimMillennium)

Millennium Cluster Tools

• Rootstock Installation

• Ganglia Cluster Monitoring

• gEXEC – remote execution/load balancing

• Pcp – parallel copying/job staging

All in production, open source, cluster community development on sourceforge.net

Rootstock Installation Tool

• Installation configuration stored centrally

• Build local cluster specific root from central root

• Install/reinstall cluster nodes from local rootstock

• http://rootstock.millennium.berkeley.edu/

• Has become basis for http://rocks.npaci.edu/ cluster distribution.

Ganglia Monitoring

• Coherent distributed hash of cluster information
  – Static: cpu speed, total memory, software versions, boottime, upgradetime, etc.
  – Dynamic: load, cpu idle, memory available, system clock, etc.
  – Heartbeat
  – Customizable with simple API for any other metric

• Data is exchanged in well-defined XML and XDR
• Lightweight – small memory footprint and minimal communication (tunable)
• Scalable – tested on several 512+ node clusters
• Trusted hosts – feature allows clusters of clusters to be linked within a single monitoring and execution domain
• Ported to Linux, FreeBSD, Solaris, AIX, and IRIX; active development by community for other ports
• Dell Open Cluster Group seriously evaluating this as basis for their cluster computing tool distribution: "The only monitoring that scales over 64 nodes"

gEXEC – remote execution

• History
  – GLUnix from NOW
  – rEXEC from Millennium
  – gEXEC: UCB/Caltech collaboration

• Lightweight – minimal number of threads on frontend + fanout
• Decentralized – no central point of failure
• Fault tolerant – fallback ability + failure checks at runtime
• Interactive – feels like a single machine
• Load balanced from Ganglia Monitoring data
• Scalable to at least 512 nodes
• Unix authorization plus cluster keys

e.g.
  gexec -n 3 hostname
  gexec -n 0 render -in input.${VNN} -out output.${VNN}

Pcp – parallel copy

• Newest addition to cluster suite

• Fanout copy of files/directories to nodes

• Scalable

• Used for job staging

• Future of this tool is to wrap it up as an option into gEXEC.
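The fanout idea behind Pcp (and the gEXEC frontend) can be sketched as computing a copy tree: with fanout k, N nodes are reached in roughly log_k(N) rounds instead of N sequential sends from one source. Node names and the fanout value are illustrative.

```python
def fanout_tree(nodes, fanout=4):
    """Arrange nodes into a copy tree: the source sends to the first
    `fanout` nodes, each of which forwards to `fanout` more, and so on.
    Returns {parent: [children]} with 'source' as the root."""
    tree, parents = {"source": []}, ["source"]
    pending = list(nodes)
    while pending:
        next_parents = []
        for parent in parents:
            children = pending[:fanout]      # this parent forwards to these
            pending = pending[fanout:]
            tree[parent] = children
            tree.update({c: [] for c in children})
            next_parents.extend(children)
            if not pending:
                break
        parents = next_parents
    return tree

tree = fanout_tree([f"node{i}" for i in range(1, 11)], fanout=3)
print(tree["source"])   # ['node1', 'node2', 'node3']
print(tree["node1"])    # ['node4', 'node5', 'node6']
```

Because every node that has received the file immediately becomes a sender, the source's outbound link stops being the bottleneck — which is what makes the tool useful for staging jobs onto hundreds of nodes.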

Known Sites Using Ganglia Cluster Toolkit
Most popular cluster and distributed computing software on sourceforge.net
Over 7,000 downloads since release in 1/2002

• Centre National De La Recherche Scientifique http://www.in2p3.fr
• SDSC http://www.sdsc.edu
• IE&M http://iew3.technion.ac.il/
• GMX http://www.gmx.fr
• CAS, Chemical Abstracts Service http://www.cas.org
• Keldysh Institute of Applied Mathematics (Russia) http://www.kiam1.rssi.ru
• LUCIE (Linux Universal Config. & Install Engine) http://matsu-www.is.titech.ac.jp/~takamiya/lucie/
• Mellanox Technologies http://www.mellanox.co.il/
• TerraSoft Solutions (PowerPC Linux) http://terraplex.com/tss_about.shtml
• Intel http://www.intel.com/
• BellSouth Internet Services http://services.bellsouth.net/external/
• ArrayNetworks http://www.clickarray.com/
• MandrakeSoft http://www.mandrakesoft.com
• Technische Universitat Graz http://www.TUGraz.at/
• GeoCrawler http://www.geocrawler.com/
• Cray http://www.cray.com/
• Unlimited Scale http://www.unlimitedscale.com/
• UCSF Computer Science http://cs.usfca.edu/
• RoadRunner http://www.houston.rr.com
• Veritas Geophysical Integrity http://www.veritasdgc.com
• Dow http://www.dow.com/
• The Max Planck Society for the Advancement of Science http://www.mpg.de
• Lockheed Martin http://www.lockheedmartin.com
• Duke University http://www.duke.edu
• Framestore Computer Film Company http://www.framestore-cfc.com
• nVidia http://www.nvidia.com/
• SAIC http://www.saic.com
• Paralogic http://www.plogic.com/
• Singapore Computer Systems Limited http://www.scs.com.sg/
• Hughes Network Solutions http://www.hns.com
• University of Washington, Computer Science http://www.cs.washington.edu
• Experian http://www.experian.com
• L'Universite de Geneva http://www.unige.ch
• Purdue Physics Department http://www.physics.purdue.edu/
• Atos Origin Engineering Services http://www.aoes.nl/
• Teraport http://www.teraport.se
• Daresbury Laboratory http://www.dl.ac.uk
• Clinica Sierra Vista http://www.clinicasierravista.org
• LondonTown http://www.londontown.com/
• National Hellenic Research Foundation http://www.eie.gr
• RightNow Technologies http://www.rightnow.com/
• Idaho National Engineering and Environmental Laboratory http://www.inel.gov
• WesternGeco http://www.westerngeco.com
• 80/20 Software Tools http://rc.explosive.net
• Optiglobe Brazil http://www.optiglobe.com.br
• Brunel University http://www.brunel.ac.uk
• Cinvestav Instituto Politecnico Nacional http://www.ira.cinvestav.mx
• Conexant http://www.hotrail.com
• Dell http://www.dell.com/
• SuSE Linux http://www.suse.de
• Arabic on Linux http://www.planux.com
• Delgado Community College, New Orleans http://www.dcc.edu
• Boeing http://www.boeing.com
• RedHat http://www.redhat.com/
• University of Pisa, Italy http://www.df.unipi.it
• Ecole Normale Superieure De Lyon http://www.ens-lyon.fr
• iMedium http://www.imedium.com
• Moving Picture Company http://www.moving-picture.com
• Professional Service Super Computers http://www.pssclabs.com
• AlgoNomics http://www.algonomics.com
• Ocimum Biosolutions http://www.ocimumbio.com
• Caltech http://www.caltech.edu
• VitalStream http://www.publichost.com
• Sandia National Laboratory http://www.sandia.gov/
• UC Irvine http://www.uci.edu
• Guide Corporation http://www.guidecorp.com/
• Matav http://www.matav.hu
• Math Tech, Denmark http://www.math-tech.dk
• Istituto Trentino Di Cultura http://www.itc.it
• Compaq http://www.compaq.com/
• National Research Council Canada http://www.nrc.ca
• Overture http://www.overture.com
• Petroleum Geo-Services http://www.pgs.com
• National Research Laboratory of the US Navy http://www.nrl.navy.mil
• White Oak Technologies, Inc. http://www.woti.com/

Grid computing

• Working with key cluster software developers from research and industry to standardize cluster tools within the Global Grid Forum (GGF).

CITRIS Cluster

• Goal is to build a production-level cluster environment that supports and is driven by CITRIS applications
  – NOW: mostly experimental
  – Millennium: ½ developmental, ½ production
• Clusters adopted as primary compute platform
  – ~800 current Millennium users
  – 65% average CPU utilization on Millennium cluster, many times 100% utilization
  – 50% of top 20 PACI users compute on Linux clusters for development and production runs

[Diagram: CITRIS cluster – 100 dual-Itanium compute nodes (1 TFlop, 1.6TB memory), 10 storage nodes with 50TB Fibre Channel storage, and 2 frontend nodes on a Myrinet 2000 interconnect; a Foundry FastIron 1500 and two BigIron 8000 switches uplink over 1 Gigabit Ethernet to the campus core]

Steve Brenner Project: Large Molecular Sequence and Structure Databases

• These databases are in gigabytes
• They provide web services in which low latency is important
• They often work remotely
• The campus 70 Mbit limit is increasingly saturated, making it impossible to effectively provide services and do the work
• They need tele/video conferencing over IP

Background of the Brain Imaging Center at Berkeley

• Campus-wide resource dedicated to Functional Magnetic Resonance Imaging (fMRI) research
• Non-invasive "neuroimaging" technique used to investigate the blood-flow correlates of neural activity
• BIC houses a Varian 4 Tesla scanner and a Neuroimaging Computational Facility, enabling collaboration among neuroscientists, physicists, chemists, statisticians, and EE and CS scientists

Current LAN

• Due to high volume of data, we established high speed connections between computers in buildings around the campus

• LAN consists of two Cisco Catalyst 6500 switches connected with optical fiber, communicating at Gigabit Ethernet speed

• Workstations connected to network at Fast Ethernet speed (100 Mbits/sec, full duplex)

WAN Needs

• Geographically distributed collaborative researchers and immense data sets make high speed networking a priority.

• Collaborations exist between researchers at UCSD, UCSF, UC Davis, Stanford, Varian Inc. and NASA Ames.

• With spiral imaging, we will soon be capable of generating data in excess of 1MB/s per scanner
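A back-of-the-envelope check of why 1 MB/s per scanner strains a shared 70 Mbit campus link. The one-hour session length is an assumption for illustration; protocol overhead and contention are ignored.

```python
def transfer_minutes(data_bytes, link_mbit_s):
    """Minutes to move data_bytes over a link of link_mbit_s megabits
    per second, ignoring protocol overhead and link contention."""
    return data_bytes * 8 / (link_mbit_s * 1e6) / 60.0

one_hour_scan = 1e6 * 3600    # 1 MB/s sustained for an hour ~ 3.6 GB
print(f"70 Mbit/s shared campus link: {transfer_minutes(one_hour_scan, 70):.1f} min")
print(f"1 Gbit/s dedicated link:     {transfer_minutes(one_hour_scan, 1000):.1f} min")
```

Even under these generous assumptions, one scanner session monopolizes the whole 70 Mbit campus uplink for about seven minutes; with several collaborating sites pulling data concurrently, only gigabit-class connectivity keeps the collaboration interactive.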

NASDAQ vs. O'Reilly Tech Book Sales at Amazon, January 1, 1999 through September 30, 2001

[Chart: normalized O'Reilly unit sales at Amazon and normalized NASDAQ index value (0–1 scale) plotted against date]

CITRIS Network in Smart Classroom