Upload
ian-foster
View
2.658
Download
1
Tags:
Embed Size (px)
DESCRIPTION
A talk given in January 2012 at a wonderful conference organized in Zakopane, Poland, by colleagues from the erstwhile GridLab project. I talked about how increasing data volumes demand radically new approaches to delivering research computing. Lively discussion ensued.
Citation preview
www.ci.anl.govwww.ci.uchicago.edu
Rethinking how we provide science IT in an era of
massive data but modest budgets
Ian Foster
www.ci.anl.govwww.ci.uchicago.edu
2
Exploding data volumes in biology
x107 in 14 years
www.ci.anl.govwww.ci.uchicago.edu
3
Exploding data volumes in astronomy
100,000 TB
MACHO et al.: 1 TB
Palomar: 3 TB2MASS: 10 TBGALEX: 30 TBSloan: 40 TB
Pan-STARRS: 40,000 TB
www.ci.anl.govwww.ci.uchicago.edu
4
Exploding data volumes in climate science
Climate model intercomparisonproject (CMIP) of the IPCC
2004: 36 TB
2012: 2,300 TB
www.ci.anl.govwww.ci.uchicago.edu
5
The challenge of staying competitive
"Well, in our country," said Alice … "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing.”
"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
www.ci.anl.govwww.ci.uchicago.edu
6
Ways of running faster (1)
Enhance human capabilities
Civilization advances by extending the number of important operations which we can perform without thinking about them
Alfred North Whitehead, 1911
www.ci.anl.govwww.ci.uchicago.edu
7
Ways of running faster (2)
Enhance human capabilities
Outsource automatable
tasks
Utility computing“[t]he computing utility could become the basis for a new and important industry” – McCarthy, 1960
Grid computing“provide access to computing on demand” – The Grid: Blueprint for a New Computing Inf., 1999
Cloud computing“delivery of computing as a service rather than a product” [Wikipedia, 2012]
www.ci.anl.govwww.ci.uchicago.edu
8
Ways of running faster (3)
Enhance human capabilities
Join forceswith others
Collaboratories, P2P, crowdsourcing
Virtual organizations“flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”, Anatomy of Grid, 2001
Outsource automatable
tasks
www.ci.anl.govwww.ci.uchicago.edu
9
Big science has been keeping up
LIGO: 1 PB data in last science run, distributed worldwide
ESG: 1.2 PB climate datadelivered to 23,000 users; 600+ pubs
OSG: 1.4M CPU-hours/day, >90 sites, >3000 users, >260 pubs in 2010
Robust production solutionsSubstantial teams and expenseSustained, multi-year effortApplication-specific solutions, built on common technology
www.ci.anl.govwww.ci.uchicago.edu
10
But small science is struggling
More data, more complex dataAd-hoc solutionsInadequate software, hardwareData plan mandates
www.ci.anl.govwww.ci.uchicago.edu
11
Medium science struggles too
• Dark Energy Survey receives 100,000 files each night in Illinois
• They transmit files to Texas for analysis … then move results back to Illinois
• Process must be reliable, routine, and efficient
• The IT team is not large Image credit: Roger Smith/NOAO/AURA/NSF
Blanco 4m on Cerro Tololo
www.ci.anl.govwww.ci.uchicago.edu
12
Science IT crisis demands new approaches
• We have exceptional infrastructure for the 1% (e.g., supercomputers, LHC, …)
• But not for the 99% (e.g., the vast majority of the 1.8M publicly funded researchers in the EU)
We need new approaches to providing science IT, that:— Reduce barriers to entry— Are cheaper— Are sustainable
www.ci.anl.govwww.ci.uchicago.edu
13
You can run a company from a coffee shop
www.ci.anl.govwww.ci.uchicago.edu
14
Because businesses outsource their IT
Web presence Email (hosted Exchange) Calendar Telephony (hosted VOIP) Human resources and payroll Accounting Customer relationship mgmt
Software as a Service
(SaaS)
www.ci.anl.govwww.ci.uchicago.edu
15
And often their large-scale computing too
Web presence Email (hosted Exchange) Calendar Telephony (hosted VOIP) Human resources and payroll Accounting Customer relationship mgmt Data analytics Content distribution
Infrastructure as a Service
(IaaS)
Software as a Service
(SaaS)
Consumers also outsource much of their IT
www.ci.anl.govwww.ci.uchicago.edu
17
Let’s rethink how we provide research IT
Accelerate discovery and innovation worldwide by providing research IT as a service
Leverage software-as-a-service to• provide millions of researchers with
unprecedented access to powerful tools; • enable a massive shortening of cycle times in
time-consuming research processes; and• reduce research IT costs dramatically via
economies of scale—and address sustainability?
www.ci.anl.govwww.ci.uchicago.edu
18
Also address administrative costs?
42% of the time spent by an average PI on a federally funded research project was reported to be expended on administrative tasks related to that project rather than on research — Federal Demonstration Partnership faculty burden survey, 2007
www.ci.anl.govwww.ci.uchicago.edu
19
Time-consuming tasks in science
• Run experiments• Collect data• Manage data• Move data• Acquire computers• Analyze data• Run simulations• Compare experiment
with simulation• Search the literature
• Communicate with colleagues
• Publish papers• Find, configure, install
relevant software• Find, access, analyze
relevant data• Order supplies• Write proposals• Write reports• …
www.ci.anl.govwww.ci.uchicago.edu
20
Time-consuming tasks in science
• Run experiments• Collect data• Manage data• Move data• Acquire computers• Analyze data• Run simulations• Compare experiment
with simulation• Search the literature
• Communicate with colleagues
• Publish papers• Find, configure, install
relevant software• Find, access, analyze
relevant data• Order supplies• Write proposals• Write reports• …
www.ci.anl.govwww.ci.uchicago.edu
21
Scientific data delivery, 2012
• “[A] majority of users at BES facilities … physically transport data to a home institution using portable media … data volumes are going to increase significantly in the next few years (to 70 TB/day or more) – data must be transferred over the network”
• “the effectiveness of data transfer middleware [is] not just on the transfer speed, but also the time and interruption to other work required to supervise and check on the success of large data transfers”
• “It took two weeks and email traffic between network specialists at NERSC and ORNL, sys-admins at NERSC, … and combustion staff at ORNL and SNL to move 10 TB from NERSC to ORNL”
[ESNet Network Requirements Workshops, 2007-2010]
Major usability, productivity, performance problems
1980
www.ci.anl.govwww.ci.uchicago.edu
22
The challenge: Moving big data easily
What should be trivial …
… can be painfully tedious and time-consuming
“I need my data over there – at my _____”
( supercomputing center, campus server,
etc.)
Data Source
Data Destination
! Config issues
! Unexpected failure = manual retry
Data Source
Data Destination
“GAAAH!
%&@#&” ! Firewall issues
GO PICTURE
www.ci.anl.govwww.ci.uchicago.edu
24
GO-Transfer: Data transfer as SaaS• Reliable file transfer.
– Easy “fire-and-forget” transfers– Automatic fault recovery– High performance– Across multiple security domains
• No IT required.– Software as a Service (SaaS)
o No client software installationo New features automatically available
– Consolidated support & troubleshooting– Works with existing GridFTP servers– Globus Connect solves “last mile problem”
GO-Transfer is the initial offering of the US National Science Foundation’s XSEDE User Access Services (XUAS)
www.ci.anl.govwww.ci.uchicago.edu
26
Statistics and user feedback
• Launched November 2010>3500 users registered>2500 TB user data moved>130 million user files moved>300 endpoints registered
• Widely used on TeraGrid/XSEDE; other centers & facilities; internationally
• >20x faster than SCP• Comparable to hand-tuned
“Last time I needed to fetch 100,000 files from NERSC, a graduate student babysat the process for a month.”
“I expected to spend four weeks writing code to manage my data transfers; with Globus Online, I was up and running in five minutes.”
“Transferred my data in 20 minutes instead of 61 hours. Makes these global climate simulations manageable.”
www.ci.anl.govwww.ci.uchicago.edu
27
Common research data management steps
• Dark Energy Survey• Galaxy genomics• LIGO observatory
• SBGrid structural biology consortium• NCAR climate data applications• Land use change; economics
www.ci.anl.govwww.ci.uchicago.edu
28
Towards “research IT as a service”
www.ci.anl.govwww.ci.uchicago.edu
29
Research data management as a service• GO-User
– Credentials and other profile information
• GO-Transfer– Data movement
• GO-Team– Group membership
• GO-Collaborate– Connect to collaborative
tools: Jira, Confluence, …
• GO-Store– Access to campus, cloud,
XSEDE storage• GO-Catalog
– On-demand metadata catalogs
• GO-Compute– Access to computers
• GO-Galaxy– Share, create, run
workflows
Today
Beta
Prototype
www.ci.anl.govwww.ci.uchicago.edu30
Collaboration Management
www.ci.anl.govwww.ci.uchicago.edu
32
Other innovative science SaaS projects
www.ci.anl.govwww.ci.uchicago.edu
33
Other innovative science SaaS projects
www.ci.anl.govwww.ci.uchicago.edu
34
Other innovative science SaaS projects
www.ci.anl.govwww.ci.uchicago.edu
35
Other innovative science SaaS projects
www.ci.anl.govwww.ci.uchicago.edu
36
SaaS economics: A quick tutorial
• Lower per-user cost (x10 or more?) via aggregation onto common infrastructure
• Initial “cost trough” due to fixed costs
• Per-user revenue permits positive return to scale
• Further reduce per-user cost over time
$
Time0
Lower per-user costs suggest new approaches to sustainability
www.ci.anl.govwww.ci.uchicago.edu
37
A 21st C science IT infrastructure strategy
LL
LL
L
LL
L
LL
L
LL
L
LL
L
LL
L
LL
L
LL
L
LP P P P
Research data management Collaboration, computationResearch administration
• To providemore capability formore people at less cost …
• Create infrastructure – Robust and universal– Economies of scale– Positive returns to scale
• Via the creative use of– Aggregation (“cloud”)– Federation (“grid”)
Small and medium laboratories and projects
aaS
P
www.ci.anl.govwww.ci.uchicago.edu
38
Acknowledgments
• Colleagues at UChicago and Argonne– Steve Tuecke, Ravi Madduri, Kyle Chard, Tanu Malik,
Rachana Ananthakrisnan, Raj Kettimuthu, and others listed at www.globusonline.org/about/goteam/
• Carl Kesselman and other colleagues at other institutions
• Participants in the recent ICiS workshop on “Human-Computer Symbiosis: 50 Years On”
• NSF OCI and MPS; DOE ASCR; NIH for support
www.ci.anl.govwww.ci.uchicago.edu
39
For more information
• www.globusonline.org; Twitter: @globusonline• Foster, I. Globus Online: Accelerating and
democratizing science through cloud-based services. IEEE Internet Computing(May/June):70-73, 2011.
• Allen, B., Bresnahan, J., Childers, L., Foster, I., Kandaswamy, G., Kettimuthu, R., Kordas, J., Link, M., Martin, S., Pickett, K. and Tuecke, S. Software as a Service for Data Scientists. Communications of the ACM, Feb, 2012.
www.ci.anl.govwww.ci.uchicago.edu
Thank you!
www.globusonline.orgTwitter: @globusonline, @ianfoster