SPARC 2 Consultations
January-February 2016
Outline
● Introduction to Compute Canada
● SPARC 2 Consultation Context
● Capital Deployment Plan
● Services Plan
● Access and Allocation Policies (RAC, etc.)
● Discussion
Introduction to Compute Canada
Compute Canada (CC): An Effective Provider of Essential Digital Research Infrastructure
Compute Canada, working through a federated partnership with regional organizations ACENET, Calcul Québec, Compute Ontario and WestGrid, leads the acceleration of research and innovation by deploying advanced research computing (ARC) systems, storage and software solutions.
CC is a not-for-profit corporation. The membership includes most of Canada’s major research universities. CC acts as a steward of Canada’s ARC platform:
● Compute and storage resources, data centres
● Team of ~200 experts in utilization of ARC for research
● 100s of research software packages
● Cloud compute and storage (OpenStack, ownCloud)
● National services
CC is a proud ambassador for Canadian excellence in advanced research computing nationally and internationally.
Canada’s ARC Platform Today & Tomorrow: A Distributed Partnership
Distributed across Canada today:
● 50 systems
● 27 data centres
● 200,000 cores, 2 Pflops, 20 PB
● 200 experts
Consolidation & concentration by 2018:
● 5-10 data centres
● 300,000 cores, 12 Pflops, 50+ PB (Challenge 2)
● 200 experts
Continued Investment Required for Canadian Science to Compete Globally
[Map label: CANARIE and regional networks]
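The headline figures on this slide allow a quick back-of-the-envelope comparison of the platform today and in 2018. The sketch below uses only the numbers quoted above; the variable and function names are illustrative, and 50 PB is treated as a lower bound because the slide says "50+ PB".

```python
# Back-of-the-envelope comparison of the platform figures quoted on the slide.
# Names are illustrative; "50+ PB" is taken as a lower bound of 50 PB.
today = {"cores": 200_000, "pflops": 2.0, "storage_pb": 20.0}
by_2018 = {"cores": 300_000, "pflops": 12.0, "storage_pb": 50.0}


def gflops_per_core(platform):
    """Average performance per core, using 1 Pflops = 1,000,000 Gflops."""
    return platform["pflops"] * 1_000_000 / platform["cores"]


# Growth factor for each headline quantity (2018 value divided by today's value).
growth = {key: by_2018[key] / today[key] for key in today}

for key, factor in growth.items():
    print(f"{key}: {factor:.1f}x")
print(f"Gflops per core: {gflops_per_core(today):.0f} today, "
      f"{gflops_per_core(by_2018):.0f} by 2018")
```

Under these figures, aggregate performance grows about 6x while the core count grows 1.5x, which corresponds to roughly 4x more performance per core.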
Services
Member locations and new national hosting sites
Services Too...
Access and Allocations
● All Canadian faculty members have access to Compute Canada systems and can sponsor others in their name.
● Each system has resources set aside for users with “default priority”. No special vetting or application process required.
● Researchers with larger needs can apply to two different resource allocation competitions:
  ○ RAC: 1-year, mostly individual faculty members
  ○ RPP: up to 3 years, platforms and portals, shared datasets
● Storage is a dedicated allocation. Compute is a priority allocation.
● Allocation decisions made based on peer review.
Serving Researchers in all Disciplines
The Funding Model: Figures for 2014-2015
Roughly $30M/year operating in 2014/15.
Partner funding model ensures alignment of objectives. Capital and operating funded with the same model:
● 40% funded through the Canada Foundation for Innovation (MSI programme for operations, Cyberinfrastructure for capital)
● 60% from Universities, Provinces, other sources
National leadership ensures strategic focus and accountability
SPARC 2 Consultation Context
Current Status - New Systems Coming
● Compute Canada received good news from CFI in July 2015: $30M in new infrastructure investments ($75M total project cost)!
● Some RFPs are already issued, new equipment is coming. New major systems to be deployed this year.
● However, many existing systems nearing (or past!) end-of-life.
● 2016-17 is about commissioning new systems while decommissioning old systems.
● Systems will be more powerful, # cores will not rise significantly.
● Storage capacity will increase dramatically.
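The $30M and $75M figures on this slide are consistent with the 40% CFI share described under the funding model. A minimal sketch of that arithmetic, assuming the 40% share applies directly to this capital award; the function name is illustrative:

```python
CFI_SHARE = 0.40  # CFI fraction stated in the funding model (assumed to apply here)


def total_project_cost(cfi_contribution_m, cfi_share=CFI_SHARE):
    """Total project cost (in $M) implied by a CFI contribution at the given share."""
    return cfi_contribution_m / cfi_share


cfi_m = 30.0                       # CFI investment quoted on the slide, in $M
total_m = total_project_cost(cfi_m)
partner_m = total_m - cfi_m        # remainder from universities, provinces, other sources

print(f"Total: ${total_m:.0f}M, partner contribution: ${partner_m:.0f}M")
```

With a 40% share, a $30M CFI investment implies a $75M total, leaving $45M to come from the other partners.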
Current Status - Times are Tight
● Demand continues to grow. 2016 competition just completed:
  ○ 366 applications
  ○ 16% increase in CPU ask (after correction)
  ○ 34% increase in storage ask (after correction)
  ○ 123% increase in GPU ask (after correction)
● New storage is coming soon; some allocations were granted with a delayed start.
● 42 projects (13%) that requested compute allocations were not awarded any compute allocation, up from 4% last year. (Note: all are funded researchers.)
● Average award:
  ○ 57% of compute request (65% last year, 84% in 2012)
  ○ 82% of storage request
  ○ 19% of GPU request
● The 2017 competition will also be tough.
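The percentages on this slide imply an approximate number of projects that requested compute. The sketch below is a rough consistency check using only the quoted figures; because 13% is a rounded value, the result is better read as a range than as an exact count.

```python
not_awarded = 42          # projects that requested compute but received none
share_not_awarded = 0.13  # rounded percentage quoted on the slide

# Point estimate of the number of projects that requested compute.
estimate = round(not_awarded / share_not_awarded)

# Range consistent with a share that rounds to 13% (between 12.5% and 13.5%).
low = not_awarded / 0.135
high = not_awarded / 0.125

print(f"About {estimate} compute requests (between {low:.0f} and {high:.0f})")
```

The estimate stays below the 366 total applications, which would be expected if some applications requested only storage or GPU resources.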
Funding Opportunities - 2016 and beyond
● Operating:
  ○ Current operations funding (CFI MSI) expires March 31, 2017
  ○ CC (through Western University) has submitted an NOI for the next competition 2017-2022.
  ○ Full CC MSI proposal due May 20, 2016
● Capital:
  ○ Currently purchasing infrastructure through CFI Cyberinfrastructure Initiative - Challenge-2, Stage-1. Expect to be fully deployed by end of 2017.
  ○ Expect to be given opportunity to apply for additional capital funds in conjunction with MSI renewal proposal - May 20, 2016.
  ○ Expect additional capital funding opportunities in connection with mid-term report on next MSI (likely required by spring 2020)
The next 3-4 months are critical for planning Canada’s ARC future through 2022!
Ways to Provide Feedback
www.computecanada.ca/sparc2/
● In person:
  ○ Speak up in this meeting!
  ○ Virtual - video conferenced consultations (Feb. 3, 22 in English)
● Via a White Paper
● Via a brief (5 minute) survey:
  ○ www.surveymonkey.com/r/V59ZDGV
● Via email (any time):
  ○ [email protected]
Note: 2014 White Paper responses from 20+ disciplinary organizations, universities and individuals had a strong influence on the current technology plan.
White Papers
● Updates to 2014 SPARC v1 White Papers welcome!
● Introduction to your disciplinary use of ARC
● Status quo for utilization of current resources
● What challenges have you encountered with your use of the ARC that Compute Canada provides?
● What are your anticipated resource needs into the future (ideally, through 2022):
  ○ Computation
  ○ Storage
  ○ Services
  ○ Support
● What are some of the new technologies, services, support, etc., that you would like Compute Canada to investigate or provide? On what timeline?
White Papers - Guide Included on Website
SPARC Survey
www.surveymonkey.com/r/V59ZDGV
Technology Deployment Plan
Capital Planning Timeline
● CFI Challenge-2 Stage-1 (announced)
  ○ $30M CFI investment announced, July 2015
  ○ 2015: National Data Infrastructure RFP launched; deployment in 2016
  ○ 2016: 3 new systems to be deployed
  ○ 2017: 1 new system to be deployed, potentially 2 systems upgraded
  ○ April 1, 2018 - spending complete
● CFI Challenge-2 Stage-2 (assumed for planning purposes)
  ○ Deadline May 20, 2016. Decision September 2016
  ○ Site selection process underway now.
  ○ 2017: first purchases
  ○ April 1, 2020 - spending complete
● CFI Challenge-2 Stage-3 (assumed for planning purposes)
  ○ Coincident with MSI mid-term review - 2019/2020
  ○ First spend in 2020/2021 (roughly replacement timeline for stage-1 purchases)
Capital Deployment Plan 2016-17
www.computecanada.ca/wp-content/uploads/2015/11/Compute-Canada-Technology-Briefing-2015.pdf
● CC submitted a capital proposal to CFI in April 2015, including an investment plan for four national sites.
● Key components:
  ○ Addresses pressing and urgent needs as older systems are defunded
  ○ Concentrated investment in 4 large sites, national procurement process
  ○ National Storage Architecture (60+ PB of new storage)
  ○ Greatly expanded cloud (OpenStack) capacity
  ○ Greatly expanded accelerator (GPU) capacity
  ○ Some heterogeneous systems with large memory (1TB+) nodes
Note: In parallel, CFI has run a Challenge-1 competition. The investments in the CC capital deployment plan include infrastructure and tool development designed to support those projects.
Capital Deployment Plan 2016-17
www.computecanada.ca/wp-content/uploads/2015/11/Compute-Canada-Technology-Briefing-2015.pdf
Note: over the same time period we will be decommissioning 82,000 existing CPU cores and a large fraction of existing disk storage.
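Combining this note with the platform totals quoted earlier in the deck (200,000 cores today, 300,000 by 2018) gives a rough sense of how many new cores the plan implies. This is a sketch under the assumption that those totals and the 82,000 figure are counted on the same basis, which the slides do not state explicitly.

```python
cores_today = 200_000          # platform total quoted for today
cores_decommissioned = 82_000  # existing cores to be retired
cores_target_2018 = 300_000    # platform total quoted for 2018

# Existing cores that would still be in service after decommissioning.
cores_remaining = cores_today - cores_decommissioned

# New cores needed to reach the 2018 total under this assumption.
new_cores_needed = cores_target_2018 - cores_remaining

print(f"Remaining: {cores_remaining:,}; new cores needed: {new_cores_needed:,}")
```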
Capital Plan 2017-19 (Stage 2)
● The capital plan for Stage 2 will be built between now and May 20, 2016.
● CFI expected to require CC to propose 3 different technology options, with science justifications for each.
● Expectations:
  ○ Addition of 1-3 new national sites
  ○ Expansion of some existing national sites
  ○ Expansion of national storage infrastructure
● Decisions need to be made:
  ○ Balance of Large Parallel, General Purpose and Cloud?
  ○ Emphasis on new architectures?
  ○ Emphasis on accelerators?
  ○ Memory per node?
  ○ Services - Databases, storage platforms, private networks?
Services Plan
Compute Canada Services - Middleware
● We are service providers, not just infrastructure providers.
● The CC user base is broadening, bringing a broader set of needs.
● We have seen tremendous interest in services enabling Research Data Management (RDM)
● Through Challenge-1 and our Research Platforms and Portals competitions we have identified an additional list of middleware services CC will implement in common across our sites:
  ○ Authentication and ID Management
  ○ Data Transfer
  ○ Software Distribution
  ○ Monitoring (system status)
  ○ Resource publishing (capacity available)
CC Services - Disciplinary Support
● Compute Canada expert research support is built around excellent local services - experts on your campus.
● In 2015 we augmented this through creation of our first national disciplinary support team - in digital humanities.
● Disciplinary support teams:
  ○ encourage sharing of best practices across the country
  ○ work on discipline-specific documentation
  ○ perform outreach to Canadian practitioners
  ○ identify weaknesses in the support model or infrastructure plan with respect to each disciplinary group
● We are happy to take feedback on where you think more support is needed:
  ○ Should we create a new team in a certain area?
  ○ Should the list of responsibilities above (per team) be expanded?
CC Services - Research Support
● Currently, expert support is generally:
  ○ local (on campus)
  ○ short-term (days, not months)
● We get requests for long term (embedded) research support.
● Currently offered on a competitive basis in some regions but not a national service.
● Should we offer embedded (long term) support? On what basis? Paid? Competitive?
CC Services - Training
● Compute Canada currently offers training across the country:
  ○ Code optimization
  ○ Use of specific hardware platforms or software services
  ○ Basic and advanced HPC techniques
● Most training is local/regional. Local courses offered by local staff.
● National initiatives include:
  ○ National partnership with Software Carpentry
  ○ International partner in International HPC Summer School
  ○ Discussions with Data Carpentry
● We welcome feedback on training emphasis. Where are the gaps today?
CC Services - Security and Privacy
● More and more ARC is being used to do research involving personal info (e.g., health, social sciences, industry data).
● Policies must be in place to protect personal information.
● Physical and network security must be in place to protect data held on CC systems.
● Data isolation has to be assured for special projects that require it.
● CC has adopted a new security framework: an information security management system (ISMS) that follows ISO/IEC 27001 (operations and standards follow ISO/IEC 27002).
● A minimum standard applies in all CC data centres; some will be designated for higher-security data sets.
● New network, storage design to support data isolation.
Access and Allocation Policies
CC Access Policy
● The current access policy is organized by “sponsor.” CC approves the sponsor, the sponsor approves any and all group members.
● Group members can be students, postdocs, external collaborators, etc. All usage “charged” to the sponsor.
● There is no fee for usage charged to Canadian university faculty.
● When the sponsor is from private industry, all usage is subject to a fee.
● When the sponsor is from a federal laboratory or other not-for-profit, a reduced fee applies.
● Teaching is not an eligible use (though training is).
● Has this policy ever been an impediment to your research? Suggestions?
CC Resource Allocation Policies
● All users have “default access” to each CC system (compute, storage).
● Users can apply for special resource allocations for:
  ○ Compute (priority in shared system, in core-years)
  ○ Storage (dedicated, short-term or long-term)
  ○ Cloud resources (virtual machines, public IP addresses, etc.)
● CC allocates about 80% of the available core-years each year through competitive processes. This leaves up to 20% for default access.
● Two categories of competition, one competition period per year:
  ○ RAC: generally single investigator projects
  ○ Research Platforms and Portals (RPP): shared datasets, possible multi-year allocations
CC Resource Allocation Policies
● Competition is based on peer review:
  ○ Technical review to correct “asks”
  ○ 7 disciplinary panels (78 panelists this year)
  ○ multiple independent reviews per proposal
  ○ panel review meeting to set science score
  ○ multidisciplinary panel review of (about 30) largest proposals
● If panel process does not result in a “balanced budget”, CC applies scaling function based on science score from panel process. 2016 example (compute):
[Figure not reproduced: 2016 compute scaling example, including a “Default Priority” label]
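The slides do not give the shape of the scaling function, so the following is only a hypothetical illustration of how score-based scaling can balance a budget. It assumes each award is the request multiplied by a fraction that grows linearly with the science score and is capped at 100%; the function name, the 0-5 score range and the linear shape are all assumptions, not Compute Canada's actual method.

```python
def scale_awards(requests, scores, budget, max_score=5.0):
    """Scale requests so that total awards fit within the budget.

    Hypothetical rule (an assumption, not the real CC function):
        award_i = request_i * min(1, c * score_i / max_score)
    where the single factor c is found by bisection.
    """
    if sum(requests) <= budget:
        return [float(r) for r in requests]  # budget already balanced

    def awards_for(c):
        return [r * min(1.0, c * s / max_score) for r, s in zip(requests, scores)]

    positive = [s for s in scores if s > 0]
    if not positive:
        return [0.0] * len(requests)  # nothing can be awarded without a score

    # At c = hi, every request with a positive score is met in full.
    lo, hi = 0.0, max_score / min(positive)
    for _ in range(200):  # bisection on the monotonic total award
        mid = (lo + hi) / 2
        if sum(awards_for(mid)) > budget:
            hi = mid
        else:
            lo = mid
    return awards_for(lo)


# Example with made-up numbers: 600 core-years requested, 400 available.
awards = scale_awards(requests=[100, 200, 300], scores=[5, 4, 2], budget=400)
print([round(a, 1) for a in awards])
```

In this sketch, higher-scored proposals keep a larger fraction of their request, and the total matches the budget. A rank-and-cut rule, mentioned later in the deck as an alternative, would instead fund proposals in score order until the budget runs out.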
CC Resource Allocation Questions
● Competition frequency: once per year plus ad-hoc out-of-round enough?
● Award duration: single year with fast track long enough? Note that CC must report every year, so progress report always needed.
● CCV (Canadian Common CV) introduced for 2016 competition. How can we improve the CCV experience?
● Compute scaling based on science score. Alternatives: rank-and-cut, different function shape?
● The connection between tri-council research grants and CC resource allocations means that successful grant recipients still need to apply for Compute Canada resources. Double jeopardy unavoidable?
● We are sometimes asked if users can contribute additional resources. Is there significant demand to provide a price-list?
Ways to Provide Feedback
www.computecanada.ca/research-portal/feedback/sparc2/
● In person:
  ○ Speak up in this meeting!
  ○ Virtual - video conferenced consultations (Feb. 3, 22 in English)
● Via a White Paper
● Via a brief (5 minute) survey:
  ○ www.surveymonkey.com/r/V59ZDGV
● Via email (any time):
  ○ [email protected]
An Aside: Account renewal is coming in March. We intend to collect CCVs.
Thank You!
At the Limit of Our Capacity
Projecting Increased Compute Demand
Based on SPARC whitepaper projections (research roadmaps)
White Paper | Predicted Increase from Current to 2020
Numerical Relativity | 3x
Subatomic Physics | 3x
Materials Research | 5x
Canadian Genome Centres | 8x
Canadian Astronomical Society | 10x
Theoretical Chemistry | 12x
● Also projected:
  ○ Clear need for accelerators.
  ○ Clear need for mix of memory sizes.
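A simple average of the six projections listed above lands close to the "7x" headline used on the next slide. The sketch below assumes an unweighted mean over only these six white papers; the deck does not say how its average was computed, and it may include other submissions.

```python
# Predicted compute increase (current to 2020) from the six listed white papers.
projections = {
    "Numerical Relativity": 3,
    "Subatomic Physics": 3,
    "Materials Research": 5,
    "Canadian Genome Centres": 8,
    "Canadian Astronomical Society": 10,
    "Theoretical Chemistry": 12,
}

# Unweighted mean across disciplines (an assumption about the averaging method).
mean_increase = sum(projections.values()) / len(projections)

print(f"Mean projected increase: {mean_increase:.1f}x")
```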
Projecting Compute Demand: 7x / 5 years
Averaging SPARC whitepaper projections (research roadmaps)
Projecting Storage Demand: 15x / 5 years
Averaging SPARC whitepaper projections (research roadmaps)
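The five-year multipliers can be restated as equivalent annual growth rates. This is a minimal sketch assuming constant compound growth over the five years, which the slides do not claim; the function name is illustrative.

```python
def annual_growth(factor, years):
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)


for label, factor in (("compute", 7), ("storage", 15)):
    rate = annual_growth(factor, 5)
    print(f"{label}: {factor}x over 5 years is about "
          f"{100 * (rate - 1):.0f}% growth per year")
```

Under that assumption, 7x over five years corresponds to roughly 48% growth per year for compute, and 15x to roughly 72% per year for storage.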