
Compute Canada Technology Briefing

November 2016


Overview

This technology briefing is intended for Compute Canada stakeholders and suppliers. It provides a snapshot of the status of the technology refresh program resulting from CFI’s cyberinfrastructure initiative, planned for implementation from 2015 to 2019. It also anticipates planning for future growth.

About Compute Canada

Compute Canada, in partnership with regional organizations ACENET, Calcul Québec, Compute Ontario and WestGrid, leads the acceleration of research innovation by deploying state-of-the-art advanced research computing (ARC) systems, storage and software solutions. Together we provide essential digital research services and infrastructure for Canadian researchers and their collaborators in all academic and industrial sectors. Our world-class team of more than 200 experts employed by 37 partner universities and research institutions across the country provides direct support to research teams and industrial partners.

Advanced research computing accelerates research and discovery and helps solve today’s grand scientific challenges. Using Compute Canada resources, research teams and their international partners work with industry giants in the automotive, ICT, life sciences, aerospace and manufacturing sectors to drive innovation and new products to market. Canadian researchers leverage their access to expert support and infrastructure to participate in international initiatives. Researchers using Compute Canada’s advanced research computing resources rate significantly higher in citations than the average from Canada’s top research universities and any international discipline average.


Technology Investment Key Facts

• The “Stage 1” investment, valued at $75 million in funding from the Canada Foundation for Innovation (CFI), provincial and industry partners, is underway. These investments are addressing urgent and pressing needs and replacing aging high performance computing systems across Canada.

• Planning is underway for the outcomes of the “Stage 2” proposal for a further $50 million, which will continue to address capacity needs, as well as providing expansions of secure cloud and other services.

• Compute Canada and its regional partners have more than 18 years of experience in accelerating results from industrial partnerships in advanced research computing and Canada’s major science investments.

• Compute Canada currently manages more than 20 legacy systems, which are being replaced in 2017-2018 by new systems and storage. Compute Canada operates these resources and supports all of Canada’s major science investments and programs.

• With the implementation of the Stage 1 and 2 combined technology deployments, Compute Canada anticipates capacity of over 100 petabytes of persistent storage and 20 petaflops of computing resources.

Investment Impacts on Canada’s Research Community

• Stage 1 and Stage 2 improvements will allow Compute Canada to continue to support the full range of excellent Canadian research. The purchase of significantly more storage, deployed as part of an enhanced national data cyberinfrastructure, will accelerate data-intensive research in Canada. The ability to purchase a single Large Parallel (LP) machine of over 65,000 cores will provide Canada’s largest compute-intensive users with a new resource that far exceeds any machine in the Compute Canada fleet today.

• Investments in technology refresh are more than an opportunity to increase the size of storage systems and the number of cores. The new systems replace old technology with new, and will be deployed with national services, coherent policies and a new operational model for the organization. This enhanced service level will allow more researchers to exploit the new systems in an efficient and effective way.


New Systems at Four Stage 1 National Hosting Sites

Through a formal competition among Compute Canada member institutions, four sites were selected to host the Stage 1 systems and associated services. They are the University of Victoria (UVic), Simon Fraser University (SFU), the University of Waterloo (Waterloo), and the University of Toronto (UofT). System specification, procurement, and deployment are ongoing through 2016-2017.

Table 1: Stage 1 procurement status

SYSTEM                                          IN PRODUCTION
National Data Cyberinfrastructure               Fall 2016 (ongoing delivery)
ARBUTUS - UVic (Cloud)                          Fall 2016
CEDAR - SFU (General Purpose)                   Early 2017
GRAHAM - Waterloo (General Purpose)             Spring 2017
NIAGARA - UofT (Large Parallel)                 Late 2017

Stage 1 Computational Systems

University of Victoria: The ARBUTUS system (previously known as “GP1”) is an OpenStack cloud, with emphasis on hosting virtual machines and other cloud workloads. The system, provided by Lenovo, has 6,944 CPU cores across 248 nodes, each with on-node storage and 10Gb networking. It accesses 1.6PB of persistent storage, primarily via Ceph in a triple-redundant configuration. The system became operational in September 2016, as an expansion to the Compute Canada “Cloud West” system.
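As an illustration of the IaaS workflow that an OpenStack cloud such as ARBUTUS supports, the following minimal Python sketch boots a virtual machine using the openstacksdk library. The credentials, image, flavor and network names are placeholders for illustration, not actual Arbutus offerings.

    # Minimal sketch: booting a virtual machine on an OpenStack cloud.
    # Image, flavor and network names below are hypothetical.
    import openstack

    # Credentials are read from the standard OS_* environment variables
    # (or a clouds.yaml entry) issued to the project.
    conn = openstack.connect()

    image = conn.image.find_image("Ubuntu-16.04")              # hypothetical image name
    flavor = conn.compute.find_flavor("c2-7.5gb")              # hypothetical flavor name
    network = conn.network.find_network("my-tenant-network")   # hypothetical network name

    server = conn.compute.create_server(
        name="analysis-vm",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)
    print(server.status)  # ACTIVE once the instance has booted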


Figure 3.1: The ARBUTUS system in operation at the University of Victoria (September 2016)

Simon Fraser University: The CEDAR system (previously known as “GP2”) is a heterogeneous cluster, suitable for a variety of workloads. The system will be liquid cooled, and will be installed in the newly renovated SFU Water Tower data centre. It is expected to include over 20,000 CPU cores. Node types are anticipated to include “base” and “large” compute nodes with 128GB and 256GB of memory, as well as bigmem nodes with 512GB, 1.5TB and 3TB of memory. GPUs will be included in approximately 15% of nodes. When deployed in early 2017, this will be one of Canada’s most powerful research computing systems.

University of Waterloo: The GRAHAM system (previously known as “GP3”) will have a similar design to CEDAR, and it is anticipated that CEDAR and GRAHAM together will provide features for workload portability and resiliency. Both will have a small OpenStack partition, and both include local storage on nodes. Anticipated specifications for GRAHAM include over 20,000 CPU cores across a diverse set of node types, including GPU nodes. The system is anticipated for deployment in early 2017.

University of Toronto: The NIAGARA system (previously known as “LP”) will be deployed by approximately mid-2017, anticipated to have some 66,000 CPU cores¹. This will be a balanced, tightly coupled high performance computing resource, designed mainly for large parallel workloads.

1 All future plans for nodes, CPUs and other specifications are intended as conservative estimates. CPU core counts are based on “Haswell” technology.


Figure 3.2: Stage 1 National Hosting Sites

National Data Cyberinfrastructure

A new national data cyberinfrastructure (NDC) will span all Stage 1 and Stage 2 sites, providing a variety of data access mechanisms and performance levels. Major components of the NDC were purchased in late 2016, and will be expanded over time.

A. Storage Building Blocks (SBBs). Commodity storage systems that are flexible, configurable, and will evolve over time as technology improves.

a. Provider: Scalar Decisions, Inc. (Toronto).

b. Technologies: SBB systems from Seagate and Dell.

c. Configurations to be provided: Multiple, for different performance tiers and capacities.

B. Object Storage Software. To provide automated, efficient data replication across the wide-area network, an S3-compatible interface to data objects, and POSIX-style access to object storage (a minimal access sketch follows this list).

a. Provider: DDN Storage.

b. Technologies: Web Object Storage (WOS) software.

c. Configurations to be provided: Software to be installed at all four Stage 1 sites, and other future technology hosting sites.


C. Backup capabilities. To provide cost-efficient bulk storage of data copies, including archives and near-line storage.

a. Provider: IBM Canada and Glasshouse

b. Technologies: Spectrum Protect software; TS3500 tape silos with LTO7 tapes and drives; supporting infrastructure systems.

c. Configurations to be provided: Multi-site redundant backups to SFU & Waterloo; other configurations and uses as needed.

D. Parallel filesystem software. To provide persistent filesystem-based capacity on SBBs.

a. Provider: TBD (RFP closed in October 2016).
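To illustrate the S3-compatible interface described for the object storage layer (item B above), the following minimal Python sketch reads and writes objects with boto3. The endpoint URL, bucket name and credentials are placeholders; the actual NDC endpoints and access mechanisms will be defined as deployment proceeds.

    # Minimal sketch: object access through an S3-compatible interface.
    # Endpoint, bucket and credentials below are hypothetical.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://object-store.example.org",  # hypothetical endpoint
        aws_access_key_id="PROJECT_ACCESS_KEY",
        aws_secret_access_key="PROJECT_SECRET_KEY",
    )

    # Upload a local data file as an object, then read it back.
    s3.upload_file("results.h5", "my-project-bucket", "run-042/results.h5")
    obj = s3.get_object(Bucket="my-project-bucket", Key="run-042/results.h5")
    print(obj["ContentLength"], "bytes retrieved")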

Table 3.1: Persistent online storage capacity projections

Per-site deployed usable online capacity that is likely to be based on software-defined storage software and SBBs:

                                                   2016   2017   2018   2019   2020
Total deployed online capacity (PB) across sites     40     62    100    150    225
Number of national hosting sites                      4      4    TBD    TBD    TBD

The national data cyberinfrastructure capacity does not include temporary storage on computational systems (i.e., /scratch). NDC systems will be linked via a high-speed network, described below. In addition to capacity growth, other components are under consideration for the NDC. These may include capacity management systems, to automatically migrate data among performance tiers, from online to nearline and back again. Mechanisms for data management, information lifecycle management, and data resiliency may also be of interest. Compute Canada strives to achieve the maximum value from its investments, and may seek to balance purchased solutions with self-developed or self-supported solutions.

Networking

Wide-area connectivity among sites is undergoing major upgrades in capacity and features. Each Stage 1 and Stage 2 site will connect to the CANARIE wide-area research network, via regional networks, at 100Gb or greater speeds. A Science DMZ design will be used to enable rapid and reliable transit of data among sites. Key uses of the new network will include:

• Stage-in and stage-out of data sets for HPC computations;

• Backups, including redundant multi-site backups;

• Data replication, notably via WOS for access via S3;

• Cross-mounted filesystems for ease of access to persistent data, such as /project;



• Workload portability, including for virtual machine migration among hosts, and for metascheduler placement of HPC jobs;

• Guaranteed quality of service and availability for license servers and other critical services.

A single provider of networking equipment will be identified by the end of 2016, to support the Science DMZ and other networking needs for the hosting sites.

Figure 2: Science DMZ Concept with National Data Cyberinfrastructure Components for Stage 1 Hosting Institution

Status and Planning for Stage 2 Investments

Compute Canada submitted a proposal on May 20, 2016 to the Canada Foundation for Innovation (CFI) for the Cyberinfrastructure Challenge 2 Stage 2 competition. Stage 2 has a similar structure to Stage 1, with a total value of $50 million (including $20M from CFI, $20M from provinces and partners, and $10M in vendor in-kind). The results of Stage 2 have not yet been announced publicly. In this section, highlights of the submission are described. The Stage 2 program is planned for implementation over approximately a two-year period, beginning by mid-2017.

Stage 2 Hosting Site Selection

As with Stage 1, an open solicitation invited proponents for national hosting sites. It is anticipated that several hosting sites will join the four Stage 1 sites. In addition, Stage 1 sites were eligible to seek additional Stage 2 funding. Final selection of Stage 2 sites, along with the specific technology mix and level of investment for each site, will occur during the Stage 2 finalization process.


Stage 2 Components

The systems, storage and software that will comprise Stage 2 build directly on Stage 1, with enhancements identified during user needs analysis and consideration of Stage 2 proponent site strengths.

GP1x: OpenStack cloud system. Building on the successes of Cloud East (Sherbrooke) and ARBUTUS (Cloud West at UVic), the goal is to extend the federated on-premises private research cloud across Compute Canada. The cloud primarily provides Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), to a rapidly growing constituency. Software as a Service (SaaS) is also available, and expected to grow. Cloud federation will benefit users with workload and storage portability and resiliency, single sign-on and namespace, and a common software stack. The GP1x component was updated during the hosting site evaluation process and proposal writing, to instead refer to Elastic Secure Cloud (ESC) systems, which are described in more detail below.

GPx: Heterogeneous cluster with elastic OpenStack partitions. Clusters with a variety of node types, including nodes suitable for OpenStack, big memory nodes, and nodes with GPUs; most nodes will have local storage. The high-performance interconnect might not be fully non-blocking for all nodes, but will have some partitions suitable for multiple jobs of at least 1024 cores. The systems will grow over time as funding allows, including via contributed systems. The Stage 2 site selection RFP solicited GP4/GP5 hosts, as well as possible expansion of Stage 1 systems at SFU and Waterloo.

Experimental systems: Compute Canada strives to provide access to new resource types; with this in mind, it is envisioned that a number of relatively small experimental systems will be deployed. These may be purchased, loaned, or developed, over different durations. Some experimental systems may become production resources, or guide future procurements of larger systems. Stage 2 hosting proponents could select this as an additional option to the main three system types. Compute Canada has been in communication with numerous vendors who may participate in an experimental system program.

A component of experimental systems is commercial cloud hosting. Compute Canada is often asked about outsourcing to commercial hosting services. To explore this area, Compute Canada may run an open RFP to select one or more in-Canada cloud providers, and then work to develop easy mechanisms for users to span their workloads across Compute Canada resources and commercial clouds. Companies with cloud offerings based entirely in Canada have expressed interest in working with Compute Canada on this initiative. The cost per node for individual purchases of cloud computing is high (at least 4x greater than Compute Canada’s in-house systems at retail pricing); therefore, this resource must be deployed carefully. This may be mitigated via an RFP for bulk purchase and partnership. At the same time, this will add capabilities of interest from commercial clouds, which tend to be more feature-rich than our OpenStack environment. Emphasis would be on providing ease of use for constituents who wish to move between Compute Canada’s cloud and a commercial cloud, or in the other direction. This will include situations where users pay for the commercial cloud capacity themselves, but Compute Canada enables workload and feature portability.


Deep storage and persistent storage: One new site will be identified to augment SFU and Waterloo in hosting backups and other nearline storage. These will consist mainly of tape libraries and associated software and infrastructure, building on the National Data Cyberinfrastructure procurement outcomes.

Local/regional data caches: There will be relatively small resources to provide local/regional access to the National Data Cyberinfrastructure. These would be distributed roughly in proportion to storage need (as readers or as writers), and would expand the efficiencies of large-scale procurement and operations from the National Data Cyberinfrastructure to smaller sites.

Services infrastructure: Further investments in the service infrastructure development efforts funded through Stage 1 are envisioned. Stage 1 investments are focused on personnel to develop or adapt common services. The philosophy is that if multiple users/groups express a need for a service, as identified via user surveys or white papers, then Compute Canada should consider making it a national offering. Investments to date have included a software partnership with Globus, to develop Globus data publication services to better serve the needs of the Canadian research community for data curation, preservation and discovery. Major areas of effort for services infrastructure include:

• Identification and Authorization Service: Provide common login across systems.

• Software Distribution Service: Version-controlled software distribution to multiple sites.

• Data Transfer Service: To move datasets among collaborators and their repositories (a sketch follows this list).

• Monitoring Service: Track uptime and availability of services and platforms.

• Resource Publishing Service: Current information about available resources.
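As an illustration of the Data Transfer Service noted above, the following minimal Python sketch submits a transfer using the Globus SDK, consistent with the existing Globus partnership. The endpoint IDs, paths and access token are placeholders.

    # Minimal sketch: submitting a dataset transfer via the Globus SDK.
    # Token, endpoint IDs and paths below are hypothetical.
    import globus_sdk

    TRANSFER_TOKEN = "..."                   # placeholder: obtained via a Globus auth flow
    SRC_ENDPOINT = "source-endpoint-uuid"    # placeholder endpoint IDs
    DST_ENDPOINT = "dest-endpoint-uuid"

    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN)
    )

    # Describe the transfer: one directory, verified by checksum.
    tdata = globus_sdk.TransferData(
        tc, SRC_ENDPOINT, DST_ENDPOINT,
        label="dataset replication example",
        sync_level="checksum",
    )
    tdata.add_item("/project/dataset-v1/", "/project/dataset-v1/", recursive=True)

    task = tc.submit_transfer(tdata)
    print("submitted transfer task:", task["task_id"])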

Elastic Secure Cloud Services

In Compute Canada’s Stage 2 proposal, notions of GP1x, federated cloud sites, and local/regional data caches were expanded to incorporate elastic secure cloud services. Stage 2 site selection RFP responses indicated strong need, as well as existing capabilities, for secure cloud. The main current use case for these services is hosting of health information, including personally identifiable information (PII). PII is present in some of the largest data growth areas (genetic sequences and brain imaging, which are also major elements of other CFI-funded projects). Emergent use cases are in the social sciences, where controlled access to datasets is the norm. Researchers in criminology, labour statistics, and other areas have similar needs.

Compute Canada plans to build a secure multi-tenant environment based on the concepts of OpenStack cloud and local/regional data caches. The intention is that the same OpenStack cloud environment as other Compute Canada cloud resources (with the same storage environment) will implement logical partitioning such that the needed levels of isolation for data and compute are enforced. This design is informed by the highly successful HPC4Health implementation by Compute Canada member institutions in Ontario (www.hpcforhealth.ca). This model will be expanded and enhanced to meet the needs of other provinces. It is proposed that secure cloud capabilities will be part of all OpenStack systems or partitions on Stage 1 and Stage 2 sites.

The “elastic secure cloud services” label is chosen to convey several qualities. First, any of the cloud partitions on GPx systems are intended to be resized as needed in response to user demand, with allocation of appropriate computational/storage resources. As mentioned above, all cloud systems will be able to provide a secure environment, via logical partitioning of compute and storage resources. Such logical partitioning is used by HPC4Health and some other current implementations by Compute Canada members, and is in most cases adequate (i.e., physical partitioning and air gaps are not necessary, but separate filesystem mount points and VLANs are). The secure partitions within a cloud will, generally, be assigned to a particular tenant (such as a hospital department, or a data analysis research portal). The tenant would have the needed control over authentication, authorization, logging, etc. Those secure partitions would also be elastic as needed over time, so that they can expand, shrink, or gain access to a different resource mix.
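The following Python sketch is illustrative only and is not the HPC4Health implementation: it shows, with openstacksdk, the kind of per-tenant logical partitioning described above (a dedicated project, private network, and compute quota). Names and quota values are hypothetical, and a production secure cloud additionally involves VLAN segmentation, dedicated filesystem mount points, logging, and site security policy.

    # Illustrative sketch only: carving out a logically isolated tenant in an
    # OpenStack cloud. Project, network and quota values are hypothetical.
    import openstack

    admin = openstack.connect(cloud="admin")  # assumes an admin entry in clouds.yaml

    # A dedicated project (tenant) for, e.g., a hospital department.
    project = admin.identity.create_project(
        name="secure-imaging-lab",
        domain_id="default",
        description="Isolated tenant for controlled-access data",
    )

    # A private network and subnet owned by that tenant only.
    net = admin.network.create_network(name="secure-imaging-net", project_id=project.id)
    admin.network.create_subnet(
        network_id=net.id, project_id=project.id,
        ip_version=4, cidr="10.20.0.0/24",
    )

    # Cap the tenant's share of the cloud; limits can be resized ("elastic") later.
    admin.set_compute_quotas(project.id, cores=256, ram=1024 * 1024, instances=64)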

Anticipated Level of Investment

Stage 2 funding is anticipated to be allocated at approximately the following levels:

SYSTEM/SERVICE TYPE           CASH EXPENSE   NOTES
Deep storage                  $5,000,000     One additional deep storage site, plus additional capacity for the current two sites.
Experimental systems          $1,500,000     Small experimental systems at some Stage 2 sites; modest investment in commercial cloud.
Services infrastructure       $500,000       1 FTE for 2 years, plus small purchases of existing software and/or services.
Elastic secure cloud (ESC)    $1,500,000     One standalone ESC site.
GPx                           $31,500,000    Expansion of one or more GPx systems, and addition of one or more new GPx systems. All GPx systems will have ESC partitions.
TOTAL                         $40,000,000    Value includes provincial/partner match; does not include vendor in-kind, which brings the value to $50M.


In this planning, investment is focused on general purpose computing (i.e., GPx-type systems, with multiple node types), with the addition of a standalone ESC site. The GPx systems will address the needs of the majority of users/projects, adding needed capacity. The node configurations of new GPx systems and expansion of Stage 1 GP systems will be adjusted to reflect early experiences with the Stage 1 systems: for example, it may be desirable to have larger partitions for tightly-coupled workloads, or to have larger bigmem nodes, or different configurations for local storage, alternate GPU configurations or quantities, or variations on the cloud partition sizes or node configurations.²

The strength of the ESC addition is to develop a new model for local/provincial/regionally-focused systems, at a relatively low cost but with very high value. ESC systems highlight capabilities of Compute Canada’s systems and staff, give needed features to stakeholders, and provide on-ramps to larger computational and storage resources.

ESTIMATED CAPACITY                       STAGE 1    STAGE 2    TOTAL
Ncores (Elastic Secure Cloud) (GP1)        8,500²      5,486     13,986
Ncores (LP) (LP)                          66,000          -      66,000
Ncores (GPx) (GP2+3)                      52,000     89,250     141,250
Total cores                              126,500     94,736     221,236
New persistent storage (PB online)            62         38        100

² Includes planned Arbutus expansion in 2017.

Compute Canada’s Need for New Infrastructure

Compute Canada supports a vibrant program of research spanning all disciplines and regions of Canada. This support is delivered by providing Canadian researchers access to world-class cyberinfrastructure and expert personnel.

The advanced research computing (ARC) needs of the Canadian research community are growing. Growth comes from new scientific instruments and experiments, from cyberinfrastructure use by a broadening list of disciplines, from generation of and access to new datasets and the innovative analysis and mining of those datasets, and from the mutual reinforcement of technological and scientific advances that inspire researchers to construct ever-more precise models of the world around us. Canada’s ARC infrastructure requires constant updating to keep pace with the needs of its researchers.


Existing Usage Information

Compute Canada has studied usage information from the past 5 years. For example, the chart below shows CPU usage from 2010 through the end of 2015. CPU usage and allocations of computational resources are measured in core years, representing a single CPU core’s utilization for one calendar year.

The different colours show the usage broken down by discipline. The decrease in 2015 was expected, due to a reduction in available compute resources as Compute Canada decommissioned older systems that exceeded their normal life span (the largest single system contributing to this supply is now 7 years old). This chart illustrates that a significant number of different disciplinary areas share the Compute Canada facility, each bringing their own resource needs.

Figure 5.1: CPU usage by discipline as a function of time

The Compute Canada federation supports a wide range of computational needs on its shared infrastructure. One way to examine this is through the number of cores used in a single batch job, which is the dominant method of use for these resources, and the types of jobs for which resource allocations are granted (contributed systems, cloud systems, platforms & portals, and other modalities are not included here). The chart shows the number of core years used on Compute Canada resources per year. The colours illustrate the fractions of those core years in bins of cores-per-job. It shows, for example, that the largest single category in 2015 is serial or low-parallel computation (fewer than 32 cores), which represents about 30% of the total. Meanwhile, nearly 50% of CPU consumption in 2015 was by jobs using at least 128 cores.
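The binning described above can be illustrated with a short Python sketch that aggregates job records into core-years by cores-per-job bin. The job records below are invented for illustration and do not reflect actual usage data.

    # Illustrative sketch of binning CPU usage (core-years) by cores-per-job.
    from collections import defaultdict

    SECONDS_PER_YEAR = 365 * 24 * 3600
    BINS = [(1, 31, "<32 cores"), (32, 127, "32-127 cores"), (128, None, ">=128 cores")]

    # (cores used by the job, wall-clock seconds) -- hypothetical records
    jobs = [(1, 360_000), (16, 90_000), (64, 400_000), (256, 2_000_000), (1024, 500_000)]

    usage = defaultdict(float)
    for cores, seconds in jobs:
        core_years = cores * seconds / SECONDS_PER_YEAR
        for lo, hi, label in BINS:
            if cores >= lo and (hi is None or cores <= hi):
                usage[label] += core_years
                break

    total = sum(usage.values())
    for label, cy in usage.items():
        print(f"{label}: {cy:.2f} core-years ({100 * cy / total:.0f}% of total)")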


Figure 5.2: CPU usage binned by number of cores used per job as a function of time.

It should be noted that the size and configuration of Compute Canada’s current systems limit the ability of Canadian researchers to submit jobs at the largest scales, and this has limited the growth of the highly parallel bins. Even for the larger resources, queue wait times (under the “fairshare” workload management policy in effect for most systems) create challenges for completing large multi-job computational campaigns.

As noted, the overall capacity within Compute Canada is currently inadequate to meet the growing need of the Canadian research and innovation community. After technical validation, for 2016 Compute Canada was only able to allocate 54% of the requested computational resources (down from 85% in 2012) and 20% of GPU requests. With respect to storage, 93% of requests were granted in 2016, although this was enabled by deferred allocation of storage to as-yet-uninstalled Stage 1 resources. Without Stage 1 storage, the allocation rate for storage would have been 65%.

Projecting Future Needs of the Canadian ARC Community

Extensive community consultation was undertaken to ensure that the Stage 2 proposal was anchored in the anticipated future needs of the Canadian ARC community. Consultation included in-person community meetings, online surveys, collection of community white papers, and user interviews.

The aggregated community need for computational resources has been projected based on this input. Survey analysis predicts 12x growth in computational need over 5 years, while the white paper analysis predicts a 7x increase over the same period, with different annual increase rates among submissions. The chart below shows these need projections assuming an exponential growth profile, in units of allocatable Haswell-equivalent core years. The shaded band covers the range between the 7x and 12x 5-year projections. Three supply curves are shown: 1) (red) assuming only the Stage 1 award, 2) (light blue) assuming the Stage 1 award and success of the Stage 2 proposal, and 3) (dark blue) projecting a $50M Stage 3 award in 2018.

This chart shows that Stage 1 alone leads to a short-term increase in core count (by about 50%), followed by a marked decrease based on the decommissioning of older pre-existing systems by 2018. Stage 2 funding will lead to an approximate doubling of allocatable cores by 2019 with respect to the baseline. Stage 3 funding would be required to allow the supply to approach the need curve in 2019.

Figure 5.3: Supply and Need projections for compute (in core-years/year) and storage (in PBs).

The aggregated need for storage resources has also been projected. The survey analysis predicts 19x growth in storage need over 5 years, while the white paper analysis predicts a 15x increase over the same period. The storage projection chart below shows the 15x-19x range for the three stages of investment described above. The storage supply and need both represent allocatable disk storage. However, replication factors, potential future object storage adoption rates and usage of tape (for nearline storage) to alleviate disk need are difficult to predict prior to significant operational experience with the Stage 1 storage. As a result, in the chart below we scale raw storage supply downward by a factor of 1.4 (i.e., we assume approximately 70% disk usage efficiency).
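The projection arithmetic can be sketched in a few lines of Python: exponential need growth to the 7x and 12x five-year endpoints, and the 1.4x raw-to-usable scaling applied to storage supply. The baseline values below are placeholders, not actual 2015 figures.

    # Sketch of the projection arithmetic: exponential growth to a 5-year
    # multiple, plus the raw-to-usable storage scaling. Baselines are hypothetical.
    BASE_CORE_YEARS = 200_000   # hypothetical 2015 allocatable core-years

    for multiple in (7, 12):
        annual = multiple ** (1 / 5)   # constant annual growth factor
        need = [BASE_CORE_YEARS * annual ** year for year in range(6)]
        print(f"{multiple}x compute need, 2015-2020:",
              [f"{n / 1000:.0f}k" for n in need])

    # Raw disk purchased vs. allocatable disk, assuming ~70% usable efficiency.
    raw_pb = 100
    usable_pb = raw_pb / 1.4
    print(f"{raw_pb} PB raw -> about {usable_pb:.0f} PB allocatable")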


In addition to aggregated need information, the user survey responses revealed specific requests for additional features, new architectures and special node types. They include requests for:

• Overall increased compute capacity,

• Better support for Big Data use-cases,

• Encrypted cloud storage and other steps to enable research on sensitive datasets,

• Increased access to large memory nodes,

• Specialized resources to support bioinformatics,

• Greater accelerator (e.g. GPU) capacity,

• Better support for interactive and visualization-focused use-cases,

• Better support for long-term data storage and enterprise-class data backup,

• Platforms to support new hardware development (IT and computer engineering-related research),

• Increased training and improved documentation.

Figure 5.4: Supply and Need projections for compute (in core-years/year) and storage (in PBs).


White paper submissions also revealed a number of emerging trends that help to drive the technology choices laid out in this proposal. These include the need for:

• Large data storage driven by improved instrumentation in genomics, neuroimaging, astronomy, light microscopy and subatomic physics,

• Large memory nodes (at least 512GB) from astronomy, theoretical subatomic physics, quantum chemistry, some use-cases in bioinformatics, humanities, some use-cases in AMO physics, and institutional responses,

• Expanded accelerator capacity (primarily GPUs) from subatomic physics, chemistry, artificial intelligence,

• Robust, secure storage options from the digital humanities,

• Expanded cloud services from digital humanities and astronomy,

• Expanded capacity for tightly coupled processing, including jobs that exceed 1,024 cores.

There is evidence of need for systems with far larger homogeneous partitions than reflected in Stage 1 planning for the LP system. This includes researchers who have offshored or outsourced their computation away from Compute Canada resources. In 2016, the SCINET consortium (Ontario) contacted 58 Canadian faculty members who run large parallel jobs. Respondents were primarily users who had submitted at least one job requiring at least 1,024 cores, either on Compute Canada resources, the 66,000-core Blue Gene/Q at SOSCIP, or international facilities. Of these, 26 were interviewed to discuss their usage patterns and future needs. If resources were available today, in total they would use approximately 250,000 cores per year on a homogeneous, tightly-coupled, large parallel machine, with much larger jobs, and requiring many more cores, than the LP system planned via Stage 1 for mid- to late-2017 (below). One individual within the group had already run a 330,000-core job on the Tianhe-2 machine (China), and expressed a need to scale to 1M cores in the future.

Given the current lack of availability within Compute Canada, it can reasonably be assumed that LP demand is significantly underestimated and that researchers are tailoring their areas of investigation and ARC usage to maximize their productivity. Compute Canada envisions that, over time, larger systems with larger homogeneous partitions will be provided, enhancing the ability of users to pursue larger-scale investigations.


Information Not Guaranteed

Plans described here will be modified as needed, based on discussions among the hosting institutions, CFI, Compute Canada and its partners, and provincial funding agencies. There will be ongoing assessment of anticipated user demand, including for new technologies or configurations. Consultation will be via the SPARC process described above, as well as through discussions with funding agencies and their researchers. Planning will also be responsive to any new information concerning additional funding, the selection of additional hosting sites, shifts to Canada’s digital research infrastructure strategy, or other factors.

Procurement Processes

All hosting institutions are working with Compute Canada to ensure open and fair acquisition processes. Resources will be purchased and owned by each site. Specifications will be formed, and bids evaluated, by Compute Canada’s national teams with full engagement by site procurement officers.


Vision 2020

Compute Canada, as a leading provider of digital research infrastructure (DRI), is taking an integrated approach to data and computational infrastructure in order to benefit all sectors of society. As a result of the technology refresh and modernization supported by CFI’s Challenge 2 Stages 1 and 2, world-class Canadian science will benefit from modern and capable resources for computationally-based and data-focused research.

Compute Canada is cooperating with government funding agencies and with other digital research infrastructure (DRI) providers to provide the world’s most advanced, integrated and capable systems, services and support for research. Future researchers will have seamless access to DRI resources, integrated together for maximum efficiency and performance, without needing to be concerned with artificial boundaries based on different geographical locations or providers.

By 2020, Compute Canada will offer a comprehensive catalog of resources to support the full data research cycle, allowing researchers and their industrial and international partners to compete at a global scale. In cooperation with Canada’s other DRI providers, Compute Canada’s systems and services will facilitate workflows that easily span different resources: from the lab or campus to national computational resources, analytical facilities, publication archives, and collaborators. Local support and engagement will remain a hallmark of delivering excellent service to all users. The pathway to this future has begun, with the modernization of Compute Canada’s national data cyberinfrastructure through the CFI Challenge 2 investments.

155 University Avenue, Suite 302, Toronto, Ontario, Canada M5H 3B7

www.computecanada.ca | www.calculcanada.ca | @ComputeCanada

416-228-1234 | 1-800-716-9417