A Data Centre for Science and Industry Roadmap


[Diagram: INNOVATION; NETWORKING; DATA PROCESSING; DATA REPOSITORY]

Challenges

• Technological
  – High-speed network providing high bandwidth and low latency
  – High-performance computing
  – Massive data sources

• Organizational
  – Inclusiveness and capillarity in access to shared resources

• Economic
  – High investments are needed to meet the technological requirements
  – Resources may have to be shared among several actors, as research and academia will never have enough money to sustain the required infrastructure development by themselves

Data Processing: National Laboratory for HPC (NLHPC)

A consortium of seven universities and REUNA; nine more universities are joining the consortium in 2014.

Mission
• To consolidate a national HPC facility by offering top-quality services and advanced training that answer the national demand for scientific computing, and by developing links between research groups, industry and the public sector.

Vision
• Participants envision the NLHPC as a highly competitive centre offering a range of research services in world-class high-performance computing.

Networking: REUNA

• For the past few years, REUNA has aggressively:
  – Pursued a plan to put its infrastructure at the leading edge of technology
    • Leased bandwidth → lambdas → dark fibre
  – Extended its infrastructure to connect the main research centres in Chile with fibre

• REUNA has also worked jointly with advanced computing initiatives to integrate them into the network

Data Repository:

???

The Data Challenge
• Data are being produced at an exponentially growing rate

• Data production and collection are expensive, which requires that:
  – Data must be accessible
  – Data must be preserved for a very long time (many decades), during operations…
  – … and often beyond the end of the specific project that provided the funding

• Historical data could be more important than current data
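The exponential growth noted above has a direct consequence for archive planning. The following is a minimal sketch, assuming (hypothetically) that data production doubles every period; the function name and the starting volume are illustrative only. Under that assumption the data produced in the latest period alone exceeds everything produced before it, so the total archive roughly doubles each period just to keep up.

```python
def production_per_period(initial: float, periods: int) -> list[float]:
    """Volume produced in each period, assuming production doubles every period
    (a hypothetical growth model, not a figure from the roadmap)."""
    return [initial * 2**k for k in range(periods)]


volumes = production_per_period(1.0, 10)  # arbitrary units, 10 periods
latest = volumes[-1]
all_earlier = sum(volumes[:-1])
print(latest, all_earlier)  # 512.0 511.0
```

The same reasoning explains why historical data are never a shrinking share to be neglected: they must be kept while new data keep arriving faster than before.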

The Data Challenge for Science

• … The above remarks apply …

• Moreover:
  – The long-term cost of maintaining data is beyond the economic means of any single science
  – Nobody can foresee how relevant data collected today will be to theories yet to come

• Many sciences have not yet started to consider the implications of data storage at the global level:
  – Medicine
  – Pharmacy
  – …

Scientific Data Repository Requirements

• To be connected to a national research and education network infrastructure ensuring access to/from all relevant actors (science data producers, scientific community, academia, …)

• To have a high-capacity communication backbone, removing the need to co-locate computing (processing) resources with the storage facilities

• To be designed to last for many decades, even beyond the boundaries of the original funding of a scientific facility (i.e. economically sustainable in the long term)

• To back up data routinely, since they are valuable and expensive to recapture

• To plan for the periodic renewal of physical storage media
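The backup and media-renewal requirements above translate into extra raw capacity and repeated migrations over the life of the archive. A minimal sketch, in which every planning value (two copies, a 50-year horizon, a 7-year media lifetime) is a hypothetical assumption rather than a figure from the roadmap, and the function names are illustrative:

```python
import math


def raw_capacity_pb(usable_pb: float, copies: int) -> float:
    """Raw capacity needed to keep `copies` full copies (primary + backups)."""
    return usable_pb * copies


def media_migrations(archive_years: int, media_lifetime_years: int) -> int:
    """Times the archive must be rewritten onto new media so that no medium is
    used beyond its assumed lifetime (the initial write is not counted)."""
    return math.ceil(archive_years / media_lifetime_years) - 1


# Hypothetical planning values, not taken from the roadmap.
USABLE_PB = 100            # e.g. the Phase 1 storage target
COPIES = 2                 # primary copy plus one routine backup
ARCHIVE_YEARS = 50         # "many decades"
MEDIA_LIFETIME_YEARS = 7   # assumed usable life of one generation of media

raw = raw_capacity_pb(USABLE_PB, COPIES)
migrations = media_migrations(ARCHIVE_YEARS, MEDIA_LIFETIME_YEARS)
print(f"raw capacity: {raw} PB; media migrations over {ARCHIVE_YEARS} years: {migrations}")
```

Each migration rewrites every copy of the archive, which is why media renewal has to be budgeted from the start rather than treated as an occasional expense.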

The Data Challenge for Private Enterprise

• Data archiving is not part of the core business of the vast majority of enterprises, although data analysis is still needed to increase profit

• Data analysis requires access to a broad range of competences not easily available within each enterprise domain

• Relying on external competences is often not the right solution

• Small and medium-sized companies face an economic barrier, as they lack the funds to invest in digital R&D infrastructure

Vision
A Data Centre for Science AND Industry will drive innovation by:
• enabling both scientists and private entrepreneurs to access a massive, heterogeneous collection of data

• supporting the development of mathematical models, computing technologies and software solutions across disciplines

• giving a wider range of users cost-efficient access to modern data storage technology

[Diagram: ASTRONOMY, GEOLOGY, BIOLOGY, MEDICAL, INDUSTRY, SCIENCE; PRIVATE/PUBLIC PARTNERSHIP]

Funding Model

• Capital expenditure (CAPEX) shared (say) 50/50 between the scientific partners and the private sector.

• Operational costs (OPEX) borne only by the private partner(s), with (say) 30% of the infrastructure reserved for scientific use.

• Therefore:
  – No operational cost for the scientific institutions
  – Reduced investment costs for the commercial partner(s)
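The funding model can be expressed as a short calculation. A minimal sketch that applies the roadmap's own placeholder percentages ("say" 50% of CAPEX and 30% of capacity); the CAPEX figure used below is simply the midpoint of the Phase 1 estimate, and the function and field names are hypothetical. OPEX is deliberately not split, since under this model it falls entirely on the private partner(s).

```python
from dataclasses import dataclass


@dataclass
class FundingSplit:
    science_capex_usd: float      # CAPEX contributed by scientific partners
    private_capex_usd: float      # CAPEX contributed by the private partner(s)
    science_storage_pb: float     # capacity reserved for scientific use
    commercial_storage_pb: float  # capacity available for commercial use


def split_funding(capex_usd: float, storage_pb: float,
                  science_capex_pct: int = 50,
                  science_reserved_pct: int = 30) -> FundingSplit:
    """Split CAPEX and storage capacity using the placeholder percentages.
    OPEX is not modelled here: it is borne only by the private partner(s)."""
    science_capex = capex_usd * science_capex_pct / 100
    science_pb = storage_pb * science_reserved_pct / 100
    return FundingSplit(
        science_capex_usd=science_capex,
        private_capex_usd=capex_usd - science_capex,
        science_storage_pb=science_pb,
        commercial_storage_pb=storage_pb - science_pb,
    )


# Assumed example: USD 25 M (midpoint of USD 20 to 30 M) and a 100 PB pilot.
s = split_funding(25e6, 100)
print(s)
```

With these assumptions each side invests USD 12.5 M, while science gets 30 PB of the pilot at no operating cost.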

A Data Centre for Science and Industry

• Shall provide capacity to store 1 EB (1000 PB) of data

• Shall connect to the global academic network through a dedicated backbone

• Shall connect transparently to open Internet exchange points for commercial access

• Shall have a reduced environmental impact

Phase 1 – Pilot

• Target storage capacity: 100 PB
• Target network capacity: 100 Gbps
• Target completion time: 2 to 3 years
• Physical space: ~500 m²
  – Location must take into account the need to minimize environmental impact
  – Use a modular approach to minimize expansion costs
• Power consumption: ~0.5 MW
  – Non-conventional renewable energy source
  – Backed up by a conventional (renewable) energy source
• Estimated investment: ~USD 20 to 30 M (including space, connectivity, storage, power, etc.)
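The Phase 1 targets can be cross-checked with simple arithmetic. A minimal sketch, assuming decimal units (1 PB = 10^15 bytes), a single 100 Gbps link at full utilization and no protocol overhead; the function names are illustrative. Moving the whole 100 PB pilot archive over one link takes about three months, and the investment estimate corresponds to roughly USD 0.2 to 0.3 M per PB (an upper bound on storage cost, since the estimate also covers space, connectivity and power).

```python
def transfer_days(volume_pb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move `volume_pb` decimal petabytes over a link of
    `link_gbps` gigabits per second at the given average utilization."""
    bits = volume_pb * 10**15 * 8
    seconds = bits / (link_gbps * 10**9 * utilization)
    return seconds / 86_400


def cost_per_pb_musd(investment_musd: float, capacity_pb: float) -> float:
    """Total investment (millions of USD) divided by storage capacity."""
    return investment_musd / capacity_pb


print(f"{transfer_days(100, 100):.1f} days")  # entire pilot archive, one 100 Gbps link
print(cost_per_pb_musd(20, 100), cost_per_pb_musd(30, 100))
```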

Phase 2 – Full-scale DC

• Target storage capacity: 1000 PB
• Target network capacity: multiple 100 Gbps links
• Target completion time: 5 to 7 years, according to demand and availability of funds
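The phrase "multiple 100 Gbps" can be made concrete. A minimal sketch, assuming (hypothetically) that the full 1000 PB has to be ingested or exported within one year, with decimal units and no protocol overhead; the one-year window, the 50% utilization case and the function name are illustrative assumptions, not roadmap figures.

```python
import math

SECONDS_PER_YEAR = 365 * 86_400


def links_needed(volume_pb: float, link_gbps: float, target_years: float,
                 utilization: float = 1.0) -> int:
    """Parallel links of `link_gbps` needed to move `volume_pb` decimal
    petabytes within `target_years` at the given average utilization."""
    bits = volume_pb * 10**15 * 8
    per_link_bits = link_gbps * 10**9 * utilization * target_years * SECONDS_PER_YEAR
    return math.ceil(bits / per_link_bits)


print(links_needed(1000, 100, 1))                   # fully utilized links
print(links_needed(1000, 100, 1, utilization=0.5))  # 50% average utilization
```

Under these assumptions three fully utilized 100 Gbps links suffice, or six at 50% average utilization.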

Other DC-Related Initiatives

• Focused on astronomical data
• Focused on building initial competences
• Use existing facilities (universities)
• Use existing REUNA capacity
• Funded by academic research grants
• Development within one to two years

At present, more than one such initiative is being developed: coordinating with them will create synergy and increase effectiveness.

WBS

• Project Management
• Legal framework
• Construction
  – Site infrastructure
  – Network
  – Power
  – Storage

• Operational Model
• Local Community
• Outreach

Local Community

• Environmental impact
• Cultural aspects
• By-product benefits
  – work opportunities
  – visibility
  – local connectivity
  – …

Actors

• Science communities
  – Represented by a scientific steering committee
  – Scientific facilities (data producers)

• Co-investors
  – Public: …
  – Private: …

• Local communities