Advanced Computing Research Centre
http://www.bris.ac.uk/acrc
Managing research data at Bristol
25 February 2013
Dr Ian Stewart, Director of Advanced Computing
Simon Price, Assistant Director (R&D/ILRT)
BlueCrystal Facility
• Phase 1: 384 AMD Opteron cores, 15TBytes storage (GPFS)
  – 2TFlops
• Phase 2: 3500 Intel Harpertown cores, GPGPU accelerators, 100TB storage attached + 300TB near line (GPFS)
  – 40TFlops
• Phase 3: 5400+ Intel SandyBridge cores, GPGPU accelerators, 400+TB storage attached (Panasas)
  – 200TFlops
• User base grew from < 100 users in Phase 1 to > 600 users now
• More compute => much more data
UoB Research Data
• 2010 UoB survey indicated storage requirements of 1PB and growing
  – large data holdings spread throughout University – not just HPC
  – Arts & Humanities, Social Science, Social Medicine,…
• Projected requirements ~16PB by 2018 (Moore’s law)
  – But growing evidence that assets are increasing at a faster rate
• Research grants 3 years
  – storage of data > 7 or 10 years minimum
  – EPSRC policy – implies “forever”
  – drug design: 30 years; airframe: 50 - 100 years
• Q: Who pays?
  – HEIs? RC? By access?
• Q: Can we afford to keep it all? Should we keep it all?
• Not just storage - also need to manage our data assets
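The projection above (1PB in 2010 growing to ~16PB by 2018) is consistent with demand doubling every two years. The slide only says "Moore's law", so the two-year doubling period and the function name below are assumptions made for illustration. A minimal sketch of that arithmetic:

```python
def projected_storage_pb(base_pb: float, base_year: int, target_year: int,
                         doubling_years: float = 2.0) -> float:
    """Project storage demand, assuming it doubles every `doubling_years` years.

    The doubling period is an assumption; the slides state only the
    2010 baseline (1PB) and the 2018 projection (~16PB).
    """
    elapsed = target_year - base_year
    return base_pb * 2 ** (elapsed / doubling_years)


if __name__ == "__main__":
    # Print the assumed growth curve from the 2010 survey figure.
    for year in range(2010, 2019, 2):
        print(year, projected_storage_pb(1.0, 2010, year), "PB")
```

A shorter doubling period, as the "faster rate" remark suggests may be the case, raises the 2018 figure sharply: with `doubling_years=1.5` the same call gives roughly 40PB.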
Research Data Storage Facility (RDSF)
HPC & RDSF Management
• Board
  – Pro Vice-Chancellors (Research, Learning & Teaching), Deans of Science & Engineering, Deputy Finance Director, Director ITS, Chair and permanent members of HPC Exec
• HPC Executive
  – 2 permanent members (Director ACRC and senior CS/HPC academic), 5 rotating members from User Group, representative of Estates + non-exec member, Research Facilitator
• User Groups
  – HPC Stakeholders
  – Storage users
• Technical Advisory Board
  – HPC SysAdmins
  – Nominated HPC stakeholders
  – Permanent members of Exec
  – External HPC experts
• ACRC
  – Director
  – 2 x HPC SysAdmins
  – 1.5 x Storage SysAdmin
  – Research Facilitator
• Research Data Storage and Management Board
  – Directors ITS and ACRC, Research Facilitator, representatives from HPC Exec, 6 Faculties, Library and RED
RDSF Design
• BluePeta – Research Data Storage Facility
• Project thinking began in 2007/8
• Resilient, expandable, secure, enterprise-grade
• Three machine rooms
  – Two disk mirrors (DDN9900) in separate purpose built HPC rooms
  – Tape backup (IBM TS4500) in UoB corporate machine room
  – Tape archive in planning (offsite? security? access?)
  – GPFS/TSM? (considering LTFS, Arkivum & Filetek)
• Cross Site SAN
  – Redundant routes
  – Three filesystems (single copy - 2 block sizes, mirrored)
• Desktop and Departmental exports
  – CIFS, NFS, GPFS
RDSF Policy documents
• Involved IT Services, RED, Academics, Data Security, Secretary’s Office, Finance, and others.
• Policy for the use of the RDSF
  – Scope of RDSF
  – Responsibilities
  – Processes
• Terms of Use
  – Detailed companion to policy
  – Covers FOI, Ethical, Legal, Costs, Ownership
  – Technical Issues
• FAQ
• External User Agreement
  – Incorporates all the above in a contract
Using the RDSF
• Data Steward (usually PI) applies - has responsibility for the data; can then register one or more projects.
• On-line application form asks all relevant questions upfront:
  – Provide DMP if available (ties in with funder policy).
  – Personal data? FOI exemption? Security level?
• Academic review for new applications.
• 800TB allocated across all faculties in 20 months.
• Usage currently around 30% of allocation.
• Storage policy – subject to on-going revision.
• Annual asset holding review planned from 2013.
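The allocation and usage figures above imply an average allocation rate and a sizeable unused headroom. The short calculation below makes that explicit; the function and key names are hypothetical, and the 30% usage figure is approximate ("around 30%") in the slides.

```python
def allocation_summary(allocated_tb: float, months: float, used_percent: float) -> dict:
    """Derive simple capacity figures from an allocation and a usage percentage."""
    used_tb = allocated_tb * used_percent / 100
    return {
        "allocation_rate_tb_per_month": allocated_tb / months,
        "used_tb": used_tb,
        "unused_tb": allocated_tb - used_tb,
    }


# Figures taken from the slide: 800TB allocated in 20 months, ~30% in use.
summary = allocation_summary(allocated_tb=800, months=20, used_percent=30)
for key, value in summary.items():
    print(f"{key}: {value:.0f}")
```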
Costs of using the RDSF
• Previous model - £400 per TB per annum on disk, but with funders requiring long term retention of data (e.g. 10 years+), how do researchers fund data storage after the end of the project?
• New Pay Once, Store Forever (POSF) model addresses this. Applies Moore’s Law to storage costs through multiplier.
• Encouraging researchers to include costs in grant applications (line being added to fEC tool).
• Q: How to cost long term data curation in POSF?
• Q: Alternatives?
  – Cloud? Expensive. Data access costs. How to mine cloud data?
  – HE sector facility/facilities? EPSRC regional consortia?
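The slides do not give the POSF multiplier or how it is derived. One plausible reading is sketched here: if the annual cost of storing a TB halves every fixed number of years, the yearly costs form a geometric series, so the cost of storing data indefinitely is a finite multiple of the first year's cost. The two-year halving period, the function names, and the use of the earlier £400 per TB per annum figure as the starting cost are all assumptions for illustration, not the University's actual pricing.

```python
def posf_multiplier(halving_years: float = 2.0) -> float:
    """Multiplier on the first-year cost for storing data indefinitely.

    Assumes (hypothetically) that the annual cost per TB halves every
    `halving_years` years, so year-on-year costs shrink by a ratio r < 1:
        total = c0 * (1 + r + r^2 + ...) = c0 / (1 - r)
    """
    if halving_years <= 0:
        raise ValueError("halving_years must be positive")
    r = 0.5 ** (1.0 / halving_years)  # yearly cost ratio
    return 1.0 / (1.0 - r)


def posf_price_per_tb(annual_cost_gbp: float, halving_years: float = 2.0) -> float:
    """One-off price per TB under the assumed cost decline."""
    return annual_cost_gbp * posf_multiplier(halving_years)


# Starting from the previous model's £400 per TB per annum.
print(round(posf_multiplier(), 3))
print(round(posf_price_per_tb(400.0), 2))
```

This sketch covers raw storage only. It does not answer the question raised above about curation, whose staff costs need not fall in line with hardware prices.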
data.bris – research data service
• Building on storage expertise.
• JISC funded pilot in Arts to March 2013.
  – Developing portal to make subsets of RDSF data accessible.
  – Researcher training and advice plus website.
  – Metadata guidance.
  – RDM principles.
  – Exploring ways of sharing data (e.g. Project Moonshot).
• Business case to develop a research data service.
  – “Minimal” service from April 2013 and then incrementally developing a wider service by 2016.
  – Process will be led by Library with support from IT Services and RED (research services).
  – Integration of Pure (RIS) and data.bris will be explored.
  – Curation cost recovery models to be considered.
data.bris – systems environment
data.bris – types of data access
• Research project space
  – Read/Write access by members of research project
    • Mounted drive
• Research data publication
  – Read-only access to published data
    • DOI + data discovery
• Research-active data sharing
  – Read-only access to unpublished data
    • "Web Sharing"
  – Read/Write access to research-active data
    • "Collaborative Sharing"
Web Sharing
• Create links to files in your project space that can then be forwarded to collaborators.
  – Suitable for sharing with collaborators who only need read access.
  – Low security: traffic is not encrypted, links are not secured.
Collaborative Sharing
• Create project file space that can be shared securely with your collaborators.
  – Supports both read-only and read/write access for collaborators via SFTP.
  – High security: traffic is encrypted, only accessible by registered collaborators.
  – Supports resumption of interrupted file transfers.
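Resuming an interrupted transfer generally means checking how many bytes have already arrived and continuing from that offset instead of starting again. The sketch below shows the idea on local files only. It is not the data.bris service's SFTP mechanism, the function name is hypothetical, and it assumes the partial destination file is an intact prefix of the source.

```python
import os
import tempfile


def resume_copy(src_path: str, dst_path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, continuing from the bytes dst already holds.

    Assumes any existing dst content is an intact prefix of src (it is
    not verified here). Returns the number of bytes written by this call.
    """
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)  # skip what was transferred before the interruption
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.bin")
        dst = os.path.join(tmp, "data.part")
        payload = os.urandom(200_000)
        with open(src, "wb") as f:
            f.write(payload)
        with open(dst, "wb") as f:
            f.write(payload[:50_000])  # simulate an interrupted transfer
        print("bytes sent on resume:", resume_copy(src, dst))
```

A production transfer would also verify the existing prefix, for example with a checksum, before appending to it.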
Further Information
ACRC – http://www.bris.ac.uk/acrc
data.bris – http://data.bris.ac.uk