Research · 2017-02-14

[Figure: research life cycle (Plan, Acquire, Analyse, Comprehend, Publish, Reuse, Manage, Archive, Access, Collaborate) around research data, with supporting services: HPC, cloud, virtual labs; dataset transfer, databases, web-based file sharing, collaborative sites; automated ingest and management; RDM support; technical advice, costing, grant assistance; visualisation facilities; institutional repository]

Emerging Researcher Series

What is eResearch?

Thursday, 9 February 2017

Research Liaison Jason van Rooyen, PhD

Overview

1. Research data and generators

2. Why eResearch?

3. Research challenges

4. Service catalogue

5. Summary

http://datablog.is.ed.ac.uk/files/2013/12/bitsissue8_2.png

Emerging Researcher Series #1 9 Feb 2017

Research Data and Generators

Data origins

Data role-players

• Producers:

• Student/researcher • Facilities • Downstream and re-analyses

• Consumers:

• Lab/group • Collaborators • Discipline / Community

• Managers:

• IT • Data managers / libraries • Administration

• Owners

[Figure: diagram relating producers (P), managers (M) and consumers (C): researchers, libraries, admin, collaborators, IT, data scientists, platforms/facilities, community]

Data’s value to the community

• Intrinsic value: • Evidence • IP/commodity • Productivity metrics

• Drivers for sharing data:

• Validation • Re-use • Publicity

• Community and journal standards • Funding agency mandates • Innovation regulations

[Figure: top-down and bottom-up drivers acting on data that is Findable, Accessible, Interoperable and Re-usable]

Types of researchers

• Researchers differ in:

• Resources • Skills and experiences • Risk appetites

“Rock Star” researchers

• Scarce • Well-funded • Prioritised • Large skilled teams • Early adopters / innovators • Risk takers • Networked

“Long Tail” researchers

• Under-resourced • Sometimes isolated • Cost ($ & minutes) / risk averse • Abundant

http://vignette1.wikia.nocookie.net/walkerjourn515/images/6/60/Rogers_adoption_curve_deaderick_version.png

Fields / Labs / Groups / Units all differ in capacity

• Types of groups

• Academic groups / solitary PIs • Facilities • Programmes

• Staffing structures:

• Students • Research staff • IT/data engineers • Programmers

• Infrastructure differences

• Desktops • Server rooms

Jeannie T. Lee, M.D., Ph.D. Professor of Genetics and Pathology, Harvard Medical School

Lee Lab:

different needs

Why eResearch?

Why eResearch?

Pace and scale increasing

New tools and methodologies Internationalisation

Research Revolution

Who is eResearch UCT?

UCT eResearch partners with research groups to accelerate and transform research, connecting them to the most appropriate services to support the research life cycle.

• New research strategy (2015-2025) • Research life cycle:

− Forecasting and grant writing − Data collection − Analysis and computation − Publication − Data management − Sharing & collaboration − Profile-raising

ICTS Engineers Research support Project management

Research Office: Communications

Libraries: Digital/Data services Digital scholarship

Research Challenges

Managing volumes - challenges at scale

• Movement

• from processing to storage

• Findability and recoverability • context (metadata)

• Privacy & access

• Infrastructure

• Local, central • Support, admin • Backup, security

• Education in best practice and tools

• Costs

• Consumer/enterprise, lifecycle

Managing volumes @ UCT

[Figure: storage provisioned (TB, 0 to 400) from 2014/09/18 to 2015/10/23, showing the uptake rate; labelled allocations: arceibo (74 TB), CASA (74 TB), SATVI (70 TB), astronomy]

• Sources: instruments, processing, collaborations

• 400 TB allocated

• Average allocated vs. provisioned ratio 2:1

• Current rate 40 TB/month provisioned

• 90 TB fast parallel storage on HPC (fhgfs)

Managing volumes – data deluge

Field          Technique                       Data rate
Geomatics      Laser scanning                  ~ 4 TB / year
Neurosciences  MRI                             ~ 5 TB / year
Biosciences    Next-Gen Sequencing             > 10 TB / year
Biophysics     Direct electron detectors TEMs  > 200 TB / year
               Super-resolution microscopes    > 1 PB / year

Sharing data - challenges

• Managing access permissions

• Who and how

• Internal vs. external collaborators

• Privacy and POPI act

• Small vs. large • Tools • Firewalls • Bandwidth

• Costs

• Bandwidth • Importance vs. other traffic

• Licence fees

//researchdata

Sharing data @ UCT

• Data is shared with:

• Project members • Collaborators

• Internal & external • Local & international

• Journals & repositories • Using array of tools

[Figure: terabytes transferred per month (0 to 60), Sep-14 to Nov-16, for the endpoints ARC Globus Endpoint, heinedej#ARC-Ubuntu, heinedej#H3ABioNet and heinedej#eResearchUCT; annotation: + 140 TB to Wits]

Efficient analyses - challenges

• Appropriate infrastructure • Local, central, cloud

• Staff

• Time & skills • Hardware & systems • Training students

• Managing and storing processing results

• Standardizing workflows

• Resourcing / sustainability

• Costs for start-up • Lifecycle & upgrades • Seed funding

• Suitability of central compute resources

• Allocations, system, support

Efficient analyses @ UCT

[Figure: panels labelled 2011 and 2015]

http://hpc.uct.ac.za/

• HPC @ UCT: • Inception in 2009 • 5-fold expansion to 1 450 cores (end 2013) -> exponential increase in usage • GPU servers • Community has consumed 12 million compute hours

• VMs • ± 40

• ARC

• 15 compute nodes • 256GB of RAM per node • 360 processing cores • Over 400TB storage • NW - 500TB object storage
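As a rough aid for sizing jobs against the ARC figures above, the following sketch derives per-node and per-core numbers from the three values stated on the slide (15 nodes, 256 GB of RAM per node, 360 cores). The `Cluster` class is hypothetical, and the even spread of cores across nodes is an assumption the slide does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Headline figures for a compute cluster (hypothetical helper)."""

    nodes: int
    ram_gb_per_node: int
    cores: int

    @property
    def total_ram_gb(self) -> int:
        return self.nodes * self.ram_gb_per_node

    @property
    def cores_per_node(self) -> float:
        # Assumes cores are spread evenly across nodes (not stated on the slide).
        return self.cores / self.nodes

    @property
    def ram_gb_per_core(self) -> float:
        return self.total_ram_gb / self.cores


# Values taken from the ARC bullet list.
ARC = Cluster(nodes=15, ram_gb_per_node=256, cores=360)

if __name__ == "__main__":
    print(f"Total RAM: {ARC.total_ram_gb} GB")
    print(f"Cores per node: {ARC.cores_per_node:.0f}")
    print(f"RAM per core: {ARC.ram_gb_per_core:.1f} GB")
```

A figure such as RAM per core gives a first check on whether a memory-hungry analysis fits when one process runs per core, or whether fewer processes per node are needed.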

Archiving and preserving - challenges

• Deciding what to keep • Triage: raw, processed, versions

• Metadata

• Which metadata to keep • How to keep associated with data

• Replication vs. backup vs. archiving

• Best systems • Infrastructure

• Sustainability

• Ownership of data • Long-term costs

Publishing data – challenges

• Staying compliant:

• Agencies / owners • Deriving the most value for investors

• Deciding what to share

• Tension between competitiveness and openness (patents)

• Where to put the data and how to fund it?

• Sharing large data

• Tracking impact, attribution, and proving compliance.

Supporting data-intensive research with ICT

• Increased connectivity of researchers • Security vs. convenience • Policy challenges (data ownership)

• Sustainability • Charge model or subsidy • Brokerage (connecting to competitors) • Seed funding

Image: permabit.com/data-affordability-gap/

Ideal Services for a Modern University

Data management and planning

• Planning assistance

• Costing • DMP team and tool • Funder guidelines • Data policy

• Acquisition / ingest

• Tools (iRods/MyTardis) • Support

• Training

• Compliance monitoring

• eRA integration

• Institutional repositories

• Collaboration spaces

Data storage

• Convenient

• User / IT • Easily accessible • Shareable

• Applicable

• Fast HPC • Archival • Open

• Secure • Backups • Private

• Scalable

• Tiered storage

• Affordable • Graduated costs

Data movement

• Intuitive

• Non-sysadmins

• Scale appropriate

• Convenient • One-sided?

• Optimized transfers

• DMZs • Scheduled

• Performance monitoring

• Sustainable service

• Impact on network • Costs

Wikipedia

Data analysis

• Suitable

• Scale (cores & memory) • Flexible (efficiency) • Fit for purpose (service, HPC, big data)

• Supported • Admin • Porting • Teaching

• Permit learning

• Free allocations • Suitable rights

• Integrated

• Storage • Sharing services

• Governed • Flexible • Transparent • Accommodation for collaborations and groups

• Shareable • Common workflows • Group permissions

Enabling Open Data

• Assistance with research data management (RDM):

• RDM policy • Funder guidelines • DMPonline • Guidelines for depositing data • Guidelines for sharing data

• Implementation of preservation infrastructure

• Preservation of research data via UCT Libraries preservation infrastructure, Archivematica

• Storage of research data via storage facilities at ICTS • Dissemination, access and reuse via UCT online repository

• Institutional repository

Training and education

• Data science

• HPC • Data analytics courses • Data carpentry • Digital humanities

• Data management

• Library carpentry

• Scientific software development • Software carpentry

• Sysadmins

• Research IT

• Storage, network, compute, cloud

Data visualisation

• Interrogation • Scale & resolution • Immersion & 3D

• Collaboration

• Outreach

VR

Visualisation wall

Digital Dome

Summary

Why eResearch? To accelerate outputs and competitiveness in support of UCT’s research agenda.

How do I get hold of eResearch? eresearch@uct.ac.za or www.eresearch.uct.ac.za

What do eResearch services cost? Our cost model is available on the website at: http://www.eresearch.uct.ac.za/billing-model

Can staff and students both make use of eResearch services? Absolutely: if you are a researcher, you can work with eResearch.

Do you work with individual researchers or only communities? We prefer to work with communities of researchers, because this way our efforts have the greatest impact for the least cost.

Do you work with Humanities and Social Sciences, or only with Sciences? We are happy to assist any researcher.

Questions?

https://tinyurl.com/ztoug6s www.eresearch.uct.ac.za
