28
Managing Data using Globus Rachana Ananthakrishnan [email protected]

Managing Data using Globus

Embed Size (px)

Citation preview

Page 1: Managing Data using Globus

Managing Data using Globus

Rachana Ananthakrishnan [email protected]

Page 2: Managing Data using Globus

“I need a good place to store / backup / archive my (big) research data, at a reasonable price.”

Public Cloud Archive Mass Store Campus Store

Page 3: Managing Data using Globus

“I need to easily, quickly, & reliably move or mirror portions of my data to other places.”

Research  Compu.ng  HPC  Cluster  

Lab  Server  

Campus  Home  Filesystem  

Desktop  Worksta.on  

Personal  Laptop  

XSEDE  Resource  Public  Cloud  

Page 4: Managing Data using Globus

“I need to easily and securely share my data with my colleagues at other institutions.”

Page 5: Managing Data using Globus

“I need to get data from a scientific instrument to my analysis server.”

Next Gen Sequencer

Light Sheet Microscope

MRI Advanced Light Source

Page 6: Managing Data using Globus

Challenge: Manage research data as easily as…

…our  pictures  

…home  entertainment  …our  e-­‐mail  

Page 7: Managing Data using Globus

What is Globus?

Big data transfer, and sharing… … delivered via SaaS …

… that is simple, secure, and fast… … directly from your own storage

systems

Page 8: Managing Data using Globus

Reliable, secure, high-performance file transfer

•  “Fire-and-forget” transfers

•  Automatic fault recovery

•  Seamless security integration

•  Powerful GUI and APIs

Data Source

Data Destination

User initiates transfer request

1

Globus moves and syncs files

2

Globus notifies user

3

Page 9: Managing Data using Globus

Simple, secure sharing off existing storage systems

Data Source

User A selects file(s) to share, selects user or group, and sets permissions

1

Globus tracks shared files; no need to move files to cloud storage!

2

User B logs in to Globus and

accesses shared file

3

•  Easily share large data with any user or group

•  No cloud storage required

Page 10: Managing Data using Globus

Globus is SaaS

•  Web, command line, and REST interfaces •  Reduced IT operational costs •  New features automatically available •  Consolidated support & troubleshooting •  Easy to add your laptop, server, cluster,

supercomputer, etc. with Globus Connect

Page 11: Managing Data using Globus

8,000 active endpoints

(in the past year)

Page 12: Managing Data using Globus

Demonstration

Page 13: Managing Data using Globus

Globus Connect Server

•  Create endpoint in minutes; no complex GridFTP install •  Enable all users with local accounts to transfer files •  Native packages: RPMs and DEBs •  Also available as part of the Globus Toolkit

Local Storage System (HPC cluster, campus server, …)

Globus Connect Server

MyProxy Online CA

GridFTP Server

Local system users

Page 14: Managing Data using Globus

Globus Platform-as-a-Service

Identity, Group, Profile Management Services

Sharing Service

Transfer Service

Globus Toolkit

Glo

bus

API

s

Glo

bus

Con

nect

Page 15: Managing Data using Globus

globus genomics

Flexible, scalable, affordable

genomics analysis for all biologists

Page 16: Managing Data using Globus

+ Data management

PaaS

Next-gen sequence analysis SaaS

+ Scalable IaaS

Page 17: Managing Data using Globus

Globus is moving beyond transfer and sharing to data publication and

discovery

Page 18: Managing Data using Globus

Globus Data Publication (coming soon)

•  SaaS for publishing large research data •  Bring your own storage •  Extensible metadata •  Publication and curation workflows •  Public and restricted collections •  Rich discovery model

Page 19: Managing Data using Globus

Identified Described Curated

Verifiable Accessible Preserved

Enables data to be easily…

Page 20: Managing Data using Globus

Search Browse Access

…across collections, endpoints

…and facilitates rich discovery

Page 21: Managing Data using Globus

Metadata Access Control

License Storage

Curation Workflow

Policies Collection

Globus’ view of data publishing

Metadata

Data Metadata

Data

Metadata

Data

Dataset Dataset

Dataset

Community

Page 22: Managing Data using Globus

Argonne Storage System

Univ. of Chicago Argonne IIT UIUC

Exemplar Use Case

3. Assemble Dataset (Transfer Data)

Argonne Curator

2. Describe Submission

Scientist

Shared Endpoint

4. Curate Dataset

1. Publish Data 6. Download

5. Search

Page 23: Managing Data using Globus

To Learn More…

•  Transfer and share: www.globus.org •  Email: [email protected] •  Globus Genomics:

www.globus.org/genomics/ •  Globus Publication:

www.globus.org/data-publication

Page 24: Managing Data using Globus

1.  Go to: globus.org/signup 2.  Create your Globus account 3.  Validate e-mail address 4.  Optional: Login with your

campus/InCommon identity

Exercise 1: Account Signup

Page 25: Managing Data using Globus

1.  Choose ALCF endpoint: alcf#dtn and authenticate with your credentials.

2.  Move file(s) your ALCF account from ) from esnet#anl-diskpt1

3.  Move file(s) from ALCF account to go#ep1

Exercise 2: Transfer to/from ALCF

Page 26: Managing Data using Globus

1.  Install Globus Connect Personal 2.  Move file(s) from esnet#anl-diskpt1 to your

laptop 3.  Sign up for a free Globus Plus trial 4.  Create a shared endpoint on your laptop 5.  Grant your neighbor permissions on your shared

endpoint 6.  Access your neighbor’s shared endpoint 7.  Optional: Create group, and grant share access

Exercise 3: Transfer, Sharing, Group Management

Page 27: Managing Data using Globus

1.  Configure SSH public key in Globus profile

2.  Log into the Globus CLI: ssh cli.globusonline.org

3.  Transfer files from alcf#dtn to your laptop using scp command

4.  Check status of your transfer using status command

Exercise 4: Using the CLI

Page 28: Managing Data using Globus

Thank you to our sponsors!

U . S . D E PA RT M E N T O F

ENERGY