An Introduction to the Open Science Data Cloud Heidi Alvarez Florida International University Robert L. Grossman University of Chicago Open Cloud Consortium October 10, 2013

Page 1: An Introduction to the  Open Science Data Cloud

An Introduction to the Open Science Data Cloud

Heidi Alvarez, Florida International University

Robert L. Grossman, University of Chicago

Open Cloud Consortium

October 10, 2013

Page 2: An Introduction to the  Open Science Data Cloud

1. Open Science Data Cloud (OSDC)

Page 3: An Introduction to the  Open Science Data Cloud

Open Science Data Cloud (OSDC)

• OSDC is a Science Cloud Service Provider (CSP)
• Operated by the not-for-profit Open Cloud Consortium
• OSDC is a 6 PB / 12,000 core science cloud
• 1 PB of science data for the research community
• 1 PB of biomedical data for medical research
• We have been doubling in size each year
• We run production services for NASA and NIH researchers
• Interoperates with Amazon Web Services (AWS), though this is still rudimentary
• Hundreds of users (not thousands)
• A typical job uses 1000s of core hours over 10s to 100s of TB

Page 6: An Introduction to the  Open Science Data Cloud

Science Cloud

• Earth sciences
• Biological sciences
• Social sciences
• Digital humanities
• ACL, groups, etc.

Biomedical Cloud

• Designed to hold Protected Health Information (PHI), e.g. genomic data, electronic medical records, etc. (HIPAA, FISMA)

Page 7: An Introduction to the  Open Science Data Cloud

What You Get with the OSDC

• Login with your university credentials via InCommon

• Launch virtual machines, virtual clusters, access to large Hadoop clusters, etc.

• Access PB+ of open and protected data

• Manage files, collections of files, collections of collections

• Manage users, groups of users

• Manage accounts, sub-accounts

• Efficient transfer of large data (UDT, UDR)
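
As an illustration only, the nesting listed on this slide (files in collections, collections in collections, users in groups) can be sketched as a small Python data model. Every class, field, and name below is hypothetical; the slides do not say how the OSDC implements any of this, and the group-based read check is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class File:
    name: str
    size_bytes: int


@dataclass
class Collection:
    """Hypothetical collection: holds files and/or other collections."""
    name: str
    members: List[Union[File, "Collection"]] = field(default_factory=list)
    allowed_groups: Set[str] = field(default_factory=set)

    def total_bytes(self) -> int:
        # Recurse through nested collections to sum all file sizes.
        total = 0
        for member in self.members:
            if isinstance(member, File):
                total += member.size_bytes
            else:
                total += member.total_bytes()
        return total


@dataclass
class User:
    name: str
    groups: Set[str] = field(default_factory=set)


def can_read(user: User, collection: Collection) -> bool:
    """Assumed rule: a user may read a collection if they share a group with it."""
    return bool(user.groups & collection.allowed_groups)


if __name__ == "__main__":
    raw = Collection("raw", [File("a.dat", 10), File("b.dat", 20)], {"genomics"})
    project = Collection("project", [raw, File("README", 1)], {"genomics"})
    alice = User("alice", {"genomics"})
    print(project.total_bytes(), can_read(alice, project))
```

A real service would also need per-collection inheritance rules and separate read and write permissions; this sketch leaves those out.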

Page 8: An Introduction to the  Open Science Data Cloud

www.opencloudconsortium.org

• U.S.-based not-for-profit corporation
• Companies: Cisco, Yahoo!, Infoblox, …
• Universities: University of Chicago, Northwestern Univ., Johns Hopkins, Calit2, etc.
• Federal agencies and labs: NASA, LLNL, ORNL
• International university and government partners
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud
• Manages cloud computing testbeds: Open Cloud Testbed

Page 9: An Introduction to the  Open Science Data Cloud

Our Point of View

• We want to develop as little technology and software as possible; we want others to develop software and technology.

• We focus on providing researchers the ability to compute over large and very large datasets.

• We need open source solutions.

• Today it is difficult to interoperate with AWS for our protected data cloud, but we expect this to change (someday).

• Run lights out over multiple data centers connected with 10G (soon 100G) networks.

Page 10: An Introduction to the  Open Science Data Cloud

2. Challenges

Page 11: An Introduction to the  Open Science Data Cloud

OSDC Data Centers and Networks

• We have three data centers
  – Chicago with 100G to StarLight
  – FIU with 10G to StarLight
  – Livermore Valley Open Campus with 10G to StarLight

• We’re planning one more data center with 100G connection to StarLight

• We are looking to interoperate the OSDC with international partners over 10G and 100G networks
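
To give a feel for what these link speeds mean for the data sizes mentioned earlier (10s to 100s of TB per job), here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits/s) and a fully dedicated link; the `efficiency` parameter is a hypothetical fraction of line rate, not an OSDC measurement.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized hours needed to move size_tb terabytes over a link_gbps link.

    Assumes 1 TB = 1e12 bytes and 1 Gbps = 1e9 bits/s.
    `efficiency` is an assumed fraction of line rate actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for size_tb in (10, 100, 1000):  # 1000 TB = 1 PB
        for link_gbps in (10, 100):
            hours = transfer_hours(size_tb, link_gbps)
            print(f"{size_tb:>5} TB over {link_gbps:>3}G: {hours:8.1f} h")
```

At full line rate, 100 TB takes roughly 22 hours over 10G and a little over 2 hours over 100G; real transfers will be slower once protocol overhead and disk throughput are taken into account.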

Page 12: An Introduction to the  Open Science Data Cloud

Challenges

• We are focusing on the following:
  – How do we authenticate and authorize researchers at our international partners, and control their access to data and to cloud-based services (storage and compute)?
  – We need open source implementations of these services
  – We need trust relationships with our peers

• We are running a series of interoperability workshops to try to get this right.

Page 13: An Introduction to the  Open Science Data Cloud

PARTNERSHIP FOR INTERNATIONAL RESEARCH AND EDUCATION

NSF Award # 1129076

Page 14: An Introduction to the  Open Science Data Cloud

PIRE & OSDC

• National Science Foundation Partnership for International Research and Education: a 5-year program, 2010-2014, at $3.5M.

• Prepares students to compete in the global cyberinfrastructure community

• Provides international research and education experiences around the world!

• The student/faculty/scientist research teams help develop large-scale distributed computing capabilities, data, and state-of-the-art services for integrating, analyzing, sharing and archiving scientific data.

Page 15: An Introduction to the  Open Science Data Cloud

International Collaborators

• Malcolm Atkinson – School of Informatics, Edinburgh University, Scotland, UK
• Paola Grosso & Cees de Laat – Faculty of Science, Informatics Institute, University of Amsterdam, The Netherlands
• Karen Langona and Tereza Cristina Carvalho – LARC, Laboratory of Computer Networks and Architecture at the University of Sao Paulo, Brazil
• Satoshi Sekiguchi – National Institute of Advanced Industrial Science and Technology (AIST), Japan
• Chung-I Wu – Beijing Institute of Genomics (BIG), Chinese Academy of Sciences

Page 16: An Introduction to the  Open Science Data Cloud

Research Opportunities

• What?
  • Funded internships for US citizens and residents, which provide the chance to participate in sophisticated international research collaborations.

• When?
  • Summer of 2014

• How long?
  • 6 weeks

• Where?
  • At any of our international partners.

Page 18: An Introduction to the  Open Science Data Cloud

Questions?

Page 19: An Introduction to the  Open Science Data Cloud

Thank You!