

The dashboard web application lets users manage their application containers: create, delete, start, and stop them as necessary.

Create several independent containers for various purposes, for example, using different base Docker images which provide different sets of pre-installed libraries and language tools.

Interactive server-side computational capabilities through Jupyter notebooks with Python, R, and MATLAB.

We provide Python and R libraries for communicating with other SciServer components from Jupyter notebooks running within the SciServer Compute framework.

SciServer Compute User Interface

SciServer/Compute Architecture

· Homogeneous VM nodes serve as building blocks of a scalable system.

· Docker containers are used for process and resource isolation.

· A single identity service based on OpenStack Keystone provides seamless integration with other SciServer components.

· Direct access to large data archives and additional libraries and tools through shared, read-only data volumes.

· Personal persistent and scratch user space for long- and short-term storage of input data and results.
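The storage layout described in the last two bullets can be illustrated with a minimal sketch. It assumes Docker's `host:container:mode` bind syntax; the host paths (`/srv/...`), the container mount points, and the helper names `Volume` and `volumes_for_user` are hypothetical and are not taken from the SciServer code base.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Volume:
    """One mount in a user container (all paths here are hypothetical)."""
    host_path: str
    container_path: str
    read_only: bool

    def bind_spec(self) -> str:
        # Docker bind syntax: host path, container path, access mode.
        mode = "ro" if self.read_only else "rw"
        return f"{self.host_path}:{self.container_path}:{mode}"


def volumes_for_user(username: str, shared: List[str]) -> List[Volume]:
    """Shared archives are mounted read-only; personal space is writable."""
    volumes = [
        Volume(f"/srv/archives/{name}", f"/workspace/{name}", read_only=True)
        for name in shared
    ]
    # Long-term (persistent) and short-term (scratch) personal storage.
    volumes.append(
        Volume(f"/srv/persistent/{username}", "/workspace/persistent", read_only=False)
    )
    volumes.append(
        Volume(f"/srv/scratch/{username}", "/workspace/scratch", read_only=False)
    )
    return volumes


for volume in volumes_for_user("alice", ["sdss_das"]):
    print(volume.bind_spec())
```

Mounting the archives read-only lets many containers share one copy of the data safely, while each user's writes go only to that user's own folders.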

[Figure: SciServer/Compute architecture. Labels: App Containers running on Docker on a VM node; Compute; Registry DB; SSO portal; Keystone; user containers (<username>); shared containers; data volume container; data archives; images; scratch storage volume container and persistent storage volume container, each with user folders (<username>); SciDrive; CasJobs; SkyServer; SciServer WebApps.]

SciServer Compute

SciServer Compute uses Jupyter notebooks running within server-side Docker containers attached to large relational databases and file storage to bring advanced analysis capabilities close to the data. Apart from interactive notebooks in Python, R and MATLAB, SciServer Compute offers an API for running asynchronous tasks, also in Docker containers.

http://compute.sciserver.org

Running Asynchronous Tasks

Simple REST API for running asynchronous tasks inside the same container infrastructure that is used for running interactive Jupyter notebooks.

POST /dashboard/api/container HTTP/1.1
Host: compute.sciserver.org
X-Auth-Token: 416aee7eab474...
Content-Type: application/json

{"ImageName":"astro","Label":"test","Cmd":["python ~/workspace/persistent/test.py"],"Volumes":["sdss_das"]}
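As a rough illustration, the same request could be assembled in Python with only the standard library. This is a sketch under stated assumptions: the `http` scheme is taken from the URL given on this poster, the helper name `build_task_request` and the placeholder token are hypothetical, and the format of the server's response is not documented here, so the request is built but not sent.

```python
import json
import urllib.request

# Host and path as shown in the request on this poster.
API_URL = "http://compute.sciserver.org/dashboard/api/container"


def build_task_request(token, image, label, command, volumes):
    """Assemble the POST request for an asynchronous task (hypothetical helper)."""
    payload = {
        "ImageName": image,
        "Label": label,
        "Cmd": [command],
        "Volumes": list(volumes),
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Auth-Token": token,  # token issued by the identity service
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_task_request(
    token="<your-token>",  # placeholder, not a real token
    image="astro",
    label="test",
    command="python ~/workspace/persistent/test.py",
    volumes=["sdss_das"],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the prepared object to `urllib.request.urlopen(request)` would submit the task; that call is left out because it needs a valid token and network access.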

SciServer

SciServer is a big data infrastructure project to develop a common environment for sharable computational research. SciServer is a fully integrated cyberinfrastructure system encompassing related tools and services. SciServer enables a new approach that will allow researchers to work with Terabytes or Petabytes of scientific data, without needing to download any large datasets. Some features we provide are:

· data storage for scientific users, tools for searching big datasets, and space for users to store and analyze their results.

· the ability to analyze data on our servers, keeping the computation close to the data to minimize data movement.

· access to query, analysis, and storage resources for researchers and educators, and support for the long tail of science.

http://www.sciserver.org

For questions, email us at [email protected]

Funded by the U.S. National Science Foundation, Award ACI–1261715

SciServer Compute: Bringing Analysis Close to the Data

Jai Won Kim, Gerard Lemson

Institute for Data Intensive Engineering and Science (IDIES), Johns Hopkins University

[Figure: SciServer components: SciDrive, Compute, Login Portal, SkyServer, CasJobs.]

Data

We offer access to the complete SDSS dataset to the world through SkyServer. You can store any data you like in a private database in CasJobs, or as files in SciDrive.

Query

You can find SDSS data with SQL queries through SkyServer or CasJobs. You can run queries through scripts using Compute.

Collaboration

You can share your data and results with colleagues, and you can write files in any format to SciDrive.

Analysis

You can analyze the largest datasets with Python or R scripts through Compute. You can share scripts and results while logged in.

Cross-Matching (SkyQuery)

You can cross-match your own data with SDSS data using our new SkyQuery tool.