Nancy Wilkins-Diehr
science gateway /sī′ əns gāt′ wā′/ n. 1. an online community space for science and engineering research and education.
2. a Web-based resource for accessing data, software, computing services, and equipment specific to the needs of a science or engineering discipline.
Science Gateways: History, Successes, Path Forward
A tale of many slide templates :)
Thank you to Michelle Barker, Richard Sinnott and David Abramson for the invitation to speak
Remember 2004?
• Microsoft, AOL and Jeeves ruled the Web
• Facebook launched
– More users today than were on the entire internet in 2004
• Google was the 5th most popular brand, behind AOL and Yahoo
• Time magazine recommended Friendster as website of the year
• I first started working with science gateways
Beginnings of the TeraGrid program
• TeraGrid develops its Deep, Wide and Open strategy
• For the first time, we are targeting not just the high-end HPC user community
Despite the technological progress of grid technology and deployment, only a minority of the scientific, engineering, and education community use today’s national computing infrastructure. Our WIDE strategy addresses this situation by working directly with specific community leaders who are building discipline-specific cyberinfrastructure capabilities and resources for their communities.
TeraGrid proposal, 2003
April 2006
Science Gateways: A new initiative for the TeraGrid
• Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
– Resources
– Users, from expert to K-12
– Software stacks, policies
• Science Gateways
– Provide “TeraGrid Inside” capabilities
– Leverage community investment
• Three common forms:
– Web-based portals
– Application programs running on users' machines but accessing services in TeraGrid
– Coordinated access points enabling users to move seamlessly between TeraGrid and other grids
Workflow Composer
But in the beginning, we had no services. We paid science teams to help us develop them.
Science Gateway Prototype | Discipline | Science Partner(s) | TeraGrid Liaison
Linked Environments for Atmospheric Discovery (LEAD) | Atmospheric | Droegemeier (OU) | Gannon (IU), Pennington (NCSA)
National Virtual Observatory (NVO) | Astronomy | Szalay (Johns Hopkins) | Williams (Caltech)
Network for Computational Nanotechnology (NCN) and “nanoHUB” | Nanotechnology | Lundstrom (PU) | Goasguen (PU)
Open Life Sciences Gateway | Biomedicine and Biology | Schneewind (UC), Osterman (Burnham/UCSD), DeLong (MIT), Dusko (INRA) | Stevens (UC/Argonne)
Biology and Biomedical Science Gateway | Biomedicine and Biology | Cunningham (Duke), Magnuson (UNC) | Reed (UNC), Blatecky (UNC)
Neutron Science Instrument Gateway | Physics | Cobb (ORNL) | Cobb (ORNL)
Grid Analysis Environment | High-Energy Physics | Newman (Caltech) | Bunn (Caltech)
Transportation System Decision Support | Homeland Security | Stephen Eubanks (LANL) | Beckman (Argonne)
Groundwater/Flood Modeling | Environmental | Wells (UT-Austin), Engel (ORNL) | Boisseau (TACC)
Science Grid [GriPhyN/iVDGL/Grid3] | Multiple | Pordes (FNAL), Huth (Harvard), Avery (UFlorida) | Foster (UC/Argonne), Kesselman (USC-ISI), Livny (UW)
So how will we meet all these needs?
• With RATS! (Requirements Analysis Teams)
• Collection, analysis and consolidation of requirements to jump-start the work
– Interviews with 10 Gateways
– Common user models, accounting needs, scheduling needs
• Summarized requirements for each TeraGrid working group
– Accounting, Security, Web Services, Software
• Areas for more study identified
• Primer outline for new Gateways in progress
• And milestones
Linked Environments for Atmospheric Discovery (LEAD)
• Providing tools needed to make accurate predictions of tornadoes and hurricanes
• Data exploration and Grid workflow
Social Informatics Data Grid (SIDGrid): Collaborative access to large, complex datasets
• SIDGrid is unique among social science data archive projects
– Streaming data that change over time: voice, video, images (e.g., fMRI), text, numerical (e.g., heart rate, eye movement)
– Investigate multiple datasets, collected at different time scales, simultaneously
• Large data requirements
• Sophisticated analysis tools
Viewing multimodal data like a symphony conductor
• “Music-score” display and synchronized playback of video and audio files
– Pitch tracks
– Text
– Head nods, pause, gesture references
• Central archive of multi-modal data, annotations, and analyses
– Distributed annotation efforts by multiple researchers working on a common data set
– History of updates
• Computational tools
– Distributed acoustic analysis using Praat
– Statistical analysis using R
– Matrix computations using Matlab and Octave
Source: Studying Discourse and Dialog with SIDGrid, Levow, 2008
Over the years, the program developed. I gave lots and lots of talks.
Eventually we had a program
• And customers
• Starting in 2013, gateway users surpass command-line users in XSEDE
[Chart: Gateways vs. Login users]
Proliferation of Science Gateways: these are some that use XSEDE supercomputers
Cyberinfrastructure for Phylogenetic Research (CIPRES): PI Mark Miller, SDSC, www.phylo.org
• 210 US research universities
– Harvard, Yale, UC Berkeley, Stanford, etc.
– Non-PhD-granting colleges (including one all-women’s college, community colleges, and Hispanic-serving institutions)
• 3 K-12 school systems
• 43 non-governmental organizations
– Museums (including the Smithsonian Institution, the American Museum of Natural History, and the Field Museum)
– Botanical gardens (e.g., Chicago, Rancho Santa Ana, and New York)
– Institutes (e.g., JCVI and Broad)
• 10 US governmental agencies
– Including NIH, USDA, NOAA, US Forest Service
• Curriculum delivery (76)
• 2000+ publications since 2010
• 47% of all XSEDE users in Q4 2015
CIPRES’ reach is deep and wide: Nature article, Feb 2016; Mass. state science fair, July 2012
Saving wetlands with the Simulocean science gateway
A football-field-sized parcel of land is lost every hour
“It's important to enhance the collaboration among earth scientists, computer scientists, cyberinfrastructure specialists and coastal engineers tasked with solving the sustainability issues of deltaic coasts like those in Louisiana.”
– Dr. Jian Tao, research scientist, LSU
Source: XSEDE External Relations
Some NSF programs even specify the use of gateways
This is the right direction to go! Gateways as cost-effective infrastructure
• Developers typically
– work in isolation
– must bridge to a variety of resources
– need building blocks in order to focus on higher-level functionality
– struggle to secure sustainable funding
Despite many successes, there are still challenges. Gateways are often funded as 3-year research projects.
[Cycle: Early adopters → Publicity → Wider adoption → Funding ends → Scientists disillusioned → New project → Prototype → Early adopters]
In 2014, we sent a survey to 29,000 NSF PIs and academic CIOs and CTOs
5000 responded
We wanted to understand both the importance of gateways and challenges developers face
Specialized Resource | Percent
Data collections | 75%
Data analysis tools, including visualization and mining | 72%
Computational tools | 72%
Tools for rapidly publishing and/or finding articles and data specific to my domain | 69%
Educational tools | 67%
Platforms for fostering group or community collaboration | 63%
Simplified interfaces that eliminate the need to learn coding | 62%
Citizen science and other public engagement resources | 47%
Workflows that automate or capture tasks or processes | 42%
Scientific instruments, such as telescopes, microscopes, or sensors | 39%
We learned that 88% of respondents felt Web-based applications were important to their work
n=4,004, or 88% of 4,538 researcher/educators. Percentage indicates these resources are “somewhat” or “very” important to their work.
57% played some role in gateway creation, and these gateways were used for a variety of purposes
n of application types=7,805, by 2,756 creators (out of 2,819); mean=2.8 application types per application creator
[Chart: percentage of application creators who had (“Yes, we had this”) or wished they had (“Wished we had this”) each role: usability consultant, graphic designer, community liaison/evangelist, project manager, professional software developer, security expert, quality assurance and testing expert]
A variety of expertise was needed for successful gateway development
n=2,756 respondents or 98% of application creators
NSF has recognized the importance of gateways as well
We’ve come a long way since 2004, baby
A successful gateway institute will provide leadership to:
– 1) bring science gateway developers together with each other and with the developers and operators of existing and potential cyberinfrastructure elements that science gateways integrate and enable the use of, in order to promote the efficient, effective, and sustainable development of scientific web and mobile interfaces;
– 2) educate developers and the next generation of investigators to effectively use the gateway software ecosystem to solve real research problems; and
– 3) educate the next generation of researchers to enable them to create the software cyberinfrastructure required to both advance fundamental understanding of science gateways and enable researchers to address the grand challenge problems of the future
Science Gateways Community Institute (est. Aug. 2016)
• Incubator
– Modeled after business incubators
– Diverse expertise on demand
• Extended Developer Support
– We help others build gateways and teach them in the process
• Scientific Software Collaborative
– Listing of both functional gateways and gateway software
• Community Engagement and Exchange
– Annual conference
– Gateways in the news
– Job postings
– International and inter-agency community building
– Campus expertise
• Workforce Development
– Student interns
– Gateways in the classroom
SGCI Highlights
• New developments in electron detectors and electron microscopes now provide images of macromolecules (protein, RNA, etc.) that can be determined to atomic resolution
• The inherently low signal-to-noise ratio means 300,000 images are needed for one object of interest
• HPC resources are needed to calculate atomic structures from these images
• Now discovering the structures of macromolecules that were previously unattainable using traditional methods
• The importance of these discoveries has brought global interest into our field from scientists without HPC training
Our first customer: Dr. Michael Cianfrocco, Cryo-EM gateway
Source: Michael Cianfrocco
The cryo-EM science gateway will offer users access to HPC resources, requiring only that they have raw cryo-EM data.
This will have a wide-ranging impact as national cryo-EM centers are coming online in the coming years, requiring that users have a location to process their data.
We are building a gateway that can handle all cryo-EM data, whereas, currently, every user has to navigate the complex work of HPC data analysis to either install software on local clusters or get access to national centers for data analysis.
Long term, we will incorporate workflows that will guide new users through the processing pipeline, helping them assess data quality along the way.
This pipeline will be the first of its kind.
Source: Michael Cianfrocco
The Institute allows us to expand our focus beyond HPC
• Sensor-based gateways
• Interfaces to instruments
– Telescopes, microscopes, ultracentrifuges, more
• Gateways that access data collections
• Citizen science
• Gateways that use clouds or campus resources
There is a whole wide world of gateways out there
Sage Bionetworks developing predictors of disease
• Synapse is an open computational platform used by Challenge teams spread across the globe to crowdsource questions in biology and medicine
http://sagebase.org/challenges/
The examples are endless
How do you find a gateway? We plan to design a marketplace, one that would interact with other marketplaces.
Vision for SGCI success: 5–10 years from now
• Science gateways form a vibrant community
– Inter-agency, international, collegial
• Creating gateways is easier
– Created with more thoughtfulness, so they are more sustainable
• Gateway developers have stable career paths
– More efficient environments on campuses
• Students are excited to stay in the sciences
• Radical changes in how research is conducted
Beyond the institute
• The effects of democratization on science
• Gateways’ role in reproducibility
Benefits of democratization: new areas of study
• Breakthroughs don’t always come from assembling the best and the brightest in a closed room
• Lowering barriers to resources encourages experimentation
– A 2010 study from MIT and UCSD compared research funded by the National Institutes of Health (NIH) and the non-profit Howard Hughes Medical Institute (HHMI)
– Riskier HHMI grants produced more innovative and influential research
Gateways’ role in reproducibility: exploring collaboration between SGCI and Whole Tale
• How can we design gateways in support of reproducible science? With ties to publishing?
Continually changing technologies
• Jupyter notebooks
• Gateways interfacing to other gateways
Jupyter notebooks: wonderful examples of interactive gateway development
https://anaconda.org/jbednar/nyc_taxi/notebook
• Additional work needed to support very large user communities? Very large data?
Gateways interfacing with other gateways
Science Gateway Platform as a Service (SciGaP) provides application programmer interfaces (APIs) to hosted generic infrastructure services that can be used by domain science communities to create Science Gateways.
RDF: a directed, labeled graph data format for representing information on the Web
SPARQL: a query language for RDF across diverse data sources
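To make the RDF and SPARQL definitions above concrete, here is a minimal, hand-rolled Python sketch. It is not an RDF library and not how SciGaP or any particular gateway works. Facts are stored as (subject, predicate, object) triples, each one a labeled, directed edge, and a query is a list of triple patterns whose `?`-prefixed terms are variables, which is the core idea behind a SPARQL basic graph pattern. The `ex:` names, the gateway facts, and the `match`/`query` helpers are all hypothetical illustrations.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Binding = Dict[str, str]       # variable name -> matched term

# Hypothetical example data: each triple is one directed edge
# subject --predicate--> object in a labeled graph.
GRAPH: List[Triple] = [
    ("ex:GatewayA", "ex:discipline", "ex:Phylogenetics"),
    ("ex:GatewayA", "ex:usesResource", "ex:XSEDE"),
    ("ex:GatewayB", "ex:usesResource", "ex:XSEDE"),
    ("ex:GatewayB", "ex:discipline", "ex:CoastalScience"),
    ("ex:GatewayC", "ex:usesResource", "ex:CampusCluster"),
    ("ex:GatewayC", "ex:discipline", "ex:CryoEM"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, graph: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every extension of `binding` that makes `pattern` equal some triple."""
    for triple in graph:
        extended = dict(binding)
        for p_term, g_term in zip(pattern, triple):
            if is_var(p_term):
                # An already-bound variable must agree; an unbound one gets bound.
                if extended.setdefault(p_term, g_term) != g_term:
                    break
            elif p_term != g_term:
                break
        else:
            # No mismatch in any position: this triple satisfies the pattern.
            yield extended


def query(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Join the patterns left to right, like a SPARQL basic graph pattern."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        results = [b for prev in results for b in match(pattern, graph, prev)]
    return results


if __name__ == "__main__":
    # Which gateways use XSEDE, and in which discipline?
    for row in query([("?g", "ex:usesResource", "ex:XSEDE"),
                      ("?g", "ex:discipline", "?d")], GRAPH):
        print(row)
```

Because the two patterns share `?g`, the second pattern only matches gateways already bound by the first, so GatewayC (which uses a campus cluster) drops out. In real SPARQL the same question would be written roughly as `SELECT ?g ?d WHERE { ?g ex:usesResource ex:XSEDE . ?g ex:discipline ?d }`, with a `PREFIX` declaration for `ex:`.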
• US workshops
– Gateway Computing Environments workshops since 2005
• European workshops
– International Workshop on Science Gateways since 2009
• Australian workshops
– IWSG-A since 2015
• Joint special issue journals combine submissions from all of the above
Final note: International collaborations
• Provide leadership on future directions for science gateways
• Facilitate awareness of international, regional and national developments in science gateways
• Identify and share best practice in the field
• Science Gateways Community Institute (USA)
• NeCTAR (Australia)
• NESI (New Zealand)
• Sci-GaIA (Africa)
• Academia Sinica Grid Computing Center (Taiwan)
• Software Sustainability Institute (UK)
• VRE4EIC (Europe)
• IWSG (Europe)
• CANARIE (Canada)
• Research Data Canada (Canada)
• IEEE Technical Area on Science Gateways (International)
International Coalition on Science Gateways: Michelle Barker, NeCTAR, providing leadership
http://www.icsciencegateways.org/
Thank you
• I’m looking forward to a great program today