
Sgci iwsg-a-10-10-16


Page 1: Sgci iwsg-a-10-10-16

science gateway /sī′ əns gāt′ wā′/ n. 1. an online community space for science and engineering research and education.

2. a Web-based resource for accessing data, software, computing services, and equipment specific to the needs of a science or engineering discipline.

Science Gateways: History, Successes, Path Forward

A tale of many slide templates :)

Thank you to Michelle Barker, Richard Sinnott and David Abramson for the invitation to speak

Page 2: Sgci iwsg-a-10-10-16

Remember 2004?

• Microsoft, AOL and Jeeves ruled the Web
• Facebook launched
  – More users today than were on the entire internet in 2004
• Google was the 5th most popular brand, behind AOL and Yahoo
• Time magazine recommends Friendster as website of the year
• I first started working with science gateways

Page 3: Sgci iwsg-a-10-10-16

Beginnings of the TeraGrid program

• TeraGrid develops its Deep, Wide and Open strategy
• For the first time, we are targeting not just the high-end HPC user community

Despite the technological progress of grid technology and deployment, only a minority of the scientific, engineering, and education community use today’s national computing infrastructure. Our WIDE strategy addresses this situation by working directly with specific community leaders who are building discipline-specific cyberinfrastructure capabilities and resources for their communities.

TeraGrid proposal, 2003

Page 4: Sgci iwsg-a-10-10-16

April 2006

Science Gateways: A new initiative for the TeraGrid

• Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
  – Resources
  – Users, from expert to K-12
  – Software stacks, policies
• Science Gateways
  – Provide “TeraGrid Inside” capabilities
  – Leverage community investment
• Three common forms:
  – Web-based portals
  – Application programs running on users' machines but accessing services in TeraGrid
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids

Workflow Composer

Page 5: Sgci iwsg-a-10-10-16

April 2006

But in the beginning, we had no services
We paid science teams to help us develop them

Science Gateway Prototype | Discipline | Science Partner(s) | TeraGrid Liaison
Linked Environments for Atmospheric Discovery (LEAD) | Atmospheric | Droegemeier (OU) | Gannon (IU), Pennington (NCSA)
National Virtual Observatory (NVO) | Astronomy | Szalay (Johns Hopkins) | Williams (Caltech)
Network for Computational Nanotechnology (NCN) and “nanoHUB” | Nanotechnology | Lundstrom (PU) | Goasguen (PU)
Open Life Sciences Gateway | Biomedicine and Biology | Schneewind (UC), Osterman (Burnham/UCSD), DeLong (MIT), Dusko (INRA) | Stevens (UC/Argonne)
Biology and Biomedical Science Gateway | Biomedicine and Biology | Cunningham (Duke), Magnuson (UNC) | Reed (UNC), Blatecky (UNC)
Neutron Science Instrument Gateway | Physics | Cobb (ORNL) | Cobb (ORNL)
Grid Analysis Environment | High-Energy Physics | Newman (Caltech) | Bunn (Caltech)
Transportation System Decision Support | Homeland Security | Stephen Eubanks (LANL) | Beckman (Argonne)
Groundwater/Flood Modeling | Environmental | Wells (UT-Austin), Engel (ORNL) | Boisseau (TACC)
Science Grid [GriPhyN/iVDGL/Grid3] | Multiple | Pordes (FNAL), Huth (Harvard), Avery (UFlorida) | Foster (UC/Argonne), Kesselman (USC-ISI), Livny (UW)

Page 6: Sgci iwsg-a-10-10-16

So how will we meet all these needs?

• With RATS! (Requirements Analysis Teams)
• Collection, analysis and consolidation of requirements to jump-start the work
  – Interviews with 10 Gateways
  – Common user models, accounting needs, scheduling needs
• Summarized requirements for each TeraGrid working group
  – Accounting, Security, Web Services, Software
• Areas for more study identified
• Primer outline for new Gateways in progress
• And milestones

April 2006

Page 7: Sgci iwsg-a-10-10-16

April 2006

Linked Environments for Atmospheric Discovery (LEAD)

• Providing the tools needed to make accurate predictions of tornadoes and hurricanes
• Data exploration and Grid workflow

Page 8: Sgci iwsg-a-10-10-16

Social Informatics Data Grid
Collaborative access to large, complex datasets

• SIDGrid is unique among social science data archive projects
  – Streaming data that change over time
    • Voice, video, images (e.g. fMRI), text, numerical (e.g. heart rate, eye movement)
  – Investigate multiple datasets, collected at different time scales, simultaneously
• Large data requirements
• Sophisticated analysis tools

Page 9: Sgci iwsg-a-10-10-16

Viewing multimodal data like a symphony conductor

• “Music-score” display and synchronized playback of video and audio files
  – Pitch tracks
  – Text
  – Head nods, pauses, gesture references
• Central archive of multimodal data, annotations, and analyses
  – Distributed annotation efforts by multiple researchers working on a common data set
• History of updates
• Computational tools
  – Distributed acoustic analysis using Praat
  – Statistical analysis using R
  – Matrix computations using Matlab and Octave

Source: Studying Discourse and Dialog with SIDGrid, Levow, 2008

Page 10: Sgci iwsg-a-10-10-16

Over the years, the program developed
I gave lots and lots of talks

Page 11: Sgci iwsg-a-10-10-16

Eventually we had a program
• And customers
• Starting in 2013, gateway users surpassed command-line users in XSEDE

[Chart series: Gateways, Login]

Page 12: Sgci iwsg-a-10-10-16

Proliferation of Science Gateways
These are some that use XSEDE supercomputers

Page 13: Sgci iwsg-a-10-10-16

Cyberinfrastructure for Phylogenetic Research (CIPRES)
PI Mark Miller, SDSC, www.phylo.org

• 210 US research universities
  – Harvard, Yale, UC Berkeley, Stanford, etc.
  – Non-PhD-granting colleges (including one all-women’s college, community colleges, and Hispanic-serving institutions)
• 3 K-12 school systems
• 43 non-governmental organizations
  – Museums, including the Smithsonian Institution, the American Museum of Natural History, and the Field Museum
  – Botanical gardens (e.g. Chicago, Rancho Santa Ana, and New York)
  – Institutes (e.g. JCVI and Broad)
• 10 US governmental agencies
  – Including NIH, USDA, NOAA, US Forest Service
• Curriculum delivery (76)
• 2000+ publications since 2010
• 47% of all XSEDE users in Q4 2015

Page 14: Sgci iwsg-a-10-10-16

CIPRES’ reach is deep and wide
Nature article, Feb 2016
Mass. state science fair, July 2012

Page 15: Sgci iwsg-a-10-10-16

Saving wetlands with the Simulocean science gateway
Football-field-sized parcel of land lost every hour

“It's important to enhance the collaboration among earth scientists, computer scientists, cyberinfrastructure specialists and coastal engineers tasked with solving the sustainability issues of deltaic coasts like those in Louisiana.”
Dr. Jian Tao, research scientist, LSU

Source: XSEDE External Relations

Page 16: Sgci iwsg-a-10-10-16

Some NSF programs even specify the use of gateways
This is the right direction to go! Gateways as cost-effective infrastructure

Page 17: Sgci iwsg-a-10-10-16

Despite many successes, there are still challenges
Gateways are often funded as 3-year research projects

• Developers typically
  – work in isolation
  – must bridge to a variety of resources
  – need building blocks in order to focus on higher-level functionality
  – struggle to secure sustainable funding

[Diagram, cycle: Early adopters → Publicity → Wider adoption → Funding ends → Scientists disillusioned → New project prototype]

Page 18: Sgci iwsg-a-10-10-16

In 2014, we sent a survey to 29,000 NSF PIs and academic CIOs and CTOs

5,000 responded

We wanted to understand both the importance of gateways and the challenges developers face

Page 19: Sgci iwsg-a-10-10-16

We learned that 88% of respondents felt Web-based applications were important to their work

Specialized Resources | Percent
Data collections | 75%
Data analysis tools, including visualization and mining | 72%
Computational tools | 72%
Tools for rapidly publishing and/or finding articles and data specific to my domain | 69%
Educational tools | 67%
Platforms for fostering group or community collaboration | 63%
Simplified interfaces that eliminate the need to learn coding | 62%
Citizen science and other public engagement resources | 47%
Workflows that automate or capture tasks or processes | 42%
Scientific instruments, such as telescopes, microscopes, or sensors | 39%

n=4,004, or 88% of 4,538 researcher/educators. Percentages indicate the share who rated each resource “somewhat” or “very” important to their work.

Page 20: Sgci iwsg-a-10-10-16

57% played some role in gateway creation
and these gateways were used for a variety of purposes

n of application types = 7,805, reported by 2,756 creators (out of 2,819); mean = 2.8 application types per application creator
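The footnote's mean is the total number of application types divided by the number of creators who reported them; the same counts also reproduce the "98% of application creators" share cited on the next slide. Worked out (rounding is ours):

$$
\frac{7{,}805 \text{ application types}}{2{,}756 \text{ creators}} \approx 2.83 \approx 2.8,
\qquad
\frac{2{,}756}{2{,}819} \approx 0.978 \approx 98\%
$$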

Page 21: Sgci iwsg-a-10-10-16

A variety of expertise was needed for successful gateway development

[Bar chart, y-axis 0% to 50%: for each role (Usability Consultant, Graphic Designer, Community Liaison/Evangelist, Project Manager, Professional Software Developer, Security Expert, Quality Assurance and Testing Expert), the share of application creators answering “Yes, we had this” and “Wished we had this”. Bar labels as they appear: 34%, 36%, 20%, 17%, 31%, 26%, 42%, 16%, 30%, 18%, 45%, 44%, 14%, 15%]

n=2,756 respondents, or 98% of application creators

Page 22: Sgci iwsg-a-10-10-16

NSF has recognized the importance of gateways as well

Page 23: Sgci iwsg-a-10-10-16

We’ve come a long way since 2004, baby

Page 24: Sgci iwsg-a-10-10-16

A successful gateway institute will provide leadership to
– 1) bring science gateway developers together with each other and with the developers and operators of existing and potential cyberinfrastructure elements that science gateways integrate and enable the use of,
  • in order to promote the efficient, effective, and sustainable development of scientific web and mobile interfaces;
– 2) educate developers and the next generation of investigators to effectively use the gateway software ecosystem to solve real research problems; and
– 3) educate the next generation of researchers to enable them to create the software cyberinfrastructure required to both advance fundamental understanding of science gateways and enable researchers to address the grand challenge problems of the future

Page 25: Sgci iwsg-a-10-10-16

Science Gateways Community Institute
Est. Aug. 2016

Page 26: Sgci iwsg-a-10-10-16

SGCI Highlights

• Incubator
  – Modeled after business incubators
  – Diverse expertise on demand
• Extended Developer Support
  – We help others build gateways and teach them in the process
• Scientific Software Collaborative
  – Listing of both functional gateways and gateway software
• Community Engagement and Exchange
  – Annual conference
  – Gateways in the news
  – Job postings
  – International and inter-agency community building
  – Campus expertise
• Workforce Development
  – Student interns
  – Gateways in the classroom

Page 27: Sgci iwsg-a-10-10-16

Our first customer
Dr. Michael Cianfrocco, Cryo-EM gateway

• New developments in electron detectors and electron microscopes now provide images of macromolecules (protein, RNA, etc.) from which structures can be determined to atomic resolution
• The inherently low signal-to-noise ratio means 300,000 images are needed for one object of interest
• HPC resources are needed to calculate atomic structures from these hundreds of thousands of images
• Now discovering the structures of macromolecules that were previously unattainable using traditional methods
• The importance of these discoveries has brought global interest in our field from scientists without HPC training

Source: Michael Cianfrocco

Page 28: Sgci iwsg-a-10-10-16

The cryo-EM science gateway will offer users access to HPC resources, requiring only that they have raw cryo-EM data.

This will have a wide-ranging impact as national cryo-EM centers come online in the coming years, requiring that users have a location to process their data.

We are building a gateway that can handle all cryo-EM data, whereas currently every user has to navigate the complex work of HPC data analysis, either installing software on local clusters or getting access to national centers for data analysis.

Long term, we will incorporate workflows that will guide new users through the processing pipeline, helping them assess data quality along the way.

This pipeline will be the first of its kind.

Source: Michael Cianfrocco

Page 29: Sgci iwsg-a-10-10-16

The Institute allows us to expand our focus beyond HPC

• Sensor-based gateways
• Interfaces to instruments
  – Telescopes, microscopes, ultracentrifuges, more
• Gateways that access data collections
• Citizen science
• Gateways that use clouds or campus resources

There is a whole wide world of gateways out there

Page 30: Sgci iwsg-a-10-10-16

Sage Bionetworks developing predictors of disease

• Synapse is an open computational platform used by Challenge teams spread across the globe to crowdsource questions in biology and medicine

http://sagebase.org/challenges/

Page 31: Sgci iwsg-a-10-10-16
Page 32: Sgci iwsg-a-10-10-16

The examples are endless

Page 33: Sgci iwsg-a-10-10-16

How do you find a gateway?
We plan to design a marketplace
One that would interact with other marketplaces

Page 34: Sgci iwsg-a-10-10-16

Vision for SGCI success
5-10 years from now

• Science gateways form a vibrant community
  – Inter-agency, international, collegial
• Creating gateways is easier
  – Created with more thoughtfulness, so they are more sustainable
• Gateway developers have stable career paths
  – More efficient environments on campuses
• Students are excited to stay in the sciences
• Radical changes in how research is conducted

Page 35: Sgci iwsg-a-10-10-16

Beyond the institute
• The effects of democratization on science
• Gateways’ role in reproducibility

Page 36: Sgci iwsg-a-10-10-16

Benefits of democratization
New areas of study

• Breakthroughs don’t always come from assembling the best and the brightest in a closed room
• Lowering barriers to resources encourages experimentation
  – 2010 study from MIT and UCSD compared research funded by the National Institutes of Health (NIH) and the non-profit Howard Hughes Medical Institute (HHMI)
  – Riskier HHMI grants produced more innovative and influential research

Page 37: Sgci iwsg-a-10-10-16

Gateways’ role in reproducibility
Exploring collaboration between SGCI and Whole Tale

• How can we design gateways in support of reproducible science? With ties to publishing?

Page 38: Sgci iwsg-a-10-10-16

Continually changing technologies
• Jupyter notebooks
• Gateways interfacing to other gateways

Page 39: Sgci iwsg-a-10-10-16

Jupyter notebooks
Wonderful examples of interactive gateway development

https://anaconda.org/jbednar/nyc_taxi/notebook

• Additional work needed to support very large user communities? Very large data?

Page 40: Sgci iwsg-a-10-10-16

Gateways interfacing with other gateways

Science Gateway Platform as a Service (SciGaP) provides application programmer interfaces (APIs) to hosted generic infrastructure services that can be used by domain science communities to create Science Gateways.

RDF: a directed, labeled graph data format for representing information in the Web
SPARQL: query language for RDF across diverse data sources
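The RDF and SPARQL definitions above are easier to picture with a toy example. The Python sketch below stores a few hypothetical triples (subject, predicate, object: each one a labeled, directed edge) and answers a basic graph pattern by joining triple patterns on shared variables, which is the core idea behind a SPARQL SELECT. It is an illustration only, not rdflib or any real SPARQL engine: it ignores prefixes, datatypes, and the rest of the language, and the `ex:` identifiers, the `ex:usesResource` and `ex:discipline` predicates, and the `query` helper are all made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# One RDF-style statement: (subject, predicate, object), i.e. a labeled,
# directed edge from the subject node to the object node.
Triple = Tuple[str, str, str]
# Maps query variables (terms starting with "?") to the values they matched.
Binding = Dict[str, str]

# Hypothetical example data; the "ex:" identifiers are illustrative only.
TRIPLES: List[Triple] = [
    ("ex:CIPRES", "ex:discipline", "ex:Phylogenetics"),
    ("ex:CIPRES", "ex:usesResource", "ex:XSEDE"),
    ("ex:Simulocean", "ex:usesResource", "ex:XSEDE"),
    ("ex:Simulocean", "ex:discipline", "ex:CoastalScience"),
]


def is_var(term: str) -> bool:
    """Terms beginning with '?' are variables, as in SPARQL syntax."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Match one pattern against one triple, extending an existing binding.

    Returns the extended binding, or None if a constant differs or a
    variable is already bound to a different value.
    """
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in extended and extended[p_term] != t_term:
                return None
            extended[p_term] = t_term
        elif p_term != t_term:
            return None
    return extended


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a list of triple patterns (a basic graph pattern).

    Each pattern filters and extends the bindings produced by the previous
    ones, so a variable shared between patterns acts as a join key.
    """
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    # Roughly analogous to the SPARQL query:
    #   SELECT ?g ?d WHERE { ?g ex:usesResource ex:XSEDE . ?g ex:discipline ?d }
    for row in query(
        [("?g", "ex:usesResource", "ex:XSEDE"), ("?g", "ex:discipline", "?d")],
        TRIPLES,
    ):
        print(row["?g"], row["?d"])
```

Running it prints each toy gateway that uses XSEDE alongside its discipline. The nested-loop join is chosen only for readability; a real deployment would query an RDF store through a SPARQL endpoint instead.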

Page 41: Sgci iwsg-a-10-10-16

Final note: International collaborations

• US workshops
  – Gateway Computing Environments workshops since 2005
• European workshops
  – International Workshop on Science Gateways since 2009
• Australian workshops
  – IWSG-A since 2015
• Joint special issue journals combine submissions from all of the above

Page 42: Sgci iwsg-a-10-10-16

International Coalition on Science Gateways
Michelle Barker, NeCTAR, providing leadership

• Provide leadership on future directions for science gateways
• Facilitate awareness of international, regional and national developments in science gateways
• Identify and share best practice in the field

• Science Gateways Community Institute (USA)
• NeCTAR (Australia)
• NeSI (New Zealand)
• Sci-GaIA (Africa)
• Academia Sinica Grid Computing Center (Taiwan)
• Software Sustainability Institute (UK)
• VRE4EIC (Europe)
• IWSG (Europe)
• CANARIE (Canada)
• Research Data Canada (Canada)
• IEEE Technical Area on Science Gateways (International)

http://www.icsciencegateways.org/

Page 43: Sgci iwsg-a-10-10-16

Thank you
• I’m looking forward to a great program today