14
HDF Server A Web API for HDF5 data John Readey The HDF Group [email protected]

HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group [email protected]

Embed Size (px)

Citation preview

Page 1: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

HDF  ServerA  Web  API  for  HDF5  data

John  ReadeyThe  HDF  [email protected]

Page 2: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

My  Background

Joined  HDF  Group  last  year

Previously  worked  at  Amazon.com(FBA,  AWS  groups)

Bezos’  Dictate:  Everything  must  be  a  service(http://tinyurl.com/ojjzmrx)

This  page  invokes  MANY  services

Page 3: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

What  is  HDF5?

Depends  on  your  point  of  view:• a C-­‐API• a  File  Format• a data  model

Think  of  HDF5  as  a  file  systemwithin  a  file.With  chunking  and  compression.Add  NumPy style  data  selection.

Page 4: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

HDF5  Features

Some  nice  things  about  HDF5:• Sophisticated  type  system• Highly  portable  binary  format• Hierarchal  objects  (directed  graph)• Compression• Fast  data  slicing/reduction• Attributes• Bindings  for  C,  Fortran,  Java,  Python

Page 5: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

Why  an  HDF5  Web  API?

Motivation  to  create  a  web  api:• Anywhere  reference-­‐able  data  – ie URI• Network  Transparency• Clients  can  be  lighter  weight• Support  Multiple  Writer/Multiple  Reader• Enable  Web  UIs• Increased  scope  for  features/performance  boosters• E.g.  in  memory  cache  of  recently  used  data  

Page 6: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

HDF  Server  Highlights

• Written  in  Python  using  Tornado  Framework  (uses  h5py  &  PyTables)• REST-­‐based  API• HTTP  request/responses  in  JSON• Full  CRUD  (create/read/update/delete)  support• Most  HDF5  features  (Compound  types,  Compression,  chunking,  links)• Self-­‐contained  web  server  • Open  Source• UUID  identifiers  for  Groups/Datasets/Datatypes• Very  easy  to  install/run• Python  client  lib  h5pyd  (h5py-­‐distributed)

Page 7: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

A  simple  diagram  of  the  REST  API

Page 8: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

What  makes  it  RESTful?• Client-­‐server  model• Stateless  – (no  client  context  stored  on  server)• Cacheable  – clients  can  cache  responses• Resources  identified  by  URIs  (datasets,  groups,  attributes,  etc)• Standard  HTTP  methods  and  behaviors:

Method Safe Idempotent Description

GET Y Y Get  a  description  of  a  resource

POST N N Create  a  new  resource

PUT N Y Create  a  new  named  resource

DELETE N Y Delete  a  resource

Page 9: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

Example  URI

http://tall.data.hdfgroup.org:7253/datasets/34…d5e/value?select=[0:4,0:4]

scheme domain port resource Query  param

• Scheme:  the  connection  protocol• Domain:  HDF5  files  on  the  server  can  be  viewed  as  domains• Port:  the  port  the  server  is  running  on• Resource:  identifier  for  the  resource  (dataset  values  in  this  case)• Query  param:  Modify  how  the  data  will  be  returned  

• (e.g.  hyperslab  selection)

http://tall.data.hdfgroup.org:7253/datasets/feef70e8-­‐16a6-­‐11e5-­‐994e-­‐06fc179afd5e/value?select=[0:4,0:4]

Note:  no  run  time  context!

Page 10: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

Demo

1.  REST  API,  one  client client server filehttp h5py

h5pyd server filehttp h5py

2.  h5pyd,  one  clientclient

h5pyd server filehttp h5py

client

h5pyd

client

h5pyd

client

h5pyd

client

h5pyd

client http

3.  h5pyd,  multiple  clients

Page 11: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

What’s  next:  – Web  UI

Page 12: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

What’s  next  – access  control

• Authentication   (you  are  who  you  say  you  are)• HTTPS  (cut  out  the  man  in  the  middle)• Authorization   (who  can  do  what)

• Per  resource  ACL’s

Security!    Needed  if  the  server  is  to  be  run  on  an  open  network

Page 13: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

What’s  next:  Scalable  Server• Support   any  sized  repository• Any  number  of  users• Any  request  volume• Provide  data  as  fast  as  the  client  can  pull  it  in• Targeted  for  public/private  cloud

Page 14: HDF$Server - s3.amazonaws.com · HDF$Server AWeb"API"for"HDF5"data John"Readey The"HDF"Group jreadey@hdfgroup.org

Try  it  out!:

https://github.com/HDFGroup/h5serv