Data Management and Open Access Creating Data Files for Published Figures Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas February, 2016

Data Management and Open Access Creating Data Files for ...library.psfc.mit.edu/publishing/dmp/slides/josh.pdf · Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas February,


Data Management and Open Access
Creating Data Files for Published Figures

Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas
February, 2016

Publishing Data for Figures

● The DOE requirement is not specific about exactly which data and metadata must be included with published figures.
  – We are interpreting the requirement to be:
    o The actual values plotted in the figure
    o Metadata about those values
      § Name, Description, Units
    o Metadata about how the data are displayed in the figure
      § Labels, Display Parameters
  – They are also not dictating how the data should be stored.
    o File Format / Data Organization …

PSFC Standardized Format

● Choosing a standard file format has several advantages:
  – Easier access for readers of the publication
  – Easier verification for librarians, curators, and sponsors
  – Slower obsolescence, and easier conversion as standards evolve
  – Standard general purpose tools for browsing and viewing contents.
● We have chosen HDF5
  – https://www.hdfgroup.org/HDF5/

PSFC Standardized Schema

● Using a standard file format is good, but not good enough.
  – If all of the data files for figures in PSFC publications were, for example, MS Excel
    o This would not dictate the organization of labels, rows and columns in those spreadsheets.
    o In order to interpret one of them, a user would have to open the file interactively and attempt to understand the organization.
  – The same is true for HDF5, so
    o We have defined a standard HDF5 file organization to represent the data in published figures.
    o Easy access for all consumers (since they are all the same in structure)
    o Easy to create from the programming languages in use at the PSFC.
      § IDL
      § PYTHON
      § MATLAB
      § This list can be expanded as needed.

PSFC Standardized Schema (2)

● One file per figure - the library system will name the file based on the publication's ID
  – Root level attributes: author, username, date, description, caption …
  – One Group per 'trace' displayed.
    o Group level attributes for this trace:
● One Group per set of data displayed
  – Group level attributes: name, legend string, plot-information
  – x_data: values for the X axis
    o Units, label
  – Y_data: values for the Y axis
    o Units, label
  – Z_data: values for the Z axis
    o Units, label
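The layout outlined above can be sketched with the h5py package. This is a minimal illustration under stated assumptions, not the PSFC library: the function name `write_figure_file`, the dictionary keys, and the attribute names (`legend`, `plot_info`, `units`, `axislabel`, `n_groups`) are chosen here to resemble the outline, the Z axis is left out for brevity, and the sample values are placeholders.

```python
import os
import tempfile

import h5py  # third-party package, assumed to be installed


def write_figure_file(path, file_attrs, traces):
    """Write one figure's data: root attributes plus one group per trace."""
    with h5py.File(path, "w") as f:
        # Root-level attributes describe the figure as a whole.
        for key, value in file_attrs.items():
            f.attrs[key] = value
        f.attrs["n_groups"] = len(traces)

        # One group per trace; each axis becomes a dataset with its own metadata.
        for trace in traces:
            group = f.create_group(trace["name"])
            group.attrs["legend"] = trace["legend"]
            group.attrs["plot_info"] = trace["plot_info"]
            for axis in ("x", "y"):
                dataset = group.create_dataset(axis + "_values", data=trace[axis])
                dataset.attrs["units"] = trace[axis + "_units"]
                dataset.attrs["axislabel"] = trace[axis + "_label"]


# Hypothetical example data (placeholders, not values from any publication).
demo_path = os.path.join(tempfile.mkdtemp(), "Fig_demo.hdf5")
write_figure_file(
    demo_path,
    {"user_fullname": "John Doe", "fig_description": "placeholder figure"},
    [{
        "name": "trace_1", "legend": "trace_1", "plot_info": "black line",
        "x": [0.0, 0.5, 1.0], "x_units": "s", "x_label": "time (s)",
        "y": [0.0, 1.2, 1.9], "y_units": "m", "y_label": "height (m)",
    }],
)

# Read the file back to confirm the structure that was written.
with h5py.File(demo_path, "r") as f:
    n_groups_read = int(f.attrs["n_groups"])
    group_names = list(f.keys())
    x_shape = f["trace_1/x_values"].shape
    y_units_read = f["trace_1/y_values"].attrs["units"]
```

Keeping units and labels as attributes on each dataset, rather than in a separate table, means a reader who opens only one dataset still sees what it measures.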

Creating data files

● The time to create (or update) the data files is when the figures are being created
  – At that time, all of the data is available in some programming language.
  – It is much more likely the file will match the figure, if it is created at that time.
● APIs are set up to mimic the plotting APIs.
● Files can be created and consumed in any programming language interchangeably
● Example in IDL
● Example in Python
● Other languages to follow
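One way to picture an API that mimics the plotting API is a thin wrapper that draws a curve and records the same arrays in a single call, so the saved data cannot drift from the figure. The sketch below is an assumption made for illustration only: `RecordingPlotter`, `Trace`, and the `draw` callback are hypothetical names, and this is not the interface used by the IDL or Python examples in this deck.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Trace:
    """Data and metadata for one plotted curve."""
    name: str
    x: List[float]
    y: List[float]
    x_units: str = ""
    y_units: str = ""
    plot_info: str = ""


@dataclass
class RecordingPlotter:
    """Forwards each curve to a drawing function and keeps a copy of the data."""
    draw: Callable[[Sequence[float], Sequence[float], str], None]
    traces: List[Trace] = field(default_factory=list)

    def plot(self, x, y, label, x_units="", y_units="", plot_info=""):
        # Draw first, then record exactly the arrays that were drawn.
        self.draw(x, y, label)
        self.traces.append(Trace(label, list(x), list(y), x_units, y_units, plot_info))


# A list stands in for a real plotting backend so the sketch runs anywhere.
drawn = []
plotter = RecordingPlotter(draw=lambda x, y, label: drawn.append(label))

x = [i / 5.0 for i in range(100)]
plotter.plot(x, [v * v for v in x], "curve_a", x_units="s", y_units="m", plot_info="blue line")
plotter.plot(x, [2.0 * v for v in x], "curve_b", x_units="s", y_units="m", plot_info="red line")
```

After the last curve is drawn, `plotter.traces` holds everything needed to write the data file, in the same order as the plot calls.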

IDL - The figure

IDL

    file = 'Fig_1'
    fig_description = 'Besel Functions J0, J1 and J2'
    fig_source = 'Phys. Plasmas 17, 1234 2010'
    comment = 'This is the way the ball bounces'
    user_fullname = 'John Doe'
    date = systime(0)

    ; set up a simple color table (just for plotting)
    r = [000, 255, 255, 000, 000]
    g = [000, 255, 000, 000, 255]
    b = [000, 255, 000, 255, 000]
    tvlct, r, g, b

    ; start a new hdf5 file
    hdf5_new, file=file, fig_description=fig_description, fig_source=fig_source, $
        comment=comment, user_fullname=user_fullname, date=date

IDL (2)

    x_units = 's'
    x_axis = 'time (s)'
    x_name = 'measured with a stopwatch'
    x_type = 'float'

    y_units = 'm'
    y_axis = 'height (m)'
    y_name = 'measured with a ruler'
    y_type = 'float'

    legend = 'J0'

    ; compute and plot the first curve (you'll do this to create the plot file)
    x = indgen(100)/5.
    y0 = beselj(x, 0)
    plot, x, y0, charsize=1.8, title=fig_description, xtitle=x_axis, ytitle=y_axis, color=1
    xyouts, /norm, .9, .85, legend, size=1.8

    ; name the group after the legend and describe how the trace is drawn
    group_name = legend
    plot_graphics = 'black line'

    ; add data group for this trace to file
    hdf5_add, x, y0, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics

IDL (3)

    legend = 'J1'

    y1 = beselj(x, 1)
    oplot, x, y1, color=2
    xyouts, /norm, .9, .8, legend, size=1.8, color=2

    group_name = legend
    plot_graphics = 'red line'

    hdf5_add, x, y1, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics

IDL (4)

    legend = 'J2'

    ; compute and plot the third curve
    y2 = beselj(x, 2)
    oplot, x, y2, color=4
    xyouts, /norm, .9, .75, legend, size=1.8, color=4

    group_name = legend
    plot_graphics = 'green line'

    ; add data group for this trace to file
    hdf5_add, x, y2, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics

The Result

    <HDF5 file "Fig_1.hdf5" (mode r, 12.4k)> (File) /

    root (Group) /root
        ('user_fullname', 'John Doe')
        ('user_id', 'g')
        ('date', 'Thu Feb 4 13:52:10 2016')
        ('fig_description', 'Besel Functions J0, J1 and J2')
        ('fig_source', 'Phys. Plasmas 17, 1234 2010')
        ('n_groups', 3)

    J0 (Group) /root/J0
        ('group1plotting information', 'black line')
        ('legend', 'J0')

    x_values (Dataset) /root/J0/x_values len = (100,)
        ('units', 's') ('axislabel', 'time (s)') ('datatype', 'float') ('nx', 100)

    y_values (Dataset) /root/J0/y_values len = (100,)
        ('units', 'm') ('axislabel', 'height (m)') ('datatype', 'float') ('ny', 100)

    J1 (Group) /root/J1
        ('group1plotting information', 'red line')
        ('legend', 'J1')

    x_values (Dataset) /root/J1/x_values len = (100,)
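Because every file shares the layout shown in this listing, one generic routine can summarize any of them. The sketch below works on a small in-memory stand-in rather than a real HDF5 file, so it runs without an HDF5 library. The `Node` class, `make_trace`, and `summarize` are hypothetical names introduced for illustration, and the array contents are placeholder zeros.

```python
class Node:
    """Hypothetical stand-in for an HDF5 group (has children) or dataset (has values)."""

    def __init__(self, attrs=None, children=None, values=None):
        self.attrs = dict(attrs or {})
        self.children = dict(children or {})
        self.values = values  # None for a group, a list for a dataset


def summarize(root):
    """Return one record per trace group: legend plus each dataset's length and units."""
    records = []
    for group_name, group in root.children.items():
        record = {"name": group_name, "legend": group.attrs.get("legend")}
        for dataset_name, dataset in group.children.items():
            if dataset.values is not None:
                record[dataset_name] = (len(dataset.values), dataset.attrs.get("units"))
        records.append(record)
    return records


def make_trace(legend, n_points):
    """Build a placeholder trace group with x and y datasets of n_points zeros."""
    return Node(
        attrs={"legend": legend},
        children={
            "x_values": Node({"units": "s"}, values=[0.0] * n_points),
            "y_values": Node({"units": "m"}, values=[0.0] * n_points),
        },
    )


root = Node(
    attrs={"n_groups": 2},
    children={"J0": make_trace("J0", 100), "J1": make_trace("J1", 100)},
)
summary = summarize(root)
for record in summary:
    print(record)
```

The routine never needs to know which figure it is reading; it relies only on the shared convention of one group per trace with attribute-carrying datasets inside.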

Python - The figure

Python

    from scipy.special import jv
    from h5_data import h5_data
    # plotting names used on the following slides (assumes a pylab-style environment)
    from pylab import linspace, plot, title, xlabel, ylabel, legend

    file_name = 'Fig_4'
    fig_description = 'Besel Functions J0, J1 and J2'
    fig_source = 'Phys. Plasmas 17, 1234 2010'
    comment = 'This is the way the ball bounces'
    user_fullname = 'John Doe'

    # Create the data file, with file level metadata
    hdf_file = h5_data("%s.hdf5" % (file_name,),
                       fig_description=fig_description,
                       fig_source=fig_source,
                       comment=comment,
                       user_fullname=user_fullname)

Python (2)

    # Draw the first curve
    x = linspace(0, 20)
    y0 = jv(0, x)
    plot(x, y0, '-b', label='J0')
    x_units = 's'
    x_label = 'time (s)'
    y0_units = 'm'
    y0_label = 'height (m)'

    # Add the first curve to the file
    hdf_file.add_dataset('J0', x, y0,
                         legend=None,
                         plot_info='Blue Line',
                         x_units=x_units,
                         x_label=x_label,
                         x_datatype='float',
                         y_units=y0_units,
                         y_label=y0_label,
                         y_datatype='float')

Python (3)

    # Draw the second curve
    y1 = jv(1, x)
    plot(x, y1, '-g', label='J1')
    y1_units = 'm'
    y1_label = 'height (m)'

    # Add the second curve to the file
    hdf_file.add_dataset('J1', x, y1,
                         legend=None,
                         plot_info='Green Line',
                         x_units=x_units,
                         x_label=x_label,
                         x_datatype='float',
                         y_units=y1_units,
                         y_label=y1_label,
                         y_datatype='float')

Python (4)

    # Draw the third curve
    y2 = jv(2, x)
    plot(x, y2, '-r', label='J2')
    y2_units = 'm'
    y2_label = 'height (m)'
    title(fig_description)
    xlabel(x_label)
    ylabel(y0_label)

    # Add a legend
    legend(loc='upper right')

    # Add the third curve to the file
    hdf_file.add_dataset('J2', x, y2,
                         legend=None,
                         plot_info='Red Line',
                         x_units=x_units,
                         x_label=x_label,
                         x_datatype='float',
                         y_units=y2_units,
                         y_label=y2_label,
                         y_datatype='float')

The Result

    <HDF5 file "Fig_1.hdf5" (mode r, 12.4k)> (File) /

    root (Group) /root
        ('user_fullname', 'John Doe')
        ('user_id', 'g')
        ('date', 'Thu Feb 4 13:52:10 2016')
        ('fig_description', 'Besel Functions J0, J1 and J2')
        ('fig_source', 'Phys. Plasmas 17, 1234 2010')
        ('n_groups', 3)

    J0 (Group) /root/J0
        ('group1plotting information', 'black line')
        ('legend', 'J0')

    x_values (Dataset) /root/J0/x_values len = (100,)
        ('units', 's') ('axislabel', 'time (s)') ('datatype', 'float') ('nx', 100)

    y_values (Dataset) /root/J0/y_values len = (100,)
        ('units', 'm') ('axislabel', 'height (m)') ('datatype', 'float') ('ny', 100)

    J1 (Group) /root/J1
        ('group1plotting information', 'red line')
        ('legend', 'J1')

    x_values (Dataset) /root/J1/x_values len = (100,)
