Transcript
Page 1: Data Management and Open Access Creating Data Files for ...library.psfc.mit.edu/publishing/dmp/slides/josh.pdf · Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas February,

Data Management and Open Access
Creating Data Files for Published Figures

Josh Stillerman, Martin Greenwald, Mark London, Jason Thomas
February, 2016


Publishing Data for Figures

● The DOE requirement is not specific about exactly which data and metadata must be included with published figures.
  – We are interpreting the requirement to be:
    o The actual values plotted in the figure
    o Metadata about those values
      § Name, Description, Units
    o Metadata about how the data are displayed in the figure
      § Labels, Display Parameters
  – They are also not dictating how the data should be stored.
    o File Format / Data Organization …


PSFC Standardized Format

● Choosing a standard file format has several advantages:
  – Easier access for readers of the publication
  – Easier verification for librarians, curators, and sponsors
  – Slower obsolescence, and easier conversion as standards evolve
  – Standard general purpose tools for browsing and viewing contents.
● We have chosen HDF5
  – https://www.hdfgroup.org/HDF5/


PSFC Standardized Schema

● Using a standard file format is good, but not good enough.
  – If all of the data files for figures in PSFC publications were, for example, MS Excel:
    o This would not dictate the organization of labels, rows and columns in those spreadsheets.
    o In order to interpret one of them, a user would have to open the file interactively and attempt to understand the organization.
  – The same is true for HDF5, so
    o We have defined a standard HDF5 file organization to represent the data in published figures.
    o Easy access for all consumers (since the files are all the same in structure)
    o Easy to create from the programming languages in use at the PSFC.
      § IDL
      § PYTHON
      § MATLAB
      § This list can be expanded as needed.


PSFC Standardized Schema (2)

● One file per figure - the library system will name the file based on the publication's ID
  – Root level attributes: author, username, date, description, caption …
  – One Group per 'trace' displayed.
    o Group level attributes for this trace:
● One Group per set of data displayed
  – Group level attributes: name, legend string, plot-information
  – x_data – values for the X axis
    o Units, label
  – y_data – values for the Y axis
    o Units, label
  – z_data – values for the Z axis
    o Units, label
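The layout above can be sketched with the h5py library. This is only an illustration under stated assumptions, not the PSFC implementation: the helper name write_figure_file, the dictionary shape used for each trace, and the attribute keys ("legend", "plot_info", "units", "label") are hypothetical choices made to follow the bullet list, and the root level attributes are stored directly on the file root.

```python
import os
import tempfile

import h5py


def write_figure_file(path, root_attrs, traces):
    """Write one HDF5 file for one figure (hypothetical helper, not the PSFC API)."""
    with h5py.File(path, "w") as f:
        # Root-level attributes: author, username, date, description, caption, ...
        for key, value in root_attrs.items():
            f.attrs[key] = value
        # One group per trace displayed in the figure.
        for trace in traces:
            group = f.create_group(trace["name"])
            group.attrs["name"] = trace["name"]
            group.attrs["legend"] = trace["legend"]
            group.attrs["plot_info"] = trace["plot_info"]
            # x_data, y_data and (optionally) z_data, each with units and a label.
            for axis in ("x", "y", "z"):
                if axis not in trace:
                    continue
                values, units, label = trace[axis]
                dataset = group.create_dataset(f"{axis}_data", data=values)
                dataset.attrs["units"] = units
                dataset.attrs["label"] = label


if __name__ == "__main__":
    # Made-up demo values, written to a temporary directory.
    trace = {
        "name": "J0", "legend": "J0", "plot_info": "black line",
        "x": ([0.0, 0.2, 0.4], "s", "time (s)"),
        "y": ([1.0, 0.99, 0.96], "m", "height (m)"),
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Fig_demo.hdf5")
        write_figure_file(path, {"author": "John Doe", "caption": "demo"}, [trace])
        with h5py.File(path, "r") as f:
            print(list(f), dict(f.attrs))
```

Because every file follows the same group and attribute layout, a reader written once against this structure should work for any figure file produced the same way.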


Creating data files

● The time to create (or update) the data files is when the figures are being created
  – At that time, all of the data is available in some programming language.
  – It is much more likely the file will match the figure if it is created at that time.
● APIs are set up to mimic the plotting APIs.
● Files can be created and consumed in any programming language interchangeably
● Example in IDL
● Example in Python
● Other languages to follow
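One way to read "mimic the plotting APIs" is that the call which stores a trace takes the same arguments as the call which draws it, so both can sit side by side in the plotting script. The sketch below illustrates that idea with plain Python only. FigureRecord, Trace, and the keyword names are hypothetical and are not the PSFC library; a real version would write each recorded trace to the HDF5 file.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    name: str
    x: list
    y: list
    meta: dict = field(default_factory=dict)


@dataclass
class FigureRecord:
    """Collects traces through a plot-like call (hypothetical illustration)."""
    description: str
    traces: list = field(default_factory=list)

    def plot(self, x, y, fmt="-", label=None, **meta):
        # Same call shape as a typical plot(x, y, fmt, label=...) call,
        # but the data and metadata are recorded instead of drawn.
        x, y = list(x), list(y)
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        name = label if label is not None else f"trace{len(self.traces)}"
        info = {"plot_info": fmt, "legend": label}
        info.update(meta)
        trace = Trace(name, x, y, info)
        self.traces.append(trace)
        return trace


# Made-up demo values.
record = FigureRecord("demo figure")
record.plot([0, 1, 2], [1.0, 0.8, 0.2], "-b", label="J0", x_units="s", y_units="m")
record.plot([0, 1, 2], [0.0, 0.4, 0.6], "-g", label="J1", x_units="s", y_units="m")
print([t.name for t in record.traces])
```

Keeping the recording call next to the drawing call is what makes it likely that the stored values match the published figure.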


IDL - The figure


IDL

    file = 'Fig_1'
    fig_description = 'Bessel Functions J0, J1 and J2'
    fig_source = 'Phys. Plasmas 17, 1234 2010'
    comment = 'This is the way the ball bounces'
    user_fullname = 'John Doe'
    date = systime(0)

    ; set up a simple color table (just for plotting)
    r = [000, 255, 255, 000, 000]
    g = [000, 255, 000, 000, 255]
    b = [000, 255, 000, 255, 000]
    tvlct, r, g, b

    ; start a new hdf5 file
    hdf5_new, file=file, fig_description=fig_description, fig_source=fig_source, $
        comment=comment, user_fullname=user_fullname, date=date


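IDL (2)

    x_units = 's'
    x_axis = 'time (s)'
    x_name = 'measured with a stopwatch'
    x_type = 'float'

    y_units = 'm'
    y_axis = 'height (m)'
    y_name = 'measured with a ruler'
    y_type = 'float'

    legend = 'J0'

    ; compute and plot the first curve (you'll do this to create the plot file)
    x = indgen(100)/5.
    y0 = beselj(x, 0)
    plot, x, y0, charsize=1.8, title=fig_description, xtitle=x_axis, ytitle=y_axis, color=1
    xyouts, /norm, .9, .85, legend, size=1.8

    ; group_name and plot_graphics are not set on the original slide before
    ; this call; the values here follow the J0 group shown in the result listing
    group_name = legend
    plot_graphics = 'black line'

    ; add data group for this trace to file
    hdf5_add, x, y0, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics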


IDL (3)

    legend = 'J1'

    ; compute and plot the second curve
    y1 = beselj(x, 1)
    oplot, x, y1, color=2
    xyouts, /norm, .9, .8, legend, size=1.8, color=2

    group_name = legend
    plot_graphics = 'red line'

    ; add data group for this trace to file
    hdf5_add, x, y1, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics


IDL (4)

    legend = 'J2'

    ; compute and plot the third curve
    y2 = beselj(x, 2)
    oplot, x, y2, color=4
    xyouts, /norm, .9, .75, legend, size=1.8, color=4

    group_name = legend
    plot_graphics = 'green line'

    ; add data group for this trace to file
    hdf5_add, x, y2, file=file, group_name=group_name, $
        x_units=x_units, x_axis=x_axis, x_name=x_name, x_type=x_type, $
        y_units=y_units, y_axis=y_axis, y_name=y_name, y_type=y_type, $
        legend=legend, plot_graphics=plot_graphics


The Result

<HDF5 file "Fig_1.hdf5" (mode r, 12.4k)>
(File) /
    root (Group) /root
        ('user_fullname', 'John Doe')
        ('user_id', 'g')
        ('date', 'Thu Feb 4 13:52:10 2016')
        ('fig_description', 'Bessel Functions J0, J1 and J2')
        ('fig_source', 'Phys. Plasmas 17, 1234 2010')
        ('n_groups', 3)
        J0 (Group) /root/J0
            ('group1plotting information', 'black line')
            ('legend', 'J0')
            x_values (Dataset) /root/J0/x_values len = (100,)
                ('units', 's')
                ('axislabel', 'time (s)')
                ('datatype', 'float')
                ('nx', 100)
            y_values (Dataset) /root/J0/y_values len = (100,)
                ('units', 'm')
                ('axislabel', 'height (m)')
                ('datatype', 'float')
                ('ny', 100)
        J1 (Group) /root/J1
            ('group1plotting information', 'red line')
            ('legend', 'J1')
            x_values (Dataset) /root/J1/x_values len = (100,)
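The slides do not say which tool produced this listing. A similar dump of groups, datasets, and attributes can be sketched with h5py, assuming that library is available. The function name describe is hypothetical, and the way attribute values are printed depends on the installed h5py and NumPy versions.

```python
import h5py


def describe(path):
    """Return a line-per-item description of an HDF5 file (hypothetical helper)."""
    lines = []

    def visit(name, obj):
        # name is the path relative to the file root, e.g. "root/J0/x_values"
        indent = "    " * name.count("/")
        short = name.split("/")[-1]
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{indent}{short} (Dataset) /{name} len = {obj.shape}")
        else:
            lines.append(f"{indent}{short} (Group) /{name}")
        for key, value in obj.attrs.items():
            lines.append(f"{indent}    ({key!r}, {value!r})")

    with h5py.File(path, "r") as f:
        lines.append("(File) /")
        # visititems() does not visit the file root itself, so list its attributes here.
        for key, value in f.attrs.items():
            lines.append(f"    ({key!r}, {value!r})")
        f.visititems(visit)
    return lines
```

Calling print("\n".join(describe("Fig_1.hdf5"))) on a figure file would then show one line per group, dataset, and attribute.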


Python - The figure


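Python

    # linspace and the plotting calls used on the following slides are not
    # imported on the original slide, which assumes a pylab-style namespace
    from numpy import linspace
    from matplotlib.pyplot import plot, title, xlabel, ylabel, legend
    from scipy.special import jv
    from h5_data import h5_data

    file_name = 'Fig_4'
    fig_description = 'Bessel Functions J0, J1 and J2'
    fig_source = 'Phys. Plasmas 17, 1234 2010'
    comment = 'This is the way the ball bounces'
    user_fullname = 'John Doe'

    # Create the data file, with file level metadata
    hdf_file = h5_data("%s.hdf5" % (file_name,),
                       fig_description=fig_description,
                       fig_source=fig_source,
                       comment=comment,
                       user_fullname=user_fullname)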


Python (2)

    # Draw the first curve
    x = linspace(0, 20)
    y0 = jv(0, x)
    plot(x, y0, '-b', label='J0')
    x_units = 's'
    x_label = 'time (s)'
    y0_units = 'm'
    y0_label = 'height (m)'

    # Add the first curve to the file
    hdf_file.add_dataset('J0', x, y0,
                         legend=None,
                         plot_info='Blue Line',
                         x_units=x_units, x_label=x_label, x_datatype='float',
                         y_units=y0_units, y_label=y0_label, y_datatype='float')


Python (3)

    # Draw the second curve
    y1 = jv(1, x)
    plot(x, y1, '-g', label='J1')
    y1_units = 'm'
    y1_label = 'height (m)'

    # Add the second curve to the file
    hdf_file.add_dataset('J1', x, y1,
                         legend=None,
                         plot_info='Green Line',
                         x_units=x_units, x_label=x_label, x_datatype='float',
                         y_units=y1_units, y_label=y1_label, y_datatype='float')


Python (4)

    # Draw the third curve
    y2 = jv(2, x)
    plot(x, y2, '-r', label='J2')
    y2_units = 'm'
    y2_label = 'height (m)'
    title(fig_description)
    xlabel(x_label)
    ylabel(y0_label)

    # Add a legend
    legend(loc='upper right')

    # Add the third curve to the file
    hdf_file.add_dataset('J2', x, y2,
                         legend=None,
                         plot_info='Red Line',
                         x_units=x_units, x_label=x_label, x_datatype='float',
                         y_units=y2_units, y_label=y2_label, y_datatype='float')


The Result

<HDF5 file "Fig_1.hdf5" (mode r, 12.4k)>
(File) /
    root (Group) /root
        ('user_fullname', 'John Doe')
        ('user_id', 'g')
        ('date', 'Thu Feb 4 13:52:10 2016')
        ('fig_description', 'Bessel Functions J0, J1 and J2')
        ('fig_source', 'Phys. Plasmas 17, 1234 2010')
        ('n_groups', 3)
        J0 (Group) /root/J0
            ('group1plotting information', 'black line')
            ('legend', 'J0')
            x_values (Dataset) /root/J0/x_values len = (100,)
                ('units', 's')
                ('axislabel', 'time (s)')
                ('datatype', 'float')
                ('nx', 100)
            y_values (Dataset) /root/J0/y_values len = (100,)
                ('units', 'm')
                ('axislabel', 'height (m)')
                ('datatype', 'float')
                ('ny', 100)
        J1 (Group) /root/J1
            ('group1plotting information', 'red line')
            ('legend', 'J1')
            x_values (Dataset) /root/J1/x_values len = (100,)


END

