

Recommendations from the Workshop on Research Data Lifecycle Management (RDLM 2011)
March 2013

Summary report for the National Science Foundation-sponsored workshop held July 18-20, 2011 at Princeton University

Introduction

The objective of the National Science Foundation (NSF)-sponsored workshop on Research Data Lifecycle Management (RDLM 2011) was to bring together researchers, campus information technology (IT) leaders, and library and archive specialists to discuss data lifecycle management for computational science and engineering research data. Participants worked to develop a common understanding of best practices and funding models for selecting, storing, describing, and preserving digital research data, including data generated from simulations and analysis conducted with high performance computing. The workshop was also designed to help cultivate collaboration between participating communities to stimulate ongoing improvement in the preservation and sharing of research data.

The inspiration for RDLM 2011 emerged during a series of discussions and meetings among national leaders in the academic computing field during the past several years. The need for academic institutions to focus on the long-term collection, management, and sustainability of research data was highlighted as both a strategic and tactical priority at the July 2008 workshop, “Developing a Coherent Cyberinfrastructure from Local Campus to National Facilities: Challenges and Strategies.” That workshop, held in Indianapolis, was co-sponsored by the EDUCAUSE Campus Cyberinfrastructure (CCI) working group and the Coalition for Academic Scientific Computation (CASC).

The 2008 workshop articulated a number of strategic and tactical goals. Among them were three major strategic recommendations for Information Life Cycle Management [1]:

• Funding agencies and institutions must fund both (1) operational implementations of data preservation to meet immediate needs and (2) research on data preservation and reuse to guide future activities.
• Federal agencies, research institutions, IT professional communities, and data management experts should develop, publish, and use standards for provenance, metadata, discoverability, and openness.
• Funding agencies, research institutions, and IT professional communities must collaborate to develop a combination of policy and financial frameworks to ensure the long-term maintenance of important data, beyond the length of any individual career.

[1] See Section 3.2, Information Life Cycle: Accessibility, Usability, and Sustainability, pp. 24-28, in the report for the July 2008 joint CASC/CCI workshop. This report, issued in February 2009, is available at: http://casc.org/papers/EPO0906.pdf

At subsequent meetings there was general agreement that progress on the above three recommendations was needed, and that the expertise of librarians and archivists is critical to future progress and success in this area. The NSF funded a proposal for the RDLM 2011 workshop.

Self-descriptions submitted by the one hundred workshop participants, both on-site at Princeton University and online, were roughly evenly distributed across researcher, research computing/IT professional, and librarian/archivist roles.

Recommendations

The following recommendations, summarizing the outcomes of seven workshop discussion sessions, were submitted to the NSF in September 2012.

Funding and Operation of Research Data Lifecycle Management

Research grants, researcher careers and research institutions have different durations, while the value of some research data may continue in perpetuity. This session addressed the questions: Are there reasonable funding and operation models available to provide for the continued storage and presentation of data throughout its lifecycle? If not, how can such models be developed and implemented in a supportable way?

Existing models discussed in this session included: the Princeton University DataSpace service “pay once, store forever” model, where a researcher is charged a fee based on the projected capital cost of storing that data forever; monthly payments, which could work well if the monthly cost is low, as long as the researcher remains at the institution; and a multi-phase funding model with, for example, seed funding from grants, maintenance and staff funding from central funds, and media funding paid for by the researcher. The library model for research data preservation was debated, as well as the community model represented by the DataONE initiative. Developing workflow models to assist researchers with data management was also discussed.
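The “pay once, store forever” idea can be illustrated with a short calculation. The sketch below is not the DataSpace formula, which this report does not give; it assumes, purely for illustration, that storage hardware is repurchased at a fixed interval and that its price falls by a constant fraction each year, so that the total of all future purchases converges to a finite one-time fee. The function name `pay_once_fee` and all parameter values are hypothetical.

```python
def pay_once_fee(initial_cost, annual_decline, refresh_years):
    """Return a one-time fee covering the first purchase and every later refresh.

    Illustrative assumptions (not taken from the report):
    - storage is repurchased every `refresh_years` years, forever;
    - the purchase price falls by the fraction `annual_decline` each year.
    Under these assumptions the purchases form a convergent geometric series.
    """
    if not 0.0 < annual_decline < 1.0:
        raise ValueError("annual_decline must be strictly between 0 and 1")
    if refresh_years < 1:
        raise ValueError("refresh_years must be at least 1")
    # Price of one refresh relative to the previous one.
    ratio = (1.0 - annual_decline) ** refresh_years
    # Closed form of initial_cost * (1 + ratio + ratio**2 + ...).
    return initial_cost / (1.0 - ratio)


if __name__ == "__main__":
    # Hypothetical inputs: first purchase of 100 units, price halves each year,
    # hardware replaced every year.
    print(pay_once_fee(100.0, 0.5, 1))
```

With a hypothetical first purchase of 100 units, a 50% annual price decline, and yearly replacement, the fee comes to 200 units, twice the first purchase. Slower price declines raise the fee sharply, which is why the assumed rate of decline matters so much in a model of this kind.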

The recommendations from this session include:

• The National Science Board or similar entities should be engaged to interact with funding agencies to determine whether data preservation activities should be covered by the indirect cost pool defined for academic institutions.
• Research community standards similar to records retention standards in business need to be clarified to allow future research data federation or partnerships to be established.
• An initiative should be undertaken to provide a common (interdisciplinary) definition of data.
• An initiative should be undertaken to outline phases of the data lifecycle.
• An initiative should be undertaken to develop a “taxonomy”, since terms such as “preservation” may mean different things to different communities.
• A research project should be performed to look at the existing collection of operational models for data lifecycle management.

Partnering Researchers, IT Staff, Librarians, and Archivists

The conversation relating to partnerships between researchers, IT staff, librarians and archivists covered many different approaches, and suggested there could be more sharing among institutions of the varying models and best practices. Virtual help desks and task forces, and groups that involve the office of sponsored research within an institution, were mentioned.

Ideas for activities included: workshops that target higher levels of institutional officials; sharing information about cross-training programs involving researchers, IT and library and archive groups; and implementing more training for graduate students. Creating rewards and recognition for staff and faculty at the interface between research and library functions was also discussed.

The idea of formally documenting requirements for different actors in data preservation was presented, for example to clarify how librarian requirements for metadata differ from discipline-specific metadata standards. It was proposed to have the NSF present a generic, high-level framework that shows how existing policies and procedures such as data management plans fit together. It was also proposed to have the NSF lead the development of unified data management standards and a basic model of data lifecycles.

The major recommendation from this session was:

• Complete a survey of organizational “models” or best practices for communication and interaction between researchers, IT staff, librarians and archivists.

Assessment and Selection of Research Data

The discussions in this session about the assessment and selection of research data included the questions: “Should it be a goal to keep all the data if possible? Do we have to worry about what to select if it is possible to keep all the data?”

The overarching theme of the discussion was that these are essentially community-based, discipline-specific decisions.

Criteria to consider when selecting data include the cost of not keeping the data, the cost of gathering the data, and the usability of the data format. While some metadata can be added at the time of use, there is a need for minimal provenance data. Metadata should not be viewed as fully static, and it should be updated regularly to accommodate things such as keyword changes. While researchers have the domain knowledge and expertise to assess and select data, librarians can pose important questions to guide the process. It is important to document what data has been deselected. There is also much that could be learned from non-digital areas; for example, managers of physical specimens and collections have extensive experience with assessment and selection.

The recommendations from this session include:

• Over the long term, create workflows for data collection that include metadata creation and data selection, which will make it easy for researchers to undertake these actions.
• Add data management to the academic research methodology curriculum, perhaps as an enhancement to existing research compliance training.
• Demonstrate to researchers how standardized archiving and retention policies will help make their lives easier.

Policy

The discussion of this topic began with existing policies, standards, and regulations. It was noted that policies, standards, and regulations are only useful if they are well known and generally adhered to. Another consideration is that data is only useful if it is available in a format that is usable. A clear understanding is needed of who owns data and who is responsible for preserving it and making it available (including metadata). Librarians have the expertise to assist with metadata, curation, and controlled vocabularies, and consortia should be considered as an alternative to creating repositories at every institution. The goal should be to reduce the time for discovery of data by other researchers. It was noted that university libraries and archives are more persistent than individual research centers or even disciplines.

Policy may change during the lifecycle of data, as may the style of management. A clear line was drawn between the management of active data and the preservation of static data. Standards are needed for the peer review of data. There is also a need to differentiate between capturing the exact data used to support a publication and correcting errors in the data (perhaps via versioning, which can be handled as metadata).

The recommendations from this session include:

• Universities should develop or clarify policies about data management, including recommendations such as where data should be deposited.
• Universities and related institutions should act in concert to develop policies about data ownership and responsibility in consultation with funding agencies.
• Organize a workshop for senior research officers (VPs or Deans for research and similar positions) and senior academic officers (Provosts and similar positions) to discuss data lifecycle management in order to elevate the visibility and importance of this topic with senior university administrators.
• Create a catalog of issues such as data ownership, data restrictions, etc. for all disciplines in a quick guide format.
• Organize a workshop for leaders of discipline communities to develop a common framework that could then be customized or extended to meet each community’s needs. Provide a list of issues and possible solutions to communities to meet their standards and needs.

Standards for Provenance, Metadata, Discoverability

The discussion of standards for provenance, metadata and discoverability began with current standards. The International Federation of Library Associations and Institutions (IFLA) has a useful standard of FISO (find, identify, select, obtain). DDI3 is a useful standard for the social and behavioral sciences in which all flow control is documented. While each community can develop its own International Organization for Standardization (ISO) standard for metadata in its discipline, the ideal would be a universal framework for which each discipline would determine which fields are needed for its area. Metadata and data should be packaged so that they can move from repository to repository, or repository to personal workspace. Using a unified format such as XML would enable librarians and archivists to use software tools to validate metadata. Librarians and archivists serve as stand-ins for future researchers. They should be made aware of software and format changes that will require them to translate from old to new formats.
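As a rough illustration of the kind of software check a unified XML format makes possible, the sketch below parses a metadata record and reports which required fields are absent or empty. The record layout and the field names (`title`, `creator`, `date`) are hypothetical and stand in for whatever fields a discipline would choose; a production workflow would more likely validate against a published schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields; each discipline would define its own.
REQUIRED_FIELDS = ("title", "creator", "date")


def missing_fields(xml_text, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a metadata record.

    Raises xml.etree.ElementTree.ParseError if the record is not well-formed XML.
    Only direct children of the root element are inspected.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for name in required:
        element = root.find(name)
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Made-up record used only to exercise the check.
    record = (
        "<record>"
        "<title>Example data set</title>"
        "<creator>A. Researcher</creator>"
        "</record>"
    )
    print(missing_fields(record))
```

Keeping the list of required fields separate from the checking logic mirrors the framework described above: the mechanism is shared, while each discipline decides which fields it needs.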

Ontologies will have a role in providing standard terminology for a domain and sharing definitions in a meaningful way. Basic standards for what data consists of are needed; there was disagreement in the group on whether one could get down to the variable or parameter level when providing metadata.

The recommendations from this session include:

• Create a framework to share and receive data and metadata across disciplines, with confidence in the quality of data and metadata. When doing so, describe using principles as opposed to specific technologies (for example, describe trees and relationships rather than specifying RDF).
• Design provenance metadata that cuts across different areas.
• Create strategies for capturing metadata at various points in the data lifecycle, in an automated way, if possible. For instance, with sensor data, sensors could be designed to create better metadata. A frequently-used approach is to create a data model with project members, build use cases, and identify milestones. Observing data practices is very valuable for learning about an area and translating practice in the field into tools.
• Use the approach demonstrated by the genome sequencing community of developing community metadata standards, and having agencies fund only those researchers who follow the standards.

Secure Research Data

There is great concern over the security of sensitive research data, and the large number of possibly overlapping policies covering research data can be daunting to institutions and officials in charge of research computing environments. While it would be appealing to have a minimal set of policies applied to information technology solutions that can be certified as compliant, in reality such certification from a vendor would be difficult because the technology solution may not encompass all the activities that an institution must carry out to adhere to the standard.

While disciplines such as social science and population research may require solutions such as non-networked computers in locked rooms to prevent unauthorized access, some institutions are developing secure computing environments that allow authorized researchers to access restricted datasets over the network. It would be useful to have an automated checklist approach to create such an environment.

The recommendations from this session include:

• Enquire about a national working group to guide compliance with various federal standards by research computing environments.
• Catalog solutions for remote access to restricted data.
• Catalog solutions for clinical translational study data (several university medical hospital/research groups have clinical translational study award (CTSA) programs).

Partnering Funding Agencies, Research Institutions, and Communities Combined with Industrial and Corporate Partnerships

The discussion started with consideration of projects where industry assistance would be required, but there would be no business model for profiting from such an environment. In the pay once, store forever model, liability does not decrease over time, so if industry were to consider providing such services it would have to be done in a way that assured the provider was exempt from legal responsibility for any loss of data. While industry may develop secure storage systems, they will not likely be rigorous enough for restricted research products. Workshop participants and the NSF should work together to determine the information they need from each other to better handle data management policies. There are challenges with joint industry and academic projects that are supported by funding agencies, but such projects are possible.

There was significant discussion about developing a model for a digital container that can be sustained through the ages, which somehow includes the social context in the metadata. Possibilities exist for a collaboration between industry and academia to “fill in the missing pieces” in the research data software stack and to share data center facilities.

The recommendations from this session include:

• Arrange for a trusted party to do a survey of cost models of the data lifecycle and preservation available through the vendor community, for comparison to storage options available through academic institutions and federal agencies.
• Provide suggestions and insight to NSF and other funding agency program directors on how to leverage regional, discipline-specific archives that already exist and how smaller schools with fewer resources can use larger institutions’ repositories.
• Engage funding agencies and vendors in discussions about potential networking and regional repository solutions for moving around large data.


Additional Material

Appended to this summary report are the workshop agenda, the list of workshop attendees, and a list of the thirty position papers submitted by workshop participants. At the time of writing, these items and additional workshop material are available at http://rcs.columbia.edu/rdlm. Information about a future workshop archive will be made available at this URL.

Acknowledgements

The organizing committee for the RDLM 2011 workshop consisted of: Committee Chair Curt Hillegas, Princeton University; Rajendra Bose, Columbia University; Kerstin Lehnert, Lamont-Doherty Earth Observatory, Columbia University; Clifford Lynch, Coalition for Networked Information; and Oren Sreebny, University of Chicago.

For volunteering to assist with events, we also thank: Vijay Agarwala, Penn State University; Stan Ahalt, RENCI (Renaissance Computing Institute), UNC Chapel Hill; Grace Agnew, Rutgers University; Serge Goldstein, Princeton University; and Thorny Staples, Smithsonian Institution.

We thank IBM and Dell for financial support of the workshop.

This material is based upon work supported by the National Science Foundation under Grant No. 1137007. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


NSF Workshop on Research Data Lifecycle Management
Princeton University, July 18-20, 2011

Agenda

Monday, July 18
12:00 pm              Afternoon Check-in at Nassau Inn, 10 Palmer Square, Princeton, NJ
3:30 pm – 4:30 pm     Orange Key Tour of Princeton University’s campus (Assemble in front of Frist Campus Center)
5:30 pm – 7:30 pm     Informal Reception at the Prospect House

Tuesday, July 19
8:00 am – 9:00 am     Continental Breakfast, Prince William Ballroom
8:30 am – 8:35 am     Welcome Remarks, James Stone, Princeton University
8:35 am – 9:00 am     Overview, Goals, and Brief Introductions, Prince William Ballroom
9:00 am – 9:25 am     "Issues surrounding enterprise data infrastructure and governance systems to support research at the University of Michigan -- A view from the Health System" - Brian Athey, University of Michigan
9:25 am – 9:50 am     “DataSpace: a funding and operational model for long-term preservation of research data” - Serge Goldstein, Princeton University
9:50 am – 10:15 am    Break, Prince William Ballroom/PWB
10:15 am – 10:40 am   “Taking AIM at Data Lifecycle Management” - Jose-Marie Griffiths, Bryant University
10:40 am – 11:05 am   “DataONE (Observation Network for Earth): Enabling New Science by Supporting the Management of Data Throughout its Life Cycle” - Bill Michener, University Libraries, University of New Mexico
11:05 am – 11:35 am   Lightning Round (three-minute presentations)
11:35 am – 12:00 pm   Preparation and selection for afternoon breakout sessions
12:00 pm – 1:00 pm    Lunch Buffet (Senior Room)
1:00 pm – 2:00 pm     Breakout Sessions
                      · Secure research data – Palmer Room
                      · Policy – Witherspoon Room
                      · Assessment and selection of research data – Senior Room
                      · Funding and operation of research data lifecycle management – PWB
2:00 pm – 3:00 pm     Breakout Sessions
                      · Partnering researchers, IT staff, librarians and archivists – PWB
                      · Standards for provenance, metadata and discoverability – Palmer Room
                      · Partnering funding agencies, research institutions and communities – Senior Room
                      · Industrial and corporate partnerships – Senior Room
3:00 pm – 3:30 pm     Break, Prince William Ballroom
3:30 pm – 4:45 pm     Reports from the Breakout Sessions
5:00 pm               Assemble in front of Nassau Inn to board bus to Rat’s Restaurant
5:30 pm – 9:00 pm     Reception and Dinner at Rat’s Restaurant; Tour of the Grounds for Sculpture (Rat’s Restaurant)
9:00 pm               Bus returns to the Nassau Inn


Wednesday, July 20
8:00 am – 8:30 am     Continental Breakfast, Prince William Ballroom
8:30 am – 8:45 am     Welcome and Agenda Review, Prince William Ballroom
8:45 am – 9:45 am     Panel Discussion – Vendor and Corporate Relationships
9:45 am – 10:15 am    Break, Prince William Ballroom
10:15 am – 11:15 am   Panel Discussion – Funding Agencies
11:15 am – 11:45 am   Open Discussion
11:45 am – 12:00 pm   Wrap-up
12:00 pm              Boxed lunches available at the assembly area outside Prince William Ballroom (PWB)
12:00 pm – 1:00 pm    Organizing Committee generates report writing assignments and deadlines, PWB
1:15 pm               Van leaves from in front of the Nassau Inn for tour of Princeton University’s new High Performance Computing Research Center (HPCRC)
1:30 pm – 2:30 pm     Tour of Princeton University’s new HPCRC
2:45 pm               Van returns to the Nassau Inn hotel

Speakers/Panelists

Brian Athey, Ph.D., Director, Academic Informatics, University of Michigan Medical School
Raymond Clarke, Enterprise Storage Consultant, Oracle America - Enterprise Solutions Group
Serge Goldstein, Ph.D., Associate Chief Information Officer and Director, Academic Services, Princeton University
Jose-Marie Griffiths, Ph.D., Vice President Academic Affairs, Bryant University
Michael Huerta, Ph.D., Associate Director for Program Development, National Institutes of Health
Imtiaz Khan, Software Specialty Architect - Information Management, IBM
Jeffrey Layton, Ph.D., Enterprise Technologist for HPC, Dell
William Michener, Ph.D., Professor and Director of DataONE, University Libraries, University of New Mexico
Jennifer Schopf, Ph.D., Program Officer, National Science Foundation
Donald Waters, Ph.D., Program Officer, The Andrew W. Mellon Foundation

Page 10: Recommendations,from,the,Workshop,on, Research,Data ... › ~rb2568 › rdlm › RDLM_2011_Recommendation… · Recommendations,from,the,Workshop,on, Research,Data,Lifecycle,Management,(RDLM,2011),!

W o r k s h o p o n R e s e a r c h D a t a L i f e c y c l e M a n a g e m e n t ( R D L M 2 0 1 1 ) A t t e n d e e L i s tJ u l y 1 8 - 2 0 , 2 0 1 1 , P r i n c e t o n U n i v e r s i t y , P r i n c e t o n , N J

7/18/2011 1

Agarwala Vijay Senior Director, Research Computing and CI Penn State UniversityAgnew Grace Associate University Librarian Rutgers University Ahalt Stan Director/Professor RENCI UNC Chapel HillAthey (S) Brian Professor University of MichiganBernstein Herbert Professor Computer Science Dowling CollegeBlaire Jay Director of Strategic Development University of MiamiBose (O) Rajendra Manager, Research Computing Services Columbia UniversityBrandt D. Scott Associate Dean for Research Purdue UniversityBrisson Erik Associate Director Boston UniversityCarroll Tim Director & Global Lead, Research Computing Solutions Dell Cheverie Joan Policy Specialist EDUCAUSEChilds Dee Deputy CIO Louisiana State UniversityClarke (P) Raymond Enterprise Storage Consultant Oracle America- Enterprise Solutions GroupCollie Aaron Digital Curation Librarian Michigan State UniversityCombs Jody Associate Dean of Libraries Vanderbilt Universityde la Cruz Gutierrez Manuel Digital Library University of HoustonDeumens Erik Director University of FloridaDressel Willow Assistant Engineering Librarian Princeton UniversityEsteva Maria Research Associate Texas Advanced Computing Center Faundeen John Archivist U.S. Geological SurveyFomenkov Marina Research Analyst University of California San DiegoFronczak Christine HPC Marketing Manager DellFurlough Michael Assistant Dean for Scholarly Communications and Co-

Director of the Office of Digital Scholarly PublishingPenn State University

Gaylord Clark Chief Information Officer Virginia Tech Transportation Institute Goldstein (S) Serge Associate CIO for Academic Services Princeton UniversityGrappone Todd AUL for Digital Initiatives and IT UCLAGreenbaum David Director of Research & Content Technologies University of California, BerkeleyGriffiths (S) Jose-Marie VPAA/University Professor Bryant UniversityGu Grace Instruction/Reference Librarian George Washington UniversityHanson Karen Digital Projects Librarian New York UniversityHires Will Assistant Librarian Louisiana State UniversityHillegas (O) Curt Director of Research Computing Princeton UniversityHswe Patricia Digital Collections Curator Penn State University Hudak David Program Director Ohio Supercomputer CenterHudson Michelle Science & Social Science Data Librarian Yale UniversityKidney Gary Director, Academic & Research Computing Rice UniversityKonomos Philip Head, Library Technology Arizona State UniversityKrishnamurthy Ashok Interim Co-Director Ohio Supercomputer CenterLangley Anne E. 
Head Librarian, Science and Technology Libraries Princeton UniversityLayton (P) Jeff Enterprise Technologist for HPC DellLehnert (O) Kerstin Senior Research Scientist Columbia UniversityLynch (O) Clifford Executive Director Coalition for Networked InformationMajchrzak Dan Director Research Computing University of South FloridaMayo Bob Director, Rensselaer Libraries Rensselaer Polytechnic InstituteMcAllister Stephen Director of Digital Information Strategy Dartmouth CollegeMcMullen Donald Senior Scientist University of KansasMichener (S) William Professor and Director of DataONE University of New MexicoMinor David Digital Preservation Manager UC San DiegoMizzy Danianne Engineering Librarian Columbia UniversityMonaco Gregory Director for Research & CI Initiatives Great Plains NetworkMoore Alyssa Account Executive Scientific ComputingMundrane Michael Deputy CIO University of California, BerkeleyMyers James Director, CCNI Rennselaer Polytechnic InstituteNabrzyski Jaroslaw Director University of Notre DameNeeman Henry Director of Supercomputing University of OklahomaOwen Kim Advanced Applications Coordinator North Dakota State UniversityParis Joseph Acting Director for Research Computing Northwestern UniversityParuchuri Ravi Asst. Director of Research and Advanced Computing Louisiana State UniversityRohrs Lynn Asst. Director, eSystems & HPC Services New York UniversitySallans Andrew Head of Strategic Data Initiatives University of Virginia LibrarySchopf (P) Jennifer Program Director NSF/WHOI

Sreebny (O) Oren Sr. Director, Emerging Technologies & Communications University of Chicago
Stacey Kimberly Research Computing Consultant University of Colorado - Boulder
Staples Thorny Director, Research and Scientific Data Smithsonian Institution
Steinhart Gail Research Data and Environmental Sciences Librarian Cornell University
Strasser Carly DCXL Project Manager California Digital Library
Stroop Jon P. Metadata Analyst, Library Princeton University
Tsinoremas Nicholas Director of Center for Computational Sciences University of Miami
Waters (P) Donald Program Officer Andrew W. Mellon Foundation
Whittaker Martha Director of Content Management George Washington University
Wittman Noah Program Manager, Research Hub University of California, Berkeley
Womack Ryan Data and Economics Librarian Rutgers University Libraries
Zysman Joel Director of HPC University of Miami

Online Attendees

Last Name First Name Job Title Institution

Aschenbrenner Andreas Senior Architect Göttingen State and University Library
Avery Bonnie Natural Resources Librarian Oregon State University Libraries
Boock Michael Head, Center for Digital Scholarship and Services Oregon State University Libraries
Bothmer Jim Asst. VP, Health Science/Director HSL Creighton University
Chapa Brie Data Manager University of Wisconsin-Madison
Clendinning David Director of Library Services West Virginia State University
Copenhagen Liz Records Manager Harvard University
Foster Ian Director, Computation Institute University of Chicago and Argonne National Lab
Hayes Barrie Bioinformatics Librarian University of North Carolina-Chapel Hill
Huerta (P) Michael Associate Director for Program Development National Library of Medicine
Kendall Skip Senior Electronic Records Analyst/Archivist Harvard University
Lake Sherry Scientific Data Consultant University of Virginia
McLaughlin Don Research Associate West Virginia University
Mitchell Victoria Social Science Data & Government Documents Librarian University of Oregon
Parham Susan Research Data Librarian Georgia Institute of Technology
Peters Dale Academic Computing Manager University of KwaZulu-Natal
Read David Records Analyst Harvard University
Rushing Amy Head Librarian, Digital Access Services University of Texas at Austin
Schmelz Lynne Librarian for the Sciences Harvard College Library
Seiffert Kurt Manager, Research Storage Indiana University
Sniffin-Marinoff Megan University Archivist Harvard University
Soderdahl Paul Director, Library IT University of Iowa Libraries
Ujda Leah Digital Services Librarian University of Wisconsin Madison
Westra Brian Science Data Services Librarian University of Oregon
White Darla Records Manager and Archivist Harvard University
Wright Stephanie Data Services Coordinator University of Washington Libraries

S - speaker

P - panel member or moderator

O - workshop organizing committee

RDLM 2011 Submitted position papers:

All workshop participants were encouraged to submit a position paper to gather input from the community and seed the conversations at the workshop. The position papers below have been made publicly available with permission from the authors.

Please note: the URLs provided below are temporary; we are planning for the workshop materials, including publicly accessible position papers, to be placed in a more permanent location with URLs that are more suitable for citation.

Agnew, G. and Womack, R. (2011). Managing Research Data Lifecycles through Context. Rutgers University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Agnew_Rutgers_RDLM2011.pdf

Bernstein, H. J., Folk, M. J., Benger, W., Dougherty, M. T., Eliceiri, K. W. and Schnetter, E. (2011). Communicating Scientific Data from the Present to the Future. Dowling College position paper. This work is licensed under a Creative Commons Attribution–Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/ Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Bernstein_Dowling_RDLM2011.pdf

Bose, R. and Mizzy, D. (2011). Research Data Storage Approaches at Columbia. Columbia University position paper. This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0) License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Bose_Columbia_RDLM2011.pdf

Brandt, D. S. (2011). Disambiguating the Role of Data Lifecycle Gatekeeper. Purdue University position paper. This work is licensed under a Creative Commons CC BY-ND License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Brandt_Purdue_RDLM2011.pdf

Brisson, E. (2011). Boston University position paper. Copyright 2011 Trustees of Boston University. This work is licensed under a Creative Commons License: http://creativecommons.org/licenses/by-nc-nd/3.0/ Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Brisson_BU_RDLM2011.pdf

Collie, A. (2011). Michigan State University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Collie_MSU_RDLM2011.pdf

Cruse, P., Kunze, J. and Strasser, C. (2011). An Excel Add-in to Make Scientific Data Publishable, Shareable and Archiveable. University of California Curation Center position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Cruse_UC_RDLM2011.pdf

Deumens, E., Taylor, L. N. F., Schipper, R. A., Botero, C., Garcia-Milian, R., Norton, H. F., Tennant, M. R., Acord, S. K. and Barnes, C. P. (2011). Research Data Lifecycle Management: Tools and guidelines. University of Florida position paper. This work is licensed under Creative Commons Attribution License CC BY. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Deumens_UF_RDLM2011.pdf

Esteva, M., Jordan, C., Urban, T., Walling, D. and Xu, W. (2011). An Evolving Model to Manage and Preserve Research Data Collections at the Texas Advanced Computing Center (TACC). Texas Advanced Computing Center (TACC) position paper. This work is licensed under a Creative Commons Attribution-NoDerivs (CC BY-ND) License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Esteva_TACC_RDLM2011.pdf

Faundeen, J. (2011). U.S. Geological Survey position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Faundeen_USGS_RDLM2011.pdf

Flynn, P., Thain, D., Morgan, E. L., Nabrzyski, J. and Skendzel, D. (2011). The Digital Assets Strategy and Management of Research Data at the University of Notre Dame. University of Notre Dame position paper. This work is licensed under a Creative Commons CC Attribution-NoDerivs License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Flynn_UND_RDLM2011.pdf

Fomenkov, M. and claffy, k. (2011). Internet measurement data management challenges. CAIDA, University of California, San Diego position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Fomenkov_UCSD_RDLM2011.pdf

Foster, I. (2011). Research Data Lifecycle Management as a Service. Computation Institute, University of Chicago and Argonne National Laboratory position paper. This work is licensed under a Creative Commons CC BY-ND License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Foster_UChicago_RDLM2011.pdf

Gaylord, C. (2011). The Scientific Data Warehouse and the Research Data Lifecycle Of the SHRP2 Naturalistic Driving Study. Virginia Tech Transportation Institute position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Gaylord_VT_RDLM2011.pdf

Goldstein, S. J. and Hillegas, C. (2011). DataSpace: A Funding Model for the Preservation and Sharing of Research Data. Princeton University position paper. This work is licensed under a Creative Commons CC BY-ND License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Goldstein_Princeton_RDLM2011.pdf

Gu, G. and Whittaker, M. (2011). The Role of the Public Services Librarian in a Campus Cyberinfrastructure Center: A Position Paper. George Washington University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Gu_GWU_RDLM2011.pdf

Guss, S. and Rohrs, L. (2011). New York University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Guss_NYU_RDLM2011.pdf

Hudak, D., Calyam, P., Wohlever, K. and Krishnamurthy, A. (2011). Research Data Services for Biomedical Science: Program status and upcoming opportunities. Ohio Supercomputer Center position paper. This work is licensed under a Creative Commons Attribution 3.0 Unported License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Hudak_OSC_RDLM2011.pdf

Hudson, M. (2011). Yale University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Hudson_Yale_RDLM2011.pdf

Kidney, G., Odegard, J. and Andrews, K. (2011). Data Lifecycle Management at Rice University. Rice University position paper. This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Kidney_Rice_RDLM2011.pdf

Konomos, P. (2011). Arizona State University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Konomos_ASU_RDLM2011.pdf

Mayo, B. (2011). Estimating Research Data Lifecycle Management Costs. Rensselaer Polytechnic Institute position paper. This work is licensed under a Creative Commons Attribution License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Mayo_RPI_RDLM2011.pdf

McMullen, D. F., Ludwig, D. and Monaco, G. (2011). Management and Preservation of Research Data: Whose Responsibility is it Anyway? University of Kansas and Kansas State University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/McMullen_Kansas_RDLM2011.pdf

Minor, D., Fleming, D., Kozbial, A. and Westbrook, B. (2011). The UC San Diego Digital Curation Program. UC San Diego Libraries and San Diego Supercomputer Center position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Minor_UCSD_RDLM2011.pdf

Paris, J. M. (2011). Research Data Lifecycle Management at Northwestern University. Northwestern University position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Paris_Northwestern_RDLM2011.pdf

Paruchuri, R., Tohline, J., Phillips, F., Childs, M., White, S., Armstrong, W., Hires, W. and Melancon, B. (2011). Research Data - Storage & Management at Louisiana State University. Louisiana State University position paper. This work is licensed under a Creative Commons CC BY-NC-SA License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Paruchuri_LSU_RDLM2011.pdf

Sallans, A. (2011). DMPTool: supporting the data lifecycle. University of Virginia position paper. This work is licensed under a Creative Commons Attribution–Share Alike 3.0 Unported License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Sallans_UV_RDLM2011.pdf

Schopf, J. (2011). Production Quality Data. Woods Hole Oceanographic Institution position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Schopf_NSF_RDLM2011.pdf

Staples, T. (2011). Towards a Virtual Environment for Supporting Research Activities at the Smithsonian. Smithsonian position paper. This work is licensed under a Creative Commons CC BY License. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Staples_Smithsonian_RDLM2011.pdf

Tsinoremas, N. F., Zysman, J., Mader, C. and Blaire, J. (2011). Data in Motion: a new paradigm in Research Data Lifecycle Management. University of Miami position paper. Temporary URL: http://www.columbia.edu/~rb2568/rdlm/Tsinoremas_UMiami_RDLM2011.pdf