The California Digital Library (CDL) has recently launched a new project dubbed Digital Curation for Excel (DCXL), funded by the Gordon and Betty Moore Foundation and Microsoft Research. The goal of the DCXL project is to facilitate data management, sharing, and archiving for earth, environmental, and ecological scientists. The project's main product will be an open source add-in for Microsoft Excel that will help scientists prepare their Excel data for sharing.
1. DCXL: Digital Curation for Excel. Funders: Gordon & Betty Moore Foundation, Microsoft Research. Carly Strasser, UC3, California Digital Library. 22 Sept 2011, UC3 Webinar Series.
2. Community engagement; Build on existing cyberinfrastructure; Create new cyberinfrastructure; Support communities
3. Roadmap: (1) An overview: why is DCXL needed? (2) Goals of DCXL project (3) Progress & future plans (4) How to get involved in DCXL
4. Digital data + complex workflows
5. Data: models (maximum likelihood estimation, matrix models), images, tables, paper
6. Ugly truth. Most earth, environmental, and ecological scientists: are not taught data management; don't know what metadata are; can't name data centers or repositories; don't share data publicly or store it in an archive; aren't convinced they should share data
7. 2 tables, random notes (from Stephanie Hampton (2010), ESA Workshop on Best Practices)
8. File name "Wash Cres Lake Dec 15 Dont_Use.xls" (from Stephanie Hampton (2010), ESA Workshop on Best Practices)
9. Collaboration and data sharing
10. What is this?
11. The path of research products: data, metadata (recreated from Klump et al. 2006)
12. Data reuse, data sharing, data management
13. The path of research products: data, metadata (recreated from Klump et al. 2006)
14. Barriers. Cost: time, personnel, software, hardware
15. Barriers. Cost: time, personnel, software, hardware. Culture of science: not the norm, lack of training, disparate data
16. Barriers. Cost: time, personnel, software, hardware. Culture of science. Loss of rights or benefits: misuse of data, missed opportunities, conflict
17. Barriers. Cost: time, personnel, software, hardware. Culture of science. Loss of rights or benefits. Lack of incentives: time-consuming and expensive, reward structure, few requirements
18. Roadmap: (1) An overview: why is DCXL needed? (2) DCXL project overview (3) Progress & future plans (4) How to get involved in DCXL
19. DCXL project goals: a transformation in the conduct of a segment of scientific research by enabling and promoting publishing, sharing, and archiving of tabular data. Increase interoperability (= sharing), publishability (= publishing), and archivability (= archiving). Focus on atmospheric, ecological, hydrological, and oceanographic data.
20. DCXL project goals: an open source and free Excel add-in. Add-in: a software program that extends the capabilities of larger programs; complements basic Excel functionality (from www.webopedia.com, www.ablebits.com)
21. DCXL add-in goals: archiving, sharing, and publishing (slide labels: easier, harder)
22. DCXL project deliverables: Excel add-in; publicly available source code; technical documentation; end user documentation; publicly available requirements; community
23. DCXL project outcomes: enable citation and allow credit; enable policy enactment; enable re-use by eliminating barriers; save time for researchers; encourage creation of extensions
24. Process. Assess needs: quantitative (surveys)
25. Process. Assess needs: quantitative (surveys, quick poll)
26. Process. Assess needs: quantitative (surveys, quick poll); qualitative (interviews)
27. Process. Assess needs; gather requirements. Recruitment tools: DCXL/data management seminars; listservs and email; blog, Facebook, Twitter; face-to-face interactions; flyers
28. Process. Assess needs; gather requirements. Locations: conferences; UC campus visits; remote/web-based
29. Process. Assess needs; gather requirements. Stakeholders and contributors: libraries; scientists; repositories; experts (MSR, GBMF); personnel on related projects
30. Process (diagram linking CDL, libraries, data centers, scientists, funders, and related projects through social media, email, seminars, flyers, and campus visits; quick poll, survey, and interviews lead to requirements)
31. Implementation: assess needs; gather requirements; build requirements document
32. Implementation: assess needs; gather requirements; build requirements document; build community (libraries, scientists, repositories, programmers/developers)
33. Timeline
2011:
- 26 Sept: DCXL kickoff meeting
- 7 Oct: Finalize requirements-gathering framework
- 9 Nov: 1st draft of requirements to MSR
- 30 Nov: 2nd draft of requirements to MSR
- 5-9 Dec: AGU meeting, San Francisco
- 15 Dec: Final requirements to MSR
2012:
- 16 Jan: Receive Excel add-in version 1
- 23 Jan: Rollout of Excel add-in version 1
- 16-19 Feb: AAAS meeting: add-in user testing
- 20-24 Feb: Ocean Sciences meeting: add-in user testing
- 26 Feb: 1st draft of updated requirements based on version 1 to MSR
- 2 Apr: Deliver updated requirements based on version 1 to MSR
- 28 May: Receive Excel add-in version 2
- 29 May-24 Jun: User testing of version 2
- 25 Jun: Rollout of Excel add-in version 2
- 7-10 July: CSEE meeting: add-in debut and demo
- 13 July: Final code, technical documentation, and requirements published
- 31 July: End user documentation published
34. Roadmap: (1) An overview: why is DCXL needed? (2) DCXL project overview (3) Progress & future plans (4) How to get involved in DCXL
35. Ecological Society of America Summer 2011 Meeting
36. ESA overview (55 surveys from a diverse group): everyone uses Excel; most use Excel for organizing raw data; most import spreadsheets into other programs for analysis; ~75% are embarrassed about using Excel; excitement about open source; minimal knowledge about data management, organization, and archiving
37. Operating system (bar chart; categories: Mac, PC, Linux)
38. Use Excel for... (bar chart; categories: sharing, other, analyses, statistics, visualization, organization; x-axis: number of respondents, out of 55)
39. How often do you use Excel? (bar chart; responses range from "Never" and "Rarely" to "Every day"; y-axis: number of respondents)
40. What features are used in Excel? (bar chart; features: comments, cell shading, macros, embedded formulas, headers, pivot tables, multiple tabs, multiple tables; x-axis: percent)
41. American Fisheries Society Summer 2011 Meeting (artwork: Ray Troll, trollart.com)
42. AFS overview (36 surveys from a diverse group): everyone uses Excel; most use it only for data organization and sharing; heavy MS Access use; 100% PC
43. How often do you use Excel? (bar chart; responses range from "Rarely" to "Every day"; y-axis: number of respondents)
44. Tasks performed in Excel? (bar chart; tasks: sharing data, simple calculations, statistics, visualizing data, organizing data; x-axis: % respondents, n = 36)
45. What should the add-in help you do? (bar chart; options: organize my data for my own use, organize my data for others to use, archive my data, create metadata, share my data publicly more easily, no opinion; y-axis: % respondents)
46. AFS overview (36 surveys from a diverse group): everyone uses Excel; most use it only for data organization and sharing; heavy MS Access use; 100% PC; data hoarders
47. Roadmap: (1) An overview: why is DCXL needed? (2) DCXL project overview (3) Progress & future plans (4) How to get involved in DCXL
48. Get involved: dcxl.cdlib.org. Now: general info, blog, forum, calendar. Later: requirements, documentation
49. Get involved: @dcxlCDL (Twitter); www.facebook.com/DCXLatCDL
50. Acknowledgements. CDL: Rachael Hu, Trisha Cruse, John Kunze, Tracy Seneca. MSR: Lee Dirks. GBMF: Chris Mentzel. Carly Strasser