DESCRIPTION
A very informal talk I gave to Hausi Muller's group at UVic in June 2009. I have included, without permission, slides from Daniel Hook's excellent presentation at SE-CSE 2009 (http://www.cs.ua.edu/~SECSE09/schedule.htm).
2. This presentation: Tell a good story
3. Get feedback from you
4. What am I missing?
5. "If we knew what we were doing, it wouldn't be called research, would it?" - attributed to Albert Einstein
6. What is climate modelling? A kind of computational science
7. What is computational science? (Wikipedia, computational science)
8. What is computational science?
9. Program outputs are the results of the experiment.
10. Scientific software development: my focus
11. What is climate modelling?
12. What is climate modelling? Source: Easterbrook, CUSEC'09 (Source: IPCC AR4, 2007)
13. What is climate modelling? Source: Easterbrook, CUSEC'09 (Source: IPCC AR4, 2007)
14. General Circulation Models. Source: Easterbrook, CUSEC'09 (Crown Copyright)
15. Scientific software development
16. Scientific software development
17. Verification and Validation
Science Review and Code Review
18. Code review by designated code owners
19. Model intercomparisons. Source: Easterbrook, CUSEC'09
20. Basic Validation Steps
Reproduce past climate change
Reproduce pre-historic climates
Source: Easterbrook, CUSEC'09
21. Validation Notes. Source: Easterbrook, CUSEC'09 (Crown Copyright)
22. Core problems with V&V
23. Validation notes: bit reproducibility. Core problems with V&V
24. In other words,
25. What does this mean for software quality?
26. (Note, we're not talking about model quality!)
27. How do we judge quality in scientific software?
28. Software Quality: We're used to thinking of the quality of software as how well it is designed and how well it matches our requirements.
29. Measuring Quality: Defect Density
30. Can we benchmark quality using defect density?
Preliminary observation: defect density for climate models is lower than comparably-sized industrial projects.
31. Hadley defect rates. Some comparisons:
NASA Space Shuttle: 0.1 failures/KLOC
Best military systems: 5 faults/KLOC
Worst military systems: 55 faults/KLOC
Apache: 0.5 faults/KLOC
XP: 1.4 faults/KLOC
Hadley's Unified Model: avg of 24 bug fixes per release; avg of 50,000 lines edited per release; 2 defects/KLOC make it through to released code
Source: Easterbrook, CUSEC'09
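To make the comparison concrete, here is a minimal sketch (Python; the function and the normalisation note are mine, while the per-release figures are the ones quoted above) of how a defects-per-KLOC number is computed:

    # The slide's figures can be normalised in different ways: defects per KLOC
    # of code *edited* in a release is not the same as defects per KLOC of the
    # released code base, so cross-project comparisons need a common denominator.
    def defect_density(defects_fixed: int, kloc: float) -> float:
        """Defects per thousand lines of code (KLOC)."""
        return defects_fixed / kloc

    # Illustrative only, reusing the per-release figures quoted above for the
    # Unified Model: ~24 bug fixes against ~50 KLOC of edited code.
    print(defect_density(24, 50.0))  # => 0.48 defects per KLOC of edited code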
32. Few Defects Post-release
33. Model crashes during a run
34. Model runs, but variables drift out of tolerance
35. Runs don't bit-compare (when they should); see the bit-compare sketch after this list
Subtle errors (model runs appear valid):
36. The right results for the wrong reasons (e.g. over-tuning)
37. Expected improvement not achieved
Source: Easterbrook, CUSEC'09
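The bit-compare check above amounts to a byte-for-byte comparison of the outputs of two runs that should be identical. A minimal sketch in Python, with hypothetical file names:

    import hashlib

    def file_digest(path: str) -> str:
        """SHA-256 of a file's raw bytes, read in chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical output files from two runs that should be bit-identical
    # (same code, same inputs, same configuration).
    run_a = "run_a/ocean_temperature.nc"
    run_b = "run_b/ocean_temperature.nc"
    if file_digest(run_a) == file_digest(run_b):
        print("runs bit-compare")
    else:
        print("runs do NOT bit-compare")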
38. Measuring Quality: Defect Density. So, is climate modelling software really that good?
39. On Benchmarking
41. How do we factor in severity?
Bug type: what counts as a bug in one project is not necessarily a bug in another.
42. No standards in the literature
43. What are we measuring?
44. We could ask ...
45. We could ask ...
46. What makes a piece of software bad?
47. How do you know when you're done?
48. How do you train newcomers?
49. or...
50. When have you had to delay releasing due to a bug? Why?
51. Tell me the story behind these and other bugs...
52. Questions to ask...
Finding Activity: What activity discovered the problem or defect?
Finding Mode: How was the problem or defect found?
Problem Type: What is the nature of the problem? If a defect, what kind?
Criticality: How critical or severe is the problem or defect?
Related Changes: What are the prerequisite changes?
....
Why did the bug go unnoticed?
Why is it important to have fixed this bug then?
How was the bug fixed? Why is the fix appropriate?
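One way to picture these questions is as the record collected for each defect story; this is a hypothetical sketch (Python), not an instrument from the talk:

    from dataclasses import dataclass

    @dataclass
    class DefectStory:
        finding_activity: str   # what activity discovered the problem (e.g. code review, test run)
        finding_mode: str       # how it was found (e.g. crash, drift, failed bit-compare)
        problem_type: str       # nature of the problem; if a defect, what kind
        criticality: str        # how critical or severe the problem is
        related_changes: str    # prerequisite changes
        narrative: str          # why it went unnoticed, why fixing it mattered, how and why the fix is appropriate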
53. Why study climate modellers?
54. Already have connections with CM groups
55. Preliminary data suggesting the quality of their models is high:
57. My study:
58. What do climate modellers do when coding to guarantee correctness?
59. What are their notions of quality with respect to code?
60. How can we benchmark computational scientists' code quality?
61. My study:
62. Discover through bug reports and version control comments
Defect density over releases (trends); see the mining sketch after this list
63. Breakout by defect types (but, what are they?)
64. Maybe: static fault density using automated tool
65. Examine several climate models (>3?)
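A minimal sketch of what that mining could look like (Python driving git; the repository name, release tags, and keyword heuristic are assumptions for illustration, not the talk's actual procedure):

    import subprocess

    def commits_between(repo: str, old_tag: str, new_tag: str) -> list[str]:
        """Commit subject lines between two release tags."""
        out = subprocess.run(
            ["git", "-C", repo, "log", "--pretty=%s", f"{old_tag}..{new_tag}"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.splitlines()

    def bugfix_count(subjects: list[str]) -> int:
        """Crude keyword heuristic for identifying bug-fix commits."""
        keywords = ("fix", "bug", "defect", "correct")
        return sum(any(k in s.lower() for k in keywords) for s in subjects)

    # Hypothetical repository and release tags for one climate model.
    subjects = commits_between("climate-model", "v7.0", "v7.1")
    print(f"{bugfix_count(subjects)} bug-fix commits out of {len(subjects)} in this release")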
66. My study:
67. Use questions given previously to guide conversation
68. Investigate the story of a defect and the judgement calls that were made
69. ~5 defect stories per climate modelling centre
70. Cross-case analysis
71. Outcomes
72. Future: a relevant quality benchmark. Benchmarking statistics for climate modelling code
Learn from CS; where can we help?
73. Questions?
74. Objective of the study?
75. Issues with the study itself?
76. What would I look for? Others?