Scientific Software: sustainability, skills & sociology
Neil Chue Hong, [email protected], Software Sustainability Institute
US/IAEA Workshop on Software Sustainability for Safeguards Instrumentation, Vienna
www.software.ac.uk
The Software Sustainability Institute
A national facility for cultivating world-class research through software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software
Supported by EPSRC Grant EP/H043160/1
Anatomy of my talk
SOFTWARE is … everywhere, hard to define, long-lived
… are IMPORTANT: context, reasons, people
Software is everywhere (even where you expect it)
Factories
Services
Cinema
Writing
Software is pervasive
Tamiflu binding to mutant influenza
A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies
Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ
J. Chem. Phys. (2011) vol. 134, pp. 054114
http://dx.doi.org/10.1063/1.3519057
Favouring of disease risk alleles
Selection at pleiotropic loci underlies disease co-occurrence in human populations. Navarro, Haley, Karosas et al. Submitted to Nature Genetics
Behind every great piece of science…

    # go through each SNP of interest
    for (my $x = 0; $x < scalar @pos; $x++) {
        # and then each downstream SNP of interest
        for (my $y = $x+1; $y < scalar @pos; $y++) {
            # if SNPs within our chosen distance (500kb) and both present in the haplotypes file
            if ((!($trait[$x] eq $trait[$y]))
                && (abs($pos[$x] - $pos[$y]) <= 500000)
                && (exists($legArrayPos{$pos[$x]}))
                && (exists($legArrayPos{$pos[$y]}))) {
                my $snp1ArrayPos = "";
                my $snp2ArrayPos = "";
                my $snp1All = "";
                my $snp2All = "";

                # create output file for this SNP pair
                my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt";
                print "$filename\n";
                unless (-e $filename) {
                    open(OUT, ">$filename");

                    #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP#########################
                    my $start = $pos[$y]-500000;
                    if ($start < 1) { $start = 1; }
                    my $end = $pos[$y]+500000;
                    if ($end > $chrLengths{$chr[$x]}) { $end = $chrLengths{$chr[$x]}; }
Software is long-lived (and outlasts computational hardware)
Architectural Dominance
Image courtesy PDES Inc
Slide from Sean Barker, BAE SYSTEMS, DPC Designed to Last
Computational Chemistry - CASTEP
From the first implementation of a DFT algorithm to a completely new code to community supported software
• Individual
• Group
• Consortium
• W/ industry
• Community
• Active
Software advances < hardware speedup
http://www.castep.org/
LOTAR: storing aeronautical models
Life of CAD System: 10 years
Time between CAD Versions: 6 months
Life of Product: 70 years +
[Figure: product lifecycle timeline, time axis marked at 10-year intervals up to 60 years. Labels: Production, Services, Spares, Modifications, Legal Liability. Markers: CAD Obsolete, CAD Forgotten.]
Image courtesy PDES Inc
Slide from Sean Barker, BAE SYSTEMS, DPC Designed to Last
So we have to maintain it…
• “The modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment” – IEEE definition
– Corrective maintenance: fixing faults
– Adaptive maintenance: adapting to changes in environment
– Perfective maintenance: meeting new/different user requirements
– Preventative maintenance: increasing maintainability
… because we cannot change this with process and practice alone …
• “Many of us have tried to discover ways to prevent code from becoming legacy. But … prevention is imperfect. Even the most disciplined development team, knowing the best principles, using the best patterns, and following the best practices will create messes from time to time. The rot still accumulates. It’s not enough to prevent the rot – you have to be able to reverse it.”
… so we work with what we have
• Identify change points
• Find test points
• Break dependencies
• Write tests
• Make changes and refactor
Testing, infrastructure, documentation are key
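The "write tests, then make changes and refactor" steps can be sketched with a small characterisation test. This is a minimal illustration, not a method from the talk: `legacy_window` is a hypothetical Python stand-in for the window-clamping logic in the Perl excerpt shown earlier, and `refactored_window` and `find_mismatches` are names invented here. The idea is to pin down what the existing code does before touching it, then check the rewrite against that recorded behaviour.

```python
def legacy_window(pos, chr_length, half_width=500_000):
    """Hypothetical stand-in for untouched legacy logic: clamp a window around pos."""
    start = pos - half_width
    if start < 1:
        start = 1
    end = pos + half_width
    if end > chr_length:
        end = chr_length
    return start, end


def refactored_window(pos, chr_length, half_width=500_000):
    """Candidate rewrite; it must behave exactly like legacy_window."""
    return max(1, pos - half_width), min(chr_length, pos + half_width)


# Test points chosen around the boundaries where the legacy code branches.
CASES = [
    (1, 2_000_000),          # window clipped at the start of the chromosome
    (500_001, 2_000_000),    # start lands exactly on position 1
    (1_000_000, 2_000_000),  # no clipping at either end
    (1_900_000, 2_000_000),  # window clipped at the end of the chromosome
]


def find_mismatches(cases=CASES):
    """Return every case where the rewrite disagrees with the legacy behaviour."""
    return [
        (pos, length)
        for pos, length in cases
        if refactored_window(pos, length) != legacy_window(pos, length)
    ]


if __name__ == "__main__":
    print("mismatches:", find_mismatches())
```

An empty mismatch list says only that the two versions agree on the chosen test points, which is why picking the test points comes before writing the tests.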
Software is hard to define (and thus hard to sustain)
What do we sustain:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
Novel reuse of public sector data
http://www.mysociety.org
What do we sustain:
- Map?
- Software that creates map?
Sustaining Function or Form
What do we sustain:
- Function?
- Form?
Context is important (otherwise all you have is an object)
Comb badge, Museum of London
• Without context, objects have no meaning
What’s this item?
32x28mm, lead alloy, late Medieval 14-15th century
What about repositories?
re·pos·i·tor·y
/noun/ [ri-poz-i-tawr-ee]
• 1. a receptacle or place where things are deposited, stored, or offered for sale.
• 2. a burial place; sepulchre.
The Zombie Effect
• Software is not always fully alive when you reanimate it!
• Complex set of dependencies
– Significant Properties of Software
– Purposes and benefits of software preservation
http://www.jisc.ac.uk/media/documents/programmes/preservation/significantpropertiesofsoftware-final.doc
http://softwarepreservation.jiscinvolve.org/wp/
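One way to make the "complex set of dependencies" visible before software is archived is to record the environment it ran in. The sketch below is an illustration only and is not taken from the reports above: the function name `environment_manifest` and the manifest layout are assumptions, and it covers only the Python interpreter and the installed Python packages, not system libraries, compilers or data.

```python
import json
import platform
from importlib import metadata


def environment_manifest():
    """Collect interpreter, platform and installed package versions (hypothetical layout)."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with unreadable metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Store this next to the code so a later "reanimation" knows what was there.
    print(json.dumps(environment_manifest(), indent=2))
```

A manifest like this does not bring the software back to life on its own, but it records part of the context that is otherwise lost.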
Reasons are important (so you take the right approach)
Why are you considering software sustainability?
Achieve legal compliance
Create heritage value
Enable continued access to data and services
Encourage software reuse
Purpose
How are you going to choose the right approach?
Preservation (techno-centric)
Emulation (data-centric)
Migration (functionality-centric)
Transition (process-centric)
Hibernation (knowledge-centric)
Approach
Preservation vs sustainability
Image courtesy of RBG Kew – not for reuse
Image courtesy of London Permaculture under CC-by-nc-sa license
Preservation?
Sustainability?
People are important (people are infrastructure too)
Sustainable Communities
• Cohesion and Identity: Creating a community
• Tolerance and Diversity: Smart growth through collaboration
• Efficient use of resources: Leveraging infrastructure
• Adaptability to change: Governing sustainably
Cultivate Contributors – R project
• Basics: Website, mailing list, code repository, issue resolution
• Remove barriers to participation, increase efficiency
• 1993: First public release; 2 devs
• 1995: Code open sourced; 3 devs
• 1996: r-testers list set up
• 1997: lists split: r-announce, r-help, r-devel; public CVS; 11 devs
• 2000: CRAN split and mirror
• 2001: BioConductor
• 2003: Namespaces
• 2005: i18n, l10n
• 2007: R-Forge
• Today: BioConductor (33 core devs), R-Forge (532 projects, 1562 devs), CRAN (1400+ packages)
http://cran.r-project.org/doc/html/interface98-paper/paper_2.html
We under-appreciate training
• Basic training for kitchen chef: 3-4 years
• Head chef: 10 years
• Basic training for s/w engineer: 3-4 years
• Architect: 10 years
Photo by ZagatBuzz
• Training in S/W Dev in UG Physics: 140 hours
• Training in S/W Dev in UG Geography: 0 hours
Software Carpentry
• Lab skills for scientific computing
– http://software-carpentry.org
– International initiative to teach basics of software engineering to researchers
• The “why” more than the “how”
– We ran 13 workshops in 2013 for 600+ learners
Incentives are important
Courtesy of James Howison and James Herbsleb, Incentives and Integration In Scientific Software Production
Rewrite by original team: address fragility
Fork to add specific functionality, maintained separately
Optimised for hardware: facilitate hardware sales
Exploit new techniques / architectures
And money isn’t everything
[Figure: funding/staffing plotted against time for the maintenance of software to process data from a physics experiment. Labels: Experiment running, Analysis of data, New experiment design starts, Next expt. running.]
So beware your bus factor
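A rough way to put a number on the bus factor is to ask how few contributors account for most of the work. The sketch below uses one common heuristic, assumed here and not defined in the talk: the smallest number of authors whose commits together exceed half of all commits. The function name `bus_factor`, the 0.5 threshold and the sample author lists are illustrative assumptions.

```python
from collections import Counter


def bus_factor(commit_authors, threshold=0.5):
    """Smallest number of authors whose commits exceed `threshold` of the total.

    `commit_authors` holds one author name per commit, for example the
    output of a version control log reduced to author names.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active until the threshold is passed.
    for n, (_author, commits) in enumerate(counts.most_common(), start=1):
        covered += commits
        if covered / total > threshold:
            return n
    return len(counts)


if __name__ == "__main__":
    one_person_project = ["alice"] * 8 + ["bob"] + ["carol"]
    shared_project = ["a"] * 3 + ["b"] * 3 + ["c"] * 3 + ["d"]
    print(bus_factor(one_person_project))  # 1
    print(bus_factor(shared_project))      # 2
```

Commit counts are only a proxy: they say nothing about who holds the undocumented knowledge, so a low number is a prompt to look closer and not a full diagnosis.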
Summary of my talk
SOFTWARE is … everywhere, hard to define, long-lived
… are IMPORTANT: context, reasons, people
Take home messages
No-one sets out to write unsustainable software
Software sustainability is important because it has to happen
People need the skills and incentives to maintain software through its lifetime
Work with us – www.software.ac.uk