Sociotechnical production systems for software in science

  • Slide 1
  • Sociotechnical production systems for software in science. James Howison and Jim Herbsleb. Institute for Software Research, School of Computer Science, Carnegie Mellon University; School of Information, University of Texas at Austin. http://james.howison.name/pubs/HowisonHerbsleb2011SciSoftIncentives.pdf
  • Slide 2
  • How does a cubic km of ice become a scientific paper?
  • Slide 3
  • First find some ice Image Credit: NASA
  • Slide 4
  • Build a big drill Image Credit: IceCube
  • Slide 5
  • and some Digital Optical Modules Image Credit: IceCube
  • Slide 6
  • Combine Image Credit: IceCube
  • Slide 7
  • Collect and filter data Image Credit: IceCube
  • Slide 8
  • Store and analyze it Image Credit: http://www.flickr.com/photos/theplanetdotcom
  • Slide 9
  • Simulate light in ice Photo credit: http://www.flickr.com/photos/rainman_yukky/
  • Slide 10
  • Simulate Atmosphere Image Credit: NASA
  • Slide 11
  • Model
  • Slide 12
  • Analyze
  • Slide 13
  • Plots
  • Slide 14
  • Publish
  • Slide 15
  • Software is everywhere
  • Slide 16
  • Enhancing reproducibility and correctness; Saving money; Driving innovation; Coalescing into widely used software platforms. All linked to software as information artifact: Re-playable, Re-useable, Extendable. An appealing vision of software
  • Slide 17
  • Yet software also has constraints. Maintenance (avoiding bit rot): software must be maintained. Synchronization work: kept in sync with complements and dependencies. Coordination: rapid development and changes can lead to breakdown. Path dependencies: easy to start, hard to architect for widespread use
  • Slide 18
  • How to achieve the Software Vision? Better technologies? Better engineering methods? Leadership/Norms/Ethics? Policy? Rewards?
  • Slide 19
  • A sociotechnical understanding: Understand software work in existing institutions of science. Specific Research Questions: What software is used? Who created and maintains it? What incentives drive its creation? Why is it trusted?
  • Slide 20
  • Method: Data. Route into complex practice. Chose paper as unit of analysis: Focal Paper. Trace back from paper to work that produced it. Semi-structured interviews, supported by artifacts (e.g., paper/methods and materials). Elicit workflow, focus on software work. Identify software authors/sources, and seek introductions. Qualitative analysis. Phenomenological exhaustion
  • Slide 21
  • Case 1: STAR Image Credit: RHIC
  • Slide 22
  • Our focal paper
  • Slide 23
  • Workflow
  • Slide 24
  • Software Production: 1. Employed core software development: professional software developers, ROOT4STAR framework. 2. Core simulation code: scientists undertaking service work. 3. Analysis code to get the plots: locally written, frozen at publication
  • Slide 25
  • Case 3: Bioinformatic microbiology Image Credit: http://www.flickr.com/photos/grytr
  • Slide 26
  • Studying the nitrogen cycle Image Credit: Focal Paper
  • Slide 27
  • A field revolutionized by software
  • Slide 28
  • Personal software infrastructure. Power user scripts. Personal competitive advantage: "that is something that most biologists can't do. period." Share methods, but do not share personal infrastructure code or actively support others. Methods and materials section should provide enough information; if not, he'll fix it. But "not going to do their homework for them"
  • Slide 29
  • Publishing on software. Tools potentially useful to others described in separate publications (software pubs). Ambivalence: Can you make a career out of this? Definitely. But: he's known for his software rather than his science; he's known for facilitating science rather than and some people have that reputation. Advise a student to do this? Yes, but if you happen to get a publication out of it and it becomes a tool that's widely used, then great, that's fantastic, better props for you, but there's a danger. Tool developers are greatly under-appreciated
  • Slide 30
  • Algorithm people. Self-described member of the algorithm people, as distinguished from biologists; Muscle: biology == strcmp(); Builds from scratch (avoid tricky dependencies); Obvious that they don't collaborate; Credit accrues to the original publications; Little credit in perceived incremental improvements; Politics of improvement acceptance at the mercy of; Competition is appropriate and productive
  • Slide 31
  • Software Production systems: Practice that is similar on four aspects: 1. Incentives for the work 2. The type of artifacts produced 3. The way it is organized 4. The logic of correctness
  • Slide 32
  • Context: Academic reputation system
  • Slide 33
  • Software as support
  • Slide 34
  • Collaboration service-work
  • Slide 35
  • Academic credit: Incidental software
  • Slide 36
  • Academic credit: Parallel software practice
  • Slide 37
  • Systemic threats to software vision. The type of software work needed to realize the cyberinfrastructure vision is poorly motivated: Invisible work (Star and Ruhleder). Especially, little incentive to collaborate: Project owned by initial creators; Initial publications receive citations; Extension dominated by fork-and-rename
  • Slide 38
  • Academic reputation and integration. James Howison and Jim Herbsleb (2013). Sharing the spoils: incentives and integration in scientific software production. ACM CSCW
  • Slide 39
  • Where to for science policy? Exhortations? Training? Forcing open source through funding lever? Risk of substituting logics of correctness: Kleenex code as open source? Risk of undermining appropriate competition. Turn scientists into open source community managers, when there is little reward for this work?
  • Slide 40
  • Scientific Software Network Map. But, you know, imagine it as a live, dynamic data set!
  • Slide 41
  • Techniques for measuring use. Software that reports its own use: instrumentation. Analysis of traces in papers: mentions, citations, characteristic artifacts. Analysis of collections of software: on supercomputing resources (TACC, NICS), through workflow systems (Galaxy, Pegasus, Taverna)
  • Slide 42
  • Contact: James Howison, http://james.howison.name. This material is based upon work supported by the US National Science Foundation under Grant No. 0943168.