Tales of the Field: Building Small Science Cyberinfrastructure
Andrea Wiggins
iSchool @ Syracuse University
31 October, 2009
Free/Libre Open Source Software
• FLOSS development
– Large-scale social phenomenon of “collaborative” software development
• Observing FLOSS research
– Reflexive examination of small scholarly community studying FLOSS development
– Specifically working on building CI for FLOSS research
http://www.flickr.com/photos/pmtorrone/304696349/
eScience Proof of Concept
• (Some) FLOSS research is a good candidate for eScience approaches to doing the work
– Lots of data due to scale of phenomenon
– Research community ethos of sharing
• Data repositories
• Research paper archive
• Analysis artifacts
FLOSS Research Community
• Little Science
– Interdisciplinary: primarily software engineering, but also social sciences across a wide spectrum
– Fairly small community: under 500 researchers worldwide
http://www.flickr.com/photos/circulating/997909242/
FLOSS Data
• Many types of data, focus here on digital “trace” data
– Archival, secondary
– By-product of FLOSS work, easy to get but hard to use
• Federated repositories of repositories (RoRs)
– Data for research drawn from hosting “forges”
– ~1 TB across 3 RoRs
http://www.flickr.com/photos/smiteme/2379630899/
Research Methods & Tools
• Methods used with RoR data vary, but are generally quantitative
– Correlational studies
– Longitudinal analysis
– Code metrics
• Two main approaches
– Bespoke scripts or tools
– eScience workflow tools
Barriers to Uptake
• Little Science
– Lack of agreement over epistemology, RQs, methods, tools
– Researcher isolation, few incentives to collaborate
• Bimodal distribution of skills
– “I can’t possibly do that! I can’t write code!”
– “Why bother? I just write my own Python script; you should too.”
http://www.flickr.com/photos/noner/1739876378/
Technology Skills Required
• Taverna
• SVN
• (more) SSH, Unix terminal, XML
• R, plus packages
• SQL, relational DB management
• Java & Eclipse (just enough)
• OWL, RDF, SPARQL
• Knowledge of opaque data sources
Implications for Small Sciences
• Critical mass
– Need stewardship, dedicated resources
• Skills gap
– eScience tools require fairly high technology competency
• Convergence of research
– Common questions, modes of research
• Motivations to contribute
– Academic credit
http://www.flickr.com/photos/askpang/327577395/
Potential Solutions
• $$$
– Maintaining and developing resources is not free, even if they are freely shared
• Curricular integration
– Broaden contributor base by drawing on students through coursework
• Deliberately cultivate a community
– Train PhD students early in their studies
• Mechanisms to incentivize contribution
Conclusions
• Without external imperatives, CI for little science seems unlikely to emerge unaided
• CI requires standardization and movement toward normal science, which may be premature or simply inappropriate for many social sciences
• Benefits for early adopters: tools support efficient collaboration, enable rigorous research provenance, permit analysis replication, and speed time to results