Tales of the Field: Building Small Science Cyberinfrastructure


Society for Social Studies of Science cyberinfrastructure methods panel presentation on experiences building small science cyberinfrastructure, with reflections on the implications for other pre-paradigmatic domains.

Tales of the Field: Building Small Science Cyberinfrastructure

Andrea Wiggins

iSchool @ Syracuse University

31 October, 2009

Free/Libre Open Source Software

• FLOSS development
  – Large-scale social phenomenon of “collaborative” software development
• Observing FLOSS research
  – Reflexive examination of small scholarly community studying FLOSS development
  – Specifically working on building CI for FLOSS research

http://www.flickr.com/photos/pmtorrone/304696349/

eScience Proof of Concept

• (Some) FLOSS research is a good candidate for eScience approaches to doing the work
  – Lots of data due to scale of phenomenon
  – Research community ethos of sharing
• Data repositories
• Research paper archive
• Analysis artifacts

FLOSS Research Community

• Little Science
  – Interdisciplinary: primarily software engineering, but also social sciences across a wide spectrum
  – Fairly small community: under 500 researchers worldwide

http://www.flickr.com/photos/circulating/997909242/

FLOSS Data

• Many types of data, focus here on digital “trace” data
  – Archival, secondary
  – By-product of FLOSS work, easy to get but hard to use
• Federated repositories of repositories (RoRs)
  – Data for research drawn from hosting “forges”
  – ~1 TB across 3 RoRs

http://www.flickr.com/photos/smiteme/2379630899/

Research Methods & Tools

• Methods used with RoR data vary, but are generally quantitative
  – Correlational studies
  – Longitudinal analysis
  – Code metrics
• Two main approaches
  – Bespoke scripts or tools
  – eScience workflow tools
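To make the "bespoke script" approach concrete, here is a minimal sketch of a correlational study in Python. It is an illustration only: the project records, the field choices (developer count, yearly commits), and the function names are all hypothetical and not taken from any RoR or from the presenter's own tools.

```python
from math import sqrt

# Hypothetical per-project records: (project name, developers, commits in a year).
# These values are invented for illustration and are not drawn from any RoR.
PROJECTS = [
    ("alpha", 2, 40),
    ("beta", 5, 130),
    ("gamma", 9, 210),
    ("delta", 14, 380),
    ("epsilon", 20, 450),
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def developer_commit_correlation(records):
    """Correlate team size with commit volume across project records."""
    developers = [record[1] for record in records]
    commits = [record[2] for record in records]
    return pearson(developers, commits)


if __name__ == "__main__":
    r = developer_commit_correlation(PROJECTS)
    print(f"Pearson r between developers and commits: {r:.3f}")
```

A script like this is quick to write for one researcher, but its data handling and analysis steps are not captured in a shareable form, which is the gap the eScience workflow tools aim to fill.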

Barriers to Uptake

• Little Science
  – Lack of agreement over epistemology, RQs, methods, tools
  – Researcher isolation, few incentives to collaborate
• Bimodal distribution of skills
  – “I can’t possibly do that! I can’t write code!”
  – “Why bother? I just write my own Python script; you should too.”

http://www.flickr.com/photos/noner/1739876378/

Technology Skills Required

• Taverna
• SVN
• (more) SSH, Unix terminal, XML
• R, plus packages
• SQL, relational DB management
• Java & Eclipse (just enough)
• OWL, RDF, SPARQL
• Knowledge of opaque data sources

Implications for Small Sciences

• Critical mass
  – Need stewardship, dedicated resources
• Skills gap
  – eScience tools require fairly high technology competency
• Convergence of research
  – Common questions, modes of research
• Motivations to contribute
  – Academic credit

http://www.flickr.com/photos/askpang/327577395/

Potential Solutions

• $$$
  – Maintaining and developing resources is not free, even if they are freely shared
• Curricular integration
  – Broaden contributor base by drawing on students through coursework
• Deliberately cultivate a community
  – Train PhD students early in their studies
• Mechanisms to incentivize contribution

Conclusions

• Without external imperatives, CI for little science seems unlikely to emerge unaided

• CI requires standardization and movement toward normal science, which may be premature or simply inappropriate for many social sciences

• Benefits for early adopters: tools support efficient collaboration, enable rigorous research provenance, permit analysis replication, and speed time to results
