Improving RDataTracker Accessibility and Functionality: R Markdown and Caching
Siqing (Alex) Liu, Amherst College
Mentors: Barbara Lerner, Emery Boose
Harvard Forest, Summer 2016



Background: Provenance

• A record of ownership of a work of art or an antique, used as a guide to authenticity or quality (Oxford English Dictionary)

[Figure: The Beautiful Princess, allegedly a rediscovered work by Leonardo da Vinci]

Background: Data Provenance

• Why is it Important for Science?
  • Validity
  • Reproducibility

• Reproducibility Crisis

Data Provenance in R: RDataTracker

[Figure: Data and R Script as inputs; Data Derivation Graph (DDG) as output]
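To make the idea of a data derivation graph concrete, here is a minimal sketch in Python rather than R. It is not RDataTracker's API: the `DDG` class, its `record` and `ancestors` methods, and the file and step names are all hypothetical. It assumes only that a DDG links the steps a script runs to the data those steps read and write.

```python
from dataclasses import dataclass, field


@dataclass
class DDG:
    """Toy data derivation graph: procedure nodes linked to data nodes."""
    edges: list = field(default_factory=list)  # (source, target) pairs

    def record(self, procedure, inputs, outputs):
        # Data flows into the procedure, and the procedure produces data.
        for name in inputs:
            self.edges.append((name, procedure))
        for name in outputs:
            self.edges.append((procedure, name))

    def ancestors(self, node):
        """Everything the given node was derived from, directly or not."""
        found = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for source, target in self.edges:
                if target == current and source not in found:
                    found.add(source)
                    stack.append(source)
        return found


# Hypothetical three-step script: read a file, clean it, average it.
ddg = DDG()
ddg.record("read.csv", inputs=["temps.csv"], outputs=["raw"])
ddg.record("clean", inputs=["raw"], outputs=["cleaned"])
ddg.record("mean", inputs=["cleaned"], outputs=["avg"])
lineage = ddg.ancestors("avg")
```

Walking the edges backwards from `avg` recovers every step and data item it was derived from, which is the kind of question provenance is meant to answer.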

Problem: Organizing Data Derivation Graphs (DDGs)

Solution: R Markdown

• What is R Markdown?
  • File format that allows simple creation of formatted and interactive output from R

Organizing Scripts with R Markdown

Problem: Repeated Execution

• Intensive Calculations
  • Complex
  • Over large data sets
• Takes a significant amount of time each run
  • Minor changes take a lot of time

[Figure panels: Modified, Original]

Solution: Caching

• What is Caching?
  • Storing intermediate values so they don’t need to be reprocessed
• Problem: When to Re-execute?
  • Dependencies
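The caching idea and the re-execution question can be sketched as follows. This is an illustrative Python example, not the implementation used by RDataTracker or knitr. The helper names (`fingerprint`, `cached`, `expensive_mean`) are hypothetical, and it assumes that a step's dependencies are just its name and input values.

```python
import hashlib
import pickle

cache = {}       # fingerprint -> stored result
run_count = 0    # how many times the expensive step really ran


def fingerprint(step_name, inputs):
    """Hash the step name together with the values it depends on."""
    payload = pickle.dumps((step_name, inputs))
    return hashlib.sha256(payload).hexdigest()


def expensive_mean(values):
    """Stand-in for an intensive calculation."""
    global run_count
    run_count += 1
    return sum(values) / len(values)


def cached(step_name, func, inputs):
    """Re-execute only when the step's dependencies have changed."""
    key = fingerprint(step_name, inputs)
    if key not in cache:
        cache[key] = func(inputs)
    return cache[key]


first = cached("mean", expensive_mean, [1.0, 2.0, 3.0])
second = cached("mean", expensive_mean, [1.0, 2.0, 3.0])   # cache hit
third = cached("mean", expensive_mean, [1.0, 2.0, 9.0])    # input changed
```

The second call reuses the stored value, while the third runs again because one of its inputs differs. Deciding what counts as a dependency is the hard part that the following slides address.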

Caching with Knitr: Chunk-by-Chunk Dependencies

DDG: Command-by-Command Dependencies
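The difference between the two granularities can be illustrated with a simplified model, written in Python for illustration. It does not reproduce knitr's or RDataTracker's actual mechanics: the command list, the chunk grouping, and both functions are hypothetical, and it assumes that each command reads and writes named variables in script order.

```python
# Each command: (name, variables it reads, variable it writes).
commands = [
    ("load",    [],       "raw"),
    ("clean",   ["raw"],  "tidy"),
    ("model",   ["tidy"], "fit"),
    ("summary", ["raw"],  "stats"),
    ("plot",    ["fit"],  "figure"),
]

# Hypothetical grouping of the same commands into two chunks.
chunks = {"prepare": ["load", "clean", "summary"], "analyze": ["model", "plot"]}


def stale_commands(changed):
    """Commands that must rerun: the edited one plus anything downstream."""
    stale = {changed}
    dirty_vars = set()
    for name, reads, writes in commands:   # commands are in script order
        if name in stale or dirty_vars & set(reads):
            stale.add(name)
            dirty_vars.add(writes)
    return stale


def stale_by_chunk(changed):
    """Chunk-level view: any stale command forces its whole chunk to rerun."""
    stale = stale_commands(changed)
    rerun = set()
    for members in chunks.values():
        if stale & set(members):
            rerun.update(members)
    return rerun


command_level = stale_commands("clean")
chunk_level = stale_by_chunk("clean")
```

In this model, editing `clean` forces three commands to rerun at command level, but all five at chunk level, because `load` and `summary` share a chunk with the edited command even though neither depends on it.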

ddg.cache: Run Time

[Figure: run time for the modified script]

Conclusion: Accessibility and Functionality

• R Markdown
  • Accessibility: Directly read in R Markdown files
  • Functionality: Automatically organize nodes in DDG
• Caching
  • Accessibility: Make repeated execution of compute-intensive scripts faster
  • Functionality: Accurate, command-by-command caching and tracing

Further Work

• More complete caching
  • Compression vs. Speed
  • Side Effects
    • Creating a plot
    • Writing a file
    • Printing results to console
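Side effects are hard to cache because a stored return value does not reproduce them. One possible approach for console output, sketched in Python and not taken from RDataTracker, is to capture what a step prints and replay it on a cache hit. The `run_cached` helper and the `report` step are hypothetical, and plots and file writes would need their own handling.

```python
import contextlib
import io

cache = {}  # key -> (result, captured console text)


def run_cached(key, func):
    """Return func()'s value, replaying its printed output on a cache hit."""
    if key not in cache:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            result = func()
        cache[key] = (result, buffer.getvalue())
    result, printed = cache[key]
    print(printed, end="")   # replay the side effect
    return result


calls = []


def report():
    """Stand-in for a step that both prints and returns a value."""
    calls.append("ran")
    print("mean = 2.0")
    return 2.0


# Collect everything written to the console across both calls.
log = io.StringIO()
with contextlib.redirect_stdout(log):
    a = run_cached("report", report)
    b = run_cached("report", report)   # served from the cache
replayed = log.getvalue()
```

The step runs only once, yet the console shows its message both times, so a cached run looks the same to the user as a fresh one.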

Acknowledgements

Emery Boose
Harvard Forest, Harvard University

Barbara Lerner
Dept. of Computer Science, Mount Holyoke College