How I spend my summer vacations
Justin F. Brunelle
WS-DL Research Group
Department of Computer Science
Old Dominion University
WADL 2013
Justin in a nutshell
• PhD Student at ODU
  – Dynamic representations in the archives
  – Improved quality from archived data
• Alter-ego: Application Developer at The MITRE Corporation
  – Big data & cloud computing
How much can we archive?
The setup
• 1,000 URIs from Twitter
• 1,000 URIs from Archive-it
• Capture with tools
• Study the archivability
Good (3 example slides)
Bad (5 example slides)
Why?
Losing the Moment
• What we share != What we curate
• 4.2% of Twitter is perfectly archived
  – Losing My Revolution: 11% gone in 2 years
• 34.2% of Archive-it is perfectly archived
• Accessibility? Gov vs. non-Gov?
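A minimal sketch of how a "perfectly archived" percentage could be computed for a URI set. It assumes, purely for illustration, that a capture counts as perfect when none of its embedded resources are missing; the `Capture` record and both function names are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Capture:
    """One archived URI (hypothetical record layout)."""
    uri: str
    embedded_total: int    # embedded resources the live page requested
    embedded_missing: int  # of those, how many the archive failed to capture


def is_perfectly_archived(capture: Capture) -> bool:
    # Assumption: "perfect" means no embedded resource is missing.
    return capture.embedded_missing == 0


def percent_perfect(captures: List[Capture]) -> float:
    """Percentage of captures in the set that are perfectly archived."""
    if not captures:
        return 0.0
    perfect = sum(1 for c in captures if is_perfectly_archived(c))
    return 100.0 * perfect / len(captures)
```

Running `percent_perfect` separately over the Twitter set and the Archive-it set would give two figures that can be compared directly, in the same way the two percentages above are compared.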
Measuring memento damage
VS.
Not all embedded resources are created equal
Planned Work
• Evaluate importance of missing stuff
  – Size, position
  – # CSS Classes
  – Not all stylesheets created equal
  – Missing border vs missing functionality
  – “Whitespace”
  – Provide Web service
• Mechanical Turk evaluation of “damage”
• Evaluate collections of mementos
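One way the size-and-position idea could be turned into a number is sketched below. Everything here is an assumption for illustration, not the planned method: the viewport dimensions, the flat stylesheet weight, the weighting formula, and the names `EmbeddedResource`, `importance`, and `damage` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed viewport used to normalize size and position (hypothetical values).
VIEWPORT_W, VIEWPORT_H = 1024, 768
# Assumed flat weight for stylesheets, which have no rendered size of their own.
STYLESHEET_WEIGHT = 0.5


@dataclass
class EmbeddedResource:
    url: str
    kind: str      # e.g. "image" or "stylesheet"
    width: int     # rendered width in pixels (0 for non-visual resources)
    height: int    # rendered height in pixels
    x: int         # left edge in pixels
    y: int         # top edge in pixels
    missing: bool  # True if the archive did not capture it


def importance(res: EmbeddedResource) -> float:
    """Weight a resource by how much of the viewport it covers and how central it is."""
    if res.kind == "stylesheet":
        return STYLESHEET_WEIGHT
    # Fraction of the viewport covered, capped at 1.
    size_weight = min((res.width * res.height) / (VIEWPORT_W * VIEWPORT_H), 1.0)
    # Normalized distance of the resource's center from the viewport center.
    cx = res.x + res.width / 2
    cy = res.y + res.height / 2
    dx = abs(cx - VIEWPORT_W / 2) / (VIEWPORT_W / 2)
    dy = abs(cy - VIEWPORT_H / 2) / (VIEWPORT_H / 2)
    position_weight = max(0.0, 1.0 - min(1.0, (dx + dy) / 2))
    # Central resources count up to twice as much as ones at the edge.
    return size_weight * (0.5 + 0.5 * position_weight)


def damage(resources: List[EmbeddedResource]) -> float:
    """Share of total importance lost to missing resources, from 0.0 to 1.0."""
    total = sum(importance(r) for r in resources)
    if total == 0:
        return 0.0
    lost = sum(importance(r) for r in resources if r.missing)
    return lost / total
```

With this weighting, a missing large central image produces a score near 1.0, while a missing small corner image barely moves the score. Human judgments such as the Mechanical Turk evaluation could then be used to check whether weights like these match perceived damage.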
What does it all mean?
• Archivability is measurable
• Damage is measurable
• If we can predict archivability...
  – We can try new methods of archiving on “hard to capture” mementos
  – Attempt repairs on existing mementos
  – Gauge our successes in real-time
• Next step: capturing dynamic content