The Case for Browser Provenance
Daniel W. Margo and Margo Seltzer
Harvard School of Engineering and Applied Sciences
Overview
• Problem: Browser Data Management
• Solution: Provenance for Web Browsers
• Use Cases
• Details and Challenges
• Implementation
The Modern Browser: A Super-Application
• Originally a distributed document reader.
• But now most documents are distributed.
• And the definition of “document” has changed:
– Webmail
– YouTube
– Google Apps
• It is difficult for users to manage all this data.
– e.g., recall a specific web page.
Browser Data Management (I)
• A “little big data” problem…
– My history: ~25k objects in ~2 months.
– Tractable for computers, but not for users.
• Traditional solution: Bookmarks.
– Requires users to tag their data in advance…
– …and to manage the bookmarks.
• Advanced solutions:
– History Search (Google Chrome’s “New Tab” page)
– Autocompletion (form history, saved passwords)
Browser Data Management (II)
• Firefox 3’s “Smart Location Bar”
from http://support.mozilla.com/en-US/kb/Smart+Location+Bar
• Most solutions powered by history and usage statistics.
• “History and usage statistics” = provenance.
Traditional Browser History
Web Graphs (Firefox 3 Places)
Browser Provenance
Use Case: Contextual History Search
• Most history search is textual.
• Edges imply contextual relationships.
– E.g., “rosebud” → “Citizen Kane”.
• 2-phase contextual search (Shah et al.):
– Perform a textual history search.
– Then, push the weight of results to neighbors.
• Similar to modern web search…
– And good for the same reasons.
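The two phases above can be sketched in a few lines. This is only an illustration, not the algorithm of Shah et al.: the function name `contextual_search`, the substring match, the undirected treatment of edges, and the `damping` factor are all assumptions made for the example.

```python
from collections import defaultdict


def contextual_search(titles, edges, query, damping=0.5):
    """Rank history nodes for `query` in two phases (illustrative sketch)."""
    # Phase 1: plain textual match over page titles.
    q = query.lower()
    text_score = {node: 1.0 for node, title in titles.items()
                  if q in title.lower()}

    # Treat provenance edges as undirected for the purpose of spreading weight.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Phase 2: each textual hit pushes a damped, evenly split share of its
    # weight to its neighbors, so pages with no textual match can still rank.
    score = defaultdict(float, text_score)
    for node, weight in text_score.items():
        for n in neighbors[node]:
            score[n] += damping * weight / len(neighbors[node])

    # Highest score first; ties broken by node id for a stable order.
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical history: a "rosebud" search page that led to a film page.
titles = {"a": "rosebud - search", "b": "Citizen Kane (1941)", "c": "Gardening tools"}
edges = [("a", "b")]
results = contextual_search(titles, edges, "rosebud")
```

Here the film page never mentions the query, yet it is returned because it neighbors a page that does; the unrelated gardening page is not returned at all.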
Use Case: Personalizing Web Search
• Context is created by the user.
– So a gardener relates “rosebud” → “flower”.
– Frustrating if Google returns “Citizen Kane”.
• Browser could clarify context to search engine!
– Naïve: Just insert “flower” into “rosebud” searches.
– If engine had a better interface, we could do better.
• Personalization with privacy.
– Browser knows more about user than cookies can.
– No need to give third parties raw personal data.
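A minimal sketch of the naïve approach, assuming the browser picks the context word from titles in the local history. The function `personalize_query`, the word-count heuristic, and the stopword list are hypothetical; only the augmented query string would leave the browser.

```python
from collections import Counter


def personalize_query(query, history_titles,
                      stopwords=frozenset({"the", "a", "and", "of"})):
    """Append the word that most often co-occurs with `query` in local history."""
    q = query.lower()
    counts = Counter()
    for title in history_titles:
        words = title.lower().split()
        if q in words:
            # Count every other non-stopword that shares a title with the query.
            counts.update(w for w in words if w != q and w not in stopwords)
    if not counts:
        return query  # No local context: send the query unchanged.
    context, _ = counts.most_common(1)[0]
    return f"{query} {context}"


# Hypothetical gardener's history.
gardener_history = ["rosebud flower care", "pruning a rosebud flower", "rosebud sled"]
augmented = personalize_query("rosebud", gardener_history)
```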
Use Case: Time-Contextual History Search
• Current histories can’t recreate prior state.
– e.g., “were these two pages open simultaneously?”
• Time relationships…
– Are natural: “rosebud, and I think I was also looking at gardening tools around that time.”
– Narrow the search space a great deal.
• Related Work:
– Gyllstrom and Soules’ “SeeTrieve”
– Dumais et al.’s “Stuff I’ve Seen”
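If the history recorded when each page was opened and closed, the "open simultaneously" question becomes an interval-overlap test. The sketch below assumes such records exist; the `Visit` type and its fields are hypothetical, not a description of any browser's actual history store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    url: str
    opened: float  # time the page was opened (seconds, any fixed epoch)
    closed: float  # time the page was closed


def open_simultaneously(a, b):
    """True if the two visits' open intervals overlap."""
    return a.opened < b.closed and b.opened < a.closed


def concurrent_with(target, visits):
    """URLs of all other visits that were open while `target` was open."""
    return [v.url for v in visits
            if v is not target and open_simultaneously(target, v)]


# Hypothetical session: the tools page overlaps the rosebud page, the news page does not.
rosebud = Visit("rosebud", 0, 10)
tools = Visit("gardening-tools", 5, 15)
news = Visit("news", 20, 30)
nearby = concurrent_with(rosebud, [rosebud, tools, news])
```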
Use Case: Download Lineage
• Need to know where data comes from.
– For source attribution, finding updates, etc.
• URL is not always sufficient.
– “This image came from…ImageShack!”
• This is exactly what provenance is for!
– Just query ancestors!
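"Query ancestors" can be sketched as a breadth-first walk over parent edges. The representation here (a dict mapping each node to the nodes it was derived from) and the node names are assumptions for illustration.

```python
from collections import deque


def ancestors(parents, node):
    """Return every ancestor of `node`, nearest first (breadth-first order)."""
    seen, order = set(), []
    queue = deque(parents.get(node, ()))
    while queue:
        cur = queue.popleft()
        if cur in seen:
            continue  # Reached by another path already; also guards against cycles.
        seen.add(cur)
        order.append(cur)
        queue.extend(parents.get(cur, ()))
    return order


# Hypothetical lineage: a download served by an image host, reached from a
# forum thread, which was found through a search results page.
parents = {
    "cat.jpg": ["image-host-page"],
    "image-host-page": ["forum-thread"],
    "forum-thread": ["search-results"],
}
lineage = ancestors(parents, "cat.jpg")
```

The download's URL only names the image host; the walk also recovers the forum thread and the search that led there.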
Conclusion
• Browsers record many statistics.
• These statistics are provenance records.
• Provenance techniques can improve:
– History search, via context.
– Web search, via personalization.
– Data management, via lineage.
• Some details in the paper.
• Excruciating details in future work.