The Case for Browser Provenance

Daniel W. Margo and Margo Seltzer

Harvard School of Engineering and Applied Sciences

Overview

• Problem: Browser Data Management

• Solution: Provenance for Web Browsers

• Use Cases

• Details and Challenges

• Implementation

The Modern Browser: A Super-Application

• Originally a distributed document reader.

• But now most documents are distributed.

• And the definition of “document” has changed:
– Webmail
– YouTube
– Google Apps

• It is difficult for users to manage all this data.
– e.g., recall a specific web page.

Browser Data Management (I)

• A “little big data” problem…
– My history: ~25k objects in ~2 months.
– Tractable for computers, but not for users.

• Traditional solution: Bookmarks.
– Requires users to tag their data in advance…
– …and to manage the bookmarks.

• Advanced solutions:
– History Search (Google Chrome’s “New Tab” page)
– Autocompletion (form history, saved passwords)

Browser Data Management (II)

• Firefox 3’s “Smart Location Bar”

from http://support.mozilla.com/en-US/kb/Smart+Location+Bar

• Most solutions are powered by history and usage statistics.

• “History and usage statistics” = provenance.

Traditional Browser History

Web Graphs (Firefox 3 Places)

Browser Provenance

Use Case: Contextual History Search

• Most history search is textual.

• Edges imply contextual relationships.
– E.g., “rosebud” → “Citizen Kane”.

• 2-phase contextual search (Shah et al.):
– Perform a textual history search.
– Then, push the weight of results to neighbors.

• Similar to modern web search…
– And good for the same reasons.
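The two phases can be illustrated with a minimal sketch. This is not the algorithm from Shah et al.; the history graph, the title-matching score, and the `damping` parameter are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical history graph: page id -> title, plus navigation edges.
TITLES = {
    "a": "Rosebud - Wikipedia",
    "b": "Citizen Kane (1941)",
    "c": "Garden tools sale",
}
EDGES = [("a", "b")]  # assumed: the user followed a link from page a to page b


def textual_scores(query, titles):
    """Phase 1: a page scores 1.0 if the query appears in its title."""
    q = query.lower()
    return {page: 1.0 for page, title in titles.items() if q in title.lower()}


def contextual_search(query, titles, edges, damping=0.5):
    """Phase 2: each textual hit pushes a share of its weight to its neighbors."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    base = textual_scores(query, titles)
    scores = defaultdict(float, base)
    for page, weight in base.items():
        for n in neighbors[page]:
            # Split the pushed weight evenly among the hit's neighbors.
            scores[n] += damping * weight / len(neighbors[page])
    # Highest score first; ties broken by page id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


results = contextual_search("rosebud", TITLES, EDGES)
```

Here the page titled “Citizen Kane (1941)” appears in the results even though its title never mentions “rosebud”, because it neighbors a textual hit. The unrelated page receives no weight and is not returned.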

Use Case: Personalizing Web Search

• Context is created by the user.
– So a gardener relates “rosebud” → “flower”.
– Frustrating if Google returns “Citizen Kane”.

• Browser could clarify context to search engine!
– Naïve: Just insert “flower” into “rosebud” searches.
– If engine had a better interface, we could do better.

• Personalization with privacy.
– Browser knows more about user than cookies can.
– No need to give third parties raw personal data.
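The naïve approach could look something like the sketch below. The `VISITED_AFTER` record, the stopword list, and the choice of "most frequent co-occurring word" as the context term are assumptions, not details from the talk. Only the augmented query string would leave the browser; the history stays local.

```python
from collections import Counter

# Hypothetical local record: for each past search term, the titles of
# pages the user went on to visit.
VISITED_AFTER = {
    "rosebud": [
        "Pruning a rosebud flower",
        "Flower care guide",
        "Rose flower varieties",
    ],
}
STOPWORDS = {"a", "the", "of", "guide"}  # assumed, deliberately tiny


def context_term(query, visited_after):
    """Pick the word that most often appears in pages visited after `query`."""
    counts = Counter()
    for title in visited_after.get(query, []):
        # Count each word once per title so one long title cannot dominate.
        for word in set(title.lower().split()):
            if word != query and word not in STOPWORDS:
                counts[word] += 1
    if not counts:
        return None
    # Highest count wins; ties are broken alphabetically.
    return min(counts, key=lambda w: (-counts[w], w))


def personalize(query, visited_after):
    """Append the user's context term to the outgoing search query."""
    term = context_term(query, visited_after)
    return query if term is None else f"{query} {term}"


augmented = personalize("rosebud", VISITED_AFTER)
```

For this gardener's history, `augmented` is `"rosebud flower"`. A query with no local context is passed through unchanged.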

Use Case: Time-Contextual History Search

• Current histories can’t recreate prior state.
– e.g., “were these two pages open simultaneously?”

• Time relationships…
– Are natural: “rosebud, and I think I was also looking at gardening tools around that time.”
– Narrow the search space a great deal.

• Related Work:
– Gyllstrom and Soules’ “SeeTrieve”
– Dumais et al.’s “Stuff I’ve Seen”
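The question “were these two pages open simultaneously?” becomes an interval-overlap test once the history records when each page was opened and closed. The sketch below assumes such a record exists; the `Visit` type, its fields, and the example URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    url: str
    opened: float  # assumed: timestamp when the page was opened
    closed: float  # assumed: timestamp when the page was closed


def overlaps(a, b):
    """True if the two visits were open at the same time."""
    return a.opened < b.closed and b.opened < a.closed


def open_alongside(query_url, visits):
    """URLs of other pages that were open during any visit to `query_url`."""
    anchors = [v for v in visits if v.url == query_url]
    return sorted(
        {
            v.url
            for v in visits
            for a in anchors
            if v.url != query_url and overlaps(a, v)
        }
    )


VISITS = [
    Visit("example.org/rosebud", 0, 100),
    Visit("example.org/garden-tools", 50, 150),
    Visit("example.org/news", 200, 300),
]

together = open_alongside("example.org/rosebud", VISITS)
```

Filtering by overlap leaves only the gardening-tools page as a candidate, which is the kind of narrowing the slide describes.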

Use Case: Download Lineage

• Need to know where data comes from.
– For source attribution, finding updates, etc.

• URL is not always sufficient.
– “This image came from…ImageShack!”

• This is exactly what provenance is for!
– Just query ancestors!
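An ancestor query can be sketched as a breadth-first walk over "derived from" edges. The `DERIVED_FROM` mapping and all URLs below are made up for illustration. A real browser would have to record these edges at download and navigation time.

```python
from collections import deque

# Hypothetical provenance edges: each object maps to what it was derived from.
DERIVED_FROM = {
    "file:photo.jpg": ["http://img.example/abc123.jpg"],
    "http://img.example/abc123.jpg": ["http://forum.example/thread/42"],
    "http://forum.example/thread/42": ["http://search.example/?q=rosebud"],
}


def ancestors(node, derived_from):
    """Return every ancestor of `node`, nearest first (breadth-first order)."""
    seen = set()
    order = []
    queue = deque(derived_from.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # an object reachable by two paths is reported once
        seen.add(current)
        order.append(current)
        queue.extend(derived_from.get(current, []))
    return order


lineage = ancestors("file:photo.jpg", DERIVED_FROM)
```

The download URL alone would point only at the image host. The lineage also recovers the forum thread that embedded the image and the search that led to it.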

Conclusion

• Browsers record many statistics.

• These statistics are provenance records.

• Provenance techniques can improve:
– History search, via context.
– Web search, via personalization.
– Data management, via lineage.

• Some details in the paper.

• Excruciating details in future work.
