Individualized Knowledge Access

David Karger, Lynn Andrea Stein, Mark Ackerman, Ralph Swick


Information Access

A key task in Oxygen: help people manage and retrieve information

Three overlapping projects:

Haystack: information storage and retrieval, plus application clients

Semantic Web: next-generation metadata

Volt: collaborative access

Presentation Overview

Motivation: information access behavior and goals

System design and architecture: data model; interacting data and UI components

Working applications: base Haystack, FrontPage, Volt

Motivation

Problem Scenario

First, I try solving problems using my own data: information I gathered personally; high quality and easy for me to understand; not limited to publicly available content

My own organization: personal annotations and metadata; my own choice of subject arrangement; optimized for my kind of searching

Adapts to my needs

Then Turn to a Friend

Leverage: they organize information for their own use, so let them find things for me too

Shared vocabulary: they know me and what I want

Personal expertise: they know things not in any library

Trust: their recommendations are good

Last, to the Library/Web

The answer is usually there, but hard to find. Wish: rearrange it to suit my needs. Wish: help from my friends in looking.

Lessons

Individualized access: the best tools adapt to individual ways of organizing and seeking data

Individualized knowledge: people know more than they publish, and that knowledge is useful to them and to others

Collaborative use: the right incentives lead to sharing and joint use

Haystack

Individualized access: my own data collection and organization; search tools tuned for me

Collaboration to leverage individual knowledge: access unpublished information in others’ haystacks; self-interest yields public benefit

A lens to personalize access to the world library: rearrange the presentation to suit my personal needs

Example

Goal: find information on probabilistic models in data mining.

My haystack doesn’t know, but “probability” appears in lots of email I got from Tommi Jaakkola.

Tommi told his haystack that “Bayesian” refers to “probability models”.

Tommi has read several papers on Bayesian methods in data mining; some are by Daphne Koller.

I read and liked other work by Koller.

My haystack queries “Daphne Koller Bayes” on Yahoo.

Tommi’s haystack can rank the results for me…
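The chain of inferences in this example can be sketched in a few functions. This is a minimal, hypothetical model, assuming each haystack exposes a synonym table, a reading history, and a set of liked authors; `PersonalHaystack`, `pick_friend`, `build_query`, and `rank_results` are illustrative names, not Haystack’s actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PersonalHaystack:
    """Hypothetical, much-simplified model of one user's haystack."""
    owner: str
    synonyms: dict[str, str] = field(default_factory=dict)  # term -> concept it refers to
    papers_read: list[tuple[str, str]] = field(default_factory=list)  # (author, topic)
    liked_authors: set[str] = field(default_factory=set)

    def terms_for(self, concept: str) -> list[str]:
        """Terms this user has told their haystack refer to `concept`."""
        return [term for term, c in self.synonyms.items() if c == concept]


def pick_friend(mail_by_sender: dict[str, list[str]], word: str) -> str | None:
    """Return the sender whose messages mention `word` most often, if any do."""
    counts = {sender: sum(word.lower() in msg.lower() for msg in msgs)
              for sender, msgs in mail_by_sender.items()}
    best = max(counts, key=counts.get, default=None)
    return best if best is not None and counts[best] > 0 else None


def build_query(me: PersonalHaystack, friend: PersonalHaystack, concept: str) -> str | None:
    """Combine the friend's vocabulary with an author we both trust."""
    for term in friend.terms_for(concept):  # e.g. "Bayesian" for "probability models"
        # Authors the friend has read on this term whom I also like.
        shared = sorted({author for author, topic in friend.papers_read if term in topic}
                        & me.liked_authors)
        if shared:
            return f"{shared[0]} {term}"
    return None


def rank_results(friend: PersonalHaystack,
                 results: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Order (title, author) results using the friend's preferences."""
    def score(result: tuple[str, str]) -> tuple[bool, int]:
        title, author = result
        vocab_hits = sum(term.lower() in title.lower() for term in friend.synonyms)
        return (author in friend.liked_authors, vocab_hits)
    return sorted(results, key=score, reverse=True)
```

With Tommi’s synonym “Bayesian” and a shared liking for Koller, `build_query` produces “Daphne Koller Bayesian”; reproducing the slide’s shorter “Bayes” would need stemming, which this sketch omits.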

System Design

Gathering Data

Haystack archives anything: web pages browsed, email sent and received, the address book, documents written

...and any properties and relationships: the text of an object (for text search); author, title, color, citations, quotations, annotations, quality, last usage

Users can freely add new types and relationships

Semantic Web

Arbitrary objects, connected by named links

No fixed schema; user-extensible

Sharable by any application: a new “file system”?

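[Figure: example Semantic Web graph. A Doc node is connected by named links (title, author, quality, says, type) to the values D. Karger, Haystack, Outstanding, and HTML.]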
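To make “arbitrary objects, connected by named links” concrete, here is a toy in-memory triple store, assuming every link is a (subject, predicate, object) triple of strings. The class and method names are hypothetical, and the sample triples are only loosely modeled on the example graph.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples with no fixed schema."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()
        # Index by subject so a viewer can fetch one node's links quickly.
        self._by_subject: defaultdict[str, set[Triple]] = defaultdict(set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        if subject is not None:
            candidates = self._by_subject.get(subject, set())
        else:
            candidates = self._triples
        for s, p, o in candidates:
            if (predicate is None or p == predicate) and (obj is None or o == obj):
                yield (s, p, o)


store = TripleStore()
store.add("doc1", "type", "HTML")
store.add("doc1", "title", "Haystack")
store.add("doc1", "author", "D. Karger")
# A user-invented link type needs no schema change.
store.add("doc1", "quality", "Outstanding")
```

Because the store has no schema, the `quality` link needs no declaration; a real store would also need persistence and indexes on predicates and objects.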

Gathering Data

Active user input: interfaces let the user add data and note relationships

Mining data from prior data: plug-in services opportunistically extract new data

Passive observation of the user: plug-ins to other interfaces record user actions

[Figure: Haystack architecture. Components: Triple Store, Data Extraction Services, Machine Learning Services, Spider, Web Observer Proxy, Mail Observer Proxy, Web Viewer, Volt Viewer/Editor, and Other Users.]
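The architecture suggests observers and extraction services feeding a shared triple store. One way to sketch that, assuming services are plain functions that receive each new triple and return derived triples, is an observable store; `ObservableStore`, `sender_is_person`, and `people_go_in_address_book` are hypothetical names, not components taken from the diagram.

```python
from __future__ import annotations

from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]
Service = Callable[[Triple], List[Triple]]  # given a new triple, return derived triples


class ObservableStore:
    """Toy triple store that shows every newly added triple to registered services."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self.services: list[Service] = []

    def register(self, service: Service) -> None:
        self.services.append(service)

    def add(self, triple: Triple) -> None:
        pending = [triple]
        while pending:
            t = pending.pop()
            if t in self.triples:
                continue  # already known: do not re-run services on it
            self.triples.add(t)
            for service in self.services:
                pending.extend(service(t))


def sender_is_person(t: Triple) -> list[Triple]:
    """Hypothetical extraction service: a mail sender is a person."""
    _, predicate, obj = t
    return [(obj, "type", "Person")] if predicate == "from" else []


def people_go_in_address_book(t: Triple) -> list[Triple]:
    """Hypothetical service that reacts to the previous service's output."""
    subject, predicate, obj = t
    return [(subject, "inAddressBook", "yes")] if (predicate, obj) == ("type", "Person") else []


store = ObservableStore()
store.register(sender_is_person)
store.register(people_go_in_address_book)
# A mail observer proxy might record that a message came from someone.
store.add(("msg1", "from", "tommi"))
```

Derived triples are fed back through every service, so one service’s output can trigger another; this cascade is one way to realize opportunistic extraction.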

Sample Applications


Because everything uses Semantic Web constructs, a variety of application clients can share information: Web browser (data viewer); FrontPage (personalized information filter); Volt (collaboration tool)

Haystack via Web

Web server interface

Basic operations: insert objects, view objects, run queries


Haystack via Web

The viewer shows one node and its associated arrows (links)

A service notices that we’ve archived a directory, so it archives the objects the directory contains (and so on, recursively)
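The recursive archiving behavior can be sketched as a worklist: archiving a directory enqueues its entries, and each enqueued directory is expanded in turn. The function name `archive` and the `status`/`contains` predicates are assumptions for illustration; triples are kept in a plain Python set.

```python
import os


def archive(path: str, store: set) -> None:
    """Archive `path`; archived directories cause their entries to be archived too."""
    pending = [path]
    while pending:
        current = pending.pop()
        if (current, "status", "archived") in store:
            continue  # already archived; avoids processing the same path twice
        store.add((current, "status", "archived"))
        # Hypothetical directory service: on noticing an archived directory,
        # record each entry and queue it, so subdirectories expand in turn.
        if os.path.isdir(current):
            with os.scandir(current) as entries:
                for entry in entries:
                    store.add((current, "contains", entry.path))
                    pending.append(entry.path)
```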

Haystack via Web

Services detect the document type and extract relevant metadata

Output can be specialized by object type
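A minimal sketch of type detection plus type-specific extraction, assuming detection by file extension or leading content and a dispatch table of extractors. The detection rules and the `EXTRACTORS` table are hypothetical; only `html.parser.HTMLParser` and `email.message_from_string` are standard-library calls.

```python
from __future__ import annotations

import email
from html.parser import HTMLParser
from typing import Callable, Dict


def detect_type(name: str, content: str) -> str:
    """Guess an object's type from its name and content (hypothetical rules)."""
    lowered = name.lower()
    if lowered.endswith((".html", ".htm")) or content.lstrip().lower().startswith("<html"):
        return "HTML"
    if lowered.endswith(".eml") or content.startswith("From:"):
        return "Email"
    return "Text"


class _TitleParser(HTMLParser):
    """Collects the text inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_html(content: str) -> dict[str, str]:
    parser = _TitleParser()
    parser.feed(content)
    parser.close()
    return {"title": parser.title.strip()}


def extract_email(content: str) -> dict[str, str]:
    msg = email.message_from_string(content)
    return {key.lower(): msg[key] for key in ("From", "Subject") if msg[key] is not None}


# One extractor per type; types without an entry get no extra metadata.
EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "HTML": extract_html,
    "Email": extract_email,
}


def extract_metadata(name: str, content: str) -> dict[str, str]:
    kind = detect_type(name, content)
    metadata = {"type": kind}
    extractor = EXTRACTORS.get(kind)
    if extractor is not None:
        metadata.update(extractor(content))
    return metadata
```

Supporting a new type means adding one detection rule and one table entry, which matches the idea of specializing by object type.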

Mediation

Haystack can be a lens for viewing data from the rest of the world: stored content shows what the user knows and likes; selectively spider “good” sites; filter the results coming back

Compare results to objects the user has liked in the past

Haystack can learn over time
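The filtering step can be illustrated by comparing each incoming result to objects the user has liked, here with cosine similarity over word counts. This is one simple technique chosen for illustration, not necessarily what Haystack uses; `InterestFilter` and the 0.2 threshold are assumptions. In this sketch, learning over time amounts to calling `like` as the user marks new favorites.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a deliberately crude text representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InterestFilter:
    """Keeps results that resemble something the user has liked before."""

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold
        self.liked: list[Counter] = []

    def like(self, text: str) -> None:
        """Record a liked object; the filter 'learns' by accumulating these."""
        self.liked.append(bag_of_words(text))

    def score(self, text: str) -> float:
        words = bag_of_words(text)
        return max((cosine(words, liked) for liked in self.liked), default=0.0)

    def filter(self, results: list[str]) -> list[str]:
        return [r for r in results if self.score(r) >= self.threshold]
```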

Example: personalized news service

News Service


Scavenges articles from your favorite news sources, using HTML parsing and extraction services

Over time, learns the types of articles that interest you and prioritizes those for display
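One simple way a news service could learn which types of articles interest you is to track, per article type, how often shown articles are actually read, and sort by a smoothed read rate. The `NewsPrioritizer` class and the Laplace smoothing (unseen types start at 0.5) are illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict


class NewsPrioritizer:
    """Learns, per article type, how often the user reads what is shown."""

    def __init__(self) -> None:
        self.shown: defaultdict[str, int] = defaultdict(int)
        self.read: defaultdict[str, int] = defaultdict(int)

    def record(self, kind: str, was_read: bool) -> None:
        """Update counts after an article of type `kind` was displayed."""
        self.shown[kind] += 1
        if was_read:
            self.read[kind] += 1

    def interest(self, kind: str) -> float:
        # Laplace-smoothed read rate: a never-seen type scores 0.5.
        return (self.read.get(kind, 0) + 1) / (self.shown.get(kind, 0) + 2)

    def prioritize(self, articles: list[tuple[str, str]]) -> list[tuple[str, str]]:
        """Sort (headline, type) pairs so the most interesting types come first."""
        return sorted(articles, key=lambda a: self.interest(a[1]), reverse=True)
```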

The content provider no longer controls the viewing experience: no more ads

Personalized News Service

Collaborative Access

We want to leverage others’ work in organizing information: there is no need to “publish” expertise, since it is exposed automatically, without effort; self-interest helps others

Volt

Volt is about collaboration between people. The Haystack architecture allows easy collaboration among individuals: Semantic Web references to Haystack objects; individuals share parts of their haystacks; group spaces and shared notebooks

Volt

Collaborators

Those I interact with: frequent mail contact; frequent visits to their home page

Those with shared content who hold the same opinions about that content: collaborative filtering techniques
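Finding people “who have the same opinions about content” is a classic collaborative filtering problem. A sketch using cosine similarity over like/dislike ratings (+1/-1, unrated items treated as 0) follows; the rating scale, the function names, and the default cutoff of three collaborators are assumptions.

```python
from __future__ import annotations

import math


def similarity(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine similarity of two users' ratings (+1 like, -1 dislike; unrated = 0)."""
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def likely_collaborators(me: dict[str, int], others: dict[str, dict[str, int]],
                         k: int = 3) -> list[str]:
    """Up to k people whose opinions agree with mine, most similar first."""
    scores = {name: similarity(me, ratings) for name, ratings in others.items()}
    ranked = sorted((name for name, s in scores.items() if s > 0),
                    key=lambda name: (-scores[name], name))
    return ranked[:k]
```

Negative similarity (systematic disagreement) and zero overlap are both excluded, so only people who share content and opinions are suggested.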

Referrals

Expertise search engine

Expertise Beacon

Volt Expertise Beacons

Group spaces and shared notebooks: create individual and group profiles

Profiles can be used to find other people, allowing targeted searches such as “Who else is working on this project?”

The user controls visibility and privacy
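Targeted search over profiles with user-controlled visibility might look like this, assuming each profile lists its topics and, optionally, the groups allowed to see it (`None` meaning public). `Profile` and `who_works_on` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Set


@dataclass
class Profile:
    person: str
    topics: Set[str]
    # None means public; otherwise only members of these groups may see the profile.
    visible_to: Optional[FrozenSet[str]] = None


def who_works_on(topic: str, profiles: Iterable[Profile], asker_groups: Set[str]) -> list[str]:
    """Targeted search that respects each profile owner's visibility setting."""
    return sorted(
        p.person
        for p in profiles
        if topic in p.topics and (p.visible_to is None or bool(p.visible_to & asker_groups))
    )
```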

Summary

Next-generation information access: the Semantic Web provides a language and capabilities for metadata

Haystack teases out individual knowledge, stores it coherently, and allows a variety of application clients to leverage individual metadata

Volt turns individual knowledge into a community resource

More Info

http://haystack.lcs.mit.edu/

http://www.w3c.org/2001/
