Surviving the Information Explosion

Jaime Teevan, MIT

with Christine Alvarado, Mark Ackerman and David Karger

Let Me Interview You!

Web:
– What’s the last Web page you visited? How did you get there?
– Have you looked for anything on the Web?

Email:
– What’s the last email you read? What did you do with it?
– Have you gone back to an email you’ve read before?

Files:
– What’s the last file you looked at? How did you get to it?
– Have you looked for a file?

Overview

Introduction

Related Work

Study Methodology

Results: Search

Discussion

Intro

RW

Study

Res

Disc

The Information Explosion

You must extract information from:
– 3 billion Web pages (Google)
– Dozens of incoming emails daily
– Hundreds of files on your personal computer

Haystack: Personal Information Storage

– Email
– Web pages
– Files
– Calendar
– Contacts

Haystack: Personal Information Storage

“What was that paper I read last week about Information Retrieval?”

Haystack: Personal Information Storage

Ah yes! Thank you.

Haystack

Supporting Information Interaction

Treat different corpora the same?
Provide access to meta-data?
– Keyword search (XP, advanced search)
– Browse (Hearst)

We don’t really know …

Understand access in the wild!

Overview

Introduction

Related Work
– Interaction by corpus
– How people search

Study Methodology

Results: Search

Discussion


Interaction By Corpus

Paper documents
– [Malone, 1983], [Whittaker & Hirshberg, 2001]

Files
– [Barreau & Nardi, 1995]

Web
– [Abrams, et al. 1998], [Byrne, et al. 1999]

Email/Calendar
– [Whittaker & Snider, 1996], [Bellotti & Smith, 2000]

How People Look for Information

Focus: Web

Log analysis
– [Catledge & Pitkow, 1995], [Tauscher & Greenberg, 1997]

Controlled tasks/environment
– [Baldonado & Winograd, 1997], [Spool, 1998]

Situated navigation
– Micronesian islanders [Suchman, 1987]
– Electronic [Marchionini, 1995], [Hearst, 2000]
– Information scent [Chi, Pirolli, Chen & Pitkow, 2001]

Method

Subjects
– 15 MIT CS graduate students (5 women, 10 men)

Setup
– 10 short interviews (~5 min.)
– 1 long interview (~45 min.)

Topics
– Web, Email, Files

Short Interviews

Modified diary study [Palen, 2002]
Randomly interrupted participant
Two question types
– Last email/file/Web page looked at
– Last email/file/Web page looked for

Goal: Discover patterns in searching and browsing

Long Interviews

“Guided tour” of subject’s Web space, email, and file system

Goals:
– Discover organizational patterns
– Discover problems in organizational structure
– Relate organization to search/browse behavior

Overview

Introduction

Related Work

Study Methodology

Results: Search
– What and how
– Relating what and how
– Individual strategies

Discussion

Complex Information Spaces

People had complex spaces
Felt in control

“That’s an interesting question. I think my email is the worst, because I have so much of it. And there are people on the other end who expect me to reply to it. My file system is pretty well organized. I have to go through it every once in a while, every couple of months and just kind of push things into the right folders and delete the old stuff. The Web just works, usually.”

What People Look For

Specific Information
– A small fact
– E.g., URL, phone number, appointment time

General Information
– A broad set of information
– E.g., good sneakers to buy, info on cancer

Specific Document
– The actual document
– E.g., a file to print, an email to reply to

How People Look For Information

The last thing you looked for on the Web
– Did you use a search engine?

Search is more than just keyword search
Browse, use bookmarks, type URLs

“I was looking to figure out where Glaris was. When I lived in Switzerland there were only a few reasonable mapping places of the country. And so I had bookmarked [the Switzerland map site].”

Strategies Looking for Information

Teleporting
– Traditional search
– Jump directly to target
– Specify everything up front

Orienteering
– Use local navigation
– [O’Day and Jeffries, 1993]
– Could include keyword search

Example: Orienteering

Interviewer: Have you looked for anything on the Web today?
Jim: I had to look for the office number of the Harvard professor.
I: So how did you go about doing that?
J: I went to the homepage of the Math department at Harvard
[…]
J: I knew that she had a very small Web page saying, “I’m here at Harvard. Here’s my contact information.”
[…]
I: So you went to the Math department, and then what did you do over there?
J: It had a place where you can find people and I went to that page and they had a dropdown list of visiting faculty, and so I went to that link and I looked for her name and there it was.

Example: Teleporting

What if Jim had teleported instead?

Could have typed into a search engine: “Connie Monroe, office number”


Relating How and What

People orienteer a lot
What people look for related to how they look

            Specific   General   Document
Orienteer      47         19        41
Teleport       34         23        17

Surprise:
– Orienteer to specific information
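
As a rough reading aid, the sketch below computes how much of the searching for each information type was orienteering, using only the six numbers from the "Relating How and What" table. It assumes the table entries are raw counts of reported searches (the slide does not label the units); the names `counts`, `orienteer_share`, and `shares` are illustrative only.

```python
# Numbers copied from the "Relating How and What" table.
# Assumption: the entries are raw counts of reported searches.
counts = {
    "Specific": {"orienteer": 47, "teleport": 34},
    "General": {"orienteer": 19, "teleport": 23},
    "Document": {"orienteer": 41, "teleport": 17},
}


def orienteer_share(column):
    """Fraction of the searches in one table column that were orienteering."""
    total = column["orienteer"] + column["teleport"]
    return column["orienteer"] / total


shares = {kind: orienteer_share(column) for kind, column in counts.items()}

for kind, share in shares.items():
    print(f"{kind}: {share:.0%} orienteering")
```

Under that assumption, orienteering makes up about 58% of the specific-information searches, 45% of the general-information searches, and 71% of the document searches.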

Why So Much Orienteering?

Your last email search
– What were you looking for?
– Did you know what email contained that information?

People look for the information source
– Specific information searches
– Document searches

Looking for the Source: Example

“I was looking to figure out where Glaris was. When I lived in Switzerland there were only a few reasonable mapping places of the country. And so I had bookmarked [the Switzerland map site].”

Looking for the Source: Example

Interviewer: Have you looked for anything on the Web today?
Jim: I had to look for the office number of the Harvard professor.
I: So how did you go about doing that?
J: I went to the homepage of the Math department at Harvard
[…]
J: I knew that she had a very small Web page saying, “I’m here at Harvard. Here’s my contact information.”
[…]
I: So you went to the Math department, and then what did you do over there?
J: It had a place where you can find people and I went to that page and they had a dropdown list of visiting faculty, and so I went to that link and I looked for her name and there it was.

Individual Strategies

Search strategies varied by individual
Pilers: Pile information
Filers: File information

File or Pile Email

Where was the last email you found?
– Inbox?
– Elsewhere?

[Chart: # of searches (axis 0 to 8) against % found in Inbox (axis 0 to 100); regions labeled Filer and Piler]

How Individuals Search For Files

[Chart: file searches per participant (A to M), axis 0 to 9; legend: Keyword Search, Orienteering; groups labeled Filers and Pilers; labels: Teleport, Orienteer]

Overview

Introduction

Related Work

Study Methodology

Results

Discussion
– Understanding and applying what we learn
– Future work

Understanding Teleporting v. Orienteering

Why was orienteering chosen over teleporting?
Teleporting doesn’t work
Teleporting requires too much cognitive effort
Risk of over-specifying target
Orienteering gives knowledge of the source
Teleporting a failure mode
– Can’t associate information with source
– Can’t find the information source

Understanding Filers v. Pilers

Why do filers teleport more than pilers?
Irony: Those with good organization don’t take advantage of it

Filers have strictly organized information
– Are used to defining meta-data for their information

Pilers loosely organize their information
– Are used to associative navigating

Haystack: Applying What We Learn

Using meta-data: Support orienteering
– Not about having the perfect search interface
– Need ability to prompt

Individualized support
– Pilers/filers
– Learning individual behaviors

Future Work: Search

Previously viewed information
Causes of failure
Searches across corpus
Getting help from others

Future Work: Organization

Consistency of organization across corpus
Corpora boundaries
Context used in organization
Organization’s effect on search

Conclusion

Look at search in the wild
Strategies: Teleport/Orienteer
Individual strategies
Future systems should:
– Support orienteering
– Provide individualized support

Questions?

To learn more about Haystack:

http://haystack.lcs.mit.edu

Contact us with comments:

- teevan@ai.mit.edu

- calvarad@ai.mit.edu

Relating How and Corpus

Email and files: Almost always orienteered
– Easy to associate information with document
Web: Teleported much more often

            Email   Files   Web
Orienteer     59      42     19
Teleport       6      10     64
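
To make the contrast between corpora easier to see, the sketch below turns the "Relating How and Corpus" table into a teleporting share per corpus. It assumes the table entries are raw counts of reported searches (the slide does not say); the variable names are illustrative only.

```python
# Numbers copied from the "Relating How and Corpus" table.
# Assumption: the entries are raw counts of reported searches.
corpora = ["Email", "Files", "Web"]
orienteer = [59, 42, 19]
teleport = [6, 10, 64]

# Share of searches in each corpus that were teleporting.
teleport_share = {
    corpus: t / (o + t)
    for corpus, o, t in zip(corpora, orienteer, teleport)
}

for corpus, share in teleport_share.items():
    print(f"{corpus}: {share:.0%} teleporting")
```

Under that assumption, teleporting accounts for roughly 9% of email searches and 19% of file searches, against about 77% of Web searches.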

Relating What and Corpus

            Email   Files   Web
Specific      39       7     33
General       10       7     30
Document       8      35     14

Email searches were primarily for specific information
File searches were primarily for documents
Web searches were more evenly distributed
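
The three statements above can be checked against the table with a small sketch that finds the most common information type in each corpus and its share of that corpus. It assumes the table entries are raw counts of reported searches (the slide does not label the units); `table`, `top_type`, and `summary` are illustrative names.

```python
# Numbers copied from the "Relating What and Corpus" table.
# Assumption: the entries are raw counts of reported searches.
table = {
    "Email": {"Specific": 39, "General": 10, "Document": 8},
    "Files": {"Specific": 7, "General": 7, "Document": 35},
    "Web": {"Specific": 33, "General": 30, "Document": 14},
}


def top_type(column):
    """Return the most frequent information type and its share of the column."""
    kind = max(column, key=column.get)
    return kind, column[kind] / sum(column.values())


summary = {corpus: top_type(column) for corpus, column in table.items()}

for corpus, (kind, share) in summary.items():
    print(f"{corpus}: mostly {kind} ({share:.0%})")
```

Under that assumption, specific information is about 68% of email searches and documents are about 71% of file searches, while the largest Web category (specific information) is only about 43%.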
