2002.11.21 - SLIDE 1 IS 202 – FALL 2002 Lecture 24: Interfaces for Information Retrieval Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2002 http://www.sims.berkeley.edu/academics/courses/is202/f02/ SIMS 202: Information Organization and Retrieval

2002.11.21 - SLIDE 1 IS 202 – FALL 2002

Lecture 24: Interfaces for Information Retrieval

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2002
http://www.sims.berkeley.edu/academics/courses/is202/f02/

SIMS 202: Information Organization and Retrieval

2002.11.21 - SLIDE 2 IS 202 – FALL 2002

Lecture Overview

• Review and Continuation
  – Introduction to HCI
  – Why Interfaces Don’t Work
  – Early Visions: Memex

• Interfaces for Information Retrieval II
  – Collection Selection
  – Query Specification
  – Query Results
  – Query Reformulation

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

2002.11.21 - SLIDE 4

Human-Computer Interaction (HCI)

• Human
  – The end-users of a program
  – The others in the organization

• Computer
  – The machines the programs run on

• Interaction
  – The users tell the computers what they want
  – The computers communicate results

2002.11.21 - SLIDE 5 IS 202 – FALL 2002

Shneiderman on HCI

• Well-designed interactive computer systems
  – Promote
    • Positive feelings of success
    • Competence
    • Mastery
  – Allow users to concentrate on their work, exploration, or pleasure, rather than on the system or the interface

2002.11.21 - SLIDE 6 IS 202 – FALL 2002

Shneiderman’s Design Principles

• Provide informative feedback

• Permit easy reversal of actions

• Support an internal locus of control

• Reduce working memory load

• Provide alternative interfaces for expert and novice users

2002.11.21 - SLIDE 7

How to Design and Build UIs

• Task analysis

• Rapid prototyping

• Evaluation

• Implementation

[Diagram: Design, Prototype, Evaluate]

Iterate at every stage!

2002.11.21 - SLIDE 8 IS 202 – FALL 2002

Information Visualization

• Utility
  – Inherently visual data
  – Making the abstract concrete
  – Making the invisible visible

• Techniques
  – Icons
  – Color highlighting
  – Brushing and linking
  – Panning and zooming
  – Focus-plus-context
  – Magic lenses
  – Animation

2002.11.21 - SLIDE 10 IS 202 – FALL 2002

Why Interfaces Don’t Work

• Because…
  – We still think of using the interface
  – We still talk of designing the interface
  – We still talk of improving the interface

• “We need to aid the task, not the interface to the task.”

• “The computer of the future should be invisible.”

2002.11.21 - SLIDE 11 IS 202 – FALL 2002

Norman on Design Priorities

1. The user—what does the person really need to have accomplished?

2. The task—analyze the task. How best can the job be done?, taking into account the whole setting in which it is embedded, including the other tasks to be accomplished, the social setting, the people, and the organization.

3. As much as possible, make the task dominate; make the tools invisible.

4. Then, get the interaction right, making the right things visible, exploiting affordances and constraints, providing the proper mental models, and so on—the rules of good design for the user, written about many, many times in many, many places.

2002.11.21 - SLIDE 13 IS 202 – FALL 2002

“What Dr. Bush Foresees”

Cyclops Camera
Worn on forehead, it would photograph anything you see and want to record. Film would be developed at once by dry photography.

Microfilm
It could reduce Encyclopaedia Britannica to volume of a matchbox. Material cost: 5¢. Thus a whole library could be kept in a desk.

Vocoder
A machine which could type when talked to. But you might have to talk a special phonetic language to this mechanical supersecretary.

Thinking machine
A development of the mathematical calculator. Give it premises and it would pass out conclusions, all in accordance with logic.

Memex
An aid to memory. Like the brain, Memex would file material by association. Press a key and it would run through a “trail” of facts.

2002.11.21 - SLIDE 14 IS 202 – FALL 2002

Memex

2002.11.21 - SLIDE 15 IS 202 – FALL 2002

Memex Detail

2002.11.21 - SLIDE 16 IS 202 – FALL 2002

Cyclops Camera

2002.11.21 - SLIDE 17 IS 202 – FALL 2002

Vocoder: “Supersecretary”

2002.11.21 - SLIDE 18 IS 202 – FALL 2002

Investigator at Work

• “One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may be both in miniature, so that he projects them for examination.”

2002.11.21 - SLIDE 19 IS 202 – FALL 2002

Memex

• “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

2002.11.21 - SLIDE 20 IS 202 – FALL 2002

Associative Indexing

• “[…] associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of memex. The process of tying two items together is the important thing.”

2002.11.21 - SLIDE 21 IS 202 – FALL 2002

The WWW circa 1945

• “It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. But it is more than this; for any item can be joined into numerous trails, the trails can bifurcate, and they can give birth to side trails.”

• “Wholly new forms of encyclopaedias will appear, ready-made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.”

2002.11.21 - SLIDE 22 IS 202 – FALL 2002

Selection

• “The heart of the problem, and of the personal machine we have here considered, is the task of selection. And here, in spite of great progress, we are still lame.

• Selection, in the broad sense, is still a stone adze in the hands of a cabinetmaker.”

—“Memex Revisited” (Bush 1965)

2002.11.21 - SLIDE 23 IS 202 – FALL 2002

Interaction Paradigms for IR

• Direct manipulation
  – Query specification
  – Query refinement
  – Result selection

• Delegation
  – Agents
  – Recommender systems
  – Filtering

2002.11.21 - SLIDE 24 IS 202 – FALL 2002

The “Adaptive” Memex

• “In an adaptive Memex, the owner has delegated to the machine the ability to propose or effect changes in the stored information. By analogy to business practice, the Memex is said to be functioning as an agent (Kay, 1984). The machine is playing an autonomous role within a restricted charter: to attempt a more effective organization of the information based on observations of actual use and topical similarities.”

2002.11.21 - SLIDE 26 IS 202 – FALL 2002

Task = Information Access

The standard interaction model for information access

1) Start with an information need
2) Select a system and collections to search on
3) Formulate a query
4) Send the query to the system
5) Receive the results
6) Scan, evaluate, and interpret the results
7) Stop, or
8) Reformulate the query and go to Step 4
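
The loop formed by steps 4 to 8 can be sketched in Python. This is only an illustration under stated assumptions: the function `information_access_loop`, its three callbacks, the `max_rounds` cap, and the toy collection are all hypothetical and do not come from the lecture.

```python
from typing import Callable, List


def information_access_loop(
    query: str,
    search: Callable[[str], List[str]],
    is_satisfied: Callable[[List[str]], bool],
    reformulate: Callable[[str, List[str]], str],
    max_rounds: int = 5,  # assumption: give up after a few rounds
) -> List[str]:
    """Run steps 4-8: send, receive, evaluate, then stop or reformulate."""
    results: List[str] = []
    for _ in range(max_rounds):
        results = search(query)              # steps 4-5: send query, receive results
        if is_satisfied(results):            # steps 6-7: evaluate, maybe stop
            break
        query = reformulate(query, results)  # step 8: reformulate, back to step 4
    return results


# Toy collection and callbacks, purely for illustration.
DOCS = ["memex and associative trails", "query reformulation by relevance feedback"]


def toy_search(q: str) -> List[str]:
    # A document matches only if it contains every query term.
    return [d for d in DOCS if all(t in d.split() for t in q.split())]


hits = information_access_loop(
    "feedback loops",
    search=toy_search,
    is_satisfied=lambda r: len(r) > 0,
    reformulate=lambda q, r: " ".join(q.split()[:-1]) or q,  # drop the last term
)
```

Here the first query matches nothing, so the toy reformulation drops a term and the second round succeeds. Steps 1 to 3 (the need, the choice of collection, the first query) stay with the user and are passed in as arguments.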

2002.11.21 - SLIDE 27 IS 202 – FALL 2002

HCI Questions for IR

• Where does a user start?
  – Faced with a large set of collections, how can a user choose one to begin with?

• How will a user formulate a query?

• How will a user scan, evaluate, and interpret the results?

• How can a user reformulate a query?

2002.11.21 - SLIDE 28 IS 202 – FALL 2002

HCI for IR: Collection Selection

Question 1: Where does the user start?

2002.11.21 - SLIDE 29 IS 202 – FALL 2002

Starting Points for Search

• Faced with a prompt or an empty entry form … how to start?
  – Lists of sources
  – Overviews
    • Clusters
    • Category Hierarchies/Subject Codes
    • Co-citation links
  – Examples, Wizards, and Guided Tours
  – Automatic source selection

2002.11.21 - SLIDE 30 IS 202 – FALL 2002

List of Sources

• Have to guess based on the name

• Requires prior exposure/experience

2002.11.21 - SLIDE 31 IS 202 – FALL 2002

Old Lexis-Nexis Interface

2002.11.21 - SLIDE 32 IS 202 – FALL 2002

Overviews

• Supervised (manual) category overviews
  – Yahoo!
  – HiBrowse
  – MeSHBrowse

• Unsupervised (automated) groupings
  – Clustering
  – Kohonen feature maps

2002.11.21 - SLIDE 33 IS 202 – FALL 2002

Yahoo! Interface

2002.11.21 - SLIDE 34 IS 202 – FALL 2002

Example: MeSH and MedLine

• MeSH category hierarchy
  – Medical Subject Headings
  – ~18,000 labels
  – Manually assigned
  – ~8 labels/article on average
  – Average depth: 4.5
  – Max depth: 9

• Top level categories:

anatomy diagnosis related disc

animals psych technology

disease biology humanities

drugs physics

2002.11.21 - SLIDE 35 IS 202 – FALL 2002

MeshBrowse (Korn & Shneiderman 95)

2002.11.21 - SLIDE 36 IS 202 – FALL 2002

HiBrowse (Pollitt 97)

2002.11.21 - SLIDE 37 IS 202 – FALL 2002

Summary: Category Labels

• Advantages:
  – Interpretable
  – Capture summary information
  – Describe multiple facets of content
  – Domain dependent, and so descriptive

• Disadvantages:
  – Do not scale well (for organizing documents)
  – Domain dependent, so costly to acquire
  – May mis-match users’ interests

2002.11.21 - SLIDE 38 IS 202 – FALL 2002

Text Clustering

• What clustering does:
  – Finds overall similarities among groups of documents
  – Finds overall similarities among groups of tokens
  – Picks out some themes, ignores others

• How clustering works:
  – Cluster entire collection
  – Find cluster centroid that best matches the query
  – Problems with clustering
    • It is expensive
    • It doesn’t work well
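
The "find cluster centroid that best matches the query" step can be sketched as follows. This is a minimal sketch, assuming the collection has already been clustered, that documents are plain term-frequency vectors, and that cosine similarity is the matching function; the helper names and the toy clusters are hypothetical, and the slide does not say which representation or similarity measure was used.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (no stemming or stop words)."""
    return Counter(text.lower().split())


def centroid(docs: List[str]) -> Dict[str, float]:
    """Mean term-frequency vector of a non-empty cluster."""
    total: Counter = Counter()
    for d in docs:
        total.update(tf_vector(d))
    return {t: c / len(docs) for t, c in total.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_cluster(query: str, clusters: Dict[str, List[str]]) -> str:
    """Return the name of the cluster whose centroid is closest to the query."""
    q = tf_vector(query)
    return max(clusters, key=lambda name: cosine(q, centroid(clusters[name])))


# Hypothetical, pre-computed clusters.
clusters = {
    "astronomy": ["star galaxy telescope", "star constellation sky"],
    "film": ["film star actor", "movie star director"],
}
best = best_cluster("galaxy star", clusters)
```

The expensive part named on the slide, clustering the entire collection, is deliberately left out here; only the comparatively cheap centroid lookup is shown.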

2002.11.21 - SLIDE 39 IS 202 – FALL 2002

Scatter/Gather

• Cutting, Pedersen, Tukey & Karger 92, 93, Hearst & Pedersen 95

• How it works
  – Cluster sets of documents into general “themes”, like a table of contents
  – Display the contents of the clusters by showing topical terms and typical titles
  – User chooses subsets of the clusters and re-clusters the documents within
  – Resulting new groups have different “themes”

• Originally used to give collection overview
• Evidence suggests more appropriate for displaying retrieval results in context

2002.11.21 - SLIDE 40 IS 202 – FALL 2002

S/G Example: Query on “star”

Encyclopedia text

14 sports
8 symbols
47 film, tv
68 film, tv (p)
7 music
97 astrophysics
67 astronomy (p)
12 stellar phenomena
10 flora/fauna
49 galaxies, stars
29 constellations
7 miscellaneous

Clustering and re-clustering is entirely automated

2002.11.21 - SLIDE 41 IS 202 – FALL 2002

Scatter/Gather Interface

2002.11.21 - SLIDE 42 IS 202 – FALL 2002

Another Use of Clustering

• Use clustering to map the entire huge multidimensional document space into a number of small clusters

• “Project” these onto a 2D graphical representation
  – Group by doc: SPIRE, Kohonen maps
  – Group by words: Galaxy of News, HotSauce, Semio

2002.11.21 - SLIDE 43 IS 202 – FALL 2002

“ThemeScapes” Clustering

2002.11.21 - SLIDE 44 IS 202 – FALL 2002

Kohonen Feature Maps on Text

2002.11.21 - SLIDE 45

Study of Kohonen Feature Maps

• H. Chen, A. Houston, R. Sewell, and B. Schatz, JASIS 49(7)

• Comparison: Kohonen Map and Yahoo

• Task:
  – “Window shop” for interesting home page
  – Repeat with other interface

• Results:
  – Starting with map could repeat in Yahoo (8/11)
  – Starting with Yahoo unable to repeat in map (2/14)

2002.11.21 - SLIDE 46

What Study Participants Liked

• Correspondence of region size to number of documents

• Overview (but also wanted zoom)

• Ease of jumping from one topic to another

• Multiple routes to topics

• Use of category and subcategory labels

2002.11.21 - SLIDE 47

What Study Participants Wanted

• Hierarchical organization
• Other ordering of concepts (alphabetical)
• Integration of browsing and search
• Correspondence of color to meaning
• More meaningful labels
• Labels at same level of abstraction
• Fit more labels in the given space
• Combined keyword and category search
• Multiple category assignment (sports+entertain)

2002.11.21 - SLIDE 48 IS 202 – FALL 2002

Summary: Clustering

• Advantages:
  – Get an overview of main themes
  – Domain independent

• Disadvantages:
  – Many of the ways documents could group together are not shown
  – Not always easy to understand what they mean
  – Can’t see what documents are about
  – Documents forced into one position in semantic space
  – Hard to view titles

• Perhaps more suited for pattern discovery
  – Problem: often only one view on the space

2002.11.21 - SLIDE 50 IS 202 – FALL 2002

HCI for IR: Query Specification

• Question 2: How will a user specify a query?

2002.11.21 - SLIDE 51 IS 202 – FALL 2002

Query Specification

• Interaction styles (Shneiderman 97)
  – Command language
  – Form fill
  – Menu selection
  – Direct manipulation
  – Natural language

• What about gesture, eye-tracking, or implicit inputs like reading habits?

2002.11.21 - SLIDE 52 IS 202 – FALL 2002

Command-Based Query Specification

• COMMAND ATTRIBUTE value CONNECTOR …
  – FIND PA shneiderman AND TW interface

• What are the ATTRIBUTE names?

• What are the COMMAND names?

• What are allowable values?
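
The three questions above are exactly what a parser for such a command language has to answer. The following Python sketch parses the `COMMAND ATTRIBUTE value CONNECTOR …` pattern; the allowed sets are assumptions limited to what appears on the slide plus the usual Boolean connectors, and `parse_command_query` is a hypothetical helper, not the syntax of any particular catalog system.

```python
from typing import List, Tuple

COMMANDS = {"FIND"}                # hypothetical: only the command shown on the slide
ATTRIBUTES = {"PA", "TW"}          # hypothetical: only the attributes shown on the slide
CONNECTORS = {"AND", "OR", "NOT"}  # assumption: the usual Boolean connectors


def parse_command_query(text: str) -> Tuple[str, List[Tuple[str, str]], List[str]]:
    """Split a query into (command, [(attribute, value), ...], [connectors])."""
    tokens = text.split()
    if not tokens or tokens[0].upper() not in COMMANDS:
        raise ValueError(f"unknown command in {text!r}")
    command = tokens[0].upper()
    clauses: List[Tuple[str, str]] = []
    connectors: List[str] = []
    i = 1
    while i < len(tokens):
        attr = tokens[i].upper()
        if attr not in ATTRIBUTES:
            raise ValueError(f"unknown attribute {tokens[i]!r}")
        i += 1
        # The value runs until the next connector or the end of the query.
        value_words = []
        while i < len(tokens) and tokens[i].upper() not in CONNECTORS:
            value_words.append(tokens[i])
            i += 1
        if not value_words:
            raise ValueError(f"missing value for {attr}")
        clauses.append((attr, " ".join(value_words)))
        if i < len(tokens):
            connectors.append(tokens[i].upper())
            i += 1
            if i == len(tokens):
                raise ValueError("query ends with a connector")
    if not clauses:
        raise ValueError("query has no ATTRIBUTE value clause")
    return command, clauses, connectors


parsed = parse_command_query("FIND PA shneiderman AND TW interface")
```

Every `ValueError` in this sketch corresponds to something the user had to know in advance, which is the usability cost the slide points at.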

2002.11.21 - SLIDE 53 IS 202 – FALL 2002

Form-Based Query Specification

2002.11.21 - SLIDE 54 IS 202 – FALL 2002

Form-Based Query Specification

2002.11.21 - SLIDE 55 IS 202 – FALL 2002

Direct Manipulation Query Specification

2002.11.21 - SLIDE 56 IS 202 – FALL 2002

Menu-Based Query Specification

2002.11.21 - SLIDE 57 IS 202 – FALL 2002

Natural Language Query

• AskJeeves
  – http://www.ask.com/

2002.11.21 - SLIDE 59 IS 202 – FALL 2002

HCI for IR: Viewing Results

• Question 3: How will a user scan, evaluate, and interpret the results?

2002.11.21 - SLIDE 60 IS 202 – FALL 2002

Display of Retrieval Results

• Goal:
  – Minimize time/effort for deciding which documents to examine in detail

• Idea:
  – Show the roles of the query terms in the retrieved documents, making use of document structure

2002.11.21 - SLIDE 61 IS 202 – FALL 2002

Putting Results in Context

• Interfaces should
  – Give hints about the roles terms play in the collection
  – Give hints about what will happen if various terms are combined
  – Show explicitly why documents are retrieved in response to the query
  – Summarize compactly the subset of interest

2002.11.21 - SLIDE 62 IS 202 – FALL 2002

Putting Results in Context

• Visualizations of query term distribution
  – KWIC, TileBars, SeeSoft, Virtual Shakespeare

• Visualizing shared subsets of query terms
  – InfoCrystal, VIBE

• Table of contents as context
  – SuperBook, Cha-Cha

2002.11.21 - SLIDE 63 IS 202 – FALL 2002

KWIC (Keyword in Context)
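
A KWIC display lists each occurrence of a query term together with a few words of surrounding text. A minimal Python sketch, assuming whitespace tokenization and a fixed window of words on each side; the function `kwic`, the bracket markup, and the sample sentence are all illustrative choices, not taken from the slide:

```python
from typing import List


def kwic(text: str, keyword: str, width: int = 3) -> List[str]:
    """Return one line per keyword occurrence, with `width` words of context."""
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        # Compare case-insensitively, ignoring punctuation attached to the word.
        if w.lower().strip(".,;:!?\"'") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{w}] {right}".strip())
    return lines


# Sample text written for this example.
sample = ("A memex is a device in which an individual stores all his books. "
          "The memex is mechanized.")
rows = kwic(sample, "memex", width=2)
for row in rows:
    print(row)
```

Search-result snippets with highlighted query terms are the familiar descendant of this display.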

2002.11.21 - SLIDE 64 IS 202 – FALL 2002

TileBars

• Graphical representation of term distribution and overlap
• Simultaneously indicate:
  – Relative document length
  – Query term frequencies
  – Query term distributions
  – Query term overlap
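
The four properties listed above can all be read off a small grid with one row per query term and one column per document segment. The sketch below assumes fixed-size word windows as segments, a simplification standing in for whatever segmentation a real TileBars implementation uses, and renders the grid as text; `tilebar`, `render`, and the sample document are hypothetical.

```python
from typing import Dict, List


def tilebar(text: str, terms: List[str], segment_size: int = 5) -> Dict[str, List[int]]:
    """Count each query term in consecutive fixed-size segments of the text."""
    words = [w.lower().strip(".,;:!?") for w in text.split()]
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    return {t: [seg.count(t.lower()) for seg in segments] for t in terms}


def render(bar: Dict[str, List[int]]) -> str:
    """Draw one row per term; darker characters mean higher frequency."""
    shades = " .:#"  # 0, 1, 2, and 3-or-more occurrences
    return "\n".join(
        f"{term:>12} |" + "".join(shades[min(c, 3)] for c in counts) + "|"
        for term, counts in bar.items()
    )


doc = ("dbms reliability matters because a dbms stores records "
       "banks rely on reliability when a dbms fails over")
bar = tilebar(doc, ["dbms", "reliability"], segment_size=6)
print(render(bar))
```

The number of columns reflects relative document length, the shade of a tile reflects term frequency, the position of shaded tiles shows distribution, and columns where both rows are shaded show overlap.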

2002.11.21 - SLIDE 65 IS 202 – FALL 2002

TileBars Example

• Mainly about both DBMS & reliability

• Mainly about DBMS, discusses reliability

• Mainly about, say, banking, with a subtopic discussion on DBMS/Reliability

• Mainly about high-tech layoffs

Query terms:

DBMS (Database Systems)

Reliability

What roles do they play in retrieved documents?

2002.11.21 - SLIDE 66 IS 202 – FALL 2002

TileBars Example

2002.11.21 - SLIDE 67 IS 202 – FALL 2002

SeeSoft (Eick & Wills 95)

2002.11.21 - SLIDE 68 IS 202 – FALL 2002

David Small: Virtual Shakespeare

2002.11.21 - SLIDE 69 IS 202 – FALL 2002

Other Approaches

• Show how often each query term occurs in sets of retrieved documents
  – VIBE (Korfhage ’91)
  – InfoCrystal (Spoerri ’94)

2002.11.21 - SLIDE 70 IS 202 – FALL 2002

VIBE (Olson et al. 93, Korfhage 93)

2002.11.21 - SLIDE 71 IS 202 – FALL 2002

InfoCrystal (Spoerri 94)

2002.11.21 - SLIDE 72 IS 202 – FALL 2002

Problems with InfoCrystal

• Can’t see proximity or frequency of terms within documents

• Quantities not represented graphically

• More than 4 terms hard to handle

• No help in selecting terms to begin with

2002.11.21 - SLIDE 73 IS 202 – FALL 2002

Cha-Cha (Chen & Hearst 98)

• Shows “Table-Of-Contents”-like view, like SuperBook

• Focus+Context using hyperlinks to create the TOC

• Integrates Web Site structure navigation with search

2002.11.21 - SLIDE 75 IS 202 – FALL 2002

HCI for IR: Query Reformulation

• Question 4: How can a user reformulate a query?

2002.11.21 - SLIDE 76 IS 202 – FALL 2002

Query Reformulation

• Thesaurus expansion
  – Suggest terms similar to query terms

• Relevance feedback
  – Suggest terms (and documents) similar to retrieved documents that have been judged to be relevant
  – “More like this” interaction
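
Thesaurus expansion, the first of the two techniques, amounts to a lookup from query terms to related terms. A minimal Python sketch, assuming a hand-built dictionary; `TOY_THESAURUS` and `suggest_terms` are hypothetical, and a real system would more likely draw on a curated or automatically derived thesaurus:

```python
from typing import Dict, List

# Hypothetical thesaurus: each term maps to a few related terms.
TOY_THESAURUS: Dict[str, List[str]] = {
    "interface": ["ui", "front end"],
    "retrieval": ["search", "lookup"],
}


def suggest_terms(query: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Suggest related terms for each query term, skipping terms already present."""
    query_terms = query.lower().split()
    present = set(query_terms)
    suggestions: List[str] = []
    for term in query_terms:
        for related in thesaurus.get(term, []):
            if related not in present and related not in suggestions:
                suggestions.append(related)
    return suggestions


suggested = suggest_terms("retrieval interface design", TOY_THESAURUS)
```

The function only suggests; whether the terms are added automatically or offered to the user is an interface decision.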

2002.11.21 - SLIDE 77 IS 202 – FALL 2002

Relevance Feedback

• Modify existing query based on relevance judgements
  – Extract terms from relevant documents and add them to the query
  – And/or re-weight the terms already in the query

• Two main approaches:
  – Automatic (pseudo-relevance feedback)
  – Users select relevant documents
    • Users/system select terms from an automatically generated list
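
Both modifications, adding terms and re-weighting existing ones, can be sketched with a simplified Rocchio-style update that uses positive feedback only. The slide does not name a specific formula, so the choice of Rocchio, the weights `alpha` and `beta`, the `top_k` cutoff, and the function `expand_query` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def expand_query(
    query: Dict[str, float],
    relevant_docs: List[str],
    alpha: float = 1.0,  # weight kept by the original query terms
    beta: float = 0.5,   # weight given to evidence from relevant documents
    top_k: int = 2,      # how many new terms to add
) -> Dict[str, float]:
    """Re-weight existing query terms and add the strongest new ones."""
    if not relevant_docs:
        return dict(query)
    totals: Counter = Counter()
    for doc in relevant_docs:
        totals.update(doc.lower().split())
    mean_tf = {t: c / len(relevant_docs) for t, c in totals.items()}

    # Re-weight the terms already in the query.
    new_query = {t: alpha * w + beta * mean_tf.get(t, 0.0) for t, w in query.items()}

    # Add the top_k most frequent terms not yet in the query (ties broken alphabetically).
    candidates = sorted(
        (t for t in mean_tf if t not in query),
        key=lambda t: (-mean_tf[t], t),
    )
    for t in candidates[:top_k]:
        new_query[t] = beta * mean_tf[t]
    return new_query


q = {"star": 1.0}
judged_relevant = ["star galaxy telescope", "galaxy star cluster galaxy"]
expanded = expand_query(q, judged_relevant, top_k=1)
```

For the automatic (pseudo-relevance) variant, `relevant_docs` would simply be the top-ranked results of the first search; for the user-driven variant, the `candidates` list is what would be shown for selection.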

2002.11.21 - SLIDE 78 IS 202 – FALL 2002

Relevance Feedback Interface

2002.11.21 - SLIDE 79 IS 202 – FALL 2002

Revealing Internals

• Opaque (black box)
  – (Like web search engines)

• Transparent
  – (See used terms after Relevance Feedback)

• Penetrable
  – (Choose suggested terms before Relevance Feedback)

• Which do you think worked best?

2002.11.21 - SLIDE 80 IS 202 – FALL 2002

Effectiveness Results

• Subjects using Relevance Feedback showed 17% - 34% better performance than without Relevance Feedback

• Subjects in the penetrable case did 15% better as a group than those in the opaque and transparent cases

2002.11.21 - SLIDE 81 IS 202 – FALL 2002

Summary: Relevance Feedback

• Iterative query modification can improve precision and recall for a standing query

• In at least one study, users were able to make good choices by seeing which terms were suggested for Relevance Feedback and selecting among them

• So … “more like this” can be useful!

2002.11.21 - SLIDE 82 IS 202 – FALL 2002

Summary: HCI for IR

• Focus on the task, not the tool
• Be aware of
  – User abilities and differences
  – Prior work and innovations
  – Design guidelines and rules-of-thumb

• Iterate, iterate, iterate

• It is very difficult to design good UIs
• It is very difficult to evaluate search UIs
• Better interfaces in future should produce better IR experiences

2002.11.21 - SLIDE 83 IS 202 – FALL 2002

Next Time

• Web Search Architecture and Crawling (Avi Rappoport, President of SearchTools.com)