Searching in Hypertext
Prof. Marti Hearst
SIMS 202, Lecture 28

Marti A. Hearst, SIMS 202, Fall 1997

Today

- Hypertext
  - What is it?
  - What does it mean to search on it?

What is Hypertext?

- Non-sequentially linked pieces of text or other information types
  - Each “piece” of information is called a node
  - Nodes are connected by computer-supported links to create an information network
- Goals:
  - Access and read text from multiple viewpoints
  - Link related pieces of information together
  - Annotate existing texts

What is Hypertext?

- Links can be uni- or bi-directional
- Links can have types
  - argument structure: causation, support
  - metadata: author, publisher
- A browser provides navigational aids
  - table of contents
  - history lists
  - bookmarks
  - guided tours
  - maps/overviews
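
The node-and-link model described on the slides above can be sketched as a small data structure. This is only an illustrative sketch: the class name `Hypertext`, its methods, and the sample nodes are assumptions made for the example and do not come from any of the systems discussed. It shows typed links, optional bi-directional links, and the back-link lookup that a link table makes possible.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Hypertext:
    """Nodes of text joined by typed, directed links (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node id -> text
    out_links: dict = field(default_factory=lambda: defaultdict(list))
    in_links: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node_id, text):
        self.nodes[node_id] = text

    def add_link(self, src, dst, link_type, bidirectional=False):
        # Record the link in both tables so back links can be looked up later.
        self.out_links[src].append((dst, link_type))
        self.in_links[dst].append((src, link_type))
        if bidirectional:
            self.add_link(dst, src, link_type)

    def follow(self, node_id, link_type=None):
        """Nodes reachable in one step, optionally filtered by link type."""
        return [dst for dst, t in self.out_links[node_id]
                if link_type is None or t == link_type]

    def back_links(self, node_id):
        """Nodes that link to node_id."""
        return [src for src, _ in self.in_links[node_id]]


# Hypothetical three-node network with one argument link and one metadata link.
web = Hypertext()
web.add_node("claim", "Browsing takes less effort than querying.")
web.add_node("evidence", "Most subjects chose to browse.")
web.add_node("bio", "About the author.")
web.add_link("evidence", "claim", "support")
web.add_link("claim", "bio", "author")

print(web.follow("evidence", "support"))  # ['claim']
print(web.back_links("claim"))            # ['evidence']
```

Keeping a separate table of incoming links is one simple way to support back links; a system that stores only outgoing links cannot answer "what points here?" without scanning every node.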

What is Hypertext?

- Influential writers on the topic
  - Vannevar Bush ’45 (Director of the US Office of Scientific Research and Development)
  - Ted Nelson ’67: Coined the term
  - Doug Engelbart ’63: a variant approach
  - Conklin ’87: Definitive overview
  - Halasz et al. ’87: NoteCards in a nutshell
- Systems:
  - NoteCards, HyperCard, Superbook, many others

The WWW and Hypertext

- Finally realized the dream of visionaries!
- HTML allows only impoverished functionality
  - no back links
  - only one type of link
- Standard browsers are lacking as well
  - No history list
  - No overview of sites
  - Inconsistent marking of already visited pages
  - Bookmarks are uniform in appearance

What does it mean to search hypertext?

- Nelson ’67 explicitly dissociates hypertext from document retrieval
- Nice for browsing, but bad for fact retrieval?
- A contrast: planning vs. browsing
  - Simply follow links
    - less effort
    - learn by serendipity (maybe)
  - Formulate query and show pages of hits
    - as in standard IR
    - requires planning
    - systems assumed a professional intermediary
- Hypertext allows for a novel interaction between navigating links and formulating queries

Major challenge (Marchionini & Shneiderman 88)

- Balancing the power of analytic search with the ease of browsing
- Solution: flexible, powerful user interfaces that allow for selection feedback
- But: flexibility leads to complexity

How to evaluate search in hypertext?

- One strategy: evaluate a hypertexted book, manual, or help system.
  - Compare against paper
  - Compare query formulation with link navigation
  - Ignore hyperlinks and simply compare ranking algorithms
- Problem: can end up evaluating the quality of the hyperlinks

Hypertext Source Texts

- Encyclopedia (Marchionini 89)
- Software manual (Egan et al. 89, Campagnoni & Ehrlich 89)
- Library of chemistry journal articles (Egan et al.)

Superbook (Remde et al. 87)

- Next-generation hyper-media book
- Functions:
  - Word Lookup: show a list of query words, stems, and word combinations
  - Table of Contents: dynamic fisheye view of the hierarchical topics list
    - Search words can be highlighted here too
  - Page of Text: show selected page and highlighted search terms
- Hypertext features linking through search words rather than page links
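
To make "linking through search words" concrete, here is a minimal sketch of a word index whose hits are posted against table-of-contents headings and highlighted in the page text. It is not Superbook's implementation: the sample sections, the function names, and the asterisk highlighting are all assumptions for illustration, and stemming and word combinations are left out.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sections: (table-of-contents heading, page text).
SECTIONS = [
    ("Graphics", "Pie charts are a poor way to compare values."),
    ("Data", "The murder dataset holds rates per 100,000 population."),
    ("Graphics", "Printing charts requires choosing a device first."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sections):
    """Map each word to the positions of the sections that contain it."""
    index = defaultdict(set)
    for pos, (_, text) in enumerate(sections):
        for word in tokenize(text):
            index[word].add(pos)
    return index


def hits_by_heading(index, sections, query):
    """Count matching sections under each heading, for the contents view."""
    matches = set()
    for word in tokenize(query):
        matches |= index.get(word, set())
    return Counter(sections[pos][0] for pos in matches)


def highlight(text, query):
    """Mark query words in a page of text with asterisks."""
    words = set(tokenize(query))

    def mark(match):
        token = match.group()
        return f"*{token}*" if token.lower() in words else token

    return re.sub(r"[A-Za-z0-9]+", mark, text)


index = build_index(SECTIONS)
print(hits_by_heading(index, SECTIONS, "charts"))  # Counter({'Graphics': 2})
print(highlight(SECTIONS[0][1], "pie charts"))
# *Pie* *charts* are a poor way to compare values.
```

The point of the sketch is that the reader reaches a page through the words it contains, so no author-made link between pages is required.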

Superbook (http://superbook.bellcore.com/SB)

Egan et al. Study

- Goal: compare Superbook with paper book
- Tasks:
  - structured search: find answer to a specific question using an unfamiliar reference text
  - open-book essay: synthesize material from different places in the document
  - incidental learning: how much useful information about the document is acquired while doing other tasks
  - subjective ratings: user reactions to the form and content

Egan et al. Study

- Factors for structured search:
  - Does the user’s question correspond to the author’s organization of the material?
    - Half the study search questions contained cues as to which topic heading to use, half did not
  - Does the user’s query as stated contain some of the same words as those used by the author?
    - Half the questions contained words taken from the text surrounding the target text, half did not

Egan et al. Study

- Example search questions:
  - Find the section discussing the basic concept that the value of any expression, however complicated, is a data structure.
  - The dataset ‘murder’ contains murder rates per 100,000 population. Find the section that says which states are included in this dataset.
  - Find the section that describes pie charts and states whether or not they are a good means for analyzing data.
  - Find the section that describes the first thing you have to do to get S to print pictorial output.
- Key:
  - blue boldface: terms taken from text
  - pink italics: terms taken from topic heading

Egan et al. Study

- Hypotheses:
  - Conventional document would require good cues from the topic headings, but Superbook would not.
    - Word lookup function hypothesized to allow circumvention of author’s organization scheme.
  - Superbook’s search facility would result in open-book essays that include more information.

Egan et al. Study

- Source text: statistics package manual (562 pp.)
- Compare: Superbook vs. paper versions
- Four sets of search questions of mixed type
- 20 university students with stats background
- Superbook training tutorial
- 15 minutes per structured query
- One open-book essay retained

Egan et al. Study

- Results: Superbook had an advantage in:
  - overall average accuracy (75% vs. 62%)
    - Superbook did better on questions with words from text but not in topic headings
    - Print version did better on questions with no search hits
  - speed (5.4 vs. 5.6 min/query on average)
    - Superbook faster for text-only cues
    - Paper faster for questions with no hits
  - essay creation
    - average score of 5.8 vs. 3.6 points out of 7
    - average 8.8 facts vs. 6.0 out of 15

Egan et al. Study

- Results:
  - Subjective ratings:
    - Superbook users rated it easier than paper (5.8 vs. 3.1 out of 7)
    - Superbook users gave higher ratings on the stat system
  - Incidental learning:
    - Superbook users recalled more chapter headings
      - maybe because these were continually displayed
    - No other differences were significant
- Problems with study:
  - Did not compare against non-hypertext computerized version
  - Did not show if/how hyperlinks affected results

Campagnoni & Ehrlich Study

- Data source was somewhat more hypertext-like
- Eight handbooks for a computer help desk
- Still only had three levels plus an index
  - Top level: list of the eight handbooks
  - Second level: t-o-c of a given handbook
  - Third level: pages with reference information
- Index: a list of concepts, objects, and procedures described throughout

Campagnoni & Ehrlich Study: Sample index

Mail index

C:
Cancel button ...................... Writing and Sending Mail, 2
cancelling a mail message .......... Writing and Sending Mail, 2
  in Stay Up mode .................. Writing and Sending Mail, 4
Change Directory
  button ........................... Saving and Loading Mail, 3
  option ........................... Saving and Loading Mail, 2
  panel ............................ Saving and Loading Mail, 2
changing composition
  window modes ..................... Writing and Sending Mail, 3
changing the current
  Mail directory ................... Reading Mail, 1
…

Campagnoni & Ehrlich Study

- Questions:
  - Previous studies suggest users would rather just browse links than search an index
    - Will this happen when questions are designed to be easier with the index?
  - It has been suggested that ability to perceive and manipulate spatial patterns should reduce disorientation in hyperlinked systems
    - Is there a correlation between efficient navigation and good visualization ability?

Campagnoni & Ehrlich Study

- Browsing defined as:
  - scanning the table of contents and paging through relevant topics to find answers
- Analytic strategy defined as:
  - using indexes to look up specific query terms and following the links to the appropriate page
- Therefore not really looking at formulating a search query.
- Also, the hierarchy is very shallow.

Campagnoni & Ehrlich Study

- Test questions
  - Three meant to be answerable with browsing
  - Three meant to be answered with index (marked *)
    - I just deleted a mail message. Where can I find out how to get it back? *
    - I need to find all my files that were modified after August 8, 1988. Where should I look?
    - How do I change the size of a window so it fits the height of the screen? *
    - Where can I find information on how to configure how my desktop behaves when I log in?

Campagnoni & Ehrlich Study

- Results
  - Most users preferred to browse even though all questions could be answered with the index
    - subjects used index 1.92 times out of six questions
    - subjects successful with index 1.6 times out of 6
  - Initial probability of succeeding with browsing quite high (75%) but then declined sharply with subsequent attempts
  - Probability of success with repeated attempts on index increased with number of attempts

Campagnoni & Ehrlich Study

- Results, continued
  - Searches meant to be analytic could be resolved using browsing
    - 2 out of 4 of the most efficient searchers did not use indexes
    - The other 2 used them for 2 questions.
  - Indexes mainly used when phrasing of question was not amenable to the t-o-c
  - Probably can be partly attributed to the shallowness of the information architecture

Campagnoni & Ehrlich Study

- Results, continued
  - Found a strong correlation between visualization ability and search time.
    - Interestingly, most of the extra time seemed to be spent by the poorer visualizers in returning to the top-level node rather than just going up one level
- Commentary
  - The study is flawed by not allowing query formulation
  - Also, should test conditions in which users are forced to use analytic vs. browsing strategies, rather than just observing their choices.

Other studies (Marchionini & Shneiderman 88)

- Hyperties Dataset
  - 14 of 16 subjects asked to search for factual information used the alphabetical index
    - but it did not seem to have a topic index, only embedded hyperlinked terms
    - No query formulation either
- Museum Dataset
  - In undirected use, 2/3 of all selections via hyperlinks
  - When asked to search for specific facts, index users were much faster than link users
    - This difference became smaller with longer use

Other studies (Marchionini & Shneiderman 88)

- Hyperties on a large encyclopedia
  - Elementary school students used querying successfully even though their queries were not well-formulated
  - Results were listed as alphabetic titles showing term frequencies
  - Students scanned list, picked promising docs, and chose terms from the docs (manual relevance feedback)
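
The result display and the manual relevance-feedback loop described above can be sketched roughly as follows. Everything in this example is an assumption for illustration (the article texts, the function names, and the way counts are summed); it is not the Hyperties implementation. It shows a result list ordered alphabetically by title with term counts, followed by a second query that adds a term the searcher noticed in a retrieved article.

```python
import re

# Hypothetical encyclopedia articles: title -> text.
ARTICLES = {
    "Whales": "Whales are mammals. Whales live in the ocean.",
    "Oceans": "The ocean covers most of the planet.",
    "Deserts": "Deserts receive very little rain.",
}


def term_frequency(text, term):
    """Number of times a term occurs as a whole word in the text."""
    return re.findall(r"[a-z]+", text.lower()).count(term.lower())


def search(articles, query_terms):
    """Return (title, total term count) pairs, sorted alphabetically by title."""
    results = []
    for title, text in articles.items():
        count = sum(term_frequency(text, t) for t in query_terms)
        if count > 0:
            results.append((title, count))
    return sorted(results)


# First attempt: a single, loosely chosen term.
first = search(ARTICLES, ["ocean"])
print(first)   # [('Oceans', 1), ('Whales', 1)]

# The searcher opens "Whales", notices the word "whales", and adds it.
second = search(ARTICLES, ["ocean", "whales"])
print(second)  # [('Oceans', 1), ('Whales', 3)]
```

Because the list is ordered by title rather than by score, the counts are the only cue to which article is most promising, which is why scanning the list is part of the strategy.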

Other studies (Marchionini & Shneiderman 88)

- Compared two groups of novices (high school)
  - Scan and select strategy, or
  - Analytic strategy (using Boolean connectors and planning search in advance)
  - Effectiveness was pretty much even, but analytical searchers were slightly faster
- Navigation and Disorientation
  - After finishing with an article, many subjects moved all the way to the top of the menu hierarchy to the query formulation screen rather than moving up just one level.

More Recent: CHI Browse-off

- CHI: Computer-Human Interaction Conference
- This was an informal contest
- Goal: Compare GUI browsers against each other and paper
- Started with a set of hierarchically organized “facts” (not documents)
- System designers raced to find answers to questions, a winner for each question
- Then novices took a try at it

CHI Browse-off

- Problems with experiment design
  - Doing a search task on a browsing system
  - Answers are in the titles themselves
  - Small collections
- Results (my interpretation)
  - Those system designers who knew the most facts/were smartest won
  - Novices did better with familiar systems
  - Paper won.

Searching on Hypertext

- Conclusions: More studies needed
  - Bad visualizers don’t take advantage of the link structure
  - Most people would rather select links
- Complication: how good are the links
- Most studies not done on systems with the heterogeneity of the web
  - compare Yahoo and AltaVista, for example