A User-Centered Interface for Information Exploration in a Heterogeneous Digital Library

Michelle Q. Wang Baldonado
Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, CA 94304. E-mail: [email protected]

The advent of the heterogeneous digital library provides the opportunity and establishes the need for the design of new user interfaces. As a single portal to a wide array of information sources, the heterogeneous digital library requires that a variety of cataloging schemas, subject domains, document genres, and institutional biases be accommodated. SenseMaker is a user-centered interface for information exploration in a heterogeneous digital library. It unifies citations and articles from heterogeneous sources by presenting them in a common schema with affordances for quick comparisons of properties. At the same time, SenseMaker users can recover a degree of context by iteratively organizing citations and articles into higher-level bundles based on either metadata or content. Furthermore, SenseMaker enables users to move fluidly from browsing to searching by introducing structure-based searching and structure-based filtering. This paper outlines the SenseMaker interface design and details some of our experimental findings surrounding its use.

1. Introduction

Many libraries have a degree of “heterogeneity” in their holdings and in their finding aids. For example, a student who needs to write a term paper and is versed in library research might begin by looking for well-established books in the card catalog and then turn to searching for current magazine articles in a guide to periodicals (Mann, 1993). Yet heterogeneity in the traditional library is limited by the library’s acquisition and cataloging practices—indeed, even by the size of the building itself. In contrast, the digital library need not be restricted by the same physical, institutional, or community-based boundaries. A digital library may provide a single point of access to a wide range of autonomous, distributed sources (Paepcke et al., 1996). Each source may vary from the others in terms of its quality, its cataloging schema, its subject domain, its document genre, and so forth. The resulting increased heterogeneity raises new challenges for our social practices, our community policies, and our technologies.

This article, situated in the world of technology, describes a particular user interface (SenseMaker) that we built to support users in exploring a heterogeneous digital library. The SenseMaker design seeks to meet three classes of user needs. First, users must be able to issue a single query to multiple, disparate sources and to view the ensuing results in a uniform way. At the infrastructure level, SenseMaker addresses this need by using technology developed as part of the Stanford Digital Library project. At the interface level, SenseMaker achieves presentation uniformity by displaying results as automatically constructed citations expressed in a common schema. Furthermore, SenseMaker provides users with affordances for conducting quick comparisons of citation properties—a useful technique for result analysis.

Second, users must be able to develop strategies for coping with the loss of context that occurs when a variety of independent sources are melded together. SenseMaker helps users to recover a degree of context by giving them tools for iteratively organizing citations and articles into higher-level bundles, based on either metadata or content. This interface-level aggregation facility enables users to view a collection from a variety of perspectives. For example, a user could view a collection of results according to topical units (as derived by a clustering algorithm) or according to author (as determined by a database grouping algorithm).

Third, users must be able to obtain support in moving more fluidly from browsing to searching (Marchionini, 1995). Users’ interests evolve during information exploration as they learn and discover more about the topic at hand (Bates, 1989; O’Day & Jeffries, 1993). SenseMaker facilitates both the contextual evolution of a user’s interests and the moves between browsing and searching with two hybrid strategies: structure-based searching and structure-based filtering.

Each remaining section of this paper explores one of these three user needs in more detail by presenting the background and related literature surrounding the need, the facets of the SenseMaker interface design that address the need, and the lessons learned from efforts to evaluate the match between need and interface design. The contributions of this paper are its invention of a new user interface construct (hi-cites) for browsing items in a heterogeneous digital library, its development of a uniform interface aggregation metaphor (iterative bundling) for viewing results at a higher level of granularity, and its introduction of structure-based searching and structure-based filtering to the interface.

Accepted July 2, 1999.

© 2000 John Wiley & Sons, Inc.

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE. 51(3):297–310, 2000 CCC 0002-8231/00/030297-14

2. Adding Uniformity to Searches of Heterogeneous Sources

Challenges in searching a heterogeneous digital library include both communicating with the sources and communicating the results back to the user. This section details the infrastructural scaffolding that enables SenseMaker users to query multiple sources at once; describes a novel interface construct (hi-cites) that facilitates the uniform browsing of heterogeneous search results; and presents a study performed to compare hi-cites with other techniques for displaying results.

2.1 Background

To provide users with the ability to launch a single query-based search to multiple, heterogeneous sources and to view the ensuing results in a unified way, SenseMaker utilizes infrastructure developed as part of the Stanford Digital Library project. At its core, SenseMaker uses the Stanford InfoBus (see Fig. 1), a distributed object substrate that incorporates autonomous, distributed digital-library services by wrapping them in proxies. A proxy is responsible for communicating with InfoBus clients via standard InfoBus protocols and also for translating InfoBus requests and messages back to the service that it represents. An example of a Stanford InfoBus proxy is the wrapper for CSQuest, an automatically generated concept thesaurus for the computer-science domain (Chen et al., 1997). The proxy works by accessing the CSQuest service (located at the University of Arizona) via HTTP over the World Wide Web. From the point of view of SenseMaker, CSQuest is a term-suggestion service. Given one term, CSQuest responds with a list of related terms, with each term accompanied by a relevance score.
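The term-suggestion behavior just described can be sketched as a minimal proxy interface. The class name, method signature, and sample scoring data below are all illustrative assumptions, not the actual InfoBus API; a real proxy would forward the request to the remote service over HTTP.

```python
# Hedged sketch of a term-suggestion proxy in the spirit of the CSQuest
# wrapper: given one term, return related terms with relevance scores.
class TermSuggestionProxy:
    def __init__(self, related_terms):
        # related_terms: term -> list of (related term, relevance score)
        self._related = related_terms

    def suggest(self, term, min_score=0.0):
        """Return related terms at or above min_score, best first."""
        candidates = self._related.get(term, [])
        hits = [(t, s) for (t, s) in candidates if s >= min_score]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical thesaurus entries for demonstration only.
proxy = TermSuggestionProxy({
    "clustering": [("classification", 0.8), ("grouping", 0.6), ("k-means", 0.4)],
})
print(proxy.suggest("clustering", min_score=0.5))
```

A caller that wanted more suggestions would simply lower `min_score`; an unknown term yields an empty list rather than an error.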

In the realm of information sources, the Stanford InfoBus provides wrappers for numerous search services (e.g., AltaVista, Dialog). The Stanford Digital Library Interoperability Protocol (DLIOP), which rests on top of the InfoBus foundation, facilitates uniform communication with these search services. It is an object-based search protocol that allows for lazy materialization of results, flexible switching between stateful and stateless communication modes, and dynamic load balancing (Hassan & Paepcke, 1997).

In addition to a uniform communication protocol, automatic searching of heterogeneous sources requires an approach to handling heterogeneous cataloging schemas* and query languages. The Stanford Digital Library Metadata Architecture (Baldonado et al., 1997) provides proxies for cataloging schemas (referred to as “attribute models” in the architecture), heuristic translation services that can map queries and results from one schema to another, and an extension to source proxies that describes the query language and cataloging schema employed by the source (see Fig. 2).

The Stanford query translation machinery (Chang, Garcia-Molina, & Paepcke, 1996) uses the information about a source’s query language and cataloging schema to translate a query expressed in a rich, Boolean query language and common query schema to the query language and schema supported natively by the source. When the results are returned by the various sources, SenseMaker maps the results back onto the common schema and presents them to the user.
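The gist of query translation can be sketched as rendering a small Boolean query tree, expressed in a common schema, into one source’s native syntax. The field mapping and target syntax here are invented for illustration; the actual Stanford machinery is far richer and also employs heuristic translation services.

```python
# Hedged sketch of query translation: a Boolean query tree in a common
# schema is rendered into a hypothetical source-native query string.
FIELD_MAP = {"author": "creator"}   # common field -> source field (illustrative)

def translate(query, field_map):
    op = query[0]
    if op in ("and", "or"):
        # Recursively translate subqueries and join with the Boolean operator.
        parts = [translate(q, field_map) for q in query[1:]]
        return "(" + (" %s " % op.upper()).join(parts) + ")"
    # Leaf node: ("match", field, value) becomes field="value".
    _, field, value = query
    return '%s="%s"' % (field_map.get(field, field), value)

q = ("and", ("match", "author", "Baldonado"), ("match", "title", "SenseMaker"))
print(translate(q, FIELD_MAP))
# prints (creator="Baldonado" AND title="SenseMaker")
```

The same tree could be rendered into a different source’s syntax by swapping in that source’s field map and serialization rules.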

2.2 SenseMaker Interface Design: Hi-Cites

SenseMaker enables users to formulate a single query that is then sent to a selected set of search services. The results that are returned are translated into a common schema. For example, the “creator” field returned by a Dublin Core source and the “author” field returned by a Bib-1 source are both translated to an “author” field in SenseMaker.

* Examples of cataloging schemas include the Dublin Core (Weibel et al., 1995), the USMARC fields (Network Development and MARC Standards Office, 1996), and Bib-1 (Z39.50 Maintenance Agency, 1995).

FIG. 1. Examples of services on the Stanford InfoBus. The label I is used to mark all InfoBus services; the clouds represent autonomous services.

FIG. 2. Examples of cataloging schemas, translation services, and source metadata on the InfoBus.
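The creator/author mapping described above amounts to renaming result fields per source. A minimal sketch, with per-source rename tables that are assumptions for illustration:

```python
# Sketch of mapping heterogeneous result records back onto a common
# schema. The per-source rename tables are illustrative, not SenseMaker's.
SOURCE_TO_COMMON = {
    "dublin-core": {"creator": "author"},
    "bib-1": {},  # already uses "author" in this sketch
}

def normalize(record, source):
    """Rename a result record's fields into the common schema."""
    rename = SOURCE_TO_COMMON.get(source, {})
    return {rename.get(k, k): v for k, v in record.items()}

print(normalize({"creator": "M. Baldonado", "title": "SenseMaker"}, "dublin-core"))
# prints {'author': 'M. Baldonado', 'title': 'SenseMaker'}
```

Fields with no entry in the rename table pass through unchanged, so a source that already matches the common schema needs only an empty table.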

In the interface, the results are presented as hi-cites (Baldonado & Winograd, 1998). Hi-cites, dynamically created citations with active highlighting, are a cross between citation sets and tables. Citation sets are widely used for the presentation of results. Many online library catalogs and World Wide Web search engines employ them. Over time, citations have evolved to become both visually compact and easy to read. Typographical and other marking conventions (e.g., font size in Web search engines) serve to differentiate visually the fields of the citation. For example, titles are often italicized, fields are often separated by periods, and journal names are often italicized. However, citations can be difficult to use if the reader’s goal is to compare titles or authors in the citation set. The reader must scan each citation for the relevant field.

Tables, though less common for presenting results, are still used in a variety of digital-library interfaces (e.g., Wake & Fox, 1995), including in an early version of SenseMaker (Baldonado & Winograd, 1997). An important benefit of tables is that they enable the user to compare results quickly. For example, the user can see at a glance all represented authors. On the downside, tables for heterogeneous search items can be sparse. For example, a movie has a producer, but a book does not. A Web site has a URL, but a printed map does not. If a table is to capture both producers and URLs, it needs to have a column for each field. The table is likely to become sparse with even a modest degree of heterogeneity.

The interface concept of hi-cites leverages the advantages of both citation sets and tables. Hi-cite elements are like citations in that they concatenate visually marked attribute values (subject to wraparound), treating attribute values as though they were words and phrases in a single sentence. Definable citation styles encapsulate the ordering, marking, and rendering rules required for this display. For example, one citation style might stipulate that title appear before author, while another might change that ordering. In addition, one citation style might mark an editor value with the suffix “ed.,” while another might use the suffix “(editor).” Finally, a citation style might specify that article titles are quoted while journal titles are italicized.
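A definable citation style of the kind described above can be sketched as an ordered list of marking rules, each concatenated into a single citation-like line. The rule format and the example style are assumptions for illustration, not SenseMaker's internal representation:

```python
# Minimal sketch of a definable citation style: an ordered list of
# (attribute, prefix, suffix) rules. The style below is an example only;
# a second style could reorder attributes or change their markings.
STYLE = [
    ("author", "", ". "),
    ("title", "\u201c", ".\u201d "),   # quote article titles
    ("editor", "", ", ed. "),
]

def render(item, style):
    """Concatenate the marked values of whichever attributes are present."""
    parts = []
    for attr, prefix, suffix in style:
        if attr in item:
            parts.append(prefix + item[attr] + suffix)
    return "".join(parts).strip()

print(render({"author": "Hansen", "title": "Holophrasts"}, STYLE))
```

Because missing attributes are simply skipped, the same style renders heterogeneous items (with and without editors, say) without leaving sparse gaps the way a table would.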

Hi-cites are like tables in that they facilitate the perceptual grouping of all values of a particular attribute. Specifically, users can highlight in red all values for an attribute by pausing for a short period of time over that attribute with the mouse (a “tool-tip” affordance). The choice to use color rather than a different visual cue was made after reviewing several studies that showed the value of color highlighting for the visual-search task (Brown, 1991; Philipsen, 1994). Figure 3 shows both title highlighting and publisher highlighting. Note that it substitutes reverse video for color so that the highlighting can be represented in a black-and-white figure.

In addition to enabling highlighting, the point-plus-delay affordance also allows users to find out the name of each attribute. If a SenseMaker user were to move her or his cursor over a title attribute value and pause, then all title values would be highlighted. In addition, the name Title would show up in a temporary rectangular pop-up box.

This “help” feature is important for cases where users are not familiar with the chosen attribute-marking conventions. For example, a newcomer to geographic information sources may not be familiar with the conventional marking of latitude and longitude, and thus may need to discover what the pieces of a geographic citation are.

Figure 4 is a screen shot of SenseMaker. The central region contains hi-cites that describe Web pages, articles, and videos. Note how they resemble citations in terms of their layout and typography. In the figure, the user has chosen to highlight titles. By simply moving the mouse and dwelling on a different attribute, such as author, the user could cause all authors to be highlighted. This comparison is possible even in this small amount of screen real estate, even though not all of the hi-cites have author information available.

2.3 Evaluation

Hi-cites are modeled after both citation sets and tables. Citation sets are easier to skim than tables because they take up less screen space and provide a more spatially continuous flow of information. Hi-cites are so close in appearance to citation sets that it is reasonable to believe that they are also easier to skim than tables. Tables are better than citation sets for the comparison of attribute values because they foster perceptual grouping. Hi-cites are different enough from tables that an evaluation is necessary to determine whether they are also better than citation sets for this task.

FIG. 3. Two sets of hi-cites (actual hi-cites make use of color rather than reverse video for highlighting).

2.3.1 Experiment (comparing hi-cites to tables and citation sets)

Fourteen users participated in an evaluation of hi-cites. All users were from Stanford University and were paid for their time. Thirteen users were either undergraduate or graduate students. One user was a Stanford staff member in the Computer Science Department. No users were students in the Computer Science Department, although the majority were students in the sciences.

Users were instructed that there were three timed tasks to be completed and that the first task was a practice task. Each task involved three steps, and each step involved viewing a new collection of document descriptions in a particular presentation style and then answering onscreen questions about those descriptions. Textual and oral help were provided for the practice task (to ensure that users understood the affordances of each presentation style), but users had no external help for the last two tasks. The three different presentation styles in the experiment—tables, citation sets, and hi-cites—were randomly ordered for each task, as were the pairings between collections and presentation styles. Each collection consisted of 12 references on a particular topic (e.g., earthquakes), where one quarter to one-half of the references were Web references and the remaining references were standard bibliographic references. References were gathered from the bibliographies of books and from Web search engines. The time to complete each step and the error rate for each step were measured automatically. At the end of each task, users were also asked onscreen to rank each presentation style in terms of perceived speed. Finally, at the completion of the study, users were asked to compare and contrast orally the different styles and to choose one as preferred.

The questions asked in each step were of the following types (and were always presented in this fixed order):

● How many descriptions have the word X in the title?
● How many descriptions have publication locations listed?
● How many descriptions have the publication date Y?

Recall that the goal of this study was to discover how well citation sets, hi-cites, and tables supported attribute-based comparisons. Thus, these questions were designed to ensure that users would perform attribute-based comparisons of the references and that the performances of users could be analyzed quantitatively. At a theoretical level, our reason for measuring ease of comparison stems from the observation that information explorers “must not only make many comparisons and inferences, but also make many interpretations about what may serve as attribute values” during browsing (Marchionini, 1995, p. 109). Thus, improved ease of comparison should lead to improved browsing.

Given that hi-cites were modeled after tables and designed with comparisons in mind, our pre-experimental hypothesis was that:

● Task-completion times would be fastest for tables, then hi-cites, then citation sets;

● Perceived task-completion times would be fastest for tables, then hi-cites, then citation sets;

● Error rates would be smallest for tables, then hi-cites, then citation sets;

● Hi-cites would be the preferred condition because they allow for rapid answering of the questions as well as for quick skimming.

2.3.2 Results and Discussion

Figures 5 to 7 show the average task-completion time, the average perceived task-completion ranking, and the average error rate for each user and condition. Significant differences were found in completion time (one-way repeated-measures ANOVA, F = 7.52, p = 0.003) and perceived completion time (F = 10.72, p < 0.005), but not in error rates (F = 0.24, p = 0.786). The post hoc Student-Newman-Keuls test at p = 0.05 showed that tables and hi-cites were significantly faster for these tasks than were citation sets, although tables and hi-cites did not differ significantly from each other. For perceived completion rankings, hi-cites were perceived to be faster than tables, and tables were perceived to be faster than citation sets. Finally, a t-test (where the null hypothesis was that the probability of a user choosing a condition as preferred was 1/3) showed that the observed proportions of preference assignments for hi-cites and citation sets were significantly different from 1/3 for p = 0.05. User feedback corroborated several of our hypotheses about the advantages and disadvantages of the various conditions. Users commented that hi-cites worked because “with a lot of text, it just hops out at you,” that tables were useful because you “knew where to look,” and that citations made it “hard to pick out the information being asked for.”

FIG. 4. SenseMaker user interface with titles highlighted.

FIG. 5. Average task-completion times.

Our experimental results were somewhat surprising in that hi-cites either outranked tables or were not sufficiently different from tables in all cases. Thus, we can conclude that hi-cites are indeed a hybrid between tables and citation sets in that they:

● Allow for ease of comparison;
● Allow for ease of skimming.

Ease of comparison is demonstrated by our experimental results. Ease of skimming stems from the fact that hi-cites and citation sets are both compact representations of information that take up less screen space and are more spatially continuous than tables.

3. Enabling User-Driven Iterative Organization of Heterogeneous Search Results

Although hi-cites allow users to browse and easily compare results from heterogeneous sources, they do not address the issue of how users can browse a very large number of results. The power associated with searching multiple sources at once is tempered by the “information overload” that often coincides with broadly scoped searches. This section begins by examining the literature on classification, clustering, and relational database grouping—all techniques that can help users to view search results at a higher level of granularity. It then turns to iterative bundling, a piece of the SenseMaker interface design that provides a uniform interface aggregation metaphor for these techniques. Finally, it presents two studies that have focused on iterative bundling.

3.1 Background

Previous research on document classification (Wynar & Taylor, 1992; Hearst, 1994), clustering (Willett, 1988; Frakes & Baeza-Yates, 1992), and relational-database grouping (Melton & Simon, 1993; Goldstein & Roth, 1994) articulated the value of organizing citations and articles in units at a higher level of granularity. For example, a study of the Scatter/Gather system, which enables iterative topic-based clustering of documents, showed that topic structure was effectively communicated to the end user (Pirolli et al., 1996).

Although they have a common end goal of creating aggregates, the techniques underlying classification, clustering, and grouping are quite different. Typically, classification involves assigning categories to items either manually or by running a statistically based algorithm; clustering involves determining similarity by performing statistical comparisons; and database-style grouping involves partitioning items by computing equality predicates.

We propose that these three techniques can be unified from the user’s point of view through the use of an aggregation metaphor. Unifying these techniques in the interface is of particular value in a heterogeneous digital library. Heretofore, the traditional environments for these techniques have been fairly homogeneous. Tables in a single relational database are described by uniform schemas. Similarly, documents in an information-retrieval test bed often come from a single source or belong to a common genre, e.g., the TIPSTER news corpus (Harman, 1995), and thus text becomes the privileged “attribute” in this domain.

FIG. 6. Average perceived task-completion rankings.

FIG. 7. Average error rates.

Users of the heterogeneous digital library are challenged by the loss of context that occurs when multiple, independent sources are woven together into a single whole. In this situation, the multiple perspectives that can emerge from seeing multiple organizations of a collection can potentially be of great value. For example, a user could gain a richer gestalt sense of a collection by iteratively viewing it organized according to author, source, topic, Web site, and publication date.

Specifically, we suggest that users will find different organizations to be useful for different purposes and that the underlying system should have the responsibility for matching the desired organization with a computational strategy. For example, both organization by author and organization by publication date might rely on database-style grouping; organization according to MeSH might rely on classification (Hearst & Karadi, 1997); and organization according to LCSH might rely on clustering (Larson, 1989). Note that it can even be valuable to reorganize documents according to the same technique and criterion. The Scatter/Gather interface (Cutting et al., 1992) enables users to browse through topic-based clusters of results that are computed in real time. At any point in the browsing process, users may “gather” clusters of interest and then ask for these hand-picked clusters to be reclustered, or “scattered.” Because users continually winnow a set of documents in this way, the net effect of “gathering” and “scattering” is to provide users with ever more detailed and customized views of the documents that they find interesting.

The idea of enabling users to explore multiple organizations for a collection of documents has an extensive history. Library catalogers have long organized the books held by a library according to author, title, and subject. Many libraries also physically organize the books on the shelves according to topic—often with the help of a classification system, such as the Dewey Decimal system. Outside of the library, restaurant guides often organize their information by both cuisine and location. Phone books organize businesses both by name and by type.

The utility of these organizations depends on how well the chosen contextual dimensions for organization match the information’s attributes and the user’s concept space. One theory of sensemaking hypothesizes that users build understanding of a subject by iteratively developing mental representations that account for the relevant data and that address the problem at hand (Russell et al., 1993). Organizing an encyclopedia by entry author is not useful because encyclopedia readers are unfamiliar with the encyclopedia’s authors. In contrast, organizing an anthology by author is valuable because its intended readers are likely to be familiar with the authors represented in the anthology. Accordingly, an important aspect of the SenseMaker design concerns ensuring that the choices for aggregation available to the user are always relevant. For example, SenseMaker will not offer “bundling by Web site” if none of the documents in the current collection have URLs.

3.2 SenseMaker Interface Design: Iterative Bundling

The SenseMaker interface design embodies the principle that users should be able to employ a variety of contextual dimensions as the basis for organizing large result collections. The interface provides a uniform aggregation metaphor that allows classification, clustering, and grouping techniques to coexist within a single system.

Figure 4 shows how articles, citations, and bundles are represented in the interface. Specifically, individual articles and citations are identified by document icons. Double-clicking on a document icon brings up the full text of the article, if it is available (for Web pages, the text appears in the user’s Web browser; for other text-based documents, the text appears in a special-purpose text browser). Aggregates are represented as folders (referred to as “bundles” in the textual part of the interface) that can be opened to reveal the document icons inside (note that hierarchical aggregation is possible in principle but is not currently implemented). Each icon is accompanied by a textual hi-cite that gives additional details about the item.

The interface empowers the user to experiment with different bundling strategies by allowing the user to change the “bundling criterion.” These criteria are all described at the conceptual level (e.g., “same author”) rather than at the algorithmic level (e.g., “group together items with the same value for the author attribute”).
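The bundling behavior described so far can be sketched as conceptual criteria mapped to the attributes they group on, with a criterion offered only when some item in the collection carries the needed attribute (the relevance rule noted earlier for "bundling by Web site"). The criteria table and sample documents are illustrative assumptions:

```python
# Sketch of iterative bundling: conceptual criteria ("same author") map
# to grouping attributes, and only criteria relevant to the current
# collection are offered. Criteria names and data are illustrative.
from collections import defaultdict

CRITERIA = {
    "same author": "author",
    "same Web site": "url",
}

def available_criteria(items):
    """Offer a criterion only if some item carries its attribute."""
    return [c for c, attr in CRITERIA.items()
            if any(attr in item for item in items)]

def bundle(items, criterion):
    """Partition items into bundles by equality on the criterion's attribute."""
    attr = CRITERIA[criterion]
    bundles = defaultdict(list)
    for item in items:
        bundles[item.get(attr, "(none)")].append(item)
    return dict(bundles)

docs = [{"author": "Bates", "title": "Berrypicking"},
        {"author": "Bates", "title": "Design"},
        {"author": "Hearst", "title": "TileBars"}]
print(available_criteria(docs))           # no document has a URL
print(sorted(bundle(docs, "same author")))
```

Iterating the process (re-bundling a bundle's contents by a different criterion) is what would make the aggregation "iterative" in the interface sense, with classification or clustering substituted for the equality grouping shown here as the criterion demands.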

The SenseMaker interface makes conscious reference to the design language that was first invented for the Xerox Star (Johnson et al., 1989) and then was adopted and evolved for subsequent graphical user interfaces. Document icons and folder icons are common elements in this design language. Furthermore, folders often contain documents, and double-clicking a folder opens it to reveal the documents inside. Users who are familiar with this design language thus expect SenseMaker's document icons to stand for files or documents. Because results in the interface often bridge the gap between citation and original document (full text being just another document attribute in the underlying infrastructure), the leap to results is not hard to make. Users' expectation that folder icons collect document icons makes bundles (groups of similar results) easy to understand.

A second design language referenced by the SenseMaker interface is that of holophrasts (Hansen, 1971). A holophrast is a display object that stands for a piece of hierarchical structure and that can be expanded in place to reveal that structure. A holophrast may be a piece of text or an icon. Many outlining programs turn headings into holophrasts. Double-clicking a heading in such a program causes it to expand and show the next level of information below the heading. All other information in the interface is rearranged to accommodate the larger block of information.

302 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—February 2000


Similarly, the holophrast idea is used in file browsers such as the Macintosh Finder and Windows Explorer, where users can cause individual file folders to reveal their contents in place. In the SenseMaker interface, bundles (represented as folders) are holophrasts. Double-clicking on such an object causes the elements that make up the object to be revealed in place.

Table 1 shows the bundling strategies currently available through the SenseMaker interface and their mapping to attributes. Many of these mechanisms rely on specialized equality operators rather than primitive equality operators (an example of a primitive equality operator is the string equality operator built into the Python programming language). Clearly, many more strategies are possible.
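The distinction between primitive and specialized equality operators can be illustrated with a small sketch. The normalization rule below (strip accents and case) is our own illustrative choice, not SenseMaker's documented mechanism:

```python
import unicodedata

def primitive_equal(a, b):
    """Primitive string equality, as built into the language."""
    return a == b

def similar_author(a, b):
    """A specialized equality operator: consider author names equal when
    they match after stripping accents and case (illustrative rule only)."""
    def normalize(name):
        decomposed = unicodedata.normalize("NFKD", name)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        return stripped.lower()
    return normalize(a) == normalize(b)

# Primitive equality separates variant forms; the specialized operator
# places them into one bundle.
assert not primitive_equal("Röscheisen", "Roscheisen")
assert similar_author("Röscheisen", "Roscheisen")
```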

SenseMaker has both internal and external facilities for performing iterative bundling. SenseMaker's ability to use third-party bundling services ensures that it is extensible and that it is easily able to incorporate sophisticated bundling mechanisms. Requiring SenseMaker to encode all of its own bundling procedures would limit the range of bundling strategies available to users. Given that SenseMaker operates in the context of the Stanford InfoBus, communication with autonomous bundling services is straightforward. Bundling services can either be designed directly for the InfoBus or can be incorporated into the InfoBus via proxies. Currently, SONIA (Sahami, Yusufali, & Baldonado, 1997), a text-based clustering and classification service, is the only third-party service available via SenseMaker.

3.3 Evaluation

Following usability-engineering principles (Nielsen, 1993), we performed two small, informal studies to see how users reacted to the concept of iterative bundling. Our first study focused on whether an optimal bundling strategy could be found across users and tasks. If, indeed, such an optimal strategy could be found, then there would be no need to allow users to experiment with bundling strategies. The second study looked to see whether and how users' behaviors are affected by the operation of different bundling strategies.

3.3.1 First experiment (value of bundling strategies)

Four users, all with backgrounds in computer science, participated in the first study (this user population is not as diverse as the user population for our hi-cite experiment; we were constrained to choose computer scientists because the citation search services available to us at study time were in the domain of computer science). We sought to determine if there is an optimal bundling strategy by comparing users' responses to viewing a predetermined static collection of results organized according to different bundling strategies. Specifically, we obtained two large collections of results by issuing the keyword queries "cryptography" and "neural networks" to two Web search services and two computer-science citation search services, asking for 100 results apiece for each query. We then produced three organizations of each collection. We considered each organization to be a separate experimental condition.

We conducted this study using an early version of the SenseMaker interface. Results were presented in an HTML table. Columns corresponded to title, author, and URL attributes. In condition 1, each row corresponded to a single result. In conditions 2 and 3, each row corresponded to a bundle of results, where bundling was by same author for condition 2 and by same Web site for condition 3. In addition, conditions 2 and 3 introduced columns for bundle-level information: bundle size and a bundle description.

Users looked at each organization for two minutes (presentation order of conditions was determined randomly but stayed constant for the two collections). After each condition, users reported their impressions of the collection. After finishing with each collection, users were asked to rate the conditions for usefulness on a scale of 1 to 10 and to describe what they found useful about each condition.

3.3.2 Results and discussion for first experiment

We encountered great diversity of opinion on the different conditions. Each condition was ranked first by some user for some collection. Furthermore, two of the users changed their rankings from one collection to the next. Given the small number of users who participated in this study, we cannot determine the statistical significance of these results. However, the disparity in the rankings across both users and tasks suggests that there is indeed no one optimal bundling strategy. We can probe more deeply into why users ranked the conditions as they did by analyzing the comments they made about what they liked or did not like about the various collections.

Every user ranked bundling by Web site first for some collection. When asked about this bundling strategy, users noted that this organization was valuable because they could make inferences about Web sites that helped them evaluate the quality of the results. In contrast, two users mentioned

TABLE 1. Currently implemented SenseMaker bundling strategies.

Bundling Strategy            Bundling Mechanism
No bundling                  N/A
Same title                   Database-style grouping (title)
Similar title                Database-style grouping (title)
Same search service          Database-style grouping (search service)
Same item type               Database-style grouping (item type)
Same Web site                Database-style grouping (URL)
Same Web-site collection     Database-style grouping (URL)
Same author                  Database-style grouping (author)
Similar geographic range     Database-style grouping (west longitude, east longitude, north latitude, south latitude)
Similar content              Text-based clustering (full text)
Topic-based classification   Text-based classification (full text)


that bundling by author was not useful because they were not familiar with the field or with the authors who were listed. Overall, the combination of these comments and the disparity of rankings for the conditions suggests that users do find different structures informative for different reasons.

3.3.3 Second experiment (effect of bundling strategies on user behavior)

Five users (Stanford students from a variety of disciplines) participated in the second study. Our goal was to discover whether and how the addition of structure to a result collection affects user behavior for the task of browsing a collection to find promising results for a hypothetical term paper. Again, we based our analysis on user response to predetermined static collections of results. Using methods similar to those used in the first study, we obtained four large collections of results by issuing the subject queries "Lincoln," "impressionism," "Egypt," and "postmodernism" to Excite (100 results), Infoseek (100 results), and the Library of Congress catalog (200 results). We then paired each collection with a presentation condition, randomly varying the pairing as well as the presentation order for each user. In all cases, the interface look-and-feel mirrored the current version of the SenseMaker interface. In condition 1, users saw an unbundled list of results (alphabetically ordered according to the first element of the result). In condition 2, users saw results bundled according to author (bundles were ordered by decreasing size). All results from Infoseek and Excite were placed into an "Unknown Author" bundle in this collection, making it an exceptionally large bundle (200 results included). In condition 3, users saw results bundled according to Web site (again, bundles were ordered by decreasing size). All the results from the Library of Congress catalog were placed into a "No Web Site" bundle in this collection, making it also an exceptionally large bundle (200 results included). Finally, in condition 4, users were free to switch among unbundled results, bundling according to author, and bundling according to Web site (unbundled was the default view). For each condition, users were told how the collection of results was obtained and were given the task of choosing five results that appeared promising for a hypothetical term paper on the topic. Users were asked to think aloud as they browsed.

3.3.4 Results and discussion for second experiment

In the field of ethnography, developing a coding scheme for analyzing people's behaviors in a study is an important and rigorous activity. We expect that an ethnographer's coding of our study results would have great value. However, we found that even an informal coding of users' activities in our study was informative for our analysis. In reviewing the videos of each user's activity, we noticed common patterns in the browsing strategies used by the users. Accordingly, we developed a simple coding scheme based on the browsing strategies we observed (see Table 2). We then coded each user's activity as a sequence of browsing strategies taken.

Our hypothesis for this study was that user behavior would be affected by the presented organization. Indeed, the results showed that users employed different browsing strategies in different conditions. All users utilized a "Sequentially Process Bundles" strategy at some point during the session. In this strategy, users move from bundle to bundle, opening each and perusing its contents. Four of the five users also utilized a "Filter Bundles" strategy at some point during the session. Users reported making use of a variety of filtering criteria, including familiarity of the bundle description, quality inferences about the bundle, and judgments about the similarity of the bundle description to the current topic. For example, one user announced, "I guess I'll start with things with edu . . . . I figure it has the most academic information vs. . . . current events or selling things." Another observed that bundling by author was useful because "the same person would probably write about the same type of stuff." In general, users made inferences about Web sites if the organization was by Web site, and they made inferences about authors if the organization was by author. Many noted that author bundling would be more useful to them if they were familiar with authors on the topic at hand.

From these observations of patterns in user activity, we can draw some conclusions about the effect of collection organization on user behavior. First, the combination of bundling and sorting used in this study biased users toward considering larger bundles first. Second, bundling encouraged users to make use of bundle-dependent inferences in browsing. We conclude that organizational structure has an important and readily discernible effect on users' browsing strategies.

4. Increasing the Fluidity Between Browsing and Searching

Hi-cites and iterative bundling combine to help the user browse, analyze, and make sense of heterogeneous results.

TABLE 2. Coding scheme for browsing strategies.

Condition          Strategy name                       Strategy description
Unbundled          ST (Survey Titles)                  Rapidly skim titles to gain an overview.
                   FT (Filter Titles)                  Consider titles one by one for relevance; relevant titles are recorded.
Bundled            SB (Survey Bundles)                 Rapidly skim bundles to gain an overview.
                   FB (Filter Bundles)                 Consider bundles one by one for relevance; open relevant bundles and filter their titles.
                   SPB (Sequentially Process Bundles)  Open bundles in order (occasional jumping) and filter their titles.
                   Focus                               Focus on "Unknown Author" bundle or "No Web Site" bundle; open this bundle and then filter its titles.
Variable bundling  Change                              Change bundling strategy (the following abbreviations are used to indicate what new strategy was chosen: A = Author; WS = Web Site; N = None).


The structure that is built during this interpretation cycle suggests some new possibilities for blurring the dividing line between browsing and searching. This section details structure-based searching and structure-based filtering, two interaction strategies that allow for fluid movement between browsing and searching in SenseMaker. This section also discusses related work in this domain, and finally describes the SenseMaker studies that we undertook to understand how users respond to this new interaction style.

4.1 Background

Consider a student who needs to write a term paper for his or her French class. While the user may begin exploring possible topics with a broad query such as "Paris museums," browsing the initial set of results will likely cause the choice of topic to evolve. What the user decides to look for next depends on what inferences and judgments the user has made about the information that has been found so far. For example, the user may decide to narrow the topic to the "Musée d'Orsay" once the user sees the multitude of references that are available for the more general topic. This contextual evolution of a user's interests is typical of information-exploration tasks and has been identified in both the library sciences and human-computer interaction literature (Bates, 1989; O'Day & Jeffries, 1993).

This evolution of a user's interests requires a synergy among searching, browsing, reading, and analyzing (Belkin, Marchetti, & Cool, 1993; Marchionini, 1995). We hypothesize that a system that carefully intertwines browsing and search will increase the amount of exploration undertaken by a user. After all, real-world tasks are time-limited and resource-constrained: "satisficing" is usually the winning information-exploration strategy (Mann, 1993). Thus, if the cost of switching between searching and browsing were reduced, users would be more likely to undertake deeper exploration. Accordingly, a design goal for SenseMaker has been to make the boundaries between the exploration modes more fluid. Structure-based searching and structure-based filtering have been developed with this goal in mind.

The key insight behind structure-based searching and filtering is the recognition that the aggregates created by users in the SenseMaker interface can be viewed as surrogates for queries. An aggregate can serve as the basis for fetching more results or for filtering the current set of results. Besides enabling users to make use of the structure they have built for a result collection, structure-based searching and filtering spare users from the need to formulate appropriate Boolean queries, a task that can be cognitively difficult (Greene et al., 1990).

4.1.1 Structure-based searching

More precisely, we introduced two types of structure-based searching into SenseMaker: searching by growing selected bundles and searching by adding related bundles. Searching by growing selected bundles enables users to take the key concepts associated with the selected bundles and use them to drive the search for more results. In particular, this action takes bundles selected by the user and treats them as templates for finding more documents. Figure 8 illustrates this action for an abstract domain. It shows a collection of geometric objects that is bundled by shape. Selecting the bundle of circles as the trigger for action causes that bundle to receive more circles. In a digital-library domain, shape might correspond to author, with a circle corresponding to a work by Shakespeare and a square corresponding to a work by Newton.

Two styles of searching by growing selected bundles are possible: collection-independent and collection-dependent. In the collection-independent style, only the characteristics of the selected bundles are used to find more documents. In the collection-dependent style, the characteristics of the selected bundles are used in conjunction with the defining characteristics of the entire collection. Consider the user who issues a query specifying a particular subject, bundles the resulting collection by author, and then selects three author bundles as the basis for finding more results. The collection-independent style of searching by growing selected bundles enables the user to find out what else the chosen authors have written, independent of the particular subject named in the original query. The collection-dependent style enables the user to find more results that are both by these authors and on the same subject articulated in the query, which is useful in a heterogeneous, distributed environment where users request results and choose search services incrementally. It enables a user to make sense of a smaller collection and to determine what is of interest before progressing to a more expensive stage of exploration.

Because most search services accept only queries as input, implementations of searching by growing selected bundles (including the implementation found in SenseMaker) must work by formulating queries that will retrieve results that are targeted to the selected bundles. In general, the algorithm for querying by growing selected bundles involves formulating a query that describes the "template" bundles (and conjoining that query with the query that created the entire current collection in the case of the collection-dependent style) and then issuing that query to a set of sources determined by the user. The query that describes the "template" bundles is, in turn, the disjunction of individual queries, one for each "template" bundle.
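The query-formulation algorithm just described can be sketched as follows. The query syntax and data structures are illustrative assumptions; SenseMaker's actual query language is not shown here:

```python
def bundle_query(bundle):
    """Describe one "template" bundle as a disjunction of the attribute
    values that define it (illustrative sketch, not SenseMaker's code)."""
    attribute, values = bundle
    return " or ".join(f"{attribute} = {v}" for v in values)

def grow_selected_bundles(selected, collection_query=None):
    """Disjoin the per-bundle queries; in the collection-dependent style,
    conjoin the result with the query that built the current collection."""
    template = " or ".join(f"({bundle_query(b)})" for b in selected)
    if collection_query is None:  # collection-independent style
        return template
    return f"({collection_query}) and ({template})"

selected = [("author", ["Calvino"]), ("author", ["Röscheisen", "Roscheisen"])]
print(grow_selected_bundles(selected, collection_query="subject = cryptography"))
# → (subject = cryptography) and ((author = Calvino) or (author = Röscheisen or author = Roscheisen))
```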

FIG. 8. Searching by growing selected bundles. Selecting the "circle" bundle causes the "circle" bundle to receive more circles and grow.


For bundles that have been formed by attribute-value equality comparisons, the formulation of a query that describes their key characteristics is fairly straightforward. For example, consider selecting the Calvino bundle from a collection bundled by author name. The corresponding query would be "author = Calvino." If the bundling mechanism has relaxed the strict criterion of string equality to allow similar author names to be considered to be the same, the query might include all the variant forms. For example, consider the case in which bundling by author name has produced an author bundle that includes both Röscheisen (with an umlaut) and Roscheisen (without an umlaut). In this case, the query might be "author = Röscheisen or Roscheisen." For more complex bundling strategies, the corresponding query must be formulated by heuristic means.

The concept of using bundles as the triggers for action extends beyond the use of bundles as templates. A second structure-based searching strategy is searching by adding related bundles. Figure 9 depicts the action graphically. Here, selecting the bundle of circles causes new bundles to be added, where the shapes associated with the new bundles (horizontal ellipses and vertical ellipses) are related mathematically and intuitively to the shape (circles) associated with the selected bundle.

Two short scenarios illustrate the value of this strategy in a digital-library domain. In the first scenario, a user issues a query and then bundles the ensuing collection by author. The works of three or four authors catch the user's eye. After growing these bundles to find out what else the authors have written on this subject, the user wonders whether and what the authors' colleagues and students have written on the same subject. Accordingly, she chooses to search by requesting the addition of new bundles, where the authors associated with the new bundles are restricted to those who are colleagues or students of the selected authors. In the second scenario, a user browses through a collection organized into topic bundles. Having identified a few bundles of interest, the user chooses to search by adding related bundles, with the expectation that the newly added bundles will have topics that are related to the topics of the selected bundles.

The algorithm for implementing this type of structure-based searching involves identifying the key characteristics of the selected bundles, accessing an external source that records relationships among these characteristics, determining characteristics that are related to the key characteristics already computed, and then issuing a query for items with the newly defined characteristics. While bundling strategies sometimes require sources that record equivalence relationships, querying by adding related bundles requires sources that record more general relationships. Our first digital-library scenario in this section might make use of a "genealogy" for the field at hand. Such a repository of information is already available for a number of fields. For example, the genealogy for the area of artificial intelligence is well-known in that community (Belew, 1994). Our second scenario in this section might make use of a classification scheme such as the Library of Congress Subject Headings or the Dewey Decimal system.
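The steps of this algorithm can be sketched as follows, using the abstract shapes of Figure 9. The relationship source and its contents are hypothetical placeholders for a real genealogy or classification scheme:

```python
# A hypothetical external source recording relationships among bundle
# characteristics (a stand-in for a genealogy or classification scheme).
RELATED = {
    "circle": ["horizontal ellipse", "vertical ellipse"],
}

def add_related_bundles(selected_keys, relation=RELATED):
    """Steps from the text: take the key characteristics of the selected
    bundles, look up related characteristics in the external source, and
    return them as the basis for one new query per new bundle."""
    related = []
    for key in selected_keys:
        for r in relation.get(key, []):
            if r not in related and r not in selected_keys:
                related.append(r)
    return related

print(add_related_bundles(["circle"]))
# → ['horizontal ellipse', 'vertical ellipse']
```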

4.1.2 Structure-based filtering

Filtering by focusing on selected bundles allows users to employ structure to limit a collection of results quickly and at a high level of granularity. In particular, SenseMaker enables a user to restrict the current collection of results to just those bundles that have been selected. Figure 10 illustrates this action graphically.

In the digital-library context, this strategy can be particularly useful, as the need to browse large collections of results often arises. Viewing the results of a Web search engine one result at a time can require painstaking attention. Even a simple organization of these results according to domain (e.g., one bundle for stanford.edu, another bundle for harvard.edu, etc.) can help a user who has some knowledge about which sites and organizations are most likely to have the needed information, enabling the user to focus, for example, on just a few university sites.

The overall effect of a focusing on selected bundles action could also be achieved by issuing a query that is a conjunction of the query that created the collection and a disjunctive Boolean expression whose elements correspond to each of the selected bundles. The resulting Boolean expressions are likely to be difficult for users to understand and produce. Again, one of the primary benefits of structure-based searching and structure-based filtering is that users can operate on the concrete objects that appear before them on-screen and thus obviate the need to formulate abstract queries.
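The filtering action itself reduces to a simple restriction over the bundled collection, in contrast to the Boolean reformulation described above. This is an illustrative sketch; the bundle names and structure are assumptions:

```python
def focus_on_selected(bundles, selected_names):
    """Restrict the current collection to just the selected bundles."""
    return {name: docs for name, docs in bundles.items()
            if name in selected_names}

collection = {
    "stanford.edu": ["result 1", "result 2"],
    "harvard.edu": ["result 3"],
    "example.com": ["result 4"],
}
focused = focus_on_selected(collection, {"stanford.edu", "harvard.edu"})
print(sorted(focused))
# → ['harvard.edu', 'stanford.edu']
```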

FIG. 10. Filtering by focusing on selected bundles. Selecting the "circle" bundle causes only the "circle" bundle to remain.

FIG. 9. Searching by adding related bundles. Selecting the "circle" bundle causes a "horizontal ellipse" bundle and a "vertical ellipse" bundle to be added.


4.1.3 Related work

Several strategies and systems from the literature are related to the concepts of structure-based searching and structure-based filtering, particularly relevance feedback, query by example (QBE), the RABBIT system, and the Butterfly system, which we discuss here.

In a typical relevance-feedback scenario, a user interacts with the system by identifying which of the initially returned documents are relevant. The system then retrieves documents that are statistically similar to these documents. Relevance feedback was originally conceived of as an iterative process that converges on an ideal query for a corpus of documents. In particular, relevance feedback assumes that an ideal query is one that induces a ranking of the documents in a corpus such that all relevant documents are ranked higher than all nonrelevant documents (Rocchio, 1971). The idea that documents can closely approximate a user's ideal query is similar to our idea that bundles of results can identify new areas of interest and can be accordingly mapped onto new queries.

The original query-by-example system (Zloof, 1975) also enables users to move from browsing to searching by appealing to examples. QBE users interact with a tuple visualizer to produce a table that is a partial specification of the tuples to be found. Figure 11 shows how "Print the titles by Shakespeare" would be specified in the QBE interface. Example elements are underlined, while constant elements are not underlined. The function "P." stands for "print."

QBE enables users to generate partially specified examples of what should be found rather than to formulate abstract queries. An important difference between QBE and our structure-based actions is that SenseMaker bundles are already fully specified.

In the RABBIT system (Williams, 1984), users begin by formulating a partial specification of what they would like to find. In response, the system displays back not only the query, but also an exemplar of what the query would retrieve. This exemplar is described by attribute-value pairs. At this point, the user refines the query by critiquing the example that is provided. The user's critique is constrained to one of several types, including requiring that an attribute take on the displayed value, specifying that the value for an attribute must be less than or greater than the displayed value, prohibiting an attribute from taking on the displayed value, and so forth. This approach can be seen as a structured version of relevance feedback. In both RABBIT and relevance-feedback interfaces, the underlying query is inferred from examples. However, the user has more control over that inference process in RABBIT than in standard relevance feedback. Again, our structure-based querying and filtering is different from RABBIT in that SenseMaker acts on aggregates rather than on single items.

The Butterfly system (Mackinlay, Rao, & Card, 1995) automatically displays for a document both the documents that it cites and the documents that cite it (its citers). The Butterfly designers have observed that the Butterfly interface could be generalized to accommodate a variety of relationships, not just citation-based relationships. If Butterfly were also generalized from a document-centered interface to a bundle-centered interface, then it would be an example of a proactive interface for searching by adding related bundles. In its current instantiation, Butterfly is an interface that proactively displays documents (of course, an individual document can be viewed as the base case of a bundle of documents) that stand in a citation relationship to the previously retrieved documents.

4.2 SenseMaker Interface Design: Structure-Based Searching and Structure-Based Filtering

The affordances for initiating structure-based actions can be seen in Fig. 4. Each bundle in the results window is accompanied by a checkbox. These checkboxes enable users to select bundles of interest. Most of the structure-based actions we discuss in this paper were implemented and can be initiated from the menubar. Structure-based searching is located under the "Expand" menu, while structure-based filtering is located under the "Limit" menu. Actions not yet implemented include expanding a collection by adding related bundles and the collection-independent style of expanding a collection by growing selected bundles. To remind the user that a variety of collection-building actions is available in the interface, SenseMaker labels each new collection with a description of the action that created it (in addition to a user-specified name, if provided).

One of the challenges in designing the interface was deciding how to show how the collection created by launching a new action is related to the previous collection. For example, a user who has performed structure-based searching must be able to identify which results are new and which are old. Based on our success with hi-cites in the interface, we decided to flag new results by displaying their accompanying icons in red. We flag a bundle as red if it contains at least one new item.
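This flagging rule can be sketched as a comparison against the identifiers of the previous collection. The identifiers and data structures here are assumptions for illustration:

```python
def flag_new(previous_ids, current_bundles):
    """Flag each bundle (shown red in the interface) if it contains at
    least one item absent from the previous collection."""
    return {
        bundle: any(item not in previous_ids for item in items)
        for bundle, items in current_bundles.items()
    }

previous = {"doc1", "doc2"}
current = {"Calvino": ["doc1", "doc3"], "Newton": ["doc2"]}
print(flag_new(previous, current))
# → {'Calvino': True, 'Newton': False}
```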

4.3 Evaluation

We conducted two small, informal studies to evaluate all of the collection-building actions possible in SenseMaker. In this paper, we focus only on what we learned about structure-based actions. A complete description of SenseMaker actions and this study can be found in Baldonado (1998). The goal of the first experiment was to learn whether users understood and made use of the actions that were provided in SenseMaker after a short tutorial on the system. The second study addressed the question of whether

FIG. 11. Specifying "Print the titles by Shakespeare" in the QBE interface.


users could develop an understanding of the available actions without any training.

4.3.1 First experiment (value of structure-based actions)

Five users, all with backgrounds in computer science, participated in the first study. Each user completed two similar tasks, one with an early version of SenseMaker and the other with a baseline system (version presentation order was varied randomly; the capabilities of both systems are described below). The baseline system did not provide any structure-based actions, so we do not describe those results in this paper. Before each task, users received a short (~5 minutes) introduction to the system that would be used for the task. Written instructions for each task informed the users that they were about to write a term paper on a topic of their choice for a graduate-level seminar (a cryptography seminar for task 1 and a neural-networks seminar for task 2). The users were then given 15 minutes in which to determine the specific topic and to write down the titles of one or two promising references. Users were limited to references that could be found by using the interface directly (i.e., following hypertext links in Web pages was not allowed). At the end of both tasks, users participated in a structured interview (~15 minutes) about their experiences.

4.3.2 Results and discussion for first experiment

Table 3 shows the breakdown of actions taken by the study participants for each of the conditions.

Our number of users and number of actions are both too small to draw statistical conclusions from these patterns. However, it is noteworthy that four of the five users did take advantage of structure-based actions. Two users mentioned explicitly that they found the filter feature useful because it helped them to pare their collections down. The interviews also revealed that users understood the concepts of structure-based searching and filtering, leading us to conclude that users' decisions to use these actions were made intentionally. Although there may be a novelty effect underlying the percentages of structure-based actions that were undertaken, we can certainly surmise from the interviews that our users saw some value in structure-based actions.

4.3.3 Second experiment (learnability of structure-based actions)

Three users (Stanford students from a variety of disciplines) participated in the second study. The overall goal of this study was to learn what conceptual models were developed by novice SenseMaker users. The study discussed in the last section showed that users developed a good understanding of the collection-building actions available in SenseMaker after receiving training and completing a specific task. In contrast, we sought to learn in this study whether users could develop a good understanding of the actions without any training at all. Accordingly, users in this study were given 45 minutes to experiment with the current version of SenseMaker. To focus the users' explorations, we told them that their task was to learn how SenseMaker works and that they would be expected at the end of the session to explain how SenseMaker could be used to do research for a term paper. We asked users to observe a think-aloud protocol.

4.3.4 Results and discussion for second experiment

Table 4 summarizes users' comprehension of each of the available structure-based actions in SenseMaker.

The three users in the study exhibited varying degrees of comprehension of the underlying collection-creating actions. The user with the least amount of comprehension did not spend much time trying to understand the interface. One reason might be that the structure-based search and filter actions are greyed out in the menus until the user selects a bundle; thus, a quick look through the menus at the beginning of the session does not reveal all of the possibilities for action. In contrast, the other two users quickly grasped the purpose and intent of the structure-based actions. Overall, this study shows that we can still improve the interface

TABLE 3. User actions in SenseMaker.

Action                                               U1     U2     U3     U4     U5
Nonstructure-based actions (issue new query;         75%    75%    50%    100%   50%
  fetch more hits; return to previous collection)
Structure-based actions (grow selected bundles;      25%    25%    50%    0%     50%
  focus on selected bundles)
Total number of actions                              8      4      8      9      6

TABLE 4. Understanding of SenseMaker structure-based actions without training.

Action                       U1    U2    U3
Grow selected bundles        No    Yes   Yes
Focus on selected bundles    No    Yes   Yes
Discard selected bundles     No    Yes   Yes

308 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE—February 2000


design but that we are close to making the structure-based actions easy for novices to grasp.

5. Conclusion

The emergence of the heterogeneous digital library has driven the design of the SenseMaker user interface. In particular, three desiderata have helped shape the features and concepts of the system:

● Support the user in uniformly browsing collections of heterogeneous results.

● Support the user in making sense of large collections of heterogeneous results.

● Support the user in moving fluidly between browsing and searching in order to facilitate the contextual evolution of the user's interests.

The new features of SenseMaker that respond to these needs are:

● Hi-cites for displaying results and allowing easy comparison of their attributes.

● Iterative bundling for providing an interface-level aggregation metaphor and enabling the user to gain multiple perspectives.

● Structure-based searching and filtering for enabling the user to expand or limit a collection of results based on the recognition of interesting new dimensions, rather than relying solely on the specification of new queries.
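To make the bundling and structure-based filtering ideas concrete, the following Python sketch illustrates the general technique. It is our own illustration, not SenseMaker's actual implementation: the function names (`bundle_by`, `focus_on`), the dictionary-based citation records, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical citation records, modeled as plain dicts of metadata fields.
CITATIONS = [
    {"title": "Paper A", "author": "Smith", "year": 1996},
    {"title": "Paper B", "author": "Smith", "year": 1997},
    {"title": "Paper C", "author": "Jones", "year": 1997},
]

def bundle_by(citations, field):
    """Group citations into bundles keyed by the value of a metadata field."""
    bundles = defaultdict(list)
    for citation in citations:
        bundles[citation.get(field)].append(citation)
    return dict(bundles)

def focus_on(bundles, selected_keys):
    """Structure-based filtering: pare the collection down to selected bundles."""
    return [c for key in selected_keys for c in bundles.get(key, [])]

# View the collection from the "author" perspective, then narrow it.
bundles = bundle_by(CITATIONS, "author")
focused = focus_on(bundles, ["Smith"])
```

Structure-based searching ("growing" a bundle) would work analogously in the other direction: a selected bundle's defining property (here, a particular author) would be reused as a new query against the underlying sources to expand the collection.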

We have performed several user studies, ranging from the formal to the informal, in order to understand how well these features match our outlined desiderata. Given the small sample sizes in our informal studies, our conclusions from those studies need more rigorous testing. Nevertheless, we learned much from our current set of studies. Our hi-cite study shows statistically that hi-cites are well-matched to certain attribute comparison tasks. Our iterative bundling studies suggest that users can find different bundling strategies informative for different reasons, and that the selection of a particular bundling strategy can affect user behavior. These studies highlight the value of iteratively restructuring collections. Finally, our small studies on structure-based actions show that users can understand and make use of these actions.

In the future, we intend to extend SenseMaker in a number of new directions. First, we plan to incorporate additional sensemaking facilities. For example, the scatter plots of Envision (Heath et al., 1995) and Butterfly (Mackinlay, Rao, & Card, 1995) would complement the current SenseMaker interface. Second, we plan to enable SenseMaker to communicate with other information-centric tools. An early version of SenseMaker communicated with the DLITE task-based workspace for digital libraries (Cousins et al., 1997). Extending the interactions between these two systems would enable users to work in the digital library more seamlessly. Finally, we envision making SenseMaker

more context aware. The current interface helps users to understand the publication context of information in the digital library. By adding a richer representation of the user's context to SenseMaker, the fit to the user's needs and tasks could be improved.

6. Acknowledgments

Many thanks to Terry Winograd, who has helped to guide and shape the SenseMaker design and ideas from the start. Moreover, the entire Stanford Digital Libraries Project team has provided valuable advice and feedback on SenseMaker. Finally, I am grateful to Steve Cousins of Xerox PARC for his constructive critiques of this paper.

This work was supported by the National Science Foundation under Cooperative Agreement IRI-9411306. Funding for this agreement was also provided by DARPA, NASA, and the industrial partners of the Stanford Digital Libraries Project.

References

Baldonado, M.Q.W. (1998). An interactive, structure-mediated approach to exploring information in a heterogeneous, distributed environment. Ph.D. dissertation, Stanford University.

Baldonado, M., Chang, C.K., Gravano, L., & Paepcke, A. (1997, September). The Stanford digital library metadata architecture. International Journal on Digital Libraries, 1:2, 108–121.

Baldonado, M.Q.W., & Winograd, T. (1998). Hi-cites: Dynamically created citations with active highlighting. In Proceedings of the ACM SIGCHI conference on human factors in computing systems, Los Angeles, California (pp. 408–415). New York: ACM Press.

Baldonado, M.Q.W., & Winograd, T. (1997). SenseMaker: An information-exploration interface supporting the contextual evolution of a user's interests. In Proceedings of the ACM SIGCHI conference on human factors in computing systems, Atlanta, Georgia (pp. 11–18). New York: ACM Press.

Bates, M.J. (1989, October). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13:5, 407–424.

Belew, R. (1994). AI genealogy. In Mark Kantrowitz (Ed.), Prime time freeware for AI, Issue 1-1, Prime Time Freeware.

Belkin, N.J., Marchetti, P.G., & Cool, C. (1993). BRAQUE: Design of an interface to support user interaction in information retrieval. Information Processing & Management, 29:3, 325–344.

Brown, T.J. (1991). Visual display highlighting and information extraction. In Proceedings of the Human Factors Society 35th annual meeting, San Francisco, California (pp. 1427–1431).

Chang, C.K., Garcia-Molina, H., & Paepcke, A. (1996, August). Boolean query mapping across heterogeneous information sources. IEEE Transactions on Knowledge and Data Engineering, 8:4, 515–521.

Chen, H., Schatz, B., Ng, T., Martinez, J., Kirchhoff, A., & Lin, C. (1997). A parallel computing approach to creating engineering concept spaces for semantic retrieval: The Illinois Digital Library Initiative Project. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18:8, 771–782.

Cousins, S.B., Paepcke, A., Winograd, T., Bier, E.A., & Pier, K. (1997). The Digital Library Integrated Task Environment (DLITE). In Proceedings of the 2nd ACM international conference on digital libraries, Philadelphia, Pennsylvania (pp. 142–151).

Cutting, D.R., Karger, D.R., Pedersen, J.O., & Tukey, J.W. (1992). Scatter/Gather: A cluster-based approach to browsing large document collections. In Proceedings of the 15th annual international ACM SIGIR



conference on research and development in information retrieval, Copenhagen, Denmark (pp. 318–329).

Frakes, W., & Baeza-Yates, R. (1992). Information retrieval: Data structures and algorithms. Englewood Cliffs, NJ: PTR Prentice-Hall.

Goldstein, J., & Roth, S.F. (1994). Using aggregation and dynamic queries for exploring large data sets. In Proceedings of the ACM SIGCHI conference on human factors in computing systems, Boston, MA (pp. 23–29). New York: ACM Press.

Greene, S.L., Devlin, S.J., Cannata, P.E., & Gomez, L.M. (1990). No IFs, ANDs, or ORs: A study of database querying. International Journal of Man-Machine Studies, 32, 303–326.

Hansen, W.J. (1971). User engineering principles for interactive systems. In AFIPS proceedings of the fall joint computer conference, Las Vegas, Nevada (pp. 523–532).

Harman, D. (1995). Overview of the third text retrieval conference. In Harman, D.K. (Ed.), Overview of the third text retrieval conference. Washington, DC: U.S. Government Printing Office, NIST Special Publication 500-225, 1–19.

Hassan, S.W., & Paepcke, A. (1997). Stanford digital library interoperability protocol. Stanford digital library working paper series, SIDL-WP-1997-0054. Accessible at http://www-diglib.stanford.edu/cgi-bin/WP/get/SIDL-WP-1997-0054.

Hearst, M.A. (1994). Using categories to provide context for full-text retrieval results. In Proceedings of RIAO '94, New York.

Hearst, M.A., & Karadi, C. (1997). Cat-a-Cone: An interactive interface for specifying searches and viewing retrieval results using a large category hierarchy. In Proceedings of the 20th annual ACM/SIGIR conference, Philadelphia, PA (pp. 246–255).

Heath, L.S., Hix, D., Nowell, L.T., Wake, W., Averboch, G., & Fox, E. (1995). Envision: A user-centered database of computer science literature. Communications of the ACM, 38:4, 52–53.

Johnson, J., Roberts, T.L., Verplank, W., Smith, D.C., Irby, C.H., Beard, M., & Mackey, K. (1989). The Xerox Star: A retrospective. IEEE Computer, 22:9, 11–29.

Larson, R.R. (1989). Managing information overload in online catalog subject searching. In Proceedings of the American Society for Information Science 52nd annual meeting, Washington, DC (pp. 129–135).

Mackinlay, J.D., Rao, R., & Card, S.K. (1995). An organic user interface for searching citation links. In Proceedings of the ACM SIGCHI conference on human factors in computing systems, Denver, CO (pp. 67–73). New York: ACM Press.

Mann, T. (1993). Library research models: A guide to classification, cataloging, and computers. New York: Oxford University Press.

Marchionini, G. (1995). Information seeking in electronic environments. Cambridge, England: Cambridge University Press.

Melton, J., & Simon, A.R. (1993). Understanding the new SQL: A complete guide. San Mateo, CA: Morgan Kaufmann Publishers.

Network Development and MARC Standards Office (1996). USMARC concise format for bibliographic data. Library of Congress. Accessible at gopher://marvel.loc.gov:70/00/.listarch/usmarc/biblio.fl.

Nielsen, J. (1993). Usability engineering. Boston, MA: Academic Press.

O'Day, V.L., & Jeffries, R. (1993). Orienteering in an information landscape: How information seekers get from here to there. In Proceedings of the ACM SIGCHI conference on human factors in computing systems (INTERCHI '93), Amsterdam, Netherlands (pp. 438–445).

Paepcke, A., Cousins, S.B., Garcia-Molina, H., Hassan, S.W., Ketchpel, S.P., Röscheisen, M., & Winograd, T. (1996). Using distributed objects for digital library interoperability. Computer, 29:5, 61–68.

Philipsen, G. (1994). Effects of six different highlighting modes on visual search performance in menu options. International Journal of Human-Computer Interaction, 6:3, 319–334.

Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/Gather browsing communicates the topic structure of a very large text collection. In Proceedings of the ACM SIGCHI conference on human factors in computing systems, Vancouver, British Columbia (pp. 213–220). New York: ACM Press.

Rocchio, J.J., Jr. (1971). Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system. Englewood Cliffs, NJ: Prentice-Hall.

Russell, D.M., Stefik, M.J., Pirolli, P., & Card, S.K. (1993). The cost structure of sensemaking. In Proceedings of the ACM SIGCHI conference on human factors in computing systems (INTERCHI '93), Amsterdam, Netherlands (pp. 269–276). New York: ACM Press.

Sahami, M., Yusufali, S., & Baldonado, M.Q.W. (1997). Real-time full-text clustering of networked documents. In Proceedings of the fourteenth national conference on artificial intelligence, Providence, RI (p. 845).

Wake, W.C., & Fox, E.A. (1995). SortTables: A browser for a digital library. In Proceedings of the fourth international conference on information and knowledge management, Baltimore, MD (pp. 175–181).

Weibel, S., Godby, J., Miller, E., & Daniel, R. (1995). OCLC/NCSA metadata workshop report. Accessible at http://www.oclc.org:5047/oclc/research/publications/weibel/metadata/dublin_core_report.html.

Williams, M.D. (1984). What makes RABBIT run? International Journal of Man-Machine Studies, 21, 333–352.

Willett, P. (1988). Recent trends in hierarchic document clustering: A critical review. Information Processing & Management, 24:5, 577–597.

Wynar, B.S., & Taylor, A.G. (1992). Introduction to cataloging and classification. Englewood, CO: Libraries Unlimited.

Z39.50 Maintenance Agency (1995). Attribute set Bib-1 (Z39.50-1995): Semantics. Accessible at ftp://ftp.loc.gov/pub/z3950/defs/bib1.txt.

Zloof, M.M. (1975). Query by example. In AFIPS proceedings of the national computer conference, Anaheim, CA (pp. 431–437).
