Feb. 1, 2007. Search and Sensemaking. CoLiDeS and SNIF-ACT: Complementary Models for Searching and Sensemaking on the Web. Muneo Kitajima, AIST; Peter G. Polson

Feb. 1, 2007 Search and Sensemaking 1

CoLiDeS and SNIF-ACT: Complementary Models for Searching and Sensemaking on the Web

Muneo Kitajima, AIST

Peter G. Polson & Marilyn H. Blackmon, University of Colorado


Decoding Acronyms In Title

• CoLiDeS: Comprehension-based Linked-model of Deliberate Search (Kitajima, Blackmon, and Polson, 2000, 2005)
  – Derived from Kintsch’s (1998) construction-integration cognitive architecture
  – Basis for usability engineering method called Cognitive Walkthrough for the Web (CWW) (Blackmon et al., 2002)

• SNIF-ACT: Scent-based Navigation and Information Foraging in the ACT Cognitive Architecture (Fu and Pirolli, In press; …..)


Goals

• To Integrate:
  – CoLiDeS
    • Searching and sensemaking at the level of a Web page
  With
  – SNIF-ACT
    • Searching and sensemaking at higher levels

• To Show:
  – Role of background knowledge
  – Multiple patches on a single web page
  – Importance of comprehension processes in searching and sensemaking


Outline

1. Information Foraging Theory (Card and Pirolli, 1999; Pirolli, 2005)
2. Brief Summary of SNIF-ACT
3. Scent AND Background Knowledge
4. CoLiDeS
5. Multiple Patches on a Web Page
6. Interactions Between Search and Sensemaking
7. Conclusions
8. CoLiDeS + SNIF-ACT


1. Information Foraging Theory


Insights from Information Foraging Theory

• World Wide Web is a collection of patches of relevant information (Websites)

• Forager faces two basic decisions
  – Continue foraging in current patch
  – Leave and find a new patch

• Decisions based on comparison of expected gain of staying in current patch versus cost of finding new patch

• Scent-based navigation within a site


Scent-Based Navigation

• Consensus
  – Users perform website navigation tasks by exploration, following a trail of information scent

• In Pete’s presentation this morning
  – Showing that very reliable scent cues are required for successful navigation (Hogg and Huberman, 1987)
  – Derivation of theory of scent from first principles


Search, Sensemaking, and Comprehension

• Information Foraging Theory: Two subproblems
  – Search
  – Sensemaking

• CoLiDeS: Both search and sensemaking require comprehension
  – Search requires comprehension of structure and content of a web page
  – Sensemaking entails comprehension of retrieved information


2. Brief Summary of SNIF-ACT


SNIF-ACT

• ACT-R model of searching a Website for specific information
  – Based on Information Foraging Theory (Card and Pirolli, 1999; Pirolli, 2005)

• A user’s goal and hyperlink texts are represented as collections of chunks
  – Spreading activation from link chunks to goal chunks determines scent
  – Link strengths can be computed from on-line corpora


SNIF-ACT 1 and 2

Knowledge
  – Actions are represented as productions
    • Probability of an action is determined by its utility in relation to utilities of other actions under consideration
    • Utility of clicking on a link is determined by its scent

Control
  – Evaluate links on webpage
    • one link at a time
    • moving down through links in sequential order
  – Use satisficing problem solving strategy
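The slide says only that an action's probability depends on its utility relative to the utilities of the competing actions. One common way to express such a rule is a softmax (Boltzmann) choice function. The sketch below assumes that form; the function name, the `temperature` parameter, and the example utilities are illustrative assumptions, not values from the SNIF-ACT papers.

```python
import math

def action_probabilities(utilities, temperature=1.0):
    """Turn action utilities into choice probabilities (assumed softmax rule)."""
    # Subtract the largest utility before exponentiating for numerical stability.
    top = max(utilities.values())
    weights = {a: math.exp((u - top) / temperature) for a, u in utilities.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical utilities: scent stands in for the utility of clicking a link.
example = action_probabilities(
    {"click link A": 0.47, "click link B": 0.32, "go back": 0.10},
    temperature=0.1,
)
```

With a low temperature the highest-utility action dominates; with a high temperature the choice approaches uniform, which is one way to model noisy link selection.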


SNIF-ACT 2.0 Page Level Loop

During each cycle

Decide
  – Click on best link found so far*
  – Process another link
  – Click on back button

• Utilities (U) of alternative actions
  – U(Click on best link): scent of that link
  – U(Process another link): starts high and decreases with number of links evaluated
  – U(Click on back button): average scent of links processed on previous pages and this page
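The page-level loop can be sketched as a small deterministic procedure. This is a simplified illustration, not the SNIF-ACT 2.0 implementation: it always takes the highest-utility action instead of choosing probabilistically, and the linear decay of U(Process another link) with its `start_utility` and `decay` parameters is an assumed functional form.

```python
def page_level_loop(link_scents, previous_scents=(), start_utility=1.0, decay=0.4):
    """Evaluate links top to bottom; return ("click", index) or ("back", None)."""
    processed = []
    best = None
    for position, scent in enumerate(link_scents):
        processed.append(scent)
        if best is None or scent > link_scents[best]:
            best = position

        # U(Click on best link): scent of the best link found so far.
        u_click = link_scents[best]
        # U(Process another link): starts high, falls with links evaluated
        # (assumed linear decay); unavailable once the page is exhausted.
        if position + 1 < len(link_scents):
            u_process = start_utility - decay * len(processed)
        else:
            u_process = float("-inf")
        # U(Click on back button): average scent of links seen so far,
        # on previous pages and on this page.
        seen = list(previous_scents) + processed
        u_back = sum(seen) / len(seen)

        options = [("click", u_click), ("process", u_process), ("back", u_back)]
        action = max(options, key=lambda pair: pair[1])[0]
        if action == "click":
            return "click", best
        if action == "back":
            return "back", None
    return "back", None  # only reached when the page has no links
```

For example, `page_level_loop([0.1, 0.5, 0.2])` clicks the second link without ever evaluating the third, which is the satisficing behavior the slides describe.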


Strengths of SNIF-ACT 2.0

• Treatment of Actions with No Scent
  – Derived from rational analysis of foraging in information patches (Pirolli, 2005)
    • Press back button
    • Leave website

• Theory of Information Scent
  – Derived from first principles (Pirolli, 2005)
  – Linked to spreading activation memory mechanisms in ACT-R


3. Scent AND Background Knowledge


Scent and Background Knowledge

– CoLiDeS assumes that scent is the product of successful comprehension of a link

– Familiarity => having the necessary background knowledge to comprehend the words and categories that make up a link

– Familiarity of the words and categories in a link is as important as scent, or more so


Our Version of Scent

• Used Latent Semantic Analysis (LSA) to estimate similarities between goals and
  – Descriptions of patches (Headings)
  – Links

• Values of Scent
  – -1.0 < s < 1.0, analogous to correlation
  – s < 0.1: weak scent
  – 0.2 < s < 0.3: moderate scent
  – s > 0.3: strong scent
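LSA similarity between two texts is typically computed as the cosine between their vectors in the semantic space. A minimal sketch follows, assuming the goal and link vectors have already been obtained from some LSA space (the vectors themselves are not produced here). The band labels follow the thresholds on the slide; values the slide does not assign to any band, such as 0.15, are reported as "unspecified" rather than guessed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def scent_band(s):
    """Label a scent value using the bands listed on the slide."""
    if s < 0.1:
        return "weak"
    if 0.2 < s < 0.3:
        return "moderate"
    if s > 0.3:
        return "strong"
    return "unspecified"  # the slide leaves these values unlabeled
```

For instance, the scent of .47 reported for the Anthropology link in the Tlingit task falls in the "strong" band.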


Larson and Czerwinski (1998)

Task
  – Search for articles in experimental website simulating Encarta online encyclopedia
  – One or two words described target articles
    • Unfamiliar targets: Pink Floyd, Tlingit, Trilobite, …
  – Unfamiliar links
    • Anthropology, Paleontology, Theology & Practices, …
  – Search for articles involving
    • Unfamiliar targets
    • Unfamiliar correct links

Far more difficult than predicted by a model that only describes scent following


Blackmon et al (2002) Experiment

• Partial replication of Larson & Czerwinski (1998)
• Fix unfamiliar target problems
  – Provided participants with four or five sentence definitions of target articles

• 16 and 32 link conditions
  – One click
  – Select correct topic heading from list of 16 or 32 links
  – Link texts from Larson and Czerwinski


Tlingit culture, Tlingit tribes


Observed first-click behavior: Unfamiliar problem hid high-scent Anthropology link

Link from Tlingit webpage               Percent 1st click   Scent
Anthropology (UNFAMILIAR LINK)          9%                  .47
U.S. States, Territories, & Regions     27%                 .32
History of the Americas                 27%                 .28
People in U. S. History                 23%                 .25
Other links                             14%                 <.25
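The first-click data can be read as a contrast between two predictors. The sketch below uses the scent values reported for the Tlingit task and compares a scent-only choice with a choice restricted to links the user can comprehend. Treating the three links other than Anthropology as familiar is an assumption; the slide marks only Anthropology as unfamiliar. The function names are illustrative.

```python
# (link text, LSA scent, assumed familiar to participants?)
tlingit_links = [
    ("Anthropology", 0.47, False),
    ("U.S. States, Territories, & Regions", 0.32, True),
    ("History of the Americas", 0.28, True),
    ("People in U. S. History", 0.25, True),
]

def best_by_scent(links):
    """Scent-only prediction: pick the link with the highest scent."""
    return max(links, key=lambda link: link[1])[0]

def best_comprehended(links):
    """Comprehension-gated prediction: ignore links the user cannot comprehend."""
    familiar = [link for link in links if link[2]]
    return max(familiar, key=lambda link: link[1])[0] if familiar else None
```

The scent-only rule picks Anthropology, which drew 9% of first clicks; the gated rule picks U.S. States, Territories, & Regions, one of the two links that each drew 27%.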


12 unfamiliar tasks were significantly more difficult (p<.002)


…but unfamiliar tasks had same mean scent for correct link

[Bar chart: mean scent of the correct link (y-axis, 0.00 to 0.50) for three sets of items: all 64 items, 12 unfamiliar items, and 52 non-problem (familiar) items. Bar labels as extracted: 0.42, 0.42, 0.43.]


Experiments with Ethnic Minority Parents and Children

• Parents and under-18 children came together & did experiment at adjacent computers

• Most participants were adult parents with high school or middle school students, but a few children were elementary school students

• Each task asked them to search for things familiar to children, for example, ferns, pets, earth-moving equipment, or Mexican art

• 9 category headings always in left hand navigation column, level-2 pages reveal links for one category (e.g., history) in content area


Goals had pictures to help participants quickly grasp what they were looking for, and texts written at 3rd-grade level


College-level vs. ethnic minority adults and children (3rd-grade reading level)

College-level population:
• Encarta links 13% unfamiliar, scattered sparsely across the page
• < 5% low frequency words
• Powerful, consistent evidence that even college-level population flounders on unfamiliar links but is helped by grouping of links into coherent patches

Ethnic minority adults and children (3rd-grade reading level):
• Encarta links 56% unfamiliar, and some patches are virtually all unfamiliar links
• 52% low frequency words
• Our preliminary observations: ethnic minority population frequently resorted to trial-and-error exhaustive searching of entire patches


9 top-level category links were surprisingly difficult though predicted

College-level population:
• Top-level categories familiar (e.g., Social Science, History)
• 0% low frequency words at top level
• College-level population good at scent following on 9 top-level category headings

Ethnic minority population:
• 44% (4 of 9) of top-level categories are unfamiliar
• 41% low frequency words at top level
• Ethnic minority population poor at scent following on 9 top-level category headings


4. CoLiDeS


CoLiDeS is a Comprehension-Based Model in HCI

• Simulates a user searching for information relevant to her goal on a Web site

• Based on the Construction-Integration (C-I) architecture (Kintsch, 1998)
  – C-I is a detailed story about utilization of background knowledge to comprehend or plan actions
  – C-I can fill in gaps in knowledge using hill climbing or pure forward search as a problem solving strategy
  – Hill climbing controlled by information scent
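Hill climbing controlled by scent can be illustrated with a greedy walk over a site graph. This is a sketch under simplifying assumptions: the site is a dictionary mapping each page to its outgoing links and their scent values, there is no backtracking, and the page names and scent values in the usage note are hypothetical.

```python
def follow_scent(site, start, target, max_steps=20):
    """Greedy hill climbing: always follow the highest-scent link on the page.

    Returns (path, found). Stops at the target, at a page with no links,
    or after max_steps moves (which also guards against cycles).
    """
    path = [start]
    page = start
    for _ in range(max_steps):
        if page == target:
            return path, True
        links = site.get(page, {})
        if not links:
            return path, False
        page = max(links, key=links.get)  # link with the strongest scent
        path.append(page)
    return path, page == target
```

With `site = {"home": {"religion": 0.5, "art": 0.3}, "religion": {}, "art": {"target": 0.6}}`, the walk ends at "religion" and never reaches the target: a high-scent link in an incorrect region leads the climber down a garden path.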


CoLiDeS Starts with a Complex Goal

• I am interested in reading recent articles that deal with prediction of sea level rise in the near future caused by global warming. I am going to browse the Science section of the New York Times Website to find articles.

• Can include:
  – Description of search target
  – Navigation


CoLiDeS Parses Web Page into Patches to Select a Link

For each page for a given goal:

• Attention phase: Select a patch
  – Parse page into collection of patches
  – Describe each patch
  – Select a patch whose description is comprehended AND is most similar to the goal

• Action selection phase: Select a link
  – Describe each link
  – Select a link whose description is comprehended AND is most similar to the goal
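The two phases apply the same rule at two levels: first to patches, then to the links inside the attended patch. A minimal sketch, assuming each patch and link already carries a similarity-to-goal value and a flag saying whether its description was comprehended. The class names, field names, and example values are hypothetical, and the sketch does not model re-attending to another patch after a failure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Item:
    description: str
    similarity: float    # similarity of the description to the goal
    comprehended: bool   # did the user comprehend the description?

@dataclass
class Patch(Item):
    links: List[Item] = field(default_factory=list)

def select(items):
    """Pick the comprehended item most similar to the goal, if any."""
    candidates = [item for item in items if item.comprehended]
    return max(candidates, key=lambda item: item.similarity) if candidates else None

def choose_link(page: List[Patch]) -> Optional[Tuple[str, str]]:
    patch = select(page)         # attention phase: select a patch
    if patch is None:
        return None
    link = select(patch.links)   # action selection phase: select a link
    return (patch.description, link.description) if link else None

# Hypothetical page with two patches.
page = [
    Patch("Patch A", 0.45, True, [Item("Link A1", 0.50, False), Item("Link A2", 0.30, True)]),
    Patch("Patch B", 0.20, True, [Item("Link B1", 0.60, True)]),
]
```

Here `choose_link(page)` returns Patch A and Link A2: Link B1 has the highest similarity on the page but sits in a patch that was not attended to, and Link A1 is not comprehended.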


Dec. 14 NY Times


A Page Parsed Into Patches

[Diagram: the page divided into labeled patches: Logo, Site Nav Bar, Search Window, Site Nav Links, Articles, Topics (two patches), Info, and several Ad patches.]


Attended to Patch


Determining Patches on A Page

– A mixture of bottom-up and top-down processes

– Bottom-up processes
  • Driven by perceptual features that define visually related regions

– Top-down processes
  • Controlled by the user’s knowledge of Web page layout conventions and typical pages for a given Web site or type of Web site


Examples of Phenomena to be Explained by Attention Phase

• Interactions between user’s background knowledge, goals, and attention to different patches on a Web page
  – Banner “Blindness”
  – Eye movement patterns during interactions with a Web site


CoLiDeS’s Key Assumptions

• Core process underlying Web navigation is comprehension of texts and images

• Comprehension processes
  – Build mental representations of goals, patches on a page, hyperlinks, images, and other targets for action on a page
  – Compare goal with representations of patches and hyperlinks, images, and other targets for action on a page
  – Select a patch to attend to or object on page to act on based on comprehension AND similarity of descriptions


5. Multiple Patches on a Web Page


Blackmon, et al. (2003, 2005) Experiments

• Search for target article on Encarta-like website
• First two levels of hierarchy presented on one or two web page(s)
• Prototype is 93 links nested under 9 category headings
  – Each heading and subordinate links defined a patch
  – Participant task: Select correct link in correct patch
  – Clicking on link led to web page with 8 article titles on page
  – Click on matching title
• Time limit: 130 seconds
• Example task: Find encyclopedia article on Dome of the Rock


Example of One Level Web Page

Summary of Target Article


Religion & Philosophy is the Most Attractive Patch


Webpage for Find Dome of the Rock Task

• One competing patch on page
  – Art, Language & Literature (correct patch)
  – Religion & Philosophy (competing patch)
  – Geography
  – History
  – Social Science
  – Performing Arts
  – 3 other patches

• 4 competing links in most attractive patch:
  – Theology & Practices
  – Religions & Religious Groups
  – Scripture
  – Religious Figures
  – 3 more links in patch


First-click Distribution

[Bar chart: percent of all first clicks (y-axis, 0% to 80%) by heading/patch (Religion & Philosophy; Geography; Art, Language & Literature; History; Other) for the 1-level and 2-level designs. Art, Language & Literature is the correct patch. Bar labels as extracted: 54%, 26%, 8%, 8%, 4% and 70%, 18%, 2%, 4%, 6%.]


Performance on Dome of the Rock Task

• N=38
• Mean total clicks on links = 8.3
• Mean time = 115 seconds
• 58% time expired
• 42% finally clicked the correct link, “Architecture”, nested under Art, Language, & Literature


6. Interactions Between Search and Sensemaking


Blackmon, et al. (2002, 2003, 2005) Experiments

• Examining combined impact of
  – Design errors that mislead a scent-following heuristic
    • Competing links in incorrect patches
    • Weak scent correct links
    • Unfamiliar links interfering with sensemaking

• Cognitive Walkthrough for the Web can:
  – Correctly identify the above problems
  – Guide successful correction of these problems


Combined Analysis of Blackmon, et al. (2002, 2003, 2005)

• Task => College students search for a target article in Encarta-like website

• Mean clicks and time for 324 tasks
  – N’s ranged from 22 to 50, mean N about 40

• Extensive exploration using multiple regression techniques

• Independent Variables
  – Total competing links in incorrect patches
  – Weak scent correct link (yes, no)
  – Unfamiliar correct link (yes, no)

• Examples of other variables….


Examples of Other Independent Variables

• Serial positions of links
• Serial positions of patches
• Correct patch scent values
• Correct link scent values
• Number of competing patches
• Lots of other variables…


Overview of Results

• Mean number of clicks ranged from 1.1 to over 10

• Percent solvers ranged from 100% to 25%

• Correlation between clicks and percent solvers = .93

• Three Variables Account for 51% of Variance
  – Total competing links in incorrect patches
  – Weak scent correct link (yes, no)
  – Unfamiliar correct link (yes, no)


Competing Links in Incorrect Patches Increase Task Difficulty (p<.0001)

[Chart not reproduced. Group sizes shown on the chart: n=236, n=88, n=33, and n=112 tasks.]


Weak-Scent Correct Links Increase Task Difficulty (p<.0001)

[Chart not reproduced. Group sizes shown on the chart: n=271, n=53, n=15, and n=112 tasks.]


Unfamiliar Correct Links Increase Task Difficulty (p<.0001)

[Chart not reproduced. Group sizes shown on the chart: n=268, n=25, n=112, and n=56 tasks.]


Conclusions

• Competing Links in Incorrect Patches
• Unfamiliarity
• Means-Ends Analysis


Competing Links in Incorrect Patches

• Patch contains 0 high or moderate scent links
  – Little or no impact on performance
  – Leave patch quickly
  – Low cost of going to new patch on same page

• Patch contains 2 or more high or moderate scent links
  – Large impact on performance

• Garden path effects determined by TOTAL number of competing links in incorrect patches


Number of Moderate or High Scent Incorrect Links vs. Observed Clicks

Number of incorrect links (n=324 tasks)   Mean observed clicks on links
0                                         2.331
1                                         3.591
2                                         5.045
3                                         5.942
4 or more                                 6.512
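The five means reported on this slide can be turned into step-by-step increments with a few lines of arithmetic. The sketch below uses only the values from the slide; the variable names are illustrative.

```python
# Mean observed clicks by number of moderate or high scent incorrect links.
mean_clicks = {"0": 2.331, "1": 3.591, "2": 5.045, "3": 5.942, "4 or more": 6.512}

values = list(mean_clicks.values())

# Extra clicks associated with each additional competing link.
increments = [round(later - earlier, 3) for earlier, later in zip(values, values[1:])]

# How many times more clicks tasks with 4 or more competing links needed
# compared with tasks that had none.
ratio = round(values[-1] / values[0], 2)
```

The increments come to 1.26, 1.454, 0.897, and 0.57 clicks, and the ratio to about 2.79: each competing link adds clicks, with the largest jumps for the first two.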


Unfamiliarity

• Two Levels
  – Incomplete knowledge of meaning of a superordinate concept
    • e.g., Anthropology, college students
    • Only 5% of links in Blackmon et al. (2002,…)
  – No knowledge of a word
    • Specialized technical terms for college students
    • Many superordinate terms for individuals with 3rd to 6th grade reading skills


Impact of Unfamiliarity

• 5% of link terms (1st year college reading level)
  – Problem solving skills necessary to make inferences
  – Infer meaning from other terms in patch
  – Minor problem

• 50% of link terms (3rd grade reading level)
  – Locus of unfamiliarity is superordinate concepts
  – Too many unknown words to be able to make inferences
  – May lack necessary problem solving skills


Scent Following Is Means-Ends Analysis

• Limited by ability to comprehend links
• Information Foraging by scent following is a simple form of Means-Ends Analysis (MEA)
  – Exhibits all of MEA’s failure modes

• Classical problem solving literature is directly relevant
  – Operator subgoaling
    • Navigation goals
    • Patch enrichment activities


CoLiDeS + SNIF-ACT

• CoLiDeS
  – A webpage defines multiple patches
  – Both search and sensemaking require comprehension
    • Search requires comprehension of structure and content of a web page
    • Sensemaking entails comprehension of retrieved information


CoLiDeS + SNIF-ACT (Cont.)

• SNIF-ACT
  – ACT-R cognitive architecture
    • Learning mechanisms
    • Perceptual-motor mechanisms
  – Satisficing decision cycle
    • Click on best link found so far*
    • Process another link
    • Click on back button


Contact Information

• Marilyn Blackmon– [email protected]

• Muneo Kitajima– [email protected]

• Peter Polson– [email protected]



