WWW Search and Navigation
Mark Levene
SCIS, Birkbeck College, University of London
www.dcs.bbk.ac.uk/~mark/


Talk Overview

• Hypertext and the navigation problem

• NavigationZone’s solution

• Problems being researched

• A Demonstration


Hypertext and Navigation

• Long history
– Bush 1945, memex: trail blazing
– Nelson 1965, Xanadu: network of documents

• Problem of “getting lost in hyperspace”

• Navigation aids
– Bookmarks
– History
– Overview diagrams
– Recommendations


State-of-the-Art Navigation Aids

• Novel User-Interfaces to visualise web sites

• Clustering (e.g. Self-Organising Maps)

• Web data mining – finding user patterns

• Semi-automated navigation, BestTrail algorithm – motivation to follow …


Typical corporate search


A typical search scenario

1) Submit a query to a search engine
• Is it too broad / too specific?
• Does it capture my information needs?

2) Select a URL from the result set
• Have I made the right choice?

3) Start manual navigation
• Where am I? Where have I come from? Where am I going to?

4) Go to (1) to reformulate the query


Content centric approach

[Figure: diagram of nodes a, b, c, d and e, with d marked d*]


Problems with standard Search

• Page level relevance scoring – sensitive to query terms

• No look ahead
– ‘click and discover’

• No context
– results are totally isolated

• No navigation support
– users are left on their own to find their way


Possible solutions (information retrieval)

• Improve basic IR

• Link analysis, e.g. PageRank and HITS

• Metadata tagging
– Keywords and taxonomies (semantic web)

• Natural language
– Q&A, sentence analysis, synonyms
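As an illustration of the link-analysis item above, here is a minimal sketch of PageRank computed by power iteration. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the toy link graph are assumptions chosen for the example; they are not taken from the talk.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks.

    Assumed parameters: damping=0.85 and a fixed number of iterations.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its damped rank equally among its outlinks.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-link site used only to exercise the function.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
toy_ranks = pagerank(toy_links)
```

Note that the resulting scores depend only on link structure, so they are independent of the query; this is why link analysis is usually combined with page-level relevance scoring.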


Possible solutions (information seeking)

• Suggestion engines
– Link and content generation

• Categories and directories
– Explicit manual construction

• Automatic classification
– Machine learning techniques


Are these feasible?

• Re-architecting corporate information infrastructure is extremely expensive

• Sophisticated approaches are not always intuitive and are yet to be proven

• Same problem every couple of years

• Mergers and acquisitions


There is, actually, a better way!

• Treat sequences of pages, or trails, as first-class citizens for search

• Consider the topology of the area in which you are searching

• Employ navigational aids


Context centric approach

[Figure: three diagrams, each over nodes a, b, c, d* and e]


The information value of a trail is higher than the sum of its parts!


Our approach

• Provide information retrieval of the highest quality and in addition,

• Find out what is beyond the most relevant pages by ‘exploring the area’

• Present users with precise and relevant trails

• Provide navigation assistance within the UI
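To make the idea of ‘exploring the area’ around relevant pages concrete, here is a rough sketch that enumerates short trails from a set of starting pages and ranks them. It is not the BestTrail algorithm, whose details are not given in these slides. The function `explore_trails`, the position-discounted scoring rule, the `decay` value and the toy data are all hypothetical.

```python
def explore_trails(graph, relevance, starts, max_len=3, decay=0.75, top_k=3):
    """Enumerate loop-free trails of up to max_len pages and rank them.

    graph:     dict mapping page -> list of pages it links to
    relevance: dict mapping page -> query relevance score (assumed given)
    Scoring is a hypothetical choice: each page contributes its relevance,
    discounted by decay ** position, so early pages count for more.
    """
    def score(trail):
        return sum(relevance.get(page, 0.0) * decay ** i
                   for i, page in enumerate(trail))

    scored = []
    stack = [[s] for s in starts]
    while stack:
        trail = stack.pop()
        scored.append((score(trail), trail))
        if len(trail) < max_len:
            # Extend the trail along every outlink that does not revisit a page.
            for nxt in graph.get(trail[-1], []):
                if nxt not in trail:
                    stack.append(trail + [nxt])

    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical site: page "a" is the top hit, but the most useful
# content lies one or two clicks further along the link structure.
toy_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
toy_relevance = {"a": 0.2, "b": 0.1, "c": 0.9, "d": 0.8}
toy_trails = explore_trails(toy_graph, toy_relevance, starts=["a"])
```

This version explores exhaustively up to a fixed length, which is only workable on small neighbourhoods; a practical engine would need to prune the search.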


NavZone user interface


NavZone Usability Study

First Monday paper

Task – find answers to 5 types of questions

1) Fact Finding – What are the term dates?

2) Judgement – Is CSIS a “good” place to do research?

3) Fact Comparison – Which train station is closest to the college?

4) Judgement Comparison – Is the research in deptA better than that in deptB?

5) General Navigational – How do you get to the checkout?


NavZone vs. Google and Compass

% of subjects, 4+ questions correct

Google: 59%
Compass: 75%
NavZone: 83%


Average # clicks to complete task

Google: 44
Compass: 40
NavZone: 27

NavZone is bandwidth “green”!


Average time taken per task (min)

Compass: 18
Google: 17
NavZone: 13

Wilcoxon Test - Statistically Significant


The main ingredients

[Figure: architecture diagram. Components shown: robot; crawler; parser (HTML, XML, PDF, PostScript, Word, other); generic format; indexer; inverted file; postprocessor; web graph; trail engine; BestTrail; user interface]
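Two of the ingredients named above, the indexer and the inverted file, can be illustrated with a minimal sketch. It assumes documents have already been crawled and parsed into plain text; the function names, the simple tokeniser and the sample pages are assumptions for the example, not NavZone's implementation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(docs):
    """Map each term to the set of URLs whose text contains it.

    docs: dict mapping URL -> plain text produced by the parser stage.
    """
    index = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term of the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical parsed pages, used only to exercise the functions.
toy_docs = {
    "/research": "Research groups and publications",
    "/terms": "Term dates for the academic year",
    "/admissions": "Admissions, fees and term dates",
}
toy_index = build_inverted_file(toy_docs)
toy_hits = search(toy_index, "term dates")
```

The set of matching pages returned here would be the starting points for a trail engine, which then consults the web graph to look beyond them.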


Under Development

• Alternative User-Interfaces

• Seamless integration with relational databases and file systems

• Data mining and personalisation

• Mobile/PDA support


Open Problem

• How do we make use of statistical regularities that are present in the web to improve search and navigation?

• See Levene et al., A stochastic model for the evolution of the web, Condensed Matter Archive, cond-mat/0110016, 2001
– Many distributions related to the web graph follow a power law
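One way to check for such a regularity in data is to estimate the power-law exponent directly. The sketch below uses the standard maximum-likelihood estimator for a continuous power law, alpha = 1 + n / sum(ln(x_i / x_min)), and tests it on synthetic samples. It is not the stochastic model from the cited paper; the function names, the choice of `x_min` and the synthetic data are assumptions for the example.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, rng=None):
    """Draw n samples from p(x) proportional to x ** -alpha for x >= x_min.

    Uses inverse-transform sampling; requires alpha > 1.
    """
    rng = rng or random.Random()
    # 1 - random() lies in (0, 1], so the power below is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def estimate_exponent(values, x_min=1.0):
    """Maximum-likelihood estimate of alpha from the values at or above x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need values strictly above x_min to estimate alpha")
    return 1.0 + len(tail) / log_sum


# Synthetic data with a known exponent, standing in for a measured
# quantity such as page in-degrees (an assumption for illustration).
synthetic = sample_power_law(20000, alpha=2.5, rng=random.Random(42))
alpha_hat = estimate_exponent(synthetic)
```

On real web data the result is sensitive to the choice of `x_min`, since a power law typically holds only in the tail of the distribution.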