Managing Change on the Web

Luis Francisco-Revilla, Frank M. Shipman, Richard Furuta, Unmil Karadkar, Avital Arora

Center for the Study of Digital Libraries, Texas A&M University


What is this talk about?

A system-based approach to help manage digital libraries whose collections consist of fluid resources with distributed location and ownership

• Modern paradigms of digital libraries
    • Pointers rather than the resources
    • Web-based collections: NSDL (http://www.ehr.nsf.gov/due/programs/nsdl/)
    • Meta-documents
• High fluidity
    • Changes vary in relevance
    • Little system aid for assessing the relevance of changes

This is a problem everybody has:
• Bookmark lists
• Yahoo! catalogues
• Search engine indices

Related work

• David Johnson, PhD dissertation, University of Washington
    • Document distance: weighted, asymmetric
• Change monitoring systems: AIDE, URL Minder, WatzNew
    • Fine-grained yes/no detection
    • WebWatcher (evolving)
• Identification of “interesting” pages: Syskill & Webert, Do-I-Care-Agent, Letizia
    • Personal, reader-specific, profile-based

Motivation

• Managing the Walden’s Paths collection
• Paths are meta-documents
    • Sequential arrangement of Web pages
    • Rhetorically coherent
    • Contextualized
    • Distributed ownership
    • Distributed authorship
• Continuous revision of the collection

Mechanisms for addressing the issue

• Caching the pages
    • Caching strategies
    • Some changes are desirable
• Fluid paths
    • Ephemeral paths
    • Rhetorical coherence

The real issue

These mechanisms allow only a limited reaction to changes

Detecting changes is easy, but determining their relevance is difficult

Humans are still required to determine the significance of changes

Reacting to changes requires assessing their relevance

The perception of change (overview)

Observe how humans perceive changes to Web pages

Inform and evaluate the approach and design

Questions

1. Do people view the same changes differently when given different amounts of time?

2. What kinds of changes are easily perceived?

3. Which kinds of changes do users want to be notified about?

Kinds of change

• Content changes (what)
• Presentation changes (how)
• Structural changes (linking)
• Behavioral changes
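As a rough illustration of how these kinds of change might be told apart automatically, the following Python sketch compares two versions of a page. It is an assumption-laden heuristic, not the Path Manager's method: visible text stands in for content, link targets for structure, inline script text for behavior, and any remaining markup difference (including <style> blocks) is reported as presentation. The names `PageParts`, `summarize` and `classify_change` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple


class PageParts(HTMLParser):
    """Splits a page into visible text, link targets and inline script text."""

    def __init__(self) -> None:
        super().__init__()
        self.text: List[str] = []
        self.links: List[str] = []
        self.scripts: List[str] = []
        self._raw: Optional[str] = None  # "script" or "style" while inside one

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._raw = tag
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == self._raw:
            self._raw = None

    def handle_data(self, data):
        if self._raw == "script":
            self.scripts.append(data)
        elif self._raw is None:  # text inside <style> is ignored
            self.text.append(data)


def summarize(html: str) -> Tuple[str, Tuple[str, ...], Tuple[str, ...]]:
    parts = PageParts()
    parts.feed(html)
    parts.close()
    # Collapse whitespace so that reflowed text is not mistaken for a content change.
    text = " ".join("".join(parts.text).split())
    return text, tuple(parts.links), tuple(parts.scripts)


def classify_change(old_html: str, new_html: str) -> Set[str]:
    """Return the kinds of change detected between two versions of a page."""
    if old_html == new_html:
        return set()
    old_text, old_links, old_scripts = summarize(old_html)
    new_text, new_links, new_scripts = summarize(new_html)
    kinds = set()
    if old_text != new_text:
        kinds.add("content")
    if old_links != new_links:
        kinds.add("structural")
    if old_scripts != new_scripts:
        kinds.add("behavioral")
    # Markup changed but nothing above did: treat it as a presentation change.
    return kinds or {"presentation"}
```

For example, wrapping a word in <b> tags yields only a presentation change, while editing a link target yields a structural one.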

Results and implications

Presentation changes were usually perceived as irrelevant

The desire for notification and the perception of overall change increased with the degree of content change

Time played a larger role in the perception of structural changes than in that of content changes

As the degree of structural change increased, so did the desire for notification

Links are a useful metric

Path Manager: the system

• Java-based
• Paths or bookmark lists
• HTML pages
• Functional state of the document
    • Original
    • Valid
    • Last-time
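The three functional states can be pictured as a small per-page record. In this hedged Python sketch, signatures are opaque strings (for example a checksum), and "valid" is taken to mean the most recent version a maintainer has accepted; both that interpretation and the names (`PageRecord`, `observe`, `accept`) are assumptions, not the Path Manager's Java API.

```python
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical per-page record of the three functional states."""

    url: str
    original: str   # signature when the page was first added to the path
    valid: str      # signature of the last version a maintainer accepted (assumed meaning)
    last_time: str  # signature observed at the most recent check

    @classmethod
    def added(cls, url: str, signature: str) -> "PageRecord":
        # A newly added page is, by definition, in its original and valid state.
        return cls(url, signature, signature, signature)

    def observe(self, signature: str) -> None:
        """Record the signature seen on the latest retrieval."""
        self.last_time = signature

    def accept(self) -> None:
        """The maintainer approves the current version, making it the new valid state."""
        self.valid = self.last_time

    def changed_since_valid(self) -> bool:
        return self.last_time != self.valid

    def changed_since_original(self) -> bool:
        return self.last_time != self.original
```

Keeping the original state separately means a page can drift through several accepted revisions while the path author can still compare it against what was first selected.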

Algorithms

• Variation of Johnson: weighted sum of additions, deletions and modifications for each metric
    • Added a metric for structural changes
    • Flexible
    • Asymmetric
    • Lacks normalization
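A minimal Python sketch of such a weighted sum follows. It assumes each page signature maps a metric name to a dictionary of item identifiers and checksums, so that an item present only in the new version is an addition, one present only in the old version is a deletion, and one whose checksum differs is a modification. The metric names and the weights in `DEFAULT_WEIGHTS` are invented for illustration, and `links` stands in for the added structural metric. With different weights for additions and deletions the score is asymmetric, and because it is a raw sum it is not normalized.

```python
from typing import Dict, Mapping, Tuple

# A page signature: metric name -> {item identifier: item checksum}.
# Hypothetical representation chosen for this sketch.
Signature = Mapping[str, Mapping[str, str]]

# Hypothetical (addition, deletion, modification) weights per metric.
DEFAULT_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "paragraphs": (1.0, 2.0, 1.0),
    "headings": (2.0, 3.0, 2.0),
    "keywords": (0.5, 0.5, 0.5),
    "links": (1.5, 3.0, 1.0),  # structural metric
}


def count_changes(old: Mapping[str, str], new: Mapping[str, str]) -> Tuple[int, int, int]:
    """Count additions, deletions and modifications between two item maps."""
    added = sum(1 for key in new if key not in old)
    deleted = sum(1 for key in old if key not in new)
    modified = sum(1 for key in old if key in new and old[key] != new[key])
    return added, deleted, modified


def weighted_change(old: Signature, new: Signature,
                    weights: Mapping[str, Tuple[float, float, float]] = DEFAULT_WEIGHTS) -> float:
    """Unnormalized, asymmetric change score summed over all weighted metrics."""
    score = 0.0
    for metric, (w_add, w_del, w_mod) in weights.items():
        added, deleted, modified = count_changes(old.get(metric, {}), new.get(metric, {}))
        score += w_add * added + w_del * deleted + w_mod * modified
    return score
```

Under these made-up weights, adding one link to a page scores 1.5 while removing that same link scores 3.0, which is the asymmetry listed on the slide.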

• Proportional: determines the proportion of modification for each metric
    • Simple
    • Symmetric
    • Normalized
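The proportional measure can be sketched in Python as well. This hedged version assumes a signature that maps each metric name to a dictionary of item identifiers and checksums (a hypothetical layout); for each metric it divides the number of changed items (added, deleted or modified) by the number of distinct items in either version, then averages the per-metric fractions with optional weights. Because additions and deletions are treated alike the result is symmetric, and it always lies between 0 and 1. The function name is hypothetical.

```python
from typing import Mapping, Optional

# metric name -> {item identifier: item checksum}; hypothetical layout.
Signature = Mapping[str, Mapping[str, str]]


def proportional_change(old: Signature, new: Signature,
                        weights: Optional[Mapping[str, float]] = None) -> float:
    """Symmetric change score in [0, 1]: weighted mean of per-metric changed fractions."""
    numerator = denominator = 0.0
    for metric in sorted(set(old) | set(new)):
        before, after = old.get(metric, {}), new.get(metric, {})
        items = set(before) | set(after)
        if not items:
            continue  # metric empty in both versions: nothing to compare
        # An item counts as changed if it was added, deleted, or its checksum differs.
        changed = sum(1 for key in items if before.get(key) != after.get(key))
        weight = 1.0 if weights is None else weights.get(metric, 0.0)
        numerator += weight * changed / len(items)
        denominator += weight
    return numerator / denominator if denominator else 0.0
```

For instance, if one of two paragraphs changed and one link was added to a page with two links, the unweighted score is (1/2 + 1/3) / 2 ≈ 0.42, whichever version is passed first.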

Initial interface

Overall change relevance assessment

Document signatures

• Paragraph checksums
• Headlines
• Links
• Keywords
• Global checksum
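One way to compute these signature components from a page is sketched below with Python's standard `html.parser`. It is an approximation under stated assumptions: paragraphs and headlines are taken only from <p> and <h1> to <h6> elements, keywords only from a <meta name="keywords"> tag, and MD5 is used purely as a cheap checksum. `SignatureParser` and `signature` are hypothetical names.

```python
import hashlib
from html.parser import HTMLParser
from typing import Dict, List, Optional


def checksum(text: str) -> str:
    # MD5 as a fast change fingerprint, not for security.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class SignatureParser(HTMLParser):
    """Collects paragraphs, headings, links and meta keywords from one page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self.headings: List[str] = []
        self.links: List[str] = []
        self.keywords: List[str] = []
        self._block: Optional[str] = None  # "p" or "h" while inside one
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())
        elif tag == "p" or tag in self.HEADINGS:
            self._flush()
            self._block = "h" if tag in self.HEADINGS else "p"

    def handle_endtag(self, tag):
        if tag == "p" or tag in self.HEADINGS:
            self._flush()

    def handle_data(self, data):
        if self._block:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # a final <p> may legally be left unclosed

    def _flush(self):
        text = " ".join("".join(self._buffer).split())
        if text:
            (self.headings if self._block == "h" else self.paragraphs).append(text)
        self._block, self._buffer = None, []


def signature(html: str) -> Dict[str, object]:
    parser = SignatureParser()
    parser.feed(html)
    parser.close()
    return {
        "paragraphs": [checksum(p) for p in parser.paragraphs],
        "headings": parser.headings,
        "links": parser.links,
        "keywords": parser.keywords,
        "global": checksum(html),
    }
```

Comparing two such signatures item by item is what lets a change be attributed to paragraphs, headlines, links or keywords rather than only to the page as a whole, which is all the global checksum can tell.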

View of change metrics

Detailed view of page metrics

Path information

Web page retrieval and connectivity

• Potentially slow and unpredictable
• Parallel retrieval
    • Multi-threaded
    • Multiple attempts and retries
    • Different states
        • Connection state
        • Retrieval state
        • Analysis state
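The retrieval strategy can be sketched with a thread pool in Python. The sketch assumes each page moves through connection, retrieval and analysis states, that failed attempts are retried with exponential back-off, and that a page is marked as failed once all attempts are exhausted. `retrieve_all`, `State` and the parameter defaults are hypothetical; `open_url` is injectable so the logic can be exercised without network access, and it defaults to `urllib.request.urlopen` with a timeout.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, Dict, Iterable


class State(Enum):
    CONNECTING = "connection"
    RETRIEVING = "retrieval"
    ANALYZING = "analysis"
    DONE = "done"
    FAILED = "failed"


def default_open(url: str):
    # Timeout so that one slow server cannot stall a worker indefinitely.
    return urllib.request.urlopen(url, timeout=10)


def retrieve_all(urls: Iterable[str], analyze: Callable[[bytes], object],
                 open_url: Callable[[str], object] = default_open,
                 attempts: int = 3, delay: float = 0.5, workers: int = 8):
    """Fetch and analyze pages in parallel; return (results, final state per URL)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    states: Dict[str, State] = {}
    lock = threading.Lock()

    def set_state(url: str, state: State) -> None:
        with lock:
            states[url] = state

    def fetch(url: str) -> bytes:
        last_error = None
        for attempt in range(attempts):
            try:
                set_state(url, State.CONNECTING)
                with open_url(url) as conn:
                    set_state(url, State.RETRIEVING)
                    return conn.read()
            except OSError as err:  # URLError and socket timeouts subclass OSError
                last_error = err
                if attempt + 1 < attempts:
                    time.sleep(delay * 2 ** attempt)  # exponential back-off
        raise last_error

    def work(url: str):
        try:
            body = fetch(url)
        except OSError:
            set_state(url, State.FAILED)
            return url, None
        set_state(url, State.ANALYZING)
        result = analyze(body)
        set_state(url, State.DONE)
        return url, result

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(pool.map(work, urls))
    return results, states
```

Tracking the state per page lets an interface show which pages are still waiting on a slow server instead of blocking the whole path on one unresponsive host.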

Challenges and limitations

• Heuristic identification of document structure (i.e., headings)
• Indirection
• Behavior
• Dynamic pages

Conclusions

Managing distributed collections of documents remains challenging and time-consuming, and still requires human assistance

The Path Manager supports the maintenance of collections of Web pages:
• it recognizes and evaluates relevant changes and informs the user of them
• it keeps track of the original, valid and last-time states of Web pages

The study indicated that users want structural changes to be included in the determination of overall change

Contact information

Luis Francisco-Revilla

Frank M. Shipman

Richard Furuta

Unmil Karadkar

Avital Arora