Beyond Hard-to-Reach Pages: Interactive, Parametric Web Macros
7th Conference on Human Factors and the Web
Universal Access: More People. More Situations
Alex Safonov
University of Minnesota, Department of Computer Science and Engineering


Page 2

Talk Outline

• Problem: user interaction with the WWW becoming more tedious

• Solution: personal Web Automation

• Challenges of Web Automation

• Lessons learned from the WebMacros system

Page 3

Take-away Messages

• General Web users
  – There is an effort to simplify repetitive tasks by automation

• Web usability specialists
  – Personal Web Automation as a means to improve site usability

• Content providers
  – Awareness of Web Automation scripts vs. data-scraping bots


Page 5

Hard-to-Reach Pages: Examples

• Airline/hotel/car reservations

• Searches over library and citation databases

• Populating e-commerce shopping carts

• Map and weather queries

Page 6

Patterns of a Web Task

• Identify oneself (optional)
  – Implicitly (cookies) or explicitly (login)

• Select the appropriate service

• Specify query parameters and execute query
  – HTML forms on one or several pages

• Review/iterate over returned items
  – E.g., save or print each paper matched in ACM DL
  – Returned items may span multiple pages

• Repeat query with different parameters (optional)

Page 7

Sharing Web Interactions

• Scenario 1
  – Instructor populates an online bookstore shopping cart with course textbooks; she would like students to instantly access the cart

• Scenario 2
  – System administrator performs a Web-based administration task; wants to make the task available to colleagues

Page 8

Existing Tools

• Bookmarks/favorites and histories
  – Links only – not procedures

• Server-based mechanisms
  – Comparison shopping services; auction proxies; special URLs for bookmarking
  – Limited flexibility: user is not in control

Page 9

Motivation: Automate Repetitive Tasks

• Tasks such as checking airline pricing or changing system configuration are often performed many times
  – With the same or different parameters

• Goal: relieve the user from doing repetitive tasks by using automation

• Approach: capture and reuse user interactions with the Web

Page 10

Related Work

• Macro scripting and Programming By Demonstration (PBD)

• Web Automation (LiveAgent, WebVCR, WebMacros)

• Hypermedia trails and tours

• Web Semantics and Web Services

Page 11

Requirements for a Personal Web Automation system

• Users create Web automation scripts by demonstration, in a familiar environment

• Web Automation systems handle dynamic data and semantic-free markup

• Running scripts have reasonable side effects

• Privacy is maintained when sharing scripts

• Scripts support parameters

Page 12

WebMacros – a Personal Web Automation system

• First prototype - HFWeb 99
  – Records and replays a linear sequence of navigation steps (opened URLs, followed links, and form submissions)

• Users create Web Automation scripts (Web macros) through normal Web navigation and form filling

Page 13

Recording Web macros

• Click on
  – Browser opens a new window

• Open the Avis home page

• Fill out the forms, navigate pages

• Supply macro name and description and click on “Finish Recording”

Page 14

Running Web macros

Macro playback control panel

A directory of user’s macros

Page 15

Macros with Parameters

• During recording, user can mark form inputs as parameters

• During playback, user specifies current values
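
A minimal sketch of how a parametric macro might be represented and bound at playback. The `Step` structure, the field names, and the `rental.example` URLs are illustrative assumptions, not the WebMacros data model; the point is only that form inputs marked as parameters take the current values while all other inputs keep their recorded values.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Step:
    """One recorded navigation step (hypothetical structure)."""
    kind: str                                    # "open", "link", or "form"
    url: str
    fields: dict = field(default_factory=dict)   # form values captured at recording time
    params: set = field(default_factory=set)     # names of inputs marked as parameters


def bind(step, values):
    """Recorded form values, overridden only for inputs marked as parameters."""
    bound = dict(step.fields)
    for name in step.params:
        if name in values:
            bound[name] = values[name]
    return bound


def playback_url(step, values):
    """GET-style URL that replaying this step would request."""
    bound = bind(step, values)
    return step.url + ("?" + urlencode(bound) if bound else "")


# A two-step macro: open a page, then submit a form with two parameters.
macro = [
    Step("open", "http://rental.example/"),
    Step("form", "http://rental.example/quote",
         fields={"city": "Minneapolis", "days": "3", "class": "compact"},
         params={"city", "days"}),
]

# "class" was not marked as a parameter, so the supplied value is ignored.
urls = [playback_url(s, {"city": "Austin", "class": "luxury"}) for s in macro]
```

Here `days` keeps its recorded value because no current value was supplied for it, which mirrors the idea of pre-filling recorded values and letting the user change only what differs.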

Page 16

Batch and Interactive Playback

• Batch playback
  – Browser loads the final page of a macro

• Why support interactive playback?
  – First use of a macro
  – Easier to substitute parameters
  – Can “skip to end”

• WebMacros substitutes recorded parameter values (except private ones) into the page

Page 17

Dealing with Dynamic Content

• A Web automation system works in an unreliable, dynamic environment

• Page retrieved at macro playback may not be what the user expects
  – Services may be unavailable
  – Verbatim replay of recorded steps may be inappropriate
    • Session ids
    • Expired or missing cookies

Page 18

Dealing with Dynamic Content

• WebMacros uses rules to match recorded steps against retrieved pages during replay

• How can the system determine that an incorrect page was retrieved?
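
The slides do not spell out the matching rules, so the following is only one plausible rule, sketched under assumptions: a recorded link is matched against the links on the freshly retrieved page while ignoring query parameters that look like session ids. The parameter names in `VOLATILE` and the `shop.example` URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed names of query parameters that change between sessions.
VOLATILE = {"sid", "sessionid", "jsessionid"}


def stable_key(url):
    """Reduce a URL to the parts expected to survive across sessions."""
    parts = urlsplit(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k.lower() not in VOLATILE)
    return (parts.netloc.lower(), parts.path, tuple(query))


def find_link(recorded_url, page_links):
    """Return the link on the retrieved page matching the recorded step, or None."""
    target = stable_key(recorded_url)
    for link in page_links:
        if stable_key(link) == target:
            return link
    return None


recorded = "http://shop.example/cart?item=42&sid=abc"
links_now = ["http://shop.example/help",
             "http://shop.example/cart?sid=xyz&item=42"]
match = find_link(recorded, links_now)
```

Replay would then follow `match` (which carries the current session id) rather than the recorded URL; a result of `None` is one signal that the retrieved page is not the expected one.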

Page 19

Same Template, Different Content

• Dynamically generated pages may have different content but similar HTML markup

Page 20

Structure-based Page Verification

• WebMacros efficiently compares HTML parse trees of recorded and retrieved pages

• An HTML parse tree of a page is compactly represented as a set of path expressions

• Similarity measure suited to template-generated pages with different numbers of items

• If the similarity between the structure of the recorded and actual pages is below a threshold, WebMacros alerts the user
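
The idea of comparing pages by their sets of path expressions can be sketched as follows. The slides do not name the similarity measure, so Jaccard overlap is used here purely as an assumption, and the 0.5 threshold and the sample pages are illustrative. Because paths are kept in a set, a result list with one item and one with three items yield the same paths.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects root-to-element tag paths such as 'html/body/ul/li'."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def path_set(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def similarity(recorded_html, actual_html):
    """Jaccard overlap of the two path sets (an assumed measure)."""
    a, b = path_set(recorded_html), path_set(actual_html)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


THRESHOLD = 0.5  # hypothetical value
item = "<li><a href='x'>Paper</a></li>"
recorded = f"<html><body><ul>{item}</ul></body></html>"
same_template = f"<html><body><ul>{item * 3}</ul></body></html>"
error_page = "<html><body><h1>Error</h1><p>Unavailable</p></body></html>"

ok = similarity(recorded, same_template) >= THRESHOLD
alert = similarity(recorded, error_page) < THRESHOLD
```

With these sample pages the same-template page passes and the error page would trigger an alert.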

Page 21

Sharing of WebMacros Scripts

• Cookie context can be encapsulated with macros
  – Allows playing macros from any computer
  – Allows sharing macros among users

• Course textbook shopping cart
  – No instructor’s or student’s cookies

Page 22

WebMacros Architecture

• Advantages of a pure proxy architecture
  – Any HTTP client works; does not need a built-in JVM
  – Proxy design enables remote use and sharing of macros
  – Proxy does not depend on the browser for page retrieval
  – Proxy does not need “security clearance” to read/write local files and modify incoming pages

• Drawbacks
  – User must trust the macro server if macros are stored on it
  – No access to browser-generated HTML
  – Non-local proxy generates extra HTTP traffic

Page 23

System Implementation

• Approach: HTML rewriting
  – During recording, WebMacros modifies URLs of links, images, forms, and frames on the retrieved page
  – These are rewritten to special URLs intercepted by the proxy
  – Form fields are annotated with parameter selection radioboxes

• Macro Representation
  – Macros are stored in a relational database
  – Originally, macro steps stored as WebL scripts – difficult to manipulate
  – WebL scripts generated and executed for each step
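
A rough sketch of the URL-rewriting idea, assuming a hypothetical proxy endpoint `http://proxy.example/step` and a `target` query parameter (neither comes from WebMacros). It uses a regular expression over quoted `href`, `src`, and `action` attributes, which is enough to illustrate the technique but is not a robust HTML rewriter.

```python
import re
from urllib.parse import parse_qs, quote, urljoin, urlsplit

PROXY = "http://proxy.example/step"  # hypothetical proxy endpoint

# Quoted href/src/action attributes cover links, images, frames, and forms.
ATTR = re.compile(r'''\b(href|src|action)\s*=\s*(["'])(.*?)\2''',
                  re.IGNORECASE | re.DOTALL)


def to_proxy_url(url, base):
    """Make the URL absolute, then wrap it in a URL the proxy can intercept."""
    absolute = urljoin(base, url)
    return PROXY + "?target=" + quote(absolute, safe="")


def rewrite(html, base):
    """Rewrite every matched attribute so the next request goes via the proxy."""
    def repl(m):
        attr, q, url = m.group(1), m.group(2), m.group(3)
        return f"{attr}={q}{to_proxy_url(url, base)}{q}"
    return ATTR.sub(repl, html)


def original_url(proxy_url):
    """What the proxy would do on interception: recover the real target."""
    return parse_qs(urlsplit(proxy_url).query)["target"][0]


base = "http://rental.example/home/index.html"
page = '<a href="/cars?loc=MSP">Rent</a><img src="logo.gif"><form action="search">'
rewritten = rewrite(page, base)
```

When the browser follows a rewritten link, the proxy can log the step, recover the target with something like `original_url`, fetch it, and rewrite the response in turn.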

Page 24

Lessons Learned

• Hybrid architecture for recording and playback
  – Difficult to detect user actions from a proxy
  – Optimal: client-based (applet) recorder, proxy playback component

• XML representation for macros
  – Lightweight, fast parsers now available
  – Not tied to a relational DBMS
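
To make the XML option concrete, here is a small sketch of serializing a macro to XML and reading it back. The element and attribute names (`macro`, `step`, `field`, `parameter`) are invented for illustration; the slides do not define a schema.

```python
import xml.etree.ElementTree as ET


def macro_to_xml(name, description, steps):
    """Serialize a macro (list of step dicts) to an XML string."""
    root = ET.Element("macro", name=name)
    ET.SubElement(root, "description").text = description
    for step in steps:
        el = ET.SubElement(root, "step", kind=step["kind"], url=step["url"])
        for field_name, value in step.get("fields", {}).items():
            is_param = field_name in step.get("params", ())
            f = ET.SubElement(el, "field", name=field_name,
                              parameter=str(is_param).lower())
            f.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_steps(text):
    """Parse the XML back into the same list-of-dicts shape."""
    steps = []
    for el in ET.fromstring(text).findall("step"):
        fields = {f.get("name"): f.text or "" for f in el.findall("field")}
        params = {f.get("name") for f in el.findall("field")
                  if f.get("parameter") == "true"}
        steps.append({"kind": el.get("kind"), "url": el.get("url"),
                      "fields": fields, "params": params})
    return steps


steps = [
    {"kind": "open", "url": "http://rental.example/", "fields": {}, "params": set()},
    {"kind": "form", "url": "http://rental.example/quote?a=1&b=2",
     "fields": {"city": "Minneapolis", "days": "3"}, "params": {"city"}},
]
document = macro_to_xml("car-quote", "Get a rental quote", steps)
restored = xml_to_steps(document)
```

A plain-text document like this can be stored in a file, mailed to a colleague, or edited by hand, which is the portability argument behind not tying macros to a relational DBMS.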

Page 25

Further Work

• Pilot user study is under way
  – Improved recording and playback controls; added the Undo feature for recording

• Detect iteration during macro demonstration
  – Approach: user demonstrates some examples (e.g., links), WebMacros generalizes to similar links and merges results

• Propose HTML extensions/XML DTD to make Web Automation more reliable