A Transcoding Proxy for HTML Web Pages: Web Page Sampling and Conversion Evaluation
Andrew Stone
CS525m
Worcester Polytechnic Institute
Overview
• Proxy Goal and Scope
• Related Work
• Project Scope
• Testing Methodology
• Demo
• Conclusions
• Future Work
Proxy Goal
• Reduce data traffic
  – Get content displayed faster
  – Save bandwidth (and money)
  – Reduce power consumption
• Change content to suit device
  – Browser properties
Related Work
• HTML to WML Transcoding Proxy
  – http://zoo.cs.yale.edu/classes/cs490/00-01b/dugas.robert.rfd8/rfd8cs490.pdf
• iMobile EE
  – http://portal.acm.org/citation.cfm?id=778492&coll=portal&dl=ACM&CFID=71256236&CFTOKEN=91425173
• RSVP Browser
  – http://portal.acm.org/citation.cfm?id=591429&coll=portal&dl=ACM&CFID=71256236&CFTOKEN=91425173
• Navigating a Mobile XHTML App
  – http://portal.acm.org/citation.cfm?id=642669&coll=portal&dl=ACM&CFID=71256236&CFTOKEN=91425173
• Skweezer
  – http://www.skweezer.net
Project Scope
• Create a component to transcode web pages using HTML Tidy and XSLT stylesheets
• Measure the reduction in web page size
• Evaluate web page readability on a PC (IE and Firefox) and on Windows Mobile 5 (Pocket IE)
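[Diagram: request flow through the proxy. The client issues a GET request to the proxy, which forwards the GET request to the Internet. The returned content is passed through HTML Tidy to produce XHTML, then through an XSLT transform, and the transformed content is returned to the client.]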
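The clean-then-transform flow in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's component: a small `html.parser` subclass stands in for HTML Tidy, and an `ElementTree` pass stands in for the XSLT stylesheet. The names `XhtmlCleaner`, `tidy`, `transform`, `transcode`, and the tag and attribute sets are hypothetical; the slides do not say which elements the real stylesheet removed.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical choices for illustration; the slides do not list them.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}
DROP_TAGS = {"script", "style"}
DROP_ATTRS = {"style", "bgcolor", "width", "height", "align"}


class XhtmlCleaner(HTMLParser):
    """Stand-in for HTML Tidy: re-emit tag soup as well-formed XHTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open non-void tags

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get their own name as value.
        text = "".join(
            f" {k}={quoteattr(k if v is None else v)}" for k, v in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.out.append(f"<{tag}{text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: drop it
        while self.stack:  # close anything left open inside this element
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()
        while self.stack:  # close tags the page never closed
            self.out.append(f"</{self.stack.pop()}>")


def tidy(html):
    """Stage 1: tag soup in, XHTML-like text out (assumes one root element)."""
    cleaner = XhtmlCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


def transform(xhtml):
    """Stage 2: stand-in for the XSLT step; drops bulky markup."""
    root = ET.fromstring(xhtml)
    for parent in list(root.iter()):
        for name in DROP_ATTRS & set(parent.attrib):
            del parent.attrib[name]
        prev = None
        for child in list(parent):
            if child.tag in DROP_TAGS:
                # Keep the text that followed the removed element.
                tail = child.tail or ""
                if prev is not None:
                    prev.tail = (prev.tail or "") + tail
                else:
                    parent.text = (parent.text or "") + tail
                parent.remove(child)
            else:
                prev = child
    return ET.tostring(root, encoding="unicode")


def transcode(html):
    """Proxy body: clean the fetched page, then transform it."""
    return transform(tidy(html))
```

Calling `transcode()` on a fetched page body returns the reduced markup. Pages that the first stage cannot turn into a single well-formed tree raise `ET.ParseError` in the second stage, which mirrors the two-step success counts reported for the data set.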
Web Page Reduction
• Data Set: 5852 pages from 403 domains
  – From Paul Timmins and Sean McCormick’s “Characteristics of Today’s Mobile Web Content”
• HTML Tidy produced 2730 transformed pages
  – 2417 successful XSL Transformations from 266 domains
• Before
  – Average Page Size including images: 46.9 KB
  – Average Page Size excluding images: 23.3 KB
• After
  – Average Page Size including images: 43.0 KB
  – Average Page Size excluding images: 19.4 KB
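The rates implied by these figures can be worked out directly. The sketch below uses only the numbers on this slide; the variable names are illustrative, and the slide does not say whether the "before" averages cover all sampled pages or only the transformed ones, so the percentages are indicative.

```python
# Figures copied from the slide.
pages_sampled = 5852
pages_tidied = 2730        # pages HTML Tidy produced output for
pages_transformed = 2417   # pages that also passed the XSL transformation
before_kb = {"with images": 46.9, "without images": 23.3}
after_kb = {"with images": 43.0, "without images": 19.4}


def percent(part, whole):
    """Share of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


tidy_rate = percent(pages_tidied, pages_sampled)          # 46.7
xslt_rate = percent(pages_transformed, pages_tidied)      # 88.5
overall_rate = percent(pages_transformed, pages_sampled)  # 41.3

# Relative size reduction for each measurement.
reduction = {
    key: percent(before_kb[key] - after_kb[key], before_kb[key])
    for key in before_kb
}  # {"with images": 8.3, "without images": 16.7}

# Average image payload per page, before and after transcoding.
image_kb_before = round(before_kb["with images"] - before_kb["without images"], 1)
image_kb_after = round(after_kb["with images"] - after_kb["without images"], 1)
```

Both image payloads come out at 23.6 KB, so the whole 3.9 KB saving comes from the markup and the images pass through unchanged.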
Web Page Layout Demo
Conclusions
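• Real gains are in image manipulation
  – Images account for about half of the average page (23.6 KB of 46.9 KB); transcoding the markup saved only 3.9 KB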
• ~50% of web pages could not be cleaned up, because of non-standard HTML or limitations of HTML Tidy
• Another HTML fixing tool should be tested
• Image compression should be evaluated