9

Click here to load reader

Cost-Aware Mobile Web Browsing

Embed Size (px)

Citation preview

Page 1: Cost-Aware Mobile Web Browsing

34 PERVASIVE computing Published by the IEEE CS n 1536-1268/12/$31.00 © 2012 IEEE

T e c h n o l o g i e s f o r D e v e l o p m e n T

cost-Aware mobile Web Browsing

M any users in developing regions are subject to usage-based pricing, which results in significantly higher prices than the flat rates found in

developed countries. In the past, usage-based pricing was affordable, but with the dramatic growth in the size and complexity of webpages, such pricing can result in high monthly bills, even for minimal use.

Since 1999, the average webpage has increased in size by a factor of 22 (not including video— see www.websiteoptimization.com),1 and there’s been a 30- to 50-fold increase in webpage com-

plexity (objects per page).2–4 Under many of the existing data-rate plans for mobile users in developing regions, a single download of a webpage that’s larger than 2 Mbytes (which is very common) can easily cost

over US$1. Although Web designers often create low-bandwidth versions of webpages for mobile users, expecting a large fraction of the Web to be manually redesigned for such users is impractical.

Our cost-aware mobile Web browsing mech-anism aims to substantially reduce the cost of browsing the Web on a mobile device. The key idea of our approach is to adapt Web content as a function of a user’s mobile pricing plan. Given the user’s pricing plan and current data-usage levels and limits, we can dynamically compute a cost quota for each Web request and use a network middlebox to automatically adapt any

webpage to stay within the cost quota. Unlike the conventional approach, in which the server dic-tates a webpage’s size, our approach lets the client dictate the download limit for each Web request and forwards the best possible version of a page for a given data allowance, based on user feedback.

cost-Aware mobile BrowsingCost-aware mobile Web browsing is fundamen-tally different from conventional content adap-tation techniques (see the “Related Work in Web Content Adaptation” sidebar).5–7 The main idea behind cost-aware mobile Web browsing is to associate each webpage download with a cost quota that reflects the number of bytes that the user would need to download the page.

In a cost-based Web browsing model, a user should be able to specify the cost quota for a web-page, and the server should be able to provide a condensed version of the page to meet the speci-fied quota. In our design, cost awareness is a user-controlled metric that’s independent of the origi-nal webpage size. We made this policy decision to emphasize user control over incurred costs; it’s in-dependent of the mechanisms used to adapt web-pages to the quota. An alternative policy would be to let the system automatically determine the cost, proportional to the original webpage size.

Our model balances cost reduction with page fidelity using

• a cost quota,• the content adaption ladder, and• user feedback.

A cost-aware mobile browsing framework automatically adapts Web content as a function of a user’s data pricing plan and usage levels, reducing costs. This also enhances browsing experiences by forwarding the best possible version of a page for a given data allowance.

Sindhura Chava, Rachid Ennaji, Jay Chen, and Lakshminarayanan SubramanianNew York University

PC-11-03-Sub.indd 34 6/4/12 3:42 PM

Page 2: Cost-Aware Mobile Web Browsing

JULY–SEPTEMBER 2012 PERVASIVE computing 35

The user provides both the data plan cost structure and a target budget over a fixed time period (one month, for ex-ample). We continuously monitor the

user’s Web usage and compute a cost quota for each new Web request, so that the user doesn’t go over the target budget.

We derive a basic six-level content adaptation hierarchy for any webpage and select the appropriate adaptation for the cost quota. We also leverage

A large body of work exists on the subject of Web content

adaptation1–4 and, more specifically, content adaptation

for mobile devices.5–8

The use of proxies for Web prefetching isn’t a new idea,9,10 and

a few developing-region-specific content adaptation systems

have been developed.2,4 These systems don’t take into account

the ergonomic limitations of mobile devices. To adapt Web con-

tent for mobile devices, a proxy-based solution5 uses RSS feeds,

and Opera Mini (www.opera.com/mobile/download), a commer-

cial proxy-based system, pre-renders pages. (Opera Mini is cur-

rently being used by 72 percent of mobile Web users in Africa—

see www.opera.com/smw.) However, none of these systems adapts

content for mobile devices based on connection cost. Further-

more, these systems employ relatively cursory optimizations.

Most of the content adaptation work for mobile devices con-

centrates on content transformation to suit the display of smaller

devices. Various content transformation processes—such as

layout changes and content format reconfiguration—are used

to adapt Web content to mobile user agents.11 The idea of par-

titioning the webpage into blocks (snippets) and processing the

information only in the useful blocks by filtering out unneces-

sary ones is another approach used.12 Some content adapta-

tion solutions target only the webpages’ multimedia content.13

We’ve leveraged adaptation techniques from automated Web

re-authoring tools, which takes the capabilities of the client and

automatically re-authors the webpage to fit the client’s require-

ments.1,14,15 Another related class of works focus on user-interactive

content adaptation, where the system takes input from the user

to adapt the content based on the user’s needs and changes the

adaptation based on user feedback.16–18

REfEREnCES

1. T. Bickmore, A. Girgensohn, and J. Sullivan, “Web Page Filtering and Re-Authoring for Mobile Users,” The Computer J., vol. 42, no. 6, 1999, pp. 534–546.

2. J. Chen, L. Subramanian, and J. Li. Ruralcafe, “Web Search in the Rural Developing World,” Proc. 18th Int’l Conf. World Wide Web, Int’l World Wide Web Conf. Committee (IW3C2), 2009; www.www2009.org/proceedings/pdf/p411.pdf.

3. A. Fox and E. Brewer, “Reducing WWW Latency and Bandwidth Re-quirements by Real-Time Distillation,” Computer Networks and ISDN Systems, vol. 28, nos. 7–11, 1996, pp. 1445–1456.

4. W. Thies et al., “Searching the World Wide Web in Low-Connectivity Communities,” Proc. 11th Int’l Conf. World Wide Web, IW3C2, 2002; http://www2002.org/CDROM/alternate/714/.

5. A. Blekas, J. Garofalakis, and V. Stefanis, “Use of RSS Feeds for Content Adaptation in Mobile Web Browsing,” Proc. 2006 Int’l Cross-Disciplinary Workshop on Web Accessibility (W4A): Build-ing the Mobile Web: Rediscovering Accessibility?, ACM, 2006, pp. 79–85.

6. Y. Hwang, J. Kim, and E. Seo, “Structure-Aware Web Transcoding for Mobile Devices,” IEEE Internet Computing, vol. 7, no. 5, 2003, pp. 14–21.

7. I. Mohomed, J. Cai, and E. De Lara, “Urica: Usage-Aware Interactive Content Adaptation for Mobile Devices,” ACM SIGOPS Operating Systems Rev., vol. 40, no. 4, 2006, pp. 345–358.

8. I. Mohomed et al., “Correlation-Based Content Adaptation for Mobile Web Browsing,” Proc. ACM/IFIP/USENIX 2007 Int’l Conf. Middleware, Springer, 2007, pp. 101–120.

9. L. Fan et al., “Web Prefetching between Low-Bandwidth Clients and Proxies: Potential and Performance,” Proc. ACM SIGMETRICS Int’l Conf. Measurement and Modeling of Computer Systems, ACM, 1999, pp. 178–187.

10. X. Chen and X. Zhang, “Coordinated Data Prefetching by Utilizing Reference Information at Both Proxy and Web Servers,” ACM SIGMETRICS Performance Evaluation Rev., vol. 29, no. 2, 2001, pp. 32–38.

11. T. Laakko and T. Hiltunen, “Adapting Web Content to Mobile User Agents,” IEEE Internet Computing, vol. 9, no. 2, 2005, pp. 46–53.

12. E. Lee et al., “Topic-Specific Web Content Adaptation to Mobile Devices,” Proc. 2006 IEEE/WIC/ACM Int’l Conf. Web Intelligence, IEEE CS, 2006, pp. 845–848.

13. D. Zhang, “Web Content Adaptation for Mobile Handheld Devices,” Comm. ACM, vol. 50, no. 2, 2007, pp. 75–79.

14. O. Buyukkokten et al., “Efficient Web Browsing on Handheld Devices Using Page and Form Summarization,” ACM Trans. Information Systems, vol. 20, no. 1, 2002, pp. 82–115.

15. Y. Chen, W. Ma, and H. Zhang, “Detecting Web Page Structure for Adaptive Viewing on Small Form Factor Devices,” Proc. 12th Int’l Conf. World Wide Web, ACM, 2003, pp. 20–24.

16. P. Baudisch et al., “Collapse-to-Zoom: Viewing Web Pages on Small Screen Devices by Interactively Removing Irrelevant Content,” Proc. 17th Annual ACM Symp. User Interface Software and Technology, ACM, 2004, pp. 91–94.

17. P.K. Mishra et al., “User Interactive Web Content Adaptation for Mobile Devices,” Proc. 3rd CSI Nat’l Conf. Education and Research (ConfER 10), 2010; www.bvicam.ac.in/news/INDIACom% 202010%20Proceedings/papers/Group3/INDIACom10_392_Paper%20(1).pdf.

18. I. Mohomed et al., “Context-Aware Interactive Content Adaptation,” Proc. 4th Int’l Conf. Mobile Systems, Applications and Services, ACM, 2006, pp. 42–55.

related Work in Web content Adaptation

PC-11-03-Sub.indd 35 6/4/12 3:42 PM

Page 3: Cost-Aware Mobile Web Browsing

36 PERVASIVE computing www.computer.org/pervasive

Technologies for DevelopmenT

user feedback. When the cost quota is low, the content adaptation can be too conservative for certain webpages and provide little useful information to the user. In such cases, the user should have the option of requesting an enhanced version of the page. The system should also continuously provide informa-tion about current data-consumption charges.

Usage-Based pricingIn developing regions, the two primary reasons for usage-based pricing are limited data capacity and unpredict-able demand. Similarly, international roaming users have traditionally been charged high usage-based fees. Usage-based prices, in both developed and developing regions, are substantially higher than unlimited plan prices in developed countries.

Unfortunately, with spectrum auc-tion prices increasing around the world, cellular providers are exploring

usage-based pricing as a mechanism to increase revenues. Major mobile operators in developed countries have recently shifted to “tiered” pricing plans even for domestic data subscrib-ers rather than unlimited plans. In the US, AT&T began tiered pricing in June 2010, T-Mobile in June 2011, and Verizon in July 2011.

One of the reasons providers use usage-based pricing models for data and voice connectivity is because of the uncertainty in the demand. Typically, regions that use flat-rate pricing will place contractual constraints on users to overcome the demand uncertainty. After crossing the plan limit, the fees of these plans are roughly on the order of US$0.10 to $0.25 per Mbyte across different providers around the world. At these rates, the relative price is high when compared to the region’s purchas-ing power parity.

Most users in developing regions are subject to different forms of usage-based pricing models. The simplest usage-based plan is to charge a fixed price per downloaded Mbyte. Some other plans, which charge a fixed cost, typically limit the download size or us-age time, both of which constrain the user. Beyond these limits, most plans revert back to a simple per-Mbyte charging model.

Table 1 lists the data plan bun-dles from Safaricom, Kenya’s largest

mobile network. Within these data plans, the cost per Mbyte is relatively low ($0.023 to $0.006). However, after the bundle limit is exceeded, each ad-ditional Mbyte costs approximately $0.10. (Note that Kenya is one of the fastest-growing and most competitive mobile markets in Subsaharan Africa and represents an approximate lower bound on mobile prices.) The average popular webpage is 551 Kbytes. Any webpage with embedded objects (such as sounds, video, or flash objects) would be several Mbytes. Therefore, the cost of even a single download could be fairly high.

We focus here on developing regions, but it’s important to note that the con-strained mobile data problem isn’t iso-lated to these regions. Table 2 illustrates the roaming fees per Mbyte of data transfer for four large cellular providers in the US. Most carriers charge $15 to $20 per Mbyte of data transfer, which is extremely expensive.

Anatomy of a WebpageAnother troubling aspect of mobile Web browsing is the complexity of webpages and the amount of unnec-essary information on most websites. A relatively simple webpage contains a combination of several types of objects—including HTML, Cascading Style Sheets (CSS), favicons (shortcut icons), images, Iframes, scripts, Flash objects, and embedded objects. Some of these Web resources aren’t directly rel-evant to what users want to download.

Figure 1 summarizes the distribution of objects across webpages, based on a simple analysis of the top 100 global webpages (according to www.alexa.com). For the average webpage, Java-Script files occupy 200.41 Kbytes. Our observations corroborate the trends found in other, more comprehensive measurement studies.3,4 Our analysis didn’t include video files, because their size alone is higher than the rest of the page, and even without including the video objects, the cost of browsing a single page is high.

TABLE 1 Prepay data bundles for Safaricom, Kenya’s largest mobile network.*

Data volume Price (Ksh) Price per Mbyte (Ksh)

50 Mbytes 100 2.00

200 Mbytes 250 1.25

500 Mbytes 499 1.00

1.5 Gbytes 999 .65

3 Gbytes 1,999 .65

4 Gbytes 2,499 .61

8 Gbytes 3,999 .49

20 Gbytes 9,999 .49

30 Gbytes 14,999 .49

* All plans cost 8 Kenya shilling (Ksh) per Mbyte after the bundle volume (83.7 Ksh equals US$1; Dec. 2011).

TABLE 2 International data roaming charges for mobile broadband (Dec. 2011).

Service provider

Roaming charges per Mbyte (US$)

AT&T $19.96

Verizon $20.48

T-Mobile $15.00

Sprint $19.50

PC-11-03-Sub.indd 36 6/4/12 3:42 PM

Page 4: Cost-Aware Mobile Web Browsing

JULY–SEPTEMBER 2012 PERVASIVE computing 37

system DesignOur system directs all Web requests from a mobile user to a cost-aware HTTP proxy for Web access. We ex-plicitly assume that the HTTP proxy resides on a machine that has ample network resources and that isn’t band-width or cost constrained. The proxy maintains a registry of users and their corresponding cost plan status. The proxy is responsible for handling all the communications between the mo-bile clients and Web servers. Given a new Web request from one of the mo-bile clients, the proxy computes the us-er’s cost quota and performs different content adaptation techniques to for-ward the best version possible.

The mobile client device doesn’t re-quire any software modifications. To start using our system, the user manu-ally configures the mobile Web browser to connect through the proxy and spec-ifies which credentials to use. In an ini-tial registration phase, the client uses a Web form to communicate cost param-eters (that is, the user’s pricing plan), which implements the cost awareness. All returned pages include an embed-ded hyperlink to let the user request enhanced page versions, if desired. The adapted page versions are stored at the proxy for a short time to satisfy enhanced-version requests.

computing the cost QuotaConsider a single user with a monthly budget B. We can simplify existing pric-ing models into one of two types:

•Constant rate model: the user pays a standard rate of a per downloaded Mbyte, so the user’s quota Q is B/a Mbyte.

•Bundle rate model: The user pays a fixed cost C for a specified amount of downloaded Mbytes. Beyond that quota amount Q, the user pays a standard overage fee of a per Mbyte.

We further simplify these models using a simple quota Q for the target budget. In the bundle rate model, if

the user exceeds the specified quota Q for that month, the user must spec-ify a new target budget B beyond the fixed cost and compute a new quota as B/a Mbytes for future requests during the month.

Let N represent the average number of requests issued by the user in the pre-vious months and T represent the bill-ing cycle for one month (30 days). Let t represent the current time that has elapsed during this time period. Let n(t) represent the number of requests made so far; q(t) is the current quota consumed. Given that users might have variable usage characteristics across months, we don’t constrain the number of user requests in our model.

We assume that the requests from a user arrive as a memoryless process at an unknown rate. Using the short-term history as a sample, we predict the future number of requests during the time period T − t left in the current monthly cycle as n(t) × (T − t)/t. For the current request, this assumes that the rate of requests for this time cycle is rep-resented by n(t)/t. We can estimate the quota of a new Web request as

( ( ))( ) ( )

.Q q t tn t T t

− ×× −

This estimation might not be accu-rate when n(t) is very small, because there’s little information to predict the number of requests during the month. In this case, we allocate a Web request quota of Q/N based on the previous month history of N. When q(t) > Q, we revert to the basic text version of a web-page unless the user explicitly requests an improved version.

content Adaptation ladderGiven the anatomy of a webpage, we introduce a six-level content adapta-tion hierarchy for a given webpage. The first three levels of the webpage form the text-only ladder and focus on tex-tual versions of the page, while the final three levels form the advanced ladder and focus on adaptation and filtering

techniques that preserve the page’s basic structure.

Text-only ladder. The three levels of the text-only ladder of a webpage are

• the snippet page,• the text-only version, and• the page summarization (level 0).

In the snippet version of a page, we extract the most important textual snippet of the requested webpage. In the preprocessing step, we extract the main body of the webpage by remov-ing redundant data and advertisements, and we split the block-level HTML components into a sequence of para-graphs. To extract the main text snip-pet, we extend the basic mechanism proposed by Baoyao Zhou and his col-leagues,8 which uses a simple averaging mechanism to smooth the word count graph at a paragraph level. We extend this scheme to include the influence of a desired number of neighboring para-graphs. We normalize the word count of every paragraph by convolving the

Figure 1. Webpage anatomy of Alexa’s top 100 pages broken down by bytes (www.alexa.com). The average webpage is 551 Kbytes, with 200.41 Kbytes occupied by JavaScript files.

10%

10%

23%

13%

31% 11%

3%

HTML/txt

Stylesheet file

lframe

CSS image

Image

Flash object

JavaScript file

Favicon

PC-11-03-Sub.indd 37 6/4/12 3:42 PM

Page 5: Cost-Aware Mobile Web Browsing

38 PERVASIVE computing www.computer.org/pervasive

Technologies for DevelopmenT

word-count function with a fixed-width Gaussian function—that is, if { }ni i

N=1 is the normalized word count as

a function of the paragraph number, we assume

ˆ ,nn e

e

ik pp

i k

k

p

k pp

k

pN

= =− −

=−

2

2

2

2

2

2

where p is the number of samples to be considered before and after the current sample. A salient feature of the Gauss-ian smoothing function is that the cur-rent sample is given the highest weight, and the weight of the neighboring sam-ples decays smoothly with distance. This feature helps increase the prob-ability of extracting the main text from the page.

In the text-only version, we strip all unessential HTML tags and present a condensed HTML (with a plain CSS) containing the page’s textual content. We repeatedly use the procedure of main-text snippet extraction to ex-tract the next level of main snippets. We then take the top five results of the main text snippet extraction and form a text-only version of the requested webpage.

For page summarization (level 0), we present a summarization of the web-page, where we show the important headings with just a little information about them. This method filters out everything but the headings and the paragraphs next to each heading. This technique works well for news web-pages or other pages with textual data.

Advanced ladder. The advanced ladder supports the following three levels:

• Level 1 includes HTML, CSS, Iframe, relevant JavaScripts, and images in headings;

•Level 2 includes Level 1 plus images (compressed and downsampled); and

•Level 3 includes Level 2 plus embed-ded objects.

Level 1 extracts and presents only the critical components of the page, while Level 2 is a filtered and adapted version that removes ads and unnec-essary scripts. Level 3 presents an adapted version of the page with no filtering. While generating these dif-ferent page versions, we performed a few critical optimizations to enhance load times and reduce the webpage size.

First, we rewrote the HTML to remove all unnecessary textual in-formation to reduce the page size. In the case of JavaScript, this improves response time performance, because it reduces the downloaded file’s size. This contributed to a 5 to 10 Kbyte reduc-tion in webpage sizes.

Second, we optimized the rendering time. Moving the CSS to the HTML header allows progressive rendering. Similarly, moving the JavaScripts to the end of the HTML code enhances the user experience by downloading the lightweight components first.

Third, we filtered out the ads and fa-vicons. JavaScripts that contain partic-ular keywords are potential signatures of advertisements. To separate scripts that are related to a page from those that aren’t, we constructed a large list of keywords used to identify ad-related scripts. Similarly, we distinguished images related to the webpage from those that weren’t based on the image name and location and the location of the image pointer in the HTML. We also removed favicons across all adaptation levels, which saved 2 to 10 Kbytes.

Finally, we filtered out images and embedded objects. We used the standard approaches of downsam-pling images and removing embed-ded objects at the appropriate levels of the adaptation ladder, which can include multimedia content such as sounds, videos, flash, and so on. The typical size of a flash object in web-pages is 85.4 Kbytes and contributes to 13 percent of the average webpage size.

User feedbackEvery Web request response is associ-ated with “cached” hyperlinks at the top of the page that let the user request enhanced versions of a page present in the proxy cache. This is impor-tant when the user crosses the target quota and is automatically assigned a text-version of a page with a quota of 10 Kbytes (our default value). The user is presented four pointers: to the origi-nal page, a Level 2 page, a Level 1 page, and the text-only version.

One challenge of implementing the multilevel adaptation approach is that the proxy must first download the en-tire webpage before performing the ad-aptation. This is especially a concern for proxy-side rendering systems such as Opera Mini. To maximize parallel-ism and minimize rendering time, we perform on-the-fly adaptation at the proxy, which issues requests in batches. Given a quota, if the adapted version of a webpage containing all objects downloaded so far achieves the quota, the proxy stops downloading more ob-jects in the page. The batches of object downloads directly correspond with the three levels in our advanced ladder. In the first batch, we download objects corresponding to Level 1; Level 2 in the second batch; and Level 3 in the third batch.

implementationWe implemented a prototype system in 2,000 lines of Java code. The system ar-chitecture comprises a proxy and local storage to maintain user information.

proxyTo service page requests, the proxy has three main responsibilities. First, it listens for incoming connections on port 7500 and spawns a new thread to process the request. In addition to parsing the HTTP request header, the thread reads the user’s cost parameters from the disk and stores the user state in memory.

Second, the thread receives web-pages and dynamically computes the

PC-11-03-Sub.indd 38 6/4/12 3:42 PM

Page 6: Cost-Aware Mobile Web Browsing

JULY–SEPTEMBER 2012 PERVASIVE computing 39

cost quota. The thread uses the web-page structure to determine the types and number of Web resources in the page. Using the cost quota, the web-page’s content is adapted and filtered using the techniques we described.

Finally, the thread sends the webpage to the user and updates the user’s cost information.

local storageThe system’s second module main-tains the persistent state of each user. The storage used is a lightweight data-base that comprises XML files that store the registered users and their cor-responding download budget states. The parameters that the proxy saves are mainly the user ID, the user’s data plan, and the cost quota values. For performance, the threads servicing the requests update a user’s record after forwarding the webpages to the client.

evaluationWe evaluated our system using Alexa’s list of the most popular webpages. Our evaluation shows three key results.

Adaptation ladderTo study system behavior for the dif-ferent adaptations levels (0, 1, and 2), we manually tested the system across a wide range of sites. The system success-fully extracted a page’s critical compo-nents even when the quota was small.

Figure 2 shows an original and adapted (Level 2) webpage. (We re-placed ads and images in the original screenshots with labels for copyright reasons.) The Level 2 webpage is simi-lar to the original both in terms of for-mat and content, but the page size is approximately half the size.

Figure 3 illustrates how a webpage looks visually after being adapted into a

Figure 2. The (a) original page and (b) Level 2 adaptation. Visually, the images are downsampled, and the page retains most of its original formatting. (Figure courtesy of the BBC.)

(a)

(b)

Figure 3. The (a) original page and (b) Level 1 adaptation. Visually, the images are completely removed, but the page retains some of its original structure. (Figure courtesy of The New York Times.)

(a) (b)

PC-11-03-Sub.indd 39 6/4/12 3:42 PM

Page 7: Cost-Aware Mobile Web Browsing

40 PERVASIVE computing www.computer.org/pervasive

Technologies for DevelopmenT

Level 1 format. This adapted page still conveys the same information content but is more than three times smaller than the original.

Figure 4 shows the Level 0 adapta-tion of a webpage. Virtually all of the formatting information is removed; only the bare text and hyperlinks remain.

These examples show how our sys-tem can extract important information while filtering out irrelevant informa-tion at the different adaptation levels. The adaptations rendered the page in a presentable and readable style across all websites tested. Our system works especially well for news webpages, be-cause text summaries and related im-ages can be easily extracted. For sites such as Flickr (www.flickr.com), whose content is primarily images, the per-byte reduction factors after adaptation aren’t as significant. We hope to opti-mize our algorithms for multimedia-centric websites in future work.

size reductionTo measure the size reduction, we ran our system across the top sites from Al-exa for different levels of adaptation. Table 3 shows the cumulative distribu-tion of the reduction factor page size when using our system for different adap-tation levels at 50, 75, 90, and 95 percent of the cumulative distribution function (CDF). In the median case, Level 1 and 2 adaptations provide a size reduction by a factor of 3 and 2, respectively, while snip-pet adaptation can provide a reduction by a factor of 110. The reduction factor of page size increases up to 20, 60, and 350 times for adaptation levels (1, 2, and snippet) at 95 percent of the CDF.

cost savingsTo measure our system’s cost savings, we assume an average user makes N Web requests per day. For simplicity, let’s assume that the user picks one of the Alexa sites for the analysis. We run the system for two settings: a quota of 10 Mbytes for the day and of 5 Mbytes for the day.

Figure 4. The (a) original page and (b) Level 0 adaptation. Visually, the page is completely stripped of formatting, structure, and multimedia. (Figure courtesy of The Los Angeles Times.)

(a)

(b)

TABLE 3 Cumulative distribution of the size-reduction factor from the original page.

Type 50% 75% 90% 95%

Snippet 110 200 300 350

Level 1 3 7 15 20

Level 2 2 5 10 16

PC-11-03-Sub.indd 40 6/4/12 3:42 PM

Page 8: Cost-Aware Mobile Web Browsing

JULY–SEPTEMBER 2012 PERVASIVE computing 41

For a user who visits 100 webpages per day, the conventional Web brows-ing system yields roughly 42.4 Mbytes of download data. In contrast, our sys-tem significantly reduced the amount of data downloaded. For large values of N (> 100), our system slightly over-shot the specified quota, because the per-page quota was too small to ac-commodate even the lowest level of adaptation.

Figure 5 shows the distribution of webpage requests across various levels of adaptation for the two different quota constraints. For example, in Figure 5a, 54 percent of the pages forwarded to the client with a 10 Mbyte budget are Level 3 adapted versions. When the user has only 5 Mbytes left, 53 percent of the pages returned are Level 1 adapted versions.

small-scale User studyWe evaluated the merits of our proxy with a small-scale user study compris-ing seven participants with no prior ex-posure to our system. Participants were graduate students in the US. We pre-sented a hypothetical situation in which they would soon run out of monthly data. We showed them the roaming charges of the major service providers and the typical costs they’d incur for different levels of mobile usage under roaming. Each participant was allot-ted approximately 15 to 30 minutes of browsing on an iPhone rather than a cheaper device to minimize contami-nation of our results by dissatisfaction due to device limitations. Quota num-bers weren’t used in this study, and we manually set the levels of adaptation so that participants could try all levels of adaptation.

At the end of the study, in a semi-structured interview, we asked each participant to rate their overall satisfac-tion with the experience given the sce-nario (on a scale from 1 to 5). We also asked whether they’d use such a system and if they had any other feedback and comments. Table 4 summarizes our findings.

The overall response was positive. Four participants readily expressed an interest in using such a system. Among the participants who responded with only moderate satisfaction, any dis-satisfaction was primarily due to the failure of our participants to realize that when left with very small quotas, they couldn’t load all of the webpage data they wanted. As a result, one par-ticipant was surprised by the snippet versions of pages, and the other par-ticipant thought the system was too aggressive. Users didn’t observe any noticeable delay in browsing due to our intermediating proxy.

W ith the increasing move by mobile operators to-ward adopting usage-based pricing due to

capacity constraints, our framework presents a wide range of research chal-lenges. Here, we’ve presented a sim-plistic model of usage-based pricing. The problem becomes more interesting when operators adopt dynamic pricing models based on congestion or usage levels or when the operator can offer

different adapted versions of a web-page at different prices. The usage-based pricing structure can also force webpage designers to rethink how they design such pages for mobile environ-ments with highly constrained data plan requirements.

From the content provider’s perspec-tive, this framework imposes a new challenge of automatically designing content for different bandwidth envi-ronments. The increasing complexity of webpages also imposes several new challenges to the adaptation layer regarding the structure of pages, im-portance levels of different content, and

Figure 5. Distribution of 100 webpage requests, where the remaining quota was (a) 10 Mbytes and (b) 5 Mbytes.

54%

32%8%

26%

53%

(a) (b)

4%7%

3%

3%10%

Level 3

Level 2

Level 1

Level 0

Main snippet

TABLE 4 User study results.

UserOverall

satisfaction (0–5)Use the proxy

1 4 Yes

2 4 Yes

3 4 Yes

4 3.5 Yes

5 3.5 Maybe

6 3 Maybe

7 3 No

PC-11-03-Sub.indd 41 6/4/12 3:42 PM

Page 9: Cost-Aware Mobile Web Browsing

42 PERVASIVE computing www.computer.org/pervasive

Technologies for DevelopmenT

the automatic adaptation of a webpage across different templates. From a mid-dlebox perspective, building a single usable solution that understands differ-ent webpage formats and works across the entire gamut of webpages is a chal-lenging engineering problem.

REFEREnCES 1. J. Chen, “Rearchitecting Web and

Mobile Information Access for Emerg-ing Regions,” doctoral dissertation, New York University, July 2011.

2. J. Domènech et al., “A User-Focused Eval-uation of Web Prefetching Algorithms,” Computer Comm. Rev., vol. 30, no. 10, 2007; http://dx.doi.org/10.1016/j.comcom. 2007.05.003.

3. J. Charzinski, “Traffic Properties, Client Side Cachability and CDN Usage of Popular Web Sites,” Measurement, Modelling, and Evaluation of Computing Systems

and Dependability in Fault Tolerance, LNCS 5987, Springer, 2010, pp. 136–150.

4. S. Ramachandran “Make the Web Faster—Web Metrics: Size and Number of Resources,” Google Developers, 26 May 2010; http://code.google.com/speed/articles/web-metrics.html.

5. “Web Content Adaptation,” white paper, T.F. MediaLab, 2004.

6. J. Korva et al., “Online Service Adap-tation for Mobile and Fixed Terminal Devices,” Mobile Data Management, Springer, 2001, pp. 254–259.

7. E. Lee et al., “Topic-Specific Web Content Adaptation to Mobile Devices,” Proc. 2006 IEEE/WIC/ACM Int’l Conf. Web Intelligence, IEEE CS, 2006, pp. 845–848.

8. B. Zhou, Y. Xiong, and W. Liu, “Efficient Web Page Main Text Extraction Towards Online News Analysis,” Proc. IEEE Int’l Conf. E-Business Engineering (ICEBE 09), IEEE, 2009, pp. 37–41.

the AUThoRSSindhura Chava is a technology analyst at a financial firm in New York. Her interests lie in operating systems, firmware engineering, and computer archi-tecture. Chava received her MS in computer science from New York University. Contact her at [email protected].

Rachid Ennaji is a software engineer at Amazon. He’s interested in different aspects of distributed computing and systems. Ennaji received his MS in computer science from New York University. Contact him at [email protected].

Jay Chen is a visiting assistant professor at New York University, Abu Dhabi, and a researcher at the Center for Technology and Economic Development (CTED) in Abu Dhabi. His research is in the area of information and communi-cation technology for development (ICTD), and his current focus is on provid-ing affordable information access for clients with low or poor connectivity and sustainable energy. Chen received his PhD in computer science from New York University. Contact him at [email protected].

Lakshminarayanan Subramanian is an assistant professor in the Courant Institute of Mathematical Sciences at New York University. His research inter-ests include networks, distributed systems, and computing for development. He co-leads the Networks and Wide-Area Systems (NeWS) group, which investigates software solutions for distributed systems, wireline and wireless networking, operating systems, security and privacy, and other technologies and applications for the developing world. He also co-leads the Cost-effective Appropriate Technologies for Emerging Regions (CATER) Lab at NYU (which fo-cuses on developing and deploying low-cost, innovative technology solutions to some of the problems in developing regions in terms of communication, healthcare, and microfinance). Contact him at [email protected].

How to Reach Us

WritersFor detailed information on submitting articles, write for our Editorial Guide-lines ([email protected]) or ac-cess www.computer.org/ pervasive/author.htm.

Letters to the EditorSend letters to Kathy Clark-Fisher, Lead Editor IEEE Pervasive Computing 10662 Los Vaqueros Circle Los Alamitos, CA 90720 [email protected]

Please provide an email address or day-time phone number with your letter.

On the WebAccess www.computer.org/ pervasive for information about IEEE Pervasive Computing.

Subscription Change of AddressSend change-of-address requests for magazine subscriptions to [email protected]. Be sure to specify IEEE Pervasive Computing.

Membership Change of AddressSend change-of-address requests for the membership directory to [email protected].

Missing or Damaged CopiesIf you are missing an issue or you received a damaged copy, contact [email protected].

Reprints of ArticlesFor price information or to order reprints, send email to pervasive@ computer.org or fax +1 714 821 4010.

Reprint PermissionTo obtain permission to reprint an article, contact William Hagen, IEEE Copyrights and Trademarks Manager, at [email protected].

MOBILE AND UBIQUITOUS SYSTEMS

PC-11-03-Sub.indd 42 6/4/12 3:42 PM