How to live with low/intermittent bandwidth/connectivity Krithi Ramamritham IIT Bombay [email protected]

Page 1: How to live with low/intermittent bandwidth/connectivity

How to live with low/intermittent bandwidth/connectivity

Krithi Ramamritham, IIT Bombay

[email protected]

Page 2: How to live with low/intermittent bandwidth/connectivity

Web Content

• Web sites have traditionally served static content
• But dynamic content generation has come into vogue
  – generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), Java Server Pages (JSP), Servlets
  – allows generation of different content for the same request

Page 3: How to live with low/intermittent bandwidth/connectivity

Dynamic Web Pages…

[Figure: a news content site. The web page is made up of an Ad Component, a Navigation Component, four Headline Components and a Personalized Component]

Page 4: How to live with low/intermittent bandwidth/connectivity

Generic Architecture

[Figure: Data sources (servers, sensors) connected through a Network to End-hosts (wired hosts, mobile hosts)]

Page 5: How to live with low/intermittent bandwidth/connectivity

Coherency of Dynamic Data

• Strong coherency
  – The client and source are always in sync with each other
  – Strong coherency is expensive!
• Relax strong coherency: Δ-coherency
  – Time domain: t-coherency
    • The client is never out of sync with the source by more than t time units
    • e.g., traffic data not stale by more than a minute
  – Value domain: v-coherency
    • The difference in the data values at the client and the source is bounded by v at all times
    • e.g., only interested in temperature changes larger than 1 degree
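The two relaxed coherency notions can be written as simple predicates. The following is a minimal sketch, not the author's implementation: the `CachedItem` type and the function names are assumptions made for illustration, and the time-domain check uses the age of the cached copy, which is a conservative (sufficient) test for t-coherency.

```python
from dataclasses import dataclass


@dataclass
class CachedItem:
    value: float        # value last copied from the source
    fetched_at: float   # time (in seconds) at which that copy was taken


def t_coherent(item: CachedItem, now: float, t: float) -> bool:
    """Time domain: the cached copy is at most t time units old."""
    return now - item.fetched_at <= t


def v_coherent(item: CachedItem, source_value: float, v: float) -> bool:
    """Value domain: cached and source values differ by at most v."""
    return abs(source_value - item.value) <= v


traffic = CachedItem(value=42.0, fetched_at=0.0)
print(t_coherent(traffic, now=45.0, t=60.0))   # copy is 45 s old, bound is 60 s
print(t_coherent(traffic, now=75.0, t=60.0))   # copy is 75 s old, bound exceeded

temperature = CachedItem(value=21.0, fetched_at=0.0)
print(v_coherent(temperature, source_value=21.6, v=1.0))  # drift of 0.6 degree
print(v_coherent(temperature, source_value=22.4, v=1.0))  # drift of 1.4 degrees
```

The value-domain check needs the current source value, so in practice it is the source (or something close to it) that can evaluate it cheaply; the time-domain check can be evaluated at the client alone.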

Page 6: How to live with low/intermittent bandwidth/connectivity

Generic Architecture

[Figure: Data sources (servers, sensors) -> Network -> Proxies/caches -> Network -> End-hosts (wired host, mobile host)]

Page 7: How to live with low/intermittent bandwidth/connectivity

The Push Approach

• Proxy registers the data item of interest and the coherency requirement with the server
• Server pushes interesting changes

+ Achieves strong consistency
+ Keeps network overhead to a minimum
-- Poor scalability (the server has to maintain state and has to keep connections open)
-- Low resiliency

[Figure: Server -> Proxy -> User, with Push on both hops]
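The registration and filtering steps of push can be sketched as follows. This is an illustrative sketch under stated assumptions: `PushServer`, `register` and `update` are hypothetical names, the coherency requirement is taken to be a value-domain bound v, and a callback stands in for the open connection to the proxy.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, float], None]


class PushServer:
    """Keeps per-proxy state, which is the scalability cost of push."""

    def __init__(self) -> None:
        # item -> list of [callback, tolerance v, last value pushed]
        self._subs: Dict[str, List[list]] = {}

    def register(self, item: str, v: float, callback: Callback) -> None:
        """A proxy registers an item of interest and its coherency requirement."""
        self._subs.setdefault(item, []).append([callback, v, None])

    def update(self, item: str, value: float) -> None:
        """Called on every change at the source; pushes only interesting changes."""
        for sub in self._subs.get(item, []):
            callback, v, last = sub
            if last is None or abs(value - last) > v:
                sub[2] = value
                callback(item, value)


received = []
server = PushServer()
server.register("temperature", 1.0, lambda item, value: received.append(value))
for reading in [20.0, 20.4, 21.2, 21.5, 19.9]:
    server.update("temperature", reading)
print(received)  # [20.0, 21.2, 19.9]
```

Only three of the five readings cross the 1-degree bound relative to the last pushed value, so only three messages are sent.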

Page 8: How to live with low/intermittent bandwidth/connectivity

The Pull Approach

• Proxy pulls after the Time to Live (TTL) / Time To next Refresh (TTR / TNR)

+ Can be implemented using the HTTP protocol
+ Stateless and hence generally scalable with respect to state space and computation
– Weak cache consistency
– Heavy polling for stringent coherence requirements or highly dynamic data
– Network overheads higher than for Push

[Figure: Server -> Proxy -> User, with Pull between server and proxy and Push between proxy and user]
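A pull-based proxy with a fixed time to next refresh might look like the sketch below. The class name `PullProxy`, the `fetch` callable (standing in for an HTTP request to the server) and the choice to poll lazily on the next request after the TTR expires are all assumptions for illustration.

```python
from typing import Callable, Dict, Tuple


class PullProxy:
    def __init__(self, fetch: Callable[[str], float], ttr: float) -> None:
        self._fetch = fetch      # stands in for an HTTP GET to the server
        self._ttr = ttr          # time to next refresh, in seconds
        self._cache: Dict[str, Tuple[float, float]] = {}  # item -> (value, fetched_at)
        self.polls = 0           # number of requests sent to the server

    def get(self, item: str, now: float) -> float:
        entry = self._cache.get(item)
        if entry is None or now - entry[1] >= self._ttr:
            self.polls += 1
            entry = (self._fetch(item), now)
            self._cache[item] = entry
        return entry[0]


source = {"price": 100.0}
proxy = PullProxy(fetch=lambda item: source[item], ttr=10.0)

print(proxy.get("price", now=0.0))    # first request: polls the server
source["price"] = 105.0               # the source changes after the first poll
print(proxy.get("price", now=5.0))    # TTR not expired: the stale value is served
print(proxy.get("price", now=12.0))   # TTR expired: polls again
print(proxy.polls)
```

The second call shows the weak consistency of pull: the server keeps no state about the proxy, so the change goes unnoticed until the next poll. Shrinking the TTR tightens coherency at the price of more polling.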

Page 9: How to live with low/intermittent bandwidth/connectivity

Typical End-to-end Web Site Architecture

[Figure: Users -> Web Server Cluster -> Application Server Cluster -> Data]

Page 10: How to live with low/intermittent bandwidth/connectivity

WS vs. AS

• Web servers
  – Do well-defined and quantifiable local work
    • e.g., processing HTTP headers, serving static content
• Application servers
  – Run multi-layer programs
    • e.g., scripts involving calls to backends

[Figure: a Web Switch in front of a Web Server Cluster, followed by an Application Server Cluster]

Page 11: How to live with low/intermittent bandwidth/connectivity

Inside the Application Layer: 3-tier model

• PRESENTATION (HTML): JSP, ASP
• BUSINESS LOGIC (Objects): Servlets, COM+, EJB
• DATA CONNECTOR (Row Set): JDBC, ODBC
• ADDT’L SERVICES: Commerce, Content Mgt., Personalization
• Back ends: Legacy Systems, Databases

Page 12: How to live with low/intermittent bandwidth/connectivity

Inside the Application Layer…

[Figure: code blocks in the PRESENTATION and BUSINESS LOGIC tiers, the DATA CONNECTOR (JDBC, ODBC), ADDT’L SERVICES (Commerce, Content Mgt., Personalization), Legacy Systems and Databases]

1. JSP invokes a Servlet
2. Servlet contacts CMS
3. CMS requests data
4. DBMS calls storage system

Page 13: How to live with low/intermittent bandwidth/connectivity

Performance and Scalability Issues

• Computationally-intensive logic executed at multiple tiers

• Cross-tier communication

• Object instantiation and cleanup processing

• External I/O calls

• Database connection pool latencies

• Content conversion and formatting

Page 14: How to live with low/intermittent bandwidth/connectivity

Optimizing the Application Layer: Traditional Means

• Optimize each tier independently:
  – Presentation-level caches built inside application server processes
  – Main memory database employed over persistent DBMS
  – Persistent object storage techniques employed inside content management systems
  … and so on

[Figure: the PRESENTATION (JSP, ASP), BUSINESS LOGIC (Servlets, COM+, EJB), DATA CONNECTOR (JDBC, ODBC) and ADDT’L SERVICES tiers, with local cache and optimization code]

Page 15: How to live with low/intermittent bandwidth/connectivity

Query result caching

• Many application server products offer this feature

-- mitigates only local database access latency
-- only a subset of query results may be reused in page generation
-- page fragments may not all be from databases
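As a rough illustration of query result caching, the sketch below memoizes result sets by query text and parameters. It is not any particular product's feature: the `headlines` table, the `cached_query` helper and the in-memory SQLite database are assumptions chosen to keep the example self-contained.

```python
import sqlite3
from typing import Dict, List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE headlines (section TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO headlines VALUES (?, ?)",
    [("sports", "Cup final tonight"), ("news", "Monsoon arrives early")],
)

_results: Dict[Tuple[str, Tuple], List[Tuple]] = {}
db_calls = 0


def cached_query(sql: str, params: Tuple = ()) -> List[Tuple]:
    """Return the cached result set for (sql, params), querying only on a miss."""
    global db_calls
    key = (sql, params)
    if key not in _results:
        db_calls += 1
        _results[key] = conn.execute(sql, params).fetchall()
    return _results[key]


sql = "SELECT title FROM headlines WHERE section = ?"
cached_query(sql, ("sports",))
cached_query(sql, ("sports",))   # served from the cache, no database call
cached_query(sql, ("news",))     # different parameters, so a new database call
print(db_calls)                  # 2
```

The saving is limited to the database round trip: the application logic and HTML formatting that turn the rows into a page fragment still run on every request, which is the first drawback listed above.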

Page 16: How to live with low/intermittent bandwidth/connectivity

Middle tier database caching

• Caching database tables in main memory
  – Oracle 9i Cache
  – Main-memory databases, e.g., TimesTen

-- mitigates only database access latency
-- caching at table granularity results in poor cache utilization
-- main-memory databases are difficult to integrate and maintain and can be expensive

Page 17: How to live with low/intermittent bandwidth/connectivity

Page Level Caching

• Dynamically generated HTML pages are cached

+ Can completely offload work from web/app server
– Low reusability for highly personalized web pages
– URL may not uniquely identify a page
  -- increasing the risk of delivering incorrect pages
– Often introduces excessive invalidations
  -- e.g., even if a single element on the page changes

Page 18: How to live with low/intermittent bandwidth/connectivity

Optimizing the Application Layer: Issues

• Traditional techniques impact specific components within the application, but not the entire application
  – No mitigation of component-to-component interaction latencies
  – Different synchronization and invalidation policies risk data integrity
  – Each optimization scheme consumes programmer time for development and maintenance

Page 19: How to live with low/intermittent bandwidth/connectivity

Key ideas

• Re-use program results to eliminate redundant work
• Facilitate single-point, architecture-wide optimization

Apply to both programmatic objects and result fragments

Page 20: How to live with low/intermittent bandwidth/connectivity

Optimizing the Application Layer

[Figure: the PRESENTATION (JSP, ASP), BUSINESS LOGIC (Servlets, COM+, EJB), DATA CONNECTOR (JDBC, ODBC) and ADDT’L SERVICES (Commerce, Content Mgt., Personalization) tiers, Legacy Systems and Databases, with a cache added]

The cache enables the results of programs to be re-used.

Page 21: How to live with low/intermittent bandwidth/connectivity

Usually….

[Figure: code blocks in the PRESENTATION and BUSINESS LOGIC tiers, the DATA CONNECTOR (JDBC, ODBC), ADDT’L SERVICES (Commerce, Content Mgt., Personalization), Legacy Systems and Databases]

1. JSP invokes a Servlet
2. Servlet contacts CMS
3. CMS requests data
4. DBMS calls storage system

Plus, at each step there are communication delays and logic processing delays

Page 22: How to live with low/intermittent bandwidth/connectivity

Novel Solution…

[Figure: Chutney tags around the code blocks in the PRESENTATION and BUSINESS LOGIC tiers call, through an Appl. Programming Interface, a real-time storage engine holding (Function, Parameter(s), Result) entries; the DATA CONNECTOR (JDBC, ODBC) sits below]

• Tags trigger calls to the storage engine.
• The engine can store any program output, but this is most commonly an HTML fragment or a Programmatic Object.
• When the Result of a Function with a specific Parameter set is already known (and up-to-date), the work normally necessary to produce that Result is bypassed.
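The (Function, Parameter(s)) to Result idea can be sketched in a few lines. This is a minimal illustration, not the Chutney API: `ResultStore`, `tagged` and `render_headlines` are hypothetical names, and a plain dictionary stands in for the real-time storage engine.

```python
from typing import Any, Callable, Dict, Tuple


class ResultStore:
    """Hypothetical stand-in for the storage engine: (function, params) -> result."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, Tuple], Any] = {}
        self.hits = 0
        self.misses = 0

    def tagged(self, name: str, params: Tuple, block: Callable[[], Any]) -> Any:
        """Run `block` only if no up-to-date result is stored for (name, params)."""
        key = (name, params)
        if key in self._results:
            self.hits += 1            # work bypassed
            return self._results[key]
        self.misses += 1
        result = block()              # run the code block
        self._results[key] = result
        return result

    def invalidate(self, name: str, params: Tuple) -> None:
        """Drop a stored result so that the next request recomputes it."""
        self._results.pop((name, params), None)


def render_headlines(section: str) -> str:
    # Stands in for application logic, database calls and HTML formatting.
    return f"<ul><li>Top story in {section}</li></ul>"


store = ResultStore()
for _ in range(3):
    html = store.tagged("headlines", ("sports",), lambda: render_headlines("sports"))
print(html)
print(store.hits, store.misses)  # 2 1
```

Because the stored value is whatever the block returns, the same mechanism covers both HTML fragments and programmatic objects.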

Page 23: How to live with low/intermittent bandwidth/connectivity

Page generation script

[Figure: a page generation script as a sequence of code blocks, each followed by "Write to Out"]

Code Blocks Perform Work
• Application logic
• Database calls
• HTML formatting
• ...

Page 24: How to live with low/intermittent bandwidth/connectivity

Code Blocks <-> Components

[Figure: the code blocks of a page generation script, each followed by "Write to Out", set against the components of a web page (Ad Component, Navigation Component, four Headline Components, Personalized Component). Example: news content site]

Certain components can be cached

Page 25: How to live with low/intermittent bandwidth/connectivity

DCA: Our Solution

[Figure: on a request, a code block of the page generation script (application logic, database calls, HTML formatting) is enclosed between a start tag and an end tag; the Dynamic Content Accelerator supplies the code block output and the work is bypassed]

Page 26: How to live with low/intermittent bandwidth/connectivity

DCA in a Typical End-to-end Web Site Architecture

• A single instance of the DCA serves a rack of application servers
• Application servers communicate with DCA through a lightweight API

[Figure: Users, Web Server Cluster, Application Server Cluster and Data, with the Dynamic Content Accelerator added]

Page 27: How to live with low/intermittent bandwidth/connectivity

Cache Management

• A critical aspect of any caching solution
• DCA supports novel cache management strategies:
  – Prediction-based cache replacement
  – Observation-based cache invalidation

Page 28: How to live with low/intermittent bandwidth/connectivity

Cache Replacement

• Prediction-based replacement
  – fragments having lowest probability of access replaced
  – Least-Likely-to-be-Used (LLU)
  – Access probabilities based on:
    • Current user navigational patterns over site graph (in the form of clickstreams)
    • Historical user navigational patterns over site graph (in the form of association rules)

Site Graph: News -> Sports -> Hockey -> {Schedules, Scores, Players, Teams}

• (News, Sports, Hockey) -> Schedules = 20%
• (News, Sports, Hockey) -> Players = 15%
• (News, Sports, Hockey) -> Teams = 10%
• (News, Sports, Hockey) -> Scores = 55%

LLU: Teams (lowest access probability)
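Once access probabilities are available, choosing the replacement victim is a minimum search. The sketch below uses the probabilities from the slide; the function name is an assumption, and how the probabilities are derived from clickstreams and association rules is left out.

```python
from typing import Dict, Optional


def least_likely_to_be_used(cached: Dict[str, float]) -> Optional[str]:
    """Return the cached fragment with the lowest predicted access probability."""
    if not cached:
        return None
    return min(cached, key=cached.get)


# Access probabilities given the current path (News, Sports, Hockey).
probabilities = {"Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55}
victim = least_likely_to_be_used(probabilities)
print(victim)  # Teams
```

Unlike LRU, the choice depends on where users currently are in the site graph, so the same fragment can be a good victim for one navigation path and a poor one for another.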

Page 29: How to live with low/intermittent bandwidth/connectivity

Cache Invalidation

• DCA supports common cache invalidation techniques:
  – Time-based: Each cache element assigned a TTL
  – Event-based: Updates to the database send an invalidation message to the cache
  – On demand: Manual invalidation of selected elements
• DCA supports additional invalidation techniques….
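The three common techniques can be shown side by side on one small cache. This is a sketch under assumptions: `FragmentCache` and its methods are hypothetical, and the event-based path assumes each fragment records which database tables it was built from.

```python
from typing import Dict, Optional, Set, Tuple


class FragmentCache:
    def __init__(self) -> None:
        # key -> (fragment, expires_at, tables the fragment was built from)
        self._entries: Dict[str, Tuple[str, float, Set[str]]] = {}

    def put(self, key: str, fragment: str, now: float, ttl: float, tables: Set[str]) -> None:
        self._entries[key] = (fragment, now + ttl, set(tables))

    def get(self, key: str, now: float) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        if now >= entry[1]:           # time-based: the TTL has expired
            del self._entries[key]
            return None
        return entry[0]

    def on_database_update(self, table: str) -> None:
        """Event-based: drop every fragment built from the updated table."""
        for key in [k for k, e in self._entries.items() if table in e[2]]:
            del self._entries[key]

    def invalidate(self, key: str) -> None:
        """On demand: manual invalidation of one selected element."""
        self._entries.pop(key, None)


cache = FragmentCache()
cache.put("ad", "<div>ad</div>", now=0.0, ttl=30.0, tables={"ads"})
cache.put("scores", "<div>2-1</div>", now=0.0, ttl=300.0, tables={"games"})
cache.put("nav", "<nav>...</nav>", now=0.0, ttl=300.0, tables={"sections"})

print(cache.get("ad", now=31.0))       # None: TTL expired
cache.on_database_update("games")
print(cache.get("scores", now=40.0))   # None: invalidated by the update event
cache.invalidate("nav")
print(cache.get("nav", now=40.0))      # None: invalidated on demand
```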

Page 30: How to live with low/intermittent bandwidth/connectivity

Cache Invalidation…

• Other invalidation techniques supported:
  – Observation-based:
    • User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
    • Most appropriate for auction sites, online trading sites
    • Invalidation does not require communication with the databases
  – Keyword-based:
    • Elements can be associated with keywords; e.g., a retailer may wish to invalidate all “seasonal” items
  – Regular expression-based:
    • Elements can be invalidated based on regular expression matching
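A sketch of these three techniques follows. All names are hypothetical, regular expressions are assumed to be matched against cache keys (the slides do not say what they are matched against), and `place_bid` stands in for a page script that observes a user-initiated update.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


class TaggedCache:
    def __init__(self) -> None:
        # key -> (fragment, keywords attached to the fragment)
        self._entries: Dict[str, Tuple[str, Set[str]]] = {}

    def put(self, key: str, fragment: str, keywords: Iterable[str] = ()) -> None:
        self._entries[key] = (fragment, set(keywords))

    def keys(self) -> List[str]:
        return sorted(self._entries)

    def _drop(self, doomed: List[str]) -> int:
        for key in doomed:
            del self._entries[key]
        return len(doomed)

    def invalidate_keyword(self, keyword: str) -> int:
        """Keyword-based: drop every element tagged with the keyword."""
        return self._drop([k for k, (_, kws) in self._entries.items() if keyword in kws])

    def invalidate_regex(self, pattern: str) -> int:
        """Regular expression-based: drop every element whose key matches."""
        rx = re.compile(pattern)
        return self._drop([k for k in self._entries if rx.search(k)])


def place_bid(cache: TaggedCache, auction_id: int) -> None:
    """Observation-based: the script handling the user's update notifies the cache."""
    # The bid itself would be written to the database here; the cache does not
    # need to hear from the database to know that the fragment is stale.
    cache.invalidate_regex(rf"^auction/{auction_id}/")


cache = TaggedCache()
cache.put("product/101", "<li>Umbrella</li>", {"seasonal"})
cache.put("product/102", "<li>Notebook</li>")
cache.put("auction/7/summary", "<p>Highest bid: 40</p>")
cache.put("auction/8/summary", "<p>Highest bid: 12</p>")

cache.invalidate_keyword("seasonal")
place_bid(cache, auction_id=7)
print(cache.keys())  # ['auction/8/summary', 'product/102']
```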

Page 31: How to live with low/intermittent bandwidth/connectivity

Performance Study…

Test Site

– Fictitious online retail site, allows browsing of product catalog
– Pages generated using JSP scripts
– Site content stored in Oracle database
– Database schema based on Dublin Core Metadata Open Standard
– Contains 200,000 products and 44,000 categories
– Each page consists of 3 components, each involving a database call

Page 32: How to live with low/intermittent bandwidth/connectivity

Performance Study…

Test Setup

– Content Database Server: Oracle 8.1.6
– Web/Application Server: WebLogic 6.0 running on cluster of 2 machines
– Server machines:
  • have 1 GB RAM, dual P III-933 MHz processors
  • run Windows 2K Advanced Server

Page 33: How to live with low/intermittent bandwidth/connectivity

Testing Methodology...

• Baseline Parameters:
  – Cache Size, i.e., percentage of fragments that fit into cache: 75%
  – Cache replacement policy: LLU

• User load is varied by sending requests from client machines running Radview’s WebLoad

• Simulated users navigate site according to Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
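The 80-20 navigation rule can be approximated as below. This is an assumption-laden sketch rather than the generator used in the study: it implements only the stated 80-20 split (uniform choice within the hot and cold sets), not a full Zipf law, and `pick_link` is a hypothetical name.

```python
import random
from typing import List


def pick_link(links: List[str], rng: random.Random) -> str:
    """Pick a navigation link: about 80% of picks go to the first 20% of links."""
    hot_count = max(1, len(links) // 5)
    hot, cold = links[:hot_count], links[hot_count:]
    if not cold or rng.random() < 0.8:
        return rng.choice(hot)
    return rng.choice(cold)


links = [f"link{i}" for i in range(10)]   # link0 and link1 form the hot 20%
rng = random.Random(0)                    # fixed seed for a repeatable run
picks = [pick_link(links, rng) for _ in range(10_000)]
hot_share = sum(p in ("link0", "link1") for p in picks) / len(picks)
print(round(hot_share, 2))                # close to 0.8
```

A skewed workload like this is what makes a 75% cache size workable: most requests land on a small set of fragments.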

Page 34: How to live with low/intermittent bandwidth/connectivity

Performance Impact

80% faster response times through existing application infrastructure

Source: Fortune 100 client results

[Figure: Average Response Time (seconds, 0 to 60) vs. Number of Users (0 to 500), for non-Chutney and Chutney]

Page 35: How to live with low/intermittent bandwidth/connectivity

Chutney Throughput Impact

250% increase in transaction rates

Source: Fortune 100 client results

[Figure: Transactions Per Second (0 to 700) vs. Number of Users (0 to 500), for non-Chutney and Chutney]

Page 36: How to live with low/intermittent bandwidth/connectivity

Alternative: CDNs

[Figure: Sources, Repositories and Clients. Labels: Content Distribution Networks (e.g., Akamai); Push-Based Core Infrastructure]

Page 37: How to live with low/intermittent bandwidth/connectivity

Conclusion

• Increased use of dynamic page generation technologies
  => increases load on application servers
  => serious performance and scalability problems for e-business sites
• DCA (Dynamic Content Acceleration)
  => significantly reduces the load on the server-side infrastructure, allows e-business sites to scale
  => significantly outperforms existing middle tier caching solutions

Page 38: How to live with low/intermittent bandwidth/connectivity

IIT Bombay’s aAQUA Community Forum

Farmers get information and get their questions answered
-- In the local context
-- In their local language

www.aAQUA.org

Capitalizes on existing human and infrastructural resources:

Agri-extension center – KVK, Baramati

NGO – Vigyan Ashram, Pabal

Government – MCIT

Page 39: How to live with low/intermittent bandwidth/connectivity

Access over low bandwidth: Resource Optimization

Resource constraints
• Low/unpredictable bandwidth => disconnected operation/access

Exploit
• caching
• prefetching (through prediction of future needs)
• profiling by user type, location => offline aAQUA

Data characteristics
• Static data – text, images – land records, photos
  – can be cached/hoarded
• Dynamic data – weather/price information
  – cached info needs to be refreshed carefully
• Continuous media – VoIP, video data
  – QoS considerations
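Prefetching under a bandwidth budget can be sketched as a greedy selection. This is an illustrative policy only, not the one used in aAQUA: the `Item` fields, the item names and the "predicted need per KB" ranking are assumptions, and sizes are assumed to be positive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    name: str
    size_kb: int              # must be greater than zero
    need_probability: float   # predicted chance the user asks for it while offline


def plan_prefetch(items: List[Item], budget_kb: int) -> List[str]:
    """Greedy plan: fetch the items with the most predicted need per KB first."""
    chosen = []
    ranked = sorted(items, key=lambda i: i.need_probability / i.size_kb, reverse=True)
    for item in ranked:
        if item.size_kb <= budget_kb:
            chosen.append(item.name)
            budget_kb -= item.size_kb
    return chosen


items = [
    Item("crop-prices", 20, 0.9),
    Item("weather-forecast", 10, 0.8),
    Item("land-record-scan", 400, 0.3),
    Item("pest-photo", 150, 0.5),
]
print(plan_prefetch(items, budget_kb=200))
# ['weather-forecast', 'crop-prices', 'pest-photo']
```

Small, frequently needed dynamic items win the budget first; the large static scan has to wait for a better connection, after which it can be hoarded and reused without refresh.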