Caching Solutions to increase availability of Web Content Krithi Ramamritham IIT Bombay krithi@cse.iitb.ernet.in

Page 1: Caching Solutions to increase availability of Web Content Krithi Ramamritham IIT Bombay krithi@cse.iitb.ernet.in

Caching Solutions to increase availability of Web Content

Krithi Ramamritham

IIT Bombay

krithi@cse.iitb.ernet.in

Web Content

• Web sites have traditionally served static content

• But dynamic content generation has come into vogue

– generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), Java Server Pages (JSP), Servlets

– allows generation of different content for the same request

Dynamic Web Pages…

[Figure: a web page from a news content site, made up of an Ad Component, four Headline Components, a Personalized Component, and a Navigation Component]

IIT Bombay’s aAQUA Community Forum

Farmers get information and

get their questions answered

-- In the local context

-- In their local language

www.aAQUA.org

Capitalizes on existing human and infrastructural resources:

Agri-extension center – KVK, Baramati

NGO – Vigyan Ashram, Pabal

Corporate infrastructure -- ITC e-chaupal

Government – MCIT

Typical End-to-end Web Site Architecture

[Figure: end-to-end web site architecture with users, a web server cluster, an application server cluster, and data]

WS vs. AS

• Web servers

– Do well defined and quantifiable local work

– e.g., processing HTTP headers, serving static content

• Application servers

– Run multi-layer programs

– e.g., scripts involving calls to backends

Application Layer Details

[Figure: application layer details. Labels shown: web browser client and Java client; HTTP (Internet) and RMI/IIOP; web server with plug-in; application server core functionality (presentation logic, business logic, connectors); ASP, JSP, Servlets; COM+, CORBA, EJB; ADO, JDBC, ODBC; database, legacy applications, directory services (LDAP); value-added services (commerce, content management, personalization); Dynamic Content Accelerator]

The Problem: Page Generation Delays

• Causes of page generation delays include (in addition to pure processing overhead):

– Remote database accesses: Heavy I/O loads, Network delays

– XML-HTML transformations: Extensive processing delays

– Personalization logic: e.g., Broadvision, Vignette, etc.

– Interaction bottlenecks: e.g., database connection pools

=> serious performance and scalability problems for web sites due to increased load on server-side infrastructure

Reducing delays

• Approaches fall into 3 broad categories:

– Database caching

– Page level caching

– Fragment level caching

Alternative: CDNs

[Figure: Content Distribution Networks, with sources, repositories, and clients]

Push Based Core Infrastructure

• Resilient and efficient content distribution network (CDN) for dynamic data

• Existing CDNs: Akamai, Dynamai

[Figure: sources, cooperating repositories, and clients]

Generic Architecture

[Figure: data sources (servers, sensors) and end-hosts (wired hosts, mobile hosts), connected by a network]

Generic Architecture

[Figure: data sources (servers, sensors), proxies/caches, and end-hosts (wired host, mobile host), connected by networks]

The Push Approach

• Proxy registers the data item of interest and the coherency requirement with the server

• Server pushes interesting changes

+ Achieves strong consistency

+ Keeps network overhead to a minimum

-- Poor scalability (the server has to maintain state and keep connections open)

-- Low resiliency

[Figure: Server, Proxy, User; both the server-proxy link and the proxy-user link are labelled Push]
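The registration-and-push idea can be illustrated with a minimal in-process sketch. It assumes numeric data items and a coherency requirement expressed as an absolute tolerance; the class names `PushServer` and `Proxy`, the `tolerance` parameter, and the item name are all hypothetical and not taken from the slides.

```python
from collections import defaultdict


class Proxy:
    """Hypothetical proxy that simply stores whatever the server pushes."""

    def __init__(self):
        self.cache = {}
        self.pushes = 0

    def receive(self, item, value):
        self.cache[item] = value
        self.pushes += 1


class PushServer:
    """Hypothetical server that keeps per-proxy state (the scalability cost)."""

    def __init__(self):
        # item -> list of subscriptions; each remembers the last value pushed
        self.subscriptions = defaultdict(list)

    def register(self, proxy, item, tolerance):
        self.subscriptions[item].append(
            {"proxy": proxy, "tolerance": tolerance, "last": None}
        )

    def update(self, item, value):
        # Push only "interesting" changes: those that violate the tolerance.
        for sub in self.subscriptions[item]:
            if sub["last"] is None or abs(value - sub["last"]) >= sub["tolerance"]:
                sub["proxy"].receive(item, value)
                sub["last"] = value


server = PushServer()
proxy = Proxy()
server.register(proxy, "stock:XYZ", tolerance=1.0)
for price in [100.0, 100.4, 101.2, 101.5, 99.9]:
    server.update("stock:XYZ", price)
```

Only the changes that move the value at least one unit away from the last pushed value reach the proxy, which is what keeps network overhead low; the per-subscription state in `subscriptions` is what the server must maintain.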

The Pull Approach

• Proxy pulls after a Time to Live (TTL) or Time To next Refresh (TTR / TNR) interval

+ Can be implemented using the HTTP protocol

+ Stateless, and hence generally scalable with respect to state space and computation

– Weak cache consistency

– Heavy polling for stringent coherence requirements or highly dynamic data

– Network overheads higher than for Push

[Figure: Server, Proxy, User; the server-proxy link is labelled Pull and the proxy-user link is labelled Push]
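A TTL-driven pull can be sketched as follows. This is a simplified illustration that passes the clock in explicitly as `now`; `OriginServer`, `PullProxy`, and the `ttl` value are hypothetical names chosen for the example.

```python
class OriginServer:
    """Hypothetical stateless server: it knows nothing about its proxies."""

    def __init__(self):
        self.value = "v1"
        self.requests = 0

    def fetch(self):
        self.requests += 1
        return self.value


class PullProxy:
    """Hypothetical proxy that re-fetches only once the TTL has elapsed."""

    def __init__(self, server, ttl):
        self.server = server
        self.ttl = ttl
        self.cached = None
        self.fetched_at = None

    def get(self, now):
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            self.cached = self.server.fetch()
            self.fetched_at = now
        return self.cached


origin = OriginServer()
pull_proxy = PullProxy(origin, ttl=10)
first = pull_proxy.get(now=0)    # cold cache: the proxy pulls from the server
origin.value = "v2"              # the data changes at the source
stale = pull_proxy.get(now=5)    # within the TTL: the old copy is served
fresh = pull_proxy.get(now=10)   # TTL elapsed: the proxy pulls again
```

The stale read at `now=5` is the weak-consistency cost; shrinking `ttl` narrows that window at the price of more polling.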

Database Caching

Two broad types:

• Query result caching

• Middle tier database caching

– caching database tables in main memory

Query result caching

• Many application server products offer this feature

• [Luo et al., 2000] proposed query result caching at Web proxy caches

-- mitigates only local database access latency

-- only a subset of query results may be reused in page generation

-- page fragments may not all be from databases
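A minimal sketch of query result caching, keyed on the SQL text and its parameters. It uses Python's built-in `sqlite3` module purely as a stand-in database; the `QueryResultCache` class and the `products` table are assumptions made for illustration, and no invalidation is shown.

```python
import sqlite3


class QueryResultCache:
    """Hypothetical cache that stores result sets by (SQL text, parameters)."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self.results:
            self.hits += 1
            return self.results[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "stick", "hockey"), (2, "puck", "hockey"), (3, "bat", "cricket")],
)
qcache = QueryResultCache(conn)
sql = "SELECT name FROM products WHERE category = ? ORDER BY id"
rows_first = qcache.query(sql, ("hockey",))   # miss: goes to the database
rows_again = qcache.query(sql, ("hockey",))   # hit: served from the cache
```

Even on a hit, the page script still has to turn `rows_again` into HTML, which is why this approach mitigates only the database access latency.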

Middle tier database caching

• Caching database tables in main memory

Oracle 9i Cache

Main-memory databases, e.g., TimesTen

-- mitigates only database access latency

-- caching at table granularity results in poor cache utilization

-- main-memory databases are difficult to integrate and maintain and can be expensive

Page Level Caching

• Dynamically generated HTML pages are cached [Iyengar & Challenger, 1997; Zhu & Yang, 2000]

• Several commercially available products follow this approach, e.g., SpiderCache, Xcache, Dynamai

+ Can completely offload work from web/app server

– Low reusability for highly personalized web pages

– URL may not uniquely identify a page -- increasing the risk of delivering incorrect pages

– Often introduces excessive invalidations -- e.g., even if a single element on the page changes
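The risk of delivering incorrect pages can be shown with a deliberately naive page cache keyed on the URL alone. `PageCache` and `render_home` are hypothetical names; the point of the sketch is the failure mode, not a recommended design.

```python
class PageCache:
    """Hypothetical page-level cache that uses only the URL as its key."""

    def __init__(self, render):
        self.render = render
        self.pages = {}
        self.renders = 0

    def get(self, url, user):
        if url not in self.pages:
            self.renders += 1
            self.pages[url] = self.render(url, user)
        return self.pages[url]


def render_home(url, user):
    # The page embeds a personalized greeting, so the URL does not identify it.
    return f"<html><h1>News</h1><p>Welcome, {user}</p></html>"


page_cache = PageCache(render_home)
page_for_alice = page_cache.get("/home", "alice")
page_for_bob = page_cache.get("/home", "bob")   # served Alice's page by mistake
```

Adding the user to the key would fix correctness but leave one cached copy per user, which is the low-reusability problem noted for personalized pages.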

Reducing page generation delays

• Approaches fall into 3 broad categories:

– Database caching

– Page level caching

– Fragment level caching

How Dynamic Scripting Works

[Figure: a page generation script is a sequence of code blocks; each code block writes to Out, an HTML buffer, and the buffered HTML is sent to the user]
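The script-as-sequence-of-code-blocks model can be mimicked in a few lines. This is only an analogy for JSP/ASP-style scripting; the block functions and `run_page_script` are hypothetical.

```python
import io


def navigation_block(out):
    # A code block does its work, then writes its HTML to the output buffer.
    out.write("<ul><li>News</li><li>Sports</li></ul>")


def headline_block(out):
    out.write("<div class='headline'>Top story</div>")


def run_page_script(blocks):
    out = io.StringIO()          # plays the role of the HTML buffer
    for block in blocks:
        block(out)
    return out.getvalue()        # the HTML that would be sent to the user


html = run_page_script([navigation_block, headline_block])
```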

Code Blocks Perform Work

[Figure: each code block in the page generation script performs work (application logic, database calls, HTML formatting) before writing to Out]

Code Blocks <-> Components

[Figure: code blocks in the page generation script correspond to components of the web page (example: news content site): Ad Component, Headline Components, Personalized Component, Navigation Component]

Certain components can be cached

DCA: Our Solution

[Figure: in the page generation script, a code block is enclosed between a start tag and an end tag. On a request, the Dynamic Content Accelerator returns the code block output, so the work inside the block (application logic, database calls, HTML formatting) is bypassed]
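The tagging idea can be sketched as a wrapper around a code block. The slides do not show the DCA's actual API, so `FragmentCache`, `cached_block`, and the fragment identifier below are hypothetical stand-ins.

```python
class FragmentCache:
    """Hypothetical stand-in for the accelerator; not the real DCA API."""

    def __init__(self):
        self.fragments = {}

    def lookup(self, fragment_id):
        return self.fragments.get(fragment_id)

    def store(self, fragment_id, html):
        self.fragments[fragment_id] = html


work_done = []


def cached_block(cache, fragment_id, generate):
    # "Start tag": consult the cache before doing any work.
    html = cache.lookup(fragment_id)
    if html is None:
        html = generate()
        # "End tag": hand the freshly generated output to the cache.
        cache.store(fragment_id, html)
    return html


def headlines():
    # Stands for application logic, database calls, and HTML formatting.
    work_done.append("headlines")
    return "<div>Headlines</div>"


dca = FragmentCache()
first_output = cached_block(dca, "headlines", headlines)
second_output = cached_block(dca, "headlines", headlines)  # work is bypassed
```

Because the cached unit is the block's final HTML, a hit skips the formatting work as well as the database call.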

DCA in a Typical End-to-end Web Site Architecture

• A single instance of the DCA serves a rack of application servers

• Application servers communicate with DCA through a lightweight API

[Figure: end-to-end web site architecture with users, web server cluster, application server cluster, and data, with the Dynamic Content Accelerator added]

Cache Management

• A critical aspect of any caching solution

• DCA supports novel cache management strategies:

– Prediction-based cache replacement

– Observation-based cache invalidation

Cache Replacement

• Prediction-based replacement

– fragments having lowest probability of access replaced

– Least-Likely-to-be-Used (LLU)

– Access probabilities based on:

  • Current user navigational patterns over site graph (in the form of clickstreams)

  • Historical user navigational patterns over site graph (in the form of association rules)

[Figure: site graph with News leading to Sports, Sports leading to Hockey, and Hockey leading to Schedules, Scores, Players, and Teams]

Probability of the next page after the path (News, Sports, Hockey):

– Schedules = 20%

– Players = 15%

– Teams = 10% (lowest, so it is the LLU candidate)

– Scores = 55%
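Given access probabilities like those in the hockey example, choosing the LLU victim reduces to taking a minimum. A minimal sketch follows; how the probabilities are derived from clickstreams and association rules is not shown, and `choose_llu_victim` is a hypothetical name.

```python
def choose_llu_victim(cached_fragments, access_probability):
    """Return the cached fragment with the lowest predicted access probability."""
    return min(cached_fragments, key=lambda f: access_probability.get(f, 0.0))


# Probabilities for the next page after the path (News, Sports, Hockey).
probabilities = {"Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55}
cached = ["Schedules", "Players", "Teams", "Scores"]
victim = choose_llu_victim(cached, probabilities)
```

Fragments with no known probability default to 0.0 here, so they are evicted first; that default is a design choice of this sketch, not something stated in the slides.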

Cache Invalidation

• DCA supports common cache invalidation techniques:

– Time-based: Each cache element assigned a TTL

– Event-based: Updates to the database send an invalidation message to the cache

– On demand: Manual invalidation of selected elements

• DCA supports additional invalidation techniques….
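The three common techniques can be sketched in one small class. This is an illustrative model with an explicit `now` clock; `InvalidatingCache`, its method names, and the table-to-entry association are assumptions, not the DCA's interface.

```python
class InvalidatingCache:
    """Hypothetical cache supporting time-based, event-based, and manual invalidation."""

    def __init__(self):
        self.entries = {}  # key -> (value, expires_at, source_table)

    def put(self, key, value, now, ttl, table=None):
        self.entries[key] = (value, now + ttl, table)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at, _ = entry
        if now >= expires_at:          # time-based: the TTL has run out
            del self.entries[key]
            return None
        return value

    def on_table_update(self, table):  # event-based: a database update arrived
        for key in [k for k, e in self.entries.items() if e[2] == table]:
            del self.entries[key]

    def invalidate(self, key):         # on demand: manual invalidation
        self.entries.pop(key, None)


inv = InvalidatingCache()
inv.put("ad", "<div>Ad</div>", now=0, ttl=30)
inv.put("prices", "<table>...</table>", now=0, ttl=300, table="products")
inv.put("nav", "<ul>...</ul>", now=0, ttl=300)
expired = inv.get("ad", now=31)
inv.on_table_update("products")
inv.invalidate("nav")
```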

Cache Invalidation…

• Other invalidation techniques supported:

– Observation-based

  • User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache

  • Most appropriate for auction sites, online trading sites

  • Invalidation does not require communication with the databases

– Keyword-based:

  • Elements can be associated with keywords; e.g., a retailer may wish to invalidate all “seasonal” items

– Regular expression-based:

  • Elements can be invalidated based on regular expression matching
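Keyword-based and regular expression-based invalidation can be sketched as below. The slides do not say what the regular expression is matched against; this sketch assumes it is the element's key, and `TaggedCache` and the example keys are hypothetical.

```python
import re


class TaggedCache:
    """Hypothetical cache whose elements carry keywords and string keys."""

    def __init__(self):
        self.entries = {}  # key -> (value, set of keywords)

    def put(self, key, value, keywords=()):
        self.entries[key] = (value, set(keywords))

    def invalidate_keyword(self, keyword):
        doomed = [k for k, (_, kws) in self.entries.items() if keyword in kws]
        for k in doomed:
            del self.entries[k]
        return doomed

    def invalidate_matching(self, pattern):
        regex = re.compile(pattern)
        doomed = [k for k in self.entries if regex.search(k)]
        for k in doomed:
            del self.entries[k]
        return doomed


tagged = TaggedCache()
tagged.put("/catalog/sweaters", "<div>...</div>", keywords=["seasonal"])
tagged.put("/catalog/umbrellas", "<div>...</div>", keywords=["seasonal"])
tagged.put("/catalog/socks", "<div>...</div>")
tagged.put("/news/today", "<div>...</div>")
by_keyword = tagged.invalidate_keyword("seasonal")
by_regex = tagged.invalidate_matching(r"^/catalog/")
```

Observation-based invalidation would call one of these methods from the script that handles a user-initiated update (a bid, for example), so no message from the database is needed.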

Other Fragment Level Caching…

• App servers (e.g., BEA’s WebLogic, IBM’s WebSphere) cache fragments produced by JSP scripts

+ can offload presentation layer tasks

– runs in the application server process space => competes for server resources

– application server cluster => multiple cache instances, duplication of content, additional synchronization overhead

Other Fragment Level Caching….

• Weave system [VLDB 2000] caches XML fragments, as well as query results and HTML pages

– Requires use of declarative web site specification language

Performance Study

Metric:

– Average page generation time: time required to construct HTML page

Performance Study…

Test Site

– Fictitious online retail site, allows browsing of product catalog

– Pages generated using JSP scripts

– Site content stored in Oracle database

– Database schema based on Dublin Core Metadata Open Standard

– Contains 200,000 products and 44,000 categories

– Each page consists of 3 components, each involving a database call

Performance Study…

Test Setup

– Content Database Server: Oracle 8.1.6

– Web/Application Server: WebLogic 6.0 running on cluster of 2 machines

– Server machines: 1 GB RAM, dual P III-933 MHz processors, running Windows 2K Advanced Server

Testing Methodology

• DCA compared to 2 middle tier caching solutions:

Main Memory Database: TimesTen used to cache the content database (entire database is cached, runs on database server machine)

Application Server Cache: WebLogic Server JSP caching (WLS Cache)

• For both WLS and DCA, 2 (of 3) page components are cached

• Usually, DCA runs on a separate machine (512 MB RAM, P III-600 MHz processor, running Windows 2K Advanced Server)

Testing Methodology...

• Baseline Parameters:

– Cache Size, i.e., percentage of fragments that fit into cache: 75%

– Cache replacement policy: LLU for DCA

• User load is varied by sending requests from client machines running Radview’s WebLoad

• Simulated users navigate site according to Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
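The 80-20 navigation behaviour can be approximated with a two-level choice: most clicks go to a small "hot" set of links. This is a simplification for illustration rather than a true Zipf distribution, and it is not how WebLoad was configured in the study; `pick_link` and its parameters are hypothetical.

```python
import random


def pick_link(links, rng, hot_fraction=0.2, hot_weight=0.8):
    """Pick a link so that roughly hot_weight of picks land on the hot subset."""
    n_hot = max(1, int(len(links) * hot_fraction))
    # If every link is hot there is no cold set to choose from.
    if rng.random() < hot_weight or n_hot == len(links):
        return rng.choice(links[:n_hot])
    return rng.choice(links[n_hot:])


links = [f"/link{i}" for i in range(10)]
rng = random.Random(42)           # fixed seed so the run is repeatable
picks = [pick_link(links, rng) for _ in range(10_000)]
hot_share = sum(p in links[:2] for p in picks) / len(picks)
```

With ten links, the first two form the hot set, and `hot_share` comes out close to 0.8.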

Page Gen. Times vs. Number of Users

TimesTen vs. DCA -- 3x to 9x improvement

TimesTen only mitigates local database access latency -- still requires query processing, formatting operations

[Figure: page generation time (milliseconds, 0 to 3000) vs. load (number of users, 0 to 1000) for No Cache, TimesTen, WLS Cache, and DCA]

Page Generation Times...

WLS vs. DCA -- 2x to 5x improvement

• WLS runs in application server process space, competes for server resources

• WLS utilizes multiple caches, causing redundant caching

• DCA runs as single, standalone logical cache

[Figure: page generation time (milliseconds, 0 to 3000) vs. load (number of users, 0 to 1000) for No Cache, TimesTen, WLS Cache, and DCA]

Sensitivity to Cache Size

[Figure: page generation time (milliseconds, 0 to 250) vs. load (number of users, 0 to 1000) for cache sizes of 60%, 75%, and 90%]

As expected, performance improves as cache size increases

Since cached elements are typically quite small (e.g., a few hundred bytes), larger cache sizes are feasible in practice

Conclusion

• Increased use of dynamic page generation technologies

=> increases load on application servers

=> serious performance and scalability problems for e-business sites

• DCA (Dynamic Content Acceleration)

=> significantly reduces the load on the server side infrastructure, allows e-business sites to scale

=> significantly outperforms existing middle tier caching solutions