Scalable Web Server on Heterogeneous Cluster
CHEN Ge
Current Approaches
Why use Global Object Space
– Current Web server side cache approaches are generally based on a single-node design:
• Limited physical memory
• Usually only URL mapping tables are cached
• File content caching largely relies on the OS's file system caching
• Web cluster support emphasizes load distribution algorithms, not cluster-wide file content caching
Current Approaches
– Problems with current Web server caching approaches
• Using limited physical memory for file content caching results in either thrashing of cached contents, or complex cache management algorithms that bring slight performance improvement at great computation or storage overhead
• A single-node design is not scalable when applied to a cluster environment. Some current cluster-wide file content support (such as Rice's LARD policy) is not scalable
• Relying on the OS file system's file content caching is not efficient, and it lacks support for cluster-wide file content caching
Current Approaches
– Current Web server cluster approaches
• Usually emphasize load distribution, but rarely address the cluster-wide caching problem
• Usually adopt L4/L5 switches to distribute load among the cluster nodes
• Require a homogeneous hardware and software cluster environment
Current Approaches
– Problems with current Web server cluster approaches
• Load distribution based on a separate per-node policy and an L4/L5 switch cannot handle 'hot' objects well
• Almost all cluster support requires a homogeneous cluster environment, so it cannot utilize resources on different hardware and software platforms
Global Object Space
Global Object Space has two main aims:
– Utilize the large total physical memory that a cluster system can provide to cache object content
– Use global objects to provide uniform access to resources on various platforms, which is achieved by using Java
Global Object Space
Current Web server:
• Limited physical memory for file content caching
• Complex cache management
• 'Hot' object problem
• Requires L4/L5 switch
• Not scalable
• Requires homogeneous cluster
GOS with Java:
• Better load balance
• Good response time for hot objects
• Large throughput
• Good scalability
• Heterogeneous cluster support
• Uniform access to resources of different platforms
Global Object Space
Global Object Space – Physical Relationship of Components
[Figure labels: Physical memory of a node; Inter-node high-speed network; Cached object (file) content]
Global Object Space
Global Object Space – Logical Relationship of Components
[Figure labels: Jigsaw's Request Handle Object; Global Object Space Service Interface Protocol (GOSSIP)]
Global Object Space
Macro-view of request handling
– A node gets an HTML document request:
http://www.dotcom.com/doc/year2k/index.html
– The Request Handle Object calls GOS for the requested document
– GOS uses GOSSIP to make up the Reply Object, which is returned to the Request Handle Object
– The Request Handle Object replies to the client with the returned Reply Object
Global Object Space
How to GOSSIP?
– There are two tables on each node:
• Global Object Space (GOS) Table
– Holds entries for each object in the system
• Hot Object Cache (HOC) Table
– Holds entries for locally duplicated objects that are hot, meaning they are accessed very frequently
– When an object request arrives from the Request Handle Object, the URL is looked up in the HOC table. If it is in the table, the object is cached in local physical memory, and the Reply Object is formed from the cached entry.
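The HOC lookup step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class names `HocTable` and `HocLookupDemo`, the method names, and the string results are all hypothetical, and an in-memory map stands in for whatever structure the real table uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical HOC table: maps a URL to content duplicated in local memory. */
class HocTable {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    /** Caches a hot object's content under its URL. */
    void put(String url, byte[] content) {
        entries.put(url, content);
    }

    /** Returns the locally cached content, or empty if the URL is not a local hot object. */
    Optional<byte[]> lookup(String url) {
        return Optional.ofNullable(entries.get(url));
    }
}

class HocLookupDemo {
    /** On a hit, answer from local memory; on a miss, report that the GOS table is next. */
    static String handle(HocTable hoc, String url) {
        return hoc.lookup(url)
                .map(content -> "HOC hit: " + content.length + " bytes")
                .orElse("HOC miss: consult GOS table");
    }

    public static void main(String[] args) {
        HocTable hoc = new HocTable();
        String url = "http://www.dotcom.com/doc/year2k/index.html";
        hoc.put(url, "<html></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(handle(hoc, url));
        System.out.println(handle(hoc, "http://www.dotcom.com/doc/other.html"));
    }
}
```

A concurrent map is used here only because a Web server typically handles requests on several threads; the slides do not say how the table is synchronized.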
Global Object Space
How to GOSSIP
– If the requested URL has no entry in the HOC Table, it is parsed one item at a time until it reaches an item in the GOS table. For example:
http://www.dotcom.com/doc/year2k/index.html
If this document is cached in another node, the entry will have something like
GOSEntry.key="http://www.dotcom.com/doc"
GOSEntry.nodeaddr=10.8.102.2
Then the GOS Object creates a connection with the remote GOS Service Object and sends it the remaining URL, here "/year2k/index.html"
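The GOS table lookup above can be sketched as a prefix match. The sketch below assumes that "parsed one item at a time" means dropping trailing path items until a key matches; the slides do not say in which direction the URL is walked. `GosEntry` mirrors the `key` and `nodeaddr` fields from the example, while `GosTable`, `resolve`, and `remainder` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Mirrors the GOSEntry fields shown in the example (key and node address). */
class GosEntry {
    final String key;
    final String nodeAddr;

    GosEntry(String key, String nodeAddr) {
        this.key = key;
        this.nodeAddr = nodeAddr;
    }
}

/** Hypothetical GOS table keyed by URL prefix. */
class GosTable {
    private final Map<String, GosEntry> entries = new HashMap<>();

    void add(String key, String nodeAddr) {
        entries.put(key, new GosEntry(key, nodeAddr));
    }

    /** Drops trailing path items one at a time until a key matches; returns null if none does. */
    GosEntry resolve(String url) {
        String prefix = url;
        while (!prefix.isEmpty()) {
            GosEntry entry = entries.get(prefix);
            if (entry != null) {
                return entry;
            }
            int slash = prefix.lastIndexOf('/');
            if (slash < 0) {
                break;
            }
            prefix = prefix.substring(0, slash);
        }
        return null;
    }

    /** The part of the URL that is sent on to the remote GOS Service Object. */
    static String remainder(GosEntry entry, String url) {
        return url.substring(entry.key.length());
    }
}

class GosTableDemo {
    public static void main(String[] args) {
        GosTable table = new GosTable();
        table.add("http://www.dotcom.com/doc", "10.8.102.2");
        String url = "http://www.dotcom.com/doc/year2k/index.html";
        GosEntry entry = table.resolve(url);
        System.out.println(entry.nodeAddr + " <- " + GosTable.remainder(entry, url));
    }
}
```

With the entry from the example, the lookup resolves to node 10.8.102.2 and the remainder is "/year2k/index.html", matching the text above.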
Global Object Space
How to GOSSIP
– The remote GOS Service Object tries to fetch the cached object and send back its content, reading it from disk first if there is a cache miss
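The serving side can be sketched as a read-through cache. This is a single-threaded illustration under stated assumptions: `GosServiceSketch` and `fetch` are hypothetical names, and the `diskReader` function is a stand-in for a real file read so that the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical remote GOS service: serves from memory, falling back to disk on a miss. */
class GosServiceSketch {
    private final Map<String, byte[]> memoryCache = new HashMap<>();
    private final Function<String, byte[]> diskReader; // stand-in for reading the file from disk
    int diskReads = 0; // counts disk attempts, for illustration only

    GosServiceSketch(Function<String, byte[]> diskReader) {
        this.diskReader = diskReader;
    }

    /** Returns the object content, or null if the object does not exist. */
    byte[] fetch(String path) {
        byte[] cached = memoryCache.get(path);
        if (cached != null) {
            return cached; // cache hit: no disk access
        }
        diskReads++;
        byte[] content = diskReader.apply(path); // cache miss: read from disk first
        if (content != null) {
            memoryCache.put(path, content);
        }
        return content;
    }
}

class GosServiceDemo {
    public static void main(String[] args) {
        GosServiceSketch service = new GosServiceSketch(
                path -> path.equals("/year2k/index.html") ? new byte[] {42} : null);
        service.fetch("/year2k/index.html"); // miss, read from the stand-in disk
        service.fetch("/year2k/index.html"); // hit, served from memory
        System.out.println("disk reads: " + service.diskReads);
    }
}
```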
Global Object Space
How to GOSSIP: for normal objects
[Figure labels: Client; Node 1; Request: http://www.dotcom.com/doc/year2k/index.html; Node 2; /year2k/index.html; Real object content]
Global Object Space
How to GOSSIP: for hot node
[Figure labels: Client; Node 1; Request: http://www.dotcom.com/doc/year2k/index.html; Node 2; HOC Table Hit (on both nodes); http redirect; Real object content]
Global Object Space
How to GOSSIP: for hot objects on hot node
[Figure labels: Client; Node 1; Request: http://www.dotcom.com/doc/year2k/index.html; Node 2; HOC Table Hit; Real object content]
Global Object Space
How to GOSSIP
– The GOS Service Object maintains a field in the local object mapping table that records the access frequency of local objects. When it finds that an object has become hot, it sets the hot flag in the reply object, so that the remote GOS object adds an entry to its HOC table and caches the object content in its local memory
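The hot-flag mechanism can be sketched as below. The slides do not say how "hot" is decided, so the fixed access-count threshold is purely an assumption, and `GosReply`, `HotObjectTracker`, and `RequestingNode` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reply object: content plus the hot flag described above. */
class GosReply {
    final byte[] content;
    final boolean hot;

    GosReply(byte[] content, boolean hot) {
        this.content = content;
        this.hot = hot;
    }
}

/** Owner side: counts accesses per local object and flags replies once the threshold is reached. */
class HotObjectTracker {
    private final Map<String, Integer> accessCounts = new HashMap<>();
    private final int hotThreshold; // assumed policy; not specified in the slides

    HotObjectTracker(int hotThreshold) {
        this.hotThreshold = hotThreshold;
    }

    GosReply serve(String path, byte[] content) {
        int count = accessCounts.merge(path, 1, Integer::sum);
        return new GosReply(content, count >= hotThreshold);
    }
}

/** Requester side: adds an HOC entry when the reply is flagged hot. */
class RequestingNode {
    final Map<String, byte[]> hocTable = new HashMap<>();

    void onReply(String url, GosReply reply) {
        if (reply.hot) {
            hocTable.put(url, reply.content);
        }
    }
}

class HotFlagDemo {
    public static void main(String[] args) {
        HotObjectTracker owner = new HotObjectTracker(3);
        RequestingNode node = new RequestingNode();
        String url = "http://www.dotcom.com/doc/year2k/index.html";
        for (int i = 0; i < 3; i++) {
            node.onReply(url, owner.serve("/year2k/index.html", new byte[] {1}));
        }
        System.out.println("cached locally: " + node.hocTable.containsKey(url));
    }
}
```

A real implementation would more likely measure frequency over a time window than keep a lifetime count; the counter here only illustrates where the flag is set and consumed.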
Global Object Space
How to GOSSIP
– When a GOS Service Object finds that a previously hot object is no longer hot, it broadcasts this to all the nodes in the system, so that the other nodes remove the object from their HOC tables and local memory caches
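The broadcast can be sketched with in-process objects standing in for cluster nodes. Everything here is illustrative: `HocPeer`, `HotObjectOwner`, and the method names are hypothetical, and a real system would send the message over the network instead of calling a method directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical peer node holding its own HOC table. */
class HocPeer {
    final Map<String, byte[]> hocTable = new HashMap<>();

    /** Handles the "no longer hot" message by dropping the entry and its cached content. */
    void onNoLongerHot(String url) {
        hocTable.remove(url);
    }
}

/** Hypothetical owner node that notifies every known peer. */
class HotObjectOwner {
    private final List<HocPeer> peers = new ArrayList<>();

    void addPeer(HocPeer peer) {
        peers.add(peer);
    }

    /** Stand-in for the broadcast: each peer is called in turn. */
    void broadcastNoLongerHot(String url) {
        for (HocPeer peer : peers) {
            peer.onNoLongerHot(url);
        }
    }
}

class BroadcastDemo {
    public static void main(String[] args) {
        HocPeer a = new HocPeer();
        HocPeer b = new HocPeer();
        String url = "http://www.dotcom.com/doc/year2k/index.html";
        a.hocTable.put(url, new byte[] {1});
        b.hocTable.put(url, new byte[] {1});

        HotObjectOwner owner = new HotObjectOwner();
        owner.addPeer(a);
        owner.addPeer(b);
        owner.broadcastNoLongerHot(url);
        System.out.println("entries left: " + (a.hocTable.size() + b.hocTable.size()));
    }
}
```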
Global Object Space
Further thoughts about GOSSIP
– Cache distribution
• With such a mechanism, it is not necessary to cache the file content on the node where the file actually resides.
[Figure labels: Requested File; Cached File Content]
Global Object Space
How GOS Achieves the Goals
– Better Load Balance
• GOS distributes load according to the requested object, unlike the simple or complex round-robin load balancing of an L4/L5 switch: it directs each request to the node where the object is cached. Copies of "hot" objects are duplicated among the cluster nodes, so load is more evenly distributed even when there are intensive requests for a few objects in the system
• As GOS can place the cached content of an object in other nodes' physical memory, it can solve the problem of a hot server, whose files are accessed much more frequently
Global Object Space
– Good response time for hot objects
• As GOS duplicates copies of hot objects among the nodes, requests for hot objects are redirected to different nodes. The response time for hot objects is therefore shorter, because no server is as busy as it would be if all requests for the same object had to be processed by one node
Global Object Space
– Large throughput
• With GOS, requests are distributed to the individual nodes, and each node can set up connections with clients directly to serve them any object in the system, so the throughput of the whole Web server is higher than with L4/L5 switches, which can become bottlenecks. In addition, the better load balance under hot-object conditions makes its throughput larger than that of current L4/L5 switches
Global Object Space
– Good scalability
• GOS does not require each node to hold the whole URL mapping table of the system. In some current LARD systems, each node has a whole mapping table of all the files in the system, which becomes extremely large when the system scales.
• GOS does not rely on a single L4/L5 switch to redistribute requests. This eliminates the potential bottleneck when the system scales to a large number of nodes
Global Object Space
– Runs on heterogeneous clusters
• Written in Java, the Web server can run on nodes of different platforms
• With the support of GOS, requests received by any node can access the resources of all the nodes in the system, which provides uniform access to the different platforms in the system
Global Object Space
Problems to solve in building GOS
– How to efficiently redirect HTTP requests
• As we will not use L4/L5 switches, the traditional method of distributing requests among nodes is not suitable for GOS
• Using Java as the implementation language, we cannot do much at the lower levels of network communication
– GOS introduces extra overhead when fetching object contents from other nodes' physical memory, so an efficient implementation is needed
Global Object Space
Further thoughts about GOSSIP
– Dynamic content caching is a rather difficult task according to currently available references. We can consider a load-balancing scheme that distributes the running of the same dynamic generation process across several nodes. The script file itself can be cached as normal file content