MC714: Distributed Systems
Prof. Lucas Wanner
Instituto de Computação, Unicamp
Lecture 27: Distributed Web Systems
Distributed Web-based systems
Essence: The WWW is a huge client-server system with millions of servers, each server hosting thousands of hyperlinked documents.
Documents are often represented in text (plain text, HTML, XML).
Alternative types: images, audio, video, applications (PDF, PS).
Documents may contain scripts, executed by client-side software.
[Figure: a browser running on the client machine's OS talks to a Web server on the server machine. 1. Get document request (HTTP); 2. Server fetches document from local file; 3. Response.]
Source: Maarten van Steen, Distributed Systems: Principles and Paradigms 2 / 13
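Steps 2 and 3 of the figure can be sketched as a small function. This is a minimal illustration, not a real Web server: the name `handle_get`, the document-root layout, and the bare status codes are assumptions made for the example.

```python
from pathlib import Path


def handle_get(doc_root: Path, url_path: str) -> tuple[int, bytes]:
    """Fetch the requested document from a local file and build a response."""
    root = doc_root.resolve()
    # Map the URL path onto a file below the document root (assumed layout).
    target = (root / url_path.lstrip("/")).resolve()
    # Refuse requests that try to escape the document root.
    if not target.is_relative_to(root):
        return 403, b"Forbidden"
    if not target.is_file():
        return 404, b"Not Found"
    # Step 2: read the local file. Step 3: return it as the response body.
    return 200, target.read_bytes()
```

A browser would receive the returned bytes as the body of the HTTP response and render them.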
Multi-tiered architectures
Observation: Very early on, Web sites were organized into three tiers.
[Figure: three tiers: Web server (with HTTP request handler), CGI process running a CGI program, and database server. Steps legible in the source: 1. Get request; 3. Start process to fetch document; 4. Database interaction; 5. HTML document created; 6. Return result.]
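The three tiers can be sketched in a few lines, with an in-process function standing in for the CGI program and SQLite standing in for the database server. The table name `docs`, its columns, and both function names are assumptions made for this example.

```python
import sqlite3
from html import escape


def cgi_program(db: sqlite3.Connection, doc_id: int) -> str:
    """Middle tier: query the database and create an HTML document."""
    # Database interaction (hypothetical schema: docs(id, title, body)).
    row = db.execute(
        "SELECT title, body FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        return "<html><body>Not found</body></html>"
    title, body = row
    # HTML document created from the query result.
    return (
        f"<html><head><title>{escape(title)}</title></head>"
        f"<body>{escape(body)}</body></html>"
    )


def http_request_handler(db: sqlite3.Connection, doc_id: int) -> str:
    """First tier: receive the request, delegate to the program, return the result."""
    return cgi_program(db, doc_id)
```

A real CGI setup would start a separate process per request; the direct function call here only illustrates the division of responsibilities.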
Web services
Observation: At a certain point, people started recognizing that the Web was about more than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization is then badly needed.
[Figure: Web service architecture. The server machine publishes its service description (WSDL) to a directory service (UDDI), where the client machine looks up a service. On both sides a stub is generated from the WSDL description. The client application and server application talk through their stubs and communication subsystems using SOAP.]
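The idea of generating a stub from a service description can be illustrated with a toy example. Nothing here is real WSDL or SOAP: `DESCRIPTION` is a hypothetical, heavily simplified stand-in for a service description, and messages are plain dictionaries rather than SOAP envelopes.

```python
from typing import Any, Callable

# Hypothetical stand-in for a WSDL description: operation name -> parameter names.
DESCRIPTION = {"add": ["a", "b"], "echo": ["text"]}


def generate_stub(description: dict[str, list[str]],
                  send: Callable[[dict], Any]) -> Any:
    """Build a client stub whose methods marshal calls into request messages."""

    class Stub:
        pass

    def make_method(op: str, params: list[str]) -> Callable[..., Any]:
        def method(self, *args: Any) -> Any:
            if len(args) != len(params):
                raise TypeError(f"{op} expects {len(params)} argument(s)")
            # Marshal the call; a real system would build a SOAP envelope here.
            return send({"operation": op, "arguments": dict(zip(params, args))})
        return method

    for op, params in description.items():
        setattr(Stub, op, make_method(op, params))
    return Stub()


def server_application(message: dict) -> Any:
    """Server side: unmarshal the message and invoke the requested operation."""
    operations = {"add": lambda a, b: a + b, "echo": lambda text: text}
    return operations[message["operation"]](**message["arguments"])
```

With `stub = generate_stub(DESCRIPTION, server_application)`, the call `stub.add(2, 3)` looks like a local call to the client application, while the stub hides the message exchange.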
Clients: Web browsers
Observation: Browsers form the Web's most important client-side software. They used to be simple, but that was long ago.
[Figure: logical components of a Web browser: user interface, browser engine, rendering engine, network communication, HTML/XML parser, client-side script interpreter, display back end.]
Apache Web server
Observation: More than 46% of all ∼170 million active Web sites run Apache.
[Figure: the Apache core processes a request into a response by passing it through a series of hooks. Modules provide functions; each function is linked to a hook, and the core calls the functions registered per hook.]
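The hook mechanism in the figure can be sketched as follows. This is not Apache's actual API: the class `Core`, the hook names `check_access` and `generate_response`, and both modules are hypothetical and only illustrate how modules link functions to hooks that the core then calls in a fixed order.

```python
from typing import Callable

Request = dict


class Core:
    """Toy core: an ordered set of hooks, each with a list of linked functions."""

    def __init__(self, hook_names: list[str]) -> None:
        # Dictionaries keep insertion order, so hooks run in the order given.
        self.hooks: dict[str, list[Callable[[Request], None]]] = {
            name: [] for name in hook_names
        }

    def link(self, hook: str, function: Callable[[Request], None]) -> None:
        """Link a module's function to a hook (KeyError if the hook is unknown)."""
        self.hooks[hook].append(function)

    def handle(self, request: Request) -> Request:
        """Call the functions registered per hook, hook by hook."""
        for functions in self.hooks.values():
            for function in functions:
                function(request)
        return request


def access_module(core: Core) -> None:
    def check(request: Request) -> None:
        request["allowed"] = not request["path"].startswith("/private")
    core.link("check_access", check)


def content_module(core: Core) -> None:
    def respond(request: Request) -> None:
        request["response"] = "200 OK" if request["allowed"] else "403 Forbidden"
    core.link("generate_response", respond)
```

The order in which modules are loaded does not matter in this sketch; the order of the hooks in the core decides when each function runs.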
Server clusters
Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.
[Figure: a front end connected over a LAN to several Web servers. The front end handles all incoming requests and outgoing responses.]
Server clusters
Problem: The front end can easily become overloaded, so special measures need to be taken.
Transport-layer switching: Front end simply passes the TCP request to one of the servers, taking some performance metric into account.
Content-aware distribution: Front end reads the content of the HTTP request and then selects the best server.
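The two selection strategies can be contrasted in a short sketch. The specific policies are assumptions chosen for illustration: the slide only says "some performance metric" and "the best server", so using the current connection count as the metric and a hash of the request path as the content-aware rule are examples, not the prescribed method.

```python
import zlib


def transport_layer_pick(active_connections: dict[str, int]) -> str:
    """Pick a server without looking at the request, using load as the metric."""
    return min(active_connections, key=active_connections.get)


def content_aware_pick(request_path: str, servers: list[str]) -> str:
    """Pick a server based on request content (here: a hash of the path).

    Sending the same document to the same server lets that server's
    cache stay warm for it.
    """
    return servers[zlib.crc32(request_path.encode()) % len(servers)]
```

Content-aware distribution requires the front end to terminate or at least inspect the connection before choosing, which is one reason it costs more than transport-layer switching.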
Server Clusters
[Figure: a client connects through a switch to Web servers, each with a distributor, and a dispatcher. 1. Pass setup request to a distributor; 2. Dispatcher selects server; 3. Hand off TCP connection; 4. Inform switch; 5. Forward other messages; 6. Server responses.]
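The control flow of the six steps can be sketched with plain objects. This is only a simulation of who decides what: no TCP connection is actually handed off, and the round-robin dispatcher, the class `Switch`, and all function names are assumptions made for the example.

```python
import itertools
from typing import Callable


def round_robin_dispatcher(servers: list[str]) -> Callable[[], str]:
    """Step 2: the dispatcher selects a server (round-robin is an assumed policy)."""
    cycle = itertools.cycle(servers)
    return lambda: next(cycle)


def make_distributor(dispatcher: Callable[[], str]) -> Callable[[str], str]:
    def distributor(connection_id: str) -> str:
        server = dispatcher()
        # Step 3 would hand off the TCP connection to `server` here.
        # Step 4: inform the switch by returning the chosen server.
        return server
    return distributor


class Switch:
    def __init__(self, distributor: Callable[[str], str]) -> None:
        self.distributor = distributor
        self.connection_table: dict[str, str] = {}

    def receive(self, connection_id: str, is_setup: bool) -> str:
        """Return the server that should get this message."""
        if is_setup:
            # Step 1: pass the setup request to a distributor.
            self.connection_table[connection_id] = self.distributor(connection_id)
        # Step 5: other messages go straight to the recorded server.
        return self.connection_table[connection_id]
```

After setup, the switch consults only its own table, so the distributor and dispatcher are off the path for the rest of the connection.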
Web proxy caching
Basic idea (cont'd): Cooperative caching, in which a proxy first checks its neighbors on a cache miss.
[Figure: groups of clients send HTTP Get requests to their Web proxy; each Web proxy has a cache. 1. Look in local cache; 2. Ask neighboring proxy caches; 3. Forward request to Web server.]
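The three-step lookup can be sketched as below. The class `WebProxy`, the in-memory dictionaries used as caches, and the choice to copy a neighbor's entry into the local cache are assumptions made for the example.

```python
from typing import Callable


class WebProxy:
    def __init__(self, origin: Callable[[str], str]) -> None:
        self.cache: dict[str, str] = {}
        self.neighbors: list["WebProxy"] = []
        self.origin = origin  # stands in for a request to the Web server

    def get(self, url: str) -> tuple[str, str]:
        """Return (document, where it was found)."""
        # 1. Look in local cache.
        if url in self.cache:
            return self.cache[url], "local"
        # 2. Ask neighboring proxy caches.
        for neighbor in self.neighbors:
            if url in neighbor.cache:
                self.cache[url] = neighbor.cache[url]
                return self.cache[url], "neighbor"
        # 3. Forward request to Web server.
        self.cache[url] = self.origin(url)
        return self.cache[url], "origin"
```

Asking neighbors adds a round trip on every miss, so cooperative caching pays off mainly when neighbors are much closer than the origin server.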
Replication in Web hosting systems
Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization, which follows the lines of self-managing systems.
[Figure: feedback control loop for a Web hosting system. Labels: reference input, initial configuration, uncontrollable parameters (disturbance / noise), observed output, metric estimation, measured output, analysis, adjustment triggers, corrections (+/-) applied to replica placement, consistency enforcement, and request routing.]
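One pass through such a feedback loop can be sketched for a single knob. Everything concrete here is an assumption: latency as the observed output, the mean as the metric estimate, a fixed tolerance band, and the number of replicas as the only correction. The figure also covers consistency enforcement and request routing, which this sketch leaves out.

```python
def estimate_metric(samples_ms: list[float]) -> float:
    """Metric estimation: turn observed output into a measured output."""
    return sum(samples_ms) / len(samples_ms)


def analyze(measured_ms: float, reference_ms: float,
            tolerance_ms: float = 10.0) -> int:
    """Analysis: compare with the reference input and emit an adjustment trigger."""
    if measured_ms > reference_ms + tolerance_ms:
        return +1  # too slow: add a replica
    if measured_ms < reference_ms - tolerance_ms:
        return -1  # comfortably fast: remove a replica
    return 0


def control_step(replicas: int, samples_ms: list[float], reference_ms: float,
                 min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Apply the correction to replica placement, within fixed bounds."""
    trigger = analyze(estimate_metric(samples_ms), reference_ms)
    return max(min_replicas, min(max_replicas, replicas + trigger))
```

Running `control_step` repeatedly on fresh samples gives the loop; the tolerance band keeps it from oscillating on small fluctuations.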
Handling flash crowds
Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.
[Figure: four panels (a) to (d); surviving time-span labels: 2 days, 2 days, 6 days, 2.5 days.]
Server replication
Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers guarantees on high availability and performance (example: Akamai).
[Figure: CDN operation involving a client, the origin server, a CDN server with a cache, the CDN DNS server, and the regular DNS system. 1. Get base document; 2. Document with refs to embedded documents; 3, 4. DNS lookups, returning the IP address of the client-best server; 5. Get embedded documents; 6. Get embedded documents (if not already cached); 7. Embedded documents.]
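The role of the CDN DNS server, returning the address of the client-best server, can be sketched as a lookup over latency estimates. The function name, the per-region latency table, and the IP addresses (taken from the documentation-only ranges) are assumptions; how a real CDN defines "best" is not stated in the slide.

```python
def cdn_dns_lookup(client_region: str,
                   replicas: dict[str, dict[str, float]]) -> str:
    """Return the IP address of the replica with the lowest estimated latency.

    `replicas` maps a replica IP address to its estimated latency in
    milliseconds per client region (hypothetical measurements).
    """
    return min(
        replicas,
        key=lambda ip: replicas[ip].get(client_region, float("inf")),
    )


# Hypothetical latency estimates for two CDN servers.
REPLICAS = {
    "192.0.2.10": {"south-america": 20.0, "europe": 180.0},
    "198.51.100.7": {"south-america": 190.0, "europe": 15.0},
}
```

The client then fetches the embedded documents from the returned address, and the CDN server goes to the origin server only for documents it has not already cached.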