p2pWeb
Slide 1 (Peer-To-Peer: Concept, Tools and Applications)
The p2pweb Project
Low-cost peer-to-peer solutions for high-availability web hosting
19 May 2005
Seminar
« Peer-To-Peer : Concept, Tools and Applications »
Ecole d’ingénieurs de Genève
Slide 2
Agenda
1. The project goals
2. Web hosting solutions and architecture
3. The p2pweb solution
4. Project constraints and key technologies
5. Related projects
6. The project components
  – Global server load balancing system
  – Distributed set of web servers
  – Monitoring system
  – Node architecture and hardware
7. Conclusion
Slide 3
The Project goals
To explore and implement low-cost solutions for high-availability web hosting
“Do More with Less”
Our targets are:
• small or medium structures (associations, NGOs, etc.)
• with limited resources (money, IT staff)
• with significant web hosting needs (available bandwidth):
  – rich and complex web sites
  – medium to high web traffic
  – high availability and visibility needs
It may fit the needs of many projects in Least Developed Countries very well: TeleCentre networks, rural organisations, universities, cultural centres, public libraries, community multimedia centres, health networks, etc.
Slide 4
Example of hosted web sites
Afromix.org (personal web site): a portal of African and Caribbean cultures since 1993
A complex web site using multiple technologies:
• in-house Perl Content Management System (CMS)
• an extended discographic database (1,600 artists, more than 50 styles from all of Africa and the French West Indies)
• multilingual (French, English, Spanish) site running on a Java application server (Tomcat)
• about 25,000 files, 400,000 pages/month, 2 million hits/month, 60,000 unique visitors/month
Mediaport.net (community web site): one of the first French web pioneers, originally developed at INA
• mostly static content (nearly 10,000 files)
• multilingual (French, English) site running on a PHP CMS (ezpublish)
• it is the main p2pweb test platform, and it will evolve into an open web hosting solution for artistic and cultural web projects (an editorial committee is being formed)
Slide 5
The web hosting market
• Free web hosting
  – Very limited:
    • static HTML or small PHP sites (limited computing resources)
    • you cannot use your own domain name
• Professional web hosting
  – A broad range of services:
    • private virtual server
    • dedicated server
    • co-location
  – But prices are quite high:
    • 100-200 €/month for one dedicated server
    • and maintenance can be complex
Slide 6
Centralized architecture
Server in one location:
the server and its Internet link are single points of failure (SPOF)
Slide 7
Centralized architecture (cont.)
[Diagram: layered datacenter architecture, from the Internet inwards: multi-homing with BGP routing, load balancers, reverse proxy / cache / SSL accelerators, web servers, load balancers, application servers, SAN storage, database cluster]
High-availability architecture
Datacenter hosting:
- BGP routing
- hardware load balancing
- SAN storage
In theory, no SPOF
• but a very complex architecture
• at a very high cost
Slide 8
CDN architecture
Content Delivery Network
Service delivered by companies like Akamai, Speedera and others.
Edge servers provide caching and data replication for fast delivery to clients worldwide.
A solution for very high traffic web sites.
A very expensive solution.
Slide 9
Alternative web hosting
• Community-based web hosting
  – Initiatives from various associations: ouvaton.coop, globenet.net, autre.net, altern.net, ...
  – Most of the time, people share their money and knowledge to buy and administer one or two dedicated servers.
• Home server
  – We now have sufficient bandwidth (ADSL), computing power (PCs) and good software (Apache, Linux, …)
  – What we lack is reliability!
Slide 10
First idea: a big home server
Slide 11
Second idea (a better one)
Lots of people (family, friends, co-workers, …) already have:
• an ADSL Internet access or another permanent high-speed connection
• one or more PCs (with a lot of unused disk space)
So, what about sharing those resources to build a more powerful and resilient network of web servers?
Slide 12
Web hosting: the p2pweb way
[Diagram: member servers connected to the Internet through ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3]
Each member of the p2pweb network shares a portion of their Internet bandwidth (most of the time an ADSL line) and hosts a small server.
The result is a powerful network whose capacity is the sum of the bandwidth and computing resources of all the members.
Slide 13
A peer-to-peer solution
• In a way, it is a return to the fundamental principles of the Internet:
  – a cooperative solution (network of servers)
  – a distributed solution (no central control)
  – a fault-tolerant solution (resilience)
• But with all the power of today's Internet and open source technologies:
  – consumer computers and Internet access
  – an overlay network and services over the Internet
  – It is a peer-to-peer solution!
Slide 14
The project constraints
• Unreliable components
  – Node failure is not an exception, it is the rule.
  – Internet link failures, power outages, server crashes…
• Automatic operation
  – Murphy's law: servers will always crash when there is nobody to fix the problem (at night, when you are on vacation…)
• Pragmatic approach
  – Build from existing components
  – Simple and efficient solutions are preferred
Slide 15
Key technologies
Mass-market products are now available at low cost!
• ADSL lines
  – 1 Mb/s up / 15 Mb/s down for 30 €/month (free.fr)
• ADSL router / firewall / Ethernet or Wi-Fi
  – D-LINK, NetGear, LINKSYS: from 75 to 150 €
• Small servers
  – PC barebones (Asus, Biostar, Shuttle…): from 300 to 500 €
  – Mac mini (Apple): 499 €
• Open source software
  – BSD, Linux, Apache, Tomcat, etc.
Slide 16
Related projects
YouServ (IBM) http://www.almaden.ibm.com/cs/people/bayardo/userv/
• YouServ is software that forms a webserving "grid" by allowing its users to pool their desktop computing resources to create one large, virtual web-space.
• An intranet project, more oriented towards desktop file sharing.
• Unfortunately not open source
Vergenet (Simon Horman) http://www.vergenet.net/
• Vergenet has servers located in Sydney, Amsterdam, London, Tokyo and Indiana. These servers are all running Linux and a variant of Super Sparrow to load balance traffic between them.
• Super Sparrow enables users to load balance traffic between geographically separated points of presence by finding the site network-wise closest to clients. This is done by accessing BGP routing information (but it requires direct access to a BGP router).
Slide 17
Related projects (cont.)
Coral (New York University) http://www.coralcdn.org/
• Coral is a peer-to-peer content distribution network, composed of a world-wide network of web proxies and name servers
• Publishing through Coral is as simple as appending a short string to the hostname of objects' URLs; a peer-to-peer DNS layer transparently redirects browsers to participating caching proxies
• a URL like www.myserver.com/some/path.html becomes www.myserver.com.nyud.net:8090/some/path.html
• Coral actually runs on top of the PlanetLab network (a grid computing research network: http://www.planet-lab.org/)
Globule (Vrije University Amsterdam) http://www.globule.org/
• Globule is a module for the Apache Web server that allows a given server to replicate its documents to other Globule servers. Clients are automatically redirected to one of the available replicas.
• The project provides both content replication and HTTP- or DNS-based redirection mechanisms
Slide 18
P2PWeb - Project components
• A global server load balancing system, with two main functions:
  – load balance the traffic across the web servers
  – provide failover: only send traffic to live web servers
• A distributed set of web servers, and a set of tools to:
  – publish content on the servers
  – keep all servers in sync (replication mechanism)
• Monitoring services
Slide 19
Global server load balancing
• Load balancing
  – achieved using round-robin DNS
  – a simple system, with well-known limits (http://www.tenereillo.com/GSLBPageOfShame.htm)
• Failover
  – achieved by coupling a monitoring system (NAGIOS) with the DNS:
    • DNS entries have a short TTL (time to live)
    • NAGIOS monitors each web server
    • when a server changes state (for example to DOWN), a special handler is called that updates the DNS entry and reloads the DNS
    • the failed server is no longer announced by the DNS
To have a fully redundant system, we use 3 independent DNS servers (all primary), each running its own NAGIOS instance.
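A minimal sketch of the failover logic, assuming the handler receives the address of the server whose check changed and its new NAGIOS service state, and keeps the server list in memory. It only renders the `www` records, commenting out failed servers with `;`; a real handler (the `http_p2pweb_handler` named in the log on the next slide, whose code is not shown) would presumably also write the zone file, increment the SOA serial and reload `named`. The function names are hypothetical.

```python
"""Hypothetical sketch of a NAGIOS-driven DNS failover handler (not the p2pweb code)."""

TTL = 300  # short TTL in seconds, so resolvers re-query soon after a change


def render_www_records(servers: dict[str, bool], ttl: int = TTL) -> str:
    """Return the zone-file lines for 'www'; dead servers are commented out with ';'."""
    lines = []
    for ip, alive in servers.items():
        record = f"www {ttl} IN A {ip}"
        lines.append(record if alive else ";" + record)
    return "\n".join(lines)


def handle_state_change(servers: dict[str, bool], ip: str, state: str) -> str:
    """Record one server's new service state and return the regenerated records.

    Only the OK state counts as alive (NAGIOS service states are OK, WARNING,
    CRITICAL and UNKNOWN).
    """
    servers[ip] = state == "OK"
    if not any(servers.values()):
        # Design choice of this sketch: never publish an empty answer. If every
        # check fails, the monitor itself is more likely cut off than all servers.
        return render_www_records(dict.fromkeys(servers, True))
    return render_www_records(servers)
```

Calling `handle_state_change` with the four servers of the next slide and `"CRITICAL"` for 195.101.152.113 yields the same records as the post-failure zone shown there. Since each of the three DNS servers runs its own NAGIOS instance, each would run such a handler independently against its own zone.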
Slide 20
GSLB: failover illustrated
Initial DNS entries (all servers are up):
www 300 IN A 82.66.103.28
www 300 IN A 195.101.152.113
www 300 IN A 82.232.203.167
www 300 IN A 66.35.250.210
Server 195.101.152.113 fails. In the syslog trace, we can see:
22:22:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;1;Connection refused by host
22:23:47 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;2;Connection refused by host
22:24:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;Connection refused by host
After 3 unsuccessful tries, a notification is sent by email to the admin:
22:24:46 nagios: SERVICE NOTIFICATION: nagios;ns1;HTTP-P2PWEB;CRITICAL;notify-by-email;Connection refused by host
The specific handler is called:
22:24:47 nagios: SERVICE EVENT HANDLER: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;http_p2pweb_handler
And the DNS is reloaded:
22:24:47 named[17379]: master/p2pweb.net.zone:1: no TTL specified; using SOA MINTTL instead
Now we can verify that the DNS entries are (the failed server's record is commented out):
www 300 IN A 82.66.103.28
;www 300 IN A 195.101.152.113
www 300 IN A 82.232.203.167
www 300 IN A 66.35.250.210
Failover time: 2 or 3 minutes (NAGIOS) + maximum DNS TTL (here 5 minutes) = less than 10 minutes
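The failover estimate can be checked against the timestamps in the log. A small worked calculation, assuming a one-minute check interval (the spacing of the SOFT alerts suggests it) and a resolver that cached the old answer just before the reload and keeps it for the full 300-second TTL; resolvers that ignore or stretch TTLs would lengthen the real figure.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"
first_soft = datetime.strptime("22:22:46", FMT)  # first CRITICAL;SOFT;1 alert in the log
dns_reload = datetime.strptime("22:24:47", FMT)  # event handler ran and named reloaded
check_interval = timedelta(minutes=1)            # assumption: interval between checks
ttl = timedelta(seconds=300)                     # TTL of the www A records

# Time from the first failed check to the DNS reload (3 failed checks + handler).
detection = dns_reload - first_soft
# Worst case: the server dies right after a successful check, and a resolver
# cached the old answer just before the reload, so it keeps it for a full TTL.
worst_case = check_interval + detection + ttl

print("detection :", detection)
print("worst case:", worst_case)
```

It prints a detection time of 0:02:01 and a worst case of 0:08:01, consistent with the "less than 10 minutes" estimate. Shortening the TTL or the check interval reduces the figure, at the cost of more DNS queries and more monitoring traffic.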
Slide 21
GSLB: next steps
Improvements:
– Better service provisioning (manual process for now)
– Better support for “long downtime”
  • When a server crashes for a long period of time and then recovers, its content may be outdated
  • We must not announce it again until it has resynchronized itself
– Proximity load balancing
  • The goal is to load balance traffic between geographically distributed servers by finding the site network-wise closest to clients.
  • A technology used in the CDN (Content Delivery Network) world
We can reuse part of the Globule project, as Globule supports DNS redirection based on an 'AS-path length' policy (as used in BGP routing), which tries to redirect clients to a server close to them.
This BGP information can be collected through routeviews.org (no direct access to a BGP router needed).
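The AS-path-length policy boils down to choosing, for each client, the replica whose network is reachable over the shortest BGP AS path. A toy sketch, assuming the routing data has already been extracted (for instance from a RouteViews table dump) into a hypothetical mapping from client prefix to the AS path towards each replica. The prefixes and AS numbers are documentation values (RFC 5737, RFC 5398), and this is not Globule's implementation.

```python
import ipaddress

# Hypothetical input: for each client prefix, the AS path towards each replica.
AS_PATHS = {
    ipaddress.ip_network("198.51.100.0/24"): {
        "replica-a": [64496, 64500, 64510],
        "replica-b": [64496, 64511],
    },
    ipaddress.ip_network("203.0.113.0/24"): {
        "replica-a": [64497, 64510],
        "replica-b": [64497, 64501, 64502, 64511],
    },
}


def closest_replica(client_ip: str, default: str = "replica-a") -> str:
    """Pick the replica with the shortest AS path for the client's longest matching prefix."""
    addr = ipaddress.ip_address(client_ip)
    candidates = [net for net in AS_PATHS if addr in net]
    if not candidates:
        return default  # unknown client: fall back to a default (or plain round robin)
    best_net = max(candidates, key=lambda net: net.prefixlen)  # longest-prefix match
    paths = AS_PATHS[best_net]
    return min(paths, key=lambda replica: len(paths[replica]))
```

In a DNS-based setup, the name server would call such a function with the address of the querying resolver (not the end client), which is one of the known limits of DNS-based proximity.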
Slide 22
Web server content management
We have a set of web servers and we need tools to:
– publish content on all servers
– keep them in sync (content replication)
Two main replication strategies:
• primary backup: one master server from which the replicas are copied
• active replication: when there are changes, one replica propagates them to all the other ones
[Diagram: web servers connected through ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3]
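The difference between the two strategies lies in which server may accept an update. A toy in-memory sketch (the `Replica` class and both function names are hypothetical): with primary backup, only the master accepts updates and copies them to the replicas; with active replication, any replica may accept an update and forward it to all the others. Real systems must also handle lost messages and conflicting concurrent updates, which this sketch ignores.

```python
class Replica:
    """A web server holding a copy of the site as {path: content}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: dict[str, str] = {}


def primary_backup_publish(master: Replica, replicas: list[Replica], path: str, content: str) -> None:
    """Primary backup: the update is applied on the master, which copies it to every replica."""
    master.files[path] = content
    for replica in replicas:
        replica.files[path] = master.files[path]


def active_replication_publish(origin: Replica, group: list[Replica], path: str, content: str) -> None:
    """Active replication: any member accepts the update and propagates it to all the others."""
    origin.files[path] = content
    for peer in group:
        if peer is not origin:
            peer.files[path] = content
```

Primary backup keeps a single source of truth and is what the static-content setup on the next slides uses; active replication removes the master as a bottleneck but requires conflict handling.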
Slide 23
Static content replication
One server plays the master’s role:
– Content is published first on the master (for example via FTP)
– Then the content is either pushed to or pulled by the replicas
The easiest way is to use rsync (rsync.samba.org):
– content can be pulled via anonymous rsync from the master
– content can be pushed via rsync over ssh (using a private/public key pair for security)
[Diagram: one master and three replicas, connected through ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3]
Slide 24
Content replication: rsync
rsync is a file transfer program for Unix systems. rsync provides a very fast method for bringing remote files into sync. It does this by sending just the differences in the files across the link, without requiring that both sets of files are present at one of the ends of the link beforehand.
Anonymous rsync server (pull mode)
• Runs as a standalone daemon or can be launched by inetd
• Advanced security options (read-only, chroot, IP access list)
• Use: run from crontab on each mirror
rsync -a master.mydomain.com::www/ /data/www/
Rsync over SSH (push mode)
• Needs ssh access on each mirror
• And an ssh cryptographic key exchange for unattended operation
• Use: run on demand or from crontab on the master
rsync -a -e ssh /data/www/ user@mirror.mydomain.com:/data/www/
(a single colon selects the remote-shell transport; a double colon would address an rsync daemon module instead)
Useful options
--compress      compress file data during the transfer
--bwlimit=KBPS  limit I/O bandwidth; KBytes per second
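"Sending just the differences" relies on rsync's rolling checksum: the receiver splits its old copy into fixed-size blocks and sends their checksums; the sender slides a window over the new file one byte at a time, updating a cheap weak checksum in constant time, and only transfers the data that matches no block. A simplified sketch of that weak checksum and the block search, following the published rsync algorithm (Tridgell and Mackerras); real rsync confirms each candidate with a strong hash and jumps a whole block after a match, which the direct byte comparison and exhaustive scan stand in for here.

```python
"""Simplified sketch of rsync's rolling weak checksum and block matching."""

M = 1 << 16  # both checksum components are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) weak checksum of a block from scratch, in O(n)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right, in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matching_offsets(old: bytes, new: bytes, block_size: int) -> dict[int, int]:
    """Map each offset of `new` whose window equals a block of `old` to that block's index."""
    # Receiver side: checksums of the non-overlapping blocks of the old copy.
    old_blocks: dict[tuple[int, int], list[int]] = {}
    for start in range(0, len(old) - block_size + 1, block_size):
        key = weak_checksum(old[start:start + block_size])
        old_blocks.setdefault(key, []).append(start // block_size)

    # Sender side: slide a window over the new copy, one byte at a time.
    matches: dict[int, int] = {}
    if len(new) < block_size:
        return matches
    a, b = weak_checksum(new[:block_size])
    for off in range(len(new) - block_size + 1):
        if off > 0:
            a, b = roll(a, b, new[off - 1], new[off + block_size - 1], block_size)
        window = new[off:off + block_size]
        for idx in old_blocks.get((a, b), []):
            # Weak checksums can collide; real rsync confirms with a strong hash.
            if old[idx * block_size:(idx + 1) * block_size] == window:
                matches[off] = idx
                break
    return matches
```

For example, `find_matching_offsets(b"AAAABBBBCCCC", b"xAAAABBBBCCCC", 4)` returns `{1: 0, 5: 1, 9: 2}`: despite the inserted byte, all three old blocks are found, so only the byte `x` would have to travel over the link.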
Slide 25
Content distribution: satellite
For a large number of geographically distributed mirrors, datacasting over satellite can be an interesting solution.
• Technology used by some CDN vendors
  – Skycache, cidera, Skystream.com, panamsat.com
• Now available at lower cost from worldspace.fr (SatPost solution)
Slide 26
Use of CMS
Nowadays most webmasters use CMS (Content Management System) tools for publishing
– A lot of open source and commercial tools:
  • Spip, mambo, typo3, phpnuke, … (PHP)
  • Bricolage, metadot, slashcode, … (Perl)
  • Cofax, opencms, magnolia, jahia, … (Java)
  • Plone, cps, zwook, … (Python)
• But none of them has direct support for a distributed architecture
• Most use a database as a backing store
• Distributed database transactions and replication are a hard problem
Slide 27
CMS: a pragmatic solution
The webmaster publishes using the CMS as usual:
– the content is exported as static HTML files
– then distributed to the replicas using rsync
Constraint: the CMS must support export with “static-like URLs”, either directly or through URL rewriting:
/article/sport/2005/4/13/football.html (good)
/article.php?id_category=3&id_article=25 (bad for mirroring)
[Diagram: the webmaster works in the CMS back office; its HTML export goes to the master (static HTML files), which is mirrored to three replicas over ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3]
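The constraint can be made concrete: a dynamic URL carries its identity in the query string, which a plain tree of exported files cannot serve, whereas a "static-like" URL maps one-to-one to a file that rsync can copy. A small sketch, assuming hypothetical article metadata (category, publication date, slug); the path scheme simply mirrors the "good" example, and `/data/www` is the export root used in the rsync commands.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def is_mirror_friendly(url: str) -> bool:
    """A URL can be served from exported static files only if it has no query string."""
    return urlsplit(url).query == ""


def static_path(category: str, published: date, slug: str) -> str:
    """Build a static-like URL such as /article/sport/2005/4/13/football.html."""
    return f"/article/{category}/{published.year}/{published.month}/{published.day}/{slug}.html"


def export_file(root: str, url: str) -> PurePosixPath:
    """File the CMS export should write, so the web server maps url -> file directly."""
    if not is_mirror_friendly(url):
        raise ValueError(f"not exportable as a static file: {url}")
    return PurePosixPath(root) / url.lstrip("/")
```

With this scheme, `export_file("/data/www", static_path("sport", date(2005, 4, 13), "football"))` gives `/data/www/article/sport/2005/4/13/football.html`, ready to be pushed or pulled with rsync.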
Slide 28
CMS: distributed architecture (1)
Example: a non-governmental organization active in 4 countries wants to provide a global web presence. The same global web design and tools are used on all servers.
Local publishing: each local webmaster publishes news about his or her country using the CMS on the local server.
Content exchange using web services: each local web server “collects” (pulls) new articles from the other servers using RSS (Really Simple Syndication) web services.
Global web presence: the global content is (re)constructed on each server (from the data of all the others) and served on the Internet.
Such a solution may be built by hacking/customizing existing CMSs.
[Diagram: servers in Ivory Coast, Senegal, Burkina Faso and Mali, connected through ADSL ISP 1, ADSL ISP 2 and ADSL ISP 3, exchanging XML content]
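The pull step needs little more than an RSS 2.0 parser: each server reads the other servers' feeds and adds the items it has not seen yet, keyed by `guid`. A sketch under stated assumptions: the feed contents, hostnames and function names below are hypothetical, and fetching over HTTP is left out so that only the parsing and merge are shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds published by two of the country servers.
SENEGAL_FEED = """<rss version="2.0"><channel><title>Senegal</title>
<item><guid>sn-1</guid><title>Workshop in Dakar</title><link>http://sn.example.org/a/1.html</link></item>
<item><guid>sn-2</guid><title>New telecentre</title><link>http://sn.example.org/a/2.html</link></item>
</channel></rss>"""

MALI_FEED = """<rss version="2.0"><channel><title>Mali</title>
<item><guid>ml-1</guid><title>Festival in Bamako</title><link>http://ml.example.org/a/1.html</link></item>
</channel></rss>"""


def parse_items(feed_xml: str) -> list[dict[str, str]]:
    """Extract guid, title and link from every <item> of an RSS 2.0 document."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        {field: item.findtext(field, default="") for field in ("guid", "title", "link")}
        for item in channel.iter("item")
    ]


def merge_feeds(local: dict[str, dict[str, str]], feeds: list[str]) -> int:
    """Add unseen items (by guid) to the local article store; return how many were new."""
    added = 0
    for feed_xml in feeds:
        for item in parse_items(feed_xml):
            if item["guid"] not in local:
                local[item["guid"]] = item
                added += 1
    return added
```

Keying on `guid` makes the merge idempotent: pulling the same feed twice adds nothing, so each server can poll its peers on a simple schedule.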
Slide 29
CMS : distributed architecture (2)
CMS + Message-oriented middleware (MOM)
A MOM is a client/server infrastructure that increases the interoperability, portability and flexibility of an application by allowing the application to be distributed over multiple heterogeneous platforms.
Through the use of queues, a MOM can provide reliable asynchronous data exchange.
MOM is typically asynchronous and peer-to-peer, and supports:
– point-to-point communication
– publish and subscribe communication
There is a standardized interface in Java, the JMS (Java Message Service) API, with various open source implementations in the Java world:
ActiveMQ (activemq.codehaus.org)
OpenJMS (openjms.sourceforge.net)
Joram (joram.objectweb.org)
MantaRay (mantamq.org)
No CMS uses it now (as far as I know), but it may be a very good solution.
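The two communication styles differ in who receives a message: with point-to-point, each message in a queue is consumed by exactly one receiver; with publish and subscribe, every subscriber of a topic gets its own copy. A toy in-process sketch built on Python's `queue` module, illustrating the semantics only; it is not JMS and has none of a real broker's persistence or network delivery, and `ToyBroker` is a hypothetical name.

```python
import queue
from collections import defaultdict
from typing import Optional


class ToyBroker:
    """In-memory stand-in for a message broker with queues and topics."""

    def __init__(self) -> None:
        self.queues: dict[str, queue.Queue] = defaultdict(queue.Queue)
        self.subscribers: dict[str, list[queue.Queue]] = defaultdict(list)

    # Point to point: each message is consumed by a single receiver.
    def send(self, queue_name: str, message: str) -> None:
        self.queues[queue_name].put(message)

    def receive(self, queue_name: str) -> Optional[str]:
        try:
            return self.queues[queue_name].get_nowait()
        except queue.Empty:
            return None

    # Publish and subscribe: every subscriber gets its own copy.
    def subscribe(self, topic: str) -> queue.Queue:
        inbox: queue.Queue = queue.Queue()
        self.subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic: str, message: str) -> None:
        for inbox in self.subscribers[topic]:
            inbox.put(message)
```

For the distributed CMS, a "new article" topic would fit publish and subscribe (every country server wants every article), while a per-server queue would fit point-to-point tasks such as asking one mirror to resynchronize.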
Slide 30
Performance monitoring
We collaborate with the webperf.org project:
– WebPerf is a system for measuring the response time of specified URLs from multiple locations on the Internet.
– The project is founded on the premise that many other companies also require such a monitoring service. If the other companies are willing to monitor our URLs, we will monitor theirs (a free co-peering arrangement).
Perl scripts installed on each local node collect data from the other web sites; the data are then pushed to a central repository for further analysis.
A web interface allows members to display various statistics:
a view of one’s web site as seen from all over the world.
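The core measurement is simple: time a complete HTTP fetch of each URL from every participating node, then compare the results per location. A minimal sketch, not the project's Perl collectors: the sample format and location names are hypothetical, and the fetch function is injectable so the timing logic can be exercised without network access (by default it uses `urllib.request.urlopen`).

```python
import statistics
import time
import urllib.request
from typing import Callable


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Default fetcher: download the full response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def measure(url: str, fetch: Callable[[str], bytes] = fetch_url) -> float:
    """Return the time in seconds needed to fetch url completely."""
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def summarize(samples: dict[str, list[float]]) -> dict[str, float]:
    """Median response time per monitoring location (robust to a few outliers)."""
    return {location: statistics.median(times) for location, times in samples.items() if times}
```

The median is used rather than the mean so that a single slow sample, for instance during a brief ADSL congestion on the measuring node, does not dominate a location's figure.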
Slide 31
Webperf.org: sample graph (1)
Slide 32
Webperf.org: sample graph (2)
Slide 33
Webperf.org: sample graph (3)
Slide 34
Node architecture and security
[Diagram: Internet, ADSL or cable modem, Ethernet router/firewall (optional Wi-Fi access point), private Ethernet LAN and web server, connected by Ethernet links; p2pweb traffic reaches the web server through the firewall]
Security
Mandatory:
• a hardware router/firewall with NAT capabilities
• an internal private network using RFC 1918 IP addresses (e.g. 192.168.x.y)
No incoming traffic from the outside other than what is required, controlled via redirects on the firewall:
• http (port 80)
• ssh (port 22, optional)
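The two mandatory rules can be checked mechanically. A sketch, assuming the node's port redirects are described as a hypothetical list of (external port, internal address, internal port) tuples: every internal address must fall in one of the three RFC 1918 blocks, and only HTTP (80) and, optionally, SSH (22) may be forwarded. The filtering itself is done by the firewall (pf on the example node); this only validates a description of it.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
ALLOWED_PORTS = {80: "http", 22: "ssh (optional)"}


def is_rfc1918(address: str) -> bool:
    """True if address belongs to one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918)


def check_redirects(redirects: list[tuple[int, str, int]]) -> list[str]:
    """Return the problems found in (external_port, internal_ip, internal_port) redirects."""
    problems = []
    for ext_port, internal_ip, _int_port in redirects:
        if ext_port not in ALLOWED_PORTS:
            problems.append(f"port {ext_port} should not be open to the Internet")
        if not is_rfc1918(internal_ip):
            problems.append(f"{internal_ip} is not an RFC 1918 address")
    return problems
```

The three explicit networks are used instead of `ipaddress`'s `is_private` property, which also covers loopback, link-local and other reserved ranges that are not part of RFC 1918.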
Slide 35
Node hardware (example)
Runs on the corner of a desk:
• An Ethernet and Wi-Fi switch: connects other computers (not shown here)
• A web and application server: Mac mini (Apple) running Apache 2 and Tomcat
• A firewall: embedded PC (www.pcengines.ch) running pf (packet filter) on OpenBSD, booting from a CompactFlash card
No noise, and low electric power consumption (about 50 W)
Slide 36
Conclusion
• It can be done (at low cost)
• It runs, with good results (service uptime measured by siteuptime.com):
  www.p2pweb.net, hosted by the p2pweb network
  monitored since: 9/23/2004 | outages: 40 | total uptime: 99.560% | downtime/year: 38.5 hours
  www.afromix.org, hosted on a single node
  monitored since: 9/23/2004 | outages: 37 | total uptime: 97.634% | downtime/year: 207.3 hours
• There is still a lot of room for improvement: it is not yet an easy-to-use solution, and node administration still requires good Unix knowledge
• Most important: a new way to design web applications
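The downtime figures follow directly from the uptime percentages over a 365-day year (8,760 hours), which gives a quick way to check them:

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Expected downtime per year for a given availability percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


for site, uptime in (("www.p2pweb.net", 99.560), ("www.afromix.org", 97.634)):
    print(f"{site}: {downtime_hours_per_year(uptime):.1f} hours/year")
```

This prints 38.5 hours/year for www.p2pweb.net and 207.3 hours/year for www.afromix.org, matching the reported values: the distributed setup cut yearly downtime roughly fivefold compared with the single node.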
Slide 37
The future
What we can provide right now:
• P2pweb.net: a global load balancing solution for any distributed web project
  – just provide the servers' IP addresses and a health check URL
• Mediaport.net: a community web hosting solution
  – we can host various web projects
We are looking for partnerships in the following domains:
• Packaging an easy, ready-to-use solution for deploying web mirrors (industrializing the solution)
  – a dedicated Linux or BSD distro with preinstalled packages
  – an “all in one” solution: Java CMS + MOM in one web application
• Helping to deploy such solutions in Least Developed Countries
The P2PWeb solution fits Least Developed Countries, with their weak bandwidth and low connectivity, perfectly.
Slide 38
Contacts
P2pweb is a SourceForge project (BSD license): www.p2pweb.net or mediaport.sourceforge.net
Contacts:
About the project: fgaillard@w3architect.com
If you want to be hosted on mediaport.net: fabrice.gaillard@mediaport.net, pierre.genillon@mediaport.net
Slide 39
Questions
Thank you
• Questions?