What web servers do
● Implement the HTTP protocol
● Listen for HTTP requests from browsers
● Try to fulfill them with static content from the file system
● Modern web servers also
– Forward dynamic content requests to other systems
– Do lots of useful tasks using modules
C10K problem
● Dan Kegel, 1999
● Web servers should handle ten thousand clients simultaneously
● Operating system kernel limitations
● Operating system provided functionality
● Web server design flaws
C10K problem solution – OS kernel
● The open source nature of unix kernels made it possible to quickly identify all C10K bottlenecks and fix them
● Networking-related algorithms and data structures in unix kernels were originally implemented with complexities such as O(n), O(n^2), ...; these were improved to O(1) or O(n)
● As a result, the networking capabilities of unix kernels are virtually limitless (limited only by hardware resources)
C10K - OS functionality
● Implemented new scalable I/O event notification mechanisms (epoll – Linux, kqueue – *BSD)
– Better performance than traditional poll/select
– Can receive all pending events using one system call
● AIO - The POSIX asynchronous I/O (AIO) interface allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background). The application can elect to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all.
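The event notification idea can be tried from Python's standard `selectors` module, whose `DefaultSelector` uses epoll on Linux and kqueue on *BSD. This is only a minimal sketch: the function name `collect_ready` and the use of local socket pairs instead of real client connections are assumptions made for illustration.

```python
import selectors
import socket


def collect_ready(n_pairs=3):
    """Register several sockets and fetch their pending events in bulk."""
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on *BSD
    pairs = [socket.socketpair() for _ in range(n_pairs)]
    received = {}
    try:
        for i, (reader, writer) in enumerate(pairs):
            reader.setblocking(False)
            sel.register(reader, selectors.EVENT_READ, data=i)
            writer.send(b"ping %d" % i)  # make this reader readable

        while len(received) < n_pairs:
            # One select() call returns every socket that is ready right now.
            for key, _mask in sel.select(timeout=1.0):
                received[key.data] = key.fileobj.recv(64)
        return sorted(received.items())
    finally:
        sel.close()
        for reader, writer in pairs:
            reader.close()
            writer.close()
```

A server built this way keeps one registration per connection and asks the kernel only for the connections that actually have work pending.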
C10K – web server design
● Non-blocking I/O for networking and disk
– Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion
● Many threads
– Use all available CPU cores to achieve maximum concurrency; avoid locking data structures
● Each thread serves many requests
– Don't create a thread per request; reuse threads and process other requests while a non-blocking action completes
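The "reuse threads" point can be sketched with a fixed-size pool from Python's `concurrent.futures`. The names `handle` and `serve_all` are hypothetical, and the "request" is just a number; the sketch only shows that a handful of threads ends up serving many requests, not the non-blocking I/O part.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(request_id):
    """Pretend to serve one request; report which worker thread ran it."""
    return request_id, threading.current_thread().name


def serve_all(n_requests=100, n_threads=4):
    """Serve all requests with a fixed pool and count the threads involved."""
    # The pool creates at most n_threads threads and reuses them for every request.
    with ThreadPoolExecutor(max_workers=n_threads,
                            thread_name_prefix="worker") as pool:
        results = list(pool.map(handle, range(n_requests)))
    threads_used = {name for _request_id, name in results}
    return len(results), len(threads_used)
```

Calling `serve_all(100, 4)` handles one hundred requests without ever having more than four worker threads alive.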
C10M problem – Next decade
● 10 million concurrent connections per server
● Current unix kernels can't handle that
– Application thread locks in kernel
– Hardware drivers (NIC)
– Memory management
● Solution: new generation of high load unix kernels
– 1 main application per server
– Minimize the number of system calls
– Minimize kernel work
Dynamic content
● Web servers can't create dynamic content themselves
● We need an application written in some programming language
● We need some method for the web server to communicate with the application
– CGI
– Apache modules
– FastCGI, SCGI, ...
– WSGI, PSGI, JSGI, ...
CGI - Common Gateway Interface
● Oldest method of getting dynamic content from web servers
● For each browser request the web server defines a set of environment variables derived from the request and the server configuration
● The web server starts the application in the prepared environment
● Sends POST data (if any) to the application's standard input
● Waits for the application's standard output and returns it to the browser
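The server side of these steps can be sketched in a few lines of Python. Everything named here is an assumption for illustration: `run_cgi` is a hypothetical helper, the CGI program is a tiny inline Python script, and only a few variables are set. A real server sets many more and also parses the returned headers.

```python
import os
import subprocess
import sys

# Hypothetical CGI program, kept inline so the sketch is self-contained.
CGI_PROGRAM = r'''
import os, sys
body = sys.stdin.read()
sys.stdout.write("Content-Type: text/plain\n\n")
sys.stdout.write("%s %s\n" % (os.environ["REQUEST_METHOD"], os.environ["QUERY_STRING"]))
sys.stdout.write("body=%s\n" % body)
'''


def run_cgi(method, query_string, post_data=b""):
    """Run the CGI program for one request; return (headers, body) as text."""
    env = dict(os.environ)  # start from the server's own environment
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query_string,
        "CONTENT_LENGTH": str(len(post_data)),
    })
    # A new process is started for every single request.
    proc = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        env=env, input=post_data, stdout=subprocess.PIPE, check=True,
    )
    # The program's output is a header block, an empty line, then the body.
    headers, _, body = proc.stdout.partition(b"\n\n")
    return headers.decode(), body.decode()
```

For example, `run_cgi("POST", "a=1", b"hello")` starts the script once, feeds it `hello` on standard input and splits what it printed into headers and body.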
CGI application
● Can be ANY script or binary file executable in UNIX
● No libraries required
● Use request information from environment variables
● Or ignore it completely if not needed
● Process standard input if needed
● Output additional HTTP headers and then the generated document body on standard output
CGI environment variables
● REQUEST_METHOD: name of HTTP method
● PATH_INFO: path suffix, if appended to URL after program name and a slash
● PATH_TRANSLATED: corresponding full path as supposed by server, if PATH_INFO is present
● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi
● QUERY_STRING: the part of the URL after the ? character. The query string may be composed of name=value pairs separated with ampersands (such as var1=val1&var2=val2...) when used to submit form data transferred via the GET method as defined by HTML application/x-www-form-urlencoded
● REMOTE_HOST: host name of the client, unset if the server did not perform such a lookup
● REMOTE_ADDR: IP address of the client (dot-decimal)
● Variables passed by the user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain the values of the corresponding HTTP headers
● And only a few more
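As an illustration of how an application consumes these variables, here is a small Python sketch. The function `cgi_app` and the `name` form field are hypothetical. The environment and the output stream are passed in as arguments so the logic can be tried without a web server; a real CGI script would pass `os.environ` and `sys.stdout`.

```python
from urllib.parse import parse_qs


def cgi_app(environ, stdout):
    """CGI-style program: read the variables, write headers and then the body."""
    method = environ.get("REQUEST_METHOD", "GET")
    # QUERY_STRING holds name=value pairs separated with ampersands.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    client = environ.get("REMOTE_ADDR", "unknown")

    stdout.write("Content-Type: text/plain\n\n")  # headers, then an empty line
    stdout.write("%s request from %s: hello, %s!\n" % (method, client, name))
```

Because the function only touches a mapping and a writable stream, the same code runs under a server and in a quick local check with a plain dict and `io.StringIO`.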
CGI example
#!/bin/bash
# Headers first, then an empty line, then the document body.
echo "Content-Type: text/plain"
echo ""
echo "Hello world!"
echo "Today is: $(date)"
CGI issues
● Each request forces the creation of a new process; process creation and destruction is a big overhead
● Script files must be interpreted again on each request, another big overhead
● Not scalable
● Not suitable for modern web servers
● Still widely used in embedded systems (e.g. the web management console of a Wi-Fi router), which only need to serve occasional requests
FastCGI
● Multiple processes started
● Web server communicates with them over Unix sockets or TCP
● Each process serves many requests
● Good performance
● Complete separation of web server and dynamic content system
● Great scalability – put FastCGI processes across a server farm
Other communication methods
● Integrate the dynamic content generation system with the web server process (Apache modules)
● CGI derivatives (SCGI)
● *SGI interfaces define a language-specific way for the web server to communicate with applications written in a given programming language (WSGI – Python, PSGI – Perl)
● Proxy requests to applications that implement communication via HTTP
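To make the *SGI idea concrete, here is a minimal WSGI sketch in Python. The `application(environ, start_response)` signature is the standard WSGI callable; the helper `call_app` is a hypothetical stand-in for a server, used only to show what the server does with that callable.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """WSGI entry point: the server calls this once per request."""
    body = ("You asked for %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Drive the application by hand, roughly the way a WSGI server would."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

The same `application` object can be handed to any WSGI server, for instance `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` from the standard library.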
LAMP
● Linux Apache MySQL PHP
● Most common web server stack
● Simple to install and configure
● Simple to develop web applications
● Acceptable performance and security
Apache
● One of the oldest web servers
● Still actively developed
● Most popular web server today and in recorded web server history
● Highly configurable and extensible using modules
● All in one solution
● Runs on many OS, most often on unix servers
PHP
● One of the most popular web application programming languages
● Easy to learn (which invites bad coding practices)
● Interpreted language
● Functions from unix libraries and tools
● Huge amount of ready-made applications, libraries and modules
MySQL
● Unix distributions are moving towards MariaDB (a MySQL fork) after MySQL's acquisition by Oracle
● Fast relational DB implementation
● Fairly easy to use
● Different storage engines (faster without transactions, slower with, memory based, etc.)
● Query caching
● User quotas
Historical installation
● Acquire source files for all required software (Apache MySQL PHP)
● Acquire all dependencies and install them
● Configure make files via ./configure
● Compile everything
● Configure each piece of software so that it works with the others
● Use it
Modern installation
● Use OS package manager
– root@server# apt-get install libapache2-mod-php5 apache2 php5 mysql-server
● Use it
Simple web site example
● Create database user, database, table structure and maybe some data
● Using the MySQL command prompt, accessed by
– $ mysql -u root -p
– > CREATE DATABASE `example` COLLATE 'utf8_general_ci';
– > CREATE TABLE `posts` (...)
– > CREATE USER 'example'@'localhost' IDENTIFIED BY PASSWORD '…'
– > GRANT ... ON `example`.* TO 'example'@'localhost';
– > INSERT INTO `posts` (`title`, `info`) VALUES ('a', 'a');
Simple web site example II
● Or be lazy and use some web interface like phpMyAdmin or Adminer
– Download single file adminer.php
– Drop it into /var/www
– Navigate your browser to http://localhost/adminer.php
– Do all the tasks in the browser without really knowing SQL
Simple web site example III
● Create file example.php in /var/www
● Write your HTML and PHP code inside
– Connect to database
– Select data
– Show data
● Your simple web site is ready
● Navigate your browser to http://localhost/example.php
● Enjoy the result
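The connect, select, show sequence looks much the same in any language. The sketch below uses Python with the standard-library `sqlite3` module as a stand-in for PHP and MySQL, so it is not the code of example.php. The two-column layout of `posts` and the function names are assumptions.

```python
import html
import sqlite3


def render_posts(conn):
    """Select all posts and return them as an HTML list."""
    rows = conn.execute("SELECT title, info FROM posts ORDER BY title").fetchall()
    items = "".join(
        # Escape the data so that stored text cannot inject markup.
        "<li><b>%s</b>: %s</li>" % (html.escape(title), html.escape(info))
        for title, info in rows
    )
    return "<ul>%s</ul>" % items


def demo():
    """Build a throwaway database, then connect, select and show."""
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL `example` database
    conn.execute("CREATE TABLE posts (title TEXT, info TEXT)")  # assumed columns
    conn.execute("INSERT INTO posts (title, info) VALUES (?, ?)", ("a", "a"))
    conn.execute("INSERT INTO posts (title, info) VALUES (?, ?)", ("b", "<hi>"))
    try:
        return render_posts(conn)
    finally:
        conn.close()
```

Escaping on output is the one step that is easy to forget in a first version of such a page.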
nginx
● Contender for 2nd place in web server popularity rankings
● Event-driven
● High-performance (thousands req/s)
● Small memory footprint per request
● Efficient CPU usage
● Advanced configuration and functionality via modules
● Often used as FrontEnd to big websites
● CloudFlare built on top of it
High-load web systems
● A big dynamic web site can't reside on a single server
● We need a strategy for splitting the load across multiple web servers
● One possible strategy
– One entry point, the "FrontEnd", receives all requests and can handle the load (e.g., Varnish, nginx)
– Backends process the requests passed on by the FrontEnd (nginx, Apache)
What is Varnish?
● Proxy server
– Reverse
– Caching
– Programmable
● Load balancer
● Dynamic content generator
● Tools – logging, debugging, monitoring
Why Varnish?
● Fantastic performance even on low-end servers – 1,000 to 10,000 requests per second per server is the norm
● C + GOOD C programmers
● Takes advantage of the Unix architecture
● After tuning, tens of thousands of requests per second; more than 100k/s reached in testing
● Free open source software
● Request-oriented domain-specific configuration/programming language, VCL
● Almost everything a high-load web site needs, in one package
Caching
● Generating any dynamic web page is very slow – depending on the environment, hundreds or thousands of times slower than returning static content
● A low-end server can generate a couple of hundred such dynamic pages per second
● Any development framework makes dynamic page generation another tens or hundreds of times slower (especially Java EE, Zend Framework)
● That leaves only a few dozen requests per second
● Rough math: 100 x 100 = 10,000 times slower than a static page
Caching II
● Idea – ideally, dynamic content would be returned with performance similar to static pages
● We can store those pages that are the same for every user and do not change significantly within a given period of time
● Hard disk access is slow; good practice is to keep all cached content only in RAM or on a server SSD
● A caching strategy has to be designed for each specific case, and it can be very subjective
Varnish caching
● Based on the request address (the full address or a regular expression) one can define which requests to cache, how long to cache a particular element, or not to cache it at all – the standard caching approach practically everywhere
● Users – Facebook, Twitter, WikiLeaks, ThePirateBay
● Developed in Norway
● Advertised as speeding up page delivery by a factor of 300 up to thousands, i.e., only about 10 times slower than static content
● Fast compared with other caching approaches
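The "decide by request address whether and how long to cache" approach can be sketched in Python. This shows the general technique, not how Varnish is implemented: the rule table, the TTL values and the names `ttl_for`, `Cache` and `demo` are all hypothetical.

```python
import re
import time

# Hypothetical rules, first match wins: (URL pattern, seconds to cache; 0 = never).
RULES = [
    (re.compile(r"^/admin"), 0),
    (re.compile(r"\.(js|css|png)$"), 3600),
    (re.compile(r"^/"), 60),
]


def ttl_for(url):
    """Return the time to live of the first rule that matches the URL."""
    for pattern, ttl in RULES:
        if pattern.search(url):
            return ttl
    return 0


class Cache:
    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend  # callable: url -> page body
        self.clock = clock      # injectable so expiry can be simulated
        self.store = {}         # url -> (expires_at, body)

    def get(self, url):
        now = self.clock()
        hit = self.store.get(url)
        if hit and hit[0] > now:
            return hit[1]       # still fresh: the backend is not touched
        body = self.backend(url)
        ttl = ttl_for(url)
        if ttl > 0:
            self.store[url] = (now + ttl, body)
        return body


def demo():
    """Fetch the same page three times, letting the TTL expire before the last."""
    calls = []

    def backend(url):
        calls.append(url)
        return "page %d" % len(calls)

    now = [0.0]
    cache = Cache(backend, clock=lambda: now[0])
    first = cache.get("/news")
    second = cache.get("/news")   # served from the cache
    now[0] = 61.0                 # the 60 second TTL has passed
    third = cache.get("/news")    # generated again by the backend
    return first, second, third, len(calls)
```

In the demo the backend is asked only twice for three requests; with real traffic the ratio is what turns a few dozen dynamic pages per second into thousands of responses.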
DSL VCL
● Simple syntax (similar to C), which is translated to C and then compiled to machine code
● =, ==, !=, ~, !~, !, &&, ||, +, "string"
● if () {} else {}, set, unset, return
● 9 subroutines, corresponding to the different stages of processing each request, in which the processing can be influenced
● Only predefined objects – client, server, req, bereq, beresp, obj, resp
sub vcl_recv {
  if (req.request == "GET" && req.url ~ "\.js$") {
    return (lookup);
  }
}
Integration
● A fixed caching time may not be optimal
● Content may change more often than the configured time – users get stale information
● Or less often – the servers do unnecessary work
● Solution – notify the server that the content has to be refreshed
acl purge { "192.168.0.0"/24; }
sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) { error 405 "Not allowed."; }
    return (lookup);
  }
}
sub vcl_hit {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
}
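The same purge logic can be written as a short Python sketch, which may be easier to read for those new to VCL. It is an illustration only: the cache is a plain dict, `handle_purge` is a hypothetical name, and the 404 answer for a URL that is not cached is an assumption, since the VCL above only covers the hit case.

```python
import ipaddress

# Same address range as in the acl above.
PURGE_ACL = [ipaddress.ip_network("192.168.0.0/24")]


def handle_purge(cache, client_ip, url):
    """Return (status, message) for a PURGE request from client_ip."""
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in network for network in PURGE_ACL):
        return 405, "Not allowed."   # only trusted hosts may invalidate
    if url in cache:
        del cache[url]               # the next request regenerates the page
        return 200, "Purged."
    return 404, "Not in cache."      # assumption: not part of the VCL above
```

The application that changes the content would send such a PURGE request right after saving, so the cache never has to guess when a page became stale.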
Dynamic content generation with ESI
● Web pages often consist of blocks that change at different rates
● Or there is a small block of information specific to each user (for example, "Hello, Jānis Bērziņš | You have [0] new messages")
● We can load it after the page has loaded, using JSON, or generate the content on Varnish
<TABLE><TR><esi:include src="sveiks.html"/></TR>
<TR><TD><esi:include src="index.html"/></TD>
<TD><esi:include src="article.html"/></TD></TR>
</TABLE>
● Varnish parses the <esi> tags and puts the elements together; all elements are configured and cached independently
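What the assembly step does can be sketched in Python with a regular expression. This is a simplification under stated assumptions: `assemble` and `fetch_fragment` are hypothetical names, only the self-closing `<esi:include src="..."/>` form is recognised, and nested includes are not followed.

```python
import re

# Matches the self-closing include tag and captures its src attribute.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(page, fetch_fragment):
    """Replace every <esi:include> tag with the fragment it points to.

    fetch_fragment is a callable (src -> text); in a cache it would return
    the separately cached copy of that fragment.
    """
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), page)
```

Because each fragment is fetched by its own address, the per-user greeting can have a short lifetime while the article body stays cached for much longer.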
Load balancing
● One address can be served by several backends
● Different URLs can be served by different backends
● Monitoring
● Taking dead servers out of rotation (restart, upgrade, repair)
● Bringing recovered servers back into rotation (new ones too)
● In practice this means a pile of CHEAP desktop-grade hardware can be used for dynamic content generation
● Adding one more frontend gives high but cheap fault tolerance
● If we use NoSQL or obtain a replicated database some other way, expensive servers are not needed at all
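A round-robin director that skips dead backends can be sketched in Python as follows. It illustrates the idea rather than Varnish's own director: the class name `Director` and the backend names are hypothetical, health is set by hand instead of by probes, and adding brand-new backends at runtime is left out.

```python
import itertools


class Director:
    """Round-robin over the backends that are currently marked healthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)      # e.g. restart, upgrade, repair

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)      # recovered server rejoins the rotation

    def pick(self):
        # Look at each backend at most once per call, skipping the dead ones.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")
```

With three backends, marking one down simply makes the other two alternate until `mark_up` is called for it again.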
Varnish usage in Latvia
$ curl -I www.tvnet.lv
● HTTP/1.1 200 OK
● Server: Apache
● Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
● Expires: Wed, 07 Nov 2012 20:10:08 GMT
● Cache-Control: max-age=60
● Vary: Accept-Encoding
● Content-Type: text/html; charset=UTF-8
● Content-Length: 185924
● Date: Wed, 07 Nov 2012 20:10:15 GMT
● X-Varnish: 2025605055 2025545136
● Age: 67
● Via: 1.1 varnish
● Connection: keep-alive
$ curl -I www.delfi.lv
● HTTP/1.1 200 OK
● X-Fe-Node: nuffy
● Content-type: text/html; charset=utf-8
● Server: lighttpd/1.4.31 (PLD Linux)
● Content-Length: 159097
● Date: Wed, 07 Nov 2012 20:20:58 GMT
● X-Varnish: 734492112 734450241
● Age: 58
● Via: 1.1 varnish
● Connection: keep-alive
Non-standard uses – WAF
● Programmability makes non-standard uses possible, for example a WAF
● Define the addresses and methods of the requests we accept as precisely as possible
– req.url ~ "^/topic/([0-9])$" rather than "^/topic"
– req.request == "GET"
● At the end, use return(error); (reject everything else)
● Restrict access to the backend servers (or disconnect them from the internet)
● Attackers now attack the frontend, so we protect it
● Does not help against logic (and many other) vulnerabilities
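The whitelist idea, where only precisely described requests pass and everything else is rejected, can be sketched in Python. The rule table and the function `is_allowed` are hypothetical; the first pattern is the one shown above.

```python
import re

# Hypothetical whitelist: (allowed method, anchored URL pattern).
ALLOWED = [
    ("GET", re.compile(r"^/topic/([0-9])$")),
    ("GET", re.compile(r"^/$")),
]


def is_allowed(method, url):
    """Accept a request only if both method and URL match one entry exactly."""
    # fullmatch is used so that not even a trailing newline slips past "$".
    return any(
        method == allowed_method and pattern.fullmatch(url) is not None
        for allowed_method, pattern in ALLOWED
    )
```

A loose pattern such as "^/topic" would also accept "/topic/5/../../etc/passwd", which is why the precise form matters.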
New trend
● The web application is the central thing
● Develop the application in some framework
● No separate web server; it is now just a part of the application (a library from the framework used)
● Extremely customizable
Current situation
● The standard web development solution is an HTTP server plus some classic dynamic content generation system (PHP, ASP, Python, etc.); there are problems:
– Long-running requests and persistent connections
– Number of clients that can be served simultaneously
– Compatibility with other technologies
– Possibilities for future development
Event-driven programming frameworks
● The idea and its implementations are not new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)
● Not widely used in web solutions
● Solve the problems of the standard technologies
● Reactor design pattern, C10K problem
● Let web programmers build network solutions
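The reactor style can be tried with Python's standard `asyncio` module, used here as a stand-in for the frameworks listed above. In this minimal sketch one thread serves many connections, and every `await` hands control back to the event loop. The function names and the one-line echo protocol are assumptions made for illustration.

```python
import asyncio


async def handle_client(reader, writer):
    """Serve one connection; the loop runs other clients while this one waits."""
    line = await reader.readline()
    writer.write(b"echo: " + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(port, message):
    """Client side: send one line and return the server's reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(message + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply


async def demo(n_clients=50):
    """Start a server and talk to it from many concurrent clients."""
    # Port 0 lets the OS choose a free port; server and clients share one thread.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        replies = await asyncio.gather(
            *(ask(port, b"client %d" % i) for i in range(n_clients))
        )
    return replies
```

Running `asyncio.run(demo(50))` opens fifty connections at once without creating a single extra thread or process.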