Web Servers
Guntis Bārzdiņš, Artūrs Lavrenovs, Normunds Grūzītis
What a basic web server does
● Implements the HTTP protocol
● Listens for HTTP requests from clients (e.g. browsers)
● Tries to fulfill them with static content from the file system
● A web server itself serves only static files
● Receives content from clients (e.g. via HTML forms, incl. uploading of files)
● Forwards dynamic content requests for external execution
● Does other useful tasks via extension modules
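The core loop described above (listen for HTTP requests, map the URL to a file, return it or a 404) can be sketched with Python's standard library, which ships a basic static file server. This is only an illustration of the idea, not how Apache is built; `start_static_server` is a hypothetical helper, and binding to 127.0.0.1 on an OS-chosen port is an assumption made for the demo.

```python
import functools
import http.server
import threading


def start_static_server(root: str, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve static files from `root` in a background thread (hypothetical helper).

    Port 0 lets the OS pick a free port; read it back from server.server_address.
    """
    # SimpleHTTPRequestHandler maps the URL path to a file under `directory`
    # and answers 404 when the file does not exist.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    import pathlib
    import tempfile
    import urllib.request

    with tempfile.TemporaryDirectory() as root:
        pathlib.Path(root, "index.html").write_text("<h1>Hello</h1>")
        server = start_static_server(root)
        host, port = server.server_address
        with urllib.request.urlopen(f"http://{host}:{port}/index.html") as resp:
            print(resp.status, resp.read().decode())
        server.shutdown()
        server.server_close()
```

Everything dynamic (forms, scripts, databases) is deliberately absent here: as the slide says, that part is forwarded elsewhere.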
Web server market share
Chart (Jun 2015), annotated events:
F: Apache 1.1, modules supported
H: Apache supports HTTP/1.1 virtual hosting
I: Microsoft IIS/4.0 and Active Server Pages
M: Apache 2.0
Q: Microsoft .NET framework
N, O, R: Code Red worm, Nimda worm, SQL Slammer worm
V: Google App Engine
W: Microsoft Hyper-V
Apache
Has consistently been the most popular web server
Highly configurable and extensible (compiled modules)
Runs on many operating systems (primarily on Unix)
SSL/TLS support
Supports various authentication schemes
Flexible URL rewriting and aliasing
Virtual Hosts
Custom log files, etc.
Apache modules
mod_access Access control based on client hostname or IP address
mod_alias Mapping different parts of the host filesystem in the document tree, and URL redirection
mod_auth_xxx Various user authentication approaches (file, dbm, form, etc.)
mod_autoindex Automatic directory listings
mod_cgi Execution of CGI scripts
mod_include Server-parsed documents (SSI)
mod_mime Determining document types using file extensions
mod_proxy Caching proxy abilities
mod_rewrite Powerful URI-to-filename mapping using regular expressions
mod_usertrack User tracking using Cookies
mod_ssl Provides strong cryptography via the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols, with the help of the open-source SSL/TLS toolkit OpenSSL
Available since Apache 1.3 (1998); latest version: Apache 2.4 (since 2012)
Requires a private/public key pair and a certificate, e.g. from Thawte (thawte.com) or Verisign (verisign.com)
Third-party modules for server-side scripting:
mod_php Executes PHP within Apache
mod_python Executes Python within Apache
mod_ruby Executes Ruby within Apache
mod_jk Connects Tomcat with Apache
etc.
Compiling and installing Apache
./configure --enable-layout=Debian Use the Debian-style directory layout
--enable-suexec Allows you to set the uid and gid for spawned processes (CGI, SSI)
--enable-MODULE=shared Compiles, installs and adds the module as a shared object (.so)
--disable-MODULE Some modules are compiled in by default (e.g. autoindex, cgi) and have to be disabled explicitly
vs. installing packaged modules, e.g. apt-get install <module>
Apache directory layout
Debian
/etc/init.d/apache2 – Apache control script
/etc/apache2/ – Apache configuration files
/var/www/ – Default Document Root
/usr/lib/cgi-bin/ – Default directory for scripts
/var/log/apache2/ – Log files (access.log, error.log)
/usr/bin/ – htpasswd, htdigest, htdbm
/usr/lib/apache2/modules/ – Apache modules
/usr/lib/apache2/suexec – CGI wrapper
Apache access log
LogFormat "%v %h %l %u %t \"%r\" %>s %b" common
CustomLog /usr/local/apache/logs/access_log common
%v – virtual host, %h – remote host, %l – remote logname (identd), %u – user, %t – time, %r – HTTP request line, %>s – status code, %b – response size in bytes
www.atlants.lv 159.148.85.46 - - [21/Nov/2004:17:23:36 +0200] "GET /index.php?m=5 HTTP/1.1" 200 32257
Apache error log
ErrorLog /usr/local/apache/logs/error_log
LogLevel warn
[Sun Nov 21 09:13:42 2004] [error] PHP Fatal error: Call to undefined function PN_DBMsgError() in /home/msaule/public_html/referer.php on line 85
[Sun Nov 21 12:41:09 2004] [error] [client 81.198.145.117] File does not exist: /home/sms/public_html/favicon.ico
[Sun Nov 21 13:02:50 2004] [error] [client 66.249.66.173] File does not exist: /home/code/public_html/robots.txt
[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/_vti_bin/owssvr.dll
[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/MSOffice/cltreq.asp
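Because the access log has a fixed format, it is easy to analyse with scripts. The sketch below parses one line written with the access-log format shown earlier (`%v %h %l %u %t \"%r\" %>s %b`) using a regular expression. It is a simplified sketch under our own assumptions: the field names are ours, and escaped quotes inside the request line are not handled.

```python
import re

# One group per LogFormat field: %v %h %l %u %t "%r" %>s %b
LOG_RE = re.compile(
    r'(?P<vhost>\S+) (?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> dict:
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    # %b logs "-" when no body bytes were sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    # %r is the request line: method, path (with query string), protocol
    method, path, protocol = entry["request"].split(" ", 2)
    entry.update(method=method, path=path, protocol=protocol)
    return entry


if __name__ == "__main__":
    sample = ('www.atlants.lv 159.148.85.46 - - [21/Nov/2004:17:23:36 +0200] '
              '"GET /index.php?m=5 HTTP/1.1" 200 32257')
    entry = parse_line(sample)
    print(entry["vhost"], entry["path"], entry["status"], entry["size"])
    # www.atlants.lv /index.php?m=5 200 32257
```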
Configuring Apache
Edit httpd.conf
Check configuration: apachectl configtest
Restart Apache
Test changes
http://httpd.apache.org/docs/
Virtual hosts
<VirtualHost *>
ServerName www.jrt.lv
ServerAlias www.jrt.com
CustomLog /usr/local/apache/logs/jrt_access_log common
ErrorLog /usr/local/apache/logs/jrt_error_log
DocumentRoot /home/jrt/public_html
</VirtualHost>
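Name-based virtual hosting works because HTTP/1.1 clients send a Host header; the server compares it with each ServerName and ServerAlias. A minimal sketch of that lookup, assuming the table below: the first entry mirrors the block above, the second site is hypothetical, and falling back to the first listed virtual host follows Apache's documented behaviour when no name matches.

```python
from typing import List, NamedTuple


class VirtualHost(NamedTuple):
    server_name: str
    aliases: List[str]
    document_root: str


VHOSTS = [
    VirtualHost("www.jrt.lv", ["www.jrt.com"], "/home/jrt/public_html"),
    VirtualHost("www.example.org", [], "/home/example/public_html"),  # hypothetical
]


def select_vhost(host_header: str, vhosts: List[VirtualHost] = VHOSTS) -> VirtualHost:
    """Pick the virtual host whose ServerName/ServerAlias matches the Host header."""
    # Drop an optional ":port" suffix (IPv6 literals are not handled in this sketch).
    host = host_header.split(":", 1)[0].strip().lower()
    for vhost in vhosts:
        if host == vhost.server_name or host in vhost.aliases:
            return vhost
    # No name matched: the first listed virtual host acts as the default.
    return vhosts[0]
```

For example, `select_vhost("www.jrt.com").document_root` resolves to `/home/jrt/public_html` through the ServerAlias.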
.htaccess (directory-level, read on every request)
AuthType Basic
AuthUserFile /home/someuser/passwd
AuthName "Admin"
require valid-user
htpasswd
htpasswd -c <password file> <username>
user1:Y90u499mUj6xE
user2:DOrWgcNwzaQUQ
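With AuthType Basic, the browser sends the credentials in an `Authorization: Basic ...` header whose token is simply Base64 of `username:password` (RFC 7617). It is encoded, not encrypted, so Basic authentication should only be used over SSL/TLS. The sketch below builds and parses that header value; the function names are hypothetical, and checking the password against the hash stored by htpasswd is left out.

```python
import base64
import binascii
from typing import Tuple


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header value a browser sends for AuthType Basic."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def parse_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover (user, password) from an Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("malformed Basic credential") from exc
    # Split at the first ':' only, because the user name cannot contain ':'.
    user, sep, password = decoded.partition(":")
    if not sep:
        raise ValueError("missing ':' between user and password")
    return user, password
```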
Dynamic content
[Figure: Browser, Web Server (static HTML, PNG, CSS, ...), Script Engine (PHP, Python, ...) and Database Server (MySQL, ...)]
LAMP
● Linux - Apache - MySQL - PHP
● The most common web server stack
● Simple to install and configure
● Simple to develop web applications
● Acceptable performance and security
● apt-get install apache2 mysql-server php5 libapache2-mod-php5
MySQL
● Unix distributions are moving towards MariaDB after the acquisition of MySQL by Oracle
● MariaDB: a MySQL fork, led by the original developers of MySQL
● Fast relational DB implementation
● Fairly easy to use (for app developers)
● Different storage engines
– With or without transactions, memory-based, etc.
● Query caching
● User quotas
PHP
● One of the most popular programming languages for web applications
● Easy to learn (though bad coding practices are common)
● Interpreted language
● Functions from Unix libraries and tools
● Huge number of ready-made applications, libraries and modules
Simple web app
● Create a database, e.g. using the MySQL command prompt:
– $ mysql -u root -p
– > CREATE DATABASE `example` COLLATE 'utf8_general_ci';
– > CREATE TABLE `posts` (...);
– > CREATE USER 'example'@'localhost' IDENTIFIED BY '...';
– > GRANT ... ON `example`.* TO 'example'@'localhost';
– > INSERT INTO `posts` (`title`,`info`) VALUES ('a','a');
● Or be lazy and use a web interface like phpMyAdmin or Adminer
– Download the single file adminer.php
– Drop it into /var/www/
– Navigate your browser to http://localhost/adminer.php
– Do all the tasks in the browser without really knowing SQL
● Create the file example.php in /var/www/
● Write your HTML with PHP code inside
– Connect to the database
– Select data
– Show data
● Your simple web site is ready
● Navigate your browser to http://localhost/example.php
● Enjoy the result
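The three steps (connect, select, show) are sketched here in Python rather than PHP, with the standard-library `sqlite3` module standing in for MySQL so that the example runs without a database server. The `posts` columns `title` and `info` follow the SQL above; `render_posts` and the in-memory database are hypothetical. Escaping with `html.escape` keeps stored text from being interpreted as HTML.

```python
import html
import sqlite3


def render_posts(conn: sqlite3.Connection) -> str:
    """Select all posts and show them as an HTML list."""
    rows = conn.execute("SELECT title, info FROM posts ORDER BY rowid").fetchall()
    items = "\n".join(
        f"<li><b>{html.escape(title)}</b>: {html.escape(info)}</li>" for title, info in rows
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


if __name__ == "__main__":
    # Connect: an in-memory SQLite database stands in for the MySQL `example` DB.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (title TEXT, info TEXT)")
    # Parameterized insert, mirroring INSERT INTO posts (title, info) VALUES ('a', 'a')
    conn.execute("INSERT INTO posts (title, info) VALUES (?, ?)", ("a", "a"))
    print(render_posts(conn))
```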
Dynamic content
Web servers cannot create dynamic content by themselves
Two options for serving dynamic content:
● [Apache] modules
● CGI / SSI, FastCGI, SCGI, WSGI, ...
Potentially many programming languages: PHP, Perl, Python, Java, ..., C, C++, shell scripts, ...
CGI - Common Gateway Interface
● A standard environment for web servers to interface with external executable programs
– Any script or binary executable
● For each request, the web server defines a set of environment variables derived from the request and the server configuration
● The web server starts the external program in the prepared environment
– No additional libraries required
● Sends POST data (the request body) to the program's standard input; GET parameters arrive in QUERY_STRING
● Waits for the standard output of the executed program and returns it to the client
– With additional HTTP headers
CGI environment variables
● REQUEST_METHOD: name of the HTTP method
● PATH_INFO: path suffix, if appended to the URL after the program name and a slash
● PATH_TRANSLATED: the corresponding full file-system path as resolved by the server, if PATH_INFO is present
● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi
● QUERY_STRING: the part of the URL after the ? character (GET)
● REMOTE_HOST: host name of the client
● REMOTE_ADDR: IP address of the client (dot-decimal)
● Variables passed by the user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain the values of the corresponding HTTP headers
● A few more
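Seen from the server's side, the CGI contract boils down to: build the environment, start the program, feed the request body to its standard input, and split its standard output into headers and body at the first empty line. A simplified sketch, assuming a POSIX system; `run_cgi` is a hypothetical name and only a handful of the variables listed above are set.

```python
import os
import re
import subprocess


def run_cgi(argv, method="GET", query="", body=b"", script_name="/cgi-bin/script.cgi"):
    """Run a CGI program once, roughly the way a web server would (simplified sketch)."""
    env = {
        "PATH": os.environ.get("PATH", "/usr/bin:/bin"),
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query,              # GET parameters travel here
        "SCRIPT_NAME": script_name,
        "CONTENT_LENGTH": str(len(body)),   # size of the body sent on stdin
        "REMOTE_ADDR": "127.0.0.1",
    }
    # One new process per request, as in plain CGI.
    proc = subprocess.run(argv, input=body, env=env, stdout=subprocess.PIPE, check=True)
    # The headers end at the first empty line (LF or CRLF line endings).
    parts = re.split(rb"\r?\n\r?\n", proc.stdout, maxsplit=1)
    head = parts[0]
    payload = parts[1] if len(parts) > 1 else b""
    headers = {}
    for line in head.decode("latin-1").splitlines():
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers, payload
```

A real server would also add its own response headers (status line, Date, Server) before passing the payload on to the client.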
CGI example
#!/bin/bash
echo "Content-type: text/plain"
echo ""
echo "Hello world!"
echo "Today is:" `date`
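Because CGI involves only environment variables and standard output, the same program can be written in any language. A Python version, as a sketch: it additionally reads a hypothetical `name` parameter from QUERY_STRING to show how GET data reaches a CGI program.

```python
#!/usr/bin/env python3
import datetime
import os
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build the full CGI output: headers, an empty line, then the body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]  # hypothetical ?name=... parameter
    return (
        "Content-type: text/plain\n"
        "\n"
        f"Hello {name}!\n"
        f"Today is: {datetime.date.today().isoformat()}\n"
    )


if __name__ == "__main__":
    print(cgi_response(os.environ), end="")
```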
SSI – Server Side Includes
• Directives in HTML pages that are evaluated by the server while the pages are being served
• Without having to serve the entire page via a CGI program
• Configure httpd.conf or .htaccess: Options +Includes
• Two ways to tell Apache which files should be parsed:
• Parse any file with a particular file extension:
• AddType text/html .shtml
• AddOutputFilter INCLUDES .shtml
• Parse files if they have the execute bit set:
• XBitHack on
• Handy for existing files: chmod +x them instead of renaming them
• <!--#echo var="DATE_LOCAL" -->
• <!--#flastmod file="index.html" -->
• <!--#include virtual="/footer.html" -->
• <!--#include virtual="/cgi-bin/counter.pl" -->
• <!--#exec cmd="ls" -->
• Setting variables
• Conditional expressions
• A simple but Turing complete programming language
• Loops can be implemented via recursive redirects
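To see what the server does with such directives, here is a deliberately tiny SSI expander that supports only `#echo var` and `#include virtual` and leaves everything else (including `#exec`) untouched. It is a sketch, not mod_include: variable values and the `read_virtual` callback are supplied by the caller, included content is not parsed again, and conditionals are ignored. Undefined variables print `(none)`, Apache's default text for them.

```python
import re
from typing import Callable, Dict

# Matches e.g. <!--#echo var="DATE_LOCAL" --> and <!--#include virtual="/footer.html" -->
SSI_DIRECTIVE = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def process_ssi(page: str, variables: Dict[str, str],
                read_virtual: Callable[[str], str]) -> str:
    def expand(match):
        command, attribute, value = match.groups()
        if command == "echo" and attribute == "var":
            return variables.get(value, "(none)")
        if command == "include" and attribute == "virtual":
            return read_virtual(value)
        return match.group(0)  # unsupported directive: leave it as-is
    return SSI_DIRECTIVE.sub(expand, page)
```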
CGI issues
● Each request forks a new process: a big overhead for process creation and destruction
● Scripts must be interpreted on each request: another overhead
– May be reduced by using compiled CGI programs
● Not scalable
● Not suitable for the needs of modern web servers
● Still widely used in embedded systems (e.g. WiFi router web management consoles) that serve only occasional requests
FastCGI
● One or more persistent processes started (pre-forked)
● Web server communicates with them over sockets or TCP
● Each process serves many requests
● Performance comparable to modules
● Facilitates reuse of resources (DB connections, in-memory caching, etc.)
● Separation of web server and dynamic content system
● Scalability – deploy processes across a server farm
● apt-get install libapache2-mod-fastcgi php5-fpm
Other communication methods
● Integrate the dynamic content generation system into the web server process (Apache modules)
● CGI derivatives
– Simple Common Gateway Interface (SCGI): similar to FastCGI but designed to be easier to implement
● *SGI (web server gateway interfaces) define a programming-language-specific way for web servers and applications to communicate
– WSGI – Python, PSGI – Perl, Rack – Ruby
● Proxy requests to applications that themselves communicate via HTTP
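WSGI (PEP 3333) reduces the interface to a single callable: the server passes a CGI-style `environ` dict and a `start_response` function, and the application returns an iterable of byte strings. A minimal application as a sketch; `wsgiref`, the reference server in Python's standard library, is used only for local testing, and the port in `serve` is an arbitrary assumption.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A minimal WSGI application: every request gets a plain-text greeting."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port: int = 8000) -> None:
    """Run `app` on wsgiref's development server; blocks until interrupted."""
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

The point of the design is that the same `app` callable can be hosted by any WSGI-capable server without changes, much as any CGI program can be run by any CGI-capable web server.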
C10K problem
● Dan Kegel, 1999
● Web servers should handle 10,000 clients simultaneously (not the same as 10K requests)
● Operating system kernel limitations
● Functionality provided by the operating system
● Web server design flaws
C10K – OS kernel
● The open-source nature of Unix kernels made it possible to quickly identify C10K bottlenecks and fix them
● Networking-related algorithms and data structures in Unix kernels were originally implemented with complexities such as O(n) or O(n^2), which were reduced to O(1) or O(n)
● As a result, the networking capabilities of Unix kernels are virtually limitless (limited mainly by hardware resources)
C10K – OS functionality
● New scalable I/O event notification mechanisms were implemented (epoll – Linux, kqueue – *BSD)
– Better performance than traditional poll/select, e.g. on a large number of file descriptors
– Can receive all pending events using one system call
● AIO – the POSIX asynchronous I/O interface – allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background)
● The application can choose to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all
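Python's `selectors` module wraps these mechanisms: `DefaultSelector` picks epoll on Linux and kqueue on BSD/macOS when available, falling back to poll/select. The sketch below answers one request on each of many connections from a single thread. Connected socket pairs stand in for accepted client connections so that it runs without a network, and `serve_once_each` is a hypothetical name; a real server would also register its listening socket and accept new connections in the same loop.

```python
import selectors


def serve_once_each(conns, timeout=1.0):
    """Read one request from every connection and answer it, all in a single thread."""
    sel = selectors.DefaultSelector()  # epoll/kqueue where available
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    remaining = len(conns)
    try:
        while remaining:
            # One select() call reports every connection that currently has data.
            events = sel.select(timeout)
            if not events:
                raise TimeoutError("no connection became readable in time")
            for key, _mask in events:
                conn = key.fileobj
                request = conn.recv(4096)
                if request:
                    # Fine for small replies; sendall on a non-blocking socket
                    # can raise BlockingIOError if the send buffer is full.
                    conn.sendall(b"echo: " + request)
                sel.unregister(conn)
                remaining -= 1
    finally:
        sel.close()
```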
C10K – web server design
● Non-blocking I/O for networking and disk
– Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion
● Many threads
– Use all available CPU cores to achieve maximum concurrency; avoid locking data structures
● Each thread serves many requests
– Don't create a thread per request; reuse threads, and while a non-blocking action completes, process other requests
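The "reuse threads instead of creating one per request" point can be sketched with a fixed thread pool from `concurrent.futures`: 200 simulated requests are served by at most four long-lived threads. In CPython the global interpreter lock limits CPU-bound parallelism, so this only illustrates thread reuse, not the multi-core scaling a server written in C gets; `handle_request` is a hypothetical stand-in for real work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def handle_request(request_id: int) -> Tuple[int, str]:
    """Hypothetical request handler; reports which pooled thread served it."""
    return request_id, threading.current_thread().name


def serve_with_pool(n_requests: int, workers: int = 4) -> List[Tuple[int, str]]:
    # The pool creates at most `workers` threads and reuses them for every request.
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="worker") as pool:
        return list(pool.map(handle_request, range(n_requests)))


if __name__ == "__main__":
    results = serve_with_pool(200)
    print(len(results), "requests served by", len({name for _, name in results}), "threads")
```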
C10M problem
● 10 million concurrent connections per server
● Doubling the CPU speed does not double the number of open connections
● Current Unix kernels can't handle that:
– Application thread locks in the kernel
– Hardware drivers (NIC)
– Memory management
● Solution: a new generation of high-load Unix kernels
– 1 main application per server
– Minimize the number of system calls
– Minimize kernel work
nginx
● A C10K web server
● Apache implements a thread-per-connection model
● nginx does not create a new process/thread per connection (it does not use the OS thread scheduler as a packet scheduler)
● Typically, one single-threaded worker process per CPU core
● Each worker can asynchronously handle thousands of concurrent connections (it handles the scheduling itself)
• Event-driven: reacts to events, e.g. a new connection
• Asynchronous: handles interaction with more than one connection at a time
• Non-blocking: does not stop and wait for disk I/O to finish; works on other events until the I/O completes
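The same event-driven, non-blocking style is available in Python's `asyncio`: one thread runs an event loop, and each connection handler hands control back to the loop whenever it would wait for I/O. The sketch below is not nginx's design, only the same idea; it serves 50 concurrent clients from a single thread, and `demo`, the echo protocol and the client count are assumptions for illustration.

```python
import asyncio
from typing import List


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Serve one connection; every `await` lets the loop run other connections."""
    line = await reader.readline()
    writer.write(b"echo: " + line)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo(n_clients: int = 50) -> List[bytes]:
    server = await asyncio.start_server(handle, "127.0.0.1", 0)  # port 0: any free port
    port = server.sockets[0].getsockname()[1]

    async def client(i: int) -> bytes:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"hello {i}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply

    async with server:
        # All clients and handlers run concurrently on one single-threaded event loop.
        return list(await asyncio.gather(*(client(i) for i in range(n_clients))))


if __name__ == "__main__":
    replies = asyncio.run(demo())
    print(len(replies), replies[0])
```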
nginx
● Efficient CPU usage
● Fewer cores needed
● Small memory footprint per request
● High performance
● Thousands of connections/requests per second
● Often used as the front-end of high-load websites
● Load balancing (reverse proxy), caching, etc.
High-load web systems
● Busy dynamic web sites cannot reside on one server
● They need a strategy for splitting the load across multiple web servers
● One possible strategy:
– One entry point, the front-end, which receives all requests and splits the load (e.g. nginx, Varnish)
– Back-ends process requests forwarded from the front-end (e.g. nginx, Apache)
Varnish
● Proxy server
– Reverse
– Caching
– Programmable
● Load balancer
● Dynamic content generator
● Tools: logging, debugging, monitoring
● Users: Facebook, Twitter, WikiLeaks, ThePirateBay
● Developed in Norway
● Fantastic performance even on low-end servers: 1,000 to 10,000 requests per second per server is the norm
● Written in C by good C programmers
● Takes advantage of the Unix architecture
● After tuning: tens of thousands of requests per second; over 100k/s reached in testing
● VCL: a request-oriented, domain-specific configuration/programming language
Varnish
● Generating any dynamic web page is very slow: depending on the environment, hundreds or thousands of times slower than returning static content
● A low-end server can generate a couple of hundred such dynamic pages per second
● Any development framework makes dynamic page generation another tens or hundreds of times slower
● Down to just a few dozen requests per second
● Rough math: 100 × 100 = 10,000 times slower than a static page
Caching
● Ideally, dynamic content would be returned with performance similar to static pages
● Content that does not change significantly within a given time interval can be stored temporarily and reused
● Hard disks are slow; good practice is to keep all cached content in RAM or on the server's SSD
● A caching strategy has to be designed for each specific case, and it can be very subjective
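The "store temporarily and reuse" idea reduces to a time-to-live (TTL) cache kept in RAM. A minimal sketch, assuming a caller-supplied `generate` function that produces the slow dynamic page; the injectable clock exists only so that expiry can be tested, and `TTLCache` is a hypothetical name, not Varnish's implementation.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory cache where each entry expires `ttl_seconds` after it was stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, page)

    def get(self, key: str, generate: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # hit: reuse the stored page
        page = generate()  # miss or expired: run the slow dynamic generation
        self._store[key] = (now + self.ttl, page)
        return page
```

With a 60-second TTL, a page requested 1,000 times per minute is generated once per minute instead of 1,000 times, which is where the speed-up comes from.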
Caching
● Based on the request URL (exact match or regular expression), you can decide which requests to cache, how long to cache a particular item, or not to cache it at all
● Advertised as speeding up page delivery by hundreds to thousands of times, i.e., down to only about 10 times slower than static content
● Fast compared with other caching approaches
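Per-URL decisions like these can be expressed as an ordered list of regular-expression rules where the first match wins. The rules and TTL values below are hypothetical, and this sketch matches only the path (the query string is dropped), which is one possible design choice rather than how Varnish matches `req.url`.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical policy: (path regex, TTL in seconds or None = do not cache); first match wins.
RULES: List[Tuple[str, Optional[int]]] = [
    (r"^/admin/", None),           # never cache the admin area
    (r"\.(css|js|png)$", 86400),   # static assets: one day
    (r"^/index\.php$", 60),        # front page: one minute
]


def cache_ttl(url: str, rules: List[Tuple[str, Optional[int]]] = RULES) -> Optional[int]:
    path = url.split("?", 1)[0]  # this sketch ignores the query string
    for pattern, ttl in rules:
        if re.search(pattern, path):
            return ttl
    return None  # no rule matched: pass the request to the back-end uncached
```

Rule order matters: `/admin/app.js` is not cached because the admin rule comes before the asset rule.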
Varnish caching
● DSL: VCL
● Simple C-like syntax, which is translated to C and then compiled to machine code
● =, ==, !=, ~, !~, !, &&, ||, +, "string"
● if () {} else {}, set, unset, return
● 9 subroutines, corresponding to the different stages of processing each request, in which the processing can be influenced
● Only predefined objects: client, server, req, bereq, beresp, obj, resp
sub vcl_recv {
    if (req.request == "GET" && req.url ~ "\.js$") {
        return (lookup);
    }
}
VCL processing architecture
Integration
● A fixed caching time may not be optimal
– Content may change more often than the configured time: users get stale information
– Or less often: servers do unnecessary work
● Solution: notify the server that the content needs to be refreshed
acl purge { "192.168.0.0"/24; }
sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) { error 405 "Not allowed."; }
        return (lookup);
    }
}
sub vcl_hit {
    if (req.request == "PURGE") { purge; error 200 "Purged."; }
}
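The purge logic above, restated as a Python sketch: only clients whose IP falls within the `purge` ACL may remove an object from the cache. `handle_purge` and the dict-based cache are hypothetical, and returning 404 when the object is not cached is an assumption of this sketch (the VCL above only handles the hit case).

```python
import ipaddress
from typing import Dict, Tuple

# Mirrors: acl purge { "192.168.0.0"/24; }
PURGE_ACL = [ipaddress.ip_network("192.168.0.0/24")]


def handle_purge(client_ip: str, url: str, cache: Dict[str, str]) -> Tuple[int, str]:
    """Hypothetical PURGE handler: ACL check first, then drop the cached object."""
    addr = ipaddress.ip_address(client_ip)
    if not any(addr in network for network in PURGE_ACL):
        return 405, "Not allowed."
    if url not in cache:
        return 404, "Not in cache."  # assumption: the VCL above only covers hits
    del cache[url]
    return 200, "Purged."
```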
Dynamic content generation: ESI
● Web pages often consist of blocks that change at different rates
● Or there is a small block of information specific to each user (e.g. "Hello, [Jānis Bērziņš], you have [0] new messages")
● We can load it after the page has loaded, using JSON, or generate the content with Varnish
<TABLE><TR><esi:include src="sveiks.html"/></TR>
<TR><TD><esi:include src="index.html"/></TD>
<TD><esi:include src="article.html"/></TD></TR>
</TABLE>
● Varnish parses the <esi> tags and assembles the elements; every element is configured and cached independently
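A sketch of the assembly step: each `<esi:include src="..."/>` is replaced by its fragment, and fragments are cached independently, so the per-user greeting can be fetched on every request while `index.html` is fetched once. The `fetch` callback, the `cacheable` set and the function name are hypothetical; real ESI processing in Varnish also handles expiry and other ESI features.

```python
import re
from typing import Callable, Dict, Set

ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(template: str, fetch: Callable[[str], str],
             cache: Dict[str, str], cacheable: Set[str]) -> str:
    """Replace every ESI include with its fragment; each fragment is cached on its own."""
    def include(match):
        src = match.group(1)
        if src in cacheable and src in cache:
            return cache[src]      # shared fragment served from the cache
        fragment = fetch(src)      # e.g. the per-user greeting, fetched every time
        if src in cacheable:
            cache[src] = fragment
        return fragment
    return ESI_INCLUDE.sub(include, template)
```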
Load balancing
● One address can be served by several back-ends
● Different URLs can be served by different back-ends
● Monitoring
– Disconnecting dead servers (restart, upgrade, repair)
– Reconnecting servers that come back to life (including new ones)
● In practice, this means a pile of CHEAP desktop-grade hardware can be used to generate dynamic content
● Adding one more front-end gives high yet cheap fault tolerance
● If we use NoSQL or otherwise obtain a replicated database, expensive servers are not needed at all
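A minimal sketch of the load-balancing and monitoring points: round-robin selection over back-ends that skips those marked as down, with the ability to bring servers back or add new ones. The back-end names are hypothetical, health checking is reduced to explicit `mark_down`/`mark_up` calls, and this is not Varnish's director implementation.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Round-robin over back-ends, skipping those currently marked as down."""

    def __init__(self, backends: Iterable[str]):
        self.backends: List[str] = list(backends)
        self.down: Set[str] = set()
        self._next = 0

    def add(self, backend: str) -> None:
        self.backends.append(backend)      # a new server joins the pool

    def mark_down(self, backend: str) -> None:
        self.down.add(backend)             # dead server: restart, upgrade, repair

    def mark_up(self, backend: str) -> None:
        self.down.discard(backend)         # server came back to life

    def pick(self) -> str:
        # Try each back-end at most once, starting where the last pick stopped.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy back-end available")
```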
Varnish usage in Latvia
$ curl -I www.tvnet.lv
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
Expires: Wed, 07 Nov 2012 20:10:08 GMT
Cache-Control: max-age=60
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Content-Length: 185924
Date: Wed, 07 Nov 2012 20:10:15 GMT
X-Varnish: 2025605055 2025545136
Age: 67
Via: 1.1 varnish
Connection: keep-alive
$ curl -I www.delfi.lv
HTTP/1.1 200 OK
X-Fe-Node: nuffy
Content-type: text/html; charset=utf-8
Server: lighttpd/1.4.31 (PLD Linux)
Content-Length: 159097
Date: Wed, 07 Nov 2012 20:20:58 GMT
X-Varnish: 734492112 734450241
Age: 58
Via: 1.1 varnish
Connection: keep-alive
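These responses can also be read programmatically. In Varnish, an `X-Varnish` header with two IDs means a cache hit (the second ID belongs to the request that stored the object), and `Age` shows how long the object has been in the cache. The sketch below, with hypothetical helper names, parses a subset of the first response above and compares `Age` with `max-age`.

```python
from typing import Dict, Optional

TVNET = """HTTP/1.1 200 OK
Server: Apache
Cache-Control: max-age=60
X-Varnish: 2025605055 2025545136
Age: 67
Via: 1.1 varnish
"""


def parse_headers(raw: str) -> Dict[str, str]:
    """Turn a `curl -I` header block into a dict with lower-case header names."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def cache_summary(headers: Dict[str, str]) -> dict:
    ids = headers.get("x-varnish", "").split()
    age = int(headers.get("age", "0"))
    max_age: Optional[int] = None
    for directive in headers.get("cache-control", "").split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            max_age = int(value)
    return {
        "varnish_hit": len(ids) == 2,  # two IDs: served from the cache
        "age": age,
        "max_age": max_age,
        # None when the response carries no max-age to compare against
        "fresh": None if max_age is None else age < max_age,
    }


if __name__ == "__main__":
    print(cache_summary(parse_headers(TVNET)))
    # {'varnish_hit': True, 'age': 67, 'max_age': 60, 'fresh': False}
```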
Current situation
● The standard web development solution is an HTTP server plus a classic dynamic content generation system (PHP, ASP, Python, etc.), which has problems with:
– Long-lived requests and persistent connections
– The number of concurrently served clients
– Compatibility with other technologies
– Future development prospects
Event-driven programming frameworks
● Neither the idea nor its implementation is new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)
● Not widely used in web solutions
● Solve the problems of the standard technologies
● Reactor design pattern, the C10K problem
● Let web programmers build network solutions