Concurrent HTTPProxy with Caching
Ashwin BharambeAshwin Bharambe
Monday, Dec 4, 2006Monday, Dec 4, 2006
2
Outline
ParsingParsingSome quick hints
Threads Threads Review of the lecture
SynchronizationSynchronizationUsing semaphores; preview of Wed. lecture
Caching in the proxyCaching in the proxy
Questions?Questions?
3
Parsing a HTTP request
Things to keep in mindThings to keep in mindRead all lines of the request, not just the first rio_readlineb
Look for Host:, Connection: headers
How do you parse?How do you parse?strtok? Complex semantics Modifies the string passed as the argument
sscanf sscanf(“%s %s %s”, line, req, url, version)
Hand-coded strchr(‘ ’) and strdup
4
Allocating Buffer Space
Size of request is not known before-handSize of request is not known before-handClient can send an arbitrary number of headers
Size of response is not known before-handSize of response is not known before-handServer may not set a Content-Length headerSome servers set it incorrectly!
How do you allocate space beforehand, then?How do you allocate space beforehand, then?You cannot! Use realloc(), periodically adding more space
n = rio_readnb(…);if (used + n > alloced) { req = realloc (...); alloced += chunk_size;}
5
Concurrent servers
WebBrowser
WebServer
WebBrowser
WebBrowser
WebServer
WebServer
Proxy
Iterative servers can only serve one client at Iterative servers can only serve one client at a timea time
Concurrent servers handle multiple requests Concurrent servers handle multiple requests in parallelin parallel
6
Implementing concurrency1. Processes1. Processes
Fork a child process for every incoming client connectionDifficult to share data among child processes
2. Threads2. ThreadsCreate a thread to handle every incoming client connectionOur focus today
3. I/O multiplexing with Unix 3. I/O multiplexing with Unix select()select()Use select() to notice pending socket activity Manually interleave the processing of multiple open connectionsMore complex!
~ implement your own app-specific thread package!
7
Traditional view of a process
Process = process context + code, data, & Process = process context + code, data, & stackstack
shared libraries
run-time heap
0
read/write data
Program context: Data registers Condition codes Stack pointer (SP) Program counter (PC)Kernel context: VM structures Descriptor table brk pointer
Code, data, and stack
read-only code/data
stackSP
PC
brk
Process context
8
Alternate view of a process
Process = thread + code, data, & kernel Process = thread + code, data, & kernel contextcontext
shared libraries
run-time heap
0
read/write dataThread context: Data registers Condition codes Stack pointer (SP) Program counter (PC)
Code and Data
read-only code/data
stackSP
PC
brk
Thread (main thread)
Kernel context: VM structures Descriptor table brk pointer
9
A process with multiple threads
Multiple threads can be associated with a processMultiple threads can be associated with a processEach thread has its own logical control flow (instruction flow)Each thread shares the same code, data, and kernel contextEach thread has its own thread ID (TID)
shared libraries
run-time heap
0
read/write data
Shared code and data
read-only code/dataThread 1 context: Data registers Condition codes SP1 PC1
stack 1
Thread 1 (main thread)
Kernel context: VM structures Descriptor table brk pointer
Thread 2 context: Data registers Condition codes SP2 PC2
stack 2
Thread 2 (peer thread)
10
Threads vs. processes
How threads and processes are similarHow threads and processes are similarEach has its own logical control flow.Each can run concurrently.Each is context switched.
How threads and processes are differentHow threads and processes are differentThreads share code and data, processes (typically) do not.Threads are less expensive than processes.
Process control (creating and reaping) is twice as expensive as thread control.
Linux/Pentium III numbers:~20K cycles to create and reap a process.~10K cycles to create and reap a thread.
11
Posix threads (pthreads)Creating and reaping threadsCreating and reaping threadspthread_createpthread_joinpthread_detach
Determining your thread IDDetermining your thread IDpthread_self
Terminating threadsTerminating threadspthread_cancelpthread_exitexit [terminates all threads] return [terminates current thread]
12
Hello World, with pthreads
/* * hello.c - Pthreads "hello, world" program */#include "csapp.h"
void *thread(void *vargp);
int main() { pthread_t tid;
Pthread_create(&tid, NULL, thread, NULL); Pthread_join(tid, NULL); exit(0);}
/* thread routine */void *thread(void *vargp) { printf("Hello, world!\n"); return NULL;}
Thread attributes (usually NULL)
Thread arguments(void *p)
return value(void **p)
Upper case Pthread_xxx
checks errors
13
Hello World, with pthreads
main thread
main thread waits for peer
thread to terminate
exit() terminates
main thread and any peer threads
peer thread
call Pthread_create()
call Pthread_join()
Pthread_join() returns
printf()
return NULL;(peer threadterminates)
Pthread_create() returns
14
Thread-based echo serverint main(int argc, char **argv){ int listenfd, *connfdp, port, clientlen; struct sockaddr_in clientaddr; pthread_t tid;
if (argc != 2) { fprintf(stderr, "usage: %s <port>\n", argv[0]); exit(0); } port = atoi(argv[1]);
listenfd = open_listenfd(port); while (1) { clientlen = sizeof(clientaddr); connfdp = Malloc(sizeof(int)); *connfdp = Accept(listenfd,(SA *)&clientaddr,&clientlen); Pthread_create(&tid, NULL, thread, connfdp); }}
15
Thread-based echo server
/* thread routine */void *thread(void *vargp){ int connfd = *((int *)vargp);
Pthread_detach(pthread_self()); Free(vargp);
echo_r(connfd); /* thread-safe version of echo() */ Close(connfd); return NULL;}
?
pthread_detach() is recommended in the proxy lab
16
Issue 1: detached threads
A thread is eitherA thread is either joinablejoinable or or detacheddetachedJoinableJoinable thread can be reaped or killed by other threads. thread can be reaped or killed by other threads.
must be reaped (pthread_join) to free resources.
DetachedDetached thread can’t be reaped or killed by other thread can’t be reaped or killed by other threads.threads.
resources are automatically reaped on termination.
Default state is joinable.Default state is joinable.pthread_detach(pthread_self()) to make detached.
Why should we use detached threads?Why should we use detached threads?pthread_join() blocks the calling thread
17
Issue 2: avoid unintended sharing
What happens if we pass the address of connfd to What happens if we pass the address of connfd to the thread routine as in the following code?the thread routine as in the following code?
connfdp = Malloc(sizeof(int));*connfdp = Accept(listenfd,(SA *)&clientaddr,&clientlen);Pthread_create(&tid, NULL, thread, connfdp);
connfd = Accept(listenfd,(SA *)&clientaddr,&clientlen);Pthread_create(&tid, NULL, thread, (void *)&connfd);
18
Issue 3: thread-safe
Easy to share data structures between threadsEasy to share data structures between threadsBut we need to do this correctly!
Recall the shell lab:Recall the shell lab:Job data structuresShared between main process and signal handler
Synchronize multiple control flowsSynchronize multiple control flows
19
Synchronizing with semaphoresSemaphores are counters for resources Semaphores are counters for resources shared between threadsshared between threads
Non-negative integer synchronization variable
Two operations: P(s) & V(s)Two operations: P(s) & V(s)Atomic operationsP(s): [ while (s == 0) wait(); s--; ]V(s): [ s++; ]
If initial value of s == 1If initial value of s == 1Serves as a mutual exclusive lock
Just a very brief descriptionDetails in the next lecture
20
Sharing with POSIX semaphores
#include "csapp.h"#define NITERS 1000
unsigned int cnt; /* counter */sem_t sem; /* semaphore */
int main() { pthread_t tid1, tid2;
Sem_init(&sem, 0, 1);
/* create 2 threads and wait */ ...... exit(0);}
/* thread routine */void *count(void *arg){ int i;
for (i=0;i<NITERS;i++){ P(&sem); cnt++; V(&sem); } return NULL;}
21
Thread-safety of library functions
All functions in the Standard C Library are thread-safeAll functions in the Standard C Library are thread-safeExamples: malloc, free, printf, scanf
Most Unix system calls are thread-safeMost Unix system calls are thread-safewith a few exceptions:
Thread-unsafe function Reentrant versionasctime asctime_rctime ctime_rgethostbyaddr gethostbyaddr_rgethostbyname gethostbyname_rinet_ntoa (none)localtime localtime_rrand rand_r
22
Thread-unsafe functions: fixes
Return a ptr to a Return a ptr to a staticstatic variablevariable
Fixes: Fixes: 1. Rewrite code so caller passes pointer to struct
Issue: Requires changes in caller and callee
hostp = Malloc(...));gethostbyname_r(name, hostp, …);
struct hostent *gethostbyname(char name){ static struct hostent h; <contact DNS and fill in h> return &h;}
23
Thread-unsafe functions: fixes2. Lock-and-copy
Issue: Requires only simple changes in caller However, caller must free memory
struct hostent *gethostbyname_ts(char *p) { struct hostent *q = Malloc(...); P(&mutex); /* lock */ p = gethostbyname(name); *q = *p; /* copy */ V(&mutex); return q;}
24
Caching
What should you cache?What should you cache?Complete HTTP response Including headers
You don’t need to parse the response But real proxies do. Why?
If size(response) > MAX_OBJECT_SIZE, don’t If size(response) > MAX_OBJECT_SIZE, don’t cachecache
25
Cache Replacement
Least Recently Used (LRU)Least Recently Used (LRU)Evict the cache entry whose “access” timestamp is farthest into the past
When to evict?When to evict?When you have no space!Size(cache) + size(new_entry)
> MAX_CACHE_SIZEWhat is Size (cache)? Sum of size (cache_entries)
26
Cache Synchronization
A single cache is shared by all proxy threadsA single cache is shared by all proxy threadsMust carefully control access to the cache
What operations should be locked? What operations should be locked? add_cache_entryremove_cache_entrylookup_cache_entry
For the ambitious:For the ambitious:Multiple readers can peacefully co-existBut if a writer arrives, that thread MUST synchronize access with others
27
Summary
Threading is a clean and efficient way to Threading is a clean and efficient way to implement concurrent serverimplement concurrent server
We need to synchronize multiple threads for We need to synchronize multiple threads for concurrent accesses to shared variablesconcurrent accesses to shared variables
Semaphore is one way to do thisThread-safety is the difficult part of thread programming
Final review session:Final review session:Friday 1-2:30pm WeH 7500 (all TAs)