cgi programming with c


  • 8/8/2019 cgi programming with c


    Content

    Why CGI programming?
    A basic example
    Analysis of the example
    So what is CGI programming?
    Using a C program as a CGI script
    The Hello world test
    How to process a simple form
    Using METHOD="POST"
    Further reading

    Getting Started with CGI Programming in C

    This is an introduction to writing CGI programs in the C language. The reader is assumed to know the basics of C as well as how to write simple forms in HTML and to be able to install CGI scripts on a Web server. The principles are illustrated with very simple examples.

    Two important warnings:

    To avoid wasting your time, please check from applicable local documents or by contacting the local webmaster whether you can install and run CGI scripts written in C on the server. At the same time, please check how to do that in detail, specifically where you need to put your CGI scripts.

    This document was written to illustrate the idea of CGI scripting to C programmers. In practice, CGI programs are usually written in other languages, such as Perl, and for good reasons: except for very simple cases, CGI programming in C is clumsy and error-prone.

    Why CGI programming?

    As my document How to write HTML forms briefly explains, you need a server-side script in order to use HTML forms reliably. Typically, there are simple server-side scripts available for simple, common ways of processing form submissions, such as sending the data in text format by E-mail to a specified address.

    However, for more advanced processing, such as collecting data into a file or database, or retrieving information and sending it back, or doing some calculations with the submitted data, you will probably need to write a server-side script of your own.

    CGI is simply an interface between HTML forms and server-side scripts. It is not the only possibility; see the excellent tutorial How the web works: HTTP and CGI explained by Lars Marius Garshol for both an introduction to the concepts of CGI and notes on other possibilities.

    If someone suggests using JavaScript as an alternative to CGI, ask him to read my JavaScript and HTML: possibilities and caveats. Briefly, JavaScript is inherently unreliable, at least if not backed up with server-side scripting.

    A basic example

    The above-mentioned How the web works: HTTP and CGI explained is a great tutorial. The following introduction of mine is just another attempt to present the basics; please consult other sources if you get confused or need more information.

    Let us consider the following simple HTML form:

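    <form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi">
    <div><label>Multiplicand 1: <input name="m"></label></div>
    <div><label>Multiplicand 2: <input name="n"></label></div>
    <div><input type="submit" value="Multiply!"></div>
    </form>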

    It will look like the following on your current browser:

    Multiplicand 1:

    Multiplicand 2:

    Multiply!

    You can try it if you like. Just in case the server used isn't running and accessible when you try it, here's what you would get as the result:

    Multiplication results

    The product of 4 and 9 is 36.


    Analysis of the example

    We will now analyze how the example above works.

    Assume that you type 4 into one input field and 9 into another and then invoke submission, typically by clicking on a submit button. Your browser will send, by the HTTP protocol, a request to the server at www.cs.tut.fi. The browser picks up this server name from the value of the ACTION attribute, where it occurs as the host name part of a URL. (Quite often, the ACTION attribute refers, often using a relative URL, to a script on the same server as the document resides on, but this is not necessary, as this example shows.)

    When sending the request, the browser provides additional information, specifying a relative URL, in this case

    /cgi-bin/run/~jkorpela/mult.cgi?m=4&n=9

    This was constructed from that part of the ACTION value that follows the host name, by appending a question mark ? and the form data in a specifically encoded format.
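    The encoding that the browser applies to the form data can be sketched in C as follows. This is only an illustration of the format, not code from the original example: the function name append_field and its buffer handling are assumptions, and the field name is assumed to need no encoding. Letters, digits and a few punctuation characters are copied as such, a space becomes +, and any other byte becomes %XX.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends "name=value" to the query string already
   in buf (capacity cap), encoding the value the way a browser encodes
   form data. Returns 0 on success, -1 if the buffer is too small
   (the buffer content is then incomplete and should be discarded). */
int append_field(char *buf, size_t cap, const char *name, const char *value)
{
    size_t used = strlen(buf);
    int n;

    if (used >= cap)
        return -1;
    /* Fields after the first one are separated by '&' */
    n = snprintf(buf + used, cap - used, "%s%s=", used > 0 ? "&" : "", name);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    used += (size_t)n;

    for (; *value != '\0'; value++) {
        unsigned char c = (unsigned char)*value;

        if (cap - used < 4) {            /* need room for "%XX" and NUL */
            buf[used] = '\0';
            return -1;
        }
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '*')
            buf[used++] = (char)c;       /* copied unchanged */
        else if (c == ' ')
            buf[used++] = '+';           /* space becomes plus */
        else
            used += (size_t)sprintf(buf + used, "%%%02X", (unsigned int)c);
    }
    buf[used] = '\0';
    return 0;
}
```

    Starting from an empty buffer, calling append_field for m with value 4 and then for n with value 9 yields the string m=4&n=9 that appears after the question mark in the URL above.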

    The server to which the request was sent (in this case, www.cs.tut.fi) will then process it according to its own rules. Typically, the server's configuration defines how the relative URLs are mapped to file names and which directories/folders are interpreted as containing CGI scripts. As you may guess, the part cgi-bin/ in the URL causes such interpretation in this case. This means that instead of just picking up and sending back (to the browser that sent the request) an HTML document or some other file, the server invokes a script or a program specified in the URL (mult.cgi in this case) and passes some data to it (the data m=4&n=9 in this case).

    It depends on the server how this really happens. In this particular case, the server actually runs the (executable) program in the file mult.cgi in the subdirectory cgi-bin of user jkorpela's home directory. It could be something quite different, depending on server configuration.

    So what is CGI programming?

    The often-mystified abbreviation CGI, for Common Gateway Interface, refers just to a convention on how the invocation and parameter passing takes place in detail.

    Invocation means different things in different cases. For a Perl script, the server would invoke a Perl interpreter and make it execute the script in an interpretive manner. For an executable program, which has typically been produced by a compiler and a loader from a source program in a language like C, it would just be started as a separate process.

    Although the word script typically suggests that the code is interpreted, the term CGI script refers both to such scripts and to executable programs. See the answer to the question "Is it a script or a program?" in CGI Programming FAQ by Nick Kew.

    Using a C program as a CGI script

    In order to set up a C program as a CGI script, it needs to be turned into a binary executable program. This is often problematic, since people largely work on Windows whereas servers often run some version of UNIX or Linux. The system where you develop your program and the server where it should be installed as a CGI script may have quite different architectures, so that the same executable does not run on both of them.

    This may create an unsolvable problem. If you are not allowed to log on to the server and you cannot use a binary-compatible system (or a cross-compiler) either, you are out of luck. Many servers, however, allow you to log on and use the server in interactive mode, as a shell user, and contain a C compiler.

    You need to compile and load your C program on the server (or, in principle, on a system with the same architecture, so that binaries produced for it are executable on the server too).

    Normally, you would proceed as follows:

    1. Compile and test the C program in normal interactive use.

    2. Make any changes that might be needed for use as a CGI script. The program should read its input according to the intended form submission method. Using the default GET method, the input is to be read from the environment variable QUERY_STRING. (The program may also read data from files, but these must then reside on the server.) It should generate output on the standard output stream (stdout) so that it starts with suitable HTTP headers. Often, the output is in HTML format.

    3. Compile and test again. In this testing phase, you might set the environment variable QUERY_STRING so that it contains the test data as it will be sent as form data. E.g., if you intend to use a form where a field named foo contains the input data, you can give the command setenv QUERY_STRING "foo=42" (when using the tcsh shell) or export QUERY_STRING="foo=42" (when using the bash shell; without export, the variable would not be passed on to the program).

    4. Check that the compiled version is in a format that works on the server. This may require a recompilation. You may need to log on to the server computer (using Telnet, SSH, or some other terminal emulator) so that you can use a compiler there.

    5. Upload the compiled and loaded program, i.e. the executable binary program (and any data files needed), to the server.

    6. Set up a simple HTML document that contains a form for testing the script, etc.

    You need to put the executable into a suitable directory and name it according to server-specific conventions. Even the compilation commands needed here might differ from what you are used to on your workstation. For example, if the server runs some flavor of Unix and has the GNU C compiler available, you would typically use a compilation command like gcc -o mult.cgi mult.c and then move (mv) mult.cgi to a directory with a name like cgi-bin. Instead of gcc, you might need to use cc. You really need to check local instructions for such issues.

    The filename extension .cgi has no fixed meaning in general. However, there can be server-dependent (and operating system dependent) rules for naming executable files. Typical extensions for executables are .cgi and .exe.

    The Hello world test

    As usual when starting work with some new programming technology, you should probably first make a trivial program work. This avoids fighting with many potential problems at a time and lets you concentrate first on the issues specific to the environment, here CGI.

    You could use the following program that just prints Hello world but preceded by HTTP headers as required by the CGI interface. Here the header specifies that the data is plain ASCII text.

    #include <stdio.h>

    int main(void) {
      /* The header line, then an empty line that ends the headers */
      printf("Content-Type: text/plain;charset=us-ascii\n\n");
      printf("Hello world\n\n");
      return 0;
    }

    After compiling, loading, and uploading, you should be able to test the script simply by entering the URL in the browser's address bar. You could also make it the destination of a normal link in an HTML document. The URL of course depends on how you set things up; the URL for my installed Hello world script is the following:

    http://www.cs.tut.fi/cgi-bin/run/~jkorpela/hellow.cgi

    How to process a simple form

    For forms that use METHOD="GET" (as our simple example above uses, since this is the default), CGI specifications say that the data is passed to the script or program in an environment variable called QUERY_STRING.

    It depends on the scripting or programming language used how a program can access the value of an environment variable. In the C language, you would use the library function getenv (declared in the standard header stdlib.h) to access the value as a string. You might then use various techniques to pick up data from the string, convert parts of it to numeric values, etc.
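    One such technique for picking up data from the string can be sketched as follows. The helper names get_field and get_env_field are made up for this illustration; the sketch copies the value of one named field as it stands, still in encoded form.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: looks up the field `name` in a query string such
   as "m=4&n=9" and copies its (still encoded) value into out.
   Returns 0 on success, -1 if the field is missing or does not fit. */
int get_field(const char *query, const char *name, char *out, size_t outsize)
{
    size_t namelen = strlen(name);

    while (query != NULL && *query != '\0') {
        /* Each field ends at the next '&' or at the end of the string */
        size_t fieldlen = strcspn(query, "&");

        if (fieldlen > namelen && strncmp(query, name, namelen) == 0
                && query[namelen] == '=') {
            size_t vallen = fieldlen - namelen - 1;
            if (vallen >= outsize)
                return -1;
            memcpy(out, query + namelen + 1, vallen);
            out[vallen] = '\0';
            return 0;
        }
        query += fieldlen;
        if (*query == '&')
            query++;
    }
    return -1;
}

/* Convenience wrapper that reads the real environment variable;
   getenv returns NULL when QUERY_STRING is not set. */
int get_env_field(const char *name, char *out, size_t outsize)
{
    return get_field(getenv("QUERY_STRING"), name, out, outsize);
}
```

    Keeping the parsing in a function that takes the string as a parameter makes it possible to test it interactively without setting any environment variable.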

    The output from the script or program to the primary output stream (such as stdout in the C language) is handled in a special way. Effectively, it is directed so that it gets sent back to the browser. Thus, by writing a C program that writes an HTML document onto its standard output, you will make that document appear on the user's screen as a response to the form submission.

    In this case, the source program in C is the following:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
      char *data;
      long m,n;
      /* HTTP header, followed by an empty line */
      printf("%s%c%c\n",
        "Content-Type:text/html;charset=iso-8859-1",13,10);
      printf("<TITLE>Multiplication results</TITLE>\n");
      printf("<H3>Multiplication results</H3>\n");
      data = getenv("QUERY_STRING");
      if(data == NULL)
        printf("<P>Error! Error in passing data from form to script.");
      else if(sscanf(data,"m=%ld&n=%ld",&m,&n)!=2)
        printf("<P>Error! Invalid data. Data must be numeric.");
      else
        printf("<P>The product of %ld and %ld is %ld.",m,n,m*n);
      return 0;
    }

    As a disciplined programmer, you have probably noticed that the program makes no check against integer overflow, so it will return bogus results for very large operands. In real life, such checks would be needed, but such considerations would take us too far from our topic.
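    For readers who want to see what such a check could look like, here is a minimal sketch. It is not part of the original program; the function name checked_mul is an assumption, and the approach (comparing against LONG_MAX and LONG_MIN before multiplying) is just one common way to do it.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 and stores a*b in *result if the
   product fits in a long; returns 0 and leaves *result untouched if the
   multiplication would overflow. The checks use division so that the
   overflowing multiplication itself is never performed. */
int checked_mul(long a, long b, long *result)
{
    if (a > 0) {
        if (b > 0) {
            if (a > LONG_MAX / b) return 0;
        } else {
            if (b < LONG_MIN / a) return 0;
        }
    } else {
        if (b > 0) {
            if (a < LONG_MIN / b) return 0;
        } else {
            if (a != 0 && b < LONG_MAX / a) return 0;
        }
    }
    *result = a * b;
    return 1;
}
```

    The program could then print an error message instead of a product whenever checked_mul returns 0.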

    Note: The first printf function call prints out data that will be sent by the server as an HTTP header. This is required for several reasons, including the fact that a CGI script can send any data (such as an image or a plain text file) to the browser, not just HTML documents. For HTML documents, you can just use the printf function call above as such; however, if your character encoding is different from ISO 8859-1 (ISO Latin 1), which is the most common on the Web, you need to replace iso-8859-1 by the registered name of the encoding (charset) you use.
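    If several scripts need the header with different media types or encodings, the printf call can be wrapped in a small function. The following is a sketch only: the name print_content_type is an assumption, and the sketch ends lines with CR LF, which differs slightly from the printf call in the program above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "Content-Type: <type>;charset=<charset>"
   followed by the empty line that ends the headers.
   Returns 0 on success, -1 on a write error or if an argument contains
   a line break (which would allow extra header lines to slip in). */
int print_content_type(FILE *out, const char *media_type, const char *charset)
{
    if (strpbrk(media_type, "\r\n") != NULL || strpbrk(charset, "\r\n") != NULL)
        return -1;
    if (fprintf(out, "Content-Type: %s;charset=%s\r\n\r\n",
                media_type, charset) < 0)
        return -1;
    return 0;
}
```

    A script would call it before any other output, for example as print_content_type(stdout, "text/html", "utf-8") for a UTF-8 encoded HTML document.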

    I have compiled this program and saved the executable program under the name mult.cgi in my directory for CGI scripts at www.cs.tut.fi. This implies that any form with action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/mult.cgi" will, when submitted, be processed by that program.

    Consequently, anyone could write a form of his own with the same ACTION attribute and pass whatever data he likes to my program. Therefore, the program needs to be able to handle any data. Generally, you need to check the data before starting to process it.
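    As an illustration of checking the data, the following sketch accepts only a query string of the exact form m=<integer>&n=<integer>. It is stricter than the sscanf call in the program above, which ignores trailing garbage and does not detect out-of-range numbers. The function names are made up for this example, and strtol is used here instead of sscanf.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parses a decimal long at *p that must be followed by end_char.
   On success stores the value, moves *p past end_char (unless it is
   the terminating NUL) and returns 1; otherwise returns 0. */
static int parse_long(const char **p, char end_char, long *value)
{
    char *end;

    errno = 0;
    *value = strtol(*p, &end, 10);
    if (end == *p || errno == ERANGE || *end != end_char)
        return 0;
    *p = (end_char != '\0') ? end + 1 : end;
    return 1;
}

/* Hypothetical helper: returns 1 if query is exactly
   "m=<integer>&n=<integer>", storing the two numbers; 0 otherwise. */
int parse_operands(const char *query, long *m, long *n)
{
    if (query == NULL || strncmp(query, "m=", 2) != 0)
        return 0;
    query += 2;
    if (!parse_long(&query, '&', m))
        return 0;
    if (strncmp(query, "n=", 2) != 0)
        return 0;
    query += 2;
    return parse_long(&query, '\0', n);
}
```

    Note that strtol itself still accepts a leading sign and leading white space, so this is a sketch of the principle rather than a complete validator.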

    Using METHOD="POST"

    The idea of METHOD="POST"

    Let us consider next a different processing for form data. Assume that we wish to write a form that takes a line of text as input so that the form data is sent to a CGI script that appends the data to a text file on the server. (That text file could be readable by the author of the form and the script only, or it could be made readable to the world through another script.)

    It might seem that the problem is similar to the example considered above; one would just need a different form and a different script (program). In fact, there is a difference. The example above can be regarded as a pure query that does not change the state of the world. In particular, it is idempotent, i.e. the same form data could be submitted as many times as you like without causing any problems (except minor waste of resources). However, our current task needs to cause such changes: a change in the content of a file that is intended to be more or less permanent. Therefore, one should use METHOD="POST". This is explained in more detail in the document Methods GET and POST in HTML forms - what's the difference? Here we will take it for granted that METHOD="POST" needs to be used and we will consider the technical implications.

    For forms that use METHOD="POST", CGI specifications say that the data is passed to the script or program in the standard input stream (stdin), and the length (in bytes, i.e. characters) of the data is passed in an environment variable called CONTENT_LENGTH.
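    Since CONTENT_LENGTH arrives as a string, it has to be converted to a number and checked before use. A minimal sketch of that step, with an invented function name and an upper limit chosen by the caller, might look like this:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts the value of CONTENT_LENGTH to a number.
   Returns 0 and stores the length if lenstr is a decimal number in the
   range 0..max; returns -1 (leaving *len untouched) otherwise,
   including when the variable was not set at all (lenstr == NULL). */
int parse_content_length(const char *lenstr, long max, long *len)
{
    char *end;
    long value;

    if (lenstr == NULL || *lenstr == '\0')
        return -1;
    errno = 0;
    value = strtol(lenstr, &end, 10);
    if (errno == ERANGE || *end != '\0' || value < 0 || value > max)
        return -1;
    *len = value;
    return 0;
}
```

    A script would typically call it as parse_content_length(getenv("CONTENT_LENGTH"), limit, &len), where limit is the largest amount of data the script is prepared to accept.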

    Reading input

    Reading from standard input probably sounds simpler than reading from an environment variable, but there are complications. The server is not required to pass the data so that when the CGI script tries to read more data than there is, it would get an end of file indication! That is, if you read e.g. using the getchar function in a C program, it is undefined what happens after reading all the data characters; it is not guaranteed that the function will return EOF.


    When reading the input, the program must not try to read more than CONTENT_LENGTH characters.
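    The rule can be sketched as a small function that asks the input stream for exactly the announced number of bytes and nothing more. The name read_body is an assumption, and the sketch uses fread, which is a different choice from the fgets call in the sample program in this document.

```c
#include <stdio.h>

/* Hypothetical helper: reads exactly len bytes from in into buf and
   adds a terminating NUL. It never requests more than len bytes, so it
   does not depend on the server signalling end of file.
   Returns 0 on success, -1 if buf (capacity cap) is too small or
   fewer than len bytes could be read. */
int read_body(FILE *in, size_t len, char *buf, size_t cap)
{
    if (len >= cap)
        return -1;
    if (fread(buf, 1, len, in) != len)
        return -1;
    buf[len] = '\0';
    return 0;
}
```

    In a CGI script, in would be stdin and len the checked value of CONTENT_LENGTH.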

    Sample program: accept and append data

    A relatively simple C program for accepting input via CGI and METHOD="POST" is the following:

    #include <stdio.h>
    #include <stdlib.h>

    #define MAXLEN 80
    #define EXTRA 5
    /* 4 for field name "data", 1 for "=" */
    #define MAXINPUT (MAXLEN+EXTRA+2)
    /* 1 for added line break, 1 for trailing NUL */
    #define DATAFILE "../data/data.txt"

    /* Decode the encoded data between src and last into dest,
       then append a line break and a terminating NUL. */
    void unencode(char *src, char *last, char *dest)
    {
      for(; src < last; src++, dest++)
        if(*src == '+')
          *dest = ' ';
        else if(*src == '%') {
          unsigned int code;
          if(sscanf(src+1, "%2x", &code) != 1) code = '?';
          *dest = (char)code;
          src += 2; }
        else
          *dest = *src;
      *dest = '\n';
      *++dest = '\0';
    }

    int main(void)
    {
      char *lenstr;
      char input[MAXINPUT] = "", data[MAXINPUT];
      long len;
      printf("%s%c%c\n",
        "Content-Type:text/html;charset=iso-8859-1",13,10);
      printf("<TITLE>Response</TITLE>\n");
      lenstr = getenv("CONTENT_LENGTH");
      /* Reject a missing, malformed, too short or too long length */
      if(lenstr == NULL || sscanf(lenstr,"%ld",&len)!=1
         || len < EXTRA || len > MAXLEN)
        printf("<P>Error in invocation - wrong FORM probably.");
      else if(fgets(input, (int)len+1, stdin) == NULL)
        printf("<P>Error in reading the form data.");
      else {
        FILE *f;
        unencode(input+EXTRA, input+len, data);
        f = fopen(DATAFILE, "a");
        if(f == NULL)
          printf("<P>Sorry, cannot store your data.");
        else {
          fputs(data, f);
          fclose(f);  /* close the file only if it was opened */
          printf("<P>Thank you! Your contribution has been stored.");
        }
      }
      return 0;
    }

    Essentially, the program retrieves the information about the number of characters in the input from the value of the CONTENT_LENGTH environment variable. Then it unencodes (decodes) the data, since the data arrives in the specifically encoded format that was already mentioned. The program has been written for a form where the text input field has the name data (actually, just the length of the name matters here). For example, if the user types

    Hello there!

    then the data will be passed to the program encoded as

    data=Hello+there%21

    (with space encoded as + and exclamation mark encoded as %21). The unencode routine in the program converts this back to the original format. After that, the data is appended to a file (with a fixed file name), and an acknowledgement is sent back to the user.
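    The decoding step can also be written with explicit bounds checks on both the input and the output. The following sketch is a variant for illustration, not the routine of the program above: the names hex_value and url_decode are assumptions, it works on a length instead of a pair of pointers, it does not append a line break, and a malformed % sequence is turned into a single ? character.

```c
#include <stddef.h>

/* Returns the value of a hexadecimal digit, or -1 if ch is not one. */
static int hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes len bytes of encoded form data from src
   into dest (capacity cap) and adds a terminating NUL.
   Returns the decoded length, or -1 if dest is too small. */
int url_decode(const char *src, size_t len, char *dest, size_t cap)
{
    size_t i = 0, out = 0;

    while (i < len) {
        if (out + 1 >= cap)              /* keep room for the NUL */
            return -1;
        if (src[i] == '+') {
            dest[out++] = ' ';
            i++;
        } else if (src[i] == '%') {
            /* Both hex digits must lie inside the input */
            if (i + 2 < len && hex_value(src[i+1]) >= 0
                    && hex_value(src[i+2]) >= 0) {
                dest[out++] = (char)(hex_value(src[i+1]) * 16
                                     + hex_value(src[i+2]));
                i += 3;
            } else {
                dest[out++] = '?';
                i++;
            }
        } else {
            dest[out++] = src[i++];
        }
    }
    if (out >= cap)
        return -1;
    dest[out] = '\0';
    return (int)out;
}
```

    Applied to the 14 characters Hello+there%21, the sketch produces the 12 characters Hello there! again.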


    Having compiled the program, I have saved it as collect.cgi into the directory for CGI scripts. Now a form like the following can be used for data submissions:

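    <form action="http://www.cs.tut.fi/cgi-bin/run/~jkorpela/collect.cgi" method="POST">
    <div>Your input (80 chars max.):<br>
    <input name="data" maxlength="80"><br>
    <input type="submit" value="Send"></div>
    </form>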

    Sample program: view data stored on a file

    Finally, we can write a simple program for viewing the data; it only needs to copy the content of a given text file onto standard output:

    #include <stdio.h>
    #include <stdlib.h>

    #define DATAFILE "../data/data.txt"

    int main(void)
    {
      FILE *f = fopen(DATAFILE,"r");
      int ch;
      if(f == NULL) {
        /* Report the failure as a small HTML document */
        printf("%s%c%c\n",
          "Content-Type:text/html;charset=iso-8859-1",13,10);
        printf("<TITLE>Failure</TITLE>\n");
        printf("<P>Unable to open data file, sorry!"); }
      else {
        /* Copy the file to standard output as plain text */
        printf("%s%c%c\n",
          "Content-Type:text/plain;charset=iso-8859-1",13,10);
        while((ch=getc(f)) != EOF)
          putchar(ch);
        fclose(f); }
      return 0;
    }

    Notice that this program prints (when successful) the data as plain text, preceded by a header that says this, i.e. has text/plain instead of text/html.

    A form that invokes that program can be very simple, since no input data is needed:

    Finally, here's what the two forms look like. You can now test them:

    Form for submitting data

    Please notice that anything you submit here will become visible to the world:

    Your input (80 chars max.):

    Send

    Form for checking submitted data

    The content of the text file to which the submissions are stored will be displayed as plain text.

    View

    Even though the output is declared to be plain text, Internet Explorer may interpret it partly as containing HTML markup. Thus, if someone enters data that contains such markup, strange things would happen. The viewdata.c program takes this into account by writing the NUL character ('\0') after each occurrence of the less-than character <, so that it will not be taken (even by IE) as starting a tag.
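    The listing shown in this document does not include that step, so here is a sketch of how the copying loop might do it. The function name copy_defusing_tags and the returned count are assumptions made for this illustration, not taken from viewdata.c.

```c
#include <stdio.h>

/* Hypothetical helper: copies everything from in to out, writing a NUL
   character after each '<' so that a browser does not see the start of
   a tag. Returns the number of '<' characters that were found. */
long copy_defusing_tags(FILE *in, FILE *out)
{
    int ch;
    long count = 0;

    while ((ch = getc(in)) != EOF) {
        putc(ch, out);
        if (ch == '<') {
            putc('\0', out);
            count++;
        }
    }
    return count;
}
```

    In the viewing program, the while loop that calls putchar would be replaced by a call like copy_defusing_tags(f, stdout).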

    Further reading

    You may now wish to read The CGI specification, which tells you all the basic details about CGI. The next step is probably to see what the CGI Programming FAQ contains. Beware that it is relatively old.


    There is a lot of material, including introductions and tutorials, in the CGI Resource Index. Notice in particular the section Programs and Scripts: C and C++: Libraries and Classes, which contains libraries that can make it easier to process form data. It can be instructive to parse simple data formats by using code of your own, as was done in the simple examples above, but in practical applications a library routine might be better.

    The C language was originally designed for an environment where only ASCII characters were used. Nowadays, it can be used, with caution, for processing 8-bit characters. There are various ways to overcome the limitation that in C implementations, a character is generally an 8-bit quantity. See especially the last section in my book Unicode Explained.

    Date of last modification: 2010-06-16. This page belongs to division Web authoring and surfing, subdivision Forms in the free information site IT and communication by Jukka "Yucca" Korpela.
