CGI: Common Gateway Interface
_________________________________________________________________

Overview
_________________________________________________________________

The Common Gateway Interface (CGI) is a standard for interfacing external applications with information servers, such as HTTP or Web servers. A plain HTML document that the Web daemon retrieves is static, which means it exists in a constant state: a text file that doesn't change. A CGI program, on the other hand, is executed in real time, so that it can output dynamic information.

For example, let's say that you wanted to "hook up" your Unix database to the World Wide Web, to allow people from all over the world to query it. Basically, you need to create a CGI program that the Web daemon will execute. That program transmits information to the database engine, receives the results back again, and displays them to the client. This is an example of a gateway, and this is where CGI, currently version 1.1, got its origins.

The database example is a simple idea, but most of the time rather difficult to implement. There really is no limit as to what you can hook up to the Web. The only thing you need to remember is that whatever your CGI program does, it should not take too long to process. Otherwise, the user will just be staring at their browser waiting for something to happen.
_________________________________________________________________

Specifics
_________________________________________________________________

Since a CGI program is executable, it is basically the equivalent of letting the world run a program on your system, which isn't the safest thing to do. Therefore, there are some security precautions that need to be implemented when it comes to using CGI programs. Probably the one that will affect the typical Web user the most is the fact that CGI programs need to reside in a special directory, so that the Web server knows to execute the program rather than just display it to the browser.
This directory is usually under direct control of the webmaster, prohibiting the average user from creating CGI programs. There are other ways to allow access to CGI scripts, but it is up to your webmaster to set these up for you. At this point, you may want to contact them about the feasibility of allowing CGI access.

If you have a version of the NCSA HTTPd server distribution, you will see a directory called /cgi-bin. This is the special directory mentioned above where all of your CGI programs currently reside. A CGI program can be written in any language that allows it to be executed on the system, such as:

     * C/C++
     * Fortran
     * PERL
     * TCL
     * Any Unix shell
     * Visual Basic
     * AppleScript

It just depends on what you have available on your system. If you use a programming language like C or Fortran, you know that you must compile the program before it will run. If you look in the /cgi-src directory that came with the server distribution, you will find the source code for some of the CGI programs in the /cgi-bin directory. If, however, you use one of the scripting languages instead, such as PERL, TCL, or a Unix shell, the script itself only needs to reside in the /cgi-bin directory, since there is no associated source code. Many people prefer to write CGI scripts instead of programs, since they are easier to debug, modify, and maintain than a typical compiled program.
_________________________________________________________________

CGI - Common Gateway Interface
cgi@ncsa.uiuc.edu
----------

The Common Gateway Interface
_________________________________________________________________

The Common Gateway Interface, or CGI, is a standard for external gateway programs to interface with information servers such as HTTP servers. The current version is CGI/1.1.
_________________________________________________________________

CGI Documentation

If you have no idea what CGI is, you should read this introduction.

Once you have a basic idea of what CGI is and what you can use it for, you should read this primer, which will help you get started writing your own gateways.

If you are interested in handling the output of HTML forms with your CGI program, you will want to read this guide to handling forms with CGI programs.

Security is a crucial issue when writing CGI programs. Please read these tips on how to write CGI programs that do not allow malicious users to abuse them.

When you get more advanced, you should read the interface specification, which will help you utilize CGI to the fullest extent. If you are a server software author, it will help you add CGI compliance to your information server.

There is now also a tutorial for writing ErrorDocument handling CGI scripts.
_________________________________________________________________

Examples of CGI behavior and programs

You may wish to look at this page of examples, which demonstrates how the client URL affects the interface variables.

We have created an archive of CGI programs on our FTP server. These programs were written by various people around the world in a variety of programming languages. Some of the entries are libraries which may make writing your CGI program easier. You can also search the CGI documentation contained herein.

If you would like to submit one of your CGI programs to the archive, you should first package it with any documentation, copyright notices, etc. Then, upload it to hoohoo.ncsa.uiuc.edu into the directory /incoming/cgi and send mail to cgi@ncsa.uiuc.edu with a short description of what the file is.
_________________________________________________________________
----------

The Common Gateway Interface

After reading this document, you should have an overall idea of what a CGI program needs to do to function.
_________________________________________________________________

How do I get information from the server?

Each time a client requests the URL corresponding to your CGI program, the server will execute it in real time. The output of your program will go more or less directly to the client.

A common misconception about CGI is that you can send command-line options and arguments to your program, such as

     command% myprog -qa blorf

CGI uses the command line for other purposes, and thus this is not directly possible. Instead, CGI uses environment variables to send your program its parameters. The two major environment variables you will use for this purpose are:

     * QUERY_STRING
       QUERY_STRING is defined as anything which follows the first ? in the URL. This information could be added either by an ISINDEX document or by an HTML form (with the GET action). It could also be manually embedded in an HTML anchor which references your gateway. This string will usually be an information query, i.e. what the user wants to search for in the archie databases, or perhaps the encoded results of your feedback GET form.

       This string is encoded in the standard URL format: spaces are changed to +, and special characters are encoded with %xx hexadecimal encoding. You will need to decode it in order to use it.

       If your gateway is not decoding results from a FORM, you will also get the query string decoded for you onto the command line. This means that each word of the query string will be in a different element of ARGV. For example, the query string "forms rule" would be given to your program with argv[1]="forms" and argv[2]="rule". If you choose to use this, you do not need to do any processing on the data before using it.

     * PATH_INFO
       CGI allows for extra information to be embedded in the URL for your gateway, which can be used to transmit extra context-specific information to the scripts. This information is usually made available as "extra" information after the path of your gateway in the URL. This information is not encoded by the server in any way.
       The most useful example of PATH_INFO is transmitting file locations to the CGI program. To illustrate this, let's say I have a CGI program on my server called /cgi-bin/foobar that can process files residing in the DocumentRoot of the server. I need to be able to tell foobar which file to process. By including extra path information at the end of the URL, foobar will know the location of the document relative to the DocumentRoot via the PATH_INFO environment variable. It can also learn the actual path to the document via the PATH_TRANSLATED environment variable, which the server generates for you. For example, a request for /cgi-bin/foobar/dir1/file.html would run foobar with PATH_INFO set to /dir1/file.html.
_________________________________________________________________

How do I send my document back to the client?

I have found that the most common error in beginners' CGI programs is not properly formatting the output so the server can understand it.

CGI programs can return a myriad of document types. They can send back an image to the client, an HTML document, a plaintext document, or perhaps even an audio clip. They can also return references to other documents. The client must know what kind of document you're sending it so it can present it accordingly. In order for the client to know this, your CGI program must tell the server what type of document it is returning.

In order to tell the server what kind of document you are sending back, whether it be a full document or a reference to one, CGI requires you to place a short header on your output. This header is ASCII text, consisting of lines separated by either linefeeds or carriage returns (or both), followed by a single blank line. The output body then follows in whatever native format.

     * A full document with a corresponding MIME type
       In this case, you must tell the server what kind of document you will be outputting via a MIME type. Common MIME types are things such as text/html for HTML, and text/plain for straight ASCII text.
       For example, to send back HTML to the client, your output should read:

          Content-type: text/html

          <HEAD>
          <TITLE>output of HTML from CGI script</TITLE>
          </HEAD>
          <BODY>
          <H1>Sample output</H1>
          What do you think of <STRONG>this?</STRONG>
          </BODY>

     * A reference to another document
       Instead of outputting the document, you can just tell the browser where to get the new one, or have the server automatically output the new one for you.

       For example, say you want to reference a file on your Gopher server. In this case, you should know the full URL of what you want to reference and output something like:

          Content-type: text/html
          Location: gopher://httprules.foobar.org/0

          <HEAD>
          <TITLE>Sorry...it moved</TITLE>
          </HEAD>
          <BODY>
          <H1>Go to gopher instead</H1>
          Now available at
          <A HREF="gopher://httprules.foobar.org/0">a new location</A>
          on our gopher server.
          </BODY>

       However, today's browsers are smart enough to automatically throw you to the new document without your ever seeing the above. If you get lazy and don't want to output the above HTML, NCSA HTTPd will output a default one for you to support older browsers.

       If you want to reference another file (not protected by access authentication) on your own server, you don't have to do nearly as much work. Just output a partial (virtual) URL, such as the following:

          Location: /dir1/dir2/myfile.html

       The server will act as if the client had not requested your script, but instead requested http://yourserver/dir1/dir2/myfile.html. It will take care of most everything, such as looking up the file type and sending the appropriate headers. Just be sure that you output the blank line after the Location header.

       If you do want to reference a document that is protected by access authentication, you will need to have a full URL in the Location:, since the client and the server need to re-transact to establish that you have access to the referenced document.

Advanced usage: If you would like to output headers such as Expires or Content-encoding, you can if your server is compatible with CGI/1.1. Just output them along with Location or Content-type and they will be sent back to the client.
_________________________________________________________________
----------

The CGI Specification
_________________________________________________________________

This is the specification for CGI version 1.1, or CGI/1.1. Further revisions of this protocol are guaranteed to be backward compatible.

The server and the CGI script communicate in four major ways, each described in its own section:

     * Environment variables
     * The command line
     * Standard input
     * Standard output
_________________________________________________________________
----------

Configuration
_________________________________________________________________

These rules apply to all of HTTPd's configuration files.

     * Case insensitive
       Except where pathnames are involved, these files are not case sensitive.

     * Comment lines begin with #
       Lines which should be ignored begin with #, the hash sign. This must be the first character on the line. Comments must be on a line by themselves.

     * One directive per line
       Each line of these files consists of:

          Directive data [data2 ... datan]

       Directive is a keyword HTTPd recognizes, followed by whitespace. data is specific to the directive. Any additional data entries should be separated by whitespace.

     * Extra whitespace is ignored
       You can put extra spaces or tabs between Directive and data. To embed a space in data without separating it from any subsequent arguments, use a \ character before the space.
----------

CGI Command line options
_________________________________________________________________

Specification

The command line is only used in the case of an ISINDEX query. It is not used in the case of an HTML form or any as yet undefined query type. The server should search the query information (the QUERY_STRING environment variable) for a non-encoded = character to determine whether the command line is to be used. If it finds one, the command line is not to be used. This trusts the clients to encode the = sign in ISINDEX queries, a practice which was considered safe at the time of the design of this specification.

For example, use the finger script and the ISINDEX interface to look up "httpd". You will see that the script will call itself with /cgi-bin/finger?httpd, will actually execute "finger httpd" on the command line, and will output the results to you.
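The ISINDEX rule can be sketched in a few lines of Python. This is a hedged illustration, not NCSA HTTPd's own code. The function name `isindex_argv` is hypothetical, and %xx decoding uses the standard library's `urllib.parse.unquote`. The sketch ignores the case where the server cannot build a command line because of exec() or /bin/sh limits.

```python
from urllib.parse import unquote


def isindex_argv(query_string):
    """Return the argv words for an ISINDEX query, or None if the
    command line should not be used (hypothetical helper name)."""
    # An unencoded '=' marks form data: no command line, no decoding.
    if not query_string or "=" in query_string:
        return None
    # Words are separated by '+' (encoded spaces); decode %xx per word,
    # so an encoded plus (%2B) stays inside its word.
    return [unquote(word) for word in query_string.split("+")]


print(isindex_argv("httpd"))       # ['httpd'], as in "finger httpd"
print(isindex_argv("forms+rule"))  # ['forms', 'rule']
print(isindex_argv("httpd=name"))  # None
```

The ordering is the design choice that matters here. Splitting on + before decoding keeps a literal plus sign, sent as %2B, from being mistaken for a word separator.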

If the server does find a "=" in the QUERY_STRING, then the command line will not be used, and no decoding will be performed. The query then remains intact for processing by an appropriate FORM submission decoder. Again, as an example, try submitting "httpd=name" to the finger script. Since this QUERY_STRING contains an unencoded "=", nothing is decoded. The script doesn't know it is being submitted a valid query, and just gives you the default finger form.

If the server finds that it cannot send the string due to internal limitations (such as exec() or /bin/sh command line restrictions), the server should include NO command line information and provide the non-decoded query information in the environment variable QUERY_STRING.
_________________________________________________________________

Examples

Examples of the command line usage are much better demonstrated than explained. For these examples, pay close attention to the script output, which says what argc and argv are.
_________________________________________________________________
----------

CGI Environment Variables
_________________________________________________________________

In order to pass data about the information request from the server to the script, the server uses command line arguments as well as environment variables. These environment variables are set when the server executes the gateway program.
_________________________________________________________________

Specification

The following environment variables are not request-specific and are set for all requests:

     * SERVER_SOFTWARE
       The name and version of the information server software answering the request (and running the gateway). Format: name/version

     * SERVER_NAME
       The server's hostname, DNS alias, or IP address as it would appear in self-referencing URLs.

     * GATEWAY_INTERFACE
       The revision of the CGI specification to which this server complies. Format: CGI/revision
_________________________________________________________________

The following environment variables are specific to the request being fulfilled by the gateway program:

     * SERVER_PROTOCOL
       The name and revision of the information protocol this request came in with. Format: protocol/revision

     * SERVER_PORT
       The port number to which the request was sent.

     * REQUEST_METHOD
       The method with which the request was made. For HTTP, this is "GET", "HEAD", "POST", etc.

     * PATH_INFO
       The extra path information, as given by the client. In other words, scripts can be accessed by their virtual pathname, followed by extra information at the end of this path. The extra information is sent as PATH_INFO. If this information comes from a URL, the server should decode it before passing it to the CGI script.

     * PATH_TRANSLATED
       The server provides a translated version of PATH_INFO, which takes the path and does any virtual-to-physical mapping to it.

     * SCRIPT_NAME
       A virtual path to the script being executed, used for self-referencing URLs.

     * QUERY_STRING
       The information which follows the ? in the URL which referenced this script. This is the query information. It should not be decoded in any fashion. This variable should always be set when there is query information, regardless of command line decoding.

     * REMOTE_HOST
       The hostname making the request. If the server does not have this information, it should set REMOTE_ADDR and leave this unset.

     * REMOTE_ADDR
       The IP address of the remote host making the request.

     * AUTH_TYPE
       If the server supports user authentication, and the script is protected, this is the protocol-specific authentication method used to validate the user.

     * REMOTE_USER
       If the server supports user authentication, and the script is protected, this is the username they have authenticated as.

     * REMOTE_IDENT
       If the HTTP server supports RFC 931 identification, then this variable will be set to the remote user name retrieved from the server. Usage of this variable should be limited to logging only.

     * CONTENT_TYPE
       For queries which have attached information, such as HTTP POST and PUT, this is the content type of the data.

     * CONTENT_LENGTH
       The length of the said content as given by the client.
_________________________________________________________________

In addition to these, the header lines received from the client, if any, are placed into the environment with the prefix HTTP_ followed by the header name. Any - characters in the header name are changed to _ characters. The server may exclude any headers which it has already processed, such as Authorization, Content-type, and Content-length. If necessary, the server may choose to exclude any or all of these headers if including them would exceed any system environment limits.

An example of this is the HTTP_ACCEPT variable which was defined in CGI/1.0. Another example is the header User-Agent.

     * HTTP_ACCEPT
       The MIME types which the client will accept, as given by HTTP headers. Other protocols may need to get this information from elsewhere. Each item in this list should be separated by commas as per the HTTP spec. Format: type/subtype, type/subtype

     * HTTP_USER_AGENT
       The browser the client is using to send the request. General format: software/version library/version.
_________________________________________________________________

Examples

Examples of the setting of environment variables are really much better demonstrated than explained.
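The naming rule for client headers is mechanical, so a short Python sketch may help when reading a script's environment. Both function names are hypothetical. Upper-casing the name is an assumption drawn from the HTTP_ACCEPT and HTTP_USER_AGENT examples. The sample environment is made up rather than taken from a real request.

```python
def header_to_env_name(header):
    """Map a client header name to its CGI variable name (hypothetical
    helper): prefix HTTP_, change '-' to '_', upper-case the result."""
    return "HTTP_" + header.upper().replace("-", "_")


def client_headers(environ):
    """Collect the HTTP_* variables from a mapping such as os.environ."""
    return {name: value for name, value in environ.items()
            if name.startswith("HTTP_")}


# A made-up environment standing in for os.environ during a request.
env = {
    "REQUEST_METHOD": "GET",
    "HTTP_USER_AGENT": "NCSA_Mosaic/2.0",
    "HTTP_ACCEPT": "text/html, text/plain",
}
print(header_to_env_name("User-Agent"))  # HTTP_USER_AGENT
print(sorted(client_headers(env)))       # ['HTTP_ACCEPT', 'HTTP_USER_AGENT']
```

In a real script you would pass os.environ to client_headers. Keep in mind that the server may have left some headers out, so a missing HTTP_ variable does not prove the client omitted that header.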
_________________________________________________________________
----------

CGI Script Input
_________________________________________________________________

Specification

For requests which have information attached after the header, such as HTTP POST or PUT, the information will be sent to the script on stdin. The server will send CONTENT_LENGTH bytes on this file descriptor. Remember that it will give the CONTENT_TYPE of the data as well. The server is in no way obligated to send end-of-file after the script reads CONTENT_LENGTH bytes.
_________________________________________________________________

Example

Let's take a form with METHOD="POST" as an example. Let's say the form results are 7 bytes encoded, and look like a=b&b=c. In this case, the server will set CONTENT_LENGTH to 7 and CONTENT_TYPE to application/x-www-form-urlencoded. The first byte on the script's standard input will be "a", followed by the rest of the encoded string.
_________________________________________________________________
----------

CGI Script Output
_________________________________________________________________

Script output

The script sends its output to stdout. This output can either be a document generated by the script, or instructions to the server for retrieving the desired output.
_________________________________________________________________

Script naming conventions

Normally, scripts produce output which is interpreted and sent back to the client. An advantage of this is that the scripts do not need to send a full HTTP/1.0 header for every request.

Some scripts may want to avoid the extra overhead of the server parsing their output, and talk directly to the client.
In order to distinguish these scripts from the other scripts, CGI requires that the script name begin with nph- if a script does not want the server to parse its header. In this case, it is the script's responsibility to return a valid HTTP/1.0 (or HTTP/0.9) response to the client.
_________________________________________________________________

Parsed headers

The output of scripts begins with a small header. This header consists of text lines, in the same format as an HTTP header, terminated by a blank line (a line with only a linefeed or CR/LF). Any headers which are not server directives are sent directly back to the client. Currently, this specification defines three server directives:

     * Content-type
       This is the MIME type of the document you are returning.

     * Location
       This is used to specify to the server that you are returning a reference to a document rather than an actual document.

       If the argument to this is a URL, the server will issue a redirect to the client.

       If the argument to this is a virtual path, the server will retrieve the document specified as if the client had requested that document originally. ? (query string) directives will work in here, but # (anchor) directives must be redirected back to the client.

     * Status
       This is used to give the server an HTTP/1.0 status line to send to the client. The format is nnn xxxxx, where nnn is the 3-digit status code, and xxxxx is the reason string, such as "Forbidden".
_________________________________________________________________

Examples

Let's say I have a fromgratz to HTML converter. When my converter is finished with its work, it will output the following on stdout (note that the lines beginning and ending with --- are just for illustration and would not be output):

     --- start of output ---
     Content-type: text/html

     --- end of output ---

Note the blank line after Content-type.
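The parsed-header rules can be captured in a few lines. This is a minimal Python sketch, not part of the specification. `cgi_response` is a hypothetical helper name, and it ends header lines with bare linefeeds, which the header format allows. The HTML body in the usage is made up.

```python
import sys


def cgi_response(headers, body=""):
    """Build parsed-header CGI output: header lines, one blank line, body.

    `headers` is a list of (name, value) pairs, e.g. ("Content-type",
    "text/html"), ("Location", "/path/doc.txt") or ("Status", "403 Forbidden").
    """
    head = "".join("%s: %s\n" % (name, value) for name, value in headers)
    return head + "\n" + body  # the lone "\n" is the terminating blank line


# A full document with its MIME type (the body here is made up):
page = cgi_response([("Content-type", "text/html")], "<H1>Converted</H1>\n")

# A reference to a document on this server; no body is needed:
moved = cgi_response([("Location", "/path/doc.txt")])
print(repr(moved))  # 'Location: /path/doc.txt\n\n'

sys.stdout.write(page)  # a real script writes its output to stdout
```

Building the whole header before writing anything makes it harder to forget the blank line, which is the most common formatting mistake in beginners' scripts.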

Now, let's say I have a script which, in certain instances, wants to return the document /path/doc.txt from this server just as if the user had actually requested http://server:port/path/doc.txt to begin with. In this case, the script would output:

     --- start of output ---
     Location: /path/doc.txt

     --- end of output ---

The server would then perform the request and send it to the client.

Let's say that I have a script which wants to reference our gopher server. In this case, if the script wanted to refer the user to gopher://gopher.ncsa.uiuc.edu/, it would output:

     --- start of output ---
     Location: gopher://gopher.ncsa.uiuc.edu/

     --- end of output ---

Finally, I have a script which wants to talk to the client directly. In this case, if the script is referenced with SERVER_PROTOCOL of HTTP/1.0, the script would output the following HTTP/1.0 response:

     --- start of output ---
     HTTP/1.0 200 OK
     Server: NCSA/1.0a6
     Content-type: text/plain

     This is a plaintext document generated on the fly just for you.
     --- end of output ---
_________________________________________________________________
----------

An error script for NCSA HTTPd 1.4

Error scripts have extra environment variables passed to them, in addition to all of the CGI 1.1 variables. These are:

     REDIRECT_REQUEST
          This is the request exactly as it was sent to the server.
     REDIRECT_URL
          This is the requested URL that caused the error.
     REDIRECT_STATUS
          This is the status number and message that NCSA HTTPd would have sent if it had been allowed to reply.

In addition, NCSA HTTPd passes the error string that it generated as the QUERY_STRING, in the form

     err_string=error_message

Some error messages might require headers beyond those in the CGI specification. For that reason, the following example is an nph (non-parsed headers) script, written in Perl.
_________________________________________________________________

#!/usr/local/bin/perl
# Non-parsed headers CGI 1.1 error script in Perl to handle error requests
# from NCSA HTTPd 1.4 via ErrorDocument. This should handle all errors in
# almost the same fashion as NCSA HTTPd 1.4 would internally.
#
# This script is in the Public Domain. NCSA and the author offer no
# guarantees nor claim any responsibility for it. That's as pseudo-legalese
# as I get.
#
# This script doesn't do any encryption or authentication, nor does it
# contain hooks to do so.
#
# This was written for Perl 4.016. I've heard rumours about it working with
# other versions, but I'm no Perl hacker, so how would I know?
#
# Brandon Long / NCSA HTTPd Development Team / Software Development Group
# National Center for Supercomputing Applications / University of Illinois
#
# For more information:
#   NCSA HTTPd    : http://hoohoo.ncsa.uiuc.edu/docs/
#   CGI 1.1       : http://hoohoo.ncsa.uiuc.edu/cgi/
#   ErrorDocument : http://hoohoo.ncsa.uiuc.edu/docs/setup/srm/ErrorDocument.html
#   Example CGI   : http://hoohoo.ncsa.uiuc.edu/cgi/ErrorCGI.html

# The server passes its own error text as QUERY_STRING, in the form
# err_string=error_message; strip the name to keep just the message.
$error = $ENV{'QUERY_STRING'};
$error =~ s/^err_string=//;

# REDIRECT_REQUEST is the original request line, e.g. "GET /x HTTP/1.0";
# only the method is needed here.
$redirect_request = $ENV{'REDIRECT_REQUEST'};
($redirect_method) = split(' ', $redirect_request);

# REDIRECT_STATUS is what the server would have sent, e.g. "404 Not Found".
$redirect_status = $ENV{'REDIRECT_STATUS'};
if (!defined($redirect_status)) {
    $redirect_status = "200 OK";
}
($redirect_number) = split(' ', $redirect_status);

# A HEAD request gets the headers only, never a body.
$head_only = ($redirect_method eq "HEAD") ? 1 : 0;

# As an nph- script, we must send the full status line and headers ourselves.
printf("%s %s\r\n", $ENV{'SERVER_PROTOCOL'}, $redirect_status);
printf("Server: %s\r\n", $ENV{'SERVER_SOFTWARE'});
printf("Content-type: text/html\r\n");

if ($redirect_number == 302) {
    # Make a partial URL absolute by prefixing this server's name and port.
    if ($error !~ /^http:/) {
        $error = sprintf("http://%s:%s%s",
                         $ENV{'SERVER_NAME'}, $ENV{'SERVER_PORT'}, $error);
    }
    # The extra \r\n is the blank line that ends the header.
    printf("Location: %s\r\n\r\n", $error);
    if (!$head_only) {
        &print_top;
        printf("This document has moved <a href=\"%s\">here</a>.\r\n",
               &escape_html($error));
    }
}
elsif ($redirect_number == 400) {
    printf("\r\n");
    if (!$head_only) {
        &print_top;
        printf("Your client sent a request that this server didn't ");
        printf("understand.<p>Reason: %s\r\n", &escape_html($error));
    }
}
elsif ($redirect_number == 401) {
    printf("WWW-Authenticate: %s\r\n\r\n", $error);
    if (!$head_only) {
        &print_top;
        printf("Browser not authentication-capable or ");
        printf("authentication failed.\r\n");
    }
}
elsif ($redirect_number == 403) {
    printf("\r\n");
    if (!$head_only) {
        &print_top;
        printf("Your client does not have permission to get URL %s ",
               &escape_html($ENV{'REDIRECT_URL'}));
        printf("from this server.\r\n");
    }
}
elsif ($redirect_number == 404) {
    printf("\r\n");
    if (!$head_only) {
        &print_top;
        printf("The requested URL %s was not found on this server.\r\n",
               &escape_html($ENV{'REDIRECT_URL'}));
    }
}
elsif ($redirect_number == 500) {
    printf("\r\n");
    if (!$head_only) {
        &print_top;
        printf("The server encountered an internal error or ");
        printf("misconfiguration and was unable to complete your ");
        printf("request \"%s\".\r\n", &escape_html($redirect_request));
    }
}
elsif ($redirect_number == 501) {
    printf("\r\n");
    if (!$head_only) {
        &print_top;
        printf("The server is unable to perform the method %s ",
               &escape_html($redirect_method));
        printf("at this time.\r\n");
    }
}
else {
    printf("\r\n");
    if (!$head_only) {
        &print_top;
    }
}

if (!$head_only) {
    printf("<hr><p>The following might be useful in determining the problem:");
    printf("<pre>\r\n");
    # Dump the CGI environment. Use print, not printf: values may contain %.
    open(ENV, "env|");
    while (<ENV>) {
        print &escape_html($_);
    }
    close(ENV);
    printf("</pre>\r\n<hr>\r\n");
    printf("<a href=\"http://%s:%s/\">Back to Root of Server</a>\r\n",
           $ENV{'SERVER_NAME'}, $ENV{'SERVER_PORT'});
    printf("<hr><address><a href=\"mailto:webmaster\@%s\">webmaster\@%s</a> / ",
           $ENV{'SERVER_NAME'}, $ENV{'SERVER_NAME'});
    printf("<a href=\"mailto:httpd\@ncsa.uiuc.edu\">httpd\@ncsa.uiuc.edu</a>");
    printf("</address></body>\r\n");
}

# Print the <head> and the status heading shared by every error page.
sub print_top {
    printf("<head><title>%s</title></head>\r\n", $redirect_status);
    printf("<body><h1>%s</h1>\r\n", $redirect_status);
}

# Escape characters that are special in HTML before echoing request data.
sub escape_html {
    local($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s;
}
_________________________________________________________________

NCSA HTTPd Development Team / cgi@ncsa.uiuc.edu / Last Modified 6-28-95
----------

ErrorDocument directive

Purpose

The ErrorDocument directive points the server to a file to send in place of the built-in error message.
__________________________________________________________

Syntax

     ErrorDocument type filename

Where type is one of:

     + 302 - REDIRECT
     + 400 - BAD_REQUEST
     + 401 - AUTH_REQUIRED
     + 403 - FORBIDDEN
     + 404 - NOT_FOUND
     + 500 - SERVER_ERROR
     + 501 - NOT_IMPLEMENTED

And filename is a CGI script or text/html file with the full path from the document root.

CGI scripts launched via these errors have 3 new environment variables: REDIRECT_REQUEST, REDIRECT_URL and REDIRECT_STATUS. They also take as input the error reason, in the form err_string=error_reason. For an example, see the error script above.
__________________________________________________________

File

     srm.conf
__________________________________________________________

Default

If this directive is left out, the compiled-in error messages will be used.
__________________________________________________________

Examples

     ErrorDocument 403 /cgi-bin/notallowed.cgi
     ErrorDocument 404 /cgi-bin/nph-error.pl
     ErrorDocument 500 /serverError.html
     ErrorDocument 501 /error/notImplemented.html
_________________________________________________________________

NCSA HTTPd Development Team / httpd@ncsa.uiuc.edu / Last Modified 7-12-95
----------

Decoding FORMs with CGI

If you are unfamiliar with forms or how to write them, we suggest you look at this guide to fill-out forms.
They're just plain HTML, and pretty easy to do. Decoding them is another story...
_________________________________________________________________

Where do I get the form data from?

As you now know, there are two methods which can be used to access your forms. These methods are GET and POST. Depending on which method you used, you will receive the encoded results of the form in a different way.

     * The GET method
       If your form has METHOD="GET" in its FORM tag, your CGI program will receive the encoded form input in the environment variable QUERY_STRING.

     * The POST method
       If your form has METHOD="POST" in its FORM tag, your CGI program will receive the encoded form input on stdin. The server will NOT send you an EOF at the end of the data. Instead, you should use the environment variable CONTENT_LENGTH to determine how much data you should read from stdin.
_________________________________________________________________

But what does it all mean? How do I decode the form data?

When you write a form, each of your input items has a NAME attribute. When the user places data in these items in the form, that information is encoded into the form data. The data the user gives each input item is called its value.

Form data is a stream of name=value pairs separated by the & character. Each name=value pair is URL encoded, i.e. spaces are changed into plusses and some characters are encoded into hexadecimal.

Because others have been presented with this problem as well, there are already a number of programs which will do this decoding for you. The following are links into the CGI archive; clicking on them will retrieve the software package being referred to.

     * The Bourne Shell: The AA archie gateway. Contains calls to sed and awk which convert a GET form data string into separate environment variables.

     * C: The default scripts for NCSA httpd.
    While I won't win any awards for verbosity in documenting my code, there are C routines and example programs you can use to translate the query string into a group of structures.
  * PERL: The PERL CGI-lib. This package contains a group of useful PERL routines to decode forms.
  * PERL5: CGI.pm, a Perl 5 library for handling forms in CGI scripts. With just a handful of calls, you can parse CGI queries, create forms, and maintain the state of the buttons on the form from invocation to invocation.
  * TCL: TCL argument processor. This is a set of TCL routines to retrieve form data and place it into TCL variables.

The basic procedure is to split the data at the ampersands. Then, for each name=value pair you get from this, URL-decode the name, then the value, and do what you like with them.
_________________________________________________________________

----------

Writing secure CGI scripts

Any time a program interacts with a networked client, there is the possibility of that client attacking the program to gain unauthorized access. Even the most innocent-looking script can be very dangerous to the integrity of your system. With that in mind, we would like to present a few guidelines for making sure your program does not come under attack.
_________________________________________________________________

  * Beware the eval statement
    Languages like PERL and the Bourne shell provide an eval command which allows you to construct a string and have the interpreter execute that string. This can be very dangerous. Observe the following statement in the Bourne shell:

      eval `echo $QUERY_STRING | awk 'BEGIN{RS="&"} {printf "QS_%s\n",$1}' `

    This clever little snippet takes the query string and converts it into a set of variable assignment commands. Unfortunately, this script can be attacked by sending it a query string which starts with a ;.
    See what I mean about innocent-looking scripts being dangerous?

  * Do not trust the client to do anything
    A well-behaved client will escape any characters which have special meaning to the Bourne shell in a query string, and thus avoid problems with your script misinterpreting the characters. A mischievous client may use special characters to confuse your script and gain unauthorized access.

  * Be careful with popen and system
    If you use any data from the client to construct a command line for a call to popen() or system(), be sure to place backslashes before any characters that have special meaning to the Bourne shell before calling the function. This can be achieved easily with a short C function.

  * Turn off server-side includes
    If your server is unfortunate enough to support server-side includes, turn them off for your script directories! Server-side includes can be abused by clients which prey on scripts that directly output things they have been sent.

For a more comprehensive summary of security and the World Wide Web, see the WWW Security FAQ.
_________________________________________________________________

----------

Client Side State - HTTP Cookies

PERSISTENT CLIENT STATE
HTTP COOKIES

Preliminary Specification - Use with caution

INTRODUCTION

Cookies are a general mechanism which server-side connections (such as CGI scripts) can use to both store and retrieve information on the client side of the connection. The addition of a simple, persistent, client-side state significantly extends the capabilities of Web-based client/server applications.

OVERVIEW

A server, when returning an HTTP object to a client, may also send a piece of state information which the client will store. Included in that state object is a description of the range of URLs for which that state is valid.
Any future HTTP requests made by the client which fall in that range will include a transmittal of the current value of the state object from the client back to the server. The state object is called a COOKIE, for no compelling reason.

This simple mechanism provides a powerful new tool which enables a host of new types of applications to be written for Web-based environments. Shopping applications can now store information about the currently selected items, for-fee services can send back registration information and free the client from retyping a user-id on the next connection, and sites can store per-user preferences on the client and have the client supply those preferences every time it connects to that site.

SPECIFICATION

A cookie is introduced to the client by including a SET-COOKIE header as part of an HTTP response; typically this will be generated by a CGI script.

Syntax of the Set-Cookie HTTP Response Header

This is the format a CGI script would use to add to the HTTP headers a new piece of data which is to be stored by the client for later retrieval.

-=-=-=-=-=-=-=-=-=-
Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure
-=-=-=-=-=-=-=-=-=-

NAME=VALUE
    This string is a sequence of characters excluding semicolon, comma and white space. If there is a need to place such data in the name or value, some encoding method such as URL-style %XX encoding is recommended, though no encoding is defined or required.

    This is the only required attribute on the SET-COOKIE header.

EXPIRES=DATE
    The EXPIRES attribute specifies a date string that defines the valid lifetime of that cookie. Once the expiration date has been reached, the cookie will no longer be stored or given out.
    The date string is formatted as:

-=-=-=-=-=-=-=-=-=-
    Wdy, DD-Mon-YY HH:MM:SS GMT
-=-=-=-=-=-=-=-=-=-

    This is based on RFC 850, RFC 1036, and RFC 822, with the variations that the only legal time zone is GMT and the separators between the elements of the date must be dashes.

    EXPIRES is an optional attribute. If not specified, the cookie will expire when the user's session ends.

    NOTE: There is a bug in Netscape Navigator version 1.1 and earlier. Only cookies whose PATH attribute is set explicitly to "/" will be properly saved between sessions if they have an EXPIRES attribute.

DOMAIN=DOMAIN_NAME
    When searching the cookie list for valid cookies, the DOMAIN attribute of the cookie is compared with the Internet domain name of the host from which the URL will be fetched. If there is a tail match, then the cookie will go through PATH matching to see if it should be sent. "Tail matching" means that the DOMAIN attribute is matched against the tail of the fully qualified domain name of the host. A DOMAIN attribute of "acme.com" would match host names "anvil.acme.com" as well as "shipping.crate.acme.com".

    Only hosts within the specified domain can set a cookie for a domain, and domains must have at least two (2) or three (3) periods in them to prevent domains of the form ".com", ".edu", and "va.us". Any domain that falls within one of the seven special top-level domains listed below requires only two periods. Any other domain requires at least three. The seven special top-level domains are: "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT".

    The default value of DOMAIN is the host name of the server which generated the cookie response.

PATH=PATH
    The PATH attribute is used to specify the subset of URLs in a domain for which the cookie is valid.
    If a cookie has already passed DOMAIN matching, then the pathname component of the URL is compared with the path attribute, and if there is a match, the cookie is considered valid and is sent along with the URL request. The path "/foo" would match "/foobar" and "/foo/bar.html". The path "/" is the most general path.

    If the PATH is not specified, it is assumed to be the same path as the document being described by the header which contains the cookie.

SECURE
    If a cookie is marked SECURE, it will only be transmitted if the communications channel with the host is a secure one. Currently this means that secure cookies will only be sent to HTTPS (HTTP over SSL) servers.

    If SECURE is not specified, a cookie is considered safe to be sent in the clear over unsecured channels.

Syntax of the Cookie HTTP Request Header

When requesting a URL from an HTTP server, the browser will match the URL against all cookies, and if any of them match, a line containing the name/value pairs of all matching cookies will be included in the HTTP request. Here is the format of that line:

-=-=-=-=-=-=-=-=-=-
Cookie: NAME1=OPAQUE_STRING1; NAME2=OPAQUE_STRING2 ...
-=-=-=-=-=-=-=-=-=-

Additional Notes

Multiple SET-COOKIE headers can be issued in a single server response.

Instances of the same path and name will overwrite each other, with the latest instance taking precedence. Instances of the same path but different names will add additional mappings.

Setting the path to a higher-level value does not override other more specific path mappings. If there are multiple matches for a given cookie name, but with separate paths, all the matching cookies will be sent. (See examples below.)

The expires attribute lets the client know when it is safe to purge the mapping, but the client is not required to do so. A client may also delete a cookie before its expiration date arrives if the number of cookies exceeds its internal limits.
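The DOMAIN and PATH matching rules and the EXPIRES date format described above can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not Netscape's implementation: the function names format_expires, domain_tail_match, path_match and cookie_applies are hypothetical, the tail match is a literal case-insensitive suffix comparison exactly as worded above, and the English day and month abbreviations assume the program runs in the default "C" locale.

```c
/* Hypothetical helpers illustrating the preliminary cookie spec;
   a sketch, not Netscape's code. */
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <time.h>

/* Write t as "Wdy, DD-Mon-YY HH:MM:SS GMT" (the EXPIRES format).
   Returns the number of characters written, or 0 on failure.
   %a and %b give English names only in the default "C" locale. */
size_t format_expires(time_t t, char *buf, size_t n)
{
    struct tm *g = gmtime(&t);   /* GMT is the only legal time zone */
    if (g == NULL)
        return 0;
    return strftime(buf, n, "%a, %d-%b-%y %H:%M:%S GMT", g);
}

/* Literal "tail match": does host end with domain, ignoring case?
   A stricter check would also require the match to start at a '.',
   since this version accepts "badacme.com" for "acme.com". */
int domain_tail_match(const char *host, const char *domain)
{
    assert(host != NULL && domain != NULL);
    size_t hl = strlen(host), dl = strlen(domain);
    if (dl > hl)
        return 0;
    const char *tail = host + (hl - dl);
    for (size_t i = 0; i < dl; i++)
        if (tolower((unsigned char)tail[i]) != tolower((unsigned char)domain[i]))
            return 0;
    return 1;
}

/* PATH matching is a plain prefix comparison, so "/foo" matches
   both "/foobar" and "/foo/bar.html". */
int path_match(const char *url_path, const char *cookie_path)
{
    assert(url_path != NULL && cookie_path != NULL);
    return strncmp(url_path, cookie_path, strlen(cookie_path)) == 0;
}

/* A cookie is sent only if it passes DOMAIN matching, then PATH
   matching, and (when marked SECURE) only over a secure channel. */
int cookie_applies(const char *host, const char *url_path, int secure_channel,
                   const char *domain, const char *path, int secure)
{
    if (!domain_tail_match(host, domain))
        return 0;
    if (!path_match(url_path, path))
        return 0;
    return !secure || secure_channel;
}
```

For example, format_expires((time_t)942189160, buf, sizeof buf) writes "Tue, 09-Nov-99 23:12:40 GMT" into buf, domain_tail_match("anvil.acme.com", "acme.com") returns 1, and path_match("/foo/bar.html", "/foo") returns 1. The sketch does not enforce the two- or three-period rule on who may set a DOMAIN, nor the ordering of cookies in the outgoing Cookie header.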
When sending cookies to a server, all cookies with a more specific path mapping should be sent before cookies with less specific path mappings. For example, a cookie "name1=foo" with a path mapping of "/" should be sent after a cookie "name1=foo2" with a path mapping of "/bar" if they are both to be sent.

There are limitations on the number of cookies that a client can store at any one time. This is a specification of the minimum number of cookies that a client should be prepared to receive and store:

    300 total cookies

    4 kilobytes per cookie, where the name and the OPAQUE_STRING combine to form the 4 kilobyte limit

    20 cookies per server or domain (note that completely specified hosts and domains are treated as separate entities and have a 20-cookie limitation for each, not combined)

Servers should not expect clients to be able to exceed these limits. When the 300-cookie limit or the 20-cookie-per-server limit is exceeded, clients should delete the least recently used cookie. When a cookie larger than 4 kilobytes is encountered, the cookie should be trimmed to fit, but the name should remain intact as long as it is less than 4 kilobytes.

If a CGI script wishes to delete a cookie, it can do so by returning a cookie with the same name and an EXPIRES time which is in the past. The path and name must match exactly in order for the expiring cookie to replace the valid cookie. This requirement makes it difficult for anyone but the originator of a cookie to delete a cookie.

When caching HTTP, as a proxy server might do, the SET-COOKIE response header should never be cached.

If a proxy server receives a response which contains a SET-COOKIE header, it should propagate the SET-COOKIE header to the client, regardless of whether the response was 304 (Not Modified) or 200 (OK).
Similarly, if a client request contains a Cookie: header, it should be forwarded through a proxy, even if the conditional If-modified-since request is being made.

EXAMPLES

Here are some sample exchanges which are designed to illustrate the use of cookies.

First Example transaction sequence:

Client requests a document, and receives in the response:

-=-=-=-=-=-=-=-=-=-
Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it sends:

-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE
-=-=-=-=-=-=-=-=-=-

Client requests a document, and receives in the response:

-=-=-=-=-=-=-=-=-=-
Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001; path=/
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it sends:

-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

Client receives:

-=-=-=-=-=-=-=-=-=-
Set-Cookie: SHIPPING=FEDEX; path=/foo
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it sends:

-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/foo" on this server, it sends:

-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001; SHIPPING=FEDEX
-=-=-=-=-=-=-=-=-=-

Second Example transaction sequence:

Assume all mappings from above have been cleared.
Client receives:

-=-=-=-=-=-=-=-=-=-
Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001; path=/
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it sends:

-=-=-=-=-=-=-=-=-=-
Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

Client receives:

-=-=-=-=-=-=-=-=-=-
Set-Cookie: PART_NUMBER=RIDING_ROCKET_0023; path=/ammo
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/ammo" on this server, it sends:

-=-=-=-=-=-=-=-=-=-
Cookie: PART_NUMBER=RIDING_ROCKET_0023; PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

NOTE: There are two name/value pairs named "PART_NUMBER" due to the inheritance of the "/" mapping in addition to the "/ammo" mapping.

Copyright © 1996 Netscape Communications Corporation