CGI Common Gateway Interface
     _________________________________________________________________

Overview
     _________________________________________________________________

   The Common Gateway Interface (CGI) is a standard for interfacing
   external applications with information servers, such as HTTP or Web
   servers. A plain HTML document that the Web daemon retrieves is
   static, which means it exists in a constant state: a text file that
   doesn't change. A CGI program, on the other hand, is executed in
   real-time, so that it can output dynamic information.

   For example, let's say that you wanted to "hook up" your Unix database
   to the World Wide Web, to allow people from all over the world to
   query it. Basically, you need to create a CGI program that the Web
   daemon will execute to transmit information to the database engine,
   receive the results, and display them to the client. This is an
   example of a gateway, and it is where CGI, currently version 1.1,
   got its origins.

   The database example is a simple idea, but most of the time rather
   difficult to implement. There really is no limit as to what you can
   hook up to the Web. The only thing you need to remember is that
   whatever your CGI program does, it should not take too long to
   process. Otherwise, the user will just be staring at their browser
   waiting for something to happen.
     _________________________________________________________________

Specifics
     _________________________________________________________________

   Since a CGI program is executable, it is basically the equivalent of
   letting the world run a program on your system, which isn't the safest
   thing to do. Therefore, there are some security precautions that need
   to be implemented when it comes to using CGI programs. Probably the
   one that will affect the typical Web user the most is the fact that
   CGI programs need to reside in a special directory, so that the Web
   server knows to execute the program rather than just display it to the
   browser. This directory is usually under direct control of the
   webmaster, prohibiting the average user from creating CGI programs.
   There are other ways to allow access to CGI scripts, but it is up to
   your webmaster to set these up for you. At this point, you may want to
   contact them about the feasibility of allowing CGI access.

   If you have a version of the NCSA HTTPd server distribution, you will
   see a directory called /cgi-bin. This is the special directory
   mentioned above where all of your CGI programs currently reside. A CGI
   program can be written in any language that allows it to be executed
   on the system, such as:
     * C/C++
     * Fortran
     * PERL
     * TCL
     * Any Unix shell
     * Visual Basic
     * AppleScript

   It just depends what you have available on your system. If you use a
   programming language like C or Fortran, you know that you must compile
   the program before it will run. If you look in the /cgi-src directory
   that came with the server distribution, you will find the source code
   for some of the CGI programs in the /cgi-bin directory. If, however,
   you use one of the scripting languages instead, such as PERL, TCL, or
   a Unix shell, the script itself only needs to reside in the /cgi-bin
   directory, since there is no associated source code. Many people
   prefer to write CGI scripts instead of programs, since they are easier
   to debug, modify, and maintain than a typical compiled program.
     _________________________________________________________________

   CGI - Common Gateway Interface


    cgi@ncsa.uiuc.edu


----------
                         The Common Gateway Interface
     _________________________________________________________________

   The Common Gateway Interface, or CGI, is a standard for external
   gateway programs to interface with information servers such as HTTP
   servers.

   The current version is CGI/1.1.
     _________________________________________________________________

CGI Documentation

   If you have no idea what CGI is, you should read this introduction.

   Once you have a basic idea of what CGI is and what you can use it for,
   you should read this primer which will help you get started writing
   your own gateways.

   If you are interested in handling the output of HTML forms with
   your CGI program, you will want to read this guide to handling
   forms with CGI programs.

   Security is a crucial issue when writing CGI programs. Please read
   these tips on how to write CGI programs which do not allow
   malicious users to abuse them.

   When you get more advanced, you should read the interface
   specification which will help you utilize CGI to the fullest extent.
   If you are a server software author, it will help you add CGI
   compliance to your information server.

   There is now also a tutorial for writing ErrorDocument handling
   CGI scripts.
     _________________________________________________________________

Examples of CGI behavior and programs

   You may wish to look at this page of examples which demonstrate how
   the client URL affects the interface variables.

   We have created an archive of CGI programs on our FTP server.
   These programs were written by various people around the world in a
   variety of programming languages. Some of the entries are libraries
   which may make writing your CGI program easier.

   If you would like to submit one of your CGI programs to the archive,
   you should first package it with any documentation, copyright notices,
   etc. Then, upload it to hoohoo.ncsa.uiuc.edu into the directory
   /incoming/cgi and send mail to cgi@ncsa.uiuc.edu with a short
   description of what the file is.
     _________________________________________________________________

----------
                         The Common Gateway Interface

   After reading this document, you should have an overall idea of what a
   CGI program needs to do to function.
     _________________________________________________________________

How do I get information from the server?

   Each time a client requests the URL corresponding to your CGI
   program, the server will execute it in real-time. The output of your
   program will go more or less directly to the client.

   A common misconception about CGI is that you can send command-line
   options and arguments to your program, such as

     command% myprog -qa blorf

   CGI uses the command line for other purposes and thus this is not
   directly possible. Instead, CGI uses environment variables to send
   your program its parameters. The two major environment variables you
   will use for this purpose are:

     * QUERY_STRING
       QUERY_STRING is defined as anything which follows the first ? in
       the URL. This information could be added either by an ISINDEX
       document, or by an HTML form (with the GET method). It could
       also be manually embedded in an HTML anchor which references your
       gateway. This string will usually be an information query, e.g.
       what the user wants to search for in the archie databases, or
       perhaps the encoded results of your feedback GET form.
       This string is encoded in the standard URL format of changing
       spaces to +, and encoding special characters with %xx hexadecimal
       encoding. You will need to decode it in order to use it.
       If your gateway is not decoding results from a FORM, you will also
       get the query string decoded for you onto the command line. This
       means that each word of the query string will be in a different
       element of argv. For example, the query string "forms rule" would
       be given to your program with argv[1]="forms" and argv[2]="rule".
       If you choose to use this, you do not need to do any processing on
       the data before using it.
     * PATH_INFO
       CGI allows for extra information to be embedded in the URL for
       your gateway which can be used to transmit extra context-specific
       information to the scripts. This information is usually made
       available as "extra" information after the path of your gateway in
       the URL. This information is not encoded by the server in any way.
       The most useful example of PATH_INFO is transmitting file
       locations to the CGI program. To illustrate this, let's say I have
       a CGI program on my server called /cgi-bin/foobar that can process
       files residing in the DocumentRoot of the server. I need to be
       able to tell foobar which file to process. By including extra path
       information to the end of the URL, foobar will know the location
       of the document relative to the DocumentRoot via the PATH_INFO
       environment variable, or the actual path to the document via the
       PATH_TRANSLATED environment variable which the server generates
       for you.
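
   The two variables above can be read and decoded as in the following
   minimal sketch. It is written in Python purely for illustration; the
   function names decode_query and read_request are hypothetical and
   not part of CGI. It assumes a form-style query string of name=value
   fields joined by &.

```python
import os
from urllib.parse import unquote_plus


def decode_query(query_string):
    """Split a form-style query string into (name, value) pairs.

    '+' becomes a space and %xx escapes are decoded, as described above.
    """
    pairs = []
    for field in query_string.split("&"):
        if not field:
            continue
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


def read_request(environ=os.environ):
    """Collect the query and extra path information from the environment."""
    return {
        "query": decode_query(environ.get("QUERY_STRING", "")),
        "path_info": environ.get("PATH_INFO", ""),
        "path_translated": environ.get("PATH_TRANSLATED", ""),
    }
```

   Passing a plain dictionary instead of os.environ makes it possible to
   try the functions outside a Web server.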
     _________________________________________________________________

How do I send my document back to the client?

   I have found that the most common error in beginners' CGI programs is
   not properly formatting the output so the server can understand it.

   CGI programs can return a myriad of document types. They can send back
   an image to the client, an HTML document, a plaintext document, or
   perhaps even an audio clip. They can also return references to other
   documents. The client must know what kind of document you're sending
   it so it can present it accordingly. In order for the client to know
   this, your CGI program must tell the server what type of document it
   is returning.

   In order to tell the server what kind of document you are sending
   back, whether it be a full document or a reference to one, CGI
   requires you to place a short header on your output. This header is
   ASCII text, consisting of lines separated by either linefeeds or
   carriage returns (or both) followed by a single blank line. The output
   body then follows in whatever native format.

     * A full document with a corresponding MIME type
       In this case, you must tell the server what kind of document you
       will be outputting via a MIME type. Common MIME types are things
       such as text/html for HTML, and text/plain for straight ASCII
       text.
       For example, to send back HTML to the client, your output should
       read:

        Content-type: text/html

        <HTML><HEAD>
        <TITLE>output of HTML from CGI script</TITLE>
        </HEAD><BODY>
        <H1>Sample output</H1>
        What do you think of <STRONG>this?</STRONG>
        </BODY></HTML>
     * A reference to another document
       Instead of outputting the document, you can just tell the browser
       where to get the new one, or have the server automatically output
       the new one for you.
       For example, say you want to reference a file on your Gopher
       server. In this case, you should know the full URL of what you
       want to reference and output something like:

        Content-type: text/html
        Location: gopher://httprules.foobar.org/0

        <HTML><HEAD>
        <TITLE>Sorry...it moved</TITLE>
        </HEAD><BODY>
        <H1>Go to gopher instead</H1>
        Now available at
        <A HREF="gopher://httprules.foobar.org/0">a new location</A>
        on our gopher server.
        </BODY></HTML>

       However, today's browsers are smart enough to automatically throw you
       to the new document, without your ever seeing the above. If you
       get lazy and don't want to output the above HTML, NCSA HTTPd will
       output a default one for you to support older browsers.
       If you want to reference another file (not protected by access
       authentication) on your own server, you don't have to do nearly as
       much work. Just output a partial (virtual) URL, such as the
       following:

        Location: /dir1/dir2/myfile.html

       The server will act as if the client had not requested your script,
       but instead requested http://yourserver/dir1/dir2/myfile.html. It
       will take care of most everything, such as looking up the file
       type and sending the appropriate headers. Just be sure that you
       output the blank line that follows the Location header.
       If you do want to reference a document that is protected by access
       authentication, you will need to have a full URL in the Location:,
       since the client and the server need to re-transact to establish
       that you have access to the referenced document.

   Advanced usage: If you would like to output headers such as Expires or
   Content-encoding, you can if your server is compatible with CGI/1.1.
   Just output them along with Location or Content-type and they will be
   sent back to the client.
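
   The header-then-body layout described in this section can be sketched
   as follows. This is an illustrative Python fragment, not code from
   the server distribution; the helper names cgi_response,
   html_document and local_reference are made up for the example, and
   lines are terminated with linefeeds only.

```python
def cgi_response(headers, body=""):
    """Build CGI output: header lines, one blank line, then the body."""
    lines = ["%s: %s" % (name, value) for name, value in headers]
    return "\n".join(lines) + "\n\n" + body


def html_document(title, text):
    """A full document, announced with its MIME type."""
    body = "<HTML><HEAD><TITLE>%s</TITLE></HEAD><BODY>%s</BODY></HTML>\n" % (
        title, text)
    return cgi_response([("Content-type", "text/html")], body)


def local_reference(virtual_path):
    """A reference to another document on the same server."""
    return cgi_response([("Location", virtual_path)])
```

   A script would write the returned string to standard output, for
   example with sys.stdout.write(html_document("Sample output", "Hello")).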
     _________________________________________________________________

----------
                             The CGI Specification
     _________________________________________________________________

   This is the specification for CGI version 1.1, or CGI/1.1. Further
   revisions of this protocol are guaranteed to be backward compatible.

   The server and the CGI script communicate in four major ways, each
   described in detail in its own section:

     * Environment variables
     * The command line
     * Standard input
     * Standard output
     _________________________________________________________________

----------
                                 Configuration
     _________________________________________________________________

   These rules apply to all of HTTPd's configuration files.

     * Case insensitive
       Except where pathnames are involved, these files are not case
       sensitive.
     * Comment lines begin with #
       Lines which should be ignored begin with #, the hash sign. This
       must be the first character on the line. Comments must be on a
       line by themselves.
     * One directive per line
       Each line of these files consists of:
       Directive data [data2 ... datan]
       Directive is a keyword HTTPd recognizes, followed by whitespace.
       data is specific to the directive. Any additional data entries
       should be separated by whitespace.
     * Extra whitespace is ignored
       You can put extra spaces or tabs between Directive and data. To
       embed a space in data without separating it from any subsequent
       arguments use a \ character before the space.
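
   As a rough illustration of these rules, the following Python sketch
   parses a single configuration line. It is not HTTPd's own parser;
   the function name parse_config_line is hypothetical, and keeping any
   backslash that is not followed by a space is an assumption, since
   the rules above only describe escaping a space.

```python
def parse_config_line(line):
    """Parse one configuration line into (directive, [data, ...]).

    Returns None for blank lines and for comment lines, which must have
    '#' as their very first character.
    """
    line = line.rstrip("\r\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = []
    current = ""
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            if nxt == " ":
                current += " "       # escaped space stays inside the field
            else:
                current += ch + nxt  # assumption: other backslashes are kept
        elif ch in " \t":
            if current:              # extra whitespace is ignored
                fields.append(current)
                current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    # Directive names are case insensitive; data (often pathnames) is not.
    return fields[0].lower(), fields[1:]
```

   For instance, parse_config_line("DocumentRoot /home/My\ Docs") yields
   the directive "documentroot" with the single data item
   "/home/My Docs".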

----------
                           CGI Command line options
     _________________________________________________________________

Specification

   The command line is only used in the case of an ISINDEX query. It is
   not used in the case of an HTML form or any as yet undefined query
   type. The server should search the query information (the QUERY_STRING
   environment variable) for a non-encoded = character to determine
   whether the command line is to be used. If it finds one, the command
   line is not to be used. This trusts the clients to encode the = sign in
   ISINDEX queries, a practice which was considered safe at the time of
   the design of this specification.

   For example, use the finger script and the ISINDEX interface to
   look up "httpd". You will see that the script will call itself with
   /cgi-bin/finger?httpd and will actually execute "finger httpd" on the
   command line and output the results to you.

   If the server does find a "=" in the QUERY_STRING, then the command
   line will not be used, and no decoding will be performed. The query
   then remains intact for processing by an appropriate FORM submission
   decoder. Again, as an example, submit "httpd=name" to the finger
   script. Since this QUERY_STRING contains an unencoded "=", nothing is
   decoded, the script does not know it is being given a valid query,
   and it just returns the default finger form.

   If the server finds that it cannot send the string due to internal
   limitations (such as exec() or /bin/sh command line restrictions) the
   server should include NO command line information and provide the
   non-decoded query information in the environment variable
   QUERY_STRING.
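
   The decision a server makes here can be sketched as below. This is
   an illustrative Python fragment rather than any server's actual
   code; the function name isindex_argv is hypothetical, and it assumes
   the words of an ISINDEX query are separated by + in the raw query
   string.

```python
from urllib.parse import unquote


def isindex_argv(query_string):
    """Return the decoded command-line words for a query string.

    Returns None when the raw query contains an unencoded '=', in which
    case the command line is not used and nothing is decoded.
    """
    if "=" in query_string:
        return None
    return [unquote(word) for word in query_string.split("+") if word]
```

   An encoded = sign (%3D) does not suppress the command line, because
   the check is made on the raw query string before any decoding.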
     _________________________________________________________________

Examples

   Examples of the command line usage are much better demonstrated
   than explained. For these examples, pay close attention to the script
   output which says what argc and argv are.
     _________________________________________________________________

----------
                           CGI Environment Variables
     _________________________________________________________________

   In order to pass data about the information request from the server to
   the script, the server uses command line arguments as well as
   environment variables. These environment variables are set when the
   server executes the gateway program.
     _________________________________________________________________

Specification

   The following environment variables are not request-specific and are
   set for all requests:

     * SERVER_SOFTWARE
       The name and version of the information server software answering
       the request (and running the gateway). Format: name/version
     * SERVER_NAME
       The server's hostname, DNS alias, or IP address as it would appear
       in self-referencing URLs.
     * GATEWAY_INTERFACE
       The revision of the CGI specification to which this server
       complies. Format: CGI/revision
     _________________________________________________________________

   The following environment variables are specific to the request being
   fulfilled by the gateway program:

     * SERVER_PROTOCOL
       The name and revision of the information protocol this request came
       in with. Format: protocol/revision
     * SERVER_PORT
       The port number to which the request was sent.
     * REQUEST_METHOD
       The method with which the request was made. For HTTP, this is
       "GET", "HEAD", "POST", etc.
     * PATH_INFO
       The extra path information, as given by the client. In other
       words, scripts can be accessed by their virtual pathname, followed
       by extra information at the end of this path. The extra
       information is sent as PATH_INFO. This information should be
       decoded by the server if it comes from a URL before it is passed
       to the CGI script.
     * PATH_TRANSLATED
       The server provides a translated version of PATH_INFO, which takes
       the path and does any virtual-to-physical mapping to it.
     * SCRIPT_NAME
       A virtual path to the script being executed, used for
       self-referencing URLs.
     * QUERY_STRING
       The information which follows the ? in the URL which referenced
       this script. This is the query information. It should not be
       decoded in any fashion. This variable should always be set when
       there is query information, regardless of command line
       decoding.
     * REMOTE_HOST
       The hostname making the request. If the server does not have this
       information, it should set REMOTE_ADDR and leave this unset.
     * REMOTE_ADDR
       The IP address of the remote host making the request.
     * AUTH_TYPE
       If the server supports user authentication, and the script is
       protected, this is the protocol-specific authentication method used
       to validate the user.
     * REMOTE_USER
       If the server supports user authentication, and the script is
       protected, this is the username they have authenticated as.
     * REMOTE_IDENT
       If the HTTP server supports RFC 931 identification, then this
       variable will be set to the remote user name retrieved from the
       server. Usage of this variable should be limited to logging only.
     * CONTENT_TYPE
       For queries which have attached information, such as HTTP POST and
       PUT, this is the content type of the data.
     * CONTENT_LENGTH
       The length of the said content as given by the client.
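
   As one possible use of these variables, the Python sketch below
   builds a self-referencing URL and picks a name for the remote
   client. The function names self_url and client_name are
   hypothetical. The sketch assumes the request arrived over HTTP and
   that port 80 may be left out of the URL.

```python
def self_url(environ):
    """Build a URL that points back at the running script."""
    url = "http://" + environ["SERVER_NAME"]
    port = environ.get("SERVER_PORT", "80")
    if port != "80":
        url += ":" + port
    return url + environ.get("SCRIPT_NAME", "")


def client_name(environ):
    """Prefer REMOTE_HOST, falling back to REMOTE_ADDR when it is unset."""
    return environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", "")
```

   In a real script the argument would be os.environ; a plain dictionary
   works for experimenting.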
     _________________________________________________________________

   In addition to these, the header lines received from the client, if
   any, are placed into the environment with the prefix HTTP_ followed by
   the header name. Any - characters in the header name are changed to _
   characters. The server may exclude any headers which it has already
   processed, such as Authorization, Content-type, and Content-length. If
   necessary, the server may choose to exclude any or all of these
   headers if including them would exceed any system environment limits.

   An example of this is the HTTP_ACCEPT variable which was defined in
   CGI/1.0. Another example is the header User-Agent.

     * HTTP_ACCEPT
       The MIME types which the client will accept, as given by HTTP
       headers. Other protocols may need to get this information from
       elsewhere. Each item in this list should be separated by commas as
       per the HTTP spec.
       Format: type/subtype, type/subtype
     * HTTP_USER_AGENT
       The browser the client is using to send the request. General
       format: software/version library/version.
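
   The naming rule for client headers can be sketched like this. The
   Python below is only an illustration of what a server might do; the
   function names are hypothetical, and converting the header name to
   upper case is inferred from the HTTP_ACCEPT and HTTP_USER_AGENT
   examples rather than stated by the rule itself.

```python
def header_to_env_name(header_name):
    """Map a client header name to its environment variable name."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def headers_to_env(headers,
                   already_processed=("Authorization", "Content-type",
                                      "Content-length")):
    """Turn (name, value) header pairs into environment entries.

    Headers the server has already processed are left out.
    """
    excluded = {name.lower() for name in already_processed}
    env = {}
    for name, value in headers:
        if name.lower() in excluded:
            continue
        env[header_to_env_name(name)] = value
    return env
```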
     _________________________________________________________________

Examples

   Examples of the setting of environment variables are really much
   better demonstrated than explained.
     _________________________________________________________________

----------
                               CGI Script Input
     _________________________________________________________________

Specification

   For requests which have information attached after the header, such as
   HTTP POST or PUT, the information will be sent to the script on stdin.

   The server will send CONTENT_LENGTH bytes on this file descriptor.
   Remember that it will give the CONTENT_TYPE of the data as well.
   The server is in no way obligated to send end-of-file after the script
   reads CONTENT_LENGTH bytes.
     _________________________________________________________________

Example

   Let's take a form with METHOD="POST" as an example. Let's say the form
   results are 7 bytes encoded, and look like a=b&b=c.

   In this case, the server will set CONTENT_LENGTH to 7 and CONTENT_TYPE
   to application/x-www-form-urlencoded. The first byte on the script's
   standard input will be "a", followed by the rest of the encoded
   string.
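
   Reading the attached data might look like the following Python
   sketch. The function name read_body is hypothetical, and the demo
   values simply replay the 7-byte example above from an in-memory
   stream instead of a real standard input.

```python
import io
import os
import sys


def read_body(environ=os.environ, stream=None):
    """Read exactly CONTENT_LENGTH bytes of request data."""
    if stream is None:
        stream = sys.stdin.buffer
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # Read only the declared number of bytes; the server is not obliged
    # to send end-of-file afterwards, so reading to EOF could block.
    return stream.read(length)


# Replaying the example: 7 bytes are declared, so anything after them
# is left unread.
demo_env = {
    "CONTENT_LENGTH": "7",
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
}
demo_body = read_body(demo_env, io.BytesIO(b"a=b&b=c" + b"ignored"))
```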
     _________________________________________________________________

----------
                               CGI Script Output
     _________________________________________________________________

Script output

   The script sends its output to stdout. This output can either be a
   document generated by the script, or instructions to the server for
   retrieving the desired output.
     _________________________________________________________________

Script naming conventions

   Normally, scripts produce output which is interpreted and sent back to
   the client. An advantage of this is that the scripts do not need to
   send a full HTTP/1.0 header for every request.

   Some scripts may want to avoid the extra overhead of the server
   parsing their output, and talk directly to the client. In order to
   distinguish these scripts from the other scripts, CGI requires that
   the script name begins with nph- if a script does not want the server
   to parse its header. In this case, it is the script's responsibility
   to return a valid HTTP/1.0 (or HTTP/0.9) response to the client.
     _________________________________________________________________

Parsed headers

   The output of scripts begins with a small header. This header consists
   of text lines, in the same format as an HTTP header, terminated by
   a blank line (a line with only a linefeed or CR/LF).

   Any headers which are not server directives are sent directly back to
   the client. Currently, this specification defines three server
   directives:

     * Content-type
       This is the MIME type of the document you are returning.
     * Location
       This is used to specify to the server that you are returning a
       reference to a document rather than an actual document.
       If the argument to this is a URL, the server will issue a redirect
       to the client.
       If the argument to this is a virtual path, the server will
       retrieve the document specified as if the client had requested
       that document originally. ? directives will work in here, but #
       directives must be redirected back to the client.
     * Status
       This is used to give the server an HTTP/1.0 status line to send
       to the client. The format is nnn xxxxx, where nnn is the 3-digit
       status code, and xxxxx is the reason string, such as "Forbidden".
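
   How a server might separate the three directives from the other
   headers is sketched below in Python. This is an assumption-laden
   illustration, not the parser of any particular server; the names
   parse_script_output and location_kind are hypothetical, and treating
   any Location value that starts with / as a virtual path is a
   simplification.

```python
SERVER_DIRECTIVES = ("content-type", "location", "status")


def parse_script_output(output):
    """Split script output into (directives, other_headers, body).

    The header ends at the first blank line; lines may end in LF or CR/LF.
    """
    directives = {}
    other_headers = []
    rest = output
    while rest:
        line, _, rest = rest.partition("\n")
        line = line.rstrip("\r")
        if line == "":
            break                      # blank line: the body follows
        name, _, value = line.partition(":")
        key = name.strip().lower()
        if key in SERVER_DIRECTIVES:
            directives[key] = value.strip()
        else:
            other_headers.append((name.strip(), value.strip()))
    return directives, other_headers, rest


def location_kind(value):
    """Classify a Location value as a virtual path or a full URL."""
    return "virtual path" if value.startswith("/") else "url"
```

   Headers returned in other_headers would be passed to the client
   unchanged, while the directives tell the server how to answer.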
     _________________________________________________________________

Examples

   Let's say I have a fromgratz to HTML converter. When my converter is
   finished with its work, it will output the following on stdout (note
   that the lines beginning and ending with --- are just for illustration
   and would not be output):

--- start of output ---
Content-type: text/html

--- end of output ---

   Note the blank line after Content-type.

   Now, let's say I have a script which, in certain instances, wants to
   return the document /path/doc.txt from this server just as if the user
   had actually requested http://server:port/path/doc.txt to begin with.
   In this case, the script would output:

--- start of output ---
Location: /path/doc.txt

--- end of output ---

   The server would then perform the request and send it to the client.

   Let's say that I have a script which wants to reference our gopher
   server. In this case, if the script wanted to refer the user to
   gopher://gopher.ncsa.uiuc.edu/, it would output:

--- start of output ---
Location: gopher://gopher.ncsa.uiuc.edu/

--- end of output ---

   Finally, I have a script which wants to talk to the client directly.
   In this case, if the script is referenced with SERVER_PROTOCOL of
   HTTP/1.0, the script would output the following HTTP/1.0 response:

--- start of output ---
HTTP/1.0 200 OK
Server: NCSA/1.0a6
Content-type: text/plain

This is a plaintext document generated on the fly just for you.

--- end of output ---
     _________________________________________________________________

----------
                      An error script for NCSA HTTPd 1.4

   Error scripts have extra environment variables passed to them, in
   addition to all of the CGI 1.1 variables. These are:

   REDIRECT_REQUEST
          This is the request exactly as sent to the server.

   REDIRECT_URL
          This is the requested URL that caused the error.

   REDIRECT_STATUS
          This is the status number and message that NCSA HTTPd would
          have sent had it been allowed to reply.

   In addition, NCSA HTTPd passes the error string that it generated as
   the QUERY_STRING, in the form err_string=error_message

   Some error messages might require headers beyond those in the CGI
   specification. For that reason, the following example is an nph
   (non-parsed headers) script.

   The code, in Perl:
     _________________________________________________________________

#!/usr/local/bin/perl
# Non-parsed headers CGI 1.1 error script in Perl to handle error requests
# from NCSA HTTPd 1.4 via ErrorDocument.  This should handle all errors in
# almost the same fashion as NCSA HTTPd 1.4 would internally.
#
# This script is in the Public Domain.  NCSA and the author offer no
# guarantees nor claim any responsibility for it.  That's as pseudo-legalese
# as I get.
#
# This script doesn't do any encryption or authentication, nor does it
# contain hooks to do so.
#
# This was written for Perl 4.016.  I've heard rumours about it working with
# other versions, but I'm no Perl hacker, so how would I know?
#
# Brandon Long / NCSA HTTPd Development Team / Software Development Group
# National Center for Supercomputing Applications / University of Illinois
#
# For more information:
# NCSA HTTPd    : http://hoohoo.ncsa.uiuc.edu/docs/
# CGI 1.1       : http://hoohoo.ncsa.uiuc.edu/cgi/
# ErrorDocument : http://hoohoo.ncsa.uiuc.edu/docs/setup/srm/ErrorDocument.html
# Example CGI   : http://hoohoo.ncsa.uiuc.edu/cgi/ErrorCGI.html
#

$error = $ENV{'QUERY_STRING'};
$redirect_request = $ENV{'REDIRECT_REQUEST'};
($redirect_method,$request_url,$redirect_protocal) = split(' ',$redirect_request);
$redirect_status = $ENV{'REDIRECT_STATUS'};
if (!defined($redirect_status)) {
  $redirect_status = "200 Ok";
}
($redirect_number,$redirect_message) = split(' ',$redirect_status);
$error =~ s/error=//;

$title = "<HEAD><TITLE>".$redirect_status."</TITLE></HEAD>";

if ($redirect_method eq "HEAD") {
        $head_only = 1;
} else {
        $head_only = 0;
}

printf("%s %s\r\n",$ENV{'SERVER_PROTOCOL'},$redirect_status);
printf("Server: %s\r\n",$ENV{'SERVER_SOFTWARE'});
printf("Content-type: text/html\r\n");

$redirect_status = "<img alt=\"\" src=/images/icon.gif>".$redirect_status;
if ($redirect_number == 302) {
        if ($error !~ /http:/) {
                # Relative target: build an absolute URL for the Location header.
                printf("Location: http://%s:%s%s\r\n",
                        $ENV{'SERVER_NAME'},
                        $ENV{'SERVER_PORT'},
                        $error);
                printf("\r\n");         # blank line ends the header
                if (!$head_only) {
                        printf("%s\r\n",$title);
                        printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                        printf("This document has moved ");
                        printf("<A HREF=\"http://%s:%s%s\">here</A>.\r\n",
                               $ENV{'SERVER_NAME'},
                               $ENV{'SERVER_PORT'},
                               $error);
                }
        } else {
                # The target is already a full URL.
                printf("Location: %s\r\n",$error);
                printf("\r\n");         # blank line ends the header
                if (!$head_only) {
                        printf("%s\r\n",$title);
                        printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                        printf("This document has moved ");
                        printf("<A HREF=\"%s\">here</A>.\r\n",$error);
                }
        }
} elsif ($redirect_number == 400) {
        printf("\r\n");
        if (!$head_only) {
                printf("%s\r\n",$title);
                printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                printf("Your client sent a request that this server didn't");
                printf(" understand.<br><b>Reason:</b> %s\r\n",$error);
        }
} elsif ($redirect_number == 401) {
        printf("WWW-Authenticate: %s\r\n",$error);
        printf("\r\n");
        if (!$head_only) {
                printf("%s\r\n",$title);
                printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                printf("Browser not authentication-capable or ");
                printf("authentication failed.\r\n");
        }
} elsif ($redirect_number == 403) {
        printf("\r\n");
        if (!$head_only) {
                printf("%s\r\n",$title);
                printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                printf("Your client does not have permission to get ");
                printf("URL: %s from this server.\r\n",$ENV{'REDIRECT_URL'});
        }
} elsif ($redirect_number == 404) {
        printf("\r\n");
        if (!$head_only) {
                printf("%s\r\n",$title);
                printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                printf("The requested URL:<code>%s</code> ",
                        $ENV{'REDIRECT_URL'});
                printf("was not found on this server.\r\n");
        }
} elsif ($redirect_number == 500) {
        printf("\r\n");
        if (!$head_only) {
                printf("%s\r\n",$title);
                printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                printf("The server encountered an internal error or ");
                printf("misconfiguration and was unable to complete your ");
                printf("request \"<code>%s</code>\"\r\n",$redirect_request);
        }
} elsif ($redirect_number == 501) {
        printf("\r\n");
        if (!$head_only) {
                printf("%s\r\n",$title);
                printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
                printf("The server is unable to perform the method ");
                printf("<b>%s</b> at this time.",$redirect_method);
        }
} else {
        printf("\r\n");
        if (!$head_only) {
                printf("%s\r\n",$title);
                printf("<BODY><H1>%s</H1>\r\n",$redirect_status);
        }
}

if (!$head_only) {
        printf("<p>The following might be useful in determining the problem:");
        printf("<pre>\r\n");
        open(ENV,"env|");
        while (<ENV>) {
                print $_;       # print, not printf: the data may contain %
        }
        close(ENV);
        printf("</pre>\r\n<hr>");
        printf("<A HREF=\"http://%s:%s/\"><img alt=\"[Back to Top]\" src=\"/images/back.gif\"> Back to Root of Server</A>\r\n",
                $ENV{'SERVER_NAME'},$ENV{'SERVER_PORT'});
        printf("<hr><i><a href=\"mailto:webmaster\@%s\">webmaster\@%s</a></i> / ",
                $ENV{'SERVER_NAME'},$ENV{'SERVER_NAME'});
        printf("<i><a href=\"mailto:httpd\@ncsa.uiuc.edu\">httpd\@ncsa.uiuc.edu</a></i>");
        printf("</BODY>\r\n");
}
     _________________________________________________________________



    NCSA HTTPd Development Team / cgi@ncsa.uiuc.edu / Last Modified
    6-28-95


----------
                            ErrorDocument directive

Purpose


          The ErrorDocument directive points the server to a file to
          send in place of the built-in error message.
            __________________________________________________________

  Syntax


          ErrorDocument type filename
          Where type is one of:

          + 302 - REDIRECT
          + 400 - BAD_REQUEST
          + 401 - AUTH_REQUIRED
          + 403 - FORBIDDEN
          + 404 - NOT_FOUND
          + 500 - SERVER_ERROR
          + 501 - NOT_IMPLEMENTED


          And filename is a CGI script or text/html file, given as a full
          path from the document root. CGI scripts launched via these
          errors receive three additional environment variables:
          REDIRECT_REQUEST, REDIRECT_URL and REDIRECT_STATUS. They also
          take as input the error reason, in the form
          err_string=error_reason. For an example script, see the sample
          error script.
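
          As a rough sketch of how an error script might consume these
          variables, the Python fragment below builds a small HTML page
          from REDIRECT_STATUS, REDIRECT_URL and REDIRECT_REQUEST. The
          function name build_error_page and the page layout are
          illustrative assumptions, not part of NCSA HTTPd, and the
          sketch ignores the error reason passed as input.

```python
import html
import os
import sys

def build_error_page(environ):
    """Return a CGI response (headers plus HTML body) describing the error."""
    # These are the three variables the server adds for error scripts.
    status = environ.get("REDIRECT_STATUS", "200 Ok")
    url = environ.get("REDIRECT_URL", "")
    request = environ.get("REDIRECT_REQUEST", "")
    # Escape everything that came from the client before echoing it as HTML.
    body = (
        "<HEAD><TITLE>%s</TITLE></HEAD>\r\n"
        "<BODY><H1>%s</H1>\r\n"
        "Request: <code>%s</code><br>\r\n"
        "URL: <code>%s</code>\r\n"
        "</BODY>\r\n"
    ) % (html.escape(status), html.escape(status),
         html.escape(request), html.escape(url))
    # A blank line must separate the headers from the body.
    return "Content-type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    sys.stdout.write(build_error_page(os.environ))
```

          Keeping the page construction in a function that takes the
          environment as an argument makes it easy to try out with a
          plain dictionary instead of a live server.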
            __________________________________________________________

File


          srm.conf
            __________________________________________________________

Default


          If this directive is left out, the compiled error messages will
          be used.
            __________________________________________________________

Examples


          ErrorDocument 403 /cgi-bin/notallowed.cgi
          ErrorDocument 404 /cgi-bin/nph-error.pl
          ErrorDocument 500 /serverError.html
          ErrorDocument 501 /error/notImplemented.html

   For more information, see the documentation on error scripts.
     _________________________________________________________________



    NCSA HTTPd Development Team / httpd@ncsa.uiuc.edu / Last Modified
    7-12-95


----------
                            Decoding FORMs with CGI

   If you are unfamiliar with forms or how to write them, we suggest you
   look at a guide to fill-out forms first. They're just plain HTML, and
   pretty easy to do.

   Decoding them is another story...
     _________________________________________________________________

Where do I get the form data from?

   There are two methods which can be used to submit your forms: GET and
   POST. Depending on which method you used, you will receive the encoded
   results of the form in a different way.

     * The GET method
       If your form has METHOD="GET" in its FORM tag, your CGI program
       will receive the encoded form input in the environment variable
       QUERY_STRING.
     * The POST method
       If your form has METHOD="POST" in its FORM tag, your CGI program
       will receive the encoded form input on stdin. The server will NOT
       send you an EOF at the end of the data; instead, you should use the
       environment variable CONTENT_LENGTH to determine how much data you
       should read from stdin.
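
   The two cases above can be sketched in Python as follows. This is a
   minimal illustration, not a complete CGI program: the function name
   read_form_data is hypothetical, and the sketch assumes the standard
   CGI variable REQUEST_METHOD, which this section does not mention, to
   tell the two methods apart.

```python
import os
import sys

def read_form_data(environ, stdin):
    """Return the raw, still URL-encoded form data for a GET or POST request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # No EOF is sent, so read exactly CONTENT_LENGTH bytes and no more.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii", "replace")
    # For GET the encoded data arrives in QUERY_STRING.
    return environ.get("QUERY_STRING", "")

if __name__ == "__main__":
    # stdin is read as a binary stream so that the byte count is exact.
    raw = read_form_data(os.environ, sys.stdin.buffer)
    sys.stdout.write("Content-type: text/plain\r\n\r\n" + raw + "\r\n")
```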
     _________________________________________________________________

But what does it all mean? How do I decode the form data?

   When you write a form, each of your input items has a NAME attribute.
   When the user places data in these items in the form, that information
   is encoded into the form data. Whatever the user enters into an input
   item is called its value.

   Form data is a stream of name=value pairs separated by the &
   character. Each name=value pair is URL encoded, i.e. spaces are
   changed into plus signs and some characters are encoded as a percent
   sign followed by two hexadecimal digits (%XX).

   Because others have been presented with this problem as well, there
   are already a number of programs which will do this decoding for you.
   The following packages are available from the CGI archive:

     * The Bourne Shell: The AA archie gateway. Contains calls to sed
       and awk which convert a GET form data string into separate
       environment variables.
     * C: The default scripts for NCSA httpd. While I won't win any
       awards for verbosity in documenting my code, there are C routines
       and example programs you can use to translate the query string
       into a group of structures.
     * PERL: The PERL CGI-lib. This package contains a group of useful
       PERL routines to decode forms.
     * PERL5: CGI.pm. A perl5 library for handling forms in CGI
       scripts. With just a handful of calls, you can parse CGI queries,
       create forms, and maintain the state of the buttons on the form
       from invocation to invocation.
     * TCL: TCL argument processor. This is a set of TCL routines to
       retrieve form data and place it into TCL variables.

   The basic procedure is to split the data by the ampersands. Then, for
   each name=value pair you get from this, you should URL decode the name,
   and then the value, and then do what you like with them.
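
   The procedure just described can be sketched in a few lines of
   Python. The names url_decode and decode_form are illustrative only;
   the sketch relies on unquote from Python's standard urllib.parse
   module to expand the %XX escapes.

```python
from urllib.parse import unquote

def url_decode(text):
    """Undo URL encoding: '+' becomes a space, then %XX escapes are expanded."""
    # Plus signs are replaced first so that an encoded %2B survives as '+'.
    return unquote(text.replace("+", " "))

def decode_form(data):
    """Split encoded form data into a list of decoded (name, value) pairs."""
    pairs = []
    for item in data.split("&"):
        if not item:
            continue  # skip empty segments such as a trailing '&'
        name, _, value = item.partition("=")
        pairs.append((url_decode(name), url_decode(value)))
    return pairs

if __name__ == "__main__":
    for name, value in decode_form("name=Wile+E.+Coyote&item=rocket%2Flauncher"):
        print(name, "=", value)
```

   A list of pairs is returned rather than a dictionary because a form
   may legitimately send the same name more than once.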
     _________________________________________________________________

   CGI - Common Gateway Interface


    cgi@ncsa.uiuc.edu


----------
                          Writing secure CGI scripts

   Any time that a program is interacting with a networked client, there
   is the possibility of that client attacking the program to gain
   unauthorized access. Even the most innocent looking script can be very
   dangerous to the integrity of your system.

   With that in mind, we would like to present a few guidelines for
   making sure your program does not come under attack.
     _________________________________________________________________

     * Beware the eval statement
       Languages like PERL and the Bourne shell provide an eval command
       which allows you to construct a string and have the interpreter
       execute that string. This can be very dangerous. Observe the
       following statement in the Bourne shell:
       eval `echo $QUERY_STRING | awk 'BEGIN{RS="&"} {printf "QS_%s\n",$1}' `
       This clever little snippet takes the query string and converts it
       into a set of variable assignment commands. Unfortunately, this
       script can be attacked by sending it a query string which starts
       with a ;. See what I mean about innocent-looking scripts being
       dangerous?
     * Do not trust the client to do anything
       A well-behaved client will escape any characters which have
       special meaning to the Bourne shell in a query string and thus
       avoid problems with your script misinterpreting the characters. A
       mischievous client may use special characters to confuse your
       script and gain unauthorized access.
     * Be careful with popen and system.
       If you use any data from the client to construct a command line
       for a call to popen() or system(), be sure to place backslashes
       before any characters that have special meaning to the Bourne
       shell before calling the function. This can be achieved easily
       with a short C function.
     * Turn off server-side includes
       If your server is unfortunate enough to support server-side
       includes, turn them off for your script directories! The
       server-side includes can be abused by clients which prey on
       scripts which directly output things they have been sent.
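
   The backslash escaping recommended above for popen() and system() is
   described in terms of a short C function; a Python counterpart might
   look like the sketch below. The function name escape_shell_chars and
   the exact allow-list of characters left unescaped are assumptions
   made for illustration, not taken from the NCSA sources.

```python
import re

# Characters passed through unchanged; everything else gets a backslash.
# This allow-list is an illustrative assumption, chosen to be conservative.
_UNSAFE = re.compile(r"[^A-Za-z0-9_@%+=:,./-]")

def escape_shell_chars(text):
    """Return text with a backslash before every character outside the allow-list."""
    # A backslash-newline pair is a line continuation in the Bourne shell,
    # so newlines are dropped rather than escaped.
    text = text.replace("\n", "")
    return _UNSAFE.sub(lambda m: "\\" + m.group(0), text)

if __name__ == "__main__":
    hostile = "smith; rm -rf /"
    # The escaped string is now a single harmless word for the shell.
    print("finger " + escape_shell_chars(hostile))
```

   Escaping everything outside a small allow-list is safer than trying
   to enumerate every dangerous character. Where the language permits,
   passing the arguments as a list and bypassing the shell entirely
   avoids the problem altogether.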

   For a more comprehensive summary of security and the World-Wide Web,
   see the WWW Security FAQ.
     _________________________________________________________________

Client Side State - HTTP Cookies

PERSISTENT CLIENT STATE
HTTP COOKIES 

Preliminary Specification - Use with caution

INTRODUCTION 

Cookies are a general mechanism which server side connections 
(such as CGI scripts) can use to both store and retrieve information 
on the client side of the connection. The addition of a simple,
persistent, client-side state significantly extends the capabilities 
of Web-based client/server applications.

OVERVIEW 

A server, when returning an HTTP object to a client, may also 
send a piece of state information which the client will store. 
Included in that state object is a description of the range 
of URLs for which that state is valid.  Any future HTTP requests 
made by the client which fall in that range will include a 
transmittal of the current value of the state object from 
the client back to the server.  The state object is called 
a COOKIE, for no compelling reason. 

This simple mechanism provides a powerful new tool which enables
a host of new types of applications to be written for web-based
environments. Shopping applications can now store information
about the currently selected items, for-fee services can send
back registration information and free the client from retyping
a user-id on next connection, and sites can store per-user
preferences on the client, and have the client supply those
preferences every time that site is connected to.

SPECIFICATION 

A cookie is introduced to the client by including a SET-COOKIE
header as part of an HTTP response. Typically this will be
generated by a CGI script.

Syntax of the Set-Cookie HTTP Response Header

This is the format a CGI script would use to add to the HTTP 
headers a new piece of data which is to be stored by the client 
for later retrieval. 
-=-=-=-=-=-=-=-=-=-


Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME; secure


-=-=-=-=-=-=-=-=-=-

NAME=VALUE
This string is a sequence of characters excluding semi-colon,
comma and white space.  If there is a need to place such
data in the name or value, some encoding method such as
URL style %XX encoding is recommended, though no encoding
is defined or required.

This is the only required attribute on the SET-COOKIE header. 

EXPIRES=DATE 
The EXPIRES attribute specifies a date string that defines
the valid life time of that cookie.  Once the expiration
date has been reached, the cookie will no longer be stored
or given out.

The date string is formatted as: 
-=-=-=-=-=-=-=-=-=-
Wdy, DD-Mon-YY HH:MM:SS GMT
-=-=-=-=-=-=-=-=-=-
This is based on RFC 850, RFC 1036, and RFC 822, with the 
variations that the only legal time zone is GMT and the separators 
between the elements of the date must be dashes. 

EXPIRES is an optional attribute.  If not specified, the cookie 
will expire when the user's session ends. 

NOTE: There is a bug in Netscape Navigator version 1.1 and 
earlier. Only cookies whose PATH attribute is set explicitly 
to "/" will be properly saved between sessions if they have 
an EXPIRES attribute.

DOMAIN=DOMAIN_NAME 
When searching the cookie list for valid cookies, a comparison
of the DOMAIN attributes of the cookie is made with the
Internet domain name of the host from which the URL will
be fetched.  If there is a tail match, then the cookie
will go through PATH matching to see if it should be sent.
"Tail matching" means that the DOMAIN attribute is matched
against the tail of the fully qualified domain name of
the host.  A DOMAIN attribute of "acme.com" would match
host names "anvil.acme.com" as well as "shipping.crate.acme.com".

Only hosts within the specified domain can set a cookie for
a domain, and domains must have at least two (2) or three (3)
periods in them to prevent domains of the form ".com", ".edu",
and "va.us".  Any domain that falls within one of the seven
special top level domains listed below requires only two periods.
Any other domain requires at least three.  The seven special
top level domains are: "COM", "EDU", "NET", "ORG", "GOV",
"MIL", and "INT".

The default value of DOMAIN is the host name of the server 
which generated the cookie response. 
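
The two DOMAIN rules can be sketched in Python as below. This is
an illustration only: the function names are hypothetical, and the
period rule is read literally as a count of the periods in the
DOMAIN attribute string, so under that assumption an attribute needs
a leading period (".acme.com") to reach two periods.

```python
# The seven special top level domains named in the text.
SPECIAL_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}

def domain_is_acceptable(domain):
    """Apply the period-count rule to a DOMAIN attribute such as '.acme.com'."""
    tld = domain.rsplit(".", 1)[-1].lower()
    # Special top level domains need two periods, all others three.
    required = 2 if tld in SPECIAL_TLDS else 3
    return domain.count(".") >= required

def tail_matches(host, domain):
    """Return True if the host name ends with the DOMAIN attribute."""
    # Host names are case-insensitive, so compare in lower case.
    return host.lower().endswith(domain.lower())

if __name__ == "__main__":
    for candidate in (".com", ".va.us", ".acme.com", ".state.va.us"):
        print(candidate, domain_is_acceptable(candidate))
    print(tail_matches("shipping.crate.acme.com", "acme.com"))
```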

PATH=PATH 
The PATH attribute is used to specify the subset of URLs in
a domain for which the cookie is valid.  If a cookie has
already passed DOMAIN matching, then the pathname component
of the URL is compared with the path attribute, and if
there is a match, the cookie is considered valid and is
sent along with the URL request. The path "/foo" would
match "/foobar" and "/foo/bar.html".  The path "/" is the
most general path.

If the PATH is not specified, it is assumed to be the same
path as the document being described by the header which contains
the cookie.

SECURE 
If a cookie is marked SECURE, it will only be transmitted
if the communications channel with the host is a secure
one.  Currently this means that secure cookies will only
be sent to HTTPS (HTTP over SSL) servers.

If SECURE is not specified, a cookie is considered safe to 
be sent in the clear over unsecured channels. 
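
As a sketch of how a CGI script might assemble such a header from
the attributes described above, consider the following Python
fragment. The function names cookie_date and set_cookie_header are
hypothetical, and the day and month names are spelled out in fixed
tables so that the date does not depend on the locale.

```python
import time

_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def cookie_date(timestamp):
    """Format seconds since the epoch as 'Wdy, DD-Mon-YY HH:MM:SS GMT'."""
    t = time.gmtime(timestamp)
    return "%s, %02d-%s-%02d %02d:%02d:%02d GMT" % (
        _DAYS[t.tm_wday], t.tm_mday, _MONTHS[t.tm_mon - 1],
        t.tm_year % 100, t.tm_hour, t.tm_min, t.tm_sec)

def set_cookie_header(name, value, expires=None, path=None,
                      domain=None, secure=False):
    """Build one Set-Cookie header line (without the trailing CRLF)."""
    for part in (name, value):
        # NAME and VALUE may not contain semi-colon, comma or white space.
        if any(ch in ";," or ch.isspace() for ch in part):
            raise ValueError("illegal character in %r" % part)
    fields = ["%s=%s" % (name, value)]   # the only required attribute
    if expires is not None:
        fields.append("expires=" + cookie_date(expires))
    if path is not None:
        fields.append("path=" + path)
    if domain is not None:
        fields.append("domain=" + domain)
    if secure:
        fields.append("secure")
    return "Set-Cookie: " + "; ".join(fields)

if __name__ == "__main__":
    # Expires one hour from now; PATH is set explicitly to "/".
    print(set_cookie_header("CUSTOMER", "WILE_E_COYOTE",
                            expires=time.time() + 3600, path="/"))
```

Passing an expires value that lies in the past, together with the
same name and path, yields the header a script would send to delete
a cookie.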

Syntax of the Cookie HTTP Request Header

When requesting a URL from an HTTP server, the browser will 
match the URL against all cookies and if any of them match,
a line containing the name/value pairs of all matching cookies 
will be included in the HTTP request.  Here is the format 
of that line: 
-=-=-=-=-=-=-=-=-=-


Cookie: NAME1=OPAQUE_STRING1; NAME2=OPAQUE_STRING2 ...


-=-=-=-=-=-=-=-=-=-
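
On the server side, a script has to split that line back into its
name/value pairs. The Python sketch below shows one way to do so;
the function name parse_cookie_header is hypothetical, and reading
the header from an HTTP_COOKIE environment variable is an assumption
based on the usual CGI convention for request headers, not something
this specification states.

```python
import os

def parse_cookie_header(header_value):
    """Split 'NAME1=V1; NAME2=V2' into a list of (name, value) pairs."""
    pairs = []
    for item in header_value.split(";"):
        item = item.strip()
        if not item:
            continue  # ignore empty segments
        name, _, value = item.partition("=")
        pairs.append((name, value))
    return pairs

if __name__ == "__main__":
    # HTTP_COOKIE is assumed to carry the value of the Cookie: header.
    for name, value in parse_cookie_header(os.environ.get("HTTP_COOKIE", "")):
        print(name, value)
```

A list is used instead of a dictionary because the same name can
appear more than once when cookies with different paths match.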

Additional Notes

Multiple SET-COOKIE headers can be issued in a single server
response.

Instances of the same path and name will overwrite each
other, with the latest instance taking precedence.  Instances
of the same path but different names will add additional
mappings.

Setting the path to a higher-level value does not override
other more specific path mappings.  If there are multiple
matches for a given cookie name, but with separate paths,
all the matching cookies will be sent. (See examples below.)

The expires header lets the client know when it is safe
to purge the mapping but the client is not required to
do so.  A client may also delete a cookie before its expiration
date arrives if the number of cookies exceeds its internal
limits.

When sending cookies to a server, all cookies with a more
specific path mapping should be sent before cookies with
less specific path mappings.  For example, a cookie "name1=foo"
with a path mapping of "/" should be sent after a
cookie "name1=foo2" with a path mapping of "/bar" if they
are both to be sent.

There are limitations on the number of cookies that a client
can store at any one time.  This is a specification of
the minimum number of cookies that a client should be prepared
to receive and store.

    300 total cookies

    4 kilobytes per cookie, where the name and the OPAQUE_STRING
    combine to form the 4 kilobyte limit.

    20 cookies per server or domain.  (Note that completely
    specified hosts and domains are treated as separate entities
    and have a 20 cookie limitation for each, not combined.)

Servers should not expect clients to be able to exceed these
limits. When the 300 cookie limit or the 20 cookie per
server limit is exceeded, clients should delete the least
recently used cookie. When a cookie larger than 4 kilobytes
is encountered the cookie should be trimmed to fit, but
the name should remain intact as long as it is less than
4 kilobytes.

If a CGI script wishes to delete a cookie, it can do so
by returning a cookie with the same name, and an EXPIRES
time which is in the past.  The path and name must match
exactly in order for the expiring cookie to replace the
valid cookie. This requirement makes it difficult for anyone
but the originator of a cookie to delete a cookie.

When caching HTTP, as a proxy server might do, the SET-COOKIE
response header should never be cached.

If a proxy server receives a response which contains a SET-COOKIE
header, it should propagate the SET-COOKIE header
to the client, regardless of whether the response was 304
(Not Modified) or 200 (OK).

Similarly, if a client request contains a Cookie: header,
it should be forwarded through a proxy, even if a conditional
If-Modified-Since request is being made.

EXAMPLES 

Here are some sample exchanges which are designed to illustrate 
the use of cookies. 

First Example transaction sequence:

Client requests a document, and receives in the response:
-=-=-=-=-=-=-=-=-=-


Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/; expires=Wednesday, 09-Nov-99 23:12:40 GMT
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it 
sends:
-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE
-=-=-=-=-=-=-=-=-=-

Client requests a document, and receives in the response:
-=-=-=-=-=-=-=-=-=-
Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001; path=/
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it 
sends:
-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

Client receives:
-=-=-=-=-=-=-=-=-=-
Set-Cookie: SHIPPING=FEDEX; path=/foo
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it 
sends:
-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/foo" on this server, 
it sends:
-=-=-=-=-=-=-=-=-=-
Cookie: CUSTOMER=WILE_E_COYOTE; PART_NUMBER=ROCKET_LAUNCHER_0001; SHIPPING=FEDEX
-=-=-=-=-=-=-=-=-=-

Second Example transaction sequence:

Assume all mappings from above have been cleared.

Client receives:
-=-=-=-=-=-=-=-=-=-
Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001; path=/
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/" on this server, it 
sends:
-=-=-=-=-=-=-=-=-=-
Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

Client receives:
-=-=-=-=-=-=-=-=-=-
Set-Cookie: PART_NUMBER=RIDING_ROCKET_0023; path=/ammo
-=-=-=-=-=-=-=-=-=-

When client requests a URL in path "/ammo" on this server,
it sends:
-=-=-=-=-=-=-=-=-=-
Cookie: PART_NUMBER=RIDING_ROCKET_0023; PART_NUMBER=ROCKET_LAUNCHER_0001
-=-=-=-=-=-=-=-=-=-

NOTE: There are two name/value pairs named "PART_NUMBER" due
to the inheritance of the "/" mapping in addition to the
"/ammo" mapping.
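
The second transaction sequence can be reproduced with a toy client
store, sketched here in Python. The class SimpleJar and its methods
are hypothetical names for illustration; the sketch covers a single
server and ignores DOMAIN, EXPIRES, SECURE and the storage limits.
It applies the rule that cookies with more specific paths are sent
first.

```python
class SimpleJar:
    """Toy store of cookies for one server, keyed by (path, name)."""

    def __init__(self):
        self._cookies = {}

    def set(self, name, value, path="/"):
        # Same path and name: the latest instance overwrites the earlier one.
        self._cookies[(path, name)] = value

    def header_for(self, request_path):
        """Return the Cookie header for a request path, or None if nothing matches."""
        # A cookie matches when its path is a prefix of the requested path.
        matches = [(path, name, value)
                   for (path, name), value in self._cookies.items()
                   if request_path.startswith(path)]
        if not matches:
            return None
        # More specific (longer) paths are sent before less specific ones.
        matches.sort(key=lambda m: len(m[0]), reverse=True)
        return "Cookie: " + "; ".join("%s=%s" % (n, v) for _, n, v in matches)

if __name__ == "__main__":
    jar = SimpleJar()
    jar.set("PART_NUMBER", "ROCKET_LAUNCHER_0001", path="/")
    print(jar.header_for("/"))
    jar.set("PART_NUMBER", "RIDING_ROCKET_0023", path="/ammo")
    print(jar.header_for("/ammo"))
```

Keying the store on the (path, name) pair is what lets the "/ammo"
cookie coexist with the "/" cookie of the same name instead of
replacing it.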


Copyright © 1996 Netscape Communications Corporation