Programming with HTML Forms
A drawback with the basic features of HTML is their lack of support for writing documents which interact with the user. Interaction in many documents consists of the user deciding which hypertext link to follow next.
Fortunately, HTML 2.0 includes forms, which means that a document can utilize text entry fields, radio boxes, selection lists, check boxes, text areas, hidden fields, and buttons. These can be used to gather information for an application 'behind' the document, to guide what is offered to the user next. Some typical forms documents are: a movie database, questionnaires, surveys, and pizza delivery (Pizza Hut's PizzaNet).
This article details three stages in writing forms-based applications for Netscape Navigator for X Windows version 2.0. However, the approach is applicable to all Web browsers with forms capabilities.
The first example prints out the values entered in the fields of a form. The second searches through a text file for matching strings entered via a form. Both examples utilize C programs to process the form's data.
The article finishes with brief discussions of forms testing, and techniques for logging form usage.
There are three basic stages to programming a forms-based document:
1. Design the input form, and write the corresponding HTML document.
2. Write the application program which interprets the data from the input form.
3. Design the document which is generated by the program as the reply to the user. Usually, this document is written in HTML, but this is not mandatory.
Pictorially, these three stages are related as shown below:
Before a forms application can be described, the HTML features for defining forms need to be reviewed.
A form begins with:
<FORM METHOD="POST" ACTION="URL-of-application">
and ends with:
</FORM>
The METHOD attribute specifies how the data entered in the various fields of the form is transmitted to the application. It is best to use the POST method, since the data is then sent to the standard input of the application, as a string of the form:
name=value&name=value&...
name is the name of the form's data entry field, while value is its associated data.
The other method for sending data is GET. This causes the string to arrive at the server in the environment variable QUERY_STRING, which may result in the string being truncated if it exceeds the shell's command line length. For that reason, GET should be avoided.
A form can contain 8 types of data entry field:
• single line text entry fields
• check boxes
• radio boxes
• hidden fields
• password fields
• selection lists
• multi-line text entry fields
• submit and reset buttons
Single line text entry fields, hidden fields, password fields, check boxes and radio boxes are specified using the same basic HTML syntax:
<INPUT TYPE="field-type" NAME="field-name" VALUE="default-value">
field-type can be one of: text, checkbox, radio, hidden, or password.
For a check box or radio button, the VALUE attribute specifies the value of the field when it is checked; unchecked check boxes are disregarded when name=value substrings are being posted to the application.
If several radio buttons have the same name then they act as a one-of-many selection: only one of them can be switched 'on', and so have its value paired with the name.
A hidden text field does not appear on the form, but can have a default value which will be sent to the application.
A password text field will echo *'s when a value is typed into it.
A selection list is specified using:
<SELECT NAME="list-name">
<OPTION>first option
<OPTION>second option
...
</SELECT>
The option chosen will become the value associated with the selection list's name. It is also possible to include the attribute MULTIPLE after the NAME string to allow multiple selections. This maps to multiple name=value substrings, each with the same name.
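A multi-line text entry field has the form:
<TEXTAREA NAME="field-name" ROWS=rows COLS=columns>default text</TEXTAREA>
The submit button causes the browser to collect the data from the various form fields, pair it with the names of the fields, and post it to the application. The reset button resets the fields to their default values. Button syntax is:
<INPUT TYPE="submit" VALUE="button-label">
<INPUT TYPE="reset" VALUE="button-label">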
Thirteen form examples are accessible through:
overview.html also contains more details on the syntax of form fields.
As mentioned before, when the submit button is clicked, the POST method causes a string to be sent to the application. The string consists of a series of name=value substrings, separated by &'s. An added complication is that name and value are encoded: spaces are changed into +'s, and some characters are replaced by a % followed by two hexadecimal digits. Fortunately, form application programmers have written routines for handling these coded strings.
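The decoding step can be sketched in a few lines of C. This is only an illustration of the encoding rules described above; the function name url_decode() is hypothetical and is not taken from the utilities discussed later in this article.

```c
#include <ctype.h>

/* Convert one hexadecimal digit to its numeric value (0-15). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return 0;
}

/* Decode a URL-encoded string in place: '+' becomes a space and
   %XX (two hexadecimal digits) becomes the character with that code.
   url_decode() is a hypothetical name used for illustration. */
void url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                             && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value(s[1]) * 16 + hex_value(s[2]));
            s += 3;
        } else {
            *out++ = *s++;      /* ordinary character: copy unchanged */
        }
    }
    *out = '\0';
}
```

Decoding in place is safe because the decoded text is never longer than the encoded text.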
The POST method means that the form application will receive the string on its standard input. This protocol is defined by the Common Gateway Interface (CGI) specification, which also states that an application can respond by generating suitable code on its standard output. Details on the CGI specification can be found at:
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
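As a rough sketch of the receiving side, the function below reads a known number of bytes of form data from a stream. The name read_form_data() is hypothetical. It assumes the caller already knows the length of the posted string, which CGI supplies in the environment variable CONTENT_LENGTH.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read exactly len bytes of form data from in.
   Returns a NUL-terminated buffer that the caller must free(),
   or NULL if len is negative, memory runs out, or the stream
   holds fewer than len bytes.
   read_form_data() is a hypothetical name used for illustration. */
char *read_form_data(FILE *in, long len)
{
    char *buf;
    size_t got;

    if (len < 0)
        return NULL;
    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        return NULL;
    got = fread(buf, 1, (size_t)len, in);
    if (got != (size_t)len) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A form application would typically pass stdin as the stream and the numeric value of CONTENT_LENGTH (for example via atol()) as the length, after checking that the variable is set.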
The CGI specification permits an application to output many different types of documents (e.g. an image, audio, plain text, HTML, or references to other documents). The application determines the output type by writing a header string to standard output, of the form:
Content-type: type/subtype
type/subtype must be MIME (Multipurpose Internet Mail Extensions) types; two common ones are text/html for HTML output, and text/plain for ASCII text. There must be a blank line after the header, and then the data can begin. For instance, an application (coded in C) could output the following:
printf("Content-type: text/html\n\n");
/* the newlines are necessary for the blank line */
printf("<HTML>\n");
printf("<H1>Search String Error!</H1>\n");
printf("Must specify at least 1 pattern<P>\n");
printf("</HTML>\n");
Example 1: The Echoer
Having described the HTML form constructs, the mechanism for supplying data to a form application, and the mechanism for generating a reply, it is now possible to describe a complete, very simple example. Its input document consists of a form with five single line text entry fields, which the application processes by outputting an HTML document containing the text entered in the fields. In other words, the user's input is echoed.
The input document is located at:
http://www.cs.mu.oz.au/~ad/code/form-gp0.html
It will also be used as the interface to the file searcher application of the next section. However, ignoring the explanatory text, the form is quite simple: five text entry fields, and submit and reset buttons labelled as 'Start Search' and 'Clear' respectively.
The text field constructs include extra attributes to limit the size of the input, and the size of the boxes drawn on the screen. Also note that the fields are named pat1 through pat5, although these are not displayed as part of the input document.
When the input document is entered as in Figure 2 and the 'Start Search' button is clicked, the echoing application returns the document shown in Figure 3.
________________________________________
ALP Membership Search
• Enter at most 15 characters in a box (e.g. Melbo).
• At least one box should contain something.
• Matches are lines in the membership list which contain all the box entries.
• The first 10 matches will be returned, together with the total number of matches.
• Click the Start Search button to start the search.
• All the boxes can be cleared by clicking on the Clear button.
Search Boxes
Figure 2: Example input
________________________________________
Query Result
You submitted the following name/value pairs:
• pat1 = John
• pat2 = uk
• pat3 =
• pat4 =
• pat5 =
Figure 3: Echoed input from Figure 2
________________________________________
In form-gp0.html, the name of the application is given in the FORM ACTION attribute as:
http://www.cs.mu.oz.au/cgi-bin/qgp
qgp's actual location on the server depends on the configuration file for the httpd daemon (called httpd.conf). The relevant line in that file is:
Exec /cgi-bin/* /local/dept/wwwd/scripts/*
In other words, qgp must be placed in /local/dept/wwwd/scripts in order for the form to invoke it. This step in linking the input HTML document to the application varies from system to system.
qgp is the executable compiled from qgp.c, which can be found at:
http://www.cs.mu.oz.au/~ad/code/qgp.c
The program mostly consists of utility functions for processing name=value substrings, which will, consequently, appear in almost every form application. Some of the functions were written by Rob McCool, and can be accessed via the page:
http://hoohoo.ncsa.uiuc.edu/cgi/forms.html
Similar utilities for writing applications in the Bourne Shell, Perl, and Tcl are available from the same page, along with several excellent small programs showing how the utilities can be used.
qgp begins by outputting the start of the reply document, which is an HTML document in this case. The extra newlines in the printf()'s are not required, but make the output easier to read during debugging (see Part 7, 'A Note on Testing' for more on this).
cgi_errs() performs two standard error checks: the first determines whether the delivery METHOD is something other than POST. The second checks the encoding strategy for the name=value substrings. In fact, the only encoding supported by most browsers is x-www-form-urlencoded.
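The two checks can be pictured with the sketch below. It is not the code of cgi_errs(); the function name cgi_request_error() and its messages are hypothetical. It assumes the caller supplies the values of the CGI environment variables REQUEST_METHOD and CONTENT_TYPE.

```c
#include <stddef.h>
#include <string.h>

/* Return NULL if the request is acceptable, otherwise a short
   description of the problem. The arguments are expected to be the
   values of REQUEST_METHOD and CONTENT_TYPE (either may be NULL if
   the variable is not set).
   cgi_request_error() is a hypothetical name used for illustration. */
const char *cgi_request_error(const char *method, const char *content_type)
{
    if (method == NULL || strcmp(method, "POST") != 0)
        return "This application must be called with METHOD=\"POST\"";

    if (content_type == NULL ||
        strcmp(content_type, "application/x-www-form-urlencoded") != 0)
        return "This application only accepts x-www-form-urlencoded data";

    return NULL;
}
```

An application might call it as cgi_request_error(getenv("REQUEST_METHOD"), getenv("CONTENT_TYPE")) and print the returned message inside its reply document when the result is not NULL.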
The call to build_entries() initializes the entries array with the name and value pairs sent to the application. The environment variable, CONTENT_LENGTH, contains the length of the string, which is used by the for-loop that builds the entries array. Each name=value substring is extracted by a call to fmakeword(). The +'s and hexadecimal URL encodings are replaced, and then the name part of the substring is removed, leaving only the value.
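The splitting step can be sketched as follows. This is a simplified stand-in, not the code of build_entries() or fmakeword(): it works on a string already held in memory, and the names entry and split_entries() are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One name/value pair; both pointers refer into the caller's buffer. */
typedef struct {
    char *name;
    char *value;
} entry;

/* Split "name=value&name=value&..." in place, overwriting each '&'
   and the first '=' of each pair with '\0'. Pairs without an '=' are
   skipped. Returns the number of entries stored (at most max).
   split_entries() is a hypothetical name used for illustration. */
int split_entries(char *data, entry entries[], int max)
{
    int n = 0;
    char *pair = data;

    while (pair != NULL && *pair != '\0' && n < max) {
        char *amp = strchr(pair, '&');
        char *eq;

        if (amp != NULL)
            *amp = '\0';                /* terminate this pair */

        eq = strchr(pair, '=');
        if (eq != NULL) {
            *eq = '\0';                 /* terminate the name */
            entries[n].name = pair;
            entries[n].value = eq + 1;
            n++;
        }
        pair = (amp != NULL) ? amp + 1 : NULL;
    }
    return n;
}
```

The +'s and hexadecimal encodings should be decoded only after the string has been split, since a decoded & or = inside a value would otherwise be mistaken for a separator.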
Finally, the contents of the entries array are output as an unnumbered HTML list.
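A minimal sketch of this output step is given below. It is not taken from qgp.c; the names put_escaped() and print_pairs() are hypothetical, and the escaping of <, > and & is an added precaution so that text typed by the user cannot be interpreted as HTML in the reply.

```c
#include <stdio.h>

/* Write s to out, replacing the characters that are special in HTML. */
static void put_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '<': fputs("&lt;", out);  break;
        case '>': fputs("&gt;", out);  break;
        case '&': fputs("&amp;", out); break;
        default:  fputc(*s, out);      break;
        }
    }
}

/* Print n name/value pairs as an unnumbered HTML list.
   print_pairs() is a hypothetical name used for illustration. */
void print_pairs(FILE *out, const char *names[], const char *values[], int n)
{
    int i;

    fputs("<UL>\n", out);
    for (i = 0; i < n; i++) {
        fputs("<LI>", out);
        put_escaped(out, names[i]);
        fputs(" = ", out);
        put_escaped(out, values[i]);
        fputc('\n', out);
    }
    fputs("</UL>\n", out);
}
```

In a CGI application the stream would be stdout, since that is where the reply document is written.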
Both cgi_errs() and build_entries() illustrate the importance of environment variables for conveying information from the input document to the application. A complete list of environment variables supported by the CGI specification can be found in:
http://hoohoo.ncsa.uiuc.edu/cgi/env.html
Example 2: The File Searcher
The input document for the file searcher has the same format as before, but can be found at:
http://www.cs.mu.oz.au/~ad/code/form-gp.html
The only change is that its form's ACTION attribute refers to "http://www.cs.mu.oz.au/cgi-bin/qdir".
qdir searches through a text file holding a membership list. It looks for lines containing the strings entered in the text entry fields of the form, and at most 10 matching lines are printed, together with the total number of matching lines.
A query like the one shown in Figure 2 results in the HTML document shown in Figure 4 being generated by the application. It can also produce an error document, as shown in Figure 5, if no strings are entered before a search is initiated.
________________________________________
Query Result
Only a maximum of 10 matching lines are shown for any search.
The following strings are being used for the search:
• John
• uk
The lines found are:
• Bell, John; DoCS; Queen Mary and Westfield College; Mile End Road; London E1 4NS; UK; Tel: 071 975 5210; Email: jb@dcs.qmw.ac.uk
• Fox, John; ICRF; Advanced Computation Laboratory; PO Box 123,; Lincoln's Inn Fields; London WC2A 3PX; UK; Tel: 071 269 3624; Email: jf@acl.icnet.uk
• Johns, Nicky; LPA; Studio 4; Royal Victoria Patriotic Building; Trinity Road; London SW18 3SX; UK; Tel: 081 871 2016; Fax: 081 874 0449; Email: lpa@cix.compulink.co.uk
• Jones, John; DoCS; Univ. of Hull; Hull; HU6 7RX; UK; Tel: 0482 465767 or 465951; Email: jgj@dcs.hull.ac.uk
• Lloyd, John; Dept of CS,; Univ. of Bristol; Queen's Building,; Univ. Walk; Bristol BS8 1TR; UK; Tel: 0272-303913; Email: jwl@compsci.bristol.ac.uk
5 line(s) printed.
Figure 4: Reply to input like that in Figure 2
________________________________________
Query Result
Must specify at least 1 pattern
Figure 5: Error reply if no strings supplied
________________________________________
The application C code (in file qdir.c) can be found at:
http://www.cs.mu.oz.au/~ad/code/qdir.c
The compiled version, qdir, is in /local/dept/wwwd/scripts.
qdir begins in a similar way to qgp by outputting the start of its reply HTML document and then checking for CGI errors.
The call to record_details() is explained in Part 8, 'Logging'. It logs information about the user in a file, and has no effect on the subsequent code.
get_pats() initially calls build_entries() to obtain the name=value pairs in a useable form. The values are then extracted and placed in the patterns array. These values correspond to the search strings entered by the user in the form's text entry fields.
print_pats() prints the search strings as an unnumbered HTML list.
process_pats() uses the search strings to form a UNIX command. The idea is to translate a single search string, such as John, into the command:
fgrep 'John' search-file > temp-file
fgrep is a UNIX utility for quickly searching through text files.
Multiple search strings, such as John, uk, and LPA, would be utilized in the following command:
fgrep 'John' search-file | fgrep 'uk' | fgrep 'LPA' > temp-file
The trick is to pipe the matching lines of one fgrep call into another fgrep call, which further filters the selection.
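One way to assemble such a command string is sketched below. This is not the code of process_pats(); the function name build_search_command() and its parameters are hypothetical. As a precaution that the article does not discuss, the sketch refuses any search string containing a single quote, since that would end the quoting and let the rest of the string reach the shell.

```c
#include <stdio.h>
#include <string.h>

/* Build "fgrep 'p1' search_file | fgrep 'p2' | ... > temp_file" in cmd.
   Returns 0 on success, or -1 if there are no patterns, a pattern
   contains a single quote, or cmd is too small.
   build_search_command() is a hypothetical name used for illustration. */
int build_search_command(char *cmd, size_t size,
                         const char *patterns[], int n,
                         const char *search_file, const char *temp_file)
{
    size_t used = 0;
    int written;
    int i;

    if (n < 1)
        return -1;

    for (i = 0; i < n; i++) {
        if (strchr(patterns[i], '\'') != NULL)
            return -1;          /* would break out of the shell quotes */

        if (i == 0)
            written = snprintf(cmd + used, size - used,
                               "fgrep '%s' %s", patterns[i], search_file);
        else
            written = snprintf(cmd + used, size - used,
                               " | fgrep '%s'", patterns[i]);

        if (written < 0 || (size_t)written >= size - used)
            return -1;          /* output was truncated */
        used += (size_t)written;
    }

    written = snprintf(cmd + used, size - used, " > %s", temp_file);
    if (written < 0 || (size_t)written >= size - used)
        return -1;
    return 0;
}
```

The finished string could then be handed to the C library function system() for execution.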
The matching lines are printed by print_matches(), which reads at most 10 lines from the temporary file.
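The reading loop might look something like the sketch below. It is not the code of print_matches(); the name print_first_matches() and the 1024-character line buffer are assumptions. A line longer than the buffer would be read in several pieces and counted more than once.

```c
#include <stdio.h>

#define MAX_SHOWN 10    /* maximum number of matching lines printed */

/* Copy at most MAX_SHOWN lines from in to out, each as an HTML list
   item. Returns the number of lines copied.
   print_first_matches() is a hypothetical name used for illustration. */
int print_first_matches(FILE *in, FILE *out)
{
    char line[1024];
    int shown = 0;

    while (shown < MAX_SHOWN && fgets(line, sizeof line, in) != NULL) {
        fprintf(out, "<LI>%s", line);   /* line keeps its own newline */
        shown++;
    }
    return shown;
}
```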
The total number of lines in the temporary file is counted by the wc UNIX command:
wc -l temp-file
The line count is read in via a pipe which captures the output from the command.
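Capturing the output of a command through a pipe can be done with the POSIX function popen(). The sketch below is an assumption about how this might be written, not the code in qdir.c; the name count_lines() is hypothetical, and the file name is assumed to be chosen by the program itself, since it is passed to the shell unquoted.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen() and pclose() */
#include <stdio.h>

/* Run "wc -l path" and return the line count it reports,
   or -1 if the command could not be run or its output not parsed.
   count_lines() is a hypothetical name used for illustration. */
long count_lines(const char *path)
{
    char cmd[512];
    long count = -1;
    FILE *pipe;
    int len;

    len = snprintf(cmd, sizeof cmd, "wc -l %s", path);
    if (len < 0 || (size_t)len >= sizeof cmd)
        return -1;

    pipe = popen(cmd, "r");             /* read the command's output */
    if (pipe == NULL)
        return -1;

    /* wc prints the count first, followed by the file name */
    if (fscanf(pipe, "%ld", &count) != 1)
        count = -1;

    pclose(pipe);
    return count;
}
```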
This code demonstrates how to utilize UNIX features as part of an application, which is preferable in this case because of the size of the file being searched, and the potentially large number of matching lines that need to be manipulated. These techniques for utilizing UNIX can also be employed to create forms that edit files, send mail, read news, or monitor the network.