Perl CGI Scripts
Now that we have learned how to use some common Perl features, we can write a Perl program that will process data from an internet form page. Programs that work with web pages are commonly known as scripts. The form page in Chapter 2 solicits linguistic responses to a battery of test sentences. We now need to write a script that records the responses submitted from the web page and issues a reply to the user. Our pronoun survey form includes a <FORM> tag that tells the server where to send the user’s reply. The METHOD attribute of this tag specifies the POST method of sending the data to the program that will process the information. The ACTION attribute tells the server which program we wish to use for processing the data.
Processing the data from our pronoun survey will require three stages. We need to collect the data from the browser, transform the data from its CGI form into a more manageable form for Perl, and record the data in a file on the server. We should also write a reply to the user in the form of an html document that can be displayed on the user’s browser.
Creating a web page with Perl
We will begin by writing a Perl program that produces the html document to thank the user for their information. We need to tell the user’s browser what kind of document it has received. We will use the content type text/html since we now know how to write basic web pages. This content-type line must be followed by a blank line. The Perl code for accomplishing this task is
print "Content-type: text/html\n\n";
Now that we have dispensed with the preliminaries, the rest of our work is straightforward. We just use Perl’s print command to produce the html page. The code for this is included in the Perl program shown in Figure 5.1. This program contains only two print commands. The first specifies the content type while the other produces an html document. The structure of the two print lines is also the same:
print "your message goes here";
The word print is followed by a space and a quotation mark, the message, an end quotation mark and a semicolon. The first print command puts all of this information on one line while the second print command distributes this information over several lines. The flexibility inherent in Perl’s print commands makes it much easier to arrange the output in a familiar form. We can then make sure that we have used the html tags properly.
Figure 5.1 A Perl program that responds to web forms
#!/usr/local/bin/perl
#html1.pl
# Tell the browser what kind of document follows, then send a blank line
print "Content-type: text/html\n\n";
# Send the html page itself
print "<html>
<title> Pronoun Anaphora Survey </title>
<body>
<center><h1>Thanks!</h1></center>
<p><STRONG>Thanks for your input.</STRONG></p>
<p>We appreciate your response to our web survey.</p>
</body>
</html>";
exit;
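The same page can also be produced with a here-document, which some readers find easier to proofread than one long quoted string. The following is only a minimal sketch and not the program in Figure 5.1; the helper name thanks_page and the END_HTML marker are my own choices for illustration.

```perl
#!/usr/local/bin/perl
# Sketch: build the reply with a here-document.
# thanks_page() is a hypothetical helper, not part of Figure 5.1.

sub thanks_page {
    my ($title) = @_;
    # Everything up to the END_HTML marker is treated as one
    # double-quoted string, so $title is interpolated.
    return <<"END_HTML";
<html>
<head><title>$title</title></head>
<body>
<center><h1>Thanks!</h1></center>
<p><strong>Thanks for your input.</strong></p>
<p>We appreciate your response to our web survey.</p>
</body>
</html>
END_HTML
}

# The content-type line and the blank line still come first.
print "Content-type: text/html\n\n";
print thanks_page("Pronoun Anaphora Survey");
```

Keeping the page in a subroutine also makes it possible to check the html from the command line before the program is installed on the server.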
The main point of writing our program is to respond to input from our pronoun survey web page. To do this, we must send our Perl program to the server that hosts our web page. We have already created a public_html directory on the server for our web page. CGI programs are commonly kept in a subdirectory of the public_html directory named cgi-bin. We will have to telnet to the server, change to the public_html directory there, and use the UNIX command mkdir cgi-bin to create the cgi-bin directory. We must also change the access permission for this subdirectory with the UNIX command chmod 711 cgi-bin, which gives us full access and lets everyone else reach the files inside the directory without listing its contents. We can then use a file transfer program to send our program to the cgi-bin subdirectory. Once we have installed our program in the cgi-bin subdirectory, we need to change the access permission for the program by using the UNIX command chmod 700 filename, which lets only the owner of the file read, write, and run it. Information about CGI scripts is available on the CGI with Perl and Using suEXEC pages.
We can now load our favorite web browser, go to our pronoun web page, and click the send button. If all goes well, you will see a response displayed in your browser. Unfortunately, there are many places where a new web programmer can run into difficulties. Testing your Perl program first on your home computer or on the server will show whether it runs properly. You also need to make sure that you have identified the correct location for your Perl program in the form line of the pronoun survey web page. Most computer administrators take special pains to insulate their servers from CGI programs since such programs can be used to attack the security of the server or even destroy documents stored on the server. One method for maintaining security on servers is to use a program known as cgiwrap. This program performs security checks on each CGI program and runs it under the account of the user who owns it. If the server that you use runs such a program, the Perl program will be located at the address:
http://domain.name/cgiwrap/username/program.pl
The tilde (‘~’) that you used in the URL for your web page is missing from the address of your Perl program. The form line on the pronoun survey web page will need to be changed accordingly. KU has added a new script execution environment called "suexec". You will need to contact web services to enable suexec for your cgi-bin. Scripts run through suexec have a different name and location. In suexec, you will need to change the program name from program.pl to program.shtml. The program will be located at the address:
http://domain.name/~username/cgi-bin/program.shtml
Counting responses in Perl
I started with a simple example of a Perl-generated response to help work through the process of establishing contact between the browser and the Perl CGI program. Of course, just generating a reply to the user’s browser is not very satisfying. The program would be much more useful if it kept track of the user’s responses to each sentence and kept a running total of the responses on the server. We will tackle this step next.
The browser sends its data to the server in a standard CGI format. Such standards make it easy to process the data on any server, but the data must first be converted to a more accessible form for Perl. The data is sent as a series of name-value pairings. Each name-value pair corresponds to a name-value pair in the web form. The string sent to the server has an ampersand (&) between each name-value pair. Each name is separated from its value by an equals sign (=). Space characters are coded as a plus sign (+), and most other non-alphanumeric characters are coded as a percent sign followed by two hexadecimal digits. The string sent by the pronoun survey page looks like
sent1=Short+distance&sent2=Long+distance
We will write a Perl program that breaks the input string into the name-value pairs by searching for the ampersands. We will first use Perl’s read command to access the input string. This command has three arguments: the input source, the string variable that our program will use to store the input string, and the number of bytes to read. The server places the length of the input string in the environment variable CONTENT_LENGTH, which Perl can read through the %ENV hash. The read command has the form
read(STDIN, $QueryString, $ENV{'CONTENT_LENGTH'});
We can then use the Perl split function to split the input string stored in the variable $QueryString into pairs of names and values. This command has the form
@NameValuePairs = split (/&/, $QueryString);
The split function stores the name-value pairs in an array named @NameValuePairs. Arrays make it easy for the program to access each name and value. The first item in the list @NameValuePairs is the pair sent1=Short+distance. The second item is the pair sent2=Long+distance. We need to use the split function again to separate each name from its corresponding value. If the variable $NameValue holds one item from the array, the command we need for this process is
($Name, $Value) = split (/=/, $NameValue);
The variable $Name will contain the name of the variable sent by the survey form, e.g., sent1. The variable $Value contains the value that the user selected for sent1, e.g., Short+distance. We can then use the Perl translation operator to change the plus signs in the values back to spaces. This command would appear as
$Value =~ tr/+/ /;
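The responses in our survey contain only letters and spaces, so translating the plus signs is enough. Browsers also encode characters such as apostrophes as a percent sign followed by two hexadecimal digits, so a form with free-text fields would need one more step. The following is a minimal sketch of that step; the helper name decode_value is hypothetical and does not appear in the programs in this chapter.

```perl
# Sketch: decode one form value.
# decode_value() is a hypothetical helper name.
sub decode_value {
    my ($value) = @_;
    # Change the plus signs back to spaces first ...
    $value =~ tr/+/ /;
    # ... then change each %XX sequence back to the character it stands for.
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $value;
}

print decode_value("Short+distance"), "\n";   # Short distance
print decode_value("Don%27t+know"), "\n";     # Don't know
```

The plus signs are translated before the percent codes so that a literal plus sign, which arrives as %2B, is not turned into a space.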
If our survey contains more than one sentence, we will need to store the information in the $Name and $Value variables in another variable so that the next name-value pair will not erase the information from the previous name-value pair. The following lines show how we can use the Perl if and elsif statements to accomplish this procedure.
if ( $Name eq "sent1" ) {
    $sent1 = $Value;
} elsif ( $Name eq "sent2" ) {
    $sent2 = $Value;
} # end if
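A survey with many sentences would need one elsif branch per sentence. One possible alternative, sketched here under the assumption that a hash is acceptable in place of the separate variables $sent1 and $sent2, stores every value under its own name. The names parse_form and %answer are hypothetical.

```perl
# Sketch: keep every answer in one hash keyed by the field name.
# parse_form() and %answer are hypothetical names, not part of Figure 5.2.
sub parse_form {
    my ($query_string) = @_;
    my %answer;
    foreach my $name_value (split /&/, $query_string) {
        # The limit of 2 keeps any further "=" inside the value.
        my ($name, $value) = split /=/, $name_value, 2;
        $value = '' unless defined $value;
        $value =~ tr/+/ /;
        $answer{$name} = $value;
    }
    return %answer;
}

my %answer = parse_form("sent1=Short+distance&sent2=Long+distance");
foreach my $name (sort keys %answer) {
    print "$name: $answer{$name}\n";
}
```

With this arrangement the value for the first sentence is available as $answer{sent1}, and adding a sentence to the survey requires no change to the parsing code.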
Figure 5.2 shows what our complete program looks like. This program processes the data sent by a user and produces a response that shows the user’s selections. An example of an html page that uses this script can be found at example 5.2.
Figure 5.2 A Perl program that tracks responses
#!/usr/local/bin/perl
#html2.pl
#This program creates a response to the pronoun survey web page 'Pronoun.html'
# Read the data from standard input.
read (STDIN, $QueryString, $ENV{'CONTENT_LENGTH'});
# Use split to make an array of name-value pairs broken at
# the ampersand character. Then get the values.
@NameValuePairs = split (/&/, $QueryString);
#Process each name-value pair
foreach $NameValue (@NameValuePairs) {
    ($Name, $Value) = split (/=/, $NameValue);
    $Value =~ tr/+/ /;
    if ( $Name eq "sent1" ) {
        $sent1 = $Value;
    } elsif ( $Name eq "sent2" ) {
        $sent2 = $Value;
    } # end if
} # end foreach
#Print the html response
print "Content-type: text/html\n\n";
print "<html>
<title> Pronoun Anaphora Survey </title>
<body>
<center><h1>Thanks!</h1></center>
<p><STRONG>Thanks for your input.</STRONG></p>
<p>You thought the pronoun in sentence one had a $sent1 interpretation</p>
<p>You thought the pronoun in sentence two had a $sent2 interpretation</p>
<p>We appreciate your response to our web survey.</p>
</body>
</html>";
exit;
Recording data in a server file
Now that we have a Perl program that responds to the pronoun survey form, it is time to add a section that will copy the data from the user’s browser to a data file on the server. Keeping a data file on the server will allow us to record the user’s responses and keep track of the overall distribution of responses. A simple way to do this is to create a data file that starts with zeros for all the possible responses. I recommend using commas to separate the zeros since data files in this format can be transferred easily to most spreadsheet programs. Each line of the file will contain the totals for one of our test sentences, with one total for each of the five possible responses. Individual responses can be added to the total for the pertinent response. The initial state of the data file for recording responses to two sentences would look like
0,0,0,0,0
0,0,0,0,0
The first 0 on the first line records the number of ‘No response’ replies for the first sentence while the second 0 on the second line records the number of ‘Neither’ replies for the second sentence.
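The starting file can be typed by hand, but a short program avoids mistakes when the survey grows. The following is a minimal sketch; the helper name initial_lines is hypothetical, and the sketch prints to standard output so that the result can be redirected into whatever data file the survey uses.

```perl
# Sketch: build the starting contents of the data file.
# initial_lines() is a hypothetical helper name.
sub initial_lines {
    my ($sentences, $choices) = @_;
    # One zero per possible response, joined with commas, e.g. "0,0,0,0,0".
    my $row = join(",", (0) x $choices);
    # One identical row per test sentence.
    return map { "$row\n" } 1 .. $sentences;
}

# Two sentences with five possible responses each.
print initial_lines(2, 5);
```

Running the sketch with the arguments 2 and 5 prints the two lines of zeros shown above.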
Our program should read the data file from the server, add the user’s responses to the totals, and then write the adjusted data file back to the server. We also need to guard against the possibility that several users may respond at the same time. The Perl command flock can be used to lock and unlock the data file to prevent multiple file access. This function takes two arguments. The first argument is the filehandle of the data file; the second argument is the number 2 to request an exclusive lock or the number 8 to unlock the file. I used the filehandle DATAFILE in my program. The program segment in Figure 5.3 shows how to accomplish these tasks.
Figure 5.3. Perl commands for data processing
# This segment of the program processes data from the pronoun survey form
# and stores it in a server file named "pronoun.data"
# It uses the filehandle DATAFILE
# Open and lock the data file
open (DATAFILE, "</home/username/public_html/pronoun.data")
    or die "Cannot open data file for reading: $!";
flock (DATAFILE, 2);
# Read the data file, unlock it and close it
for ($index = 0; $index <= 1; $index = $index + 1) {
    chomp($file_line[$index] = <DATAFILE>);
}
flock(DATAFILE, 8);
close(DATAFILE);
# Split the data lines into separate responses, increment the chosen
# response, and put the lines back together
for ($index = 0; $index <= 1; $index = $index + 1) {
    @response = split /,/, $file_line[$index];
    $next = $index + 1;
    $sent = "sent" . $next;
    if ( $$sent eq "No response" ) {
        $response[0] = $response[0] + 1;
    } elsif ( $$sent eq "Neither" ) {
        $response[1] = $response[1] + 1;
    } elsif ( $$sent eq "Long distance" ) {
        $response[2] = $response[2] + 1;
    } elsif ( $$sent eq "Short distance" ) {
        $response[3] = $response[3] + 1;
    } elsif ( $$sent eq "Both" ) {
        $response[4] = $response[4] + 1;
    } #end ifs
    $file_line[$index] = join(",", @response);
} #end for
# Reopen the data file for writing and lock it
open(DATAFILE, ">/home/username/public_html/pronoun.data")
    or die "Cannot open data file for writing: $!";
flock(DATAFILE, 2);
# Write the new data back to the file, unlock the file, and close it
for ($index = 0; $index <= 1; $index = $index + 1 ) {
    $line = $file_line[$index];
    print DATAFILE "$line\n";
}
flock(DATAFILE, 8);
close(DATAFILE);
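The segment in Figure 5.3 releases the lock after reading the file and takes a new lock for writing, so two responses that arrive at almost the same moment could still read the same totals and one update could be lost. One possible variation, sketched here and not the method used in this chapter's programs, opens the file once for both reading and writing and holds a single lock for the whole update. The helper name update_counts and the %column table are hypothetical; the constants LOCK_EX and SEEK_SET come from Perl's standard Fcntl module.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);   # LOCK_EX is 2 and LOCK_UN is 8 on most systems

# Column of each possible response, in the order used by the data file.
my %column = (
    "No response"    => 0,
    "Neither"        => 1,
    "Long distance"  => 2,
    "Short distance" => 3,
    "Both"           => 4,
);

# update_counts() is a hypothetical helper. @answers holds the user's
# choices, one per sentence, in sentence order.
sub update_counts {
    my ($path, @answers) = @_;
    # "+<" opens an existing file for reading and writing without erasing it.
    open(my $fh, '+<', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";

    my @lines = <$fh>;
    chomp @lines;
    for my $i (0 .. $#answers) {
        my @response = split /,/, $lines[$i];
        my $col = $column{ $answers[$i] };
        $response[$col]++ if defined $col;   # ignore unknown answers
        $lines[$i] = join(",", @response);
    }

    # Go back to the start, empty the file, and write the new totals
    # while the lock is still held.
    seek($fh, 0, SEEK_SET) or die "Cannot rewind $path: $!";
    truncate($fh, 0)       or die "Cannot truncate $path: $!";
    print {$fh} "$_\n" for @lines;
    close($fh)             or die "Cannot close $path: $!";   # closing releases the lock
    return @lines;
}
```

A call such as update_counts("/home/username/public_html/pronoun.data", $sent1, $sent2) would then replace both the reading and the writing steps.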
I made use of several Perl tricks to read the data file and record the user’s responses. The line
chomp($file_line[$index] = <DATAFILE>);
stores a line of data in the file_line array. Each pass through the for loop reads a new line from the data file and increments the value of the variable $index. The result is that the new data is put in a new place in the array. The chomp function removes the newline character from the end of the line that was read.
The data file contains zeros separated by commas. We can use Perl’s split function once again to separate the zeros into individual holding areas. The command
@response = split /,/, $file_line[$index];
accomplishes this task. It uses the array @response to store a single row of data from the file_line array. The first zero in this line will be the first item in the response array while the fifth zero in this line will be the fifth item in the response array.
Once we separate the possible responses this way we are ready to add the user’s response to the totals in our response array. The user’s response comes in the form of ‘No response’, ‘Neither’, ‘Long distance’, etc. Our program has to check which response the user chose and increment the relevant total in the response array. I used a series of if statements to check the user’s response
if ( $$sent eq "No response" ) {
    $response[0] = $response[0] + 1;
} elsif ( $$sent eq "Neither" ) {
    $response[1] = $response[1] + 1;
} elsif ( $$sent eq "Long distance" ) {
    $response[2] = $response[2] + 1;
} elsif ( $$sent eq "Short distance" ) {
    $response[3] = $response[3] + 1;
} elsif ( $$sent eq "Both" ) {
    $response[4] = $response[4] + 1;
} #end ifs
The name of the variable that contains the user’s response changes with each sentence. Our survey form uses the variable names sent1 and sent2. In order to use the same block of if statements for all the sentences in our survey, I constructed the expression $$sent, which cycles through the values of $sent1 and $sent2. The program first constructs a string with the value ‘sent1’ by means of the concatenation statement
$sent = "sent" . $next;
Adding a scalar variable sign at the beginning of this variable produces the compound $$sent, which Perl evaluates as ${$sent}, that is, as $sent1, $sent2, etc. Perl calls this a symbolic reference. It works only with global variables such as the ones in our program and is not allowed under use strict.
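A hedged alternative, assuming the answers are kept in a hash rather than in separate variables, builds the same name and uses it as a hash key. The names %answer and answer_for below are hypothetical and are not part of the programs in this chapter.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the values collected from the form.
my %answer = (sent1 => "Short distance", sent2 => "Long distance");

# answer_for() is a hypothetical helper that looks a sentence up by number.
sub answer_for {
    my ($answers, $number) = @_;
    my $name = "sent" . $number;   # same concatenation as in the text
    return $answers->{$name};      # hash lookup instead of $$sent
}

for my $index (0 .. 1) {
    my $next = $index + 1;
    print "Sentence $next: ", answer_for(\%answer, $next), "\n";
}
```

The hash lookup gives the same result as $$sent, and it keeps working if the program is later changed to declare its variables with my.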
After we have added the user’s responses to the relevant totals we can use Perl’s join function to put the response list back into a single line of data for storage on the server. The following line accomplishes this task.
$file_line[$index] = join(",", @response);
We need to place quotes around the character (the comma) that we use to separate the totals in the data file. Once our program puts the data in this form it is a simple matter to open the data file for output and write our results back to the file. Note the use of the print function to output the line to the data file. The print command ends in ‘\n’ to add a new line for the next data entry.
I put all of these sections together in the program shown in Figure 5.4. In addition to storing a running total of user responses in the data file, the program calculates the percentage of respondents who made the same choice as the user and displays this percentage on the html response. To do so, it keeps the total number of respondents on a new first line of the data file, ahead of the lines for the individual sentences. I used Perl’s sprintf function to round the percentages to four significant digits. Hopefully, such feedback will encourage more participation in linguistic surveys on the web.
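The formatting step can be tried on its own before it is added to the survey program. The following minimal sketch wraps the sprintf call in a hypothetical helper named percent_of; the counts passed to it are made-up values for illustration.

```perl
# Sketch: the formatting step on its own.
# percent_of() is a hypothetical helper name.
sub percent_of {
    my ($count, $total) = @_;
    # %3.4G: at least 3 characters wide, at most 4 significant digits,
    # with trailing zeros removed.
    return sprintf "%3.4G", 100 * $count / $total;
}

print percent_of(1, 3), "\n";   # 33.33
print percent_of(2, 4), "\n";   # " 50" (padded to three characters)
print percent_of(7, 7), "\n";   # 100
```

The field width of 3 only pads short results with leading spaces, which a browser collapses when it displays the html page.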
We need to use quite a few advanced Perl techniques for our web survey program. The following chapters will provide more information about each of these techniques as we focus on using Perl to process natural language commands.
Figure 5.4 The final pronoun survey Perl program
#!/usr/local/bin/perl
#pronoun.pl
# Read the data from standard input.
read (STDIN, $QueryString, $ENV{'CONTENT_LENGTH'});
# Use split to make an array of name-value pairs broken at
# the ampersand character. Then get the values.
@NameValuePairs = split (/&/, $QueryString);
foreach $NameValue (@NameValuePairs) {
    ($Name, $Value) = split (/=/, $NameValue);
    $Value =~ tr/+/ /;
    if ( $Name eq "sent1" ) {
        $sent1 = $Value;
    } elsif ( $Name eq "sent2" ) {
        $sent2 = $Value;
    } # end if
} # end foreach
print "Content-type: text/html\n\n";
# This segment of the program processes data from the pronoun survey form
# and stores it in a server file named "pronoun.data"
# It uses the filehandle DATAFILE
# Open and lock the data file
open (DATAFILE, "</home/pyersqr/public_html/pronoun.data")
    or die "Cannot open data file for reading: $!";
flock (DATAFILE, 2);
# Get total responses from first line of the data file
# and increment the total
chomp($total = <DATAFILE>);
$total = $total + 1;
# Read the data file, unlock it and close it
for ($index = 0; $index <= 1; $index = $index + 1) {
    chomp($file_line[$index] = <DATAFILE>);
}
flock(DATAFILE, 8);
close(DATAFILE);
# Split the data lines into separate responses, increment the chosen
# response, and put the lines back together
for ($index = 0; $index <= 1; $index = $index + 1) {
    @response = split /,/, $file_line[$index];
    $next = $index + 1;
    $sent = "sent" . $next;
    if ( $$sent eq "No response" ) {
        $response[0] = $response[0] + 1;
        $percent[$index] = sprintf "%3.4G", 100 * $response[0] / $total;
    } elsif ( $$sent eq "Neither" ) {
        $response[1] = $response[1] + 1;
        $percent[$index] = sprintf "%3.4G", 100 * $response[1] / $total;
    } elsif ( $$sent eq "Long distance" ) {
        $response[2] = $response[2] + 1;
        $percent[$index] = sprintf "%3.4G", 100 * $response[2] / $total;
    } elsif ( $$sent eq "Short distance" ) {
        $response[3] = $response[3] + 1;
        $percent[$index] = sprintf "%3.4G", 100 * $response[3] / $total;
    } elsif ( $$sent eq "Both" ) {
        $response[4] = $response[4] + 1;
        $percent[$index] = sprintf "%3.4G", 100 * $response[4] / $total;
    } #end ifs
    $file_line[$index] = join(",", @response);
} #end for
# Reopen the data file for writing and lock it
open(DATAFILE, ">/home/pyersqr/public_html/pronoun.data")
    or die "Cannot open data file for writing: $!";
flock(DATAFILE, 2);
# Write the new total to the file
print DATAFILE "$total\n";
# Write the new data back to the file, unlock the file, and close it
for ($index = 0; $index <= 1; $index = $index + 1 ) {
    $line = $file_line[$index];
    print DATAFILE "$line\n";
}
flock(DATAFILE, 8);
close(DATAFILE);
# Print the html response
print "<html>
<title> Pronoun Anaphora Survey </title>
<body>
<center><h1>Thanks!</h1></center>
<p><STRONG>Thanks for your input.</STRONG></p>
<p>You thought the pronoun in sentence one had a $sent1 interpretation,
as do $percent[0]% of respondents.</p>
<p>You thought the pronoun in sentence two had a $sent2 interpretation,
as do $percent[1]% of respondents.</p>
<p>We appreciate your response to our web survey.</p>
</body>
</html>";
exit;
Recommended Reading
There are many sources of information on the use of Perl to process web page forms. The Perl web page at www.perl.com has links to lots of information on Perl, including sources of Perl compilers for personal computers. Helpful books on Perl form processing include:
Muelver, Jerry. 1996. Creating Cool Web Pages with Perl. Foster City, CA: IDG Books Worldwide.
Sebesta, Robert W. 2000. A Little Book on Perl. Upper Saddle River, NJ: Prentice Hall.
Strom, Erik. 1998. Perl CGI Programming: No Experience Required. San Francisco: Sybex.
Summary of Unix commands
chmod 711 cgi-bin       Allows access to the subdirectory of Perl programs
chmod 700 filename      Restricts a Perl program so that only its owner can read, write, and run it