Web Page Basics


            We must learn to listen before we can talk. We also have to learn the basics of web page construction before we can move on to the interesting part of constructing interactive web pages. Today there are many programs that help users construct web pages. Some excellent web page editors are available on the internet. I recommend the HTML-kit available at www.chami.com. Many of today’s word processors also come with utilities for converting ordinary documents into web pages. However, to create an interactive web page with Perl, you need to understand the basics of web page construction including how to edit web page commands.

            The beauty of a web page is that it can display information on a variety of computers using very different operating systems. This ability is made possible through the use of a common protocol, the HyperText Transfer Protocol or HTTP. HTTP tells computers how to request and retrieve the information that exists on the web. It does this by making use of the uniform resource locators or URLs that identify the location of every web document. The URL for the web page for this class is


http://pyersqr.org/classes/Ling783
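As a rough illustration of how a URL breaks into parts, the following sketch splits the class URL using Python's standard library. Python is used here only for illustration and is not part of the course materials; the variable name is arbitrary.

```python
from urllib.parse import urlsplit

# Split the class URL into its named components.
parts = urlsplit("http://pyersqr.org/classes/Ling783")

print(parts.scheme)  # the protocol used to fetch the page
print(parts.netloc)  # the domain name of the server
print(parts.path)    # the location of the document on that server
```

The first part names the protocol, the second names the computer that holds the document, and the third tells that computer which document is wanted.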


            Web documents are written in the HyperText Markup Language or HTML. HTML uses predefined tags to tell computers how to display information in a web document. The first part of an HTML tag comes at the beginning of its section and has the form <TAG>. The second part comes at the end of its section and has the form </TAG>. A typical HTML document contains a head, a title, and a body. An example document with these tags is shown in Figure 2.1.


Figure 2.1 Basic HTML document tags

<HTML>

<HEAD>

<TITLE>A sample HTML document</TITLE>

</HEAD>

<BODY>

<P>This is the first paragraph of our document.</P>

<P>This is the second paragraph of our document.</P>

</BODY>

</HTML>


            It is important to note that a web or HTML document contains only plain text and HTML tags. Web browsers cannot process the formatting codes that most word processors automatically insert in documents. Figure 2.2 provides an example of this document saved as a Microsoft Word document. The Word formatting codes create many problems for web browsers. Fortunately, most word processors, including Word, allow users to save what are known as plain text, DOS text, or ASCII documents. Such documents contain only the characters in a text and the line breaks between them. The UNIX command cat file displays the contents of a file named ‘file’. You can type the example web page in Figure 2.1 into a file on your server. Once you have saved this file with the name ex1.htm, you can display its contents with the command ‘cat ex1.htm’. You should see only the characters that you typed. You can use the DOS command type (e.g., type ex1.htm) to display the same information on a personal computer. This is a good test to make sure that you know how to produce a plain ASCII text file.
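The same test can be sketched in a few lines of code. The example below is written in Python purely for illustration (the course itself uses Perl), and the function name is hypothetical. It checks whether a sequence of bytes contains only 7-bit ASCII characters, and compares a line of typed HTML with the same line preceded by the kind of binary header that an old-style Word file begins with.

```python
def is_plain_ascii(data: bytes) -> bool:
    """Return True if every byte is a 7-bit ASCII character."""
    return data.isascii()

# A line typed in a plain text editor.
html_bytes = b"<TITLE>A sample HTML document</TITLE>\n"

# The first bytes of an old-style binary Word (.doc) file, followed by text.
word_bytes = bytes([0xD0, 0xCF, 0x11, 0xE0]) + html_bytes

print(is_plain_ascii(html_bytes))
print(is_plain_ascii(word_bytes))
```

To check a real file, you could read it in binary mode and pass the result to the function, e.g. is_plain_ascii(open("ex1.htm", "rb").read()).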



Figure 2.2 A Microsoft Word version of a simple HTML document


ÐÏ à¡+ á > þÿ            ! # þÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿì¥Á 7                ð ¿ ¼ bjbjU U             #

  7| 7| ¼ ÿÿ ÿÿ ÿÿ l Ê Ê Ê Ê Ê Ê Ì Ø Ì Ì Ø Ê Ê Ø Ì À  ì ªqDÂ Þ Ê ̈ Ì Ø Ø ) 0 Y Ø = Ì = Ø Ì Þ Þ Ê Ê Ê Ê Ù <HTML>

<HEAD>

<TITLE>A sample HTML document</TITLE>

</HEAD>

<BODY>

<P>This is the first paragraph of our document.</P>

<P>This is the second paragraph of our document.</P>

</BODY>

</HTML>

           ! ÒÔ € € € € € € € € € . . . ( ) ( ) ( ) . . ) ( )           0 P ̊Ð/ ̊à=!̊  "̊  #  $  %̊ 8Ò$ ÿ ÿ ÿ ÿ ÿ ÿ ÿ ÿ ÿ i 8 @ñÿ 8 N o r m a l CJ _H aJ mH              sH       tH        < A@òÿ¡ < D e f a u l t P a r a g r a p h F o n t þOòÿñ B O L D OJ QJ ¼ ÿÿÿÿ 4 < C w ¬ ́ ¾ ̃ 0 € €̃ 0 € €̃ 0 € €̃ 0 € €̃ 0 € €̃ 0 € €̃ 0 € €̃ 0 € €š 0 € € ¼ ¼ ¼ ÿÿ           \ W I N D O W S \ A p p l i c a t i o n D a t a \ M i c r o s o f t \ W o r d \ A u t o R e c o v e r y s a v e o f " C : \...\ H T M L \ e x 1 . d o c ÿ@ € » » X©d » ́ ¼ P @ ÿÿ U n k n o w n ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ G  ‡: ÿ T i m e s N e w R o m a n 5  € S y m b o l 3&

        ‡: ÿ A r i a l S         P a l a t i n o B o o k A n t i q u a " 1 ̂ ðÐ h ¥zh†¥zh† › ! ð Microsoft Word 9.0


            Once you have saved your example, you should try displaying your document in a web browser. Common browsers include Mozilla Firefox, Netscape Navigator and Microsoft Internet Explorer. You will have to grant permission for browsers to access your page before you can see it on the web. Today most people have heard stories about outsiders gaining access to a computer and stealing or destroying documents on the system. As a standard precaution against such intrusions, most system administrators for mainframe computers assume that users do not wish their files to be available to the outside world. You must change the default access permissions for your public_html directory if you wish to view your documents on the web. The easiest way to do this is to use the WinSCP program installed on the lab's computers. You can download a version of this program for use on your personal computer. Once you start the program, you will need to type in the name of the KU server (people.ku.edu), your username and password. You will then find a screen with two windows. The left window displays the files on your local computer while the right window shows the files for your account on the KU server. Your public_html folder should be listed in the right window. You can check the permissions for the public_html folder by right-clicking on it and then clicking on Properties. You can grant outside access to this directory by changing the number at the bottom of the Properties screen to 755.


An older method would use the command chmod 755 public_html.


Henceforth, any documents in your public_html directory, or in any subdirectory of it, will be available to web browsers, provided the files and subdirectories are themselves readable by others.
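The number 755 is shorthand for three sets of permissions: one digit each for the owner, the owner's group, and everyone else. Within each digit, 4 stands for read, 2 for write, and 1 for execute (for a directory, execute means permission to enter it). The following Python sketch, offered only as an illustration with a hypothetical helper name, spells this out.

```python
import stat

def permission_string(octal_digits: str) -> str:
    """Turn digits such as '755' into the rwx notation shown by ls -l."""
    out = []
    for digit in octal_digits:
        bits = int(digit, 8)
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)

print(permission_string("755"))  # owner, group, others

# The standard library gives the same answer for a directory with mode 755.
print(stat.filemode(stat.S_IFDIR | 0o755))
```

With 755, the owner can read, change, and enter the directory, while everyone else can read and enter it but cannot change it.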

            To view your document in a browser you need to know its location or URL. If you saved your document on a mainframe computer, you should know what name the computer goes by. This is the domain name for the computer. The domain name for most web accounts at KU is people.ku.edu. Your HTML document is located in a folder under your login name. If your login name is ‘username’ you can see a list of folders and documents on your computer account by opening your web browser and going to the web page:


http://domain.name/~username

e.g., http://people.ku.edu/~username


            Your browser should respond with an index of folders and documents for ‘username’. Among the folders should be your public_html folder. Click on this folder to see a list of your web page folders and documents. KU automatically installs an index page in your public_html folder when you open a web account. Instead of seeing the pages in your folder you will see this index page. You should see your Perl folder here. Click on this folder, and you should find the web page ex1.htm.

            Although this page is not all that inspiring in its present state, it can serve as the starting point for the rest of our work on natural language processing. Consider how we could augment this page to collect user intuitions about pronoun anaphora. We can edit the document in Ex1 by changing the title to something more descriptive like ‘Pronoun Anaphora Survey’. We should also make a title that is visible on the page. This can be done by inserting the line


<center><p>Pronoun Anaphora Survey</p></center>


We should change the rest of the body of the document to an explanation of what we wish to accomplish with our survey and instructions to the user. It is also a good idea to provide users with a link to your main web page so they can quickly return to it. We can create a link by using the following line


<P>Click here <A HREF="http://www.linguistics.ku.edu">to return</A> to my main page</P>


Finally, you should include a link to your email address that users can click to launch their default email program. This can be done by adding the line


<P>You can send me email by clicking <A HREF="mailto:pyersqr@ku.edu">pyersqr (at) ku (dot) edu</A>. I am always glad to hear from my fans.</P>


The finished document should look like the example in Figure 2.3.


Figure 2.3 A beginning linguistic survey form

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

<title>Pronoun Anaphora Survey</title>

</head>


<body>

<center>

<p>Pronoun Anaphora Survey</p>

</center>


<p>We are using this web site to collect reader intuitions about

pronoun reference. Please respond to the questions below by

clicking next to the interpretation that matches your intuitions

about the meaning of the pronouns in the example sentences. We

are interested in discovering how you interpret the pronouns and

not how you think the pronouns should be interpreted. Your

responses will help us determine the degree of variation that

presently exists in the understanding of pronouns.</p>


<p>Click here <a href="http://www.linguistics.ku.edu">to

return</a> to my main page</p>


<p>You can send me email by clicking <a href=

"mailto:pyersqr@ku.edu">pyersqr (at) ku (dot) edu</a>. I am always glad to

hear from my fans.</p>


</body>

</html>


            Save this page as ‘Ex2.htm’ and view it in your browser at the URL


http://domain/~user/Perl/Ex2.htm


This page should be much more satisfying.

            There is only one problem: our page seems to promise more than it delivers. How can we actually collect the user’s responses to our questions? For this job, we need to augment our collection of HTML tags with tags that tell the user’s browser how to send information to our computer and how our computer can respond to the user’s browser. This type of web page is technically a <form>. We need to add the tags that will convert the page in Figure 2.3 into a form. Forms on the web have two parts: a METHOD that tells the browser how to send the user’s input, and an ACTION that names the computer program that processes the user’s input data. There are two methods for sending user input: POST and GET. The POST method sends the form’s output as a separate message in the body of the request. The GET method appends the form’s output to the URL. The POST method is the more useful of the two for a survey, and we will use it in our application. I will save discussion of the ACTION for chapter 4 since this is where we will learn how to make our web page interact with a Perl program. For now, let’s concentrate on producing a simple form.
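The difference between the two methods is easier to see with a concrete case. The sketch below is in Python only for illustration (the course's own programs are written in Perl). It uses a made-up field called ‘answer’ and the placeholder address http://domain/~user/program to show where the same encoded data ends up under each method.

```python
from urllib.parse import urlencode

# Hypothetical form data: one field named "answer".
form_data = {"answer": "Short distance"}
encoded = urlencode(form_data)

# GET: the data travels as part of the URL.
get_url = "http://domain/~user/program?" + encoded

# POST: the URL stays the same and the data travels in the request body.
post_url = "http://domain/~user/program"
post_body = encoded

print(get_url)
print(post_body)
```

Because GET places the responses in the URL itself, they show up in the browser's address bar and history. POST keeps them out of the URL, which is one reason it suits a survey.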

            We need to add one other element to our form to collect the user’s responses. The INPUT tag is available for this purpose. The first thing we need to do is organize the user’s responses into useful categories. We can specify the INPUT attribute NAME for this purpose. The INPUT tag also allows us to specify a number of different ways to collect our information. It uses the attribute TYPE for this purpose. The most useful values of the TYPE attribute for our purposes are the following:

 

Text                This option creates a text box in which the user can type a response

Checkbox       This option creates a box that can only have two values: checked and unchecked

Radio              This option allows the user to make one selection from a range of options

Submit             The user clicks on this button to send the form data to the computer

Reset               The user clicks on this button to clear the form and start over again


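Finally, we need a way to specify what each input sends to the server. Each of these input types has a VALUE attribute that we can use for this purpose: for a radio button or checkbox it is the value sent when the user selects that item, and for the Submit and Reset buttons it is the label shown on the button. The text that tells users what each choice means is placed next to the INPUT tag. Figure 2.4 provides an example of a form that collects user intuitions about pronoun reference in the sentence ‘When Mickey came over, Goofy tied his shoes.’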


Figure 2.4 A Pronoun Anaphora survey form


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

<title>Pronoun Anaphora Survey</title>

</head>


<body>

<form method="post" action="http://domain/~user/program">

<center>

<p>Pronoun Anaphora Survey</p>

</center>


<p>We are using this web site to collect reader intuitions about pronoun reference. Please respond to the questions below by clicking next to the interpretation that matches your intuitions

about the meaning of the pronouns in the example sentences. We are interested in discovering how you interpret the pronouns and not how you think the pronouns should be interpreted. Your

responses will help us determine the degree of variation that presently exists in the understanding of pronouns.</p>


<p>Please read the following sentences and respond to the question about the pronouns that comes after each sentence. You can leave a response blank if you are unsure of your answer. If

you make a mistake, you can click on the reset button to start over. When you have finished the survey, click the send button to send us your responses. Thank you for helping us learn more about pronouns.</p>


<p>Sentences:</p>


<p>(1) When Mickey came over, Goofy tied his shoes.</p>


<p>Does the pronoun 'his' in sentence (1) refer to:<br>

<input name="sent1" type="radio" value="Neither"> Neither Mickey

or Goofy <input name="sent1" type="radio" value="Long distance">

Mickey <input name="sent1" type="radio" value="Short distance">

Goofy <input name="sent1" type="radio" value="Both"> Both Mickey

and Goofy</p>


<p><input type="Submit" value="Send"> <input type="Reset"></p>


<p>Click here <a href="http://www.linguistics.ku.edu">to

return</a> to my main page</p>


<p>You can send me email by clicking <a href=

"mailto:pyersqr@ku.edu">pyersqr@ku.edu</a>. I am always glad to

hear from my fans.</p>

</form>

</body>

</html>


            Now we seem to be getting somewhere! We have a form that tells users about our survey and invites their responses. The form will collect their response to the first sentence in the variable ‘sent1’. This variable will have one of the values ‘Neither’, ‘Long distance’, ‘Short distance’, or ‘Both’. In this case, I think it is more convenient to use the values ‘Long distance’ and ‘Short distance’ rather than ‘Mickey’ and ‘Goofy’ since we are interested in whether the user interprets the pronoun as referring to a name that is a short or long distance from the pronoun. We will probably want to add other sentences in which Mickey and Goofy switch places. It would be a mistake to always use one of them as the long distance referent. If we add another sentence to the form, we would need to add a new variable name (e.g., sent2) to the form to collect its data. If we only used one variable, each response to the subsequent sentences would erase the preceding response.

            We have one other feature in this form to consider. I gave our users the option of interpreting the pronoun as referring to neither or both Mickey and Goofy. While linguistic theory may insist that the pronoun should only refer to one or the other, we should always provide the user with a full range of choices. It sometimes takes a good deal of experience before you know what this full range might be. It is worth considering the addition of a comment section to our page to collect such suggestions from users. The introductory paragraph also tells users that they have the option of not responding to a question. Null responses create a small problem for web surveys in that a null response produces a null value for our variable sent1. The program that handles our form will list the responses by a numerical value starting with zero. It will classify a null response as a zero response, and interpret a null response to our question as indicating the user feels the pronoun refers to neither Mickey nor Goofy.

            We have two options for solving this dilemma. The easiest would be to add yet another possible response, e.g.,


<INPUT NAME="sent1" TYPE="radio" VALUE="No response"> No opinion


as the first possibility. This solution allows the user to specify a null response overtly, and even if they do not actually choose any of the available responses, the computer will still classify their response as a null response. The problem with this solution is that it complicates the response form for both the user and us. The user has more items to read for each sentence, and we will have to display more information.

            There is a more elegant way to handle such a problem. This method requires the use of the HIDDEN type of INPUT, e.g.,


<INPUT NAME="sent1" TYPE="hidden" VALUE="No response">


Putting this hidden response first in our list of possible responses ensures that null responses will be treated differently from ‘Neither’ responses. Figure 2.5 shows the resulting list of responses.


Figure 2.5 Response types

<INPUT NAME="sent1" TYPE="hidden" VALUE="No response">

<INPUT NAME="sent1" TYPE="radio" VALUE="Neither"> Neither Mickey or Goofy

<INPUT NAME="sent1" TYPE="radio" VALUE="Long distance"> Mickey

<INPUT NAME="sent1" TYPE="radio" VALUE="Short distance"> Goofy

<INPUT NAME="sent1" TYPE="radio" VALUE="Both"> Both Mickey and Goofy</P>


The first response will not appear on the web page, but if a user does not enter a response for this item, the computer will register their response as ‘No response’.
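One detail is worth keeping in mind. A browser sends a hidden field every time the form is submitted, so when a user does select one of the radio buttons, two values arrive under the name sent1, in the order in which they appear on the page. The sketch below is written in Python purely for illustration, and the helper name is hypothetical. It assumes the program that processes the form takes the last value sent for the field; a program that read only the first value would see ‘No response’ every time.

```python
from urllib.parse import parse_qs

def recorded_answer(body: str, name: str) -> str:
    """Return the last value sent for a field (hypothetical helper)."""
    values = parse_qs(body)[name]
    return values[-1]

# No radio button chosen: only the hidden field is sent.
no_choice = "sent1=No+response"

# "Goofy" chosen: the hidden field is sent first, then the radio value.
goofy = "sent1=No+response&sent1=Short+distance"

print(recorded_answer(no_choice, "sent1"))
print(recorded_answer(goofy, "sent1"))
```

How the Perl program for this survey reads repeated values is a separate question. The point here is only what the browser sends.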

            If you have created a file with these statements in your Perl subdirectory you should have the beginning of a working linguistic survey for the internet. You can click on one of the choices to specify your response. The form will only allow you to click on one of these responses, so we have to provide users with the proper set of choices. You should also be able to click on the reset button to clear your selection and start over again. Clicking on the send button, though, produces the error message


Not Found


The requested URL /~user/program was not found on this server.


This happened because we have not yet created the program file that we specified in the ACTION part of our form. We need to write a Perl program that tells the server what to do with the information that users send, and before we do that, we need to learn a little about the Perl programming language. All that must wait until the next chapter!



Summary of UNIX commands

 

chmod 755 directory  grant permission for browsers to access files in the directory


Summary of HTML tags

 

<HTML></HTML>                           Declare an html document type

<HEAD></HEAD>                           Declare a head section

<TITLE></TITLE>                           Declare a document title

<BODY></BODY>                           Declare the body of the document

<CENTER></CENTER>                       Align text in the center of the line

<P></P>                                             Declare a paragraph

<BR>                                                  Declare a new line

<FORM METHOD="post" ACTION="http://domain/~user/program"></FORM>

                                                            Declare a form and specify a method and action

<INPUT NAME="name">                        Declare the name of an input variable

<INPUT TYPE="type">                        Declare the type of input: "text", "checkbox", "radio", "hidden", "Submit", or "Reset"