Ambiguity

The parsers that we implemented last week simplify the parsing task in several important respects. Instead of finding a single parse for a sentence, there are usually multiple parses that could be constructed. These ambiguities must be tackled by any parser that works with natural language input. The two main types of ambiguity the parser faces are structural ambiguity and lexical ambiguity. Structural ambiguity occurs when a string of words has more than one syntactic structure. For example, the instruction ‘Move the egg in the bowl’ could mean to get the bowl and move the egg, or to move the egg that is presently sitting in the bowl. The noun phrase and prepositional phrase may be either separate constituents (the first meaning), or a single constituent (the second meaning). Lexical ambiguity will create difficulties for a parser if a word belongs to more than one part of speech. Many words in English, e.g., ‘block’ may be used as either nouns or verbs. A parser processing lexically ambiguous words will have to try both options to parse every sentence with such words. In this chapter we will implement procedures for handling both types of ambiguity.

Before implementing specific solutions to parsing ambiguous inputs, we should consider our options. Parsers can take one of the following general approaches to dealing with ambiguous input:

1. Backtracking: the program records each decision point, and backtracks to the last decision point when a specific parse fails.

2. Look-ahead parsing: the program maintains a buffer of unparsed elements that can be used in deciding between possibilities.

3. Parallel parsing: the program builds a parallel parse at each decision point, and discards any parses that ultimately fail.

4. Partial parsing: the program selects a limited set of constituent types to parse.

We will briefly examine how each of these approaches might parse the structurally ambiguous sentence ‘Move the egg in the bowl’.

Backtracking

Backtracking is commonly found in non-deterministic parsers. Deterministic parsers do not contain any decision points. The program takes a pre-specified path for every input. The parsers that we wrote in the previous chapter implement deterministic parsing algorithms (since their inputs were unambiguous symbol strings). A non-deterministic parser, on the other hand, maintains a list of decision points so that it can backtrack if it discovers it made the wrong choice.

Creating a parse for our structurally ambiguous sentence requires deciding between the following structural possibilities

VP VP

gi fh

NP PP NP

fh tgy tgy

V DET N P DET N V DET N PP

tgy

Move the egg in the bowl P DET N

Move the egg in the bowl

These structures are created by the following phrase structure rules

1. VP –> V NP PP

2. VP –> V NP

A shift-reduce parser would create a verb phrase constituent on encountering the verb. It would also add an object noun phrase for both rules. The parser would meet its first decision point when it encounters the preposition. The parser must decide whether to attach the preposition to the verb phrase (rule one) or to the object noun phrase (rule two). A non-deterministic parser would pick one of these rules and use it to parse the prepositional phrase. It could save the other parsing option(s) in an agenda that it could consult later if the parser needed to backtrack to this decision point.

A chart is commonly used to preserve information about the parse state and enable backtracking. A top down parser using our two rules for verb phrases would create a chart like the one shown in Figure 8.1. The parser would enter both verb phrase rules in it chart,

and the lexical categories for each word in the input. When the parser needs to backtrack to a previous decision point, it can easily find the information it needs in the chart.

Figure 8.1 A chart parse

VP (rule 1)
VP (rule 2)
V1	NP	PP
V2	NP

Looking Ahead

The second approach to resolving structural ambiguities in the parse would be to expand the parsing window so that the parser can see several words ahead rather than just the next word. This window is referred to as a ‘buffer’, and commonly holds the next three words in the input. Applying the look-ahead approach to our structurally ambiguous sentence ‘Move the egg in the bowl’, we can begin at the point where the parser has constructed a verb phrase node after scanning the initial verb. The parse would then be ‘VP –> V’. The buffer would contain the three constituents following the verb, namely DET, N, and P. Rather than immediately trying to attach the determiner to the verb phrase parse, a look-ahead parser would first try to construct a constituent out of the constituents in its buffer. Since the buffer contains a preposition, the parser could construct a noun phrase that contains an embedded prepositional phrase (our second rule). After looking for further elements of the prepositional phrase in the input, the parser could reduce the buffer constituents to a single NP, which could then be attached to the verb phrase as the direct object.

Parallel Parsing

The third option is to simply keep a list of the possible parses as the parser encounters each decision point. On encountering our example sentence, a parallel parser would add both the V NP PP parse and the V NP parse to its parse list. It could then report that both possibilities exist syntactically, and turn over the final decision on the parse to a semantic module. Parallel parsing offers the easiest way to simultaneously track all of the possible parses created by ambiguity, but in practice the number of possible parses can overwhelm the finite resources of the computer’s memory.

Partial Parsing

A final option is to limit the types of constituents the parser will attempt to work on. It may not be necessary to identify the exact syntactic structure of our verb phrase. Just knowing that it contains a noun phrase and a prepositional phrase may provide sufficient information for the operation of the program. The syntactic parser can pass this constituent information to a semantic module for the final decision on the structure of the verb phrase. Other sources of information, such as the verb’s subcategorization or statistical information about the verb’s common syntactic contexts, can also be used to guide the final parse decision.

Certain constituents can be identified quite reliably, e.g.,

common nouns and their prenominal elements–the prenominal elements include every word between the determiner and the final noun in the group. This group does not include any following prepositional phrases, since these do not have an unambiguous attachment to the noun phrase.

auxiliary verbs and the main verb–the auxiliary verbs constitute a small lexical class with highly restricted sentence positions. The presence of an auxiliary verb makes it easy to identify the main verb in the sentence.

pronouns and proper nouns–pronouns also constitute a small lexical class that is commonly used in core syntactic relations such as subject and object. Proper nouns can be identified through the absence of determiners.

Partial parsing is commonly used in programs that automatically generate a summary of information in news articles, or that classify web documents. A full parse is unnecessary in these programs since the pertinent information can be found in the main syntactic constituents–the subject, verb and object.

Structural Ambiguity

We can now consider how to implement a parse procedure in Perl for our structurally ambiguous verb phrases that contain a noun phrase and a prepositional phrase. We need to allow for the possibility that a prepositional phrase will sometimes modify a noun and sometimes modify the verb. The solution is to include two definitions for noun phrases, one with and one without a following prepositional phrase.

$NP1 = ‘DET ADJ N’

$NP2 = ‘DET ADJ N( P DET ADJ N)*’

Since the verb phrase is the site of this ambiguity, we will be forced to define two types of verb phrases. This approach implements a parallel parse of the verb phrase. Figure 8.2 provides an example of a program that can process the structural ambiguities created by the attachment of prepositional phrases.

Figure 8.2 A program for processing structural ambiguity

#!usr/local/bin/perl

# ambi2.pl

# demonstrate ambiguous parsing with recursion

%lex = ("I" => "N",

"block" => "N",

"egg" => "N",

"table" => "N",

"bowl" => "N",

"floor" => "N",

"put" => "V",

"has" => "V",

"is" => "V",

"get" => "V",

"move" => "V",

"see" => "V",

"on" => "P",

"in" => "P",

"with" => "P",

"a" => "DET",

"an" => "DET",

"the" => "DET");

print "Please type a sentence\n";

chop( $input = <> ); #Get the string from the standard input

# Main program loop

until ( $input eq 'thanks' ) {

@words = split / /, $input;

foreach $word (@words) {

$sent = $sent . $word . ':' . $lex{$word} . ' ';

}

chop ($sent);

print " sent = $sent\n";

$input =~ s/(\w+)/$lex{$1}/g;

parse ($input);

print " parse1 = $parse1\n";

print " parse2 = $parse2\n";

$sent = '';

chop( $input = <> ); #Get another string from the standard input

} # end until

############ Subroutines ################

sub parse {

$NP = "(DET )?(ADJ )*(N)";

$PP = "(P )($NP)";

$NP2 = "($NP( $PP)*)";

$V1 = "V( $NP) ?";

$V2 = "V ?";

my $string1 = shift;

my $string2 = $string1;

$string1 =~s/($NP2)//; # extract the subject

my $subject = $1;

$parse1 = NP ($subject) . " " . VP1 ($string1);

$string2 =~s/($NP2)//; # extract the subject

$subject = $1;

$parse2 = NP ($subject) . " " . VP2 ($string2);

} # end sub parse

sub NP { # parse NP

my $string = shift;

my $parse;

$string =~ s/($NP)//;

$parse = "NP[$1" . PP ($string) . "]"; # call PP

} # end sub NP

sub PP { # parse PP

my $string = shift;

my $parse;

if ( $string !~ s/(P) ($NP)// ) { # end recursion

return;

}

else { # PP recursion

$parse = " PP[$1 NP[$2" . PP($string) . "]]";

}

} # end sub PP

sub VP1 { #VP = V NP PP

my $string = shift;

my $parse;

if ( $string =~ s/$V1// ) {

my $np = $1;

if ( $string =~ /^ $PP/ ) { #A PP verb complement?

$parse = "VP[V " . NP ($np) . PP ($string) . "]"

}

else { $parse = "VP[V " . NP ($np) . "]" } #A direct object

} # end if $string

else { $parse = "VP[V]"

}

} # end sub VP1

sub VP2 { #VP = V NP

my $string = shift;

my $parse;

$string =~ s/($V2)//;

if ( $string =~ /^ $PP/ ) { #A PP verb complement?

$parse = "VP[V" . PP ($string) . "]"

}

elsif ( $string !~ /$NP/ ) { #No direct object?

$parse = "VP[$1]";

}

else { $parse = "VP[V " . NP ($string) . "]" } #A direct object

} # end sub VP2

Lexical Ambiguity

In addition to processing structurally ambiguous sentences our parser will have to deal with words that are lexically ambiguous. Our current program uses a vocabulary listing that only supplies one part of speech for each lexical entry. English allows most words to belong to several different parts of speech. Thus, it is possible to use the words move and block as either nouns or verbs. The associative array that we are currently using to store lexical information will not allow us to add alternative entries for words since it only permits one entry for each hash key. Adding a second entry for block as a verb erases the previous listing as a noun.

The simple solution to this problem is to allow for multiple values in the associative array. Instead of adding an additional entry to the array we allow each element of the array to take on multiple values. We change the entries for move and block from

"move" => "V",

"block" => "N",

"move" => "V;N",

"block" => "N;V",

This minor change requires major changes to our program. First we will need to add a routine to process the lexically ambiguous entries. The ambiguous entries will lead to the construction of multiple input strings that the program will have to record and parse. Finally, we will need to keep track of which parses succeed, and therefore which reading of the lexically ambiguous words the parser used when it succeeded. Let’s solve these problems one at a time.

Our program starts by translating the words in the input sentence into a string of terminal categories. The routine we need to process lexically ambiguous entries can function by producing multiple candidate strings from the ambiguous input in a way that is similar to the procedure that deals with the problem of structural ambiguity. In effect, the procedure will construct an array of candidate strings which the parser can then evaluate. Every sentence that contains ambiguous words, such as ‘I move the block’ will produce an array of strings, e.g.,

N N DET N

N V DET N

N N DET V

N V DET V

When our parser evaluates the strings in this array, it should choose the second string as its parse in which move is a verb and block is a noun. This is the only reading that makes syntactic sense.

All we need to do at this point is to write a subroutine that translates the input sentence into an array of candidate strings. I provide an example of the subroutine ambilex in Figure 8.3.

Figure 8.3 The subroutine ambilex

sub ambilex {

my $no1 = 0; # initialize loop index for old string array

my $no2 = 0; # initialize loop index for expanded string array

my $index = 0; # initialize loop index

foreach $word (@_) { # @_ is the implicit array variable that refers to the words in the sentence

my @ambig = split /;/, $lex{$word} ; # put syntactic categories in @ambig

$no1 = $#string; # $no records the index of the last element in the array of strings

@string = (@string) x @ambig; #replicate the string array by the number of ambiguities

my $i = 0;

$no2 = $#string; # the index of the expanded array of strings

foreach $ambig (@ambig) { #pick one of the ambiguous values

for ($index = 0; $index <= $no1; $index = $index + 1) {

$string[$i] = $string[$i] . $ambig . ' '; #add the ambiguous value to each string in the array

if ( $no2 > 0 ) {

$i = $i + 1;

} #end if

} #end for index

} #end foreach ambig

} #end foreach word

} #end sub ambilex

The ambilex subroutine is technically a Perl function since it returns a value to the statement that called it. Ambilex returns its value(s) in the array @string, which saves the information about each of the possible terminal strings that ambilex finds. Ambilex operates on an array of words in the input sentence. To apply the ambilex function, we have to apply it to this word array in the calling statement:

ambilex(@words);

This function call allows Perl to associate the implicit array variable @_ in the subroutine with the array variable named in the function call. This procedure may seem complicated at first, but it enables us to reuse the subroutine with any array rather than restricting its use to a single array name.

The subroutine works by first adding the syntactic categories for each word to the protected array @ambig. The subroutine then constructs new strings by adding the syntactic category(s) for the new word onto each string in the array of old strings, @string. If a word is lexically ambiguous, ambilex expands the string array by the number of ambiguous syntactic categories in the statement

@string = (@string) x @ambig;

This statement uses the Perl array operator x to replicate the string array by the number of elements in the array @ambig. This statement relies on Perl’s assumptions about variables since the x operator assumes the variable to its right (@ambig) is a number, and thus creates a number by counting the number of elements in the @ambig array. The x operator also assumes the variable to its left is the item to be replicated. We need to put the string array variable in parentheses in this context to tell the x operator that we wish to duplicate each of the strings in the string array rather than the size of the string array. I suggest testing your knowledge of Perl variables and implicit values by changing this statement in a small program and examining its output.

The full program is shown in Figure 8.4. It analyzes lexically ambiguous words in the package Ambi.pm (Figure 8.5) and deals with syntactic ambiguity in the package Parse.pm (Figure 8.6). The package Parse prints out the results.

Figure 8.4 A Perl parser for lexically and syntactically ambiguous sentences

#!usr/local/bin/perl

# ambi6.pl = ambi3

# uses Ambi.pm and Parse.pm

use Ambi;

use Parse;

my %lex = ("I" => "N",

"block" => "N;V",

"egg" => "N",

"table" => "N;V",

"bowl" => "N",

"floor" => "N",

"put" => "V",

"has" => "V",

"is" => "V",

"get" => "V",

"move" => "V;N",

"see" => "V",

"on" => "P",

"in" => "P",

"with" => "P",

"a" => "DET",

"an" => "DET",

"the" => "DET");

print "Please type a sentence\n";

chop( $input = <> ); #Get the string from the standard input

# Main program loop

until ( $input eq 'thanks' ) {

my @words = split / /, $input;

my @string = ambilex(\@words, \%lex); #ambilex constructs a string for each ambiguous word

parse(@string); # parse produces a parse for each string of terminal elements

chop( $input = <> ); #Get another string from the standard input

} #end until

Figure 8.5. The package Ambi.pm

#!/usr/local/bin/perl

# Ambi.pm

# Works with ambi6.pl

# Produces ambiguous strings for input to Parse.pm

package Ambi;

use Exporter;

our @ISA = ('Exporter');

our @EXPORT = qw(&ambilex @string);

sub ambilex {

my $no1 = 0; # initialize loop index for old string array

my $no2 = 0; # initialize loop index for expanded string array

my $index = 0; # initialize loop index

my @words = @{$_[0]};

my %lex = %{$_[1]};

my @string = '';

foreach $word (@words) {

@ambig = split /;/, $lex{$word} ;

$no1 = $#string; # $no records the index of the last element in the array of strings

@string = (@string) x @ambig; #replicate the string array

my $i = 0;

$no2 = $#string; # the index of the expanded array of strings

foreach $ambig (@ambig) { #pick one of the ambiguous values

for ($index = 0; $index <= $no1; $index = $index + 1) {

$string[$i] = $string[$i] . $ambig . ' '; #add the ambiguous value

if ( $no2 > 0 ) {

$i = $i + 1;

} #end if

} #end for index

} #end foreach ambig

} #end foreach word

return @string;

} #end sub ambilex

return 1;

Figure 8.6. The package Parse.pm

#!/usr/local/bin/perl

# Parse.pm

# Works with ambi6.pl and Ambi.pm

# test syntactic module

package Parse;

use Exporter;

our @ISA = ('Exporter');

our @EXPORT = qw(&parse );

my $NP = "(DET )?(ADJ )*(N)";

my $PP = "P $NP";

my $NP2 = "$NP( $PP)*";

my $V1 = "V ?($NP)? ?";

my $V2 = "V ?";

sub parse {

my $index = 0; # initialize the index for strings

my @string = @_;

my $string = '';

foreach $string (@string) {

chop($string);

print "\nstring[$index] is: $string\n";

my $string1 = $string;

my $string2 = $string1;

if ( $string1 =~s/($NP2) ($V1)// ) { # VP with object NP

my $subject = $1; my $object = $10;

$parse1 = NP ($subject) . " VP[V" . NP ($object) . PP ($string1) . "]";

print " parse 1 = $parse1\n";

}

$string2 =~s/($NP2) ($V2)//; # plain VP

$subject = $1;

if ( $string2 =~ /^$PP/ ) { #A PP verb complement?

$parse2 = NP ($subject) . " VP[V" . PP ($string2) . "]"

}

elsif ( $string2 !~ /$NP/ ) { #No direct object?

$parse2 = NP ($subject) . " VP[V]";

}

else { #A direct object

$parse2 = NP ($subject) . " VP[V" . NP ($string2) . "]";

}

print " parse 2 = $parse2\n";

$index = $index + 1; #increment the string index

} # end foreach

} # end sub parse

sub NP { # parse NP

my $string = shift;

my $parse;

if ( $string =~ s/($NP)// ) {

$parse = " NP[$1" . PP ($string) . "]"; # call PP

}

} # end sub NP

sub PP { # parse PP

my $string = shift;

my $parse;

if ( $string !~ s/(P) ($NP)// ) { # end recursion

return;

}

else { # PP recursion

$parse = " PP[$1 NP[$2" . PP($string) . "]]";

}

} # end sub PP

Statistical Methods of Ambiguity Resolution

Statistical analyses of texts offers a non-linguitic approach to resolving ambiguous parses. The basic idea is that you can use the probability of one word being followed by another as a basis for deciding how to parse specific sentences. To be effective, though, the probability estimates should be based on large samples of tagged texts. Fortunately, such samples are now available. The Brown corpus contains about a million words, all labeled with their parts of speech. The Penn Treebank is another source of tagged text.

These texts can be analyzed to calculate the conditional probability that word B follows word A. We know that each word has a certain probability of occuring by itself, but we would like to know if the occurrence of word A influences the occurrence of word B. Assume that word A occurs in 30 out of 100 sentences. We find that word B occurs in 15 of the sentences where word A appears. We then know that word B occurs in half of the sentences in which we find word A (15 out of 30). Put another way, the probability of finding word B in sentences containing word A is .5. Statisticians write the conditional probability of finding word B given word A as PROB(word B | word A). Conditional probability is defined by the formula:

PROB(e | e’) = PROB(e & e’) / PROB(e’)

where PROB(e & e’) is the probability of the two events e and e’ occurring together.

From the above example, we know that PROB(word A) = .3 and PROB(word A & word B) = .15. Using the formula for conditional probability, we can calculate the PROB(word B | word A) as:

PROB(word B | word A) = PROB(word B & word A) / PROB(word A)

= .15 / .30

= .5

We can also use Bayes’ theorem to calculate the PROB(word A | word B) from the PROB(word B | word A). This relation is useful when you can’t calculate the conditional probability directly. Bayes’ theorem is:

PROB(B | A) * PROB(A)

PROB(A | B) = PROB(B)

Using Bayes’ theorem, we can estimate the conditional probability of finding word A in a sentence that contains word B if we know the probability of finding word B. Assuming that word B occurs in 20 out of 100 sentences, the conditional probability is:

PROB(word B | word A) * PROB(word A)

PROB(word A | word B) = PROB(word B)

= (.5 * .30) / .2

= .75

Conditional probabilities are useful if the occurrence of one event influences the occurrence of another. However, the events may be independent of one another. Two events are independent of one another if the occurrence of one event does not affect the probability that the other will occur. Following the previous example, assume that word C occurs in 60 of the 100 sentences, and that word B occurs in 12 of these sentences. The probability of word B given the occurrence of word C is PROB(word B | word C) = 12/60 = .2. Since this is equal to the probability that word B occurs by itself, we can conclude the occurrence of word B is independent of word C. Formally, two events A and B are independent if and only if:

PROB(A | B) = PROB(A)

Using the definition of conditional probability, we can also show:

PROB(A & B) = PROB(A) * PROB(B)

Given the previous information, we can show that the occurrence of word A is not independent of word B since PROB(word B & word A) = .15 while PROB(word B) * PROB(word A) = .2 * .3 = .06. Words A and B appear together much more frequently then we’d expect by chance.

We can put these ideas to work in a parser to determine the probability that a word like move is a noun or verb. The problem can be stated formally as finding PROB(Cat = N | move) and PROB(Cat = V | move). These conditional probabilities can be calculated as:

PROB(N | move) = PROB(move & N) / PROB(move)

PROB(V | move) = PROB(move & V) / PROB(move)

This means we need to find PROB(move & N) and PROB(move & V). We can estimate these probabilities from a text database like the Brown corpus. Assume we have a corpus of simple sentences that contain a total of 1,273,000 words. The word move occurs 1000 times in the corpus, 400 times as a N, and 600 times as a V. The probability of randomly selecting the word move in this corpus is then

PROB(move) = 1000 / 1,273,000 = .0008

The probabilities of finding move as a noun or verb are:

PROB(move & N) = 400 / 1,273,000 = .0003

PROB(move & V) = 600 / 1,273,000 = .0005

We can then use this information to estimate how often move appears as a verb:

PROB(V | move) = PROB(V & move) / PROB(move)

= .0005 / .0008 = .625

This doesn’t seem like much of an improvement given that we already knew that move appears as a verb 60% of the time (600 out of 1000). The usefulness of this method increases with increased information about the contexts of use for each word. For example, assume we have a sequence of words w₁ ... w_n that correspond to a sequence of lexical categories C₁ ... C_n. The conditional probability for such a sequence would then be

PROB(C₁ ... C_n) | w₁ ... w_n)

You can see that it would take an enormous amount of time to calculate these conditional probabilities for sequences of words (of variable lengths) in a million word corpus. Bayes’ theorem can be used to make this problem a little more tractable. Using Bayes’ theory, the conditional probability is:

(PROB(C₁ ... C_n) * PROB(w₁ ... w_n | C₁ ... C_n)) / PROB(w₁ ... w_n)

In such situations, it is practical to just search for the maximum conditional probability. This allows us to ignore the common denominator PROB(w₁ ... w_n) and focus on finding the largest numerator. Even so, we may still not be able to estimate exact values for the numerator given a large corpus. In this situation, we can reduce the problem to a practical scale by making certain assumptions. The most common assumption is to restrict the sequence of categories to just two or three. The bigram model looks at category pairs and estimates the conditional probability that category C_i will follow category C_i-1. The trigram model uses the conditional probability that one category will follow two preceding categories. The trigram model will generally produce better estimates since it analyzes more of the context. I will use the bigram model here to illustrate the general approach. The bigram model generates the following estimate:

PROB(C₁ ... C_n) = ∏_{i=1, n} PROB(C_i | C_i-1)

The conditional probability for the first word in a sentence is PROB(C₁ | 0). The estimate for the probability of the sequence DET N V N using bigrams is then

PROB(DET N V N) = PROB(DET | 0) * PROB(N | DET) * PROB(V | N) * PROB(N | V)

The probability PROB(w₁ ... w_n | C₁ ... C_n) can be estimated by assuming that the category for each word is independent of the preceding or succeeding categories. Using this assumption, the conditional probability for the word sequence is approximately the product of the probability that each word occurs as the indicated part of speech:

PROB(w₁ ... w_n | C₁ ... C_n) = ∏_{i=1, n} PROB(w_i | C_i)

Now, if we want to find the maximum probability for a sequence of words, we can look for the sequence C₁ ... C_n that maximizes the value of

∏_{i=1, n} PROB(C_i | C_i-1) * ∏_{i=1, n} PROB(w_i | C_i)

Although we have made a number of assumptions to get to this formula, we now have a practical way to estimate the probability of a given sequence of lexical categories. The bigram probabilities can be calculated by counting the number of times each pair of categories occurs compared to the times each individual category occurs. The probability that a verb follows a noun is simply:

Number(N at position i-1 and V at i)

PROB(C_i=V | C_i-1=N) = Number(N at position i-1)

The following table provides some artificial bigram frequencies as an example