Computer Science 3675
Summer 2004
Programming Assignment 6

Due: Tuesday, July 27

The following is a very simple grammar for a tiny subset of English.

   <sentence>            ::= <noun-phrase> <verb-phrase>
                           | <noun-phrase> is <adjective>
   <noun-phrase>         ::= <proper-noun> 
                           | <determiner> <common-noun>
                           | <determiner> <adjective> <common-noun>
   <verb-phrase>         ::= <transitive-verb> <noun-phrase>
   <determiner>          ::= a | the
   <proper-noun>         ::= archie | jughead | betty | veronica
   <common-noun>         ::= malt | hamburger | dog | shop
   <transitive-verb>     ::= eats | likes
   <adjective>           ::= happy | hungry | sleepy
For example, here is a way to derive sentence "archie likes betty" by rewriting one nonterminal at each step.
  <sentence>    => <noun-phrase> <verb-phrase>
                => <proper-noun> <verb-phrase>
                => archie <verb-phrase>
                => archie <transitive-verb> <noun-phrase>
                => archie likes <noun-phrase>
                => archie likes <proper-noun>
                => archie likes betty

A parser takes a purported sentence as input, and either finds a derivation of the sentence (and so claims that the input is, in fact, a sentence) or claims that the input is not a sentence at all.

Logic programming makes it easy to write parsers. See page 601 and 602 for a brief discussion of parsing using the idea of a difference list. Page 624 discusses the same thing in the context of ISO Prolog, which provides a special notation for writing parsers.


Difference lists

Sometimes, it is convenient to specify a list as a difference of two other lists. If A and B are lists, where B is a suffix of A, then the list A-B is defined to be the list X such that A = X ++ B. For example, [archie,likes,the,hamburger] - [the,hamburger] = [archie,likes]. That is, A-B is the result of removing list B from the end of list A. The difference A-B is only defined when B is a suffix of A.

You might imagine implementing a grammar by writing a predicate for each nonterminal, where A(X) is true if list X can be generated by nonterminal A. Unfortunately, that means you will always be testing an entire list. It is more convenient to write a parser that will extract a prefix of a list that is derivable from a given nonterminal, leaving the suffix behind.

An easy way to accomplish that is to use difference lists. For each nonterminal A, you create a predicate A so that A(X,Y) is true if list X-Y is something that can be generated by nonterminal A. Precisely, you write predicate A so that A(X,Y) means

  1. X and Y are lists and Y is a suffix of X. That is, there is a list Z such that X = Z ++ Y.

  2. The list Z from (1) is a phrase that can be generated by nonterminal A.

For example, in our English grammar, properNoun([archie, eats, the, hamburger], [eats, the, hamburger]) is true since nonterminal properNoun can derive the phrase [archie]. Similarly, verbPhrase([eats, the, hamburger], [ ]) is true.

A logic grammar consists of a predicate for each nonterminal and a collection of axioms, one for each rule in the grammar. For example, the first rule of our grammar is

   <sentence> ::= <noun-phrase> <verb-phrase>

Here is a fact for <sentence> that corresponds to that rule.
   Astarte:  case Sentence(?x,?y). <- Exists r | 
                                        NounPhrase(x,r). 
				        VerbPhrase(r,y).
                                      %Exists

   Prolog:   sentence(X,Y) :- nounPhrase(X,R), verbPhrase(R,Y).
Look at the axiom carefully. It says that, to extract a sentence from the front of list X, leaving behind Y, you can do the following.

  1. Extract a noun phrase NP from the front of X, leaving behind R. That is, find a suffix R of L so that L = NP ++ R for some list NP that is a noun phrase. Notice that NP is never explicitly constructed here, but it is implicitly constructed by finding the suffix R that comes after it.

  2. Extract a verb phrase VP from the front of R, leaving behind Y. Again, VP is not explicitly constructed.

Lexical facts are very simple. For example, the fact that eats is a transitive verb can be stated as follows.

    Astarte: case TransitiveVerb("eats"::?r, ?r). <-
    Prolog: transitiveVerb([eats|R], R).
This says that you can extract a transitive verb, namely eats, from the front of list X, leaving behind R, provided X has the form eats::R.


Assignment

Using these methods, write a logic program that tells whether a list is a sentence, using the simple grammar above. Run your program on a few examples, including at least two sentences and at least two non-sentences. To test whether "archie likes the hamburger" is a sentence, write

  Astarte:  Sentence(["archie", "likes", "the", "hamburger"], []).
  Prolog:   sentence([archie, likes, the, hamburger], []).
Alternatively, in Astarte you can write the sentence and explode it into words.
  Sentence(wsExplode "Archie likes the hamburger", []).
The Prolog interpreter will respond with Yes or No. The Astarte line will succeed if the list is a sentence, and will fail if it is not. To print whether "archie likes the hamburger" is a sentence in Astarte, you might write a tester
  Define TestSentence ?s. =
    Try
      Sentence(wsExplode s,[]).
    then
      Writeln["yes"].
    else
      Writeln["no"].
    %Try
  %Define
Now
  TestSentence "archie likes the hamburger".
prints yes.

For this exercise, use either Astarte or Prolog. You must use logic programming style for this program. To use Prolog, use the following notes.


Writing and running Prolog programs

In prolog, you write [A|B] for the list whose head is A and whose tail is B.

Be sure to put a period at the end of every axiom. In Prolog, variables start with upper case letters, and constants start with lower case letters. Be careful about that. Predicate names also start with lower case letters. If you write Archie, you are writing a variable, not a constant. If you say

        properNoun([Archie|R] , R).
you are claiming that every word is a proper noun, since variable Archie can stand for any word at all.

I have not installed ISO Prolog, but you can use the pl command on the Unix computers in the lab to run SWI Prolog. To use pl, first compile your file. If your Prolog program is in file myprog.pl, type

  pl -o myprog -c myprog.pl
This only compiles your program into an intermediate code. To run your program, use command
  myprog
You will be prompted for goals. Each goal must end on a period. For examle you might type goal
  sentence([archie, likes, the, hamburger], []).
After a result is shown, you can type a semicolon to ask for more solutions.

To exit from SWI Prolog, you can type a control-D character.


Important notes for writing this in Astarte

Choice mode

In Astarte, the default mode for trying cases is to select only the first one that works. You do not want that. You wan to use backtracking. You can request backtracking by using choice mode all. For example,

  Define all
    case TransitiveVerb(("eats"::?r, ?r). <-
    case ...
  %Define

Mutual recursion

If you have several mutually recursive definitions, you can compile them together by putting them in a team. It has the form

  Team
    Define ... %Define
    Define ... %Define
    ...
  %Team
Inside a team, you can use something that is defined later in the team. The compiler does two passes over the team.

End markers

We are using predicate names that begin with upper case letters. They need end markers or periods. If you look at the definition cases, you will see periods that end the predicate calls. The periods are required even in the heading.


What to turn in

Turn the source of your program, using the submit program as before. Submit as assignment 6.