Computer Science 3675
Summer 1999
Programming Assignment 9

The following is a very simple grammar for a tiny subset of English.

   <sentence>            ::= <noun-phrase> <verb-phrase>
                           | <noun-phrase> is <adjective>
   <noun-phrase>         ::= <proper-noun> 
                           | <determiner> <common-noun>
                           | <determiner> <adjective> <common-noun>
   <verb-phrase>         ::= <transitive-verb> <noun-phrase>
   <determiner>          ::= a | the
   <proper-noun>         ::= archie | jughead | betty | veronica
   <common-noun>         ::= malt | hamburger | dog | shop
   <transitive-verb>     ::= eats | likes
   <adjective>           ::= happy | hungry | sleepy

For example, here is a way to derive sentence "archie likes betty".

  <sentence>    => <noun-phrase> <verb-phrase>
                => <proper-noun> <verb-phrase>
                => archie <verb-phrase>
                => archie <transitive-verb> <noun-phrase>
                => archie likes <noun-phrase>
                => archie likes <proper-noun>
                => archie likes betty

A parser for a grammar does this process backwards. It takes a purported sentence as input, and either finds a derivation of the sentence (and so claims that the input is, in fact, a sentence) or claims that the input is not a sentence at all.

Logic programming makes it easy to write parsers for grammars. There is more than one form of "logic grammar". We will use one that is fairly easy to understand.

A purported sentence is a list of symbols. Sometimes, it is convenient to specify a list as a difference of two other lists. If A and B are lists, where B is a suffix of A, then the list A-B is defined to be the list X such that A = X ++ B. For example, [archie,likes,the,hamburger] - [the,hamburger] = [archie,likes]. That is, A-B is the result of removing list B from the end of list A. It is only defined when B is a suffix of A.

You might imagine implementing a grammar by writing a predicate for each nonterminal, where A(X) is true if list X can be generated by nonterminal A. It turns out to be more convenient to use difference lists. For each nonterminal A, you create a predicate A so that A(X,Y) is true if list X-Y is something that can be generated by nonterminal A. Precisely, you write predicate A so that A(L,Y) means

L and Y are lists and Y is a suffix of L. That is, there is a list X such that L = X ++ Y.
The list X from (1) is a phrase that can be generated by nonterminal A.

For example, in our English grammar, properNoun([archie, eats, the, hamburger], [eats, the, hamburger]) is true since nonterminal properNoun can derive the phrase [archie]. Similarly, verbPhrase([eats, the, hamburger], [ ]) is true.

A logic grammar consists of a predicate for each nonterminal and a collection of axioms, one for each production in the grammar. For example, here is a fact for <sentence> that comes from the first production.

   sentence(L,Y) :- nounPhrase(L,R), verbPhrase(R,Y)

This fact says that a sentence can be a noun phrase followed by a verb phrase. Look at it carefully. It says that, to extract a sentence from the front of list L, leaving behind Y, you can do the following.

Extract a noun phrase NP from the front of L, leaving behind R. That is, find a suffix R of L so that L = NP ++ R (for some list NP) and NP is a noun phrase. Notice that NP is never explicitly constructed here, but it is implicitly constructed by finding the suffix R that comes after it.
Extract a verb phrase VP from the front of R, leaving behind Y. Again, VP is never explicitly constructed.

Lexical facts are very simple. For example, the fact that eats is a transitive verb can be stated as follows.

    transitiveVerb([eats|R], R).

This says that you can extract a transitive verb, namely eats, from the front of list L, leaving behind R, provided L has the form [eats|R].

Using these methods, write a logic program that tells whether a list is a sentence, using the simple grammar above. Run your program on a few examples, including at least two sentences and at least two non-sentences.

For this exercise, use Prolog. Write your rules in Prolog syntax. Be sure to put a period at the end of every axiom. In Prolog, variables start with upper case letters, and constants start with lower case letters. Be careful about that. If you write Archie, you are writing a variable, not a constant. If you say

        properNoun([Archie|R] , R).

you are claiming that every word is a proper noun, since variable Archie can stand for any word at all.

We have two implementations of Prolog available. BinProlog is started by command bp. To use it, write your program in a file called something like myprog.pl, and type consult ("myprog.pl"). after entering bp. Then give goals. Each will be answered.

SWI prolog is started by command pl. To use pl, first compile your file. If your Prolog program is in file myprog.pl, type

  pl -o myprog -c myprog.pl

This only compiles your program into an intermediate code. To run your program, use command

  myprog

You will be prompted for goals. Each goal must end on a period. After a result is shown, you can type a semicolon to ask for more solutions.

Try a few sentences and a few nonsentences. Try to exercise all of your rules. Turn the source of your Prolog program and a script of the execution of your tests. You can create a script by typing command script. See the man page for script.

Computer Science 3675 Summer 1999 Programming Assignment 9

Computer Science 3675
Summer 1999
Programming Assignment 9