Computer Science 3675
Fall 2003
Programming Assignment 3

Due: Mon Oct 20.

This is an exercise to see how a nontrivial program can be written using pure functions. The goal is to write short and simple programs. In a few places you will need to use imperative operations that call for doing something, but that should be kept to a bare minimum. Instead, just describe what you want using equations. (Note: giving something a name, as in Let x = y %Let, is considered a functional style, since all you are doing is naming something, not changing something. On the other hand, Relet x = y %Relet is asking to change the binding of x, and is not a functional style.)

Where possible and convenient, use higher order functions to make your work simpler. Let the tool-building tools build your tools for you.

This is a fairly long assignment. First, read the entire assignment. Then follow the steps listed. If you do the program step by step, you will find that the individual steps are very short and simple, and you get a working product quite quickly.

Do not try to write the entire program and then test it, regardless of how appealing that approach might seem to you right now. Test each function as you go. Failure to do this will cause you grief.


The RSA system

For this exercise, you will encipher and decipher files using the RSA system. (The name comes from the names of the authors of a paper that described it: Rivest, Shamir and Adleman.) RSA is an industrial strength cryptographic system used in such applications as PGP.

Please read how the RSA system works.


The Assignment

For this assignment, you will write three Astarte programs,

  1. A program to compute the key sets (called keygen.ast);
  2. A program to encipher (called encipher.ast);
  3. A program to decipher (called decipher.ast).
In addition, you will write a package of functions for those three packages to share. So there will be a total of four packages. Each package will be fairly short, in the rough vicinity of 30 noncomment lines. (Your line count will vary. Noncomment lines do not count blank lines.)

Break your programs into small, simple functions.

Please include clear and useful comments in your packages. Write them for other people to read. Some rules of thumb about commenting are the following.

  1. Direct your comments to someone who knows a little less than you do, but assume the reader knows the basics of the language and what fundamental library functions do. Do not explain the language.

  2. Use examples in your comments. Showing how your program processes an example can make it much clearer what is going on, if the examples are chosen well.

  3. Write in clear, complete sentences. Spell words correctly, and use correct punctuation.

  4. Write the comments into the program during development, not when you are finished with development. That way, they will help you. Those who write clear comments during development will finish sooner.

You will need to import some library packages. See [Hints on using importing library packages].


Part 1. Strings and Numbers

The RSA system enciphers numbers. What you want to encipher, however, is a string. Also, you will find it convenient to use a string as the key that the user gives. The user might need to remember the key, and strings are easier to remember than numbers. So you need a way to convert from strings to integers and back. You will create a function stringToNumber that converts a string into a number, and numberToString that converts back. (So numberToString(stringToNumber(s)) = s.) These functions will be useful in more than one place, so put them into your package of shared functions. Your package will look something like this. (Note the expect part, which tells other packages that functions strToNum and numToStr are defined here, but does not say how they work. Parts in black should be written exactly as they are here. Parts in red need to be replaced with your definitions.)
Package RSATools

Export

Expect
  stringToNumber: String -> Natural;
  numberToString: Natural -> String;
%Expect

implementation

 Your definitions go here.

%Package

After you write these functions, be sure to test them before going on. Be sure that numberToString(stringToNumber(s)) = s. Below are more details on how to write these functions.

listToNumber

To help with stringToNumber, write a function listToNumber so that listToNumber b x takes a base b and a list of integer digits x, and produces the number that x represents as a base b number. For example, listToNumber 10 [9,4,2] = 942, listToNumber 10 [5,3,7,6] = 5376 and listToNumber 2 [1,1,0,0] = 12. Test listToNumber. [Hint on listToNumber]

stringToNumber

Now write a function stringToNumber so that stringToNumber(s) is a number that string s represents.

A string is a list of characters. Each character has a code, which is a natural number. Function rank will give you the code of a character. For example, rank('a') = 97. If you map rank onto a string, you get a list of numbers. For example, map rank "abc" = [97, 98, 99].

A good approach is to write this function in two stages. Before handling arbitrary characters, write a version that only allows decimal digits. For this initial version, you will get stringToNumber("352") = 352. Write stringToNumber so that it uses a function called myRank in place of rank. Make myRank take a decimal digit and convert it to a number. For example, myRank('3') = 3. You can define it by myRank(c) = rank(c) -- rank('0'). Test this version of stringToNumber.

But the characters in our strings are not necessarily decimal digits. They might conceivably be any value that can be put into a byte. (We might even be asked to encipher a binary file.) An obvious thing to do is just to redefine myRank to be the same as rank, and to use base 256 instead of base 10. That idea is close, but it has a problem. Notice that listToNumber 10 [0,1] = 1 and listToNum 10 [1] = 1. Leading zeros do not affect the result. But that means that it is impossible to recover the original list from the number. If all you have is the number 1, how do you know whether it came from list [1] or from list [0,1] or [0,0,1], etc.? You can avoid this problem by ensuring that the number 0 does not occur in the list. So define myRank(c) = rank(c) + 1. Then your modified character codes will be integers from 1 to 256 instead of integers from 0 to 255.

With the new definition of myRank, you can handle any byte, but you must work in base 257 instead of 256. Modify your stringToNumber function to work in base 257 with this new myRank function. This should be a very easy modification. It will be a little hard for you to test this function, since arithmetic in base 257 is not something that you do every day. That is why it was useful to start in base 10.

numberToList and numberToString

You will also need to convert back from a number to a string. Write a function numberToString so that numberToString(stringToNumber(s)) = s. That is, numberToString does the inverse transformation of stringToNumber. The main part of this is a function numberToList so that (numberToList b n) produces the list of digits represented by number n, in base b, without any leading zeros. For example, numberToList 10 942 = [9,4,2]. [Hint for numberToList]


Part 2. Selecting Keys

The user will give you two things. The first is a security parameter, indicating how much security is desired. A security parameter of 2 will give very low security (but rapid computation), and a parameter of 50 will give very high security (but slow computation). The second part of the input is a key string, which is used to compute the key integers n, e and d.

Write a program that reads the security parameter S and the key string and computes the numbers n, e and d that are part of the RSA key sets. The prime numbers p and q should be chosen to be random prime numbers in the range from 257S to 257S+1. The program should then write the ordered triple (2S, n, e) to one file, called key.pub, and the ordered triple (2S, n, d) to another file called key.priv. (Number 2S (twice S) has been added to help break the file down into parts. See enciphering, below.) You can use function WriteFile to write the files.

To get a string from the user, just get an entire line, and strip any white space at either end, since it is invisible, but will affect the key. Use stringToNatural to convert a string of decimal digits to a natural number. [Hint on getting a line]

Check your results. Do they look reasonable? Print out S, e, d and n to see whether they are reasonable. Is S the value that you typed in? Is e*d mod phi = 1? Just write out e*d mod phi to find out. Turn the lights on for testing.


Part 3. Enciphering

Write a program that enciphers. In the Unix version, you should use command

  astr encipher myfile.txt myfile.cph
to place, in myfile.cph, an enciphered version of myfile.txt. The program should read the key set (2S,n,e) from file key.pub.

The command line arguments can be obtained from commandLine. In the example above, commandLine is the list ["encipher", "myfile.txt", "myfile.cph"]. (For the Windows version, set the command line parameters by typing only the arguments, not the program name, into the the bottom edit box of the window that comes up when you select the run/command lines menu item. For example, you might type myfile.txt myfile.cph.)

The content of file myfile.txt, as a string, can be obtained as the value of expression infile [binaryMode] ("myfile.txt"). What you want is infile [binaryMode] (commandLine#2). See infile.

The string in the file to encipher will, in general, be too long to encipher as a single unit. Recall that decipher(encipher(k)) = k only for k < n, so the numbers to encipher must not be too large. You will need to break the string to encipher down into pieces of a reasonable length, so that you can encipher each piece separately. You need to convert the long string into a list of shorter strings. The first part of the key triple, 2S, tells how many characters to put in each piece. The prime numbers p and q were chosen to be larger than 257S, so n is sufficiently large that, for any string x of 2S characters, stringToNumber(x) < n. [Hint on breaking up the input]

Get the public key triple. [Hint on getting the key triple] At this point, you have made the file content into a list of strings, where the individual pieces have length 2S. Change this into a list of numbers, by using stringToNumber on each string in the list. Then get another list where each number x is replaced by encipher(x). (What kind of operation is this?) This list is the enciphered version of the file. Write it to the file that should hold the enciphered text. [Hint on writing the list]


Part 4. Deciphering

To decipher, reverse the process. For the Unix version, command

   astr decipher myfile.cph myfile.plain
should write, into file myfile.plain, the deciphered version of myfile.cph. It should get the key triple from file key.priv.

The deciphered version of the file should be identical to the original. The Unix command diff compares files, and tells you how they differ. If you do the encipher and decipher operations shown, and then do command

   diff myfile.txt myfile.plain
then you should find no differences. (The diff command will not print anything at all to show that there are no differences.) If there are any differences at all, fix your program.

One thing to watch for is misuse of the $ function. If you apply $ to a string, $ will put quote marks around the string. For example, $("abc") = "\"abc\"". Only run $ on a string if you want to add the quote marks.

See [Hint on reading a list] to see how to read the enciphered file in.


Part 5. Reporting progress

This program will be fairly slow when n is large. After the program works, modify the encipher and decipher programs so that they say how many blocks must be enciphered or deciphered, and then print a dot as each block is finished. Put all of the dots on one line. So use Write instead of Writeln. This should be a very easy modification. If it looks difficult, you are missing something. Note that this is an imperative aspect of the program. You are performing an action while computing results.

Be careful. The Write procedure accumulates a string into a buffer and only prints the buffer when it is ready. You will want to flush the buffer at each write, so that the progress can be seen. Use

   Flush.
to flush the standard output buffer.


Extra credit

There are two security problems with this program. For extra credit implement these improvements. You can add just one of them too. At the top of your encipher file, say which, if any, of these improvements you have made. Stay with a functional style for them.

Credit for these improvements will only be given if the basic features are working.


Short pieces (fairly simple)

After you break up a file into pieces for encryption, the last segment of a file can be shorter than the rest. That is a security loophole, since it might be possible to figure out what that last segment is. An extreme example is where the file just contains one of the words "yes" or "no".

There is a solution to this problem called salting. If the last piece is shorter than 2S characters, then add some random letters to it, padding its length to exactly 2S.

When you decipher, you will need to be able to remove those random letters. An easy way to do that is to write, just before the list of numbers that enciphers the file, what the length of the original file is. Decipher all of the pieces, concatenate them together, and then get a prefix of the desired length, removing the salting.


Sequencing and mixing documents (much harder)

Even if someone cannot decipher documents, he or she might reorder the parts, or even mix the pieces of different documents together. For example, some unscrupulous person might get hold of two enciphered document. He or she might take the first block of one followed by the second block of another, etc., and put together a new documents. Since each individual block is correctly enciphered, the newly manufactured document will appear to be a correctly enciphered document. Or, this unscrupulous person might simple rearrange the pieces of a single document, producing a new document. That kind of thing should be disallowed. This problem can be avoided by adding additional information to each piece before encryption. Select a short random string that identifies this document. Also add a small number of sequence bytes. Use sequence number 0 on the first part, 1 on the second part, etc. It would be reasonable to use two bytes for the sequence number, since that allows documents of over 65,000 parts.

Modify the program to select a random identity and to attach the identity and sequence number to each packet in the document. The decryption program must remove and check that information. It should refuse to decrypt if the identities are not all the same, or if the parts are out of sequence.


Turning in your program.

Turn in the source code of your program using the handin program.

   handin3675 2 
using the alias from assignment 1. Be sure that handin says that it was successful. If do not get a report of success, presume that it was not successful.

Include your tests. After testing one function, do not delete the tests. Keep all of the tests in the program.

Be sure to include your name in the program.