Computer Science 3675
Summer 1999
Programming Assignment 3

Due: June 1.

This is an exercise in the use of powerful operations. The idea is to write short and simple programs without using loops or recursion, and yet to get quite powerful functions. In some cases you might need to use recursion, but you can do a lot without it.

This exercise is probably the longest exercise that you will be assigned in this class. However, there is a lot of help in this handout. If you do the program step by step, you will find that the individual steps are very short and simple, and you get a working product quite quickly. Do not try to write the entire program and then test it. Test each function as you go. Failure to do this will cause you grief.

The RSA system

For this exercise, you will encipher and decipher files using the RSA system. It is an industrial strength cryptographic system that works like this.

Selecting the keys.

First, you must select two sets of keys, a public set and a private set. Do this as follows.

  1. Select at random two large prime numbers p and q. (For high security, p and q should have about 100 decimal digits each. You will want to test your program with much smaller numbers than that, but it should be capable of using very large numbers.) Prime numbers p and q must be selected independently of one another, or you will lose security. Do not, for example, choose q to be the next prime number after p. You do not want anybody to be able to guess p and q.

  2. Let n = pq, and let phi = (p-1)(q-1). Select a small number e > 2 such that gcd(e,phi) = 1.

  3. Find an integer d such that (ed) mod phi = 1. (It is guaranteed that such a number d exists.)

The public key is the pair (n,e), and the private key is the pair (n,d).

Enciphering a message.

A value to be enciphered is an integer k where 0 < k < n. The encipher function is
encipher(k) = (k^e) mod n
where ^ indicates exponentiation.

The decipher function has the property decipher(encipher(k)) = k. It is defined by

decipher(k) = (k^d) mod n

Example

. For this example, we choose small prime numbers p = 7 and q = 13. Then n = 91 (since n is 7 times 13) and phi = 72 (since phi is 6 times 12). We can choose e = 5, since gcd(5, 72) = 1. Then d = 29, since 5*29 mod 72 = 1. The encipher and decipher functions are
encipher(k) = k^5 mod 91

decipher(k) = k^29 mod 19

Notice that the encipher and decipher functions have the same form, but different exponents. That is convenient, because then the same program can be used to encipher as to decipher. The program is just given different parameters. In practice, it is sometimes convenient to write separate programs for enciphering and deciphering, but they can share functions.

Also notice that anybody who has access to the public key pair (n,e) can encipher a message, but only somebody who has access to the private key pair (n,d) can decipher a message. Of course, anybody who can figure out the private information from the public information can also decipher messages. But it is generally believed that it is not practically feasible to compute the private key from the public key. (You could do that by factoring n, since then you would know p and q and could compute the private information. But factoring large numbers is very time consuming.)

The Assignment

For this assignment, you will write two Astarte programs, one to encipher and one to decipher. In addition, you will write a package of functions for both to share. The programs should be fairly short and simple. Use recursion only where absolutely necessary, prefering to let powerful operations do the work for you. You will need to import some libraries for this. To get the random number generator, for example, you must say
       Import "misc/random".
and to get the select function, you need
       Import "collect/search".
The documentation tells you which packages to import.

1. Strings and Numbers

The RSA system enciphers numbers. What you want to encipher, however, is a string. Also, you will find it convenient to use a string as the key, rather than numbers. So you need a way to convert from strings to integers and back.

You will want to convert between a character and its character code. Function rank converts a character to its numeric code. For example, rank('a') = 98. Function char converts back. For example, char(98) = 'a'.

Think of each character as its character code. Then a string is just a sequence of bytes, each a number between 0 and 255. If you think of a string as representing a number in base 256, then a string is easily converted to an integer. Actually, however, zeros present a problem. In a string, leading zeros are important. In a number, they are ignored. You can get rid of zeros by adding 1 to each byte, so that each byte becomes a number from 1 to 256 instead of a number from 0 to 255. Then the string can be considered as a number in base 257. This conversion does not lose information; you can convert back to a string and be sure to get the same string as what you started with, as long as your are careful to subtract 1 from each byte.

Write two functions, strToNum and numToStr, that convert from strings to numbers and from numbers to strings, respectively, using the method just outlined.

Your functions use base 257. But to see how to write strtoNum, consider the similar problem of converting a string of decimal digits to a number, as a decimal number. The general principles work for base 257, with some changes to the constants involved.

For convenience, suppose that binary operator $ is defined so that x $ y = 10*x + y. So, for example, 2 $ 4 = 24. Then conversion of string "2" to a number (in decimal) yields 0 $ 2. Conversion of string "24" to decimal yields 0 $ 2 $ 4, and conversion of "245" to a number yields 0 $ 2 $ 4 $ 5, where the operations are done from left to right. So conversion of a string to a number is just a left fold. Keep in mind that, although it is convenient to use a binary operator to see what is going on in a left fold, there is no need for the actual function to be a binary operator. It must just be a function that takes an ordered pair as a parameter. What modification to this idea do you need to do conversion using base 257? Don't forget to add one to each byte before converting. Write the function now.

The second function, numToStr, converts from a number to a string. It must exactly undo what strToNum does, so that numToStr(strToNum(x)) = x. We saw that functon strToNum is a left fold, and its inverse strToNum therefore turns out to be a right unfold. You can read about function runfold if you like, but it is also fine just to implement numToStr recursively, using direct equations. What numToStr wants to do is break a number into n into n mod 257 and n div 257. Think about it.

Put functions strToNum and numToStr in a utility package that the encipher and decipher programs can share. Your package will look something like this. Note the expectation, which tells other packages that functions strToNum and numToStr are defined here, but does not say how they work.

Package RSAUtilities

Export

Expect
  strToNum: String -> Natural;
  numToStr: Natural -> String;
%Expect

Implementation

 Your definitions go here.

%Package
Be sure to check these functions before going on.

2. Selecting Keys

The user will give a string as his or her key. You will want a function that converts that string into triple (n,e,d), where n, e and d are the values selected according to the rules described for the RSA cryptosystem. Put that function into the utility package, and be sure to expect it in the export part. Its type will be String -> (Natural, Natural, Natural).

Standard package misc/random.ast provides a pseudo-random number generator. It starts with a seed value provided to it, and produces a fairly long sequence of random numbers from that seed. You can use the string provided by the user as the seed, and then can use the random number generator to select the prime numbers p and q. The seed is required to be a string of decimal digits, not an arbitrary string. But you can turn any string s into a string of decimal digits by computing $(strToNum s). If x is a string of decimal digits, then

      SetRandomSeed x.
sets the seed to x. Then randomRange(a,b) produces a "randomly" generated number between a and b inclusive. Only set the seed once, not every time you want another pseudo-random number.

Note: If you use a seed that comes from the user, then the numbers generated will depend on the seed that the user selected. This is where the dependence on the user's key comes in.

To select a random prime number, select a random start value k in the range from 2 to n, and build the infinite list [k, k+1, k+2, ...]. Then select the first member of this list that is prime. Useful functions are _upto_, select and prime?. For example, (k _upto_ infinity) produces the desired infinite list. Be sure to select p and q independently, not from the same infinite list.

After selecting prime numbers p and q, you will need to compute n, phi, e and d. When you compute phi, use operator -- instead of -, to make the compiler know that the answer is not a negative number. That is, phi = (p--1)*(q--1). The gcd function is standard.

To compute d, you will need to use modular arithmetic, provided by package number/modular.ast. You can use the following to compute d from e.

      Let d = remainder( /(e _modulo_ phi)).
Here, unary operator / takes the reciprocal modulo phi. (The ordinary reciprocal of x is some number y such that xy = 1. The reciprocal of x modulo phi is some number y such that xy mod phi = 1.)

Try writing the key information to standard output before proceeding. Use short strings as keys. Does it look reasonable? Is e*d mod phi = 1?

3. Enciphering

Write a program that enciphers. You should write
  astr encipher myfile.txt myfile.cph
to place, in myfile.cph, an enciphered version of myfile.txt. The program should ask the user for a key string. To get the key string, just get an entire line from the user, and strip any white space at either end, since it is invisible, but will affect the key. (Should you do this work yourself? Of course not. readLine(stdin!) will produces a line from the standard input. skipWS(x) is the result of stripping all leading white space from the front of string x. How can you strip white space from both ends? (Hint: use reverse.))

The command line arguments can be obtained from commandLine. In the example above, commandLine is the list ["encipher", "myfile.txt", "myfile.cph"].

The content of file myfile.txt can be obtained as the value of expression infile("myfile.txt"). The content is a string.

The string in the file to encipher will, in general, be too long to encipher. You will need to break it down into pieces of a reasonable length, so that you can encipher each piece separately. If the pieces are too short, then you have no security. (An extreme example is choosing the pieces to have only one character in them. In that case, your code is nothing more than the kind of cipher that newspaper readers are invited to break, next to the crossword puzzle, for their amusement. That is certainly not an industrial strength cryptosystem!) If the pieces are too long, the RSA system will lose information. What you require is that, for each piece s, strToNum(s) < n, where n is the value from the RSA key pair. If you take a piece of length k, then you require that k < log base 257(n). Use the log and floor functions to compute k.

There is no need to use recursion to break up the string. There is a function called pullUpToN that will extract a given number of characters from the front of a string. It is a curried function. For example, pullUptoN 4 "abcdefghi" = ("abcd", "efghi"). You can apply it repeatedly using function pulls. pulls (pullUpToN k) will produce the list that is obtained by repeatedly pulling k characters from the string, with the last string possibly being shorter. For example, pulls (pullUpToN 5) "abcdefghijk" = ["abcde", "fghij", "k"].

Get the public key pair, using the function produced above. At this point, you have made the file content into a list of strings, the individual pieces of length k. Change this into a list of numbers. Then change the list of numbers by replacing each number x by encipher(x). This list is the enciphered version of the file. Write it to the file that should hold the enciphered text.

You can use WriteFile to write an entire file (open, write, close all in one). To make the file called f hold string s, use

       WriteFile f,s.

To compute the encipher function, you will need to perform exponentiation. If you write expression (x^e) _mod_ n, you will run into a very serious problem. Imagine that x and e are large, say about 10^50. How large is x^e? It is far too large ever to be stored in any computer that anyone will ever build. To get around this, you must reduce mod n at every step. If you use the exponentiation provided in package number/modular.ast, then this will be taken care of. To compute (x^e) mod n, write remainder((x _modulo_ n)^e).

4. Deciphering

To decipher, reverse the process. Command should write, into file myfile.plain, the deciphered version of myfile.cph.

If you have written a list of numbers into a file called f, you can get the list back by doing a pattern match against the contents of the file. Use

     Assume x: [Natural].
     Match $(?x) = infile(f).
to bind x to the list of natural numbers that was written into file f. At some point you will want to concatenate all of the strings back together, to reverse the process of cutting up the file into short pieces. How can you do that?

Try your program. You should be able to encipher a file and decipher it , and get exactly what you started with. You might find the Unix diff utility useful.

Turning in your program.

Email your programs (the raw programs, NOT the compiler listings), as attachments, to karl@cs.ecu.edu. There should be three attachments, one for the utility package, one for the encipher program and one for the decipher program. The body of the message should be empty. The subject of the message should be
3675 assn3 your name
If you send a question by email, be sure to use a different subject, so that I do not mistake your question for a submission.

Notes

Algorithms.

Note that it is critical how some of the computations are performed. Suppose that x^d is computed by multiplying x by itself d times. If d is about 10^100 , that will take a long time indeed. A much more efficient method is absolutely required. The algorithm works by repeated squaring, allowing it to compute large powers quickly.

What about testing whether a number is prime? How might you test whether a 100 digit number is prime? If you try all factors up to the square root of the number, you have about 10^50 factors to try. That is roughly the number of protons in the earth. A better method is required. The better method has at its heart the following idea. There is a theorem due to Fermat, called Fermat's Little Theorem. It goes like this.

Fermat's Little Theorem: If p is prime, and 0 < x < p, then x^(p-1) mod p = 1.
For example, 4^30 mod 31 = 1. The amazing thing about this theorem is that, with the exception of some extremely rare numbers called Carmichael numbers, if n is not prime, then for at least half of the numbers x where 0 < x < n, you find that x^(p-1) mod p is not 1. To find out whether n is prime, try a few randomly chosen numbers x, and see whether x^(p-1) mod p is 1. If any are found that are not 1, you know that n is not prime, by Fermat's Little Theorem. If all of the results yield 1, then n is probably prime, or is a Carmichael number. It turns out that there is a separate test that will detect Carmichael numbers, but they are rare anyway.

End Effects.

After you break up a file into pieces for encipherment, the last segment of a file can be shorter than the rest. That is a security loophole, since it might be possible to figure out what that last segment is. Can you see how to fix this problem? (Don't bother to do it for your program.)

Export Controls.

The United States Government has placed export controls on strong encryption technology, and the RSA system falls under that heading. At the time of this writing, it is illegal for you to export your program to any other country (assuming that it works)! There is a move to bring some sanity to Washington, and this export control will possibly be removed soon.