Hints for assignment 3


Importing library packages

You will need to import some libraries for this assignment. To get the random number generator, for example, you must say

       Import "misc/random".
and to get the select function (which you will find useful), you need
       Import "collect/search".

The documentation tells you which packages to import for each function. See the library by category for brief descriptions of library functions.


Getting the prime numbers p and q

You need to select two prime numbers p and q in the range from 257S to 257S+1. To select a random prime number, select a random start value k in the range from 257S to 257S+1, and build the infinite list [k, k+1, k+2, ...]. Then select the first member of this list that is prime. Useful functions are _upto_, select (package collect/search.ast) and prime? (package math/prime.ast). For example, (k _upto_ infinity) produces the desired infinite list [k, k+1, ...].

Be sure to select p and q independently, not from the same infinite list. Write a function that selects a random prime number in a given range, and use it twice.

See the next section for how to get a random number.

Warning. The ^ operator has higher precedence than the + operator. Pay attention to what you write.


Getting "Random" Numbers

Package misc/random.ast provides a pseudo-random number generator. It starts with a seed value provided to it, and produces a fairly long sequence of random numbers from that seed. If x is a string of decimal digits, then

      SetRandomSeed x.
sets the seed to x. String x should have at least 40 digits, although it can be shorter than that. See random number generator.

You can use the string provided by the user as the seed. The seed is required to be a string of decimal digits, not an arbitrary string. But you can turn any string s into a string of decimal digits using your stringToNumber function. Compute $(stringToNumber s), since stringToNumber converts a string to a number, and built-in function $ converts a number into its decimal representation.

After setting the seed, you can use the random number generator to select the prime numbers p and q. Only set the seed once, not every time you want another pseudo-random number. Expression randomRange(x,y) produces a "randomly" generated number between x and y inclusive.

Note: If you use a seed that comes from the user, then the numbers generated will depend on the seed that the user selected. This is where the dependence on the user's key comes in. A different key yields a different seed which yields different numbers p and q.

Note: The random number generator is not functional. Each time you call randomRange(x,y), a new pseudo-random number is created. This is one of the small non-functional aspects of your program.


Getting phi and e

After selecting prime numbers p and q, you will need to compute n, phi, e and d. When you compute phi, use operator -- instead of -, to make the compiler know that the answer is not a negative number. That is, phi = (p--1)*(q--1). (x -- y is |x-y|.) Use the select function to select e from the list [3,4,5,...]. You are guaranteed to find a suitable value for e very quickly. What is the predicate that e must satisfy? The gcd function is standard.


Getting d

The number d is chosen so that d*e mod phi = 1. Do not try to find d by searching for it. There will be far too many values to try. There is a well-known algorithm, called the Extended Euclid Algorithm, that can find d. The details of that algorithm are beyond the scope of this class. However, the algorithm has already been implemented for you.

You can think of arithmetic mod m, for any m, as defining alternative operations +, - and * to the usual ones. You define x + y in this alternative arithmetic as (x + y) mod m. For example, mod 14, you say that 5 + 12 = 3. This is the kind of arithmetic (mod 12) that you perform when you use a clock (with the exception that the numbers are 1-12 rather than 0-11). For example, if you add an hour to 12 o'clock, you get 1 o'clock. So, in clock arithmetic, 12+1 = 1 and 10 + 4 = 2.

If we agree to do arithmetic mod phi, then the characteristic desired for d is that d*e = 1. But that is the usual requirement of the reciprocal of e. (The reciprocal of a number x is a number y such that x*y = 1.) So, in a sense, d is the reciprocal of e. But it is not the usual reciprocal, since we are using an unusual form of arithmetic. You cannot use the 1/x key on your calculator to find it. Still, this seems to involve some kind of analog of division in the alternative arithmetic.

Import package math/modular.ast. It defines a type Modular of numbers on which the alternative arithmetic is done. Expression (x _modulo_ m) produces number x, where arithmetic on x is done mod m. The operations of +, -, * and / are all defined for type Modular. There is a unary version of / that takes the reciprocal. So /x is the reciprocal of x. To get back from a Modular number to an ordinary number, you can use operator _modulo_ in a pattern. So to compute d from e you can write

      Match ?d _modulo_ phi = /(e _modulo_ phi).
That is, d is the reciprocal of e when arithmetic is done mod phi.


Enciphering and deciphering

To compute the encipher and decipher functions, you will need to perform exponentiation. Exponentiation is available as operator ^. But if you write expression (x^d) _mod_ n, you will run into a very serious problem that you cannot ignore. Imagine that x and d are large, say about 1050. How large is xd? It is far too large ever to be stored in any computer that anyone will ever build. To get around this, you must reduce mod n at every step. That is, every time you do a multiplication, you must take the result mod n. That keeps the numbers small (and produces the correct result). Try computing 38 mod 5 this way. Each time you do a multiplication, take the result mod 5. Check your answer with a calculator.

There is another serious problem as well. When you computed 38, you probably did 7 multiplications. To compute 320, you would do 19 multiplications. But if you try to compute xd using d-1 multiplications, where d is about 1050, then you find that the computation does not terminate within your lifetime. (1050 is very roughly the number of protons in the Earth. You cannot expect to do that many multiplications.) A better algorithm is needed. What you do is repeated squarings to get large powers rapidly.

If you use the exponentiation provided in package math/modular.ast, then this will be taken care of. To compute m = (xd) mod n, write

  Match ?m _modulo_ n = (x _modulo_ n)^d.
Now you are using the modular arithmetic exponentiation function, which is efficient, and then extracting the result m as a natural number. Here is the algorithm that is used. modpow(x,n,m) produces xn mod m.

Define modpow by first
  case modpow(?,0,?) = 1
  case modpow(?x,2*(?n),?m) = (p*p) _mod_ m |
         Let p = modpow(x,n,m).
  case modpow(?x,?n+1,?m) = x*modpow(x,n,m)
%Define


Computing listToNumber

You can write function listToNumber directly, using recursion. You can also write it without recursion.

listToNumber is a kind of fold. To see this, consider the case where the base is 10. Define a binary operator $ so that x $ y = 10*x + y. What x $ y does is shift x to the left by one digit, and then add y. So, for example, 234 $ 5 = 2345.

Notice that 2345 = 0 $ 2 $ 3 $ 4 $ 5, where the $ operations are done from left to right. So listToNumber is a left fold, and you can use function lfold to compute listToNumber.

The definition of operator $ depends on the base (10 in the example). If you define function dol so that dol b (x,y) = b*x + y, then $ is just (dol 10). You should be able to see how to do the fold.


Computing numberToList

You can use recursion to define numberToList, and you should feel free to do that. Try some examples in base 10 to see what needs to be done, and generalize to arbitrary bases. (What works in base 10 will work in other bases too.)

An alternative is to recognize that, by the principle of duality, since listToNumber is a left fold, its inverse numberToList is a right unfold. For decimal aritmetic, you have used function f(x,y) = 10*x + y. You unfold by the inverse of this function, where f-1(10*x + y) = (x,y). You will not be able to write the function definition in exactly this form, since the pattern matcher does not know how to solve equations of the form 10*(?x) + ?y = n. But you should be able to see how to write this function.

You can use function runfold to do the job. You will need to read the documentation of runfold to use it.


Getting a line of text

Expression readLine() produces a line of text that is read from the standard input. This is a nonfunctional aspect, but is an easy way to get the input.

Expression skipWS(x) is the result of stripping all leading white space from the front of string x. For example, skipWS("  abc  ") = "abc  ". How can you strip white space from both ends? (Hint: use reverse.)

Functions skipWS and readLine are both defined in package collect/string.ast.


Breaking up the input string

You need to break the input string into a list of strings, where each string in the list is short enough to encipher. Lists are typically read using pullers.

A puller is a function that takes a list (such as a string) and extracts something from the front of the list. It returns a pair consisting of the value extracted and the rest of the list. For example, if function pull2 is designed to remove two characters from the front of a string, then pull2("abcdefgh") = ("ab", "cdefgh").

Higher order function pulls takes a puller and uses it repeatedly until the list is exhausted. For example, (pulls pull2 "abcdefgh") = ["ab", "cd", "ef", "gh"]. What pulls has done is run pull2 to extract the first two characters, then run it again to get the next two characters, etc.

You will want a library function called pullUpToN. It is a curried function. Expression (pullUpToN k) produces a puller that extracts k characters from the front of the list. (If there are not k characters remaining, it gets them all.) So, for example, pullUpToN 3 "abcdefg" = ("abc", "defg").


Writing a list to a file

You can use WriteFile to write an entire file (open, write, close all in one). To make the file called f hold string s, use

       WriteFile f,s.
WriteFile is an imperative operation. Function $ will convert a list to a string. For example, $([23,4,567]) = "[23,4,567]". Use that to print a list.


Reading a list from a file

If you have written a list of numbers into a file called f, you can get the list back by doing a pattern match against the contents of the file. What you wrote to the file was $(L) for some list L, so use

     Match $(?x:[Natural]) = infile(f).
to bind x to the list of natural numbers that was written into file f. You must tell the compiler what type of thing to look for, since it has no other way of determining what you want.

At some point you will want to concatenate all of the strings back together, to reverse the process of cutting up the file into short pieces. How can you do that?


Getting the key triple

Use pattern matching to get the key triple from a key file. If you write key (2S, n, e) using

   WriteFile keyfile, $(2*S,n,e).
then you can get them back from file keyfile using
   Match $(?twoS,?n,?e) = infile(keyfile).

Sections below are for information only
and are not required for implementing this program


Factoring and Security

The RSA system is a public key system. What that means is that the key required to encipher can be made public, keeping the decipher key private.

If the public key is (n,e), then the private key (n,d) can be found by anyone who has the prime factors of n. Since you created n by finding p and q and then making n = p*q, you know the prime factors of n. But n is part of the public information, so anyone else could get that information by running a factoring algorithm to factor n. The question then is, how hard is it to factor a number?

The answer depends on the size and nature of n, and on the algorithm that is used. Algorithms can be classified by the rough size of numbers that can be factored in a reasonable amount of time. Here, a reasonable time might be considered to be anything from a few seconds to a year.

Increases in the size of numbers that can be factored have been made over time, partly because computers have become faster, but much more significantly due to improvements in algorithms. Here are a few algorithms.

  1. The naive algorithm tries all factors up to the square root of n. It can factor numbers up to about 18 digits in a reasonable amount of time. That would require testing a billion potential factors.

  2. Pollard's algorithm runs much more efficiently than the naive algorithm. Its time is roughly proportional to the fourth root of n, instead of the square root of n. You might hope to factor 35 digit numbers with it. Function factors, from package math/prime.ast, uses Pollard's algorithm.

  3. Variations of the Quadratic Sieve algorithm have been used to factor numbers in the 90-100 digit range, taking months of time on hundreds of computers running in parallel.

  4. The Number Field algorithm has factored 125 digit numbers on a large network of computers.

At present, factoring 200 digit numbers is out of practical reach.


Checking primality

One way to discover whether a number is prime is to find its prime factors. That is clearly unacceptable, however. We need to be able to check whether a very large number is prime in a reasonable amount of time, in spite of the fact that factoring large numbers if very difficult.

There is a very clever way to check whether a number is prime. A result due to Pierre de Fermat, called Fermat's Little Theorem, states that

If p is prime and x is an integer that is not divisible by p, then xp-1 mod p = 1.
If you select a number x, where 0 < x < n, and you discover that xn-1 mod n is not 1, then n must clearly not be prime, since otherwise Fermat's Little Theorem would be violated. For example, 28 mod 9 = 4, so 9 cannot be prime.

Say that x is a witness to the fact that n is not prime if 0 < x < n and xn-1 mod n is not 1. For example, 2 is a witness to the fact that 9 is not prime.

Remarkably, for all but rarely occurring numbers called Carmichael numbers, if n is not prime, then at least half of the numbers from 1 to n-1 are witnesses that n is not prime. To find out whether n is prime, guess a few numbers x at random. If any of them are witnesses that n is not prime, then you know for sure that n is not prime. If none of the guessed numbers are witnesses, then, with high probability, n is prime. (If it were not, you would probably have found a witness.) This approach is much faster than factoring for large numbers.

This does not work for the Carmichael numbers, but there is a separate test for them. (In fact, Carmichael numbers are always easy to factor.) Since Carmichael numbers are rare, the extra test for them is rarely needed in practice.

Function prime? in package math/prime.ast implements this algorithm, with the added test for Carmichael numbers, and so can be used to test whether very large numbers are prime.

The algorithm just described has the disadvantage that it can make errors. With low probability, you can fail to find a witness, even though one exists. In 2002 a new algorithm was discovered that efficiently tests whether a number is prime, without the possibility of ever making an error. Researchers had thought that an algorithm like this must exist, but had not found one earlier. Probably the most surprising thing about the new algorithm is how simple and short it turned out to be. Few people were expecting that. That is to the credit of the algorithm's discoverers.

In practice, the new algorithm is still considerably slower than the randomized one based on Fermat's Little Theorem, and the probability that the randomized algorithm makes an error can be made smaller than the probability that the computer malfunctions because it was hit by a cosmic ray. So the randomized algorithm is still used. There remains the possibility of a truly fast practical algorithm that never errs.


Practical encryption

RSA is a public key cipher. That means that you have two keys, one to encipher (the public key) and one to decipher (the private key). Knowing how to encipher does not help you determine how to decipher, within a reasonable amount of time.

The RSA system is versatile; it allows you to exchange information over a channel with an eaves dropper, even with someone whom you have never communicated with before. (You choose a random key, and send the public key to the other person. That person can send encrypted replies back. The eaves dropper will know the public key, but only you can read the encrypted messages because only you have the private key.)

Unfortunately, RSA is very slow. Practical systems like PGP use RSA during an initial "getting to know you" phase, during which keys are exchanged for a simpler and more efficient private-key system, such as the Advanced Encryption Standard. It is also possible to publish public keys for anyone to read. The remaining information is encrypted using the private-key system. Private key systems have just one key that is used for both enciphering and deciphering. If you tell someone how to encipher a message, you have also given away information on how to decipher.