Hints for assignment 3

Importing library packages

You will need to import some libraries for this assignment. To get the random number generator, for example, you must say

       Import "misc/random".

and to get the select function (which you will find useful), you need

       Import "collect/search".

The function documentation tells you which packages to import. See the library by category for brief descriptions of library functions.

Getting the prime numbers p and q

You need to select two prime numbers p and q in the range from 257^S to 257^S+1. To select a random prime number, select a random start value k in the range from 257^S to 257^S+1, and build the infinite list [k, k+1, k+2, ...]. Then select the first member of this list that is prime. Useful functions are _upto_, select (package collect/search.ast) and prime? (package math/prime.ast). For example, (k _upto_ infinity) produces the desired infinite list [k, k+1, ...].

Be sure to select p and q independently, not from the same infinite list. Write a function that selects a random prime number in a given range, and use it twice.

See the next section for how to get a random number.

Getting "Random" Numbers

Standard package misc/random.ast provides a pseudo-random number generator. It starts with a seed value provided to it, and produces a fairly long sequence of random numbers from that seed. If x is a string of decimal digits, then

      SetRandomSeed x.

sets the seed to x. String x should have at least 40 digits, although it can be shorter than that. See random number generator.

You can use the string provided by the user as the seed. The seed is required to be a string of decimal digits, not an arbitrary string. But you can turn any string s into a string of decimal digits using your strToNum function. Compute $(strToNum s), since strToNum converts a string to a number, and built-in function $ converts a number into its decimal representation.

After setting the seed, you can use the random number generator to select the prime numbers p and q. Only set the seed once, not every time you want another pseudo-random number. Expression randomRange(x,y) produces a "randomly" generated number between x and y inclusive.

Note: If you use a seed that comes from the user, then the numbers generated will depend on the seed that the user selected. This is where the dependence on the user's key comes in. A different key yields a different seed which yields different numbers p and q.

Getting phi and e

After selecting prime numbers p and q, you will need to compute n, phi, e and d. When you compute phi, use operator -- instead of -, to make the compiler know that the answer is not a negative number. That is, phi = (p--1)*(q--1). (x -- y is |x-y|.) Use the select function to select e from the list [3,4,5,...]. What is the predicate that e must satisfy? The gcd function is standard.

Getting d

The number d is chosen so that d*e mod phi = 1. There is a well-known algorithm, called the Extended Euclid Algorithm, that can find d. The details of that algorithm are beyond the scope of this class. However, the algorithm has already been implemented for you.

You can think of arithmetic mod m, for any m, as defining alternative operations +, - and * to the usual ones. You define x + y in this alternative arithmetic as (x + y) mod m. For example, mod 14, you say that 5 + 12 = 3. This is the kind of arithmetic (mod 12) that you perform when you use a clock (with the exception that the numbers are 1-12 rather than 0-11). For example, if you add an hour to 12 o'clock, you get 1 o'clock. So, in clock arithmetic, 12+1 = 1.

If we agree to do arithmetic mod phi, then the characteristic desired for d is that d*e = 1. But that is the usual requirement of the reciprocal of e. So, in a sense, d is the reciprocal of e. But it is not the usual reciprocal, since we are using an unusual form of arithmetic. Still, this seems to involve some kind of analog of division in the alternative arithmetic.

Import package math/modular.ast. It defines a type Modular of numbers on which the alternative arithmetic is done. Expression (x _modulo_ m) produces number x, where arithmetic on x is done mod m. The operations of +, -, * and / are all defined for type Modular. There is a unary version of / that takes the reciprocal. So /x is the reciprocal of x. To get back from a Modular number to an ordinary number, you can use operator _modulo_ in a pattern. So to compute d from e you can write

      Match ?d _modulo_ phi = /(e _modulo_ phi).

Enciphering and deciphering

To compute the encipher and decipher functions, you will need to perform exponentiation. Exponentiation is available as operator ^. But if you write expression (x^d) _mod_ n, you will run into a very serious problem. Imagine that x and d are large, say about 10⁵⁰. How large is x^d? It is far too large ever to be stored in any computer that anyone will ever build. To get around this, you must reduce mod n at every step. That is, every time you do a multiplication, you must take the result mod n. That keeps the numbers small (and produces the correct result). Try computing 3⁴ mod 5 this way.

There is another problem as well. When you computed 3⁴, you probably did 3 multiplications. To compute 3¹⁰, you would do 9 multiplications. But if you try to compute x^d using d-1 multiplications, where d is about 10⁵⁰, then you find that the computation does not terminate within your lifetime. (10⁵⁰ is very roughly the number of protons in the Earth. You cannot expect to do that many multiplications.) A better algorithm is needed. What you do is repeated squarings to get large powers rapidly.

If you use the exponentiation provided in package math/modular.ast, then this will be taken care of. To compute m = (x^d) mod n, write

  Match ?m _modulo_ n = (x _modulo_ n)^d.

Now you are using the modular arithmetic exponentiation function, which is efficient, and then extracting the result m as a natural number.

Computing listToNum

You can write function listToNum directly, using recursion. You can also write it without recursion.

listToNum is a kind of fold. To see this, consider the case where the base is 10. Define a binary operator $ so that x $ y = 10*x + y. What x $ y does is shift x to the left by one digit, and then add y. So, for example, 234 $ 5 = 2345.

Notice that 2345 = 0 $ 2 $ 3 $ 4 $ 5, where the $ operations are done from left to right. So listToNum is a left fold, and you can use function lfold to compute listToNum.

The definition of operator $ depends on the base. Define it inside function listToNum, where the base is known. Give it a name other than $. For example, you might define f(x,y) = b*x + y, (where b is from the context of listToNum) and use f to do the fold.

Computing numToList

You can use recursion to define numToList, and that is recommended. Try some examples in base 10 to see what needs to be done, and generalize to arbitrary bases.

The alternative is to recognize that, since listToNum is a left fold, its inverse numToList is a right unfold. You can use function runfold to do the job. You will need to read the documentation of runfold to use it.

Getting a line of text

Expression readLine(bxStdin) produces a line of text that is read from the standard input (bxStdin.)

Expression skipWS(x) is the result of stripping all leading white space from the front of string x. For example, skipWS(" abc ") = "abc ". How can you strip white space from both ends? (Hint: use reverse.)

Functions skipWS and readLine are both defined in package collect/string.ast.

Breaking up the input string

You need to break the input string into a list of strings, where each string in the list is short enough to encipher. Lists are typically read using pullers.

A puller is a function that takes a list (such as a string) and extracts something from the front of the list. It returns a pair consisting of the value extracted and the rest of the list. For example, if function pull2 is designed to remove two characters from the front of a string, then pull2("abcdefgh") = ("ab", "cdefgh"). To pull a list apart into pairs of characters, you could use pull2 repeatedly, until there is nothing left.

You will want a library function called pullUpToN. It is a curried function. Expression (pullUpToN k) produces a puller that extracts k characters from the front of the list. (If there are not k characters remaining, it gets them all.) So, for example, pullUpToN 3 "abcdefg" = ("abc", "defg").

Writing a list to a file

You can use WriteFile to write an entire file (open, write, close all in one). To make the file called f hold string s, use

       WriteFile f,s.

WriteFile is an imperative operation. Function $ will convert a list to a string. For example, $([23,4,567]) = "[23,4,567]". Use that to print a list.

Reading a list from a file

If you have written a list of numbers into a file called f, you can get the list back by doing a pattern match against the contents of the file. What you wrote to the file was $(L) for some list L, so use

     Assume x: [Natural].
     Match $(?x) = infile(f).

to bind x to the list of natural numbers that was written into file f. (The assume line is needed if the compiler has no other way of knowing what you are trying to get from the file.)

At some point you will want to concatenate all of the strings back together, to reverse the process of cutting up the file into short pieces. How can you do that?

Getting the key triple

Use pattern matching to get the key triple from a key file. If you write key (2S, n, e) using

   WriteFile keyfile, $(2*S,n,e).

then you can get them back from file keyfile using

   Match $(?twoS,?n,?e) = infile(keyfile).

Factoring and Security

The RSA system is a public key system. What that means is that the key required to encipher can be made public, keeping the decipher key private.

If the public key is (n,e), then the private key (n,d) can be found by anyone who has the prime factors of n. Since you created n by finding p and q and then making n = p*q, you know the prime factors of n. But anyone else could get that information by running a factoring algorithm to factor n. The question then is, how hard is it to factor a number?

The answer depends on the size and nature of n, and on the algorithm that is used. Algorithms can be classified by the rough size of numbers that can be factored in a reasonable amount of time. Here, a reasonable time might be considered to be anything from a few seconds to a year.

Increases in the size of numbers that can be factored have been made over time, partly because computers have become faster, but much more significantly due to improvements in algorithms. Here are a few algorithms.

The naive algorithm tries all factors up to the square root of n. It can factor numbers up to about 18 digits in a reasonable amount of time. That would require testing a billion potential factors.
Pollard's algorithm runs much more efficiently than the naive algorithm. Its time is roughly proportional to the fourth root of n, instead of the square root of n. You might hope to factor 35 digit numbers with it. Function factors, from package math/prime.ast, uses Pollard's algorithm.
Variations of the Quadratic Sieve algorithm have been used to factor numbers in the 90-100 digit range, taking months of time on hundreds of computers running in parallel.
The Number Field algorithm has factored 125 digit numbers on a large network of computers.

At present, factoring 200 digit numbers is out of practical reach.

Checking primality

One way to discover whether a number is prime is to find its prime factors. That is clearly unacceptable, however. We need to be able to check whether a very large number is prime in a reasonable amount of time, in spite of the fact that factoring large numbers if very difficult. Here are two approaches.

A randomized algorithm

There is a very clever way to check whether a number is prime. A result due to Pierre de Fermat, called Fermat's Little Theorem, states that

If p is prime and x is an integer that is not divisible by p, then x^p-1 mod p = 1.

If you select a number x, where 0 < x < n, and you discover that x^n-1 mod n is not 1, then n must clearly not be prime, since otherwise Fermat's Little Theorem would be violated. For example, 2⁸ mod 9 = 4, so 9 cannot be prime.

Say that x is a witness to the fact that n is not prime if 0 < x < n and x^n-1 mod n is not 1. For example, 2 is a witness to the fact that 9 is not prime.

Remarkably, for all but rarely occurring numbers called Carmichael numbers, if n is not prime, then at least half of the numbers from 1 to n-1 are witnesses that n is not prime. To find out whether n is prime, guess a few numbers x at random. If any of them are witnesses that n is not prime, then you know for sure that n is not prime. If none of the guessed numbers are witnesses, then, with high probability, n is prime. (If it were not, you would probably have found a witness.) This approach is much faster than factoring for large numbers.

This does not work for the Carmichael numbers, but there is a separate test for them. (In fact, Carmichael numbers are always easy to factor.) Since Carmichael numbers are rare, the extra test for them is rarely needed in practice.

Function prime? in package math/prime.ast implements this algorithm, with the added test for Carmichael numbers, and so can be used to test whether very large numbers are prime.

A deterministic algorithm

In 2002, Agrawal, Kayal and Saxena discovered a fast (and surprisingly easy to state, for a number theorist) algorithm that tells whether a number is prime, and does not rely on randomization. It never gives a wrong answer. This is a major breakthrough, but, even though the algorithm is fairly fast, it cannot compete in terms of practical speed with the randomized algorithm.

Practical encryption

The RSA system is versatile; it allows you to exchange information over a channel with an eaves dropper even with someone whom you have never communicated with before. (You choose a random key, and send the public key to the other person. That person can send encrypted replies back. Only you can read the messages.) Unfortunately, RSA is very slow. Practical systems like PGP use RSA during an initial "getting to know you" phase, during which keys are exchanged for a simpler and more efficient private-key system, such as the Advanced Encryption Standard. The remaining information is encrypted using the private-key system.