Hints for assignment 3

Getting the prime numbers p and q

You need to select two prime numbers p and q in the range from 2 to N. But how large is N? The value of N determines how large the blocks are that are being enciphered at once. It is N that determines the security of the cipher system. You should choose a fixed value for N in your program. Make it larger than 1000. A reasonable choice might be 10^10 or 10^20. That will provide virtually no security, but will let you test things. For security, you should really choose N to be about 10^50 (low security) or 10^100 (high security). Be warned that the choice of N has a strong effect on how fast the program runs. You are using a fairly slow interpreter, and it will take a very long time for N = 10^100.

To select a random prime number, select a random start value k in the range from 2 to N, and build the infinite list [k, k+1, k+2, ...]. Then select the first member of this list that is prime. Useful functions are _upto_, select (package collect/search.ast) and prime? (package math/prime.ast). For example, (k _upto_ infinity) produces the desired infinite list.

Be sure to select p and q independently, not from the same infinite list. Write a function that selects a random prime number in a given range, and use it twice.

See the next section for how to get a random number.

Getting "Random" Numbers

Standard package misc/random.ast provides a pseudo-random number generator. It starts with a seed value provided to it, and produces a fairly long sequence of random numbers from that seed. If x is a string of decimal digits, then

      SetRandomSeed x.

sets the seed to x.

You can use the string provided by the user as the seed. The seed is required to be a string of decimal digits, not an arbitrary string. But you can turn any string s into a string of decimal digits by computing $(strToNum s), since strToNum converts a string to a number, and built-in function $ converts a number into its decimal representation.

After setting the seed, you can use the random number generator to select the prime numbers p and q. Only set the seed once, not every time you want another pseudo-random number. Expression randomRange(a,b) produces a "randomly" generated number between a and b inclusive.

Note: If you use a seed that comes from the user, then the numbers generated will depend on the seed that the user selected. This is where the dependence on the user's key comes in. A different key yields a different seed which yields different numbers p and q.

Getting phi and e

After selecting prime numbers p and q, you will need to compute n, phi, e and d. When you compute phi, use operator -- instead of -, to make the compiler know that the answer is not a negative number. That is, phi = (p--1)*(q--1). Use the select function to select e from the list [3,4,5,...]. What is the predicate that e must satisfy? The gcd function is standard.

Getting d

The number d is chosen so that d*e mod phi = 1. There is a well-known algorithm, called the Extended Euclid Algorithm, that can find d. The details of that algorithm are beyond the scope of this class. However, the algorithm has already been implemented for you.

You can think of arithmetic mod m, for any m, as defining alternative operations +, - and * to the usual ones. You define x + y in this alternative arithmetic as (x + y) mod m. For example, mod 14, you say that 5 + 12 = 3. This is the kind of arithmetic (mod 12) that you perform when you use a clock (with the exception that the numbers are 1-12 rather than 0-11). For example, if you add an hour to 12 o'clock, you get 1 o'clock.

If we agree to do arithmetic mod phi, then the characteristic desired for d is that d*e = 1. That is, d is the reciprocal of e. This seems to involve some kind of analog of division in the alternative arithmetic.

Import package math/modular.ast. It defines a type Modular of numbers on which the alternative arithmetic is done. Expression (x _modulo_ m) produces number x, where arithmetic on x is done mod m. The operations of +, -, * and / are all defined for type Modular. There is a unary version of / that takes the reciprocal. So /x is the reciprocal of x. (It is NOT the usual reciprocal that your calculator will give you, since it is the reciprocal in the alternative arithmetic, mod m.) To get back from a Modular number to an ordinary number, you can use operator _modulo_ in a pattern. So to compute d from e you can write

      Match ?d _modulo_ phi = /(e _modulo_ phi).

Enciphering and Deciphering

To compute the encipher and decipher functions, you will need to perform exponentiation. If you write expression (x^d) _mod_ n, you will run into a very serious problem. Imagine that x and d are large, say about 10^50. How large is x^d? It is far too large ever to be stored in any computer that anyone will ever build. To get around this, you must reduce mod n at every step. That is, every time you do a multiplication, you must take the result mod n. That keeps the numbers small (and produces the correct result). Try computing 3^4 mod 5 this way.

There is another problem as well. If you try to compute x^d using d-1 multiplications, where d is about 10^50, then you find that the computation does not terminate within your lifetime. (10^50 is very roughly the number of protons in the earth.) What you do is repeated squarings to get large powers rapidly.

If you use the exponentiation provided in package math/modular.ast, then this will be taken care of. To compute m = (x^d) mod n, write

  Match ?m _modulo_ n = (x _modulo_ n)^d.

Computing listToNum

You can write function listToNum directly, using recursion. You can also write it without recursion.

listToNum is a kind of fold. To see this, consider the case where the base is 10. Define a binary operator $ so that x $ y = 10*x + y. What x $ y does is shift x to the left by one digit, and then add y. So, for example, 234 $ 5 = 2345.

Notice that 2345 = 0 $ 2 $ 3 $ 4 $ 5, where the $ operations are done from left to right. So listToNum is a left fold, and you can use function lfold to compute listToNum.

The definition of operator $ depends on the base. Define it inside function listToNum, where the base is known. Give it a name other than $. For example, you might define f(x,y) = b*x + y, and use f to do the fold.

Computing numToList

You can use recursion to define numToList, and that is recommended. Try some examples in base 10 to see what needs to be done, and generalize to arbitrary bases.

The alternative is to recognize that, since listToNum is a left fold, its inverse numToList is a right unfold. You can use function runfold to do the job. You will probably find recursion easier to understand, however.

Getting a line of text

Expression readLine(stdin!) produces a line of text that is read from the standard input.

Expression skipWS(x) is the result of stripping all leading white space from the front of string x. For example, skipWS(" abc ") = "abc ". How can you strip white space from both ends? (Hint: use reverse.)

Functions skipWS and readLine are both defined in package collect/string.ast.

Breaking up the input string

You need to break the input string into a list of strings, where each string in the list is short enough to encipher. Lists are typically read using pullers.

A puller is a function that takes a list (such as a string) and extracts something from the front of the list. It returns a pair consisting of the value extracted and the rest of the list. For example, if function pull2 is designed to remove two characters from the front of a string, then pull2("abcdefgh") = ("ab", "cdefgh").

Higher order function pulls takes a puller and uses it repeatedly until the list is exhausted. For example, (pulls pull2 "abcdefgh") = ["ab", "cd", "ef", "gh"]. What pulls has done is run pull2 to extract the first two characters, then run it again to get the next two characters, etc.

You will want a library function called pullUpToN. It is a curried function. Expression (pullUpToN k) produces a puller that extracts k characters from the front of the list. (If there are not k characters remaining, it gets them all.) So, for example, pullUpToN 3 "abcdefg" = ("abc", "defg").

The only remaining problem is to decide how large the pieces should be. You need for k < log(base 257)(n). You can use the log function provided in package math/math.ast. Expression (log b x) produces the log to base b of x. Then take the floor of that to convert to an integer.

There are some type problems that you need to be careful about. The log function is picky about types. In (log b x), x must be of type Real. Use unary operator ~ to convert to Real. So ~n is the result of converting n to type Real.

The floor function produces a result of type Integer. You need a result of type Natural. Use function natural to convert from Integer to Natural. So natural(floor(log 257 (~n))) is the log of n, as a natural number.

Writing a list to a file

You can use WriteFile to write an entire file (open, write, close all in one). To make the file called f hold string s, use

       WriteFile f,s.

Function $ will convert a list to a string. For example, $([23,4,567]) = "[23,4,567]".

Reading a list from a file

If you have written a list of numbers into a file called f, you can get the list back by doing a pattern match against the contents of the file. What you wrote to the file was $(L) for some list L, so use

     Assume x: [Natural].
     Match $(?x) = infile(f).

to bind x to the list of natural numbers that was written into file f. (The assume line is needed if the compiler has no other way of knowing what you are trying to get from the file.)

At some point you will want to concatenate all of the strings back together, to reverse the process of cutting up the file into short pieces. How can you do that?

Factoring and Security

The RSA system is a public key system. What that means is that the key required to encipher can be made public, keeping the decipher key private. Your implementation here is not designed to do that, but more sophisticated implementations are.

If the public key is (n,e), then the private key (n,d) can be found by anyone who has the prime factors of n. Since you created n by finding p and q and then making n = p*q, you know the prime factors of n. But anyone else could get that information by running a factoring algorithm to factor n. The question then is, how hard is it to factor a number?

The answer depends on the size and nature of n, and on the algorithm that is used. Algorithms can be classified by the rough size of numbers that can be factored in a reasonable amount of time. Here, a reasonable time might be considered to be anything from a few seconds to a year. Increases in the size of numbers that can be factored have been made over time, partly because computers have become faster, but much more significantly due to improvements in algorithms. Here are a few algorithms.

The naive algorithm tries all factors up to the square root of n. It can factor numbers up to about 18 digits. That would require testing a billion potential factors.
Pollard's algorithm runs much more efficiently than the naive algorithm. Its time is roughly proportional to the fourth root of n, instead of the square root of n. You might hope to factor 35 digit numbers with it.
Variations of the Quadratic Sieve algorithm have been used to factor numbers in the 100 digit range, taking months of time on hundreds or thousands of computers.
The Number Field algorithm has factored 125 digit numbers on a large network of computers.

Checking primality

One way to discover whether a number is prime is to find its prime factors. That is clearly unacceptable, however. We need to be able to check whether a very large number is prime in a reasonable amount of time, in spite of the fact that factoring large numbers if very difficult.

There is a very clever way to check whether a number is prime. A result due to Pierre de Fermat, called Fermat's Little Theorem, states that

If p is prime and x is an integer that is not divisible by p, then x^(p-1) mod p = 1.

If you select a number x, where 0 < x < n, and you discover that x^(n-1) mod n is not 1, then n must clearly not be prime, since otherwise Fermat's Little Theorem would be violated. For example, 2^(8) mod 9 = 4, so 9 cannot be prime.

Say that x is a witness to the fact that n is not prime if 0 < x < n and x^(n-1) mod n is not 1. For example, 2 is a witness to the fact that 9 is not prime.

Remarkably, for all but rarely occurring numbers called Carmichael numbers, if n is not prime, then at least half of the numbers from 1 to n-1 are witnesses that n is not prime. To find out whether n is prime, guess a few numbers x at random. If any of them are witnesses that n is not prime, then you know for sure that n is not prime. If none of the guessed numbers are witnesses, then with high probability n is prime. (If it were not, you would probably have found a witness.)

This does not work for the Carmichael numbers, but there is a separate test for them. Since Carmichael numbers are rare, the other test is rarely needed in practice.

Function prime? in package math/prime.ast implements this algorithm, and so can be used to test whether very large numbers are prime.