Hints for assignment 3

Importing library packages

You will need to import some libraries for this assignment. To get the random number generator, for example, you must say

       Import "misc/random".
and to get the select function (which you might find useful), you need
       Import "collect/search".

The documentation tells you which packages to import for each function.

Getting the prime numbers p and q

You have a number s, the security parameter, and you need to select two prime numbers p and q in the range from 256s to 256s+1. To select a random prime number, select a random start value k in the range from 256s to 256s+1, and build the infinite list [k, k+1, k+2, ...]. Then select the first member of this list that is prime. (Don't worry about the rare case where the result is larger than 256s+1. It is rare, and won't matter anyway.) Useful functions are select (import "collect/search.cmg") and prime? (package "math/prime.cmg"). Build the list [k,k+1,...] using [k,...] See the next section for how to get a random number.

Be sure to select p and q independently, not from the same infinite list. Choose a separate random number for each. Write a function that selects a random prime number in a given range, and use it twice.

Warning. The ^ operator has higher precedence than the + and * operators. Pay attention to what you write. 256s+1 is not the same as 256s+1.

Getting "Random" Numbers

Module misc/random.cmg provides a pseudo-random number generator. You will need to import it. The pseudo-random number generator starts with a seed value provided to it, and produces a fairly long sequence of random numbers from that seed. If x is a string of decimal digits, then

      SetRandomSeed x.
sets the seed to x. See SetRandomSeed. For each chosen seed you get exactly the same sequence of pseudo-random numbers out. So they are not really random at all.

You want to use the string provided by the user as the seed. Unfortunately, what the user types could be any string, including letters and punctuation characters, but SetRandomSeed needs a string of decimal digits. But you can turn any string t into a string of decimal digits using your stringToNumber function. Compute $(stringToNumber t), since stringToNumber converts a string to a number, and built-in function $ converts a number into its decimal representation (a string).

After setting the seed, you can use the random number generator to select the prime numbers p and q. Only set the seed once, not every time you want another pseudo-random number. Use function randomRange to select a random number. Expression randomRange(x,y) produces a "randomly" generated number between x and y inclusive.

Note: The pseudo-random numbers given to you depend on the seed. Since you use a seed that comes from the user, the numbers generated will depend on the key string that the user types. Try it, to see that you get different random numbers when you use different key strings.

Note: The random number generator is an imperative feature. Each time you call randomRange(x,y), a new pseudo-random number is created. That violates the rules of functional programming, which require two occurrences of the same expression in the same context to yield the same value. This is one of the small non-functional aspects of your program.

Getting φ and e

After selecting prime numbers p and q, you will need to compute n, φ, e and d. The values of n and φ are easy to get, since n = pq and φ = (p - 1)(q - 1).

Select a value of e from the list [3,4,5,...] such that gcd(e,φ) = 1. You are guaranteed to find a suitable value for e very quickly. Use select and gcd.

Getting d

The number d is chosen so that d*e mod φ = 1. Do not try to find d by searching for it. There will be far too many values to try. There is a well-known algorithm, called the Extended Euclid Algorithm, that can find d. The details of that algorithm are beyond the scope of this class. However, the algorithm has already been implemented for you.

You can think of arithmetic mod m, for any m, as defining alternative operations +, − and * to the usual ones. You define x + y in this alternative arithmetic as (x + y) mod m. For example 5 + 12 = 3 (mod 14). This is the kind of arithmetic (mod 12) that you perform when you use a clock (with the exception that the numbers are 1-12 rather than 0-11). For example, if you add an hour to 12 o'clock, you get 1 o'clock. So, in clock arithmetic, 12+1 = 1 and 10 + 4 = 2.

If we agree to do arithmetic mod φ, then the characteristic desired for d is that d*e = 1. But that is the usual requirement of the reciprocal of e. (The reciprocal of a number x is a number y such that x*y = 1.) So, in a sense, d is the reciprocal of e. But it is not the usual reciprocal, since we are using an unusual form of arithmetic. You cannot use the 1/x key on your calculator to find it. Still, this seems to involve some kind of analog of division in the alternative arithmetic.

Import package math/modular.cmg. It defines a type Modular of numbers on which the alternative arithmetic is done. Expression (x `modulo` m) produces number x, where arithmetic on x is done mod m. For example, (10 `modulo` 14) + (9 `modulo` 14) = (5 `modulo` 14). The operations of +, −, * and / are all defined for type Modular. There is a unary version of / that takes the reciprocal. So /x is the reciprocal of x. To get back from a Modular number to an ordinary number, you can use operator `modulo` in a pattern. So to compute d from e you can write

      {d `modulo` ~(phi) =~ /(e `modulo` phi)}
That is, d is the reciprocal of e when arithmetic is done mod φ.

Enciphering and deciphering

To compute the encipher and decipher functions, you will need to perform exponentiation. Exponentiation is available as operator ^. Writing expression (x^d) `mod` n will work, but only because the compiler knows to replace that by something else because package std/number.cmg contains Expand x^y `mod` m => modularPower(x, y, m). Imagine that x and d are large, say about 1050. How large is xd? It is 1050(1050), which is far too large ever to be stored in any computer that anyone will ever build. (It would require more storage bits than there are protons in the earth.)

To get around this, you must reduce mod n at every step. That is, every time you do a multiplication, you must take the result mod n. That keeps the numbers small (and produces the correct result). Try computing 38 mod 5 this way. Each time you do a multiplication, take the result mod 5. Check your answer with a calculator.

There is another serious problem as well. When you computed 38, you probably did 7 multiplications. To compute 320, you would do 19 multiplications. But if you try to compute xd using d−1 multiplications, where d is about 1050, then you find that the computation does not terminate within your lifetime. A better algorithm is needed. What you do is repeated squarings to get large powers rapidly.

Both of those improvements are part of the modularPower function. See here for the algorithm that is used.

Writing stringToNumber

To convert a string to a number, remember that the string is a list of characters. Standard function rank converts a character into its numeric code. For example, rank('a') = 97, since the ascii code of 'a' is 97. If you replace each character by its code, you get a list of numbers that are less than 256, which you can think of as the digits of a number in base 256. Then convert that list of numbers into a single number.

Although you are using base 256, let's look at the same process in the more familiar base 10. To convert the list [2,3,4,5] into 2345, you can start with 0 and successively add digits, giving 2, 23, 234 and finally 2345. This is a kind of fold. Use foldLtoR to do the fold.

A good idea is to make the base a parameter to your function. Test it using small bases such as 10 and 2, but use it in base 256. Consider defining a curried function where the first argument is the base. For example, if you define

  {listToNumberWithBase b s = ...}
then your definition of listToNumber would be
  {listToNumber = listToNumberWithBase 256}

Writing numberToString

First convert from a number to a list of its digits in base 256. You will find this easiest to do by recursion. Then convert the digits into characters, giving the desired string.

To see how this works, look at it in base 10 instead of base 256 to start. After you understand it in base 10, you should see how to do the same thing in base 256. You will find the `mod` and `div` and useful.

Again, consider adding a base to a function, numberToListWithBase, and defining function numberToList using that function. The advantage is that you can do tests with small bases.

Check that numberToString(stringToNumber(x)) = x for a few strings x. Does it work?

Getting a line of text

Expression readLine() produces a line of text that is read from the standard input. This is an imperative aspect, but is an easy way to get the input.

Breaking up the input string

You need to break the input string into a list of strings, where each string in the list is short enough to encipher. Lists are typically read using getters. A getter is a function that takes a list (such as a string, which is a list of characters) and extracts something from the front of the list. It returns a pair consisting of the value extracted and the rest of the list. For example, if function get2 is designed to remove two characters from the front of a string, then get2("abcdefgh") = ("ab", "cdefgh"), providing the part "ab" that was read and the part "cdefgh" that is left over.

Higher order function repeatGet takes a getter and uses it repeatedly until the list is exhausted. For example, (repeatGet get2 "abcdefgh") = ["ab", "cd", "ef", "gh"]. What repeatGet has done is run get2 to extract the first two characters, then run it again to get the next two characters, etc.

You will want to use a library function called getUpToN. It is a curried function. Expression (getUpToN k) produces a getter that extracts k characters from the front of the list. (If there are not k characters remaining, it gets them all.) So, for example, getUpToN 3 "abcdefg" = ("abc", "defg") and repeatGet(getUpToN 3) "abcdefg" = ["abc", "def", "g"].

Salting

You gain security by adding random characters to the string. If the string were just "yes", you could not get much security by enciphering only that string. Instead, you might add three random characters to the front and encipher "uqgyes". The extra characters are called salt. When you decipher, you need to remove the salt.

You know how to select a random number in a given range. For ease of reading, choose a number from 97 to 122 for each salt character. Those are codes of the lower case letters. Convert the number to a character using function char.

You want a whole string of s randomly chosen characters. You can use fdup to run a function repeatedly and build a list of its results. Then concatenate the salt string with the actual string. Be sure to use a different salt string for each piece of the input file.

Writing a list to a file

You can use WriteFile to write an entire file (open, write, close all in one). To make the file called f  hold string t, use

       WriteFile f, t.
WriteFile is an imperative operation.

Function $ will convert a list to a string. For example, $([23,4,567]) = "[23,4,567]".

Reading a list from a file

If you have written a list of numbers into a file called f, you can get the list back by doing a pattern match against the contents of the file. Function fileContents takes the name of file and yields the contents of a file (as a string). What you wrote to the file was $(x) for some list x, so use

     {$(x: [Integer]) =~ fileContents(f)}
to bind x to the list of natural numbers that was written into file f. You must tell the compiler what type of thing to look for, since it has no other way of determining what you want. You will need to do import
  Import "collect/string".
to use $ in patterns.

At some point you will want to concatenate all of the strings back together, to reverse the process of cutting up the file into short pieces. How can you do that?

Getting the key triple

Use pattern matching to get the key triple from a key file. If you write key (s, n, e) using

   WriteFile keyFileName, $(s,n,e).
then you can get s, n and e back from file keyfile using
   {$(s,n,e) =~ fileContents(keyFileName)}

Using tags

Although you are not required to write type information in most places, you will find it useful to write it anyway. One way to do that is to use assumptions. Decide on a naming convention. If you decide that anything named nums is a list of integers, write paragraph

  Assume nums: [Integer].
The compiler will follow this assumption, and anything called nums is treated as a list of integers.

An alternative is to tag something where you create it. For example,

  {nums: [Integer] = map stringToNumber strings}
explicitly tells the compiler that nums is a list of integers. You might find the following types useful.

Type Meaning
Integer An integer
Char A single character
[Integer] A list of integers
[Char] A string (a list of characters)
String String is the same as [Char]
[String] A list of strings
(Integer, Integer) An ordered pair of two integers
(Integer, Integer, Integer) An ordered triple of three integers

Sections below are for information only
and are not required for implementing this program

Factoring and security

See factoring and security for a discussion of security of the RSA system.

Other security issues

See here for a discussion of problems with how the RSA system is used in this assignment, and how they can be fixed.

Checking primality

The method of checking whether a number is prime is important, since the RSA system needs very large prime numbers. See checking primality.

Practical encryption

See here for a discussion of practical use of the RSA system.