12.5. The Subset Sum Problem


The Subset Sum problem

The subset sum problem is as follows.

Input. A list of positive integers x1, x2, …, xm and a positive integer K.

Question. Does there exist an index set I ⊆ {1, 2, …, m} so that ∑iI xi = K?

For example, suppose that the list is (5, 2, 9, 3, 5, 8) and K = 20. Then the answer is true: 9 + 3 + 8 = 20.

Similarly, if the list is (5, 2, 9, 3, 5, 8) and K = 12, the answer is true: 5 + 2 + 5 = 12.

But if the list is (5, 2, 9, 3, 5, 8) and K = 4, the answer is clearly false.


Subset Sum is in NP

The question for Subset Sum is: Does there exist an index set I ⊆ {1, 2, …, m} so thatiI xi = K?

So a nondeterministic algorithm for Subset Sum is as follows.

Evidence: An index set I ⊆ {1, 2, …, m}.

Accept evidence if:iI xi = K.

It should be clear that

  1. the evidence is short (bounded in length by a polynomial in n, the length of the input) and

  2. testing the evidence can be done in polynomial time.


Subset Sum is NP-complete

Theorem. The Subset Sum problem is NP-complete.

We have seen that Subset Sum is in NP. All that is left is to reduce some known NP-complete problem to Subset Sum.

We reduce 3-SAT to Subset Sum.

The reduction function takes a clausal formula φ with 3 literals per clause and it yields a list (x1, x2, …, xm) and a positive integer K.

Here is how the reduction works.

Suppose that φ = c1c2 ∧ … ∧ cu, and that φ contains v variables.

As a running example, suppose that φ is: (xy ∨ ¬z) ∧ (¬ywz) ∧ (¬w ∨ ¬xy).

There are m = 2(u + v) numbers in the list (x1, x2, …, xm). They are as follows, written in standard decimal notation, for the example. The lines are just to break them up.

w x y z   c1 c2 c3
w 1 0 0 0 | 0 1 0
w 1 0 0 0 | 0 0 1
x 1 0 0 | 1 0 0
x 1 0 0 | 0 0 1
y 1 0 | 1 0 1
y 1 0 | 0 1 0
z 1 | 0 1 0
z 1 | 1 0 0
| 1 0 0
| 1 0 0
| 1 0
| 1 0
| 1
| 1
 
1 1 1 1 3 3 3

The value of K is shown at the end: 1111333.

In general, there are two numbers for each variable x, one standing for x and one for ¬x. Let's write Nx and Nx for those two variables.

There is also a column for each variable. Notice that K has a 1 in each of those columns. Any collection of these numbers that adds up to K must have exactly one of the numbers Nx and Nx.

Selecting Nx corresponds to choosing x to be true.

Selecting Nx corresponds to choose x to be false.

Now look at the columns labeled by clauses.

  1. There is a 1 in column ci of Nx if clause ci contains literal x.

  2. There is a 1 in column ci of Nx if clause ci contains literal ¬x.

  3. Among the numbers that stand for variables, there are no more than 3 1's in a given column, since each clause contains only 3 literals.

  4. Each clause column has two additional 1's that only occur in that column.

  5. So each clause column has no more that 5 1's. That means, even if you select all of the columns, there can never be a carry. The columns add up independently of one another.

  6. Notice that there are only two 1's in a clause node that are not in numbers that correspond to variables, and each clause column must add up to 3.

    So for each clause ci, at least one of the numbers that corresponds to a variable (or its negation) that has a 1 in column ci must be selected.

  7. Choosing Nx, having a 1 in column ci, corresponds to x being true.

    Nx has a 1 in column ci because clause ci contains x. So clause ci is true.

  8. Choosing Nx, having a 1 in column ci, corresponds to x being false.

    Nx has a 1 in column ci because clause ci contains ¬x. So again, clause ci is true.

There is a direct correspondence between variable assignments that make φ true and collections of numbers from the given list that add up to K.

So φ is satisfiable if and only if the Subset Sum problem just described has a solution. ◊