Tag Archives: knapsack

What is a “Hard” Problem?

Throughout our lives, we are introduced to a wide variety of problems. Naturally we tend to think of some problems as more difficult than others. If you were doing the exercises at the end of a chapter in a book, the author generally tends to begin with those exercises that can be completed in a few minutes or directly from the material in the chapter. The exercises in the end tend to be more time consuming and possibly require outside resources. In this sense, these authors tend to compare the difficulty of problems by the amount of time they expect the average reader to take to solve them.

This is a popular way to look at difficulty – in terms of how difficult it is for not only a single person to solve a problem (say you or I), but how long it would take our peers as well as yourself to solve the problem. But how can we measure this? Should I assume that because a problem takes me two years to solve that it would take the average person the same amount of time? Or, if a problem has never been solved, is that because of the difficulty of the problem or could it be for another reason like because no-one had thought of the problem before?

In particular, these questions were being asked in the world of computer science in the 1960s and 1970s. It was around this time that Stephen Cook came up with the concept being able to compare how difficult one problem was in comparison to another. A problem A can be “reduced” to another problem B if a way to solve problem B quickly also gives way to a way to solve problem A quickly.

This concept of reducibility provided a foundation for this concept of difficulty as it was no longer enough to say “I think this problem is hard”, or “the average person would take just as long as me to solve this problem”. No, instead, Cook considered problems that many thought were difficult and showed that the satisfiability problem (the question of whether a given boolean formula contains an assignment to the variables that satisfies all the clauses) could be used to show that all quickly verifiable decision problems (also called problems in NP), where the instances with a “yes” answer have short proofs of the fact that the answer is indeed “yes” could be reduced to this problem.

This meant that the satisfiability problem was at least as hard as every one of these quickly verifiable decision problems, so if this problem could be solved quickly, then every one of these quickly verifiable decision problems could also be solved quickly. So the satisfiability problem is at least as hard as every problem in this class, NP. We call such problems that have this property NP-Hard. Decision Problems that are NP-Hard are called NP-Complete. Richard Karp later published a paper that proposed 21 NP-Complete problems, one of which was the Knapsack Problem I spoke about last time.

Garey Johnson NP-Complete Image

This concept of NP-Complete gives light to the image that became famous in the book by Garey and Johnson “Computers and Intractability: A Guide to the Theory of NP-Completeness”. It shows that the concept of difficulty still builds on the ability of our peers to solve a given problem. But by introducing a class of “hard” problems, and a litmus test for inclusion into that class, researchers could now consider new problems and determine their difficulty by comparing it to known hard problems.

(a quick note. I’ve used the term “quickly” here, but the formal phrase that I mean by that is that the problem is solvable by an algorithm whose worse case running time is bounded by a polynomial on the input parameters of the problem)

Other Blogs that have covered this topic:
To Imaging, and Beyond!

Knapsack Problems

Knapsack Image

I have added a script to help users understand the knapsack problem as well as some attempts at solving it.

To help understand this problem, I want you to think about a common situation in many people’s lives. You have a road trip coming up today and you’ve overslept and are at risk of missing your flight. And to top matters off, you were planning to pack this morning but now do not have the time. You quickly get up and begin to get ready. You grab the first bag you see and quickly try to make decisions on which items to take. In your head you’re trying to perform calculations on things you’ll need for the trip versus things that you can purchase when you get there; things that you need to be able to have a good time versus things you can do without. And to top matters off, you don’t have time to look for your ideal luggage to pack these things. So you have the additional constraint that the items you pick must all fit into this first bag you found this morning.

The situation I described above is a common problem. Even if we ignore the part about the flight, and just concentrate on the problem of trying to put the most valuable set of items in our bag, where each item has its own value and its own size limitations, this is a problem that comes up quite often. The problem is known (in the math, computer science and operations research communities) as the knapsack problem. It is known to be difficult to solve (it is said to be NP-Hard and belongs to a set of problems that are thought to be the most difficult problems within its class). Because of this difficulty, this problem has been well studied.

What I provide in my script are two approaches to solving this problem. The first is a greedy approach, which selects a set of items by iteratively choosing the item with the highest remaining value to size ratio. This approach solves very fast, but can be shown to give sub-optimal solutions.

The second approach is a dynamic programming approach. This algorithm will solves the problem by ordering the items 0, 1, …, n and understanding that in order to have the optimal solution on the first i items, the optimal solution must have been first selected on the fist i-1 items. This algorithm will optimally solve the problem, but it requires the computation of many sub-problems which causes it to run slowly.

Update (4/2/2013): I enjoy this problem so much that I decided to implement two additional approaches to the problem: Linear Programming and Backtracking.

The Linear Programming approach to this problem comes from the understanding that the knapsack problem (as well as any other NP-Complete problem) can be formulated as an Integer Program (this is a mathematical formulation where we seek to maximize a linear objective function subject to a set of linear inequality constraints with the condition that the variables take on integer values). In the instance of the knapsack problem we would introduce a variable xi for each item i; the objective function would be to maximize the total value of items selected. This can be expressed as a linear objective function by taking the sum of the products of the values of each item vi and the variable xi; the only constraint would be the constraint saying that all items fit into the knapsack. This can be expressed as a linear inequality constraint by taking the sum of the products of the weights of each item wi and the variable xi. This sum can be at most the total size of the knapsack. In the integer programming formulation, we either select an item or we do not. This is represented in our formulation by allowing the variable xi = 1 if the item is selected, 0 otherwise.

The LP relaxation of an integer program can be found by dropping the requirements that the variables be integer and replacing them with linear equations. So in the case of the knapsack problem, instead of allowing the variables to only take on values of 0 and 1, we would allow the variables to take on any value in the range of 0 and 1, i.e 0 <= xi <= 1 for each item i. We can then solve this LP to optimality to get a feasible solution to the knapsack problem.The second knapsack approach I implemented today is through backtracking. Similar to the Dynamic Programming approach to this problem, the backtracking approach will find an optimal solution to the problem, but these solutions generally take a long time to compute and are considered computationally inefficient. The algorithm I implemented here first orders the item by their index, then considers the following sub-problems for each item i "What is the best solution I can obtain with this initial solution?". To answer this question, the algorithm begins with an initial solution (initially, the empty set) and a set of unchecked items (initially, all items) and recursively calls itself on sub-problems with an additional item as a part of the initial solution and with this item removed from the unchecked items.So check out my knapsack problem page. I think its a good way to be introduced to the problem itself, as well as some of the techniques that are used in the fields of mathematics, computer science, engineering and operations research.

Other Blogs covering this topic:
Journey to the Four Point Oh