I just finished my weekly task of shopping for groceries. This can be a somewhat daunting task because I generally have a list of things that I’ll need which cannot all be purchased at a single location. What often happens is that I find that many of the items on my list are ONLY offered at certain stores – generic brands of certain items for example. My goal then changes from minimizing the total amount of money spent to minimizing the number of stores that I must visit to purchase all of my items.

To formulate this as a mathematical problem, suppose that I have a grocery list of items I would like to buy, represented by the lists item_{1}, item_{2}, …, item_{n}, where n represents the number of items I have on this list. Suppose also that there are stores Store_{1}, Store_{2}, …, Store_{m} (each one distinct) that offer some combination of items I have on my list. What I would like to do is minimize the number of stores I have to visit to purchase these items.

The problem I just described is famous because it is one that many people face on a regular basis. In a more general form, it is so famous that it has a name for it, called the Set Cover Problem (or the Minimum Set Cover Problem). In the general form of this problem, we replace the grocery list with a set of items called our universe. The lists of items offered at each store are the collections of subsets of the universe. In the problem, as in the example above, we would like to select enough subsets from this collection that we are able to obtain every element in our universe. We would like to do this with as low a number of sets as possible.

In my previous post, I described the 21 problems that Karp proved were NP-Complete. Set Cover was one of those problems, showing that this is a hard problem to solve. What I will do is introduce three ways to reach a near-optimal solution relatively quickly.

**Greedy Method**

One of the first approaches one may take to solve this problem is to repeatedly select the subset that contains the most new items. That’s how the greedy approach to set cover operates. The method knows to terminate when all elements belong to one of the selected sets. In the shopping example above, this would be accomplished by visiting the store that had the most items on my list and purchasing those items at this store. Once this is done, the items that have been purchased can be crossed off my list and we can visit the store with the most items on my remaining list, stopping when the list is empty.

**Linear Programming Relaxation**

Instead of stating the set cover problem with words, there is a way of describing the situation with mathematical inequalities. For instance, suppose that the soap I like to purchase is only available at stores Store_{1}, Store_{4} and Store_{9}. Then I could introduce a variable x_{i} for each store i and the requirement that I purchase this soap can be restated as :

_{1}+ x

_{4}+ x

_{9}1

Because we can either purchase some items or not purchase these items, each variable x_{i} is 0 or 1 (called a binary variable). We can introduce similar constraints for each element in our universe (or on our grocery list). These inequalities (called constraints) have the form:

for each element e U, _{i | e Si} x_{i} 1

Our goal of minimizing the number of sets chosen (stores visited) can be stated by the objective function:

minimize _{1 i n} x_{i}

So the mathematical formulation for this problem can be stated as

minimize _{1 i n} x_{i}

Subject to

for each element e U, _{i | e Si} x_{i} 1

for each set i, x_{i} {0, 1}.

Formulations of this type, where variables are restricted to a finite set (in this case the x variables being either 0 or 1) are called integer programs. Unfortunately, there is no easy way to solve these formulations either. However, there is a related problem which can be solved quickly.

Instead of restricting the x variables to the values of 0 or 1, we could allow them to take on any value within this range, i.e. 0 x_{i} 1 for each set S_{i}. Doing this converts the problem from an integer programming problem into a linear programming problem (called the LP-Relaxation), which can be solved quickly. The issue with this method though is that the solution obtained by an LP-Relaxation is not guaranteed to be an integer. In this case, how do we interpret the values x_{i}?

**Randomized Rounding Method**

One approach to dealing with a non-integer solution to the LP-Relaxation is to treat the x_{i} values as probabilities. We can say that x_{i} is the probability that we select set i. This works because each value of x_{i} is in the range of 0 to 1, which is necessary for a probability. We need to repeatedly select sets with their associated probabilities until all elements in our universe are covered. Selecting our sets based on this procedure is the randomized rounding approach.

**Deterministic Rounding Method**

A second approach to dealing with a non-integer solution to the LP-Relaxation is to base our solution on the most occurring element. If we let f be this frequency (i.e.the number of sets that the most occurring element occurs in), then we can define a solution by selecting set i if the LP=Relaxation solution gives the variable x_{i} a value of at least (1/f).

None of these three approaches is guaranteed to give an optimal solution to an instance of this problem. I will not go into it in this post, but these can all be shown to be within some guaranteed range of the optimal solution, thus making them approximation algorithms.

You can see how the three algorithms compare on random problem instances here.

Hope you enjoy.

A small optimization for some problems is to first choose any sets with unique elements. These will have to be chosen anyway. The problem can then be reduced to a smaller set and fewer subsets, before applying one of these methods.

This occurred to me when I played with the simulator, as in some caees there is no choice as to the sets — once the sets with unique elements are chosen, that was the solution.