Tag Archives: research

The Assignment Problem

I just finished a script that generates instances of the assignment problem and solves them step by step. You can check it out here.

Assignment Problem Image

Suppose you are the owner of a company and need to delegate tasks to your employees. You’ve generated a table that tells how long (in minutes) you think it would take each person to accomplish each individual task (called Jobs). Your goal is to find an assignment of people to jobs that minimizes the total amount of time it will take to complete all jobs. The requirements are that each job must be completed by only one person, and each person can complete only one job.

We can think of the employees at the job as our supply and the tasks as the demand. In order for this problem to have a feasible solution, we must have enough people (supply) to complete the number of jobs (demand). Because of this, our examples will all include situations where there are exactly the same number of people as jobs.

To solve this problem, we must first generate an initial assignment and see how good this assignment is. There are several ways of generating an initial solution, but two that I wanted to focus on are the “NorthWest Corner Rule” and the “Minimum Matrix Method”.

  1. The Northwest Corner Rule considers the matrix and repeatedly assigns the top remaining row to the left-most remaining column. If we think of the cost matrix as a being like a map then “top” becomes similar to “north” and left-most becomes similar to “west”, hence the derivation of the name. In assignment problems, this will result in the main diagonal being selected.
  2. The Minimum Matrix Method is an iterative method that searches the matrix for the minimum cell in the matrix and assigns that person to the associated job and removes them from consideration and repeats itself until all people have been assigned to jobs.

Once we have formulated an initial feasible solution, we need to check it for optimality. To do this, we use the Network Simplex Method, where we build a basis based on this initial solution. When we consider this problem as a network flow problem, a basis for the problem is a spanning tree (one less than twice the number of nodes in the graph that does not have any cycles) of the network. Because the assignment solution only contains one edge for every two nodes in the graph, we need to add a number of edges to the basis that contains no flow (which makes the solution degenerate) to form this spanning tree.

Once a spanning tree is formulated, we can solve for the dual variables by arbitrarily setting one node’s dual value to zero and solving for the remaining dual variables under the requirement that all arcs in the basis (spanning tree) must satisfy the equation uk + vi = cki for each person k and job i.

When we have dual variables, we can check to see if this solution is optimal by checking to see if all the other constraints are violated. This means that for every person k and every job i, we must have uk + vi cki (notice that this is a more relaxed version of what we had when we were solving for the dual variables themselves).

If a constraint is found to be violated, then we need to add the associated edge to the basis and remove an edge on the cycle that is formulated as a result, which generates a new solution.

So check out The Assignment Problem Script and let me know what you think.

Approximating the Set Cover Problem

Set Cover Problem Instance

I just finished my weekly task of shopping for groceries. This can be a somewhat daunting task because I generally have a list of things that I’ll need which cannot all be purchased at a single location. What often happens is that I find that many of the items on my list are ONLY offered at certain stores – generic brands of certain items for example. My goal then changes from minimizing the total amount of money spent to minimizing the number of stores that I must visit to purchase all of my items.

To formulate this as a mathematical problem, suppose that I have a grocery list of items I would like to buy, represented by the lists item1, item2, …, itemn, where n represents the number of items I have on this list. Suppose also that there are stores Store1, Store2, …, Storem (each one distinct) that offer some combination of items I have on my list. What I would like to do is minimize the number of stores I have to visit to purchase these items.

The problem I just described is famous because it is one that many people face on a regular basis. In a more general form, it is so famous that it has a name for it, called the Set Cover Problem (or the Minimum Set Cover Problem). In the general form of this problem, we replace the grocery list with a set of items called our universe. The lists of items offered at each store are the collections of subsets of the universe. In the problem, as in the example above, we would like to select enough subsets from this collection that we are able to obtain every element in our universe. We would like to do this with as low a number of sets as possible.

In my previous post, I described the 21 problems that Karp proved were NP-Complete. Set Cover was one of those problems, showing that this is a hard problem to solve. What I will do is introduce three ways to reach a near-optimal solution relatively quickly.

Greedy Method

One of the first approaches one may take to solve this problem is to repeatedly select the subset that contains the most new items. That’s how the greedy approach to set cover operates. The method knows to terminate when all elements belong to one of the selected sets. In the shopping example above, this would be accomplished by visiting the store that had the most items on my list and purchasing those items at this store. Once this is done, the items that have been purchased can be crossed off my list and we can visit the store with the most items on my remaining list, stopping when the list is empty.

Linear Programming Relaxation

Instead of stating the set cover problem with words, there is a way of describing the situation with mathematical inequalities. For instance, suppose that the soap I like to purchase is only available at stores Store1, Store4 and Store9. Then I could introduce a variable xi for each store i and the requirement that I purchase this soap can be restated as :

x1 + x4 + x9 greater than or equals 1

Because we can either purchase some items or not purchase these items, each variable xi is 0 or 1 (called a binary variable). We can introduce similar constraints for each element in our universe (or on our grocery list). These inequalities (called constraints) have the form:

for each element e in U, sumi | e in Si xi greater than or equals 1

Our goal of minimizing the number of sets chosen (stores visited) can be stated by the objective function:
minimize sum1 less than or equals i less than or equals n xi

So the mathematical formulation for this problem can be stated as

minimize sum1 less than or equals i less than or equals n xi
Subject to
for each element e in U, sumi | e in Si xi greater than or equals 1
for each set i, xi in {0, 1}.

Formulations of this type, where variables are restricted to a finite set (in this case the x variables being either 0 or 1) are called integer programs. Unfortunately, there is no easy way to solve these formulations either. However, there is a related problem which can be solved quickly.

Instead of restricting the x variables to the values of 0 or 1, we could allow them to take on any value within this range, i.e. 0 less than or equals xi less than or equals 1 for each set Si. Doing this converts the problem from an integer programming problem into a linear programming problem (called the LP-Relaxation), which can be solved quickly. The issue with this method though is that the solution obtained by an LP-Relaxation is not guaranteed to be an integer. In this case, how do we interpret the values xi?

Randomized Rounding Method

One approach to dealing with a non-integer solution to the LP-Relaxation is to treat the xi values as probabilities. We can say that xi is the probability that we select set i. This works because each value of xi is in the range of 0 to 1, which is necessary for a probability. We need to repeatedly select sets with their associated probabilities until all elements in our universe are covered. Selecting our sets based on this procedure is the randomized rounding approach.

Deterministic Rounding Method

A second approach to dealing with a non-integer solution to the LP-Relaxation is to base our solution on the most occurring element. If we let f be this frequency (i.e.the number of sets that the most occurring element occurs in), then we can define a solution by selecting set i if the LP=Relaxation solution gives the variable xi a value of at least (1/f).

None of these three approaches is guaranteed to give an optimal solution to an instance of this problem. I will not go into it in this post, but these can all be shown to be within some guaranteed range of the optimal solution, thus making them approximation algorithms.

You can see how the three algorithms compare on random problem instances here.

Hope you enjoy.