# Floyd-Warshall Shortest Paths

The Floyd Warshall algorithm is an all pairs shortest paths algorithm. This can be contrasted with algorithms like Dijkstra’s which give the shortest paths from a single node to all other nodes in the graph.

Floyd Warshall’s algorithm works by considering first the edge set of the graph. This is the set of all paths of the graph through one edge. Node pairs that are connected to one another through an edge will have their shortest path set to the length of that edge, while all other node pairs will have their shortest path set to infinity. The program then runs through every triplet of nodes (i, j, k) and checks if the path from i to k and the path from k to j is shorter than the current path from i to j. If so, then the distance and the path is updated.

So lets consider an example on the graph in the image above. The edge set of this graph is E = {(0, 1), (0, 2), (0, 3), (1, 3), (3, 4)}. So our initial table is:

 0 1 2 3 4 0 inf (0, 1) (0, 2) (0, 3) inf 1 (0, 1) inf inf (1, 3) inf 2 (0, 2) inf inf inf inf 3 (0, 3) (1, 3) inf inf (3, 4) 4 inf inf inf (3, 4) inf

As we look to update the paths, we first look for routes that go through node 0:

Because node 0 connects to both node 1 and node 2, but node 1 does not connect to node 2, we have the following truth holding in the matrix above:
cost(0, 1) + cost(0, 2) < cost(1, 2), so we can update the shortest path from node 1 to node 2 to be (1, 0, 2).

Because node 0 connects to both node 2 and node 3, but node 2 does not connect to node 3, we have the following truth holding in the matrix above:
cost(0, 2) + cost(0, 3) < cost(2, 3), so we can update the shortest path from node 2 to node 3 to be (2, 0, 3).

Because node 3 connects to both node 0 and node 4, but node 0 does not connect to node 4, we have the following truth holding in the matrix above:
cost(0, 3) + cost(3, 4) < cost(0, 4), so we can update the shortest path from node 0 to node 4 to be (0, 3, 4).

Because node 3 connects to both node 1 and node 4, but node 1 does not connect to node 4, we have the following truth holding in the matrix above:
cost(1, 3) + cost(3, 4) < cost(1, 4), so we can update the shortest path from node 1 to node 4 to be (1, 3, 4).

Because node 3 connects to both node 2 and node 4, but node 2 does not connect to node 4, we have the following truth now holding:
cost(2, 3) + cost(3, 4) < cost(2, 4), so we can update the shortest path from node 2 to node 4 to be (2, 0, 3, 4).

The final table giving the list of shortest paths from every node to every other node is given below.

 0 1 2 3 4 0 inf (0, 1) (0, 2) (0, 3) (0, 3, 4) 1 (0, 1) inf (1, 0, 2) (1, 3) (1, 3, 4) 2 (0, 2) (1, 0, 2) inf (2, 0, 3) (2, 0, 3, 4) 3 (0, 3) (1, 3) (2, 0, 3) inf (3, 4) 4 (0, 3, 4) (1, 3, 4) (2, 0, 3, 4) (3, 4) inf

To see more examples and to help answer questions, check out the script in my examples section on the Floyd-Warshall algorithm

# Degree Centrality of a Graph

I wanted to spend some time on centrality measures of a graph. These are measurements of how important each node (or edge) is to the overall graph. But how do we define, or determine, importance? There is no unique way to answer this question, so there are varying metrics for measuring centrality. Which one you choose depends on several factors including how many other nodes of the graph are included, as well as the run time of the metrics you’re considering.

I have just published a script focusing on the degree centrality metric. The degree centrality metric is called a “walk metric” because it determines how important a node is by how many other nodes that can be reached by walks of up to a certain length. Lets look at the definition of the degree of a node to see if we can understand why it is called a walk metric.

In an undirected graph G = (V, E), the degree of a node u [in] V is the |{v | (u, v) [in] E}|. This is the size of the set of nodes that are connected to node u via a single edge. Another way of describing a single edge is a walk of length one. So the degree metric measures the importance of a node by the number of unique walks of length one.

The normalized degree centrality of a node v in a graph G = (V,E) measures how many nodes are connected to the node v, compared to the maximum possible number of edges that can be connected to this node. Because we are dealing with simple undirected graphs (at most a single edge between any two distinct vertices), this maximum possible number will always be |V – 1|. So the normalized degree can be calculated by dividing the degree of the node (the number of nodes it is connected to) by |V – 1|.

So for the example above, the node 0 has degree 6 because it is connected to nodes 2, 5, 9, 10, 11, and 12. There are 15 total nodes in this graph, so to calculate the normalized degree centrality of the node 0, it will be 6 / 14, which rounds to 0.428571.

To see more examples and to help answer questions, check out the script in my examples section on degree centrality

# Tarjan’s Strongly Connected Components Algorithm

A strongly connected component of a graph is a subgraph S of G where every pair of nodes, u and v in S there is a path from u to v and a path from v to u.

To find these strongly connected components we implement Tarjan’s algorithm. The idea behind Tarjan’s algorithm is to begin by running a depth first search from an arbitrary node in the graph, labeling nodes reachable from this start node in the order they are reached. The algorithm is also interested in the “oldest” node that could be reached by a given node. This is indicated by the keeping track of the lowest label that can be reached from that node. We will call the first property label(v) and the second lowlink(v).

When the algorithm starts label(v) is the same as lowlink(v) whenever a node is discovered. As the algorithm is executed, the DFS is being run on each discovered node, which in turn updates the lowlink(v) property telling of (older) nodes that can be reached. If an older node can be reached, then we update lowlink. If we reach a node that cannot connect to any older nodes after the DFS call, i.e if label(v) is the same a lowlink(v), then this means that this node does not have a path to any node with a lower label. So this node will be the first node of a new strongly connected component.

Feel free to check it out an let me know what you think in the comments below.

# Independent Set Puzzles

In this post, I want to return to the idea of NP-Complete problems. There is a more technical, more formal definition that I can refer you to, but I like to refer to the images from Garey and Johnson’s “Computers and Intractability: A Guide to the Theory of NP Completeness”. The images helps to understand the difficulty of NP-Complete problems by presenting two images. The first image shows a single individual speaking to someone saying that he has been unable to solve the problem. The second image shows that same individual speaking to the same person behind the desk, but saying that not only was he unable to solve the problem but neither was a long line of people. The theory of NP Complete problems revolves around the concept that if an efficient algorithm exists for an NP-Complete problem, then an efficient algorithm exists for all problems in the class NP.

Today, I would like to present a puzzle I created to play with the Independent Set problem. This is the problem where we are given a graph G = (V, E) and are asked to find a maximum set of vertices S such that there is no edge in the graph G between any two vertices in S. The decision version (the problem asking whether there is an independent set of size k) of this problem is NP-Complete, so the known algorithms for problem either have a slow running time, or do not solve it exactly.

This problem is very related to another puzzle I posted last year called the clique problem. In fact, Karp originally proved that Clique was NP-Complete by showing that if Clique could be solved efficiently, then Independent Set could be solved efficiently. He did this by constructing a second graph [G bar], called the compliment of G (containing the same vertices in G, along with the edges that are not present in G. Any edge present in G will not be present in ). Then he showed that the nodes representing a maximum clique in G would represent a maximum independent set in . He had already shown that Independent Set was NP-Complete, which meant that both Independent Set and Clique were among the most difficult problems within the class known as NP.

The puzzle begins with an undirected graph and asks users to find a maximum independent set. Users should click on the numbers in the table below the graph indicating the nodes they wish to select in their independent set (purple indicates that the node is selected, gray indicates that it is not). Once a user have a potential solution, they can press the “Check” button to see if their solution is optimal. If a user is having trouble and simply wishes to see the maximum independent set, they can press the “Solve” button. And to generate a new problem, users can press the “New Problem” button.As a result of this relationship between the Clique problem and the Independent Set problem, the Bron-Kerbosch algorithm that was used to find maximum cliques previously can also be used here.

# Topological Sort

One of the things I generally say about myself is that I love learning. I can spend hours upon hours reading papers and algorithms to better understand a topic. Some of these topics are stand alone segments that I can understand in one sitting. Sometimes, however, there is a need to read up on some preliminary work in order to fully understand a concept.

Lets say that I was interested in organizing this information into a new course. The order I present these topics is very important. Knowing which topics depend on one another allows me to use the topological sorting algorithm to determine an ordering for the topics that respects the preliminary work.

The input for the topoligical sorting algorithm is a Directed Acyclic Graph (DAG). This is a set of relationships between pairs of topics, where if topic 1 must be understood before topic 2, we would add the relationship (topic 1, topic 2) to the graph. DAGs can be visualized by a set of nodes (points) representing the topics. Relationships like the one above (topic 1, topic 2) can then represented by a directed arc originating at topic 1 and flowing in the direction of topic 2. We say that the graph is “Acyclic” because there cannot be a cycle in the topic preliminaries. This amounts to us saying that a topic cannot be a prerequisite for itself. An example of a DAG is shown in the image above.

With the topics represented as a DAG, the topologial ordering algorithm works by searching the set of nodes for the one with no arcs coming into it. This node (or these nodes is multiple are present) represents the topic that can be covered next without losing understanding of the material. Such a node is guaranteed to exist by the acyclic property of the DAG. Once the node is selected, we can remove this node as well as all arcs that originate at this node from the DAG. The algorithm then repeats the procedure of searching for a nod with no arcs coming into it. This process repeats until there are no remaining nodes from which to choose.

Now lets see how the topological sort algorithm works on the graph above. We will first need to count the in-degree (the number of arcs coming into) each node.

Node | Indegree
—————-
0 | 2
1 | 2
2 | 0
3 | 2
4 | 2
5 | 2
6 | 0
7 | 2
8 | 3

Node to be removed (i.e. node with the minimum indegree): Node 2.
Arcs connected to node 2: (2, 5), (2, 3)
Resulting Indegree Count:
Node | Indegree
—————-
0 | 2
1 | 2
3 | 1
4 | 2
5 | 1
6 | 0
7 | 2
8 | 3

Node to be removed: Node 6:
Arcs connected to node 6: (6, 1), (6, 3), (6, 4), (6, 5), (6, 7), (6, 8)
Resulting Indegree Count:
Node | Indegree
—————-
0 | 2
1 | 1
3 | 0
4 | 1
5 | 0
7 | 1
8 | 2

Node to be removed: Node 3
Arcs connected to node 3: (3, 0), (3, 8)
Resulting Indegree Count:
Node | Indegree
—————-
0 | 1
1 | 1
4 | 1
5 | 0
7 | 1
8 | 1

Node to be removed: Node 5
Arcs connected to node 5: (5, 0), (5, 8)
Resulting Indegree Count:
Node | Indegree
—————-
0 | 0
1 | 1
4 | 1
7 | 1
8 | 0

Node to be removed: Node 0:
Arcs connected to node 0: (0, 1), (0, 4)
Resulting Indegree Count:
Node | Indegree
—————-
1 | 0
4 | 0
7 | 1
8 | 0

Node to be removed: Node 1
Arcs connected to node 1: none
Resulting Indegree Count:
Node | Indegree
—————-
4 | 0
7 | 1
8 | 0

Node to be removed: Node 4
Arcs connected to node 4: none
Resulting Indegree Count:
Node | Indegree
—————-
7 | 1
8 | 0

Node to be removed: Node 8
Arcs connected to node 8: (8, 7)
Resulting Indegree Count:
Node | Indegree
—————-
7 | 0

Node to be removed: Node 7
Arcs connected to node 7: none
Resulting Indegree Count:
Node | Indegree
—————-

Since there are no nodes remaining, we have arrived at a topological ordering. Going through this iteration, we can see that we arrived at the ordering (2, 6, 3, 5, 0, 1, 4, 8, 7). There were several occasions where there were multiple nodes with indegree of 0 and we could have selected an alternative node. This would have given us a different topological ordering of the nodes, but it would still be valid.

There are more learning opportunities and an interactive demonstration of the algorithm at Topological Sort Examples at LEARNINGlover.

# The Depth-First-Search Algorithm

I remember when I was younger I used to play the game of hide-and-seek a lot. This is a game where a group of people (at least two) separate into a group of hiders and a group of seekers. The most common version of this that I’ve seen is having one person as the seeker and everyone as hiders. Initially, the seeker(s) is given a number to count towards and close their eyes while counting. The hiders then search for places to hide from the seeker. Once the seeker is finished counting, their job is to find where everyone is hiding or admitting that they cannot find all the seekers. Any seekers not found are said to have won, and seekers that are found are said to have lost.

I played this game a number of times in my childhood, but I remember playing it with a friend named Dennis in particular. Dennis had a certain way he played as seeker. While many of us would simply go to places we deemed as “likely” hiding spots in a somewhat random order, Dennis would always begin by looking in one area of the room, making sure that he had searched through every area connected to that area before going to a new area. He continued this process until he either found everybody or concluded that he had searched every spot he could think of and gave up.

It wasn’t until years later that I was able to note the similarity between Dennis’s way of playing hide-and-seek and the Depth-First-Search algorithm. The Depth-First-Search Algorithm is a way of exploring all the nodes in a graph. Similar to hide-and-seek, one could choose to do this in a number of different ways. Depth-First-Search does this by beginning at some node, looking first at one of the neighbors of that node, then looking at one of the neighbors of this new node. If the new node does not have any new neighbors, then the algorithm goes to the previous node, looks at the next neighbor of this node and continues from there. Initially all nodes are “unmarked” and the algorithm proceeds by marking nodes as being in one of three states: visited nodes are marked as “visited”; nodes that we’ve marked to visit, but have not visited yet are marked “to-visit”; and unmarked nodes that have not been marked or visited are “unvisited”.

Consider a bedroom with the following possible hiding locations: (1) Under Bed, (2) Behind Cabinet, (3) In Closet, (4) Under Clothes, (5) Behind Curtains, (6) Behind Bookshelf, and (7) Under Desk. We can visualize how the bedroom is arranged as a graph and then use a Breadth First Search algorithm to show how Brent would search the room. Consider the following bedroom arrangement, where we have replaced the names of each item by the number corresponding to that item. Node (0) corresponds to the door, which is where Dennis stands and counts while others hide.

Now consider how a Breadth First Search would be run on this graph.

The colors correspond to the order in which nodes are visited in Depth-First-Search.

The way we read this is that initially Dennis would start at node 0, which is colored in Blue.
While Dennis is at node 0, she notices that nodes 1, 5, and 6 (under bed, behind curtains, and behind bookshelf) are the nearby and have not been checked yet so she places them on the “to visit” list.
Next, Dennis will begin to visit each node on the “to visit” list, and when a node is visited, she labels it as visited. At each location, she also takes note of the other locations she can reach from this location. Below is the order of nodes Dennis visits and how he discovers new locations to visit.

 Order Visited Node Queue Adding Distance From Node 0 1 0 6,5,1 0 2 6 5,1 7,3,2 1 3 7 3,2,5,1 2 4 3 2,5,1 2 5 2 5,1 4 2 6 4 5,1 3 7 5 1 1 8 1 1

Here is a link to my Examples page that implements the Depth-First-Search Algorithm on Arbitrary Graphs.

I remember when I was younger I used to play the game of hide-and-seek a lot. This is a game where a group of people (at least two) separate into a group of hiders and a group of seekers. The most common version of this that I’ve seen is having one person as the seeker and everyone as hiders. Initially, the seeker(s) is given a number to count towards and close their eyes while counting. The hiders then search for places to hide from the seeker. Once the seeker is finished counting, their job is to find where everyone is hiding or admitting that they cannot find all the seekers. Any seekers not found are said to have won, and seekers that are found are said to have lost.

I played this game a number of times in my childhood, but I remember playing it with a friend named Brenda in particular. Brenda had a certain way she played as seeker. While many of us would simply go to places we deemed as “likely” hiding spots in a somewhat random order, Brenda would always take a survey of the room, and no matter where she began searching, she would always make note of the locations close to her starting point and make sure she was able to give them all a look before she looked at locations that were close to the points she deemed close to the starting point. She continued this process until she either found everybody or concluded that she had searched every spot she could think of and gave up.

It wasn’t until years later that I was able to note the similarity between Brenda’s way of playing hide-and-seek and the Breadth-First-Search algorithm. The Breadth-First-Search algorithm is a way of exploring all the nodes in a graph. Similarly to hide-and-seek, one could choose to do this in a number of different ways. Breadth-First-Search does this by beginning at some node, looking first at each of the neighbors of the starting node, then looking at each of the neighbors of the neighbors of the starting node, continuing this process until there are no remaining nodes to visit. Initially all nodes are “unmarked” and the algorithm proceeds by marking nodes as being in one of three stages: visited nodes are marked as “visited”; nodes that we’ve marked to visit, but have not visited yet are marked “to-visit”; and unmakred nodes that have not been marked are “unvisited”.

Consider a bedroom with the following possible hiding locations: (1) Under Bed, (2) Behind Cabinet, (3) In Closet, (4) Under Clothes, (5) Behind Curtains, (6) Behind Bookshelf, and (7) Under Desk. We can visualize how the bedroom is arranged as a graph and then use a Breadth First Search algorithm to show how Brenda would search the room. Consider the following bedroom arrangement, where we have replaced the names of each item by the number corresponding to that item. Node (0) corresponds to the door, which is where Brenda stands and counts while others hide.

Now consider how a Breadth First Search would be run on this graph.

The colors correspond to the order in which nodes are visited in Breadth-First-Search.

The way we read this is that initially Brenda would start at node 0, which is colored in Blue.
While Brenda is at node 0, she notices that nodes 1, 5, and 6 (under bed, behind curtains, and behind bookshelf) are the nearby and have not been checked yet so she places them on the “to visit” list.
Next, Brenda will begin to visit each node on the “to visit” list, and when a node is visited, she labels it as visited. At each location, she also takes note of the other locations she can reach from this location. Below is the order of nodes Brenda visits and how she discovers new locations to visit.

 Order Visited Node Queue Adding Distance From Node 0 1 0 – 1, 5, 6 0 2 1 5, 6 2, 4 1 3 5 6, 2, 4 – 1 4 6 2, 4 3, 7 1 5 2 4, 3, 7 – 2 6 4 3, 7 – 2 7 3 7 – 2 8 7 – – 2

Here is a link to my Examples page that implements the Breadth-First-Search Algorithm on Arbitrary Graphs.

# Clique Problem Puzzles

I still remember how I felt when I was first introduced to NP-Complete problems. Unlike the material I had learned up to that point, there seemed to be such mystery and intrigue and opportunity surrounding these problems. To use the example from Garey and Johnson’s book “Computers and Intractability: A Guide to the Theory of NP Completeness”, these were problems that not just one researcher found difficult, but that a number of researchers had been unable to find efficient algorithms to solve them. So what they did was show that the problems all had a special relationship with one another, and thus through this relationship if someone were to discover an algorithm to efficiently solve any one of these problems they would be able to efficiently solve all the problems in this class. This immediately got my mind working into a world where I, as a college student, would discover such an algorithm and be mentioned with the heavyweights of computer science like Lovelace, Babbage, Church, Turing, Cook, Karp and Dean.

Unfortunately I was a student so I did not have as much time to devote to this task as I would have liked. In my spare time though I would try to look at problems and see what kind of structure I found. One of my favorite problems was, The Clique Problem. This is a problem where we are given an undirected graph and seek to find a maximum subset of nodes in this graph that all have edges between them, i.e. a clique (Actually the NP-Complete version of this problem takes as input an undirected graph G and an integer k and asks if there is a clique in G of size k).

Although I now am more of the mindset that there do not exist efficient algorithms to solve NP-Complete problems, I thought it would be a nice project to see if I could re-create this feeling – both in myself and others. So I decided to write a program that generates a random undirected graph and asks users to try to find a maximum clique. To test users answers, I coded up an algorithm that works pretty well on smaller graphs, the Bron-Kerbosch Algorithm. This algorithm uses backtracking to find all maximal cliques, which then allows us to sort them by size and determine the largest.

Users should click on the numbers in the table below the canvas indicating the nodes they wish to select in their clique (purple indicates that the node is selected, gray indicates that it is not). Once they have a potential solution, they can press the “Check” button to see if their solution is optimal. If a user is having trouble and simply wishes to see the maximum clique, they can press the “Solve” button. And to generate a new problem, users can press the “New Problem” button.

So I hope users have fun with the clique problem puzzles, and who knows maybe someone will discover an algorithm that efficiently solves this problem and become world famous.

# Assembly Line Scheduling

I wanted to take a minute to help some users become more familiar with Dynamic Programming, so I decided to write a script on the Assembly Line Scheduling Problem.

To introduce the problem I want to tell you a story about a friend of mine. Keisha recently started a clothing company that uses two assembly lines to produce articles of clothing. She has separated the the process of manufacturing an item of clothing into n steps, so each assembly line is separated into n different stations, with each station performing a specific task (So for example station three’s job may be to add a right sleeve to shirts). The task of a specific station is independent of which line the station occurs on (so if station three’s job is to add a right sleeve to shirts, this will be true in both assembly line 1 and assembly line 2). Lets denote the jth station (with j = 1, 2, …, n) on line i (where i is 1 or 2) by Si, j. Although they’re doing the same jobs the time it takes the employee at station S1, j may be different from the time it takes the employee at station S2, j. We will denote the time required at station Si, j by ai, j. For each line, there is also an amount of time required for the article of clothing to enter assembly line i, ei; and an amount of time required for the article of clothing to exit assembly line i, xi.

One of the reasons that assembly lines are very productive is that stations on the same assembly line are generally in close proximity to one another, resulting in a very low cost of transferring an item from one station to the next on the same assembly line. When we have multiple lines in place, as Keisha has, there is a (possibly beneficial) cost of transferring an item from one line to another. Lets denote this cost by ti, j which represents the cost of transferring a partially completed item of clothing from line i after having gone through station Si, j (again, i is 1 or 2 and j = 1, 2, …, n).

The problem that Keisha would like solved is to determine which station to choose between lines 1 and 2 in order to minimize the total time it takes to produce an article of clothing.

Consider the following example:

Our goal is to get the clothing through the 3 states to produce a final product. What if we initially had the product take the route through station S2, 1 instead of station S1, 1? Lets assume that we make the decisions to send the article of clothing to stations S2, 2 and S2, 3 afterwards. This would result in a solution whose total cost is 3 + 8 + 4 + 6 + 3 = 24. Is this solution optimal (aka is this solution the minimum total time through the factory)? Lets consider what would happen if we had chosen station S1, 1 instead of S2, 1. The entry cost for line 1 is 1, the time required at station S1, 1 is 5 and the transfer time to go to assembly line 2 is 1. So the cost of this new solution is 1 + 5 + 1 + 4 + 6 + 3 = 20, which gives a cheaper solution.

This is called the principle of optimality (optimal substructure property) which states that in order for an overall solution to be optimal, the solution must also give the optimal solutions to every subproblem of the original problem. This problem of solving all subproblems may seem like a daunting task at first, but lets consider the example above again.

Initially, we have a new product and there are two options – either line one or line two. We will need these values in the future, so lets keep track of both choices in the form of a table.

 Station 1 cost0 e1 + a1, 1 cost1 e2 + a2, 1

After this initial step, the question becomes given the current path to station j-1, which assembly line can best serve station j? This cam be computed for each j > 1 by
cost1(j) = min{cost1(j-1) + a1, j, cost2(j-1) + t2, j-1 + a1, j}
cost2(j) = min{cost2(j-1) + a2, j, cost1(j-1) + t1, j-1 + a2, j}

As you can see, the calculation of costi(j) relies on the computation of costi(j-1). By calculating these values from station 1 to to station n, we are able to simply look up the values in the table instead of having to recalculate these values.

These give optimal solutions to each of the subproblems. We repeat this same step for all stages j = 2, …, n then we arrive at the final step were we finish the job. Lets define total_cost to indicate the cost of the optimal solution.
total_cost = min{cost1(n) + x1, cost2(n) + x2}

We’d like to see which value minimizes total_cost. Then we can trace back to find the values that minimized cost1 or cost2 at each step depending on which assembly line was chosen. The following algorithm does just this, and stores the assembly line chosen at each state in the variable line.

For the above example, the table would be calculated as follows:

 Station 1 Station 2 Station 3 Total Cost cost1 6 13 18 21 cost2 11 11 17 20

We can reconstruct the optimal path through assembly lines by seeing that we finish by going through station S2, 3.
We arrive at station S2, 3 by going through the assembly line station S2, 2.
We arrive at station S2, 2 by going through the assembly line station S1, 1.

This is precisely the path that is highlighted in the image above.

The algorithm to construct these paths and compute the total_cost for such problems is given below.

Algorithm FastestWay(a, t, e, x, m)
cost1 [<-] e1 + a1, 1
cost2 [<-] e2 + a2, 1
for (j [<-] 2 to n) if (cost1(j-1) + a1, j [<=] cost2(j-1) + t2, j-1 + a1, j
cost1(j) [<-] cost1(j-1) + a1, j
line1(j) [<-] 1 else cost1(j) [<-] cost2(j-1) + t2, j-1 + a1, j
line1(j) [<-] 2 if (cost2(j-1) + a2, j [<=] cost1(j-1) + t1, j-1 + a2, j
cost2(j) [<-] cost2(j-1) + a2, j
line2(j) [<-] 1 else cost2(j) [<-] cost1(j-1) + t1, j-1 + a2, j
line2(j) [<-] 2 if (cost1(n) + x1 [<=] cost2(n) + x2)
total_cost = cost1(n) + x1
final_line = 1
else
total_cost = cost2(n) + x2
final_line = 2

Note: I used Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein to help with this post.

# The Assignment Problem

Suppose you are the owner of a company and need to delegate tasks to your employees. You’ve generated a table that tells how long (in minutes) you think it would take each person to accomplish each individual task (called Jobs). Your goal is to find an assignment of people to jobs that minimizes the total amount of time it will take to complete all jobs. The requirements are that each job must be completed by only one person, and each person can complete only one job.

We can think of the employees at the job as our supply and the tasks as the demand. In order for this problem to have a feasible solution, we must have enough people (supply) to complete the number of jobs (demand). Because of this, our examples will all include situations where there are exactly the same number of people as jobs.

To solve this problem, we must first generate an initial assignment and see how good this assignment is. There are several ways of generating an initial solution, but two that I wanted to focus on are the “NorthWest Corner Rule” and the “Minimum Matrix Method”.

1. The Northwest Corner Rule considers the matrix and repeatedly assigns the top remaining row to the left-most remaining column. If we think of the cost matrix as a being like a map then “top” becomes similar to “north” and left-most becomes similar to “west”, hence the derivation of the name. In assignment problems, this will result in the main diagonal being selected.
2. The Minimum Matrix Method is an iterative method that searches the matrix for the minimum cell in the matrix and assigns that person to the associated job and removes them from consideration and repeats itself until all people have been assigned to jobs.

Once we have formulated an initial feasible solution, we need to check it for optimality. To do this, we use the Network Simplex Method, where we build a basis based on this initial solution. When we consider this problem as a network flow problem, a basis for the problem is a spanning tree (one less than twice the number of nodes in the graph that does not have any cycles) of the network. Because the assignment solution only contains one edge for every two nodes in the graph, we need to add a number of edges to the basis that contains no flow (which makes the solution degenerate) to form this spanning tree.

Once a spanning tree is formulated, we can solve for the dual variables by arbitrarily setting one node’s dual value to zero and solving for the remaining dual variables under the requirement that all arcs in the basis (spanning tree) must satisfy the equation uk + vi = cki for each person k and job i.

When we have dual variables, we can check to see if this solution is optimal by checking to see if all the other constraints are violated. This means that for every person k and every job i, we must have uk + vi cki (notice that this is a more relaxed version of what we had when we were solving for the dual variables themselves).

If a constraint is found to be violated, then we need to add the associated edge to the basis and remove an edge on the cycle that is formulated as a result, which generates a new solution.

So check out The Assignment Problem Script and let me know what you think.