Tag Archives: search

Tarjan’s Strongly Connected Components Algorithm

I just added a program that finds the strongly connected components of a graph using Tarjan’s Algorithm.

A strongly connected component of a graph is a subgraph S of G where every pair of nodes, u and v in S there is a path from u to v and a path from v to u.

To find these strongly connected components we implement Tarjan’s algorithm. The idea behind Tarjan’s algorithm is to begin by running a depth first search from an arbitrary node in the graph, labeling nodes reachable from this start node in the order they are reached. The algorithm is also interested in the “oldest” node that could be reached by a given node. This is indicated by the keeping track of the lowest label that can be reached from that node. We will call the first property label(v) and the second lowlink(v).

When the algorithm starts label(v) is the same as lowlink(v) whenever a node is discovered. As the algorithm is executed, the DFS is being run on each discovered node, which in turn updates the lowlink(v) property telling of (older) nodes that can be reached. If an older node can be reached, then we update lowlink. If we reach a node that cannot connect to any older nodes after the DFS call, i.e if label(v) is the same a lowlink(v), then this means that this node does not have a path to any node with a lower label. So this node will be the first node of a new strongly connected component.

Feel free to check it out an let me know what you think in the comments below.

The Depth-First-Search Algorithm

I remember when I was younger I used to play the game of hide-and-seek a lot. This is a game where a group of people (at least two) separate into a group of hiders and a group of seekers. The most common version of this that I’ve seen is having one person as the seeker and everyone as hiders. Initially, the seeker(s) is given a number to count towards and close their eyes while counting. The hiders then search for places to hide from the seeker. Once the seeker is finished counting, their job is to find where everyone is hiding or admitting that they cannot find all the seekers. Any seekers not found are said to have won, and seekers that are found are said to have lost.

I played this game a number of times in my childhood, but I remember playing it with a friend named Dennis in particular. Dennis had a certain way he played as seeker. While many of us would simply go to places we deemed as “likely” hiding spots in a somewhat random order, Dennis would always begin by looking in one area of the room, making sure that he had searched through every area connected to that area before going to a new area. He continued this process until he either found everybody or concluded that he had searched every spot he could think of and gave up.

It wasn’t until years later that I was able to note the similarity between Dennis’s way of playing hide-and-seek and the Depth-First-Search algorithm. The Depth-First-Search Algorithm is a way of exploring all the nodes in a graph. Similar to hide-and-seek, one could choose to do this in a number of different ways. Depth-First-Search does this by beginning at some node, looking first at one of the neighbors of that node, then looking at one of the neighbors of this new node. If the new node does not have any new neighbors, then the algorithm goes to the previous node, looks at the next neighbor of this node and continues from there. Initially all nodes are “unmarked” and the algorithm proceeds by marking nodes as being in one of three states: visited nodes are marked as “visited”; nodes that we’ve marked to visit, but have not visited yet are marked “to-visit”; and unmarked nodes that have not been marked or visited are “unvisited”.

Consider a bedroom with the following possible hiding locations: (1) Under Bed, (2) Behind Cabinet, (3) In Closet, (4) Under Clothes, (5) Behind Curtains, (6) Behind Bookshelf, and (7) Under Desk. We can visualize how the bedroom is arranged as a graph and then use a Breadth First Search algorithm to show how Brent would search the room. Consider the following bedroom arrangement, where we have replaced the names of each item by the number corresponding to that item. Node (0) corresponds to the door, which is where Dennis stands and counts while others hide.

Bedroom Items as a Graph

Now consider how a Breadth First Search would be run on this graph.

Bedroom Items as a Graph Colored by DFS

The colors correspond to the order in which nodes are visited in Depth-First-Search.

The way we read this is that initially Dennis would start at node 0, which is colored in Blue.
While Dennis is at node 0, she notices that nodes 1, 5, and 6 (under bed, behind curtains, and behind bookshelf) are the nearby and have not been checked yet so she places them on the “to visit” list.
Next, Dennis will begin to visit each node on the “to visit” list, and when a node is visited, she labels it as visited. At each location, she also takes note of the other locations she can reach from this location. Below is the order of nodes Dennis visits and how he discovers new locations to visit.

Order VisitedNodeQueueAddingDistance From Node 0
106,5,10
265,17,3,21
373,2,5,12
432,5,12
525,142
645,13
7511
811

Here is a link to my Examples page that implements the Depth-First-Search Algorithm on Arbitrary Graphs.

The Breadth-First-Search Algorithm

I remember when I was younger I used to play the game of hide-and-seek a lot. This is a game where a group of people (at least two) separate into a group of hiders and a group of seekers. The most common version of this that I’ve seen is having one person as the seeker and everyone as hiders. Initially, the seeker(s) is given a number to count towards and close their eyes while counting. The hiders then search for places to hide from the seeker. Once the seeker is finished counting, their job is to find where everyone is hiding or admitting that they cannot find all the seekers. Any seekers not found are said to have won, and seekers that are found are said to have lost.

I played this game a number of times in my childhood, but I remember playing it with a friend named Brenda in particular. Brenda had a certain way she played as seeker. While many of us would simply go to places we deemed as “likely” hiding spots in a somewhat random order, Brenda would always take a survey of the room, and no matter where she began searching, she would always make note of the locations close to her starting point and make sure she was able to give them all a look before she looked at locations that were close to the points she deemed close to the starting point. She continued this process until she either found everybody or concluded that she had searched every spot she could think of and gave up.

It wasn’t until years later that I was able to note the similarity between Brenda’s way of playing hide-and-seek and the Breadth-First-Search algorithm. The Breadth-First-Search algorithm is a way of exploring all the nodes in a graph. Similarly to hide-and-seek, one could choose to do this in a number of different ways. Breadth-First-Search does this by beginning at some node, looking first at each of the neighbors of the starting node, then looking at each of the neighbors of the neighbors of the starting node, continuing this process until there are no remaining nodes to visit. Initially all nodes are “unmarked” and the algorithm proceeds by marking nodes as being in one of three stages: visited nodes are marked as “visited”; nodes that we’ve marked to visit, but have not visited yet are marked “to-visit”; and unmakred nodes that have not been marked are “unvisited”.

Consider a bedroom with the following possible hiding locations: (1) Under Bed, (2) Behind Cabinet, (3) In Closet, (4) Under Clothes, (5) Behind Curtains, (6) Behind Bookshelf, and (7) Under Desk. We can visualize how the bedroom is arranged as a graph and then use a Breadth First Search algorithm to show how Brenda would search the room. Consider the following bedroom arrangement, where we have replaced the names of each item by the number corresponding to that item. Node (0) corresponds to the door, which is where Brenda stands and counts while others hide.

Bedroom Items as a Graph

Now consider how a Breadth First Search would be run on this graph.

Bedroom Items as a Graph

The colors correspond to the order in which nodes are visited in Breadth-First-Search.

The way we read this is that initially Brenda would start at node 0, which is colored in Blue.
While Brenda is at node 0, she notices that nodes 1, 5, and 6 (under bed, behind curtains, and behind bookshelf) are the nearby and have not been checked yet so she places them on the “to visit” list.
Next, Brenda will begin to visit each node on the “to visit” list, and when a node is visited, she labels it as visited. At each location, she also takes note of the other locations she can reach from this location. Below is the order of nodes Brenda visits and how she discovers new locations to visit.

Order VisitedNodeQueueAdding Distance From Node 0
101, 5, 60
215, 62, 41
356, 2, 41
462, 43, 71
524, 3, 72
643, 72
7372
872

Here is a link to my Examples page that implements the Breadth-First-Search Algorithm on Arbitrary Graphs.

The A* Algorithm

As a child I remember traveling on road trips, sitting in the back of a car trying to do my best to keep myself busy for what would occupy the next six to ten hours of my life. One of the things I grew to like were the simple maze books that were sold on the magazine racks at some of the gas stations where we’d stop for food. There are two basic strategies I employed for solving these mazes: For simpler mazes, I could generally just take a look at the overall maze structure, decide upon a path through the maze, then write a path without any mistakes. For more complex mazes though, I would generally begin a route that looks the most promising. If that route reaches a point where I can see that it will be impossible to finish, then I’d go back to where the decision was made, exclude the route I had just tried, and select the “most promising” remaining route. I’d continue this process until I had completed the maze, or until it became simple enough for me to solve the maze using only my memory.

The A* Algorithm works in a similar manner to the second approach I just described. We begin at a starting point, and consider where to move next from that starting point. The set of possible options for this next move is determined by the neighbor function for a given cell. For each neighbor the algorithm estimates the length of the route through that cell by calculating the sum of two values, f(n) = g(n) + h(n), where

g(n) is the (known) cost to travel from the starting point to the cell n.
h(n) is the (approximate) cost to travel from the cell n to the final cell.

The sum g(n) + h(n) allows us to approximate the total cost of a route through the cell n.

The A* algorithm begins at the starting position of the maze. There are two sets we will be considering throughout the process of determining the optimal route, called the closedSet and the openSet. The elements of closedSet are the nodes whose total distance from the starting position has been calculated, whereas the elements of openSet represents nodes whose total distance is still under consideration. There is also a map called prev which is used to reconstruct the path. Below is how the algorithm operates:

A* Algorithm Pseudocode
closedSet is the empty set
openSet = {start}
prev is the empty set

While there are still elements in openSet,
     Find the element c* in openSet with the lowest value f(c).
     If c* is the target position
          Reconstruct the path.
     Else If c* is not the target position,
          Remove c* from openSet
          Add c* to closedSet
          For each neighbor n of c* that is not in closedSet,
               Calculate a temporary g value, temp_g(n) = g(c*) + dist(c*, n).
               If n is not in openSet, or if n is in openSet and temp_g(n) < g(n),                     Set prev(n) = c*                     Set g(n) = temp_g(n)                     Set f(n) = g(n) + h(n).                     if n is not in openSet                          add n to openSet.      End If End While End AlgorithmAn important question becomes what makes a good heuristic function to approximate the distance to the goal. This can lead to an in depth discussion based on the word "good", but the necessary condition for any heuristic is that it NEVER over-estimates the cost of the path from the cell to the goal. Some examples of possible heuristics for mazes are the Euclidean Distance (the square root of the sum of the squares of the horizontal and vertical differences in distances, i.e. dE(x, y) = (i(xi – yi)2) and TaxiCab Distance (the sum of the differences in the horizontal and vertical dimensions, i.e. dT(x, y) = i|xi – yi|). Both of these are feasible metrics for heuristics on a maze. Other herusitics, like h(n) = 0 for all n are possible, but doing this would make the algorithm treat all cells equally and ignore the heuristic part of the A* algorithm, turning it into Dijkstra’s algorithm.

I’ve written a script that generates random mazes and uses the A* algorithm to find the optimal path through this maze. For this script, I used the taxicab distance heuristic.

Check it out and let me know what you think.

Permutation Problems

I love to play with puzzles. When I was in grade school I would spend hours at a time figuring out ways to solve from things like Tetris, Mindsweeper, Solitare, and Freecell. Later I was introduced to puzzles involving numbers like Sudoku and Nonograms.

These puzzles are often interesting in part because there is generally a very large way that things can be arranged, but only a few of these arrangements are correct. Generally a person solving a puzzle will figure out certain things that must be true or cannot be true, which helps in solving the puzzle and reducing the number of possible cases. Initially, though, we are often left with a situation where we have a new puzzle and our only method is to keep trying every possible solution until we start to notice a pattern (or reach a solution).

For example, consider a Sudoku puzzle. We are given a partially filled in grid and our job is to fill in the remaining cells with the rules that every row, column and marked subgrid must have the numbers 1 – 9 exactly once. One initial attempt at solving such a puzzle could be to attempt to permute the string 1, 2, 3, 4, 5, 6, 7, 8, 9 until we find a solution that fits the first row, then do the same with the second row, and so on and so forth.

One immediate question is how many ways are there to permute the numbers 1, …, 9? We can answer this by realizing that each permutation is a new string. So for each string that we construct, we have 9 choices for the first element in the string. Then once that element has been chosen, we are not allowed for that element to appear anywhere else. So there are only 8 possible choices for what can go in the second string. Continuing this process, we see that the number of possible permutations we can construct from the string 1, …, 9 is

9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 362880

This is a large number of possible strings to generate just to get one row of a Sudoku, so hopefully you’ll be able to notice the pattern before going through this whole set (because once you’ve generated the first row you still have to do the other 8 rows).

Nonetheless because there is often great value that can be gained by knowing how to permute through all possible solutions, I have written three functions that help with this process: Next_Permutation, Previous_Permutation, and Random_Permutation.

Before I give these algorithms, I want to highlight two notations on ordering string. A string (a1, a2, …, an) is said to be in lexicographical order (or alphabetical order) if for each i [in] 1, …, n-1, ai [<=] ai+1. Likewise, a string is said to be in reverse lexicographical order if for each i [in] 1, …, n-1, ai [>=] ai+1.

Next Permutation

If the given string is not in reverse lexicographic order, then there exists two elements j1 and j2 such that j1 [<] j2 and aj1 [<] aj2.
1. The Next_Permutation algorithm first searches for the largest element j1 such that aj1 [<] aj1 + 1. Since we said the string is not in reverse lexicographic order, this j1 must exist.
2. Once this j1 is found, we search for the smallest element aj2 such that aj2 [>] aj1. Again, since we know that this is true for j1 + 1, we know that such a j2 must exist.
3. We swap the elements aj1 and aj2.
4. The elements after j1 + 1 are then placed in lexicographic order (i.e. in order such that ai [>=] ai+1

If the given string is in reverse lexicographic order, then we simply reverse the string.

Previous Permutation

If the given string is not in lexicographic order, then there exists two elements j1 and j2 such that j1 [<] j2 and aj1 [>] aj2.
1. The Previous_Permutation algorithm first searches for the largest element j1 such that aj1 [>] aj1 + 1. Since we said the string is not in reverse lexicographic order, this j1 must exist.
2. Once this j1 is found, we search for the smallest element aj2 such that aj1 [>] aj2. Again, since we know that this is true for j1 + 1, we know that such a j2 must exist.
3. We swap the elements aj1 and aj2.
4. The elements after j1 + 1 are then placed in reverse lexicographic order (i.e. in order such that ai [<=] ai+1

If the given string is in lexicographic order, then we simply reverse the string.

Random Permutation

This generates a random string permutation.

This script can be seen here, set to work on permutations of colors.