Blog

The Corral Puzzle
I just finished a script on The Corral Puzzle.

I really enjoy puzzles. Even more than simply solving puzzles, I like understanding the thought process behind the puzzle process and trying to generate similar puzzles that are difficult in their own right yet still solvable. So far I’ve written works on The Sudoku Puzzle, The Nonogram Puzzle, and The Shade the Cells Puzzle. Today, I want to write about the Corral Puzzle.

The story behind how I discovered this puzzle is twofold. Initially, a coworker brought a Corral puzzle to my desk to see if I could solve it. It took me a minute, but I was able to get to the correct solution. We did a few more afterwards and I wrote some scripts to try to create new instances of these puzzles, but for a while it remained in the “unfinished work” category. Then sometime earlier this year, I purchased a book “Brain Workout: Math & Logic Puzzles” by Dave Tuller and Michael Rios. I was astonished to see that the first puzzle in this book was the Corral puzzle (which is where I was able to learn the name of the puzzles, as before they were just some things presented to me by a friend). The book has about 20 challenging puzzles (all of which I have not solved), but I was left again wondering how they came up with the puzzles. Before going further, I’ll give the rules of the puzzles and an example of a problem with the corresponding solution.

We are given a square grid (say 4 rows and 4 columns, or 5 rows and 5 columns, or in general n rows and n columns). Inside the grid some of the cells have a number inside of it. The object of the puzzle is for the user to draw a bag (also known as a corral) around the numbers inside the grid. The limitation is that the numbers tell how many neighboring cells can be “seen” from the given cell, looking only horizontally and vertically in both directions without reaching an endpoint of the bag. There are no restrictions on the shape of the bag except that it actually represents a closed loop inside the grid.

Consider the following example:

In this example, we are told the following hints:
1. Cell (1, 2) (row 1, column 2) can see 6 of its neighbors.
2. Cell (3, 1) can see 3 of its neighbors.
3. Cell (4, 2) can see 5 of its neighbors.
4. Cell (4, 3) can see 5 of its neighbors.
From these hints, we can reach the following solution:

Our corral in the solution would be to place the contiguous block of blue (turquoise) cells into a bag. We can check that the solution satisfies the assumptions as follows:
1. Cell (1, 2) can see cells (1, 1), (1, 2), (1, 3), (2, 2), (3, 2), and (4, 2), which is 6 cells.
2. Cell (3, 1) can see cells (3, 1), (3, 2), and (3, 3), which is 3 cells.
3. Cell (4, 2) can see cells (1, 2), (2, 2), (3, 2), (4, 2), and (4, 3), which is 5 cells.
4. Cell (4, 3) can see cells (1, 3), (2, 3), (3, 3), (4, 3), and (4, 2), which is 5 cells.
So we can see that this solution satisfies our assumptions.

Right now, I’m able to work with this by using it as an instance of The Set Cover Problem. The set that we would like to cover is the set of cells inside the Corral. The possible subsets are the cells in the corral that can be viewed by each cell inside the bag. Although Set Cover is an NP-Complete problem, we can still find feasible solutions using a number of algorithms. Here, I generate a feasible solution using the greedy approach.

There is still a problem with ambiguity. Sometimes the initial puzzle generated will allow for multiple Corrals to fit the original description. This is true with the example given as the following is also an optimal solution.

There is a thin line between revealing enough cells to ensure a unique solution and revealing too many cells such that the problem becomes trivial to solve. I’m still working on that and I may update the script in the future if I decide that the ambiguity is too much to live with.

So, enough talk. Go and check out my script on The Corral Puzzle and let me know what you think.
December 14, 2013
Interactive Tutorial of Kruskal’s Algorithm

I just finished working on another implementation of Kruskal’s algorithm. This version depends more on user input to generate the minimum spanning path tree.

Check it out.

November 10, 2013
Assembly Line Scheduling

I wanted to take a minute to help some users become more familiar with Dynamic Programming, so I decided to write a script on the Assembly Line Scheduling Problem.

To introduce the problem I want to tell you a story about a friend of mine. Keisha recently started a clothing company that uses two assembly lines to produce articles of clothing. She has separated the the process of manufacturing an item of clothing into n steps, so each assembly line is separated into n different stations, with each station performing a specific task (So for example station three’s job may be to add a right sleeve to shirts). The task of a specific station is independent of which line the station occurs on (so if station three’s job is to add a right sleeve to shirts, this will be true in both assembly line 1 and assembly line 2). Lets denote the j^th station (with j = 1, 2, …, n) on line i (where i is 1 or 2) by S_{i, j}. Although they’re doing the same jobs the time it takes the employee at station S_{1, j} may be different from the time it takes the employee at station S_{2, j}. We will denote the time required at station S_{i, j} by a_{i, j}. For each line, there is also an amount of time required for the article of clothing to enter assembly line i, e_i; and an amount of time required for the article of clothing to exit assembly line i, x_i.

One of the reasons that assembly lines are very productive is that stations on the same assembly line are generally in close proximity to one another, resulting in a very low cost of transferring an item from one station to the next on the same assembly line. When we have multiple lines in place, as Keisha has, there is a (possibly beneficial) cost of transferring an item from one line to another. Lets denote this cost by t_{i, j} which represents the cost of transferring a partially completed item of clothing from line i after having gone through station S_{i, j} (again, i is 1 or 2 and j = 1, 2, …, n).

The problem that Keisha would like solved is to determine which station to choose between lines 1 and 2 in order to minimize the total time it takes to produce an article of clothing.

Consider the following example:

Our goal is to get the clothing through the 3 states to produce a final product. What if we initially had the product take the route through station S_{2, 1} instead of station S_{1, 1}? Lets assume that we make the decisions to send the article of clothing to stations S_{2, 2} and S_{2, 3} afterwards. This would result in a solution whose total cost is 3 + 8 + 4 + 6 + 3 = 24. Is this solution optimal (aka is this solution the minimum total time through the factory)? Lets consider what would happen if we had chosen station S_{1, 1} instead of S_{2, 1}. The entry cost for line 1 is 1, the time required at station S_{1, 1} is 5 and the transfer time to go to assembly line 2 is 1. So the cost of this new solution is 1 + 5 + 1 + 4 + 6 + 3 = 20, which gives a cheaper solution.

This is called the principle of optimality (optimal substructure property) which states that in order for an overall solution to be optimal, the solution must also give the optimal solutions to every subproblem of the original problem. This problem of solving all subproblems may seem like a daunting task at first, but lets consider the example above again.

Initially, we have a new product and there are two options – either line one or line two. We will need these values in the future, so lets keep track of both choices in the form of a table.

Station 1

cost₀
e₁ + a_{1, 1}

cost₁
e₂ + a_{2, 1}

After this initial step, the question becomes given the current path to station j-1, which assembly line can best serve station j? This cam be computed for each j > 1 by
cost₁(j) = min{cost₁(j-1) + a_{1, j}, cost₂(j-1) + t_{2, j-1} + a_{1, j}}
cost₂(j) = min{cost₂(j-1) + a_{2, j}, cost₁(j-1) + t_{1, j-1} + a_{2, j}}

As you can see, the calculation of cost_i(j) relies on the computation of cost_i(j-1). By calculating these values from station 1 to to station n, we are able to simply look up the values in the table instead of having to recalculate these values.

These give optimal solutions to each of the subproblems. We repeat this same step for all stages j = 2, …, n then we arrive at the final step were we finish the job. Lets define total_cost to indicate the cost of the optimal solution.
total_cost = min{cost₁(n) + x₁, cost₂(n) + x₂}

We’d like to see which value minimizes total_cost. Then we can trace back to find the values that minimized cost₁ or cost₂ at each step depending on which assembly line was chosen. The following algorithm does just this, and stores the assembly line chosen at each state in the variable line.

For the above example, the table would be calculated as follows:

Station 1 Station 2 Station 3 Total Cost

cost₁ 6 13 18 21

cost₂ 11 11 17 20

We can reconstruct the optimal path through assembly lines by seeing that we finish by going through station S_{2, 3}.
We arrive at station S_{2, 3} by going through the assembly line station S_{2, 2}.
We arrive at station S_{2, 2} by going through the assembly line station S_{1, 1}.

This is precisely the path that is highlighted in the image above.

The algorithm to construct these paths and compute the total_cost for such problems is given below.

Algorithm FastestWay(a, t, e, x, m)
cost₁ [<-] e₁ + a_{1, 1}
cost₂ [<-] e₂ + a_{2, 1}
for (j [<-] 2 to n) if (cost₁(j-1) + a_{1, j} [<=] cost₂(j-1) + t_{2, j-1} + a_{1, j}
cost₁(j) [<-] cost₁(j-1) + a_{1, j}
line₁(j) [<-] 1 else cost₁(j) [<-] cost₂(j-1) + t_{2, j-1} + a_{1, j}
line₁(j) [<-] 2 if (cost₂(j-1) + a_{2, j} [<=] cost₁(j-1) + t_{1, j-1} + a_{2, j}
cost₂(j) [<-] cost₂(j-1) + a_{2, j}
line₂(j) [<-] 1 else cost₂(j) [<-] cost₁(j-1) + t_{1, j-1} + a_{2, j}
line₂(j) [<-] 2 if (cost₁(n) + x₁ [<=] cost₂(n) + x₂)
total_cost = cost₁(n) + x₁
final_line = 1
else
total_cost = cost₂(n) + x₂
final_line = 2

For more information please refer to My Assembly Line Scheduling Examples Page.

Note: I used Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein to help with this post.

November 10, 2013
Interactive Tutorial of Prim’s Algorithm

I just finished working on another implementation of Prim’s algorithm. This version depends more on user input to generate the minimum spanning path tree.

Check it out.

October 23, 2013
Interactive Tutorial of Dijkstra’s Algorithm

I just finished working on another implementation of Dijkstra’s algorithm. This version depends more on user input to generate the shortest path tree.

Check it out.

October 20, 2013
A JavaScript Implentation of MapReduce’s WordCount

You can view a javascript implementation of the WordCound Program in Mapreduce at Javascript Implementation of Mapreduce WordCount

One of the big things in the world of Data Science and Cloud Computing is the map-reduce implementation of various algorithms. This is not always a straightforward procedure and so learning to think in terms of map-reduce implementations can be a challenging conversion from thinking in a functional programming frame of mind. In light of this I thought it would be convenient to try to help users “visualize” this concept. This is a challenging task because there are many concepts of cloud computing that I am unable to provide in this environment. However, just as many of the books on MapReduce provide pseudo-code on various implementations of algorithms in a Map-Reduce environment, I will attempt to show how data flows from the input to the mappers to the shuffle and sort phase to the reducers and finally to generate the output. I leave the users the task of actually putting these into the context of a Java MapReduce environment.

I want to first speak about the concept of (key, value) pairs which is a very important in MapReduce programming. I will speak about this in the context of a WordCount program. The purpose of a WordCount program is to count the number of occurrences of each word in a given file. First data is input to the mapper in (key, value) pairs. For our example, the key will be the line number of input (so each line of input will go to a different mapper) and the value will be the text present on that line. Once the mapper has the input, it will perform some operation on it and output data again in (key, value) pairs. In the WordCount example, the mappers will simply output each word that occurs as a new key on that line and the integer “1” as the associated value (note that a single mapper can output multiple (key, value) pairs).

One of the main things to understand in a MapReduce is that there are a number of Mappers running on a given input file and these Mappers cannot interact with one another. Suppose we have two different mappers, lets call them Mapper₁ and Mapper₂ that are each working on two different lines of input from a file. If both lines of input have the word “apple”, there is no way for Mapper₂ to know that Mapper₁‘s line of input also has this word. In this setting that’s perfectly fine because the Shuffle and Sort phase is where all the (key, value) pairs that were output by the mappers, compares the keys to one another and if they are equal to one another combines their respective values into a list of values. Unequal keys are sorted.

So if both Mapper₁ and Mapper₂ contained the word “apple” in their line of text, then the (key, value) pair (apple, 1) will occur twice. So the Shuffle and Sort phase will notice this and output the (key, value) pair (apple, {1, 1}).

Each reducer is then given a key and a list of values that were output by the mappers. The goal will be to perform some operation and again output data in (key, value) pairs. In the WordCount example, we will use what is known as the sumReducer. It gets this name because its job is simply to sum the values in the list of values and output the (key, value) pair that is the original key and this sum of values.

You can view a javascript implementation of this at Javascript Implementation of Mapreduce WordCount

October 13, 2013
The Assignment Problem
I just finished a script that generates instances of the assignment problem and solves them step by step. You can check it out here.

Suppose you are the owner of a company and need to delegate tasks to your employees. You’ve generated a table that tells how long (in minutes) you think it would take each person to accomplish each individual task (called Jobs). Your goal is to find an assignment of people to jobs that minimizes the total amount of time it will take to complete all jobs. The requirements are that each job must be completed by only one person, and each person can complete only one job.

We can think of the employees at the job as our supply and the tasks as the demand. In order for this problem to have a feasible solution, we must have enough people (supply) to complete the number of jobs (demand). Because of this, our examples will all include situations where there are exactly the same number of people as jobs.

To solve this problem, we must first generate an initial assignment and see how good this assignment is. There are several ways of generating an initial solution, but two that I wanted to focus on are the “NorthWest Corner Rule” and the “Minimum Matrix Method”.
1. The Northwest Corner Rule considers the matrix and repeatedly assigns the top remaining row to the left-most remaining column. If we think of the cost matrix as a being like a map then “top” becomes similar to “north” and left-most becomes similar to “west”, hence the derivation of the name. In assignment problems, this will result in the main diagonal being selected.
2. The Minimum Matrix Method is an iterative method that searches the matrix for the minimum cell in the matrix and assigns that person to the associated job and removes them from consideration and repeats itself until all people have been assigned to jobs.
Once we have formulated an initial feasible solution, we need to check it for optimality. To do this, we use the Network Simplex Method, where we build a basis based on this initial solution. When we consider this problem as a network flow problem, a basis for the problem is a spanning tree (one less than twice the number of nodes in the graph that does not have any cycles) of the network. Because the assignment solution only contains one edge for every two nodes in the graph, we need to add a number of edges to the basis that contains no flow (which makes the solution degenerate) to form this spanning tree.

Once a spanning tree is formulated, we can solve for the dual variables by arbitrarily setting one node’s dual value to zero and solving for the remaining dual variables under the requirement that all arcs in the basis (spanning tree) must satisfy the equation u_k + v_i = c_ki for each person k and job i.

When we have dual variables, we can check to see if this solution is optimal by checking to see if all the other constraints are violated. This means that for every person k and every job i, we must have u_k + v_i c_ki (notice that this is a more relaxed version of what we had when we were solving for the dual variables themselves).

If a constraint is found to be violated, then we need to add the associated edge to the basis and remove an edge on the cycle that is formulated as a result, which generates a new solution.

So check out The Assignment Problem Script and let me know what you think.
October 12, 2013
Polynomial Arithmetic

With students beginning to attend classes across the nation, I wanted to focus the site towards some of the things they’re going to be addressing. This latest page publicize some scripts that I wrote to help with polynomial arithmetic. Originally I wrote these as homework exercises for a class in programming, but I have found them useful ever since – both in teaching mathematics classes like college algebra, which spends a lot of attention on polynomials, and in my research life. Its funny (and sad) the number of simple errors that a person (mathematician or not) can make when performing simple arithmetic, so I found it very useful to have a calculator more advanced than the simple scientific calculators that are so easily available.

I’m not going to spend a lot of time discussing the importance of polynomials, or trying to justify their need. I will bring up some problems that I’d like to address in the future, that deal with polynomials. The first is finding the roots of the characteristic polynomial of a matrix. This is useful in research because these roots are the eigenvalues of the matrix and can give many properties of the matrix. There are also some data analysis tools like Singular Value Decomposition and Principal Component Analysis where I will probably build out from this initial set of instances.

The user interface for the scripts I’ve written generate two polynomials and ask the user what is to be done with those polynomials. The options are to add the two, subtract polynomial 2 from polynomial 1, multiply the two, divide polynomial 1 by polynomial 2, and divide polynomial 2 by polynomial 1. There is also the option to make the calculations more of a tutorial by showing the steps along the way. Users who want new problems can generate a new first or second polynomial and clear work.

For addition and subtraction, the program works by first ensuring that both polynomials have the same degree. This can be achieved by adding terms with zero coefficient to the lower degree polynomial. Once this has been accomplished, we simply add the terms that have the same exponent.

For multiplication, the program first builds a matrix A, where the element a_{i, i+j} on row i and column i+j of the matrix A is achieved by multiplying the i^th term of the first polynomial by the j^th term of the second polynomial. If an was not given a value in the matrix, then we put a value of zero in that cell. Once this matrix is formed, we can sum the columns of the matrix to arrive at the final answer.

The division of two polynomials works first by dividing the first term of the numerator by the first term of the denominator. This answer is then multiplied by the denominator and subtracted from the numerator. Now, the first term in the numerator should cancel and we use the result as the numerator going froward. This process is repeated as long as the numerator’s degree is still equal to or greater than the denominator’s degree.

Check out the latest page on polynomial arithmetic and let me know what you think.

August 29, 2013
Hidden Markov Models: The Baum-Welch Algorithm
Suppose you are at a table at a casino and notice that things don’t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of ‘loaded’ dice that the casino is using and how these dice are loaded. To accomplish this you set up a number of Hidden Markov Models, where the loaded die are the latent (hidden) variables, and would like to determine which of these, if any is more likely to be using.

First lets go over a few things.

We will call each roll of the dice an observation. The observations will be stored in variables o₁, o₂, …, o_T, where T is the number of total observations.

To generate a hidden Markov Model (HMM) we need to determine 5 parameters:
- The N states of the model, defined by S = {S₁, …, S_N}
- The M possible output symbols, defined by = {₁, ₂, …, _M}
- The State transition probability distribution A = {a_ij}, where a_ij is the probability that the state at time t+1 is S_j, given that the state at time t is S_i.
- The Observation symbol probability distribution B = {b_j(_k)} where b_j(_k) is the probability that the symbol _k is emitted in state S_j.
- The initial state distribution = {_i}, where _i is the probability that the model is in state S_i at time t = 0.
  The HMMs we’ve generated are based on two questions. For each question, you have provided 3 different answers which leads to 9 possible HMMs. Each of these models has its corresponding state transition and emission distributions.
  - How often does the casino change dice?
    
    0) Dealer Repeatedly Uses Same Dice
    1) Dealer Uniformly Changes Die
    2) Dealer Rarely Uses Same Dice
  - Which sides on the loaded dice are more likely?
    
    0) Larger Numbers Are More Likely
    1) Numbers Are Randomly Likely
    2) Smaller Numbers Are More Likely
  How often does the casino change dice?
  
  Which sides on
  the loaded dice
  are more likely?
  
  (0, 0) (0, 1) (0, 2)
  
  (1, 0) (1, 1) (1, 2)
  
  (2, 0) (2, 1) (2, 2)
  
  One of the interesting problems associated with Hidden Markov Models is called the Learning Problem, which asks the question “How can I improve a HMM so that it would be more likely to have generated the sequence O = o₁, o₂, …, o_T?
  
  The Baum-Welch algorithm answers this question using an Expectation-Maximization approach. It creates two auxiliary variables _t(i) and _t(i, j). The variable _t(i) represents the probability of being in state i at time t, given the entire observation sequence. Likewise _t(i, j) represents the joint probability of being in state i at time t and of being in state j at time t+1, given the entire observation sequence. They can be calculated by
  
  _t(i) =
  
  (_t(i) * _t(i) )
  
  _{j = 1 to N}(_t(j) * _t(j))
  
  and
  
  _t(i, j) =
  
  (_t(i) * a_{i, j} * _t+1(j) * b_j(o_t+1) )
  
  _{i’ = 1 to N}_{j’ = 1 to N}(_t(i’) * a_{i’, j’} * _t+1(j’) * b_j’(o_t+1) )
  
  As you can see, these are a direct result of calculations of from the Forward algorithm and from the Backwards algorithm. Once we have calculated these variables, we can update the parameters of the model as follows:
  
  _i = ₁(i)
  
  _i,j = _{t = 1 to T-1}(_t(i))
  
  _{t = t to T-1} (_t(i, j))
  
  // [b bar]_{j, k} = Sigma_{t = 1 to T, o_t = o_k} gamma_{t, j} / Sigma_{t = 1 to T} gamma_{t, j}, 1 <= j <= N, 1 <= k <= M
  
  _j(o_k) =
  
  _{t = 1 to T-1, o_t = o_k} _t(j)
  
  _{t = 1 to T-1} _t(j)
  
  We can iterate this procedure a finite number of times or until it converges. This will generate a new model, = {N, , , , }.
  
  There is more on this example at LEARNINGlover.com: Hidden Marokv Models: The Baum-Welch Algorithm.
  
  Some further reading on Hidden Markov Models:
  - Hidden Markov Models, Theory and Applications
August 20, 2013
Hidden Markov Models: The Viterbi Algorithm
I just finished working on LEARNINGlover.com: Hidden Marokv Models: The Viterbi Algorithm. Here is an introduction to the script.

Suppose you are at a table at a casino and notice that things don’t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of ‘loaded’ dice that the casino is using and how these dice are loaded. To accomplish this you set up a number of Hidden Markov Models, where the loaded die are the latent variables, and would like to determine which of these, if any is more likely to be using.

First lets go over a few things.

We will call each roll of the dice an observation. The observations will be stored in variables o₁, o₂, …, o_T, where T is the number of total observations.

To generate a hidden Markov Model (HMM) we need to determine 5 parameters:
- The N states of the model, defined by S = {S₁, …, S_N}
- The M possible output symbols, defined by = {₁, ₂, …, _M}
- The State transition probability distribution A = {a_ij}, where a_ij is the probability that the state at time t+1 is S_j, given that the state at time t is S_i.
- The Observation symbol probability distribution B = {b_j(_k)} where b_j(_k) is the probability that the symbol _k is emitted in state S_j.
- The initial state distribution = {_i}, where _i is the probability that the model is in state S_i at time t = 0.
  The HMMs we’ve generated are based on two questions. For each question, you have provided 3 different answers which leads to 9 possible HMMs. Each of these models has its corresponding state transition and emission distributions.
  - How often does the casino change dice?
    
    0) Dealer Repeatedly Uses Same Dice
    1) Dealer Uniformly Changes Die
    2) Dealer Rarely Uses Same Dice
  - Which sides on the loaded dice are more likely?
    
    0) Larger Numbers Are More Likely
    1) All Numbers Are Equally Likely
    2) Smaller Numbers Are More Likely
  How often does the casino change dice?
  
  Which sides on
  the loaded dice
  are more likely?
  
  (0, 0) (0, 1) (0, 2)
  
  (1, 0) (1, 1) (1, 2)
  
  (2, 0) (2, 1) (2, 2)
  
  One of the interesting problems associated with Hidden Markov Models is called the Decoding Problem, which asks the question “What is the most likely sequence of states that the HMM would go through to generate the sequence O = o₁, o₂, …, o_T?
  
  The Viterbi algorithm finds answers this question using Dynamic Programming. It creates an auxiliary variable _t(i) which has the highest probability that the partial observation sequence o₁, …, o_t can have, given that the current state is i. This variable can be calculated by the following formula:
  
  _t(i) = max_{q₁, …, q_t-1 p{q₁, …, q_t-1, q_t = i, o₁, …, o_t | }.}
  
  ₁(j) = _jb_j(o₁), for 1 j N.
  
  Once we have calculated _t(j) we also keep a pointer to the max state. We can then find the optimal path by looking at arg max _{1 j N} _T(j) and then backtrack the sequence of states using the pointer.
  
  There is more on this example at LEARNINGlover.com: Hidden Marokv Models: The Viterbi Algorithm.
  
  Some further reading on Hidden Markov Models:
  - Hidden Markov Models, Theory and Applications
August 18, 2013

	Station 1	Station 2	Station 3	Total Cost
cost₁	6	13	18	21
cost₂	11	11	17	20

	Station 1
cost₀	e₁ + a_{1, 1}
cost₁	e₂ + a_{2, 1}