Tag Archives: math

Dots and Boxes Game

Dots and Boxes Game

When I was in high school, one of my favorite ways to waste time in class (not recommended) was to play a game called dots and boxes (although at the time we just called it dots). I was very surprised to find later that this game belongs to a class of games called “Impartial Combinatorial Games”. These are games where the moves available to the player depend only on the position of the game, and not the player.

In a game of Dots and Boxes, we start with an initial grid with dots at each row and column intersection. At each player’s turn, they have the option of drawing either a horizontal or vertical line between two neighboring dots (depending on if the dots are in the same row or column). If a player fills in the last line on a box (the 4th side), we say that player “owns” the box. The game ends when there are no neighboring dots without a line between them. At the conclusion of the game, the player who owns the most dots is declared the winner.

The game is impartial because there is no restriction on which move a player can make other than the fact that a player cannot re-do a move that has already been made (an impartial version of this game would be if player one could only move horizontally and player two could only move vertically).

I have implemented a javascript version of this game. Check it out and let me know what you think.

I also spoke earlier about the discovery that this game in particular was an active area of research. I wanted to provide a link to a paper entitled “Solving Dots and Boxes” by Joseph K. Barker and Richard E Korf that speaks about winning strategies for each player in a game of dots and boxes.

Nim Games

I enjoy going to schools to give talks. Generally, I try to focus these talks around mathematics that’s not generally taught in classrooms to try to connect to some of the inquisitive nature of the students. One of my favorite ways of doing this is through combinatorial games. These combinatorial games are generally two player sequential games (i.e. players alternate taking moves) where both players know all the information about the game before any moves are made. This is called a game of complete information. In addition, these games are deterministic, in that unlike a game of poker or dice there is no random element introduced into the game.

One of the most common ways of introducing students to combinatorial games is through the game of Nim (which is also called the Subtraction game). I’ve written a script here to help introduce this game. In the game of Nim, there are initially a number (p) of rocks in a pile. There is also an array of possible legal moves that each player can choose from on each turn. Players alternate removing a legal amount of stones from the pile until some player is unable to make a move, at which point the opposing player (the player who made the last move) is declared the winner.

So example a game of (1-2-3)-Nim could go as follows. Suppose initially there are 23 stones.

Stones Player Removed
23 1 3
20 2 1
19 1 2
17 2 2
15 1 1
14 2 3
11 1 2
9 2 1
8 1 3
5 2 2
3 1 1
0 2 1

In the above example, since player 2 removes the last stone, player 1 is unable to move so player 2 is declared the winner. Each move that a player makes is either removing 1, 2 or 3 stones as we initially stated in the rules of the game.

Because Nim is a game of perfect information, we know a lot about the game before any moves are made. In fact, we can determine who should win the game if it is played perfectly just by knowing the set of available moves and the number of stones in the pile. We can do this by considering a game with 0 stones and determining who would win this game (player 2), and increasing the number of stones in the pile one by one and at each new cell, determining who would be the winner. In this method, we can say that we are in a winning position if there is a feasible move that would put the opposing player into a losing position. Consider the following table for the (1-2-3)-Nim game:

Stones 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Winner 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1

We can analyze this table as follows. With 0 stones, there are no moves that any player can make, but since player 1 goes first, they cannot make a move and lose the game. When there is 1, 2, or 3 stones, then player 1 can remove all the stones in the pile and in all cases player 2 will be looking at a situation where there are no stones to remove. When there are 4 stones, no matter how many stones player 1 removes, player 2 will be able to remove the remaining stones to ensure that player 1 is looking at a situation with no stones. We can repeat this process with any number of stones and we arrive at a table similar to the one listed above.

I have a script at my Nim games page where the set of possible moves and the number of stones in the pile are generated randomly and users get to play against a computer. Check it out and let me know what you think.

Assembly Line Scheduling

I wanted to take a minute to help some users become more familiar with Dynamic Programming, so I decided to write a script on the Assembly Line Scheduling Problem.

To introduce the problem I want to tell you a story about a friend of mine. Keisha recently started a clothing company that uses two assembly lines to produce articles of clothing. She has separated the the process of manufacturing an item of clothing into n steps, so each assembly line is separated into n different stations, with each station performing a specific task (So for example station three’s job may be to add a right sleeve to shirts). The task of a specific station is independent of which line the station occurs on (so if station three’s job is to add a right sleeve to shirts, this will be true in both assembly line 1 and assembly line 2). Lets denote the jth station (with j = 1, 2, …, n) on line i (where i is 1 or 2) by Si, j. Although they’re doing the same jobs the time it takes the employee at station S1, j may be different from the time it takes the employee at station S2, j. We will denote the time required at station Si, j by ai, j. For each line, there is also an amount of time required for the article of clothing to enter assembly line i, ei; and an amount of time required for the article of clothing to exit assembly line i, xi.

One of the reasons that assembly lines are very productive is that stations on the same assembly line are generally in close proximity to one another, resulting in a very low cost of transferring an item from one station to the next on the same assembly line. When we have multiple lines in place, as Keisha has, there is a (possibly beneficial) cost of transferring an item from one line to another. Lets denote this cost by ti, j which represents the cost of transferring a partially completed item of clothing from line i after having gone through station Si, j (again, i is 1 or 2 and j = 1, 2, …, n).

The problem that Keisha would like solved is to determine which station to choose between lines 1 and 2 in order to minimize the total time it takes to produce an article of clothing.

Consider the following example:

Assembly Line Example with 3 Stations

Our goal is to get the clothing through the 3 states to produce a final product. What if we initially had the product take the route through station S2, 1 instead of station S1, 1? Lets assume that we make the decisions to send the article of clothing to stations S2, 2 and S2, 3 afterwards. This would result in a solution whose total cost is 3 + 8 + 4 + 6 + 3 = 24. Is this solution optimal (aka is this solution the minimum total time through the factory)? Lets consider what would happen if we had chosen station S1, 1 instead of S2, 1. The entry cost for line 1 is 1, the time required at station S1, 1 is 5 and the transfer time to go to assembly line 2 is 1. So the cost of this new solution is 1 + 5 + 1 + 4 + 6 + 3 = 20, which gives a cheaper solution.

This is called the principle of optimality (optimal substructure property) which states that in order for an overall solution to be optimal, the solution must also give the optimal solutions to every subproblem of the original problem. This problem of solving all subproblems may seem like a daunting task at first, but lets consider the example above again.

Initially, we have a new product and there are two options – either line one or line two. We will need these values in the future, so lets keep track of both choices in the form of a table.

Station 1
cost0 e1 + a1, 1
cost1 e2 + a2, 1

After this initial step, the question becomes given the current path to station j-1, which assembly line can best serve station j? This cam be computed for each j > 1 by
cost1(j) = min{cost1(j-1) + a1, j, cost2(j-1) + t2, j-1 + a1, j}
cost2(j) = min{cost2(j-1) + a2, j, cost1(j-1) + t1, j-1 + a2, j}

As you can see, the calculation of costi(j) relies on the computation of costi(j-1). By calculating these values from station 1 to to station n, we are able to simply look up the values in the table instead of having to recalculate these values.

These give optimal solutions to each of the subproblems. We repeat this same step for all stages j = 2, …, n then we arrive at the final step were we finish the job. Lets define total_cost to indicate the cost of the optimal solution.
total_cost = min{cost1(n) + x1, cost2(n) + x2}

We’d like to see which value minimizes total_cost. Then we can trace back to find the values that minimized cost1 or cost2 at each step depending on which assembly line was chosen. The following algorithm does just this, and stores the assembly line chosen at each state in the variable line.

For the above example, the table would be calculated as follows:

Station 1 Station 2 Station 3 Total Cost
cost1 6 13 18 21
cost2 11 11 17 20

We can reconstruct the optimal path through assembly lines by seeing that we finish by going through station S2, 3.
We arrive at station S2, 3 by going through the assembly line station S2, 2.
We arrive at station S2, 2 by going through the assembly line station S1, 1.

This is precisely the path that is highlighted in the image above.

The algorithm to construct these paths and compute the total_cost for such problems is given below.

Algorithm FastestWay(a, t, e, x, m)
cost1 [<-] e1 + a1, 1
cost2 [<-] e2 + a2, 1
for (j [<-] 2 to n)
if (cost1(j-1) + a1, j [<=] cost2(j-1) + t2, j-1 + a1, j
cost1(j) [<-] cost1(j-1) + a1, j
line1(j) [<-] 1
else
cost1(j) [<-] cost2(j-1) + t2, j-1 + a1, j
line1(j) [<-] 2

if (cost2(j-1) + a2, j [<=] cost1(j-1) + t1, j-1 + a2, j
cost2(j) [<-] cost2(j-1) + a2, j
line2(j) [<-] 1
else
cost2(j) [<-] cost1(j-1) + t1, j-1 + a2, j
line2(j) [<-] 2

if (cost1(n) + x1 [<=] cost2(n) + x2)
total_cost = cost1(n) + x1
final_line = 1
else
total_cost = cost2(n) + x2
final_line = 2

For more information please refer to My Assembly Line Scheduling Examples Page.

Note: I used Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein to help with this post.

A JavaScript Implentation of MapReduce’s WordCount

MapReduce WordCount

You can view a javascript implementation of the WordCound Program in Mapreduce at Javascript Implementation of Mapreduce WordCount

One of the big things in the world of Data Science and Cloud Computing is the map-reduce implementation of various algorithms. This is not always a straightforward procedure and so learning to think in terms of map-reduce implementations can be a challenging conversion from thinking in a functional programming frame of mind. In light of this I thought it would be convenient to try to help users “visualize” this concept. This is a challenging task because there are many concepts of cloud computing that I am unable to provide in this environment. However, just as many of the books on MapReduce provide pseudo-code on various implementations of algorithms in a Map-Reduce environment, I will attempt to show how data flows from the input to the mappers to the shuffle and sort phase to the reducers and finally to generate the output. I leave the users the task of actually putting these into the context of a Java MapReduce environment.

I want to first speak about the concept of (key, value) pairs which is a very important in MapReduce programming. I will speak about this in the context of a WordCount program. The purpose of a WordCount program is to count the number of occurrences of each word in a given file. First data is input to the mapper in (key, value) pairs. For our example, the key will be the line number of input (so each line of input will go to a different mapper) and the value will be the text present on that line. Once the mapper has the input, it will perform some operation on it and output data again in (key, value) pairs. In the WordCount example, the mappers will simply output each word that occurs as a new key on that line and the integer “1″ as the associated value (note that a single mapper can output multiple (key, value) pairs).

One of the main things to understand in a MapReduce is that there are a number of Mappers running on a given input file and these Mappers cannot interact with one another. Suppose we have two different mappers, lets call them Mapper1 and Mapper2 that are each working on two different lines of input from a file. If both lines of input have the word “apple”, there is no way for Mapper2 to know that Mapper1‘s line of input also has this word. In this setting that’s perfectly fine because the Shuffle and Sort phase is where all the (key, value) pairs that were output by the mappers, compares the keys to one another and if they are equal to one another combines their respective values into a list of values. Unequal keys are sorted.

So if both Mapper1 and Mapper2 contained the word “apple” in their line of text, then the (key, value) pair (apple, 1) will occur twice. So the Shuffle and Sort phase will notice this and output the (key, value) pair (apple, {1, 1}).

Each reducer is then given a key and a list of values that were output by the mappers. The goal will be to perform some operation and again output data in (key, value) pairs. In the WordCount example, we will use what is known as the sumReducer. It gets this name because its job is simply to sum the values in the list of values and output the (key, value) pair that is the original key and this sum of values.

You can view a javascript implementation of this at Javascript Implementation of Mapreduce WordCount

Polynomial Arithmetic

Polynomial Arithmetic Image

With students beginning to attend classes across the nation, I wanted to focus the site towards some of the things they’re going to be addressing. This latest page publicize some scripts that I wrote to help with polynomial arithmetic. Originally I wrote these as homework exercises for a class in programming, but I have found them useful ever since – both in teaching mathematics classes like college algebra, which spends a lot of attention on polynomials, and in my research life. Its funny (and sad) the number of simple errors that a person (mathematician or not) can make when performing simple arithmetic, so I found it very useful to have a calculator more advanced than the simple scientific calculators that are so easily available.

I’m not going to spend a lot of time discussing the importance of polynomials, or trying to justify their need. I will bring up some problems that I’d like to address in the future, that deal with polynomials. The first is finding the roots of the characteristic polynomial of a matrix. This is useful in research because these roots are the eigenvalues of the matrix and can give many properties of the matrix. There are also some data analysis tools like Singular Value Decomposition and Principal Component Analysis where I will probably build out from this initial set of instances.

The user interface for the scripts I’ve written generate two polynomials and ask the user what is to be done with those polynomials. The options are to add the two, subtract polynomial 2 from polynomial 1, multiply the two, divide polynomial 1 by polynomial 2, and divide polynomial 2 by polynomial 1. There is also the option to make the calculations more of a tutorial by showing the steps along the way. Users who want new problems can generate a new first or second polynomial and clear work.

For addition and subtraction, the program works by first ensuring that both polynomials have the same degree. This can be achieved by adding terms with zero coefficient to the lower degree polynomial. Once this has been accomplished, we simply add the terms that have the same exponent.

For multiplication, the program first builds a matrix A, where the element ai, i+j on row i and column i+j of the matrix A is achieved by multiplying the ith term of the first polynomial by the jth term of the second polynomial. If an was not given a value in the matrix, then we put a value of zero in that cell. Once this matrix is formed, we can sum the columns of the matrix to arrive at the final answer.

The division of two polynomials works first by dividing the first term of the numerator by the first term of the denominator. This answer is then multiplied by the denominator and subtracted from the numerator. Now, the first term in the numerator should cancel and we use the result as the numerator going froward. This process is repeated as long as the numerator’s degree is still equal to or greater than the denominator’s degree.

Check out the latest page on polynomial arithmetic and let me know what you think.

Hidden Markov Models: The Baum-Welch Algorithm

Suppose you are at a table at a casino and notice that things don’t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of ‘loaded’ dice that the casino is using and how these dice are loaded. To accomplish this you set up a number of Hidden Markov Models, where the loaded die are the latent (hidden) variables, and would like to determine which of these, if any is more likely to be using.

First lets go over a few things.

We will call each roll of the dice an observation. The observations will be stored in variables o1, o2, …, oT, where T is the number of total observations.

To generate a hidden Markov Model (HMM) we need to determine 5 parameters:

  • The N states of the model, defined by S = {S1, …, SN}
  • The M possible output symbols, defined by = {1, 2, …, M}
  • The State transition probability distribution A = {aij}, where aij is the probability that the state at time t+1 is Sj, given that the state at time t is Si.
  • The Observation symbol probability distribution B = {bj(k)} where bj(k) is the probability that the symbol k is emitted in state Sj.
  • The initial state distribution = {i}, where i is the probability that the model is in state Si at time t = 0.

    The HMMs we’ve generated are based on two questions. For each question, you have provided 3 different answers which leads to 9 possible HMMs. Each of these models has its corresponding state transition and emission distributions.

    • How often does the casino change dice?
      • 0) Dealer Repeatedly Uses Same Dice
      • 1) Dealer Uniformly Changes Die
      • 2) Dealer Rarely Uses Same Dice
    • Which sides on the loaded dice are more likely?
      • 0) Larger Numbers Are More Likely
      • 1) Numbers Are Randomly Likely
      • 2) Smaller Numbers Are More Likely
    How often does the casino change dice?
    Which sides on
    the loaded dice
    are more likely?
    (0, 0) (0, 1) (0, 2)
    (1, 0) (1, 1) (1, 2)
    (2, 0) (2, 1) (2, 2)

    One of the interesting problems associated with Hidden Markov Models is called the Learning Problem, which asks the question “How can I improve a HMM so that it would be more likely to have generated the sequence O = o1, o2, …, oT?

    The Baum-Welch algorithm answers this question using an Expectation-Maximization approach. It creates two auxiliary variables t(i) and t(i, j). The variable t(i) represents the probability of being in state i at time t, given the entire observation sequence. Likewise t(i, j) represents the joint probability of being in state i at time t and of being in state j at time t+1, given the entire observation sequence. They can be calculated by

    t(i) =
    (t(i) * t(i) )
    j = 1 to N(t(j) * t(j))

    and

    t(i, j) =
    (t(i) * ai, j * t+1(j) * bj(ot+1) )
    i’ = 1 to Nj’ = 1 to N(t(i’) * ai’, j’ * t+1(j’) * bj’(ot+1) )

    As you can see, these are a direct result of calculations of from the Forward algorithm and from the Backwards algorithm. Once we have calculated these variables, we can update the parameters of the model as follows:

    i = 1(i)

    i,j = t = 1 to T-1(t(i))
    t = t to T-1 (t(i, j))

    // [b bar]_{j, k} = Sigma_{t = 1 to T, o_t = o_k} gamma_{t, j} / Sigma_{t = 1 to T} gamma_{t, j}, 1 <= j <= N, 1 <= k <= M

    j(ok) =
    t = 1 to T-1, ot = ok t(j)
    t = 1 to T-1 t(j)

    We can iterate this procedure a finite number of times or until it converges. This will generate a new model, = {N, , , , }.

    There is more on this example at LEARNINGlover.com: Hidden Marokv Models: The Baum-Welch Algorithm.

    Some further reading on Hidden Markov Models:

Hidden Markov Models: The Viterbi Algorithm

I just finished working on LEARNINGlover.com: Hidden Marokv Models: The Viterbi Algorithm. Here is an introduction to the script.

Suppose you are at a table at a casino and notice that things don’t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of ‘loaded’ dice that the casino is using and how these dice are loaded. To accomplish this you set up a number of Hidden Markov Models, where the loaded die are the latent variables, and would like to determine which of these, if any is more likely to be using.

First lets go over a few things.

We will call each roll of the dice an observation. The observations will be stored in variables o1, o2, …, oT, where T is the number of total observations.

To generate a hidden Markov Model (HMM) we need to determine 5 parameters:

  • The N states of the model, defined by S = {S1, …, SN}
  • The M possible output symbols, defined by = {1, 2, …, M}
  • The State transition probability distribution A = {aij}, where aij is the probability that the state at time t+1 is Sj, given that the state at time t is Si.
  • The Observation symbol probability distribution B = {bj(k)} where bj(k) is the probability that the symbol k is emitted in state Sj.
  • The initial state distribution = {i}, where i is the probability that the model is in state Si at time t = 0.

    The HMMs we’ve generated are based on two questions. For each question, you have provided 3 different answers which leads to 9 possible HMMs. Each of these models has its corresponding state transition and emission distributions.

    • How often does the casino change dice?
      • 0) Dealer Repeatedly Uses Same Dice
      • 1) Dealer Uniformly Changes Die
      • 2) Dealer Rarely Uses Same Dice
    • Which sides on the loaded dice are more likely?
      • 0) Larger Numbers Are More Likely
      • 1) All Numbers Are Equally Likely
      • 2) Smaller Numbers Are More Likely
    How often does the casino change dice?
    Which sides on
    the loaded dice
    are more likely?
    (0, 0) (0, 1) (0, 2)
    (1, 0) (1, 1) (1, 2)
    (2, 0) (2, 1) (2, 2)

    One of the interesting problems associated with Hidden Markov Models is called the Decoding Problem, which asks the question “What is the most likely sequence of states that the HMM would go through to generate the sequence O = o1, o2, …, oT?

    The Viterbi algorithm finds answers this question using Dynamic Programming. It creates an auxiliary variable t(i) which has the highest probability that the partial observation sequence o1, …, ot can have, given that the current state is i. This variable can be calculated by the following formula:

    t(i) = maxq1, …, qt-1 p{q1, …, qt-1, qt = i, o1, …, ot | }.

    1(j) = jbj(o1), for 1 j N.

    Once we have calculated t(j) we also keep a pointer to the max state. We can then find the optimal path by looking at arg max 1 j N T(j) and then backtrack the sequence of states using the pointer.

    There is more on this example at LEARNINGlover.com: Hidden Marokv Models: The Viterbi Algorithm.

    Some further reading on Hidden Markov Models: