Tag Archives: math

The A* Algorithm

As a child I remember traveling on road trips, sitting in the back of a car trying to do my best to keep myself busy for what would occupy the next six to ten hours of my life. One of the things I grew to like were the simple maze books that were sold on the magazine racks at some of the gas stations where we’d stop for food. There are two basic strategies I employed for solving these mazes: For simpler mazes, I could generally just take a look at the overall maze structure, decide upon a path through the maze, then write a path without any mistakes. For more complex mazes though, I would generally begin a route that looks the most promising. If that route reaches a point where I can see that it will be impossible to finish, then I’d go back to where the decision was made, exclude the route I had just tried, and select the “most promising” remaining route. I’d continue this process until I had completed the maze, or until it became simple enough for me to solve the maze using only my memory.

The A* Algorithm works in a similar manner to the second approach I just described. We begin at a starting point, and consider where to move next from that starting point. The set of possible options for this next move is determined by the neighbor function for a given cell. For each neighbor the algorithm estimates the length of the route through that cell by calculating the sum of two values, f(n) = g(n) + h(n), where

g(n) is the (known) cost to travel from the starting point to the cell n.
h(n) is the (approximate) cost to travel from the cell n to the final cell.

The sum g(n) + h(n) allows us to approximate the total cost of a route through the cell n.

The A* algorithm begins at the starting position of the maze. There are two sets we will be considering throughout the process of determining the optimal route, called the closedSet and the openSet. The elements of closedSet are the nodes whose total distance from the starting position has been calculated, whereas the elements of openSet represents nodes whose total distance is still under consideration. There is also a map called prev which is used to reconstruct the path. Below is how the algorithm operates:

A* Algorithm Pseudocode
closedSet is the empty set
openSet = {start}
prev is the empty set

While there are still elements in openSet,
     Find the element c* in openSet with the lowest value f(c).
     If c* is the target position
          Reconstruct the path.
     Else If c* is not the target position,
          Remove c* from openSet
          Add c* to closedSet
          For each neighbor n of c* that is not in closedSet,
               Calculate a temporary g value, temp_g(n) = g(c*) + dist(c*, n).
               If n is not in openSet, or if n is in openSet and temp_g(n) < g(n),
                    Set prev(n) = c*
                    Set g(n) = temp_g(n)
                    Set f(n) = g(n) + h(n).
                    if n is not in openSet
                         add n to openSet.
     End If
End While
End Algorithm

An important question becomes what makes a good heuristic function to approximate the distance to the goal. This can lead to an in depth discussion based on the word "good", but the necessary condition for any heuristic is that it NEVER over-estimates the cost of the path from the cell to the goal. Some examples of possible heuristics for mazes are the Euclidean Distance (the square root of the sum of the squares of the horizontal and vertical differences in distances, i.e. dE(x, y) = (i(xi – yi)2) and TaxiCab Distance (the sum of the differences in the horizontal and vertical dimensions, i.e. dT(x, y) = i|xi – yi|). Both of these are feasible metrics for heuristics on a maze. Other herusitics, like h(n) = 0 for all n are possible, but doing this would make the algorithm treat all cells equally and ignore the heuristic part of the A* algorithm, turning it into Dijkstra’s algorithm.

I’ve written a script that generates random mazes and uses the A* algorithm to find the optimal path through this maze. For this script, I used the taxicab distance heuristic.

Check it out and let me know what you think.

Naive Bayesian Classification

Suppose there is a sequence of events that took place e1, …, ene, with each event belonging to a certain classification group g1, …, gng. Then the problem of determining which of these groups a new event belongs is the classification problem.

A naive Bayes classifier will determine to which of the possible classification groups a new observation belongs with the (naive) assumption that every feature of this new observation has no relationship to any other feature of this observation.

This assumption of independence of the columns of the feature vector allows us to use Bayes’ Theorem to determine the probability that the new observation will belong to each of the observation groups. Bayes’ Theorem will then say that the probability this new observation belongs to a classification group, given the features is equal to the probability of the occurrence of that classification group in the observed data (i.e. P(C)) multiplied by the conditional probability of the joint distribution of the features given the same classification group P(F1, …, Fnf. The naive assumption allows us to quickly calculate the joint distribution of the features, given the classification group as the product of each feature given that same classification group.

This can be written as:

P(C | F1, …, Ffn) =
P(C) P(F1, …, Ffn | C)
P(F1, …, Ffn)
= P(C) P(F1 | C) * … * P(Ffn | C
P(C) P(F1, …, Ffn | C)
P(F1, …, Ffn)
= P(C) i = 1 to nfP(Fi | C)
P(C) P(F1, …, Ffn | C)
P(F1, …, Ffn)

So suppose we have observations that give the following data:
P(F1 | N) = 0.286
P(F2 | N) = 0.143
P(F3 | N) = 0.429
P(F4 | N) = 0.429

P(F1 | Y) = 0.143
P(F2 | Y) = 0.571
P(F3 | Y) = 0.571
P(F4 | Y) = 0.571

P(Y) = 0.5
P(N) = 0.5

Then P(N | F1, F2, F3, F4) = (0.5 * 0.286 * 0.143 * 0.429 * 0.429)
= 0.008

Then P(Y | F1, F2, F3, F4) = (0.5 * 0.143 * 0.571 * 0.571 * 0.571)
= 0.001

After we normalize the two terms, we wind up with Then P(N | F1, F2, F3, F4) = .2204


Then P(Y | F1, F2, F3, F4) = .7796

So the naive Bayes classifier says that the likely classification group for this observation is Y.

Check out my examples page for more examples of the naive Bayes classifier.

Introduction to JavaScript Programming

Here is a link to my sample JavaScript code.

I received a lot of attention from friends interested in programming after my recent blog post entitled “Introduction to Python Programming”. While many found it interesting, the fact that Python is more useful to mathematicians hindered sine of my friends desire to learn it as their first language.

In out conversations, my recommendation for a first language was JavaScript. This is a powerful language in the sense that just about anybody who is involved with the internet knows it, and it’s likely to boost a person’s resume. It also has many similarities to more powerful languages like C++ and Java, so while not trivial, it could be a good launch pad into more advanced languages. But my favorite reason is that unlike many other programming languages that rely in an MS-DOS like command like approach for run time interaction, JavaScript’s basic interaction is with the standard internet browsers we use everyday. There isn’t even anything you need to download or install. Just create a basic HTML file in a text editor (like notepad, wordpad, or notepad++). This makes it easier to show off your creations which makes learning more fun.

The script I’ve finished provides examples on writing output, declaring variables, data types, conditionals, loops, and functions. Although I do not go into detail about all the events and objects on an HTML page, I do finish with three examples of more advanced JavaScript programs. Once you’ve selected a program, the code well be revealed in the text area. There is also a button that, when clicked, will execute that script on a new HTML tab.

I hope you enjoy, and let me know if you have any suggestions or comments.

With that being said, here is a link to my sample JavaScript code.

Simple Linear Regression

I finished a script that helps explain the concepts of simple linear regression.

We live in a world that is filled with patterns – patterns all around us just waiting to be discovered. Some of these patterns are not as easily discovered because of the existence of outside noise.

Consider for example an experiment where a set of people were each given the task of drinking a number of beers and having their blood alcohol level taken afterwards. Some noise factors in this could include the height and weight of the individual, the types of drinks, the amount of food eaten, and the time between drinks. Even with this noise, though, we can still see a correlation between the number of drinks and their blood alcohol level. Consider the following graph showing people’s blood-alcohol level after a given number of drinks. The x-axis represents the number of drinks and the y-axis is the corresponding blood alcohol level.

Example of Linear Regression

x 5 2 9 8 3 7 3 5 3 5 4 6 5 7 1 4
y 0.10 0.03 0.19 0.12 0.04 0.095 0.070 0.060 0.020 0.050 0.070 0.10 0.085 0.090 0.010 0.050

We can definitely see a correlation, and although the data doesn’t quite fit on a straight line. It leads us to ask further questions like can we use this data to build a model that estimates a person’s blood-alcohol level and how strong is this model?

One of the tools we can use to model this problem is linear regression. A linear regression takes a two-dimensional data set, with the assumption that one column (generally represented by the x variable) is independent and the second column (generally represented by the y variable) being dependent on the first column. The assumption is that the relationship between the two columns is linear and can be represented by the linear equation

y = 0 + 1x + e.

The right hand side of the above equation has three terms. The first two (0 and 1) are the parameters of the linear equation (the y-intercept and slope respectively), while the third term of the right hand side of the above equation represents the error term. The error term represents the difference between this linear equation and the y values in the data provided. We are seeking a line that minimizes the error term. That is, we are seeking to minimize

D = i = 1 to n [yi - (0 + 1xi)]2

There are several ways one could approach this problem. In fact, there are several lines that one could use to build a linear model. The first line that one may use to model these points is the one generated by only mean of the y values of the points, called the horizontal line regression.

For the data set above, the mean of the y values can be calculated as = 0.0738, so we could build a linear model based on this mean that would be y = 0.738. This horizontal line regression model is a horizontal line that predicts the same score (the mean), regardless of the x value. This lack of adjustments means it is generally a poor fit for most models. But as we will see later, this horizontal line regression model does serve a purpose in determining how well the model we develop performs.

A second attempt at solving this problem would be to generate the least squares line. This is the line that minimizes the D value listed above. We can see that D is a multi-variable polynomial, and we can find the minimum of such a polynomial using calculus, partial derivatives and Gaussian elimination (I will omit the work here because it deters us from the main point of this blog post, however Steven J. Miller has a good write-up of this).

The calculus leads us to the following equations:

SXY = i = 1 to n(xy) –
(i = 1 to nx)(i = 1 to ny)

SXX = i = 1 to n(x2) –
(i = 1 to nx)2

1 =

0 = 1

To calculate the least squares line for this example, we first need to calculate a few values:
i = 1 to n(xy) = 6.98
i = 1 to n(x2) = 443
i = 1 to nx = 77
i = 1 to ny = 1.18
Sxx = 72.44
Sxy = 1.30

This lets us evaluate that
1 = 0.018
0 = -0.0127

So the resulting linear equation for this data is

= -0.0127 + 0.018*x

Below is a graph of the two attempts at building a linear model for this data.

Example of Models of Linear Regression

In the above image, the green line represents the horizontal line regression model and the blue line represents the least-squares line. As stated above, the horizontal line regression model is a horizontal line that does not adjust as the data changes. The least-squares line adjusts both the slope and y-intercept of this line according to the data provided to better fit the data provided. The question becomes how well does the least-squares line fit the data.

The Sum of Squares Error (SSE) sums the deviation at each point of our data from the least-squares line.

SSE = i = 1 to n(yii)2

A second metric that we are interested in is how well the horizontal line regression linear model estimates our data. This is called the Total Sum of Squares (SST).

SST = i = 1 to n(yi)2

The horizontal line regression model ignores the independent variable x from our data set and thus any line that takes this independent variable into account will be an improvement on the horizontal line regression model. Because of this, the SST sum is a worse case scenario of how poorly our model can perform.

Knowing now that SST is always greater than SSE, the regression sum of squares (SSR) is the difference between the total sum of squares and the sum of squares error.


This tells us how much of the total sum of squares is accounted for by the model.

Finally, the coefficient of determination (r2) is defined by

r2 = SSR / SST

This tells SSR as a percentage of SST, or the amount of the variation in the data that is explained by the model.

So, check out my script on simple linear regression and let me know what you think.

Learning About Truth Tables

Here is a link to my sample Truth Table Generator.

Truth Tables Generator Image

An important part of mathematics and computer programming is understanding conditional expressions. These are statements that generally read like “if [condition is true], then [execute a sequence of statements]“. A simple example of this is that if we wanted to print out every even number, our conditional would be if (2 divides into x with remainder 0) then print x, or in JavaScript

if (x % 2 == 0)

Conditional expressions belong to the world of Boolean logic. These are expressions that evaluate to true or false, depending on the values of the variables involved in this expression. When we are dealing with real world examples, this is generally a statement like “X is an even number” (for some number X) or “The element x is in the set Y” (for some element x and some set Y). Notice that both the statements can be evaluated as true or false statements. We are interested in understanding the Boolean logic behind combining a number of these expressions, and understanding how the evaluation of the simpler expressions help determine the values of the more complex formulas.

One way of doing this in mathematics is by constructing a truth table. A truth table is a table that shows how a Boolean expression’s value can be computed. The procedure in constructing a truth table is to first add a column to the table for each variable involved in the expression. Then we compute the value of each sub-expression of the expression in its own column until we have computed the entire expression in the final column.

There are four logical operators that we will be working with

– The negation operator (¬P), which returns true if the variable P is false, and returns false otherwise.
– the or operator (P ? Q), which returns true if P is true or Q is true, or if both are true, and returns false otherwise.
– the and operator (P ? Q), which returns true if both P and Q are true, and returns false otherwise.
– the implies operator (P ? Q), which returns false if P is true and Q is false, and returns true otherwise.

An example of this is below:

Suppose we have the following formula:
((Q ? P) ? (¬ (Q ? R)))

The truth table would then be:

P Q R (Q ? R) (¬ (Q ? R)) (Q ? P) ((Q ? P) ? (¬ (Q ? R)))

First, notice that the truth table has 8 rows. This corresponds to the 8 distinct possible combinations of values for the three variables. The number 8 is also = 23, which is not a mistake. In general, if an expression has n variables, its corresponding truth table will contain 2n rows. In this truth table, column 4 represents the sub-expression (Q ? R). Notice that the only times this expression evaluates to true is when both Q nd R are true. The next column is (¬ (Q ? R)), the negation of (Q ? R), so the values in this column are the opposite of those in the previous column. We follow that up with the column for the sub-expression (Q ? P), which has a value of true only when the variables Q and P both have values of true. And the final column is the expression we started with ((Q ? P) ? (¬ (Q ? R))), but we can now evaluate the expression based on the two previous columns. Notice that the values of true in this column only correspond to when at least one of the previous two columns evaluated to true.

Check out my script on truth tables to see more examples and learn more about truth tables.

Introduction to Python Programming

Here is a link to my sample Python code.

One of the somewhat unforeseen consequences of taking a career in applied mathematics, particularly in this day and age, is that you will eventually need to write computer programs that implement the mathematical algorithms. There are several languages in which one can do this, each with its own positives and negatives and you will find that things that are simple in some are difficult in another. After speaking with a number of people, both students and professionals who work with mathematics on a regular basis, I reasoned that it may be helpful to provide some source code examples to help mathematicians get started with programming in some of these languages. I decided to start with Python because its a powerful language, available for free, and its learning curve isn’t too steep.

I will be working with Python 2.7, which can be downloaded from https://www.python.org/download/releases/2.7/. I understand, however, that a limitation to coding is the required setup often necessary before one can even write their first line of code. So while I do encourage you to download Python, I will also provide a link to the online Python compiler at Compile Online, which should allow users to simply copy and paste the code into a new tab in their browser and by simply clicking the “Execute Script” command in the upper left corner, see the output of the code.

With that being said, here is a link to my sample Python code.

Dots and Boxes Game

Dots and Boxes Game

When I was in high school, one of my favorite ways to waste time in class (not recommended) was to play a game called dots and boxes (although at the time we just called it dots). I was very surprised to find later that this game belongs to a class of games called “Impartial Combinatorial Games”. These are games where the moves available to the player depend only on the position of the game, and not the player.

In a game of Dots and Boxes, we start with an initial grid with dots at each row and column intersection. At each player’s turn, they have the option of drawing either a horizontal or vertical line between two neighboring dots (depending on if the dots are in the same row or column). If a player fills in the last line on a box (the 4th side), we say that player “owns” the box. The game ends when there are no neighboring dots without a line between them. At the conclusion of the game, the player who owns the most dots is declared the winner.

The game is impartial because there is no restriction on which move a player can make other than the fact that a player cannot re-do a move that has already been made (a partial version of this game would be if player one could only move horizontally and player two could only move vertically).

I have implemented a javascript version of this game. Check it out and let me know what you think.

I also spoke earlier about the discovery that this game in particular was an active area of research. I wanted to provide a link to a paper entitled “Solving Dots and Boxes” by Joseph K. Barker and Richard E Korf that speaks about winning strategies for each player in a game of dots and boxes.

Nim Games

I enjoy going to schools to give talks. Generally, I try to focus these talks around mathematics that’s not generally taught in classrooms to try to connect to some of the inquisitive nature of the students. One of my favorite ways of doing this is through combinatorial games. These combinatorial games are generally two player sequential games (i.e. players alternate taking moves) where both players know all the information about the game before any moves are made. This is called a game of complete information. In addition, these games are deterministic, in that unlike a game of poker or dice there is no random element introduced into the game.

One of the most common ways of introducing students to combinatorial games is through the game of Nim (which is also called the Subtraction game). I’ve written a script here to help introduce this game. In the game of Nim, there are initially a number (p) of rocks in a pile. There is also an array of possible legal moves that each player can choose from on each turn. Players alternate removing a legal amount of stones from the pile until some player is unable to make a move, at which point the opposing player (the player who made the last move) is declared the winner.

So example a game of (1-2-3)-Nim could go as follows. Suppose initially there are 23 stones.

Stones Player Removed
23 1 3
20 2 1
19 1 2
17 2 2
15 1 1
14 2 3
11 1 2
9 2 1
8 1 3
5 2 2
3 1 1
0 2 1

In the above example, since player 2 removes the last stone, player 1 is unable to move so player 2 is declared the winner. Each move that a player makes is either removing 1, 2 or 3 stones as we initially stated in the rules of the game.

Because Nim is a game of perfect information, we know a lot about the game before any moves are made. In fact, we can determine who should win the game if it is played perfectly just by knowing the set of available moves and the number of stones in the pile. We can do this by considering a game with 0 stones and determining who would win this game (player 2), and increasing the number of stones in the pile one by one and at each new cell, determining who would be the winner. In this method, we can say that we are in a winning position if there is a feasible move that would put the opposing player into a losing position. Consider the following table for the (1-2-3)-Nim game:

Stones 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Winner 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1 1

We can analyze this table as follows. With 0 stones, there are no moves that any player can make, but since player 1 goes first, they cannot make a move and lose the game. When there is 1, 2, or 3 stones, then player 1 can remove all the stones in the pile and in all cases player 2 will be looking at a situation where there are no stones to remove. When there are 4 stones, no matter how many stones player 1 removes, player 2 will be able to remove the remaining stones to ensure that player 1 is looking at a situation with no stones. We can repeat this process with any number of stones and we arrive at a table similar to the one listed above.

I have a script at my Nim games page where the set of possible moves and the number of stones in the pile are generated randomly and users get to play against a computer. Check it out and let me know what you think.

Assembly Line Scheduling

I wanted to take a minute to help some users become more familiar with Dynamic Programming, so I decided to write a script on the Assembly Line Scheduling Problem.

To introduce the problem I want to tell you a story about a friend of mine. Keisha recently started a clothing company that uses two assembly lines to produce articles of clothing. She has separated the the process of manufacturing an item of clothing into n steps, so each assembly line is separated into n different stations, with each station performing a specific task (So for example station three’s job may be to add a right sleeve to shirts). The task of a specific station is independent of which line the station occurs on (so if station three’s job is to add a right sleeve to shirts, this will be true in both assembly line 1 and assembly line 2). Lets denote the jth station (with j = 1, 2, …, n) on line i (where i is 1 or 2) by Si, j. Although they’re doing the same jobs the time it takes the employee at station S1, j may be different from the time it takes the employee at station S2, j. We will denote the time required at station Si, j by ai, j. For each line, there is also an amount of time required for the article of clothing to enter assembly line i, ei; and an amount of time required for the article of clothing to exit assembly line i, xi.

One of the reasons that assembly lines are very productive is that stations on the same assembly line are generally in close proximity to one another, resulting in a very low cost of transferring an item from one station to the next on the same assembly line. When we have multiple lines in place, as Keisha has, there is a (possibly beneficial) cost of transferring an item from one line to another. Lets denote this cost by ti, j which represents the cost of transferring a partially completed item of clothing from line i after having gone through station Si, j (again, i is 1 or 2 and j = 1, 2, …, n).

The problem that Keisha would like solved is to determine which station to choose between lines 1 and 2 in order to minimize the total time it takes to produce an article of clothing.

Consider the following example:

Assembly Line Example with 3 Stations

Our goal is to get the clothing through the 3 states to produce a final product. What if we initially had the product take the route through station S2, 1 instead of station S1, 1? Lets assume that we make the decisions to send the article of clothing to stations S2, 2 and S2, 3 afterwards. This would result in a solution whose total cost is 3 + 8 + 4 + 6 + 3 = 24. Is this solution optimal (aka is this solution the minimum total time through the factory)? Lets consider what would happen if we had chosen station S1, 1 instead of S2, 1. The entry cost for line 1 is 1, the time required at station S1, 1 is 5 and the transfer time to go to assembly line 2 is 1. So the cost of this new solution is 1 + 5 + 1 + 4 + 6 + 3 = 20, which gives a cheaper solution.

This is called the principle of optimality (optimal substructure property) which states that in order for an overall solution to be optimal, the solution must also give the optimal solutions to every subproblem of the original problem. This problem of solving all subproblems may seem like a daunting task at first, but lets consider the example above again.

Initially, we have a new product and there are two options – either line one or line two. We will need these values in the future, so lets keep track of both choices in the form of a table.

Station 1
cost0 e1 + a1, 1
cost1 e2 + a2, 1

After this initial step, the question becomes given the current path to station j-1, which assembly line can best serve station j? This cam be computed for each j > 1 by
cost1(j) = min{cost1(j-1) + a1, j, cost2(j-1) + t2, j-1 + a1, j}
cost2(j) = min{cost2(j-1) + a2, j, cost1(j-1) + t1, j-1 + a2, j}

As you can see, the calculation of costi(j) relies on the computation of costi(j-1). By calculating these values from station 1 to to station n, we are able to simply look up the values in the table instead of having to recalculate these values.

These give optimal solutions to each of the subproblems. We repeat this same step for all stages j = 2, …, n then we arrive at the final step were we finish the job. Lets define total_cost to indicate the cost of the optimal solution.
total_cost = min{cost1(n) + x1, cost2(n) + x2}

We’d like to see which value minimizes total_cost. Then we can trace back to find the values that minimized cost1 or cost2 at each step depending on which assembly line was chosen. The following algorithm does just this, and stores the assembly line chosen at each state in the variable line.

For the above example, the table would be calculated as follows:

Station 1 Station 2 Station 3 Total Cost
cost1 6 13 18 21
cost2 11 11 17 20

We can reconstruct the optimal path through assembly lines by seeing that we finish by going through station S2, 3.
We arrive at station S2, 3 by going through the assembly line station S2, 2.
We arrive at station S2, 2 by going through the assembly line station S1, 1.

This is precisely the path that is highlighted in the image above.

The algorithm to construct these paths and compute the total_cost for such problems is given below.

Algorithm FastestWay(a, t, e, x, m)
cost1 [<-] e1 + a1, 1
cost2 [<-] e2 + a2, 1
for (j [<-] 2 to n)
if (cost1(j-1) + a1, j [<=] cost2(j-1) + t2, j-1 + a1, j
cost1(j) [<-] cost1(j-1) + a1, j
line1(j) [<-] 1
cost1(j) [<-] cost2(j-1) + t2, j-1 + a1, j
line1(j) [<-] 2

if (cost2(j-1) + a2, j [<=] cost1(j-1) + t1, j-1 + a2, j
cost2(j) [<-] cost2(j-1) + a2, j
line2(j) [<-] 1
cost2(j) [<-] cost1(j-1) + t1, j-1 + a2, j
line2(j) [<-] 2

if (cost1(n) + x1 [<=] cost2(n) + x2)
total_cost = cost1(n) + x1
final_line = 1
total_cost = cost2(n) + x2
final_line = 2

For more information please refer to My Assembly Line Scheduling Examples Page.

Note: I used Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein to help with this post.