Tag Archives: algorithm

ID3 Algorithm Decision Trees

ID3 Algorithm Page Image

As I grow LEARNINGlover.com, I’m always thinking of different ways to expose my own personality through the site. This is partially because it is easier for me to talk about subjects where I am already knowledgeable, but it is more-so being done to help make some of these algorithms and concepts I encode more understandable, and sometimes relating foreign concepts to everyday life makes them easier to understand.

Today, I’d like to write about decision trees, and the ID3 algorithm for generating decision trees in particular. This is a machine learning algorithm that builds a model from a training data set consisting of a feature vector and an outcome. Because our data set consists of an outcome element, this falls into the category of supervised machine learning.

The model that the ID3 algorithm builds is called a decision tree. It builds a tree based on the features, or columns of the data set with a possible decision corresponding to each value that the feature can have. The algorithm selects the next feature by asking “which feature tells me the most about our data set?” This question can be answered first by asking how much “information” is in the data set, and then comparing that result with the amount of information in each individual feature.

In order to execute this algorithm we need a way to measure both the amount the information in outcomes of the overall data set as well as how much each feature tells us about the data set. For the first, we will use entropy, which comes from the field of information theory and encoding. Entropy is based on the question of how many bits are necessary to encode the information in a set. The more information, the higher the entropy, and the more bits required to encode that information. Although we are not encoding, the correlation between high information and high entropy suits our purposes.

Entropy Formula

To understand how much each feature tells us about the outcomes of the data set we will build on the concept of entropy to define the information gain of a feature. Each feature has multiple options, so the dataset can be partitioned based on each possible value of this feature. Once we have this partition, we can calculate the entropy of each subset of the rows of data. We define the information gain of a feature as the sum over all possible outcomes of that feature can have of the entropy of that outcome multiplied by the probability of that outcome.

Information Gain Formula

To illustrate this algorithm, I decided to relate it to the question of whether we think of a character in a novel as a hero or villain. This is interesting because I try to read at least one book a month and as I’m reading, I often find myself asking this question about characters based on the traits of the characters as well as characters I’ve read about. In order to build an interactive script for this problem, I considered 25 possible character traits that could be present. A subset of these 25 character traits will be selected and a row will be generated grading a fictional character on a scale of 0 to 3 (0 meaning that they do not possess the trait at all, 3 meaning that the trait is very strong in their personality), and users will be asked whether they think a character with the given character traits should be listed as a hero or a villain. Then there is a button at the bottom of the script with the text “Build Tree” that executes the ID3 Algorithm and shows a decision tree that could be used to reach the set of decisions given by the user.

The possible features are:
Abstract, Adaptable, Aggressive, Ambition, Anxiety, Artistic, Cautious, Decisive, Honesty, Dutiful, Fitness, Intellect, Independent, Introverted, Lively, Open-minded, Orderly, Paranoid, Perfectionist, Romantic, Sensitive, Stable, Tension, Warmth and Wealthy

Once users select the option to build the tree, there will be several links outlining each step in the process to build this tree. These links will allow for users to expand the information relating to that step and minimize that information when done. Hopefully this will help users to understand each step more. I must say that as much fun as it has been writing this program, there were several questions when trying to explain it to others. Hopefully users get as much fun from using this tool as I had in creating it. As always feel free to contact me with any comments and or questions.

Ok, so here’s a link to the ID3 Algorithm Page. Please check it out and let me know what you think.

Clique Problem Puzzles

I still remember how I felt when I was first introduced to NP-Complete problems. Unlike the material I had learned up to that point, there seemed to be such mystery and intrigue and opportunity surrounding these problems. To use the example from Garey and Johnson’s book “Computers and Intractability: A Guide to the Theory of NP Completeness”, these were problems that not just one researcher found difficult, but that a number of researchers had been unable to find efficient algorithms to solve them. So what they did was show that the problems all had a special relationship with one another, and thus through this relationship if someone were to discover an algorithm to efficiently solve any one of these problems they would be able to efficiently solve all the problems in this class. This immediately got my mind working into a world where I, as a college student, would discover such an algorithm and be mentioned with the heavyweights of computer science like Lovelace, Babbage, Church, Turing, Cook, Karp and Dean.

Unfortunately I was a student so I did not have as much time to devote to this task as I would have liked. In my spare time though I would try to look at problems and see what kind of structure I found. One of my favorite problems was, The Clique Problem. This is a problem where we are given an undirected graph and seek to find a maximum subset of nodes in this graph that all have edges between them, i.e. a clique (Actually the NP-Complete version of this problem takes as input an undirected graph G and an integer k and asks if there is a clique in G of size k).

Although I now am more of the mindset that there do not exist efficient algorithms to solve NP-Complete problems, I thought it would be a nice project to see if I could re-create this feeling – both in myself and others. So I decided to write a program that generates a random undirected graph and asks users to try to find a maximum clique. To test users answers, I coded up an algorithm that works pretty well on smaller graphs, the Bron-Kerbosch Algorithm. This algorithm uses backtracking to find all maximal cliques, which then allows us to sort them by size and determine the largest.

Users should click on the numbers in the table below the canvas indicating the nodes they wish to select in their clique (purple indicates that the node is selected, gray indicates that it is not). Once they have a potential solution, they can press the “Check” button to see if their solution is optimal. If a user is having trouble and simply wishes to see the maximum clique, they can press the “Solve” button. And to generate a new problem, users can press the “New Problem” button.

So I hope users have fun with the clique problem puzzles, and who knows maybe someone will discover an algorithm that efficiently solves this problem and become world famous.

Unidirectional TSP Puzzles

As we’ve entered the late spring into early summer season, I’ve found myself wanting to go out more to sit and enjoy the weather. One of these days recently I sat in the park with a good book. On this occurrence, I decided not to go with a novel as I had just finished “Incarceron“, “The Archer’s Tale“, and “14 Stones” – all of which were good reads, but I felt like taking a break from the novels.

Just as a side note, 14 Stones is a free book available on smashwords.com and I’ve now read about 6 books from smashwords.com and haven’t been disappointed yet. My favorite is still probably “The Hero’s Chamber” because of the imagery of the book, but there are some well written ebooks available there by some good up and coming writers for a reasonable price, with some being free.


So with the desire to read, but not being in the mood for novels I decided to pick up one of my non-text but still educational books that make me think. This day it was “Programming Challenges“. I browsed through the book until I found one that I could lay back, look at the water, and think about how to solve it.


The programming puzzle the peaked my interest was called “Unidirectional TSP”. We are given a grid with m rows and n columns, with each cell showing the cost of using that cell. The user is allowed to begin in any cell in the first column and is asked to reach any cell in the last column using some minimum cost path. There is an additional constraint that once a cell is selected in a column, a cell in the next column can only be chosen from the row directly above, the same row, or the row directly below. There is a javascript version of this puzzle available here.

Fundamentally, the problem is asking for a path of shortest length. Many shortest length problems have a greedy structure, but this one gained my interest because the greedy solution is not always optimal in this case. So I took a moment to figure out the strategy behind these problems. Once I had that solution, I decided that it would be a good program to write up as a puzzle.


In this puzzle version, users will click the cells they wish to travel in each column in which case they will turn green (clicking again will turn them white again). Once the user clicks on a cell in the last column, they will be notified of whether or not they have chosen the minimum path. Or if users are unable to solve a puzzle, the “Solution” button can be pressed to show the optimal path and its cost. 

The Bridge Crossing Problem

Most puzzles are fun in their own right. Some puzzles are so fun that they have the added benefit that they are likely to come up in unexpected places, like maybe in a job interview. I was recently reading a paper by Günter Rote entitled “Crossing the Bridge at Night” where Rote analyzes such a puzzle. Upon finishing the paper, I decided to write a script so that users could see the general form of this puzzle.

The problem can be stated as follows: There is a set of people, lets make the set finite by saying that there are exactly n people, who wish to cross a bridge at night. There are a few restrictions that make crossing this bridge somewhat complicated.

  • Each person has a travel time across the bridge.
  • No more than two people can cross the bridge at one time.
  • If two people are on the bridge together, they must travel at the pace of the slower person.
  • There is only one flashlight and no party (of one or two people) can travel across the bridge without the flashlight.
  • The flashlight cannot be thrown across the bridge, and nobody can go to the store to purchase another flashlight

The image above shows the optimal solution when the 4 people have travel times of 1, 2, 5, and 10. The script I have written allows users to work with different numbers of people with random travel times. Give it a try and see if you can spot the patterns in the solution.

Magical Squares Game

Whether introduced as children in elementary school, as adults in the workplace, or somewhere in between, the concept of magic squares has fascinated people for centuries; The Wikipedia article has discoveries of magic squares dating back to 650 B.C. in China.

Magic Squares of size n (for n >= 3) are n by n grids where the numbers 1, 2, …, n2 are specially arranged such that the sum of each row, column, and diagonal all sum to the same number. Below are two examples of magic squares of size 3 and 4.

8 3 4
1 5 9
6 2 7
4 1 16 13
15 11 2 6
10 14 7 3
5 8 9 12

Notice first, that in the first square all the numbers 1, 2, 3, .., 9 are used in the square. Likewise, the numbers 1, 2, …, 15, 16 are used in the second square. Second notice that each row, column and diagonal sums to 15 in the first square and 34 in the second square.

I have recently published a puzzle that is based on the concept of magic squares. There are some slight differences though.
– First, instead of using the numbers 1, 2, …, n2 a random set of numbers are generated.
– Second, the rows and columns each have a desired sum that we would like the numbers in the row/column to sum to.

These two changes allow for a puzzle concept to be formulated based on the magic square concept. Users take turns swapping elements until the numbers in each row and column sum to the number in their goal cell, which is located in the last column or row of the grid. Above is an image of a solved puzzle.

Users can determine their progress the numbers in the next-to-last column, which tells the current sum of the numbers in that respective row or column.

To swap two numbers, first click on a cell with one of the two numbers in it. The cell should then turn red. Then click on the cell with the other number in it and the numbers should swap. If you click on the same cell twice no action should take place (except for the cell to turn red and then blank again).

Take a moment to check out the puzzles and let me know what you think.

The A* Algorithm

As a child I remember traveling on road trips, sitting in the back of a car trying to do my best to keep myself busy for what would occupy the next six to ten hours of my life. One of the things I grew to like were the simple maze books that were sold on the magazine racks at some of the gas stations where we’d stop for food. There are two basic strategies I employed for solving these mazes: For simpler mazes, I could generally just take a look at the overall maze structure, decide upon a path through the maze, then write a path without any mistakes. For more complex mazes though, I would generally begin a route that looks the most promising. If that route reaches a point where I can see that it will be impossible to finish, then I’d go back to where the decision was made, exclude the route I had just tried, and select the “most promising” remaining route. I’d continue this process until I had completed the maze, or until it became simple enough for me to solve the maze using only my memory.

The A* Algorithm works in a similar manner to the second approach I just described. We begin at a starting point, and consider where to move next from that starting point. The set of possible options for this next move is determined by the neighbor function for a given cell. For each neighbor the algorithm estimates the length of the route through that cell by calculating the sum of two values, f(n) = g(n) + h(n), where

g(n) is the (known) cost to travel from the starting point to the cell n.
h(n) is the (approximate) cost to travel from the cell n to the final cell.

The sum g(n) + h(n) allows us to approximate the total cost of a route through the cell n.

The A* algorithm begins at the starting position of the maze. There are two sets we will be considering throughout the process of determining the optimal route, called the closedSet and the openSet. The elements of closedSet are the nodes whose total distance from the starting position has been calculated, whereas the elements of openSet represents nodes whose total distance is still under consideration. There is also a map called prev which is used to reconstruct the path. Below is how the algorithm operates:

A* Algorithm Pseudocode
closedSet is the empty set
openSet = {start}
prev is the empty set

While there are still elements in openSet,
     Find the element c* in openSet with the lowest value f(c).
     If c* is the target position
          Reconstruct the path.
     Else If c* is not the target position,
          Remove c* from openSet
          Add c* to closedSet
          For each neighbor n of c* that is not in closedSet,
               Calculate a temporary g value, temp_g(n) = g(c*) + dist(c*, n).
               If n is not in openSet, or if n is in openSet and temp_g(n) < g(n),                     Set prev(n) = c*                     Set g(n) = temp_g(n)                     Set f(n) = g(n) + h(n).                     if n is not in openSet                          add n to openSet.      End If End While End Algorithm An important question becomes what makes a good heuristic function to approximate the distance to the goal. This can lead to an in depth discussion based on the word "good", but the necessary condition for any heuristic is that it NEVER over-estimates the cost of the path from the cell to the goal. Some examples of possible heuristics for mazes are the Euclidean Distance (the square root of the sum of the squares of the horizontal and vertical differences in distances, i.e. dE(x, y) = (i(xi – yi)2) and TaxiCab Distance (the sum of the differences in the horizontal and vertical dimensions, i.e. dT(x, y) = i|xi – yi|). Both of these are feasible metrics for heuristics on a maze. Other herusitics, like h(n) = 0 for all n are possible, but doing this would make the algorithm treat all cells equally and ignore the heuristic part of the A* algorithm, turning it into Dijkstra’s algorithm.

I’ve written a script that generates random mazes and uses the A* algorithm to find the optimal path through this maze. For this script, I used the taxicab distance heuristic.

Check it out and let me know what you think.

Naive Bayesian Classification

Suppose there is a sequence of events that took place e1, …, ene, with each event belonging to a certain classification group g1, …, gng. Then the problem of determining which of these groups a new event belongs is the classification problem.

A naive Bayes classifier will determine to which of the possible classification groups a new observation belongs with the (naive) assumption that every feature of this new observation has no relationship to any other feature of this observation.

This assumption of independence of the columns of the feature vector allows us to use Bayes’ Theorem to determine the probability that the new observation will belong to each of the observation groups. Bayes’ Theorem will then say that the probability this new observation belongs to a classification group, given the features is equal to the probability of the occurrence of that classification group in the observed data (i.e. P(C)) multiplied by the conditional probability of the joint distribution of the features given the same classification group P(F1, …, Fnf. The naive assumption allows us to quickly calculate the joint distribution of the features, given the classification group as the product of each feature given that same classification group.

This can be written as:

P(C | F1, …, Ffn) =
P(C) P(F1, …, Ffn | C)
P(F1, …, Ffn)
= P(C) P(F1 | C) * … * P(Ffn | C
P(C) P(F1, …, Ffn | C)
P(F1, …, Ffn)
= P(C) i = 1 to nfP(Fi | C)
P(C) P(F1, …, Ffn | C)
P(F1, …, Ffn)

So suppose we have observations that give the following data:
P(F1 | N) = 0.286
P(F2 | N) = 0.143
P(F3 | N) = 0.429
P(F4 | N) = 0.429

P(F1 | Y) = 0.143
P(F2 | Y) = 0.571
P(F3 | Y) = 0.571
P(F4 | Y) = 0.571

P(Y) = 0.5
P(N) = 0.5

Then P(N | F1, F2, F3, F4) = (0.5 * 0.286 * 0.143 * 0.429 * 0.429)
= 0.008

Then P(Y | F1, F2, F3, F4) = (0.5 * 0.143 * 0.571 * 0.571 * 0.571)
= 0.001

After we normalize the two terms, we wind up with Then P(N | F1, F2, F3, F4) = .2204


Then P(Y | F1, F2, F3, F4) = .7796

So the naive Bayes classifier says that the likely classification group for this observation is Y.

Check out my examples page for more examples of the naive Bayes classifier.

Hierarchical Clustering

Hierarchical Clustering algorithms give a nice introduction for computer science students to unsupervised machine learning. I say this because the bottom-up approach to Hierarchical clustering (which I have implemented here) is very similar to Kruskal’s algorithm for finding the minimum spanning tree of a graph.

In Kruskal’s algorithm, we begin by creating a forest, or a set of trees where each node is its own tree. The algorithm then selects the two trees that are closest together (closest being defined as the minimum cost edge between two distinct trees) and merges those trees together. This process of merging the closest two trees is then repeated until there is only one tree remaining, which is a minimum spanning tree of the graph.

Similarly, bottom-up hierarchical clustering of a group of points begins by saying that each point is its own cluster. Then the clusters are compared to one another to check if two clusters will be merged into one. Generally, there will be some stopping criteria, , saying that we do not want to merge two clusters together if their distance is greater than . So if the minimum distance between two clusters is less than we will proceed as in Kruskal’s algorithm by merging these two clusters together into one cluster. We repeat this process of merging the closest two clusters together until we find that the minimum distance between two clusters is greater than or equal to , in which case we can stop and the result is a partition of our data set into distinct clusters.

Hierarchical clustering is comparable to K-Means Clustering. Here are some differences between the two approaches:

  1. K-Means Clustering requires an initial number of desired clusters, while Hierarchical clustering does not.
  2. A run of K-Means Clustering will always give K clusters, whereas Hierarchical Clustering can give more or less, depending on our tolerance .
  3. K-Means can undo previous mistakes (assignments of an element to the wrong cluster), while Hierarchical Clustering cannot.

So, here is a link to my page on Hierarchical Clustering. Hope you enjoy.

Dots and Boxes Game

Dots and Boxes Game

When I was in high school, one of my favorite ways to waste time in class (not recommended) was to play a game called dots and boxes (although at the time we just called it dots). I was very surprised to find later that this game belongs to a class of games called “Impartial Combinatorial Games”. These are games where the moves available to the player depend only on the position of the game, and not the player.

In a game of Dots and Boxes, we start with an initial grid with dots at each row and column intersection. At each player’s turn, they have the option of drawing either a horizontal or vertical line between two neighboring dots (depending on if the dots are in the same row or column). If a player fills in the last line on a box (the 4th side), we say that player “owns” the box. The game ends when there are no neighboring dots without a line between them. At the conclusion of the game, the player who owns the most dots is declared the winner.

The game is impartial because there is no restriction on which move a player can make other than the fact that a player cannot re-do a move that has already been made (a partial version of this game would be if player one could only move horizontally and player two could only move vertically).

I have implemented a javascript version of this game. Check it out and let me know what you think.

I also spoke earlier about the discovery that this game in particular was an active area of research. I wanted to provide a link to a paper entitled “Solving Dots and Boxes” by Joseph K. Barker and Richard E Korf that speaks about winning strategies for each player in a game of dots and boxes.