Tag Archives: entropy

ID3 Algorithm Decision Trees

ID3 Algorithm Page Image

As I grow LEARNINGlover.com, I’m always thinking of different ways to expose my own personality through the site. This is partially because it is easier for me to talk about subjects where I am already knowledgeable, but it is more-so being done to help make some of these algorithms and concepts I encode more understandable, and sometimes relating foreign concepts to everyday life makes them easier to understand.

Today, I’d like to write about decision trees, and the ID3 algorithm for generating decision trees in particular. This is a machine learning algorithm that builds a model from a training data set consisting of a feature vector and an outcome. Because our data set consists of an outcome element, this falls into the category of supervised machine learning.

The model that the ID3 algorithm builds is called a decision tree. It builds a tree based on the features, or columns of the data set with a possible decision corresponding to each value that the feature can have. The algorithm selects the next feature by asking “which feature tells me the most about our data set?” This question can be answered first by asking how much “information” is in the data set, and then comparing that result with the amount of information in each individual feature.

In order to execute this algorithm we need a way to measure both the amount the information in outcomes of the overall data set as well as how much each feature tells us about the data set. For the first, we will use entropy, which comes from the field of information theory and encoding. Entropy is based on the question of how many bits are necessary to encode the information in a set. The more information, the higher the entropy, and the more bits required to encode that information. Although we are not encoding, the correlation between high information and high entropy suits our purposes.

Entropy Formula

To understand how much each feature tells us about the outcomes of the data set we will build on the concept of entropy to define the information gain of a feature. Each feature has multiple options, so the dataset can be partitioned based on each possible value of this feature. Once we have this partition, we can calculate the entropy of each subset of the rows of data. We define the information gain of a feature as the sum over all possible outcomes of that feature can have of the entropy of that outcome multiplied by the probability of that outcome.

Information Gain Formula

To illustrate this algorithm, I decided to relate it to the question of whether we think of a character in a novel as a hero or villain. This is interesting because I try to read at least one book a month and as I’m reading, I often find myself asking this question about characters based on the traits of the characters as well as characters I’ve read about. In order to build an interactive script for this problem, I considered 25 possible character traits that could be present. A subset of these 25 character traits will be selected and a row will be generated grading a fictional character on a scale of 0 to 3 (0 meaning that they do not possess the trait at all, 3 meaning that the trait is very strong in their personality), and users will be asked whether they think a character with the given character traits should be listed as a hero or a villain. Then there is a button at the bottom of the script with the text “Build Tree” that executes the ID3 Algorithm and shows a decision tree that could be used to reach the set of decisions given by the user.

The possible features are:
Abstract, Adaptable, Aggressive, Ambition, Anxiety, Artistic, Cautious, Decisive, Honesty, Dutiful, Fitness, Intellect, Independent, Introverted, Lively, Open-minded, Orderly, Paranoid, Perfectionist, Romantic, Sensitive, Stable, Tension, Warmth and Wealthy

Once users select the option to build the tree, there will be several links outlining each step in the process to build this tree. These links will allow for users to expand the information relating to that step and minimize that information when done. Hopefully this will help users to understand each step more. I must say that as much fun as it has been writing this program, there were several questions when trying to explain it to others. Hopefully users get as much fun from using this tool as I had in creating it. As always feel free to contact me with any comments and or questions.

Ok, so here’s a link to the ID3 Algorithm Page. Please check it out and let me know what you think.

Visualizing Huffman Coding Trees

Huffman Coding
Image of Huffman Tree

Here is a link to a script I finished to help visualize the Huffman Coding Algorithm.

What would you do if you wanted to transfer a message, say one written in English but you only had a limited set of characters. Suppose these characters are 0 and 1. The only way of doing this is by writing some type of procedure to transfer from our 26 letter alphabet to the 0-1 binary alphabet. There are several ways of developing these encoding functions, but we will focus on those that attempt to translate each individual character into a sequence of 0s and 1s. One of the more popular such codes today is the ASCII code, which maps each character to a binary string (of 0s and 1s) of length 8. For example, here is the ASCII code for the upper and lower case alphanumeric characters.

ASCII English
01100001 a
01100010 b
01100011 c
01100100 d
01100101 e
01100110 f
01100111 g
01101000 h
01101001 i
01101010 j
01101011 k
01101100 l
01101101 m
01101110 n
01101111 o
01110000 p
01110001 q
01110010 r
01110011 s
01110100 t
01110101 u
01110110 v
01110111 w
01111000 x
01111001 y
01111010 z

What you notice from this is that each of these encodings beings with “011”, which amounts to a lot of wasted space. ASCII code doesn’t care about this because the fixed length of each binary string allows for easy lookup of particular characters (i,e, you can start almost anywhere in the string with your decomposition as long as you start at a multiple of 8).

But what if we were interested in minimizing the total bits used by the encoded string? This is where the Huffman coding algorithm gains its fame. Unlike the ASCII coding scheme, Huffman codes assign shorter codes to the more frequently occurring characters in your string. Huffman was able to prove this tactic would guarantee the shortest possible encoding.

The Huffman Coding procedure operates as follows:
1. Input string to be encoded -> Input
2. For each character in the input string, calculate the frequency of that character (i.e. the number of times it occurs in the input)
3. Sort the array of characters in the input by their decreasing frequencies
4. Place the array of characters into the queue with each one represented by a node.
5. While there are two or more nodes remaining in the queue.
6. Remove the nodes representing the two characters with the lowest frequency from the queue.
7. Create a node which points to the two nodes just removed from the queue (node -> left points to one node; node -> right points to the other).
8. Insert this new node into the queue, with the frequency equal to the sum of the frequencies of the nodes it points to.
9. If the length of the queue is greater than 1, then goto 5.

Other Blogs that have covered this topic:
Techno Nutty
billatnapier