# Visualizing Huffman Coding Trees

Here is a link to a script I finished to help visualize the Huffman Coding Algorithm.

What would you do if you wanted to transfer a message, say one written in English but you only had a limited set of characters. Suppose these characters are 0 and 1. The only way of doing this is by writing some type of procedure to transfer from our 26 letter alphabet to the 0-1 binary alphabet. There are several ways of developing these encoding functions, but we will focus on those that attempt to translate each individual character into a sequence of 0s and 1s. One of the more popular such codes today is the ASCII code, which maps each character to a binary string (of 0s and 1s) of length 8. For example, here is the ASCII code for the upper and lower case alphanumeric characters.

 ASCII English 01100001 a 01100010 b 01100011 c 01100100 d 01100101 e 01100110 f 01100111 g 01101000 h 01101001 i 01101010 j 01101011 k 01101100 l 01101101 m 01101110 n 01101111 o 01110000 p 01110001 q 01110010 r 01110011 s 01110100 t 01110101 u 01110110 v 01110111 w 01111000 x 01111001 y 01111010 z

What you notice from this is that each of these encodings beings with “011”, which amounts to a lot of wasted space. ASCII code doesn’t care about this because the fixed length of each binary string allows for easy lookup of particular characters (i,e, you can start almost anywhere in the string with your decomposition as long as you start at a multiple of 8).

But what if we were interested in minimizing the total bits used by the encoded string? This is where the Huffman coding algorithm gains its fame. Unlike the ASCII coding scheme, Huffman codes assign shorter codes to the more frequently occurring characters in your string. Huffman was able to prove this tactic would guarantee the shortest possible encoding.

The Huffman Coding procedure operates as follows:
1. Input string to be encoded -> Input
2. For each character in the input string, calculate the frequency of that character (i.e. the number of times it occurs in the input)
3. Sort the array of characters in the input by their decreasing frequencies
4. Place the array of characters into the queue with each one represented by a node.
5. While there are two or more nodes remaining in the queue.
6. Remove the nodes representing the two characters with the lowest frequency from the queue.
7. Create a node which points to the two nodes just removed from the queue (node -> left points to one node; node -> right points to the other).
8. Insert this new node into the queue, with the frequency equal to the sum of the frequencies of the nodes it points to.
9. If the length of the queue is greater than 1, then goto 5.

Other Blogs that have covered this topic:
Techno Nutty
billatnapier

# Understanding Bayes’ Theorem

I’ve finished a script that helps understand Bayes’ Theorem.

If we have a set of mutually exclusive (aka non-overlapping) sets Bi for i {0, 1, 2, …, n} for some integer n, then the union of these sets forms a sample space. Lets call the sample space S. Suppose that we also have some set (also known as an event) A which is also a subset of S. Bayes’ Theorem considers the probability that one of these mutually exclusive events (one of the Bi‘s) caused the observed event (A).

This probability can be calculated by the formula

 Pr(Bj | A) = Pr(Bj) Pr(A | Bj) Pr(Bi) Pr(A | Bi)

The theorem helps us determine the the probability of the event Bj given A, or in more plain English, the probability that the event Bj is the cause that gives rise to the observed event A. The numerator is given by the product of of the probability of the causal event (Pr(Bj) times the conditional probability of the observed event given the causal event (Pr(A | Bj)). This numerator could be replaced by its equivalent statement of the set A Bj. Likewise, the denominator the sum (over all the causal events) of the probaility of each causal event times the conditional probability of the observed event given that particular causal event. Each term in this denominator could be replaced b its equivalent staetment A Bi, which when summed give the total probability of A because each pair of the Bj‘s is mutually exclusive. So we are able to replace the probability of A with Pr(Bi) Pr(A | Bi) because of the fundamental law of probability.

An example that would use Bayes’ Theorem is analyzing the results of an election. The set of mutually exclusive events could be membership in a political party (Democrat, Republican, or Independent). The observed event could be the election of an individual. And the conditional distributions could be the percentage of each party that voted for this individual. If we want to calculate how significant each party was to the individual’s election, we’d use Bayes’ Theorem.

The script I’ve written to help understand Bayes’ Theorem works as follows:
– A set of mutually exclusive sets is randomly generated (the number of sets also varies). These sets are called Bi for i (0, …, n}.
– A set A is randomly generated from the union of the Bi‘s.
– A table is displayed showing:
Pr(Bi) for each i on line 1.
Pr(A | Bi) for each i on line 2.

– The user is given the option to select which of the mutually exclusive sets they would like to use to calculate the probability that this set caused the event A.
– Once a set is chosen, the user clicks the “Calculate Conditional” button and Bayes’ Theorem gives the result.
– If the “show work” checkbox was checked, then the steps used in this calculation are also shown.
– All work is done using fractions to give an idea of where the numbers come from.

Other Blogs that have covered this topic:
Better Explained
Bayes’ Theorem-qed