Tag Archives: learning

Hierarchical Clustering

Hierarchical Clustering algorithms give a nice introduction for computer science students to unsupervised machine learning. I say this because the bottom-up approach to Hierarchical clustering (which I have implemented here) is very similar to Kruskal’s algorithm for finding the minimum spanning tree of a graph.

In Kruskal’s algorithm, we begin by creating a forest, or a set of trees where each node is its own tree. The algorithm then selects the two trees that are closest together (closest being defined as the minimum cost edge between two distinct trees) and merges those trees together. This process of merging the closest two trees is then repeated until there is only one tree remaining, which is a minimum spanning tree of the graph.

Similarly, bottom-up hierarchical clustering of a group of points begins by saying that each point is its own cluster. Then the clusters are compared to one another to check if two clusters will be merged into one. Generally, there will be some stopping criteria, , saying that we do not want to merge two clusters together if their distance is greater than . So if the minimum distance between two clusters is less than we will proceed as in Kruskal’s algorithm by merging these two clusters together into one cluster. We repeat this process of merging the closest two clusters together until we find that the minimum distance between two clusters is greater than or equal to , in which case we can stop and the result is a partition of our data set into distinct clusters.

Hierarchical clustering is comparable to K-Means Clustering. Here are some differences between the two approaches:

  1. K-Means Clustering requires an initial number of desired clusters, while Hierarchical clustering does not.
  2. A run of K-Means Clustering will always give K clusters, whereas Hierarchical Clustering can give more or less, depending on our tolerance .
  3. K-Means can undo previous mistakes (assignments of an element to the wrong cluster), while Hierarchical Clustering cannot.

So, here is a link to my page on Hierarchical Clustering. Hope you enjoy.

Introduction to JavaScript Programming

Here is a link to my sample JavaScript code.

I received a lot of attention from friends interested in programming after my recent blog post entitled “Introduction to Python Programming”. While many found it interesting, the fact that Python is more useful to mathematicians hindered sine of my friends desire to learn it as their first language.

In out conversations, my recommendation for a first language was JavaScript. This is a powerful language in the sense that just about anybody who is involved with the internet knows it, and it’s likely to boost a person’s resume. It also has many similarities to more powerful languages like C++ and Java, so while not trivial, it could be a good launch pad into more advanced languages. But my favorite reason is that unlike many other programming languages that rely in an MS-DOS like command like approach for run time interaction, JavaScript’s basic interaction is with the standard internet browsers we use everyday. There isn’t even anything you need to download or install. Just create a basic HTML file in a text editor (like notepad, wordpad, or notepad++). This makes it easier to show off your creations which makes learning more fun.

The script I’ve finished provides examples on writing output, declaring variables, data types, conditionals, loops, and functions. Although I do not go into detail about all the events and objects on an HTML page, I do finish with three examples of more advanced JavaScript programs. Once you’ve selected a program, the code well be revealed in the text area. There is also a button that, when clicked, will execute that script on a new HTML tab.

I hope you enjoy, and let me know if you have any suggestions or comments.

With that being said, here is a link to my sample JavaScript code.

Learning About Truth Tables

Here is a link to my sample Truth Table Generator.

Truth Tables Generator Image

An important part of mathematics and computer programming is understanding conditional expressions. These are statements that generally read like “if [condition is true], then [execute a sequence of statements]“. A simple example of this is that if we wanted to print out every even number, our conditional would be if (2 divides into x with remainder 0) then print x, or in JavaScript

if (x % 2 == 0)
{
document.write(“x”);
}

Conditional expressions belong to the world of Boolean logic. These are expressions that evaluate to true or false, depending on the values of the variables involved in this expression. When we are dealing with real world examples, this is generally a statement like “X is an even number” (for some number X) or “The element x is in the set Y” (for some element x and some set Y). Notice that both the statements can be evaluated as true or false statements. We are interested in understanding the Boolean logic behind combining a number of these expressions, and understanding how the evaluation of the simpler expressions help determine the values of the more complex formulas.

One way of doing this in mathematics is by constructing a truth table. A truth table is a table that shows how a Boolean expression’s value can be computed. The procedure in constructing a truth table is to first add a column to the table for each variable involved in the expression. Then we compute the value of each sub-expression of the expression in its own column until we have computed the entire expression in the final column.

There are four logical operators that we will be working with

– The negation operator (¬P), which returns true if the variable P is false, and returns false otherwise.
– the or operator (P ? Q), which returns true if P is true or Q is true, or if both are true, and returns false otherwise.
– the and operator (P ? Q), which returns true if both P and Q are true, and returns false otherwise.
– the implies operator (P ? Q), which returns false if P is true and Q is false, and returns true otherwise.

An example of this is below:

Suppose we have the following formula:
((Q ? P) ? (¬ (Q ? R)))

The truth table would then be:

P Q R (Q ? R) (¬ (Q ? R)) (Q ? P) ((Q ? P) ? (¬ (Q ? R)))
F F F F T F T
F F T F T F T
F T F F T F T
F T T T F F T
T F F F T F F
T F T F T F F
T T F F T T T
T T T T F T T

First, notice that the truth table has 8 rows. This corresponds to the 8 distinct possible combinations of values for the three variables. The number 8 is also = 23, which is not a mistake. In general, if an expression has n variables, its corresponding truth table will contain 2n rows. In this truth table, column 4 represents the sub-expression (Q ? R). Notice that the only times this expression evaluates to true is when both Q nd R are true. The next column is (¬ (Q ? R)), the negation of (Q ? R), so the values in this column are the opposite of those in the previous column. We follow that up with the column for the sub-expression (Q ? P), which has a value of true only when the variables Q and P both have values of true. And the final column is the expression we started with ((Q ? P) ? (¬ (Q ? R))), but we can now evaluate the expression based on the two previous columns. Notice that the values of true in this column only correspond to when at least one of the previous two columns evaluated to true.

Check out my script on truth tables to see more examples and learn more about truth tables.

Introduction to Python Programming

Here is a link to my sample Python code.

One of the somewhat unforeseen consequences of taking a career in applied mathematics, particularly in this day and age, is that you will eventually need to write computer programs that implement the mathematical algorithms. There are several languages in which one can do this, each with its own positives and negatives and you will find that things that are simple in some are difficult in another. After speaking with a number of people, both students and professionals who work with mathematics on a regular basis, I reasoned that it may be helpful to provide some source code examples to help mathematicians get started with programming in some of these languages. I decided to start with Python because its a powerful language, available for free, and its learning curve isn’t too steep.

I will be working with Python 2.7, which can be downloaded from https://www.python.org/download/releases/2.7/. I understand, however, that a limitation to coding is the required setup often necessary before one can even write their first line of code. So while I do encourage you to download Python, I will also provide a link to the online Python compiler at Compile Online, which should allow users to simply copy and paste the code into a new tab in their browser and by simply clicking the “Execute Script” command in the upper left corner, see the output of the code.

With that being said, here is a link to my sample Python code.

Hidden Markov Models: The Backwards Algorithm

I just finished working on LEARNINGlover.com: Hidden Marokv Models: The Backwards Algorithm. Here is an introduction to the script.

Suppose you are at a table at a casino and notice that things don’t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of ‘loaded’ dice that the casino is using and how these dice are loaded. To accomplish this you set up a number of Hidden Markov Models, where the number of loaded die are the latent variables, and would like to determine which of these, if any is more likely to be using.

First lets go over a few things.

We will call each roll of the dice an observation. The observations will be stored in variables o1, o2, …, oT, where T is the number of total observations.

To generate a hidden Markov Model (HMM) we need to determine 5 parameters:

  • The N states of the model, defined by S = {S1, …, SN}
  • The M possible output symbols, defined by = {1, 2, …,M}
  • The State transition probability distribution A = {aij}, where aij is the probability that the state at time t+1 is Sj, given that the state at time t is Si.
  • The Observation symbol probability distribution B = {bj(k)} where bj(k) is the probability that the symbol k is emitted in state Sj.
  • The initial state distribution = {i}, where i is the probability that the model is in state Si at time t = 0.

The HMMs we’ve generated are based on two questions. For each question, you have provided 3 different answers which leads to 9 possible HMMs. Each of these models has its corresponding state transition and emission distributions.

  • How often does the casino change dice?
    • 0) Dealer Repeatedly Uses Same Dice
    • 1) Dealer Uniformly Changes Die
    • 2) Dealer Rarely Uses Same Dice
  • Which sides on the loaded dice are more likely?
    • 0) Larger Numbers Are More Likely
    • 1) All Numbers Are Randomly Likely
    • 2) Smaller Numbers Are More Likely
How often does the casino change dice?
Which sides on
the loaded dice
are more likely?
(0, 0) (0, 1) (0, 2)
(1, 0) (1, 1) (1, 2)
(2, 0) (2, 1) (2, 2)

One of the interesting problems associated with Hidden Markov Models is called the Evaluation Problem, which asks the question “What is the probability that the given sequence of observations O = o1, o2, …, oT are generated by the HMM . In general, this calculation, p{O | }, can be calculated by simple probability. However because of the complexity of that calculation, there are more efficient methods.

The backwards algorithm is one such method (as is the forward algorithm). It creates an auxiliary variable t(i) which is the probability that the model has generated the partially observed sequence ot+1, …, oT, where 1 t T. This variable can be calculated by the following formula:

t(i) = j = 1 to N(t+1(j) * aij * bj(ot+1))

We also need that T(i) = 1, for 1 i N.

Once we have calculated the t(j) variables, we can solve the evaluation problem by p{O | } i = 1 to N1(i)

There is more on this example at LEARNINGlover.com: Hidden Marokv Models: The Backwards Algorithm.

Some further reading on Hidden Markov Models:

Approximating the Set Cover Problem

Set Cover Problem Instance

I just finished my weekly task of shopping for groceries. This can be a somewhat daunting task because I generally have a list of things that I’ll need which cannot all be purchased at a single location. What often happens is that I find that many of the items on my list are ONLY offered at certain stores – generic brands of certain items for example. My goal then changes from minimizing the total amount of money spent to minimizing the number of stores that I must visit to purchase all of my items.

To formulate this as a mathematical problem, suppose that I have a grocery list of items I would like to buy, represented by the lists item1, item2, …, itemn, where n represents the number of items I have on this list. Suppose also that there are stores Store1, Store2, …, Storem (each one distinct) that offer some combination of items I have on my list. What I would like to do is minimize the number of stores I have to visit to purchase these items.

The problem I just described is famous because it is one that many people face on a regular basis. In a more general form, it is so famous that it has a name for it, called the Set Cover Problem (or the Minimum Set Cover Problem). In the general form of this problem, we replace the grocery list with a set of items called our universe. The lists of items offered at each store are the collections of subsets of the universe. In the problem, as in the example above, we would like to select enough subsets from this collection that we are able to obtain every element in our universe. We would like to do this with as low a number of sets as possible.

In my previous post, I described the 21 problems that Karp proved were NP-Complete. Set Cover was one of those problems, showing that this is a hard problem to solve. What I will do is introduce three ways to reach a near-optimal solution relatively quickly.

Greedy Method

One of the first approaches one may take to solve this problem is to repeatedly select the subset that contains the most new items. That’s how the greedy approach to set cover operates. The method knows to terminate when all elements belong to one of the selected sets. In the shopping example above, this would be accomplished by visiting the store that had the most items on my list and purchasing those items at this store. Once this is done, the items that have been purchased can be crossed off my list and we can visit the store with the most items on my remaining list, stopping when the list is empty.

Linear Programming Relaxation

Instead of stating the set cover problem with words, there is a way of describing the situation with mathematical inequalities. For instance, suppose that the soap I like to purchase is only available at stores Store1, Store4 and Store9. Then I could introduce a variable xi for each store i and the requirement that I purchase this soap can be restated as :

x1 + x4 + x9 greater than or equals 1

Because we can either purchase some items or not purchase these items, each variable xi is 0 or 1 (called a binary variable). We can introduce similar constraints for each element in our universe (or on our grocery list). These inequalities (called constraints) have the form:

for each element e in U, sumi | e in Si xi greater than or equals 1

Our goal of minimizing the number of sets chosen (stores visited) can be stated by the objective function:
minimize sum1 less than or equals i less than or equals n xi

So the mathematical formulation for this problem can be stated as

minimize sum1 less than or equals i less than or equals n xi
Subject to
for each element e in U, sumi | e in Si xi greater than or equals 1
for each set i, xi in {0, 1}.

Formulations of this type, where variables are restricted to a finite set (in this case the x variables being either 0 or 1) are called integer programs. Unfortunately, there is no easy way to solve these formulations either. However, there is a related problem which can be solved quickly.

Instead of restricting the x variables to the values of 0 or 1, we could allow them to take on any value within this range, i.e. 0 less than or equals xi less than or equals 1 for each set Si. Doing this converts the problem from an integer programming problem into a linear programming problem (called the LP-Relaxation), which can be solved quickly. The issue with this method though is that the solution obtained by an LP-Relaxation is not guaranteed to be an integer. In this case, how do we interpret the values xi?

Randomized Rounding Method

One approach to dealing with a non-integer solution to the LP-Relaxation is to treat the xi values as probabilities. We can say that xi is the probability that we select set i. This works because each value of xi is in the range of 0 to 1, which is necessary for a probability. We need to repeatedly select sets with their associated probabilities until all elements in our universe are covered. Selecting our sets based on this procedure is the randomized rounding approach.

Deterministic Rounding Method

A second approach to dealing with a non-integer solution to the LP-Relaxation is to base our solution on the most occurring element. If we let f be this frequency (i.e.the number of sets that the most occurring element occurs in), then we can define a solution by selecting set i if the LP=Relaxation solution gives the variable xi a value of at least (1/f).

None of these three approaches is guaranteed to give an optimal solution to an instance of this problem. I will not go into it in this post, but these can all be shown to be within some guaranteed range of the optimal solution, thus making them approximation algorithms.

You can see how the three algorithms compare on random problem instances here.

Hope you enjoy.

Triangle Trigonometry

Triangle Script Image

I haven’t forgotten about my pledge to focus more content here towards some of the areas I’ve been asked to tutor on recently. This latest one is designed to help users understand the properties of triangles. It is based on two laws that we learn in trigonometry: the law of sines and the law of cosines. Assume that we have a triangle with sides of lengths a, b, and c and respective angles A, B and C (where the angle A does not touch the side a, the angle B does not touch the side b, and the angle C does not touch the side c). These laws are as stated as follows:

Law of Sines

a
sin(A)
=
b
sin(B)
=
c
sin(C)

Law of Cosines
c2 = a2 + b2 – 2*a*b*cos(C)

We can use these laws to determine the sides of a triangle given almost any combination of sides and angles of that triangle (the only one we cannot determine properties from is if we are given all three angles, as this leads to many solutions).

The script generates random triangles, with different combinations of sides and angles revealed and the user’s job is to try to determine the missing sides. There is a button to reveal the solution, or if you’d like to see how we arrive at these values, you can check the “Show work” box.

Hope you enjoy.

Other Blogs covering this topic:
Mathematical!
Algebra 2 Trig

Sudoku Program Updates

Picture of Sudoku Page

Here in DC, we recently had an unexpected snow day. By the word unexpected, I don’t mean that the snow wasn’t forecast – it was definitely forecast. It just never came. However due to the forecast I decided to avoid traffic just in case the predictions were correct. So while staying at home, I began thinking about some things that I’ve been wanting to update on the site and one thing that came up was an update to my Sudoku program. Previously, it contained about 10000 sample puzzles of varying difficulty. However, I told myself that I would return to the idea of generating my own Sudoku puzzles. I decided to tackle that task last week.

The question was how would I do this. The Sudoku solver itself works through the dancing links algorithm which uses backtracking, so this was the approach that figured as most likely to get me a profitable result in generating new puzzles (I have also seen alternative approaches discussed where people start with an initial Sudoku and swap rows and columns to generate a new puzzle). The next question was how to actually implement this method.

Here is an overview of the algorithm. I went from cell to cell (left to right, and top to bottom starting in the top left corner) attempting to place a random value in that cell. If that value can be a part of a valid Sudoku (meaning that there exists a solution with the current cells filled in as is), then we continue and fill in the next cell. Otherwise, we will try to place a different value in the current cell. This process is continued until all cells are filled in.

The next step was to create a puzzle out of a filled in Sudoku. The tricky about this step is that if too many cells are removed then we wind up generating a puzzle that has multiple solutions. If too few cells are removed though, then the puzzle will be too easy to solve. Initially, I went repeatedly removed cells from the locations that were considered the most beneficial. This generally results in a puzzle with about 35-40 values remaining. To remove additional cells, I considered each of the remaining values and questioned whether hiding the cell would result in the puzzle having multiple solutions. If this was the case, then the cell value was not removed. Otherwise it was. As a result I now have a program that generates Sudoku puzzles that generally have around 25 hints.

You should give it a try.

The PageRank Algorithm

I think one of the best recent examples of the importance of mathematics is the rise of the search engine Google. I remember the world of search engines before Google and it was dominated by names like AltaVista, Yahoo, WebCrawler, Excite, and the likes. The standard way these search engines ranked the order that pages would be listed on a search query was basically to count the number of times that query appeared on pages in their database. The pages with the most listings were considered the most important, the second most listings were second most important and so on and so forth.

This sounds like a feasible way of doing things but let me show you an example of how this can be tricked. Suppose I wrote my first web page and it looked like the following:

That’s a basic web page that may not garner much attention, and it wouldn’t rank highly in most search engines as no work appears more than once. Suppose that, this being a math web page, I wanted it to rank higher on the query “math”. Then I could just edit the source code of the page to be as follows:

This second page says not much more than the first, but the fact that the word math appears 9 additional times would increase the ranking of this page among math pages. This is a very simple example, but it shows how these search engine rankings did not have a useful metric for determining the important sites on the web.

Enter Google.

The way Google solved this problem of determining the importance of a web page is basically by counting the number of links into a web page – the theory being that the more important a web page is, the more people will be talking about it and thus linking to it. Also, the more important the people talking about (linking to) a web site, the more important that site is. This can be expressed mathematically by the following formula:

In the above formula, the variable d is called the damping factor, which helps to capture some of the random nature of the internet by saying that every site should have at least some minimal worth because of the idea that a random surfer could still get to these sites.

I have written a script to implement the algorithm here.

Other Blogs that have covered this topic
Blue Onion

The Risk of Competition

The Risk of Competition

I’m not a competitive person. Let me correct that. I try not to be a competitive person. I’ve recently been playing some of my favorite games from childhood like Monopoly, Chess, Spades, and Madden and I’ve been reminded of the competitive streak in me that hates losing. This streak has been relaxed in much of my adult life, and I tend to think that’s been to my benefit. Two pieces come to mind as I write this. One is a piece written by Slim Jackson of Single Black Male on the importance of asking questions. The second is what I heard on WAMU’s “Tell Me More” while driving home from work regarding the competitive nature generally assumed by (or dictated to) men.

These two concepts go hand in hand in my opinion because I’ve found that at times where I see a person as “my competition” I’m less likely to seek advice or help from that person. Doing this limits my resources and the set of people who can help me. I know that one of the hardest things to do while playing a game like Madden is asking my competition “how’d you just make that play” or listen to this competition explain how he knew to make such a play. However, its just this ability to swallow my pride and ask these questions that I’ve gotten better in Madden. I will admit that I’m not the most humble person on the face of the earth and so there are some things that I haven’t asked how to do. Generally in those things, I’ve found myself repeatedly playing the game trying to figure things out for myself.

There are applications of both sides of this to the real world. Some who love competition will say that it brings out the best in us, and they’d be sure to point out the feeling of satisfaction we get by working independently on a problem. As I look through my lists of inspirational quotes, I’m reminded of this with often repeated statements like “failure is the key to success”.

As rewarding as a competition can be, there’s also an important saying that we don’t need to re-invent the wheel. And what often gets lost in the do-it-yourself nature of competition is the ability to utilize all possible resources available to us. In particular, the skills of how to acknowledge the things we do not understand and how to formulate questions aimed at gaining understanding.

This brings us to is the other side of the competitive spirit. As rewarding as it is to be able to say, “wow, I can’t believe I was able to figure that out on my own”, it is also a stressful situation and there are many who are never able to say those words. Should these people be satisfied with failure?

I do not ask this in some devil’s advocate type of way. I ask coming from the point of view of a mathematician, an educator, and as a former student. I had a pretty dark moment in graduate school where I realized that my mere “love” of mathematics would not get me through qualifying exams. It wouldn’t suddenly make text books and academic papers instantly understandable. Suddenly I was placed in an uncomfortable position. Instead of always being the one who was the first to get the concept and who was leading the study groups on it, I’d be the one asking the questions. Looking at this in a “competitive” frame of mind (as I did then), I felt like I was losing the game.

This same moment though, is where my thinking was really changed. There was one thing, and one thing only that I would consider a failure and that was not finishing. I view everything else as a matter of swallowing my pride and readjusting my thought process to help get to that point.

Unfortunately though, many others do not get to this point. Many get lost in the scramble of the competition and do the equivalent of folding your hand in poker. They realize that at their current pace there is little to no chance that they’d win and so they just leave the game. And this is a real risk that we’re running with competition, particularly as STEM fields are becoming more and more important and we’re trying to encourage students to focus on these areas. What may be necessary to bring this about is a more cooperative approach to these things.