Hidden Markov Models: The Backwards Algorithm

I just finished working on LEARNINGlover.com: Hidden Markov Models: The Backwards Algorithm. Here is an introduction to the script.

Suppose you are at a table at a casino and notice that things don’t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of ‘loaded’ dice that the casino is using and how these dice are loaded. To accomplish this, you set up a number of Hidden Markov Models, in which the die being used on each roll is the hidden (latent) variable, and you would like to determine which of these models, if any, the casino is most likely using.

First, let’s go over a few things.

We will call each roll of the dice an observation. The observations will be stored in variables $o_1, o_2, \dots, o_T$, where $T$ is the total number of observations.

To define a hidden Markov model (HMM), we need to specify 5 parameters:

  • The N states of the model, defined by $S = \{S_1, \dots, S_N\}$
  • The M possible output symbols, defined by the set $\{1, 2, \dots, M\}$
  • The state transition probability distribution $A = \{a_{ij}\}$, where $a_{ij}$ is the probability that the state at time $t+1$ is $S_j$, given that the state at time $t$ is $S_i$.
  • The observation symbol probability distribution $B = \{b_j(k)\}$, where $b_j(k)$ is the probability that the symbol $k$ is emitted in state $S_j$.
  • The initial state distribution $\pi = \{\pi_i\}$, where $\pi_i$ is the probability that the model is in state $S_i$ at time $t = 1$, when the first observation is made.

The HMMs we’ve generated are based on two questions. Each question has 3 possible answers, which leads to 3 × 3 = 9 possible HMMs, each with its own state transition and emission distributions.

  • How often does the casino change dice?
    • 0) Dealer Repeatedly Uses Same Dice
    • 1) Dealer Uniformly Changes Die
    • 2) Dealer Rarely Uses Same Dice
  • Which sides on the loaded dice are more likely?
    • 0) Larger Numbers Are More Likely
    • 1) All Numbers Are Randomly Likely
    • 2) Smaller Numbers Are More Likely
Grid of the nine models (column header: “How often does the casino change dice?”; row header: “Which sides on the loaded dice are more likely?”):

(0, 0)   (0, 1)   (0, 2)
(1, 0)   (1, 1)   (1, 2)
(2, 0)   (2, 1)   (2, 2)

One of the interesting problems associated with Hidden Markov Models is called the Evaluation Problem, which asks: “What is the probability that the given sequence of observations $O = o_1, o_2, \dots, o_T$ was generated by the HMM $\lambda = (A, B, \pi)$?” In principle, this probability, $P(O \mid \lambda)$, can be computed directly by summing the probability of the observations over every possible sequence of hidden states. However, there are $N^T$ such state sequences, so this direct calculation quickly becomes impractical, and more efficient methods are used instead.

The backwards algorithm is one such method (as is the forward algorithm). It defines an auxiliary variable $\beta_t(i)$, the probability that the model generates the partial observation sequence $o_{t+1}, \dots, o_T$, given that it is in state $S_i$ at time $t$, where $1 \le t \le T$. This variable can be calculated recursively, working backwards from $t = T - 1$ down to $t = 1$, by the following formula:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad 1 \le i \le N$$

We also need the initialization $\beta_T(i) = 1$, for $1 \le i \le N$.

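Once we have calculated the $\beta_t(i)$ variables, we can solve the evaluation problem by combining $\beta_1(i)$ with the initial distribution $\pi_i$ and the probability $b_i(o_1)$ of the first observation:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$$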
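To make the recursion concrete, here is a minimal Python sketch of the backward pass and the final sum. It is not the script’s code. It assumes a discrete HMM stored as plain lists with 0-indexed states and symbols (`A[i][j]` for $a_{ij}$, `B[j][k]` for $b_j(k)$, `pi[i]` for $\pi_i$). The two-state “fair die vs. die loaded toward 6” numbers are hypothetical and are not the parameters of the nine models above. The `brute_force` helper sums over all $N^T$ state sequences and is included only to check the result on a tiny example.

```python
from itertools import product


def backward(A, B, pi, obs):
    """Backward algorithm for a discrete HMM (states and symbols are 0-indexed).

    A[i][j] = a_ij, B[j][k] = b_j(k), pi[i] = pi_i, obs = [o_1, ..., o_T].
    Returns (P(O | model), beta), where beta[t][i] stores beta_{t+1}(i)
    because Python lists start at index 0.
    """
    N, T = len(pi), len(obs)
    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N  # initialization: beta_T(i) = 1
    # Induction from t = T - 1 down to t = 1:
    # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N)
            )
    # Termination: P(O | model) = sum_i pi_i * b_i(o_1) * beta_1(i)
    prob = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(N))
    return prob, beta


def brute_force(A, B, pi, obs):
    """Direct evaluation: sum P(O, Q | model) over all N**T state sequences Q."""
    total = 0.0
    for q in product(range(len(pi)), repeat=len(obs)):
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state casino: state 0 = fair die, state 1 = die loaded toward 6.
A = [[0.9, 0.1],
     [0.2, 0.8]]
B = [[1 / 6] * 6,
     [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]
pi = [0.5, 0.5]
rolls = [6, 6, 1, 6, 6]
obs = [r - 1 for r in rolls]  # die faces 1..6 become symbol indices 0..5

p_backward, beta = backward(A, B, pi, obs)
print(p_backward, brute_force(A, B, pi, obs))  # the two values should agree
```

To compare several candidate models, such as the nine casino models, one would compute $P(O \mid \lambda)$ under each model for the same rolls and prefer the model with the largest value.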

There is more on this example at LEARNINGlover.com: Hidden Markov Models: The Backwards Algorithm.

Approximating the Set Cover Problem

I just finished my weekly task of shopping for groceries. This can be a somewhat daunting task because I generally have a list of things that I’ll need which cannot all be purchased at a single location. What often happens is that I find that many of the items on my list are ONLY offered at certain stores – generic brands of certain items for example. My goal then changes from minimizing the total amount of money spent to minimizing the number of stores that I must visit to purchase all of my items.

To formulate this as a mathematical problem, suppose that I have a grocery list of items I would like to buy, represented by the list item_1, item_2, …, item_n, where n is the number of items on the list. Suppose also that there are distinct stores Store_1, Store_2, …, Store_m, each offering some combination of the items on my list. What I would like to do is minimize the number of stores I have to visit to purchase all of these items.

The problem I just described is famous because many people face it on a regular basis. In its general form, it is known as the Set Cover Problem (or the Minimum Set Cover Problem). In the general form, we replace the grocery list with a set of items called our universe, and the list of items offered at each store becomes a subset of the universe; together these subsets form our collection. As in the example above, we would like to select enough subsets from this collection that every element of the universe belongs to at least one of them, using as few subsets as possible.

In my previous post, I described the 21 problems that Karp proved were NP-Complete. Set Cover was one of those problems, which means that no efficient (polynomial-time) algorithm for solving it exactly is known. What I will do is introduce three ways to reach a near-optimal solution relatively quickly.

Greedy Method

One of the first approaches one may take to solve this problem is to repeatedly select the subset that contains the most new items. That’s how the greedy approach to set cover operates. The method terminates when every element belongs to at least one of the selected sets. In the shopping example above, this means visiting the store that has the most items on my list and purchasing those items there. Once this is done, the purchased items can be crossed off my list, and we visit the store with the most items on my remaining list, stopping when the list is empty.
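Here is a minimal Python sketch of the greedy method as just described. The grocery list, the store inventories and the function name `greedy_set_cover` are hypothetical, chosen only for illustration. When two stores offer the same number of remaining items, this version picks the one listed first.

```python
def greedy_set_cover(universe, subsets):
    """Greedy Set Cover.

    universe: set of elements to cover
    subsets:  dict mapping a subset's name to the set of elements it contains
    Returns the names of the chosen subsets, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset containing the most still-uncovered elements
        # (ties go to the subset listed first).
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained  # cross the purchased items off the list
    return chosen


# Hypothetical grocery list and store inventories.
items = {"soap", "milk", "bread", "eggs", "rice", "coffee"}
stores = {
    "Store1": {"soap", "milk", "bread"},
    "Store2": {"milk", "eggs"},
    "Store3": {"bread", "eggs", "rice", "coffee"},
    "Store4": {"soap", "coffee"},
}
plan = greedy_set_cover(items, stores)
print(plan)  # ['Store3', 'Store1']
```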

Linear Programming Relaxation

Instead of stating the set cover problem in words, we can describe the situation with mathematical inequalities. For instance, suppose that the soap I like to purchase is only available at Store_1, Store_4 and Store_9. If I introduce a variable $x_i$ for each store $i$, equal to 1 if I visit that store and 0 otherwise, then the requirement that I purchase this soap can be restated as:

$$x_1 + x_4 + x_9 \ge 1$$

Because we either visit a store or we don’t (equivalently, select a set or not), each variable $x_i$ is 0 or 1 (called a binary variable). We can introduce a similar inequality for each element of our universe (each item on our grocery list). These inequalities (called constraints) have the form:

$$\sum_{i \,:\, e \in S_i} x_i \ge 1 \qquad \text{for each element } e \text{ of the universe } U$$

Our goal of minimizing the number of sets chosen (stores visited) can be stated by the objective function:
$$\text{minimize} \sum_{i=1}^{m} x_i$$

So the mathematical formulation for this problem can be stated as

$$
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{m} x_i \\
\text{subject to} \quad & \sum_{i \,:\, e \in S_i} x_i \ge 1 && \text{for each element } e \in U \\
& x_i \in \{0, 1\} && \text{for each set } i = 1, \dots, m
\end{aligned}
$$

Formulations of this type, where the variables are restricted to integer values (in this case, the $x_i$ variables being either 0 or 1), are called integer programs. Unfortunately, there is no known efficient way to solve these formulations either, since integer programming is itself NP-hard. However, there is a related problem which can be solved quickly.

Instead of restricting the $x_i$ variables to the values 0 or 1, we could allow them to take any value within this range, i.e. $0 \le x_i \le 1$ for each set $S_i$. Doing this converts the problem from an integer programming problem into a linear programming problem (called the LP-Relaxation), which can be solved efficiently (in polynomial time). The issue with this method, though, is that the solution obtained from the LP-Relaxation is not guaranteed to be integral. In this case, how do we interpret the values $x_i$?

Randomized Rounding Method

One approach to dealing with a non-integer solution to the LP-Relaxation is to treat the $x_i$ values as probabilities: $x_i$ is the probability that we select set $i$. This works because each $x_i$ lies between 0 and 1, as a probability must. We repeatedly select sets with these probabilities until every element of our universe is covered. Selecting our sets by this procedure is the randomized rounding approach.
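A minimal Python sketch of randomized rounding follows. It assumes the LP-Relaxation has already been solved by some LP solver (not shown), and the name `randomized_rounding` is hypothetical. The three-element instance is a standard small example whose LP optimum sets every $x_i = 1/2$: each element lies in exactly two of the three sets, so adding the three constraints gives $2(x_1 + x_2 + x_3) \ge 3$, and the all-halves solution meets this with equality.

```python
import random


def randomized_rounding(universe, subsets, x, seed=None):
    """Randomized rounding of a fractional LP-Relaxation solution.

    x[name] is the LP value of each subset, used as the probability of picking it.
    Each round picks every not-yet-chosen subset independently with probability
    x[name]; rounds repeat until every element of the universe is covered.
    """
    # Guard: an element whose subsets all have x = 0 could never be covered.
    for e in universe:
        if not any(x[name] > 0 for name, s in subsets.items() if e in s):
            raise ValueError(f"element {e!r} cannot be covered")
    rng = random.Random(seed)
    chosen, covered = set(), set()
    while not covered >= set(universe):
        for name, s in subsets.items():
            if name not in chosen and rng.random() < x[name]:
                chosen.add(name)
                covered |= s
    return chosen


# Small instance: each element lies in exactly two of the three sets, and the
# LP-Relaxation optimum is x_i = 1/2 for every set (objective value 1.5).
universe = {"a", "b", "c"}
subsets = {"S1": {"a", "b"}, "S2": {"b", "c"}, "S3": {"a", "c"}}
x = {"S1": 0.5, "S2": 0.5, "S3": 0.5}

picked = randomized_rounding(universe, subsets, x, seed=1)
print(sorted(picked))
```

Different seeds can give different covers. Any cover of this instance needs at least two sets, so the result always has two or three.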

Deterministic Rounding Method

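A second approach to dealing with a non-integer solution to the LP-Relaxation is to base our solution on the element that appears in the most sets. If we let $f$ be this frequency (i.e., the number of sets that the most frequently occurring element belongs to), then we can define a solution by selecting set $i$ whenever the LP-Relaxation solution gives the variable $x_i$ a value of at least $1/f$. This always produces a cover: each element lies in at most $f$ sets, and its constraint requires the corresponding $x_i$ values to sum to at least 1, so at least one of them must be at least $1/f$.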
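Here is a minimal Python sketch of this deterministic rounding rule on the same kind of small instance as before, again assuming the LP values come from an LP solver that is not shown. The function name and the small tolerance `tol` (to absorb floating-point error in solver output) are hypothetical additions.

```python
def deterministic_rounding(universe, subsets, x, tol=1e-9):
    """Pick every subset whose LP value is at least 1/f.

    f is the largest number of subsets any single element belongs to; tol absorbs
    small floating-point error in the values returned by an LP solver.
    """
    f = max(sum(1 for s in subsets.values() if e in s) for e in universe)
    return {name for name in subsets if x[name] >= 1 / f - tol}


# Small instance: f = 2 and the LP optimum gives every set the value 1/2.
universe = {"a", "b", "c"}
subsets = {"S1": {"a", "b"}, "S2": {"b", "c"}, "S3": {"a", "c"}}
x = {"S1": 0.5, "S2": 0.5, "S3": 0.5}

print(sorted(deterministic_rounding(universe, subsets, x)))  # ['S1', 'S2', 'S3']
```

All three sets are chosen even though two would suffice, which illustrates that the result is approximate. Its cost of 3 is still no more than $f = 2$ times the LP value of 1.5.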

None of these three approaches is guaranteed to give an optimal solution to every instance of this problem. I will not go into it in this post, but each can be shown to produce a solution within a provable factor of the optimal one (for the randomized method, in expectation or with high probability), which makes them approximation algorithms.

You can see how the three algorithms compare on random problem instances here.

Hope you enjoy.

Triangle Trigonometry

I haven’t forgotten about my pledge to focus more content here towards some of the areas I’ve been asked to tutor on recently. This latest script is designed to help users understand the properties of triangles. It is based on two laws that we learn in trigonometry: the law of sines and the law of cosines. Assume that we have a triangle with sides of lengths a, b, and c and respective opposite angles A, B and C (so angle A is the angle that does not touch side a, angle B does not touch side b, and angle C does not touch side c). These laws are stated as follows:

Law of Sines
$$\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}$$

Law of Cosines
$$c^2 = a^2 + b^2 - 2ab\cos C$$

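We can use these laws to determine the missing sides and angles of a triangle given almost any three of its sides and angles. The only combination that cannot be solved is being given all three angles: the angles fix the triangle’s shape but not its size, so there are infinitely many solutions. (When given two sides and an angle that is not between them, there may also be two different triangles that fit, or none; this is known as the ambiguous case.)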
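As an illustration, here is a minimal Python sketch that solves one case, two sides and the included angle (side-angle-side), using the Law of Cosines and then checks the answer with the Law of Sines. The function name `solve_sas` is hypothetical, angles are in degrees, and this is not the script’s code. The other cases would need their own branches.

```python
import math


def solve_sas(a, b, C_deg):
    """Solve a triangle from sides a, b and the included angle C (in degrees).

    Returns (c, A, B) with angles in degrees.
    """
    C = math.radians(C_deg)
    # Law of Cosines: c^2 = a^2 + b^2 - 2ab cos(C)
    c = math.sqrt(a * a + b * b - 2 * a * b * math.cos(C))
    # Law of Cosines solved for A (arccos avoids the two-angle ambiguity of arcsin);
    # clamp to [-1, 1] to guard against floating-point round-off.
    cos_A = (b * b + c * c - a * a) / (2 * b * c)
    A = math.degrees(math.acos(max(-1.0, min(1.0, cos_A))))
    B = 180.0 - A - C_deg  # the angles of a triangle sum to 180 degrees
    return c, A, B


# A 3-4-5 right triangle: a = 3, b = 4 and the included angle C = 90 degrees.
c, A, B = solve_sas(3, 4, 90)
# Law of Sines check: a/sin(A), b/sin(B) and c/sin(C) should all be equal.
ratios = [side / math.sin(math.radians(angle)) for side, angle in ((3, A), (4, B), (c, 90))]
print(round(c, 6), round(A, 2), round(B, 2))  # 5.0 36.87 53.13
```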

The script generates random triangles, with different combinations of sides and angles revealed and the user’s job is to try to determine the missing sides. There is a button to reveal the solution, or if you’d like to see how we arrive at these values, you can check the “Show work” box.

Hope you enjoy.

Other Blogs covering this topic:
Algebra 2 Trig

Sudoku Program Updates

Here in DC, we recently had an unexpected snow day. By the word unexpected, I don’t mean that the snow wasn’t forecast; it was definitely forecast. It just never came. However, due to the forecast, I decided to avoid traffic just in case the predictions were correct. So while staying at home, I began thinking about some things that I’ve been wanting to update on the site, and one thing that came up was an update to my Sudoku program. Previously, it contained about 10,000 sample puzzles of varying difficulty. However, I told myself that I would return to the idea of generating my own Sudoku puzzles. I decided to tackle that task last week.

The question was how to do this. The Sudoku solver itself works through the Dancing Links algorithm, which is a form of backtracking, so backtracking seemed the approach most likely to give me good results in generating new puzzles (I have also seen alternative approaches discussed where people start with an existing Sudoku and swap rows and columns to generate a new puzzle). The next question was how to actually implement this method.

Here is an overview of the algorithm. I went from cell to cell (left to right, and top to bottom starting in the top left corner) attempting to place a random value in that cell. If that value can be a part of a valid Sudoku (meaning that there exists a solution with the current cells filled in as is), then we continue and fill in the next cell. Otherwise, we will try to place a different value in the current cell. This process is continued until all cells are filled in.
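The cell-by-cell filling step can be sketched in Python as follows. This is a hypothetical illustration, not the site’s implementation, and it makes one plain substitution. Instead of asking a Dancing Links solver whether the partial grid can still be completed, it only checks the row, column and box rules, and it backtracks to earlier cells when a cell has no legal value left. The names `is_valid`, `fill` and `generate_full_grid` are made up for this sketch.

```python
import random


def is_valid(grid, r, c, v):
    """True if v does not already appear in row r, column c, or the 3x3 box of (r, c)."""
    if v in grid[r] or any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != v for i in range(br, br + 3) for j in range(bc, bc + 3))


def fill(grid, rng, pos=0):
    """Fill cells left to right, top to bottom, trying values in random order.

    Returns True once all 81 cells are filled; backtracks when a cell has no legal value.
    """
    if pos == 81:
        return True
    r, c = divmod(pos, 9)
    values = list(range(1, 10))
    rng.shuffle(values)
    for v in values:
        if is_valid(grid, r, c, v):
            grid[r][c] = v
            if fill(grid, rng, pos + 1):
                return True
            grid[r][c] = 0  # dead end further on: undo and try the next value
    return False


def generate_full_grid(seed=None):
    grid = [[0] * 9 for _ in range(9)]
    fill(grid, random.Random(seed))
    return grid


grid = generate_full_grid(seed=42)
for row in grid:
    print(*row)
```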

The next step was to create a puzzle out of a filled-in Sudoku. The tricky part about this step is that if too many cells are removed, we wind up generating a puzzle that has multiple solutions. If too few cells are removed, though, the puzzle will be too easy to solve. Initially, I repeatedly removed cells from the locations that were considered the most beneficial. This generally results in a puzzle with about 35-40 values remaining. To remove additional cells, I considered each of the remaining values and asked whether hiding that cell would give the puzzle multiple solutions. If so, the cell value was kept; otherwise, it was removed. As a result, I now have a program that generates Sudoku puzzles that generally have around 25 hints.
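The uniqueness check behind the removal step can be sketched in Python as follows. This is hypothetical code under simplifying assumptions. It skips the first phase described above (removing cells from preferred locations) and simply tries hiding every cell in random order, keeping a hide only if `count_solutions` still finds exactly one solution. The solver is a plain backtracking search that branches on the cell with the fewest candidates and stops after finding two solutions; it stands in for Dancing Links. The starting grid is a fixed, valid pattern-based solution rather than one produced by the generator.

```python
import random


def candidates(grid, r, c):
    """Values that can legally be placed in empty cell (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def count_solutions(grid, limit=2):
    """Count completions of grid (0 = empty cell), stopping once `limit` are found."""
    best, best_cands = None, None
    for pos in range(81):
        r, c = divmod(pos, 9)
        if grid[r][c] == 0:
            cands = candidates(grid, r, c)
            if best is None or len(cands) < len(best_cands):
                best, best_cands = (r, c), cands
                if len(cands) <= 1:
                    break  # cannot do better than a forced (or impossible) cell
    if best is None:
        return 1  # no empty cells: the grid itself is the one solution
    r, c = best
    count = 0
    for v in best_cands:  # branch on the cell with the fewest candidates
        grid[r][c] = v
        count += count_solutions(grid, limit - count)
        grid[r][c] = 0
        if count >= limit:
            break
    return count


def make_puzzle(solution, seed=None):
    """Hide cells one at a time in random order, undoing any hide that allows a second solution."""
    puzzle = [row[:] for row in solution]
    cells = [(r, c) for r in range(9) for c in range(9)]
    random.Random(seed).shuffle(cells)
    for r, c in cells:
        value, puzzle[r][c] = puzzle[r][c], 0
        if count_solutions(puzzle) != 1:
            puzzle[r][c] = value  # hiding this cell would make the puzzle ambiguous
    return puzzle


# A valid completed grid from a standard shifting pattern, used as a stand-in
# for a grid produced by the generator.
solution = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = make_puzzle(solution, seed=1)
print(sum(v != 0 for row in puzzle for v in row), "clues remain")
```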

You should give it a try.

The PageRank Algorithm

I think one of the best recent examples of the importance of mathematics is the rise of the search engine Google. I remember the world of search engines before Google, and it was dominated by names like AltaVista, Yahoo, WebCrawler, Excite, and the like. The standard way these search engines ranked the order in which pages were listed for a search query was basically to count the number of times the query appeared on pages in their database. The pages with the most occurrences were considered the most important, those with the second most occurrences were second most important, and so on.

This sounds like a feasible way of doing things but let me show you an example of how this can be tricked. Suppose I wrote my first web page and it looked like the following:

That’s a basic web page that may not garner much attention, and it wouldn’t rank highly in most search engines, as no word appears more than once. Suppose that, this being a math web page, I wanted it to rank higher on the query “math”. Then I could just edit the source code of the page to be as follows:

This second page says not much more than the first, but the fact that the word math appears 9 additional times would increase the ranking of this page among math pages. This is a very simple example, but it shows how these search engine rankings did not have a useful metric for determining the important sites on the web.

Enter Google.

The way Google solved this problem of determining the importance of a web page is basically by counting the number of links into a web page, the theory being that the more important a web page is, the more people will be talking about it and thus linking to it. Also, the more important the people talking about (linking to) a web site, the more important that site is. This can be expressed mathematically by the following formula:

$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$

where $T_1, \dots, T_n$ are the pages that link to page $A$ and $C(T_i)$ is the number of links going out of page $T_i$.

In the above formula, the variable d is called the damping factor. It captures some of the random nature of browsing the internet: a random surfer keeps following links with probability d and otherwise jumps to some other page, so every site should have at least some minimal worth, since the random surfer can still get to it.
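A minimal Python sketch of the computation iterates $PR(A) = (1 - d) + d \sum PR(T)/C(T)$ over the pages $T$ that link to $A$, repeating until the scores stop changing. This form, with a $(1 - d)$ term, is the one in Brin and Page’s original paper, which also suggests $d = 0.85$. The script on this site may normalize differently. The three-page link graph and the function name `pagerank` are hypothetical.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over the pages T linking to A.

    links: dict mapping each page to the list of pages it links to;
    C(T) is the number of outgoing links on page T.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new = {p: 1.0 - d for p in pages}  # the "minimal worth" every page receives
        for page, targets in links.items():
            for target in targets:
                new[target] += d * pr[page] / len(targets)
        converged = max(abs(new[p] - pr[p]) for p in pages) < tol
        pr = new
        if converged:
            break
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page C ranks highest because it receives links from both other pages, and page A, which is linked to only by the top-ranked C, still outranks B.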

I have written a script to implement the algorithm here.

Other Blogs that have covered this topic
Blue Onion

The Risk of Competition

I’m not a competitive person. Let me correct that. I try not to be a competitive person. I’ve recently been playing some of my favorite games from childhood like Monopoly, Chess, Spades, and Madden and I’ve been reminded of the competitive streak in me that hates losing. This streak has been relaxed in much of my adult life, and I tend to think that’s been to my benefit. Two pieces come to mind as I write this. One is a piece written by Slim Jackson of Single Black Male on the importance of asking questions. The second is what I heard on WAMU’s “Tell Me More” while driving home from work regarding the competitive nature generally assumed by (or dictated to) men.

These two concepts go hand in hand in my opinion, because I’ve found that when I see a person as “my competition,” I’m less likely to seek advice or help from that person. Doing this limits my resources and the set of people who can help me. I know that one of the hardest things to do while playing a game like Madden is to ask my competition “how’d you just make that play?” or to listen to them explain how they knew to make such a play. However, it’s precisely this ability to swallow my pride and ask these questions that has helped me get better at Madden. I will admit that I’m not the most humble person on the face of the earth, and so there are some things that I haven’t asked how to do. Generally, for those things, I’ve found myself repeatedly playing the game trying to figure things out for myself.

There are applications of both sides of this to the real world. Some who love competition will say that it brings out the best in us, and they’d be sure to point out the feeling of satisfaction we get by working independently on a problem. As I look through my lists of inspirational quotes, I’m reminded of this with often repeated statements like “failure is the key to success”.

As rewarding as a competition can be, there’s also an important saying that we don’t need to reinvent the wheel. What often gets lost in the do-it-yourself nature of competition is the ability to use all of the resources available to us, in particular the skills of acknowledging the things we do not understand and formulating questions aimed at gaining understanding.

This brings us to the other side of the competitive spirit. As rewarding as it is to be able to say, “wow, I can’t believe I was able to figure that out on my own,” it is also a stressful situation, and there are many who are never able to say those words. Should these people be satisfied with failure?

I do not ask this in some devil’s advocate type of way. I ask from the point of view of a mathematician, an educator, and a former student. I had a pretty dark moment in graduate school when I realized that my mere “love” of mathematics would not get me through qualifying exams. It wouldn’t suddenly make textbooks and academic papers understandable. I was placed in an uncomfortable position. Instead of always being the one who was the first to get the concept and who was leading the study groups on it, I’d be the one asking the questions. Looking at this in a “competitive” frame of mind (as I did then), I felt like I was losing the game.

This same moment, though, is where my thinking really changed. There was one thing, and one thing only, that I would consider a failure, and that was not finishing. I view everything else as a matter of swallowing my pride and readjusting my thought process to help get to that point.

Unfortunately, though, many others do not get to this point. Many get lost in the scramble of the competition and do the equivalent of folding their hand in poker. They realize that at their current pace there is little to no chance that they’d win, and so they just leave the game. This is a real risk that we’re running with competition, particularly as STEM fields are becoming more and more important and we’re trying to encourage students to focus on these areas. What may be necessary to bring this about is a more cooperative approach to these things.


Arithmetic Sequences

I’ve added a script which helps to understand arithmetic sequences.

At a previous job of mine, there was a policy of holding a dinner party for the company each time we hired a new employee. At these dinners, each employee was treated to a $20 dinner at the expense of the company. There was also a manager responsible for keeping track of the costs of these dinners.

In computing the costs, the manager noticed that each new dinner was $20 more expensive than the last one. So if we let $a_1$ represent the cost of the first dinner and $a_i$ represent the cost of the $i$th dinner, then $a_i = a_{i-1} + 20$. Sequences like this, where each term is obtained by adding the same fixed amount to the previous term, arise quite often in practice and are called arithmetic sequences. An arithmetic sequence is a list of numbers where the difference between any two consecutive numbers is constant; this constant is called the common difference.

For the example above, the term $a_n$ represents the cost of the dinner after the $n$th employee has joined the company (assuming that no employees have left the company over this time period), and the term $S_n$ represents the total amount the company has paid for the first $n$ dinners.

Before we continue with this example, consider the following table which lists the first five terms of an arithmetic sequence as well as the common difference and the first five sums of this sequence.

Term number   Term value   Common difference   Sum number   Sum value
a_1           4            3                   S_1          4
a_2           7            3                   S_2          11
a_3           10           3                   S_3          21
a_4           13           3                   S_4          34
a_5           16           3                   S_5          50

One of the beauties of arithmetic sequences is that if we know the first term (a1) and the common difference (d), then we can easily calculate the terms an and Sn for any n with the following formulas:

$$a_n = a_1 + (n - 1)d, \quad \text{where } d \text{ is the common difference}$$
$$S_n = \frac{n(a_1 + a_n)}{2}$$
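A short Python check of the two formulas against the table above, where $a_1 = 4$ and $d = 3$. The function names are hypothetical. The last line compares the closed form for $S_n$ with simply adding up the terms one by one.

```python
def nth_term(a1, d, n):
    """a_n = a_1 + (n - 1) * d"""
    return a1 + (n - 1) * d


def partial_sum(a1, d, n):
    """S_n = n * (a_1 + a_n) / 2"""
    return n * (a1 + nth_term(a1, d, n)) / 2


# The sequence from the table: first term 4, common difference 3.
terms = [nth_term(4, 3, n) for n in range(1, 6)]
sums = [partial_sum(4, 3, n) for n in range(1, 6)]
print(terms)  # [4, 7, 10, 13, 16]
print(sums)   # [4.0, 11.0, 21.0, 34.0, 50.0]

# Cross-check the closed form against adding the terms one by one.
assert all(partial_sum(4, 3, n) == sum(terms[:n]) for n in range(1, 6))
```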

We can use these formulas to derive more information about the sequence. For example, if my manager wanted to estimate the cost of the dinner once we had added 30 new employees, this would be term $a_{30}$ of the sequence. Since the first dinner is for a single employee, $a_1 = 20$, and the formula gives $a_{30} = a_1 + d(30 - 1) = 20 + 20 \cdot 29 = 600$.

The script is available at http://www.learninglover.com/examples.php?id=33.

Other Blogs that have covered this topic:
Study Math Online

Learn to Solve Single Variable Linear Equations

In keeping with my New Year’s resolution of making this site more accessible to my nieces and nephews, I’ve added a script that generates random single variable linear equations and asks the user to solve for x. The script also has a “Compute” button that gives the answer, as well as an option to show the step-by-step procedure used to reach this solution.

In many instances, solving single variable linear equations is a person’s first introduction to variables, or to solving for unknown values. This concept remains very important in our daily lives, with questions such as “how many paychecks do I need to save before I can afford to buy a new car?” or “how long will it take me to get home?”

Both of these questions can be represented by a linear equation. For example, if each of my paychecks is for $500 and the car I wish to purchase has a listed price of $4500, then the question of “how many paychecks do I need to save before I can afford to buy a new car?” can be represented by the following equation, which we would like to solve for x:

500 x = 4500

As a second example, suppose that a family member has agreed to contribute $100 toward the car each time I am paid, which reduces the amount I still need to save by $100 per paycheck. Suppose also that I have already begun saving and have $1300 saved. Then the question of “how many paychecks do I need to save before I can afford to buy a new car?” can be represented by the following equation, which we’d also like to solve for x:

500 x + 1300 = 4500 - 100 x

The goal when solving an equation of this form is to isolate the variable, which in these examples means getting x by itself on one side of the equals sign ( = ) and a number on the other side. We can do this by remembering a simple rule: whatever you do to change one side of the equation, you must do to the other side as well. This works because if two quantities start out equal, the only way to keep them equal is to do the same thing to both. We decide what to do by looking at what has already been done to x and performing the inverse operation of that action.

If there was initially addition of some number, we will subtract that same number.
If there was initially subtraction of some number, we will add that same number.
If there was initially multiplication by some number, we will divide by that same number.
If there was initially division by some number, we will multiply by that same number.

While there is no rule as to what the first step should be, it is generally easiest to try to keep reducing the number of terms in the equation, while trying to place all the terms with an x on one side of the equation and all the terms without an x on the other side of the equation. For instance, in the example earlier that states

500 x + 1300 = 4500 - 100 x

We notice that on the left hand side (LHS), x has been multiplied by 500 and the term 1300 has been added. So as a first step, we could either divide both sides by 500 or subtract 1300 from both sides. Likewise, on the right hand side (RHS), x has been multiplied by (-100) and the number 4500 has been added to it. So we could also divide both sides by (-100) or subtract 4500 from both sides of the equation. We could also add 100 x to both sides or subtract 500 x from both sides of the equation.

Initially there are four terms in this equation, and division (by 500 or -100) would keep the number of terms at four, whereas subtracting 1300, 4500, 500 x or (-100 x) from both sides reduces the number of terms to three. So we choose one of these actions. Let’s go with subtracting 1300 from both sides of the equation.

The equation can now be written as

500 x + 1300 - 1300 = 4500 - 100 x - 1300

This simplifies to

500 x = 3200 - 100 x

Again we are faced with many options regarding what to do next. However, to get all the terms with an x on one side of the equation, we can add 100 x to both sides of this equation, which then becomes

500 x + 100 x = 3200 - 100 x + 100 x

This simplifies to

600 x = 3200

Now we notice that x is being multiplied by 600. So to solve for x, we only need to divide both sides by 600 which gives us that

x = 3200 / 600

Relating this to fraction arithmetic and Euclid’s Algorithm, we can reduce this fraction by noticing that 3200 = 200 * 16 and 600 = 200 * 3, so 3200 / 600 can be reduced to 16 / 3.

So x = 16 / 3

Since we’re looking for the number of paychecks I would need to save, the answer needs to be a whole number. 16 / 3 is between 5 and 6 (about 5.33), so 5 paychecks will not be enough, and I would need 6 paychecks to make this purchase.
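A small Python sketch for equations of the form a x + b = c x + d (the form of the paycheck example) follows the same steps as above. The function name `solve_linear` is hypothetical. Python’s standard `fractions.Fraction` keeps the answer exact and reduces it to lowest terms, and `math.ceil` rounds up to the whole number of paychecks.

```python
import math
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d for x (a, b, c, d integers), keeping an exact fraction."""
    if a == c:
        raise ValueError("the x terms cancel, so there is no unique solution")
    # Subtract b from both sides:        a*x = c*x + (d - b)
    # Subtract c*x from both sides:      (a - c)*x = d - b
    # Divide both sides by (a - c):      x = (d - b) / (a - c)
    return Fraction(d - b, a - c)  # Fraction reduces to lowest terms automatically


print(solve_linear(500, 0, 0, 4500))     # 500 x = 4500  ->  9
x = solve_linear(500, 1300, -100, 4500)  # 500 x + 1300 = 4500 - 100 x
print(x, math.ceil(x))                   # 16/3 6
```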

The script also gives the option to generate a new problem to gain more practice with these types of problems.

Fraction Arithmetic

I hope everyone had a good holiday season. I certainly enjoyed mine. Over this season, I had a chance to speak with some youth and their parents. It’s funny that whenever we discuss that I have a PhD in applied mathematics, the topics of their children struggling in mathematics and the possibility of tutoring them always seem to come up. I have no problem with tutoring, and I actively participate in such sessions in my spare time. However, it is sometimes difficult to do this job over such a short time period. Needless to say, I felt bad that I couldn’t have been of more assistance.

So, this being the holiday season and all, I decided to make a New Year’s resolution of sorts: to focus this site more towards some of the things that the youth struggle with, to hopefully be able to answer some of their questions.

With that being said, the first area that I decided to look at was fractions. This is one of the first areas where the youth begin to dislike mathematics. I feel like regardless of how much teachers and professors speak of the importance of understanding these processes, many students simply never grasp the procedures involved, partially because they never get used to the rules associated with these matters.

In this first script on fractions, I’ve focused on four types of problems corresponding to the four basic operations of arithmetic: Addition, Subtraction, Multiplication and Division.

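To add two fractions of the form $\frac{\text{num}_1}{\text{den}_1}$ and $\frac{\text{num}_2}{\text{den}_2}$, we use the formula

$$\frac{\text{num}_1}{\text{den}_1} + \frac{\text{num}_2}{\text{den}_2} = \frac{\text{num}_1 \cdot \text{den}_2 + \text{num}_2 \cdot \text{den}_1}{\text{den}_1 \cdot \text{den}_2}$$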

Let’s take a moment to consider where this formula comes from. In order to add fractions, we first need to obtain a common denominator for the two fractions. One way that always works is to multiply the denominators of the two fractions together. So in the formula above, the denominator on the right hand side of the equals sign is the product of the two denominators on the left hand side. Once we have a common denominator, we rewrite each of the two fractions in terms of it and then add the numerators:

$$\frac{\text{num}_1}{\text{den}_1} + \frac{\text{num}_2}{\text{den}_2} = \frac{\text{num}_1 \cdot \text{den}_2}{\text{den}_1 \cdot \text{den}_2} + \frac{\text{num}_2 \cdot \text{den}_1}{\text{den}_1 \cdot \text{den}_2}$$

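The formula for subtracting fractions is similar, with the notable difference of a subtraction in place of the addition:

$$\frac{\text{num}_1}{\text{den}_1} - \frac{\text{num}_2}{\text{den}_2} = \frac{\text{num}_1 \cdot \text{den}_2 - \text{num}_2 \cdot \text{den}_1}{\text{den}_1 \cdot \text{den}_2}$$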

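To multiply two fractions (also known as taking the product of two fractions), the resulting numerator is the product of the two initial numerators, and likewise the resulting denominator is the product of the two initial denominators:

$$\frac{\text{num}_1}{\text{den}_1} \cdot \frac{\text{num}_2}{\text{den}_2} = \frac{\text{num}_1 \cdot \text{num}_2}{\text{den}_1 \cdot \text{den}_2}$$
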
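Finally, remembering that division is the inverse of multiplication, we can derive the formula for dividing two fractions by multiplying the first fraction by the reciprocal of the second (which requires $\text{num}_2 \neq 0$):

$$\frac{\text{num}_1}{\text{den}_1} \div \frac{\text{num}_2}{\text{den}_2} = \frac{\text{num}_1}{\text{den}_1} \cdot \frac{\text{den}_2}{\text{num}_2} = \frac{\text{num}_1 \cdot \text{den}_2}{\text{den}_1 \cdot \text{num}_2}$$
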
The next step in each of these operations is to reduce the resulting fraction to lowest terms by dividing the numerator and denominator by their greatest common divisor. One way of finding that divisor is Euclid’s GCD algorithm, which is available here.
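Here is a minimal Python sketch of the four formulas, each followed by reduction to lowest terms with Euclid’s algorithm. Fractions are represented as (numerator, denominator) pairs of integers. The function names are hypothetical and this is not the practice script’s code. The sign is normalized so that the denominator stays positive.

```python
def gcd(a, b):
    """Euclid's algorithm: replace (a, b) with (b, a mod b) until b is 0."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def to_lowest_terms(num, den):
    """Divide num and den by their GCD and keep the denominator positive."""
    if den == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    g = gcd(num, den)
    num, den = num // g, den // g
    if den < 0:
        num, den = -num, -den
    return num, den


def add(n1, d1, n2, d2):
    return to_lowest_terms(n1 * d2 + n2 * d1, d1 * d2)


def subtract(n1, d1, n2, d2):
    return to_lowest_terms(n1 * d2 - n2 * d1, d1 * d2)


def multiply(n1, d1, n2, d2):
    return to_lowest_terms(n1 * n2, d1 * d2)


def divide(n1, d1, n2, d2):
    # Multiply by the reciprocal: n2/d2 becomes d2/n2 (fails if n2 is 0).
    return to_lowest_terms(n1 * d2, d1 * n2)


print(add(1, 2, 1, 3))       # (5, 6)
print(subtract(1, 2, 1, 3))  # (1, 6)
print(multiply(2, 3, 3, 4))  # (1, 2)
print(divide(1, 2, 1, 4))    # (2, 1)
```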

The script is available to practice your work on fractions at


Learn Duality in Linear Programming

I have just completed a script to help learn about duality in linear programming.

Many real world problems can be formulated as Linear Programming problems, and there are often many different ways to formulate a single problem. Some of these alternative formulations can easily be shown to be equivalent via simple algebra and arithmetic. For example, one person may see a problem as maximizing profit while another sees the same problem as minimizing losses; the two formulations then differ only in that the objective function value of one is the negative of the other.

Sometimes, though, the relationship between alternative formulations is not as easy to detect. Two alternative formulations that arise regularly in linear programming are a primal problem and its dual problem. The dual of a linear programming problem is the problem of finding the best bound on the primal objective function that can be obtained by combining the constraints. It is formulated so that every feasible solution to the dual gives a bound on the primal objective function. The Weak Duality Theorem for Linear Programming says that for a minimization problem, the objective value of any feasible solution is an upper bound on the objective value of any feasible solution to its dual (which is a corresponding maximization problem). Correspondingly, for a maximization problem, the objective value of any feasible solution is a lower bound for its dual. It follows that if we can find feasible solutions to the primal and dual problems with matching objective function values, then both of these solutions must be optimal for their respective problems. The Strong Duality Theorem for Linear Programming goes further: if the primal problem has an optimal solution, then so does its dual, and the two optimal objective function values are equal.

This is one reason duality is important. What my script provides is a means of formulating the dual of a given problem, to help understand the concept.

The rules for constructing the dual of a maximization problem (whose dual is a minimization problem) are as follows:

Primal (maximize)                          Dual (minimize)
The ith constraint is ≤                    Variable y_i is ≥ 0
The ith constraint is ≥                    Variable y_i is ≤ 0
The ith constraint is =                    Variable y_i is unrestricted in sign (free)
Variable x_j is ≥ 0                        The jth constraint is ≥
Variable x_j is ≤ 0                        The jth constraint is ≤
Variable x_j is unrestricted in sign       The jth constraint is =
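The table of rules can be applied mechanically, as in this minimal Python sketch. It assumes the primal is written as: maximize c·x subject to A x (constraint senses) b. The data layout and the function name `dual_of_max` are hypothetical. The sketch only builds the dual and does not solve either problem. The example then checks a pair of feasible points by hand.

```python
from fractions import Fraction


def dual_of_max(c, A, b, con_senses, var_signs):
    """Build the dual of:  maximize c.x  subject to  A[i].x  (con_senses[i])  b[i].

    con_senses[i] is "<=", ">=" or "=";  var_signs[j] is ">=0", "<=0" or "free".
    Returns (b, A_T, c, dual_con_senses, dual_var_signs) describing
        minimize b.y  subject to  A_T[j].y  (dual_con_senses[j])  c[j].
    """
    var_rule = {"<=": ">=0", ">=": "<=0", "=": "free"}  # primal constraint i -> sign of y_i
    con_rule = {">=0": ">=", "<=0": "<=", "free": "="}  # sign of x_j -> dual constraint j
    A_T = [list(col) for col in zip(*A)]
    return (b, A_T, c,
            [con_rule[s] for s in var_signs],
            [var_rule[s] for s in con_senses])


# Example primal: maximize 3x1 + 2x2  s.t.  x1 + 2x2 <= 4,  3x1 + x2 <= 6,  x1, x2 >= 0.
c, A, b = [3, 2], [[1, 2], [3, 1]], [4, 6]
dual_obj, A_T, rhs, dual_senses, dual_signs = dual_of_max(c, A, b, ["<=", "<="], [">=0", ">=0"])
print(dual_obj, A_T, rhs, dual_senses, dual_signs)
# [4, 6] [[1, 3], [2, 1]] [3, 2] ['>=', '>='] ['>=0', '>=0']
# i.e. minimize 4y1 + 6y2  s.t.  y1 + 3y2 >= 3,  2y1 + y2 >= 2,  y1, y2 >= 0.

# Feasible points for the primal and the dual, evaluated exactly.
x = [Fraction(8, 5), Fraction(6, 5)]
y = [Fraction(3, 5), Fraction(4, 5)]
print(sum(ci * xi for ci, xi in zip(c, x)), sum(bi * yi for bi, yi in zip(dual_obj, y)))  # 36/5 36/5
```

Both points satisfy their constraints and give the same objective value, 36/5 = 7.2. By the matching-values argument above, both are therefore optimal.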

Other Blogs that have covered this topic:
A Narrow Margin
Optimization and data mining