Suppose you are at a table at a casino and notice that things don’t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of ‘loaded’ dice that the casino is using and how these dice are loaded. To accomplish this you set up a number of Hidden Markov Models, where the loaded die are the latent (hidden) variables, and would like to determine which of these, if any is more likely to be using.

First lets go over a few things.

We will call each roll of the dice an observation. The observations will be stored in variables *o _{1}, o_{2}, …, o_{T}*, where

*T*is the number of total observations.

To generate a hidden Markov Model (HMM) we need to determine 5 parameters:

- The N states of the model, defined by S = {S
_{1}, …, S_{N}} - The M possible output symbols, defined by = {
_{1},_{2}, …,_{M}} - The State transition probability distribution A = {a
_{ij}}, where a_{ij}is the probability that the state at time t+1 is S_{j}, given that the state at time t is S_{i}. - The Observation symbol probability distribution B = {b
_{j}(_{k})} where b_{j}(_{k}) is the probability that the symbol_{k}is emitted in state S_{j}. - The initial state distribution = {
_{i}}, where_{i}is the probability that the model is in state S_{i}at time t = 0.The HMMs we’ve generated are based on two questions. For each question, you have provided 3 different answers which leads to 9 possible HMMs. Each of these models has its corresponding state transition and emission distributions.

- How often does the casino change dice?
- 0) Dealer Repeatedly Uses Same Dice
- 1) Dealer Uniformly Changes Die
- 2) Dealer Rarely Uses Same Dice

- Which sides on the loaded dice are more likely?
- 0) Larger Numbers Are More Likely
- 1) Numbers Are Randomly Likely
- 2) Smaller Numbers Are More Likely

How often does the casino change dice? Which sides on

the loaded dice

are more likely?(0, 0) (0, 1) (0, 2) (1, 0) (1, 1) (1, 2) (2, 0) (2, 1) (2, 2) One of the interesting problems associated with Hidden Markov Models is called the Learning Problem, which asks the question “How can I improve a HMM so that it would be more likely to have generated the sequence O = o

_{1}, o_{2}, …, o_{T}?The Baum-Welch algorithm answers this question using an Expectation-Maximization approach. It creates two auxiliary variables

_{t}(i) and_{t}(i, j). The variable_{t}(i) represents the probability of being in state i at time t, given the entire observation sequence. Likewise_{t}(i, j) represents the joint probability of being in state i at time t and of being in state j at time t+1, given the entire observation sequence. They can be calculated by_{t}(i) =( _{t}(i) *_{t}(i) )_{j = 1 to N}(_{t}(j) *_{t}(j))and

_{t}(i, j) =( _{t}(i) * a_{i, j}*_{t+1}(j) * b_{j}(o_{t+1}) )_{i’ = 1 to N}_{j’ = 1 to N}(_{t}(i’) * a_{i’, j’}*_{t+1}(j’) * b_{j’}(o_{t+1}) )As you can see, these are a direct result of calculations of from the Forward algorithm and from the Backwards algorithm. Once we have calculated these variables, we can update the parameters of the model as follows:

_{i}=_{1}(i)_{i,j}=_{t = 1 to T-1}(_{t}(i))_{t = t to T-1}(_{t}(i, j))// [b bar]_{j, k} = Sigma_{t = 1 to T, o_t = o_k} gamma_{t, j} / Sigma_{t = 1 to T} gamma_{t, j}, 1 <= j <= N, 1 <= k <= M

_{j}(o_{k}) =_{t = 1 to T-1, ot = ok}_{t}(j)_{t = 1 to T-1}_{t}(j)We can iterate this procedure a finite number of times or until it converges. This will generate a new model, = {N, , , , }.

There is more on this example at LEARNINGlover.com: Hidden Marokv Models: The Baum-Welch Algorithm.

Some further reading on Hidden Markov Models:

- How often does the casino change dice?

pretty good! The formulae were first written as part of a tutorial on HMMs by Rabiner in 1989.

and you should really use MathJax because obviously! :)

Also, you should have mentioned that the BW algorithm will not give you a global maximum (max probability for observing a particular sequence), it gives only a local maximum. But one can alleviate this problem by having many initial randomly distributed initial probabilities. The things get a little bit more interesting when we move toward the continuous state space hidden markovian processes. ;)

Anyhow, I love your site and the BW algorithm is simply awesome! Have you ever used it in real life applications? if yes, u should also help me get a job in US ^_^

Thanks. I’ll definitely look into MathJax as well as updating the posts with the information you provided.