Also, you should have mentioned that the Baum-Welch (BW) algorithm is not guaranteed to find the global maximum of the likelihood (the parameters that maximize the probability of the observed sequence); it only converges to a local maximum. One can alleviate this problem by running it from many randomly drawn sets of initial probabilities and keeping the run with the highest likelihood. Things get a bit more interesting when we move to hidden Markov processes with a continuous state space. ;)
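To make the random-restart idea concrete, here is a minimal sketch in plain Python for a discrete-observation HMM. Everything in it is my own illustration, not anything from the post: the function names (`baum_welch`, `best_of_restarts`), the toy observation sequence, the number of states, the iteration count, and the tiny `eps` floor added to the re-estimation counts (a numerical safety assumption, not part of the textbook update).

```python
import math
import random

def normalize(w):
    """Scale a list of non-negative weights so they sum to 1."""
    s = sum(w)
    return [x / s for x in w]

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass. Returns alpha, beta, scales c, log-likelihood."""
    T, N = len(obs), len(pi)
    alpha, c = [], []
    for t in range(T):
        if t == 0:
            row = [pi[j] * B[j][obs[0]] for j in range(N)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                   for j in range(N)]
        c.append(sum(row))                      # scale factor for step t
        alpha.append([a / c[t] for a in row])   # normalized forward variable
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                       for j in range(N)) / c[t + 1] for i in range(N)]
    return alpha, beta, c, sum(math.log(x) for x in c)

def baum_welch(obs, n_states, n_symbols, rng, n_iter=50, eps=1e-9):
    """One EM run from a random start. Returns (log-likelihood, (pi, A, B))."""
    N, T = n_states, len(obs)
    # Random initial parameters; the small offset keeps every entry positive.
    pi = normalize([rng.random() + 1e-3 for _ in range(N)])
    A = [normalize([rng.random() + 1e-3 for _ in range(N)]) for _ in range(N)]
    B = [normalize([rng.random() + 1e-3 for _ in range(n_symbols)]) for _ in range(N)]
    for _ in range(n_iter):
        alpha, beta, c, _ = forward_backward(obs, pi, A, B)
        gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
        new_A = [[eps] * N for _ in range(N)]           # expected transition counts
        for t in range(T - 1):
            for i in range(N):
                for j in range(N):
                    new_A[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                    * beta[t + 1][j] / c[t + 1])
        new_B = [[eps] * n_symbols for _ in range(N)]   # expected emission counts
        for t in range(T):
            for j in range(N):
                new_B[j][obs[t]] += gamma[t][j]
        pi = normalize([g + eps for g in gamma[0]])
        A = [normalize(row) for row in new_A]
        B = [normalize(row) for row in new_B]
    return forward_backward(obs, pi, A, B)[3], (pi, A, B)

def best_of_restarts(obs, n_states, n_symbols, n_restarts=10, seed=0):
    """Run Baum-Welch from several random starts and keep the best run."""
    rng = random.Random(seed)
    runs = [baum_welch(obs, n_states, n_symbols, rng) for _ in range(n_restarts)]
    return max(runs, key=lambda r: r[0]), [r[0] for r in runs]

# Toy data (made up for illustration): 20 observations over 3 symbols.
obs = [0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 2, 0, 0, 1, 0, 2, 2, 2]
(best_ll, params), all_lls = best_of_restarts(obs, n_states=2, n_symbols=3)
print("log-likelihood per restart:", sorted(round(ll, 3) for ll in all_lls))
print("best log-likelihood:", best_ll)
```

If the restarts end at different log-likelihoods, that is the local-maximum problem showing up directly. Keeping the best run is still only a heuristic: it improves the odds of landing near the global maximum but does not guarantee it.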

