{"id":635,"date":"2013-08-20T22:53:55","date_gmt":"2013-08-21T02:53:55","guid":{"rendered":"http:\/\/learninglover.com\/blog\/?p=635"},"modified":"2014-06-16T09:10:08","modified_gmt":"2014-06-16T13:10:08","slug":"hidden-markov-models-the-baum-welch-algorithm","status":"publish","type":"post","link":"https:\/\/learninglover.com\/blog\/index.php\/2013\/08\/20\/hidden-markov-models-the-baum-welch-algorithm\/","title":{"rendered":"Hidden Markov Models: The Baum-Welch Algorithm"},"content":{"rendered":"<p>Suppose you are at a table at a casino and notice that things don&#8217;t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. You view this as a pattern recognition problem and would like to understand the number of &#8216;loaded&#8217; dice that the casino is using and how these dice are loaded. To accomplish this you set up a number of Hidden Markov Models, where the loaded die are the latent (hidden) variables, and would like to determine which of these, if any is more likely to be using.<\/p>\n<p>First lets go over a few things. <\/p>\n<p>We will call each roll of the dice an observation. The observations will be stored in variables <i>o<sub>1<\/sub>, o<sub>2<\/sub>, &#8230;, o<sub>T<\/sub><\/i>, where <i>T<\/i> is the number of total observations. 
<\/p>\n<p>To generate a hidden Markov Model (HMM) <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/lambda.gif\"> we need to determine 5 parameters:<\/p>\n<ul>\n<li> The N states of the model, defined by S = {S<sub>1<\/sub>, &#8230;, S<sub>N<\/sub>}\n<li> The M possible output symbols, defined by <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"> = {<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/sigma.gif\"><sub>1<\/sub>, <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/sigma.gif\"><sub>2<\/sub>, &#8230;, <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/sigma.gif\"><sub>M<\/sub>}\n<li> The State transition probability distribution A = {a<sub>ij<\/sub>}, where a<sub>ij<\/sub> is the probability that the state at time t+1 is S<sub>j<\/sub>, given that the state at time t is S<sub>i<\/sub>.\n<li> The Observation symbol probability distribution B = {b<sub>j<\/sub>(<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/sigma.gif\"><sub>k<\/sub>)} where b<sub>j<\/sub>(<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/sigma.gif\"><sub>k<\/sub>) is the probability that the symbol <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/sigma.gif\"><sub>k<\/sub> is emitted in state S<sub>j<\/sub>.\n<li> The initial state distribution <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/pi.gif\"> = {<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/pi.gif\"><sub>i<\/sub>}, where <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/pi.gif\"><sub>i<\/sub> is the probability that the model is in state S<sub>i<\/sub> at time t = 0.\n<\/ol>\n<p>The HMMs we&#8217;ve generated are based on two questions. For each question, you have provided 3 different answers which leads to 9 possible HMMs. 
Each of these models has its corresponding state transition and emission distributions. <\/p>\n<ul>\n<li>How often does the casino change dice?\n<ul>\n<li>0) Dealer Repeatedly Uses Same Dice\n<li>1) Dealer Uniformly Changes Die\n<li>2) Dealer Rarely Uses Same Dice\n<\/ul>\n<li>Which sides on the loaded dice are more likely?\n<ul>\n<li>0) Larger Numbers Are More Likely\n<li>1) Numbers Are Randomly Likely\n<li>2) Smaller Numbers Are More Likely\n<\/ul>\n<\/ul>\n<table>\n<tr>\n<td><\/td>\n<td>How often does the casino change dice?<\/td>\n<\/tr>\n<tr>\n<td>Which sides on <br \/>the loaded dice <br \/>are more likely?<\/td>\n<td>\n<table>\n<tr>\n<td>(0, 0)<\/td>\n<td>(0, 1)<\/td>\n<td>(0, 2)<\/td>\n<\/tr>\n<tr>\n<td>(1, 0)<\/td>\n<td>(1, 1)<\/td>\n<td>(1, 2)<\/td>\n<\/tr>\n<tr>\n<td>(2, 0)<\/td>\n<td>(2, 1)<\/td>\n<td>(2, 2)<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<\/tr>\n<\/table>\n<p>One of the interesting problems associated with Hidden Markov Models is called the Learning Problem, which asks the question &#8220;How can I improve an HMM <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/lambda.gif\"> so that it would be more likely to have generated the sequence O = o<sub>1<\/sub>, o<sub>2<\/sub>, &#8230;, o<sub>T<\/sub>?&#8221; <\/p>\n<p>The Baum-Welch algorithm answers this question using an Expectation-Maximization approach. It creates two auxiliary variables <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/gamma.gif\"><sub>t<\/sub>(i) and <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/xi.gif\"><sub>t<\/sub>(i, j). The variable <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/gamma.gif\"><sub>t<\/sub>(i) represents the probability of being in state i at time t, given the entire observation sequence. 
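As a rough Python sketch of one full Baum-Welch pass (the function and variable names are my own; states and symbols are assumed to be encoded as integer indices 0..N-1 and 0..M-1, with pi, A, B as NumPy arrays mirroring the distributions above):

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One Baum-Welch re-estimation step (illustrative sketch).

    obs : observation indices o_1..o_T (each in 0..M-1)
    pi  : (N,) initial state distribution pi_i
    A   : (N, N) transition matrix, A[i, j] = a_ij
    B   : (N, M) emission matrix, B[j, k] = b_j(sigma_k)
    """
    obs = np.asarray(obs)
    N, T = len(pi), len(obs)

    # alpha_t(i) from the Forward algorithm.
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # beta_t(i) from the Backwards algorithm.
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    # gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j), normalized
    # over all state pairs (i', j').
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
        xi[t] /= xi[t].sum()

    # Re-estimated parameters pi-bar, a-bar, b-bar.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)   # only times t with o_t = sigma_k
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```

Repeating this step, feeding each output back in as the next input, carries out the iteration-to-convergence described in the rest of the post.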
Likewise <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/xi.gif\"><sub>t<\/sub>(i, j) represents the joint probability of being in state i at time t and of being in state j at time t+1, given the entire observation sequence. They can be calculated by<\/p>\n<table style=\"width:auto;line-height:normal;\">\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/gamma.gif\"><sub>t<\/sub>(i) = <\/td>\n<td>\n<table>\n<tr>\n<td>\n(<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/alpha.gif\"><sub>t<\/sub>(i) * <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/beta.gif\"><sub>t<\/sub>(i) ) <\/td>\n<\/tr>\n<tr>\n<td> <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"><sub>j = 1 to N<\/sub>(<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/alpha.gif\"><sub>t<\/sub>(j) * <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/beta.gif\"><sub>t<\/sub>(j))<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<\/tr>\n<\/table>\n<p>and<\/p>\n<table style=\"width:auto;line-height:normal;\">\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/xi.gif\"><sub>t<\/sub>(i, j) = <\/td>\n<td>\n<table>\n<tr>\n<td>(<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/alpha.gif\"><sub>t<\/sub>(i) * a<sub>i, j<\/sub> * <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/beta.gif\"><sub>t+1<\/sub>(j) * b<sub>j<\/sub>(o<sub>t+1<\/sub>) )<\/td>\n<\/tr>\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"><sub>i&#8217; = 1 to N<\/sub><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"><sub>j&#8217; = 1 to N<\/sub>(<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/alpha.gif\"><sub>t<\/sub>(i&#8217;) * a<sub>i&#8217;, j&#8217;<\/sub> * <img decoding=\"async\" 
src=\"http:\/\/www.learninglover.com\/chars\/beta.gif\"><sub>t+1<\/sub>(j&#8217;) * b<sub>j&#8217;<\/sub>(o<sub>t+1<\/sub>) )<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<\/tr>\n<\/table>\n<p>As you can see, these are a direct result of calculations of <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/alpha.gif\"> from <a href=\"http:\/\/learninglover.com\/blog\/?p=562\" title=\"The Forward Algorithm\">the Forward algorithm<\/a> and <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/beta.gif\"> from <a href=\"http:\/\/learninglover.com\/blog\/?p=570\" title=\"The Backwards Algorithm\">the Backwards algorithm<\/a>. Once we have calculated these variables, we can update the parameters of the model as follows:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/pibar.jpg\"><sub>i<\/sub> = <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/gamma.gif\"><sub>1<\/sub>(i)<\/p>\n<table style=\"width:auto;line-height:normal;\">\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/abar.jpg\"><sub>i,j<\/sub> = <\/td>\n<td>\n<table>\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"><sub>t = t to T-1<\/sub> (<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/xi.gif\"><sub>t<\/sub>(i, j))<\/tr>\n<\/tr>\n<tr><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"><sub>t = 1 to T-1<\/sub>(<img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/gamma.gif\"><sub>t<\/sub>(i))<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<\/tr>\n<\/table>\n<p>\/\/\t[b bar]_{j, k} = Sigma_{t = 1 to T, o_t = o_k} gamma_{t, j} \/ Sigma_{t = 1 to T} gamma_{t, j}, 1 <= j <= N, 1 <= k <= M\n\n\n<table style=\"width:auto;line-height:normal;\">\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/bbar.jpg\"><sub>j<\/sub>(o<sub>k<\/sub>) = <\/td>\n<td>\n<table>\n<tr>\n<td><img decoding=\"async\" 
src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"><sub>t = 1 to T-1, o<sub>t<\/sub> = o<sub>k<\/sub><\/sub> <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/gamma.gif\"><sub>t<\/sub>(j)<\/td>\n<\/tr>\n<tr>\n<td><img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/csigma.gif\"><sub>t = 1 to T-1<\/sub> <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/gamma.gif\"><sub>t<\/sub>(j)<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<\/tr>\n<\/table>\n<p>We can iterate this procedure a finite number of times or until it converges. This will generate a new model, <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/lambdabar.jpg\"> = {N, <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/sigma.gif\">, <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/pibar.jpg\">, <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/abar.jpg\">, <img decoding=\"async\" src=\"http:\/\/www.learninglover.com\/chars\/bbar.jpg\">}.<\/p>\n<p>There is more on this example at <a href=\"http:\/\/www.learninglover.com\/examples.php?id=48\">LEARNINGlover.com: Hidden Marokv Models: The Baum-Welch Algorithm<\/a>. <\/p>\n<p>Some further reading on Hidden Markov Models: <\/p>\n<ul>\n<li><a href=\"http:\/\/pure.rhul.ac.uk\/portal\/files\/1446635\/Hidden_Markov_Models_Theory_and_Applications.pdf\">Hidden Markov Models, Theory and Applications<\/a>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Suppose you are at a table at a casino and notice that things don&#8217;t look quite right. Either the casino is extremely lucky, or things should have averaged out more than they have. 
You view this as a pattern recognition problem and would like to understand the number of &#8216;loaded&#8217; dice that the casino is [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-635","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/635","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/comments?post=635"}],"version-history":[{"count":2,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/635\/revisions"}],"predecessor-version":[{"id":881,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/posts\/635\/revisions\/881"}],"wp:attachment":[{"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/media?parent=635"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/categories?post=635"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/learninglover.com\/blog\/index.php\/wp-json\/wp\/v2\/tags?post=635"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}