This is not very common, but below is a numerical example for a Layered Hidden Markov Model (LHMM) with two layers:
We have two layers of hidden states, with the following properties:
Layer 1 has 3 hidden states: S1, S2, and S3.
Layer 2 has 2 hidden states: A and B.
We also have a set of observable states, which we can represent by the letters O1, O2, and O3.
The transition probabilities between the hidden states are as follows:
Layer 1 transition matrix:
    S1   S2   S3
S1  0.4  0.3  0.3
S2  0.2  0.5  0.3
S3  0.1  0.2  0.7
We can check this matrix by confirming that each row sums to 1:
Row 1: 0.4 + 0.3 + 0.3 = 1.0
Row 2: 0.2 + 0.5 + 0.3 = 1.0
Row 3: 0.1 + 0.2 + 0.7 = 1.0
Layer 2 transition matrix:
   A    B
A  0.7  0.3
B  0.4  0.6
Check:
Row 1: 0.7 + 0.3 = 1.0
Row 2: 0.4 + 0.6 = 1.0
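As a quick sanity check, these two transition matrices can also be written out and verified in code. Here is a minimal Python/NumPy sketch; the array names (trans_1, trans_2) are just illustrative:

```python
import numpy as np

# Layer 1 transition matrix: rows and columns ordered S1, S2, S3
trans_1 = np.array([
    [0.4, 0.3, 0.3],   # from S1
    [0.2, 0.5, 0.3],   # from S2
    [0.1, 0.2, 0.7],   # from S3
])

# Layer 2 transition matrix: rows and columns ordered A, B
trans_2 = np.array([
    [0.7, 0.3],        # from A
    [0.4, 0.6],        # from B
])

# Every row of a transition matrix must sum to 1
assert np.allclose(trans_1.sum(axis=1), 1.0)
assert np.allclose(trans_2.sum(axis=1), 1.0)
```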
The emission probabilities, which give the likelihood of observing each observable state given a particular hidden state, are as follows:
Layer 1 emission probabilities:
    O1   O2   O3
S1  0.2  0.5  0.3
S2  0.4  0.1  0.5
S3  0.3  0.3  0.4
Check:
Row 1: 0.2 + 0.5 + 0.3 = 1.0
Row 2: 0.4 + 0.1 + 0.5 = 1.0
Row 3: 0.3 + 0.3 + 0.4 = 1.0
Layer 2 emission probabilities:
   O1   O2   O3
A  0.4  0.4  0.2
B  0.1  0.5  0.4
Check:
Row 1: 0.4 + 0.4 + 0.2 = 1.0
Row 2: 0.1 + 0.5 + 0.4 = 1.0
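The emission matrices can be checked the same way. Again a minimal sketch; emit_1 and emit_2 are illustrative names in the same style as above:

```python
import numpy as np

# Layer 1 emission matrix: rows S1, S2, S3; columns O1, O2, O3
emit_1 = np.array([
    [0.2, 0.5, 0.3],   # S1
    [0.4, 0.1, 0.5],   # S2
    [0.3, 0.3, 0.4],   # S3
])

# Layer 2 emission matrix: rows A, B; columns O1, O2, O3
emit_2 = np.array([
    [0.4, 0.4, 0.2],   # A
    [0.1, 0.5, 0.4],   # B
])

# Every row of an emission matrix must also sum to 1
assert np.allclose(emit_1.sum(axis=1), 1.0)
assert np.allclose(emit_2.sum(axis=1), 1.0)
```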
Let's say we observe a sequence of observable states O1, O2, O3. We want to use the layered HMM to infer the most likely sequence of hidden states that generated this observation.
Of course, this may or may not be the actual sequence of hidden states.
Using the Viterbi algorithm (together with an initial state distribution for each layer, which is not shown here), we can compute the most likely sequence of hidden states in each layer:
Layer 1 hidden state sequence: S2, S2, S3
Layer 2 hidden state sequence: A, A, B
This means that the observation O1, O2, O3 was most likely generated by the sequence of hidden states S2, S2, S3 in Layer 1, and the sequence of hidden states A, A, B in Layer 2.
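To make the decoding step concrete, here is a minimal Viterbi sketch for a single layer, reusing the trans_1, trans_2, emit_1, and emit_2 arrays from the earlier snippets. The example above does not specify the initial state distributions, so the code assumes a uniform start for each layer; with a different starting distribution the decoded paths can differ from the sequences quoted above.

```python
import numpy as np

def viterbi(obs, trans, emit, init):
    """Return the most likely hidden-state path for an observation sequence.

    obs   : sequence of observation indices (0-based)
    trans : (n_states, n_states) transition matrix
    emit  : (n_states, n_symbols) emission matrix
    init  : (n_states,) initial state distribution
    """
    n_states = trans.shape[0]
    T = len(obs)
    delta = np.zeros((T, n_states))            # best path probability so far
    psi = np.zeros((T, n_states), dtype=int)   # backpointers

    delta[0] = init * emit[:, obs[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * trans[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * emit[j, obs[t]]

    # Backtrack from the most probable final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Observation sequence O1, O2, O3 encoded as indices 0, 1, 2
obs = [0, 1, 2]

# Decode each layer separately, assuming uniform initial distributions
path_1 = viterbi(obs, trans_1, emit_1, np.full(3, 1 / 3))   # over S1, S2, S3
path_2 = viterbi(obs, trans_2, emit_2, np.full(2, 1 / 2))   # over A, B
```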
Yes. Because each layer is a Hidden Markov Model (HMM), the output of layer 1 becomes the input for layer 2. We may not know the output of layer 1 in advance, but generally we can assume it will be a sequence of hidden states representing an underlying process (which could be speech, text, or something else). Once we have that sequence of hidden states, we can use it as the input to the second layer, which then generates a new sequence of hidden states representing a higher-level process.
Modeling this relationship between layers can be done in different ways. One common method is through the use of transition probabilities between hidden states in layer 1 and layer 2. These probabilities determine how the hidden state sequence in the second layer is influenced by the hidden state sequence in the first layer.
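As a sketch of that idea, one simple coupling is to treat the decoded layer 1 state sequence as the observation sequence for layer 2, with an inter-layer matrix giving the probability of each layer 1 state given each layer 2 state. The numbers in inter_layer below are made up purely for illustration (they are not specified in the example above), and the snippet reuses the viterbi() function and trans_2 matrix from the earlier sketch:

```python
import numpy as np

# Hypothetical inter-layer matrix: P(layer 1 state | layer 2 state).
# Rows are layer 2 states A, B; columns are layer 1 states S1, S2, S3.
# These values are illustrative only.
inter_layer = np.array([
    [0.5, 0.3, 0.2],   # A
    [0.2, 0.3, 0.5],   # B
])

# The decoded layer 1 path (e.g. S2, S2, S3 -> indices 1, 1, 2) is treated
# as the observation sequence for layer 2 and decoded with the same viterbi()
# function and trans_2 matrix defined earlier.
layer1_path = [1, 1, 2]
layer2_path = viterbi(layer1_path, trans_2, inter_layer, np.full(2, 1 / 2))
```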
This brings me to a key question: why use an LHMM over a single-layer HMM? One main reason is that it captures complex dependencies between the underlying processes being modeled (text, speech, etc.). Taking speech recognition as an example, layer 1 could model the phonemes in a word, and layer 2 could then model the word itself. This would improve the accuracy of the speech recognition system, since the model makes use of its knowledge of the phonemes as well as the words.