To be able to understand Hidden Markov models we need a short introduction of Markov chains. In Markov chains there is in general a finite amount of states (e.g. A, B, C) which are observable with a certain transition probability between each pair of states (A->A 0.4, A->B 0.1, A->C 0.5) stored in a transition matrix P (sometimes A but since A is a state in my example I’ll stick to P). Usually there is a sequence of such states (e.g. AABACBA). When we have the complete transition matrix P (I only showed the first row) we can compute how likely such a sequence is or also generate a sequence.
In a hidden Markov model the sequence of states is “hidden”. We can not directly observe the state sequence AABACBA but each state emits something with a certain probability. We can have more, less or the same amount of different observations as states (e.g. states are A, B, C and possible stats are R, S, T, U). The length of the unobserved and observed sequences are the same though, because there is one observed state for each unobserved state. Most of the time we say that each state is an observation at a certain point in time, but since Markov models are also used in text analysis each state can also be a letter, word, sentence, … in a longer sequence of letters, words, sentences, ...
Hidden Markov models can be used when we can observe part of something but want to make inferences about something we can’t observe. In the above example we could observed RRUSTUR and want to know the sequence of A, B and Cs. From previous data we might know the emission probabilities (A->R, B->R, … how likely is it to observed R when there is A), … From this matrix E (also sometimes called O), the transition matrix P and the initial probabilities for each state A, B, C (often stored in a vector p) we can compute the most likely hidden state sequence.
What does this mean in reality? I’ll give you an example.
1) Let’s say we stay at home all the time and we only observe our cat as she comes in through the cat door. We see whether the cat’s fur is dry or wet but we don’t know whether the weather is sunny or raining. We can only conclude from the cat’s fur. We know that there are other reasons why the cat might be wet so we can not be sure it’s raining just because the cat is wet. From historical weather data we know that a transition matrix for rainy/sunny looks like this:
You might notice that the rows sum up to 1 but the columns don’t. Think about it ?
We know the emission/observation probabilities.
This means that when it is sunny, there is a 10% chance that the cat has wet fur. When it is rainy, there is a 99% chance she is wet (in 1% of the time she was sleeping in her house right outside the cat door before she came in ).
If we observe the cat for a few hours (for simplification we say she comes in once per hour) and we observe this pattern: DDDWWDDWDD
There could be several possibilities how the weather changed on this day. Maybe it didn’t change at all and it was just chance that my cat’s fur got dry/wet.
To find out the most likely weather sequence we need the Viterbi algorithm
Here you can see how a HMM can be represented.
So far we assumed that we know the parameters (transition matrix P, emission matrix E, initial state probabilities p) but if we don’t we can learn them using the Baum-Welch algorithm
This was the most basic introduction I could think of. There are many extensions for Hidden Markov models and Markov chain to make them work for non-discrete times (so far the time between each observation is the same) or to work with distributions of observations instead of concrete values.
You can find more here
- speech recognition
- handwriting recognition
- gesture recognition
- 2D structure predictions (proteins)
Python libraries & tutorials
- https://github.com/hmmlearn/hmmlearn (so... official)
R packages & tutorials
I haven't done much with Hidden Markov models in R yet but there are lots of packages. From what I've read the best one seems to be markovchain. (the second link)
A list of packages in other languages on quora
One of the first publications on HMM and also one of the most referenced: Rabiner et al
- Estimate parameters from a dataset and create a visualising graph (like in this image but with actual values)
- Train your model on a text (e.g. Wikipedia page) and generate text
- find patterns in text / images /...
If you want to find a dataset for yourself you can look for data that fits any of the above topics mentioned under “Typical applications”. You’ll find lots of datasets online.
- language recognition
- kaggle.com religious and philosophical texts
- kaggle.com PokomenGO spawns (this looks interesting to me, I’ll probably use this )
- MNIST handwritten digits (I think that'd be good for beginners and I also could provide help as I have already worked with this dataset)