Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Activity: Hidden Markov models
Hidden Markov Models


To be able to understand Hidden Markov models we need a short introduction of Markov chains. In Markov chains there is in general a finite amount of states (e.g. A, B, C) which are observable with a certain transition probability between each pair of states (A->A 0.4, A->B 0.1, A->C 0.5) stored in a transition matrix P (sometimes A but since A is a state in my example I’ll stick to P). Usually there is a sequence of such states (e.g. AABACBA). When we have the complete transition matrix P (I only showed the first row) we can compute how likely such a sequence is or also generate a sequence.

In a hidden Markov model the sequence of states is “hidden”. We can not directly observe the state sequence AABACBA but each state emits something with a certain probability. We can have more, less or the same amount of different observations as states (e.g. states are A, B, C and possible stats are R, S, T, U). The length of the unobserved and observed sequences are the same though, because there is one observed state for each unobserved state. Most of the time we say that each state is an observation at a certain point in time, but since Markov models are also used in text analysis each state can also be a letter, word, sentence, … in a longer sequence of letters, words, sentences, ...
Hidden Markov models can be used when we can observe part of something but want to make inferences about something we can’t observe. In the above example we could observed RRUSTUR and want to know the sequence of A, B and Cs. From previous data we might know the emission probabilities (A->R, B->R, … how likely is it to observed R when there is A), … From this matrix E (also sometimes called O), the transition matrix P and the initial probabilities for each state A, B, C (often stored in a vector p) we can compute the most likely hidden state sequence.

What does this mean in reality? I’ll give you an example.

1) Let’s say we stay at home all the time and we only observe our cat as she comes in through the cat door. We see whether the cat’s fur is dry or wet but we don’t know whether the weather is sunny or raining. We can only conclude from the cat’s fur. We know that there are other reasons why the cat might be wet so we can not be sure it’s raining just because the cat is wet. From historical weather data we know that a transition matrix for rainy/sunny looks like this:


You might notice that the rows sum up to 1 but the columns don’t. Think about it ?
We know the emission/observation probabilities.


This means that when it is sunny, there is a 10% chance that the cat has wet fur. When it is rainy, there is a 99% chance she is wet (in 1% of the time she was sleeping in her house right outside the cat door before she came in Wink).

If we observe the cat for a few hours (for simplification we say she comes in once per hour) and we observe this pattern: DDDWWDDWDD
There could be several possibilities how the weather changed on this day. Maybe it didn’t change at all and it was just chance that my cat’s fur got dry/wet.

To find out the most likely weather sequence we need the Viterbi algorithm

Here you can see how a HMM can be represented.


So far we assumed that we know the parameters (transition matrix P, emission matrix E, initial state probabilities p) but if we don’t we can learn them using the Baum-Welch algorithm

This was the most basic introduction I could think of. There are many extensions for Hidden Markov models and Markov chain to make them work for non-discrete times (so far the time between each observation is the same) or to work with distributions of observations instead of concrete values.
You can find more here

Typical applications

- speech recognition
- handwriting recognition
- gesture recognition
- bioinformatics
   - 2D structure predictions (proteins)

Python libraries & tutorials (so... official)

R packages & tutorials
- depmixS4

I haven't done much with Hidden Markov models in R yet but there are lots of packages. From what I've read the best one seems to be markovchain. (the second link)

A list of packages in other languages on quora


One of the first publications on HMM and also one of the most referenced: Rabiner et al

Possible tasks

- Estimate parameters from a dataset and create a visualising graph (like in this image but with actual values)
- Train your model on a text (e.g. Wikipedia page) and generate text
- find patterns in text / images /...


If you want to find a dataset for yourself you can look for data that fits any of the above topics mentioned under “Typical applications”. You’ll find lots of datasets online.

language recognition religious and philosophical texts
- PokomenGO spawns (this looks interesting to me, I’ll probably use this Smile)
MNIST handwritten digits (I think that'd be good for beginners and I also could provide help as I have already worked with this dataset)
You can follow my learning club progress and get R tips here.
Here is a publication that was published at my university: Hidden Markov models for spectral similarity of songs (I haven't read it yet)
You can follow my learning club progress and get R tips here.
Today we had our hangout and I talked a little bit about the exercise and I showed a demo.
Here is the code I used for the demo.

This is the package I recommend in R: seqHMMseqHMM Vignette

In the demo I showed secondary structure prediction (of proteins) which is a very typical problem in bioinformatics. Here is some background information.

I used this dataset.
You can follow my learning club progress and get R tips here.
I found a new publications that shows how to analyze clickstreams (on a website) using Markov chains in R
You can follow my learning club progress and get R tips here.
Thanks for "hosting" this new activity, Verena!

Club members: Like with other learning club activities, if you have resources that helped with your learning on this topic, please share them in this thread!
The Becoming a Data Scientist Podcast Data Science Learning Club is now sponsored by Data CampSee this thread for more info and a coupon. (must be logged-in to view)

Forum Jump:

Users browsing this thread: 1 Guest(s)

About Becoming A Data Scientist is a blog created by Renee Teate to track her path from "SQL Data Analyst pursuing an Engineering Master's Degree" to "Data Scientist". She created this club so participants can work together and help one another learn data science. See her other site DataSciGuide for more learning resources.

Sponsored by DataCamp!