Please describe a step-by-step approach for applying an HMM to the classification of a disease. Let's suppose the dataset has 200 patients and 150 healthy people.
OK, if I understand your data correctly, it looks something like this:
attr1, attr2, ... attrN, disease
... , .... , ... , ...
Unless you are more interested in HMMs than the classification itself, I do not recommend the use of HMM in this case.
First, to reach even a minimal level of precision, your model will need a huge state space; second, its precision will still not be as good as that of some data mining algorithms, e.g., Random Forests.
I suggest you explore some classification algorithms. You can use Weka; it is user friendly and has good documentation.
http://www.cs.waikato.ac.nz/ml/weka/
...
However, if you need a stochastic model, try using something more robust, like SANs.
You can directly generate models using this tool: ( http://joaquim.pro.br/Software_acad.htm#SANGE )
The main paper, about the tool, is here ( http://joaquim.pro.br/pdf/SANGE.pdf )
Do not hesitate to ask me any questions on this matter.
First, I recommend trying to classify the patients based on only one attribute at a time, in order to determine which attributes are the most relevant. Then pick a subset of the attributes that best indicates the presence of the disease.
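The one-attribute-at-a-time screening above can be sketched roughly as follows. This is only an illustration under my own assumptions: the data are numeric, the label is 1 for disease and 0 for healthy, and each attribute is scored by the best accuracy of a single-threshold rule. The function names and the toy rows are hypothetical, not from any particular tool.

```python
from typing import List, Sequence, Tuple


def stump_accuracy(values: Sequence[float], labels: Sequence[int]) -> float:
    """Best training accuracy of a one-threshold rule on a single attribute."""
    n = len(values)
    best = 0.0
    # Candidate thresholds: every distinct observed value of the attribute.
    for threshold in sorted(set(values)):
        # Rule A: predict disease (1) when value >= threshold.
        correct = sum((v >= threshold) == bool(y) for v, y in zip(values, labels))
        # Rule B is the mirror image of rule A, so it gets n - correct right.
        best = max(best, correct / n, (n - correct) / n)
    return best


def rank_attributes(
    rows: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[int, float]]:
    """Return (attribute index, accuracy) pairs, most relevant first."""
    n_attrs = len(rows[0])
    scores = [
        (j, stump_accuracy([row[j] for row in rows], labels))
        for j in range(n_attrs)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy data: two attributes per person, label 1 = disease.
rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0], [5.0, 3.0], [6.0, 6.0]]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_attributes(rows, labels)
for index, accuracy in ranking:
    print(f"attribute {index}: accuracy {accuracy:.2f}")
```

On real data the score should be estimated with cross-validation rather than on the training set, since with only a few hundred people a single threshold can look better than it really is.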
I don't think a Markov method will be very useful unless you specify what you would use the Markov model for. What correlations would you like to model here? I would rather treat this as a multimodal classification problem. For this purpose, please refer to Thierry Desnoeux's work on the subject.
Hello. I share the view of many of the respondents. I am not sure that an HMM is the right framework for classification, since your system is not really "dynamical", i.e., you do not need to model time dependence and therefore do not need a Markov model. On the other hand, the EM framework used to train HMMs can also be used to learn a mixture model of your data that can supply classification probabilities. Or you can try any of a number of other established classification methods, such as artificial neural networks, SVMs, decision trees/random forests, etc. Given your quite small data set (200 patients and 150 healthy people), you might want to extract the essential features and minimise the dimensionality of the learning problem. You also need to be careful about the weight assigned to each class, since the two classes are not the same size.
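As a rough illustration of the class-weight point, here is one common heuristic, using the counts given in the question (200 patients, 150 healthy people). It weights each class by n_samples / (n_classes * class_count), which is the same formula scikit-learn documents for class_weight="balanced". The function name and the label strings are my own choices for the example.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency.

    With these weights every class contributes the same total weight,
    so the larger class does not dominate the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Class sizes taken from the question; the label names are illustrative.
labels = ["patient"] * 200 + ["healthy"] * 150
weights = balanced_class_weights(labels)

for cls, weight in weights.items():
    print(f"{cls}: {weight:.4f}")
```

Most classifiers that accept per-class or per-sample weights can take values like these directly. With a 200/150 split the correction is mild, but it matters more as the imbalance grows.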