Hidden Markov models are, by definition, generative models, which means that they can still produce predicted state sequences at time points whose observations are missing. Missingness only affects how the likelihood is evaluated during the forward recursion: at a missing time point the emission term is dropped (equivalently, set to 1 for every state), so the state probabilities are propagated by the transition matrix alone. (For a good summary, see the attached link, which is the documentation for an R package that handles HMMs.) No special training or optimization methods are necessary.
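To make the forward-recursion point concrete, here is a minimal sketch in Python/NumPy. It assumes a discrete-emission HMM with known parameters and codes a missing value as None; the function names forward_loglik and smoothed_states are hypothetical and are not the API of the R package mentioned above. The second function adds a backward pass so that you get a posterior over states at every time point, including the missing ones.

```python
import numpy as np


def forward_loglik(obs, pi, A, B):
    """Scaled forward recursion for a discrete-emission HMM.

    obs: sequence of symbol indices, with None marking a missing observation.
    pi:  (K,) initial state distribution.
    A:   (K, K) transition matrix, A[i, j] = P(state j at t+1 | state i at t).
    B:   (K, M) emission matrix, B[k, m] = P(symbol m | state k).
    Returns log P(observed values) and the filtered state probabilities.
    """
    K, T = len(pi), len(obs)
    alpha = np.zeros((T, K))
    loglik = 0.0
    for t, o in enumerate(obs):
        # A missing value is marginalised out: emission likelihood 1 for every state.
        e = np.ones(K) if o is None else B[:, o]
        prior = pi if t == 0 else alpha[t - 1] @ A
        a = prior * e
        c = a.sum()  # scaling constant = P(o_t | observed values before t)
        alpha[t] = a / c
        loglik += np.log(c)
    return loglik, alpha


def smoothed_states(obs, pi, A, B):
    """Posterior P(state at t | all observed values) via forward-backward."""
    _, alpha = forward_loglik(obs, pi, A, B)
    T, K = alpha.shape
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        o = obs[t + 1]
        e = np.ones(K) if o is None else B[:, o]
        b = A @ (e * beta[t + 1])
        beta[t] = b / b.sum()  # per-step rescaling cancels in the final normalisation
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

For example, with obs = [0, 0, None, 1, 1], the third row of smoothed_states(obs, pi, A, B) is a state distribution for the missing time point, informed by the observations on both sides; taking the row-wise argmax gives one simple predicted state sequence.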
That said, these results are only valid if the missingness mechanism is ignorable (e.g. the data are missing completely at random). Otherwise, the results may be biased, especially when the transition probabilities are high. The other link I've attached discusses this in more detail.
I assume that by a missing observation you mean an observation whose value is unknown. I think you could model the missing observation with an additional hidden state. You would probably need an ergodic model with one special state that is connected to every other state. You could probably also use nonparametric HMMs to discover the states needed to model missing observations (the model does not need to be ergodic; see my publication section).
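One way to read the special-state suggestion is sketched below in Python/NumPy. It assumes a discrete-emission HMM, encodes a missing value as an extra symbol that only the special state can emit, and fixes the exit probabilities from that state to uniform. The function name add_missing_state and the parameters p_enter and p_stay are hypothetical choices for illustration, not taken from any publication.

```python
import numpy as np


def add_missing_state(A, B, p_enter=0.05, p_stay=0.5):
    """Augment a K-state discrete-emission HMM with one special 'missing' state.

    A: (K, K) row-stochastic transition matrix.
    B: (K, M) emission matrix over M observable symbols.
    Returns (A2, B2) of shapes (K+1, K+1) and (K+1, M+1); the new state has
    index K and the new 'missing' symbol has index M.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    K, M = B.shape
    A2 = np.zeros((K + 1, K + 1))
    # Every original state can jump to the special state with probability p_enter;
    # the remaining mass keeps the original transition pattern.
    A2[:K, :K] = (1.0 - p_enter) * A
    A2[:K, K] = p_enter
    # The special state may repeat (runs of missing values) or return to any
    # original state; the uniform return distribution is an assumption.
    A2[K, K] = p_stay
    A2[K, :K] = (1.0 - p_stay) / K
    # Original states never emit the 'missing' symbol; the special state emits only it.
    B2 = np.zeros((K + 1, M + 1))
    B2[:K, :M] = B
    B2[K, M] = 1.0
    return A2, B2
```

Because only the special state can emit the missing symbol, the forward recursion routes the chain through that state at every missing time point. In a real model, p_enter, p_stay and the return probabilities would presumably be estimated (e.g. by Baum-Welch) rather than fixed, and a uniform return distribution discards information about which state preceded the gap.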