It seems interesting to compare and contrast lip reading with spatial and temporal pattern recognition and classification. The core question here is to what extent the two are similar, or share common ground, in terms of data pre-processing, feature extraction, and modelling.